MusicMan_08, 2018-10-01 11:58:40
GPGPU

How do I do the scaling on the GPU?

I assemble a 3×2 mosaic using FFmpeg:

ffmpeg -hide_banner -loglevel warning \
-hwaccel cuvid -f libndi_newtek -thread_queue_size 2048 -i 'SERVER (1)' \
-hwaccel cuvid -f libndi_newtek -thread_queue_size 2048 -i 'SERVER (2)' \
-hwaccel cuvid -f libndi_newtek -thread_queue_size 2048 -ss 00:00:14 -i 'SERVER (3)' \
-hwaccel cuvid -f libndi_newtek -thread_queue_size 2048 -ss 00:00:12 -i 'STREAM (1)' \
-hwaccel cuvid -f libndi_newtek -thread_queue_size 2048 -ss 00:00:08 -i 'STREAM (2)' \
-hwaccel cuvid -f libndi_newtek -thread_queue_size 2048 -ss 00:00:04 -i 'TEST-SERVER (Desktop)' \
-filter_complex "
        nullsrc=size=1920x1080 [base];
        [0:v] scale=640x540 [upperleft];
        [1:v] scale=640x540 [uppercenter];
        [2:v] scale=640x540 [upperright];
        [3:v] scale=640x540 [lowerleft];
        [4:v] scale=640x540 [lowercenter];
        [5:v] scale=640x540 [lowerright];
        [base][upperleft] overlay=shortest=1 [tmp1];
        [tmp1][uppercenter] overlay=shortest=1:x=640 [tmp2];
        [tmp2][upperright] overlay=shortest=1:x=1280 [tmp3];
        [tmp3][lowerleft] overlay=shortest=1:y=540 [tmp4];
        [tmp4][lowercenter] overlay=shortest=1:x=640:y=540 [tmp5];
        [tmp5][lowerright] overlay=shortest=1:x=1280:y=540 \
" -c:v h264_nvenc -pix_fmt yuv420p -preset llhp -an -f nut - | ffmpeg -hide_banner -loglevel warning -i - -f decklink -pix_fmt uyvy422 -rtbufsize 1500M -an -r 25000/1000 'DeckLink Quad (3)' &

Even though I use nvenc to encode the stream, CPU usage is still substantial, while the NVIDIA card's load is only about 40%. I want to optimize the script so that as much of the work as possible runs on the NVIDIA GPU. So: how can I do the scaling in this script on NVIDIA (ffmpeg is built with the --enable-libnpp option)? Naively applying hardware scaling (simply replacing scale with scale_npp) does not work; the script crashes with errors. Is it possible to implement what I want?
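For context, here is a minimal sketch of what a GPU-side scaling chain might look like. The key assumption: scale_npp only accepts frames that already live in CUDA memory, and the NDI inputs are delivered as uncompressed frames in system memory (so -hwaccel cuvid has nothing to decode), which would explain why a bare scale → scale_npp substitution crashes. hwupload_cuda and hwdownload are the stock FFmpeg filters for moving frames between system and CUDA memory; the input labels below are placeholders, and this is a sketch, not a tested command.

```shell
# Hypothetical per-input chain: upload to CUDA, scale with NPP, then download
# back to system memory so the CPU overlay filter can still consume the frames.
SCALE_GPU="hwupload_cuda,scale_npp=640:540:format=nv12,hwdownload,format=nv12"

# The same chain applied to the first two mosaic inputs (sketch only):
FILTER="
  nullsrc=size=1920x1080 [base];
  [0:v] ${SCALE_GPU} [upperleft];
  [1:v] ${SCALE_GPU} [uppercenter];
  [base][upperleft] overlay=shortest=1 [tmp1];
  [tmp1][uppercenter] overlay=shortest=1:x=640
"
echo "$FILTER"
```

Note that the upload/download round trip per input has its own cost; whether it beats CPU-side scale here would have to be measured.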
And a second question. In this part of the script:
-hwaccel cuvid -f libndi_newtek -thread_queue_size 2048 -ss 00:00:14 -i 'SERVER (3)' \
-hwaccel cuvid -f libndi_newtek -thread_queue_size 2048 -ss 00:00:12 -i 'STREAM (1)' \
-hwaccel cuvid -f libndi_newtek -thread_queue_size 2048 -ss 00:00:08 -i 'STREAM (2)' \
-hwaccel cuvid -f libndi_newtek -thread_queue_size 2048 -ss 00:00:04 -i 'TEST-SERVER (Desktop)' \

I use the -ss {time} option because while the streams are loading there is a noticeable delay, and it is proportional: the last input has the smallest delay, the second-to-last a noticeably larger one, and so on, with the first input having the largest delay. The -ss {time} option partially solves this problem, but it is unstable: if a stream takes a little longer to load, the delay grows again. Is it possible to tell ffmpeg to take the data live rather than from a buffer?
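For reference, these are the generic FFmpeg input options usually tried to cut input-side buffering and probing delay instead of the -ss workaround. They are not NDI-specific, and whether they help with libndi_newtek would need testing; the input name is a placeholder:

```shell
# Generic low-latency input flags (sketch): disable input buffering,
# minimize stream probing, and request low-delay decoding.
LOWLAT="-fflags nobuffer -flags low_delay -probesize 32 -analyzeduration 0"

# How one NDI input line might look with them (not a verified command):
echo "ffmpeg $LOWLAT -f libndi_newtek -thread_queue_size 2048 -i 'SERVER (1)'"
```

The idea is that -ss drops already-buffered data after the fact, while these flags try to keep the buffer from filling in the first place.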
