
Need/Want help to make GPU transcoding useful


Carlo


Hey guys,

 

I jumped into the GPU transcoding project here: http://mediabrowser.tv/community/index.php?/topic/10723-gpu-transcoding-intel-quicksync-and-nvidia-nvenc/

 

The idea behind this thread is to be able to use ANY GPU for transcoding in order to save CPU resources.  At present we can test Intel QuickSync and NVIDIA GPUs.  AMD will come in time.

 

The quick overview is that I've written a Windows console app that emulates ffmpeg as an intercept or proxy program.  It in turn hands off commands to ffmpeg, and it can change command-line arguments on the fly.  So we can essentially take the standard command being sent to ffmpeg and change anything we want in order to use Intel QuickSync or NVIDIA GPUs.
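For anyone curious what the argument rewriting could look like, here is a minimal Python sketch, assuming the proxy receives ffmpeg's arguments unchanged and the real binary lives at a hypothetical path (`REAL_FFMPEG`). The set of "x264-only" options it drops is an assumption for illustration, not what the actual intercept program does.

```python
import subprocess
import sys

# Hypothetical location of the real ffmpeg binary the proxy hands off to.
REAL_FFMPEG = r"C:\ffmpeg\real\ffmpeg.exe"

# Options assumed to be libx264-specific (each takes exactly one value);
# they are dropped when a hardware encoder is swapped in.
X264_ONLY_OPTIONS = {"-preset", "-subq", "-crf"}


def rewrite_args(args, hw_encoder="h264_qsv"):
    """Return a new argument list with libx264 replaced by `hw_encoder`."""
    out = []
    i = 0
    while i < len(args):
        arg = args[i]
        if arg in X264_ONLY_OPTIONS:
            i += 2  # skip the option and its single value
            continue
        out.append(arg)
        is_video_codec = arg.startswith(("-codec:v", "-c:v"))
        if is_video_codec and i + 1 < len(args) and args[i + 1] == "libx264":
            out.append(hw_encoder)  # swap the encoder name
            i += 2
            continue
        i += 1
    return out


def main():
    """Entry point when packaged as the stand-in ffmpeg.exe."""
    new_args = rewrite_args(sys.argv[1:])
    # stdout/stderr are inherited, so the caller still sees ffmpeg's log output.
    completed = subprocess.run([REAL_FFMPEG, *new_args])
    sys.exit(completed.returncode)
```

Calling `main()` from the packaged executable would complete the proxy; quoting and error handling are deliberately left out, and a real intercept would also need to decide which other options (profile, level, and so on) the hardware encoder accepts.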

 

In theory we should be able to reduce CPU load and off-load some of this processing to a GPU on your system.

 

I have the "intercept" program working and it can be inserted into any public, beta or dev version of the software.  However, I REQUIRE a few people (as many as possible) to help me with the command lines sent to ffmpeg (with GPU support).

 

We can do all testing at the command line for now, "outside" of MB3, so there is no change needed to your system!!!

 

If you have an NVIDIA Kepler or Maxwell GPU or an Intel GPU and are willing to help, please let me know!

 

Carlo

 

 


spootdev

Hey Carlo, I have an i7 with an HD 4600 or something like that in a Xubuntu box I can help test with.  Or I can swap out the disk and VM it for some "interesting" ESXi testing.  ;)


swhitmore

Will this make its way into the main release? Interested to see if this makes a difference with Sync converting.


The first step is to find any type of GPU-enabled program that uses a command line and can benefit us.  I played around on one of my i5 notebooks with integrated Intel video (QS enabled).

 

First I picked a "reference" video to use.  I chose Revenge of the Fallen (for no particular reason) and played it via MB3, forcing it to transcode.  Then I went into the transcode log file and grabbed the command line.  For me it was:

 

ffmpeg.exe -fflags +genpts -i file:"C:\Transformers Revenge of the Fallen (2009)\Transformers Revenge of the Fallen (2009).mp4" -map 0:0 -map 0:1 -map -0:s -codec:v:0 libx264 -force_key_frames expr:gte(t,n_forced*5) -vf "scale=min(iw\,720):trunc(ow/dar/2)*2" -pix_fmt yuv420p -preset superfast -subq 0 -crf 23 -maxrate 808001 -bufsize 1616002 -vsync vfr -profile:v high -level 41 -map_metadata -1 -threads 0 -codec:a:0 copy -f mp4 -movflags frag_keyframe+empty_moov -y "C:\Users\Carlo\AppData\Roaming\MediaBrowser-Server\transcoding-temp\filename-here.mp4"

 
Now what I did was change the "filename-here.mp4" at the end of the command string to a file name that matches the presets I'll mention in a moment.
I also added -t 300 to the beginning of the command line so I could test a 5-minute encode.  "-t" is a time in seconds, and 300 seconds is 5 minutes.
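The two edits above can be scripted. A minimal Python sketch, assuming the logged command has already been split into an argument list and that its last argument is the output file (true for the MB3 commands in this thread); `make_test_args` is a hypothetical helper name.

```python
FIVE_MINUTES = 5 * 60  # -t takes seconds, so 300 = 5 minutes


def make_test_args(args, seconds, new_output):
    """Copy of an ffmpeg argument list limited to `seconds` of input,
    writing to `new_output` instead of the original output file."""
    if not args:
        raise ValueError("empty argument list")
    # Placed before -i, "-t" acts as an input option: only `seconds` of the source are read.
    test_args = ["-t", str(seconds), *args[:-1]]
    # The original last argument is assumed to be the output path; replace it.
    test_args.append(new_output)
    return test_args
```

The slice copies the list, so the original logged command stays untouched and can be reused for the next preset.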
 
Here's a sample using placebo preset:
 
ffmpeg.exe -t 300 -fflags +genpts -i file:"C:\Transformers Revenge of the Fallen (2009)\Transformers Revenge of the Fallen (2009).mp4" -map 0:0 -map 0:1 -map -0:s -codec:v:0 libx264 -force_key_frames expr:gte(t,n_forced*5) -vf "scale=min(iw\,720):trunc(ow/dar/2)*2" -pix_fmt yuv420p -preset placebo -subq 0 -crf 23 -maxrate 808001 -bufsize 1616002 -vsync vfr -profile:v high -level 41 -map_metadata -1 -threads 0 -codec:a:0 copy -f mp4 -movflags frag_keyframe+empty_moov -y "C:\Users\Carlo\AppData\Roaming\MediaBrowser-Server\transcoding-temp\1-placebo-5.mp4"
 
The number in front of the file name was incremented for each preset so I could easily sort the files from "best" to "worst" preset and view them in order.
I ran the following presets and recorded the frames per second each one achieved on this particular machine:
placebo fps=32
veryslow fps=45
slower fps=88
slow fps=107
medium fps=169
fast fps=180
faster fps=187
veryfast fps=206
superfast fps=219
ultrafast fps=270
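To compare runs at a glance, the measurements above can be turned into numbered file names and speeds relative to superfast (MB3's default here). A small Python sketch; the zero-padded numbering (`01-placebo-5.mp4` rather than `1-placebo-5.mp4`) is a deliberate change so that plain alphabetical sorting keeps `10-ultrafast` last, and reading the trailing `-5` as "5-minute test" is an assumption.

```python
# Measured fps per x264 preset on the i5 notebook above, slowest first.
MEASURED_FPS = [
    ("placebo", 32), ("veryslow", 45), ("slower", 88), ("slow", 107),
    ("medium", 169), ("fast", 180), ("faster", 187), ("veryfast", 206),
    ("superfast", 219), ("ultrafast", 270),
]


def output_name(rank, preset, minutes=5):
    """Zero-padded name such as '01-placebo-5.mp4' so files sort by rank."""
    return f"{rank:02d}-{preset}-{minutes}.mp4"


def speed_relative_to(baseline, measurements):
    """Each preset's fps as a multiple of the baseline preset's fps."""
    base_fps = dict(measurements)[baseline]
    return {preset: fps / base_fps for preset, fps in measurements}


names = [output_name(rank, preset) for rank, (preset, _) in enumerate(MEASURED_FPS, start=1)]
relative = speed_relative_to("superfast", MEASURED_FPS)
for name, (preset, fps) in zip(names, MEASURED_FPS):
    print(f"{name:22} {fps:4d} fps  {relative[preset]:.2f}x superfast")
```

Read this way, placebo runs at roughly 0.15x superfast and ultrafast at roughly 1.23x.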
 

So for this particular MB3 encode, which used superfast as the default, I need to beat 219 fps via hardware encoding and/or lessen the CPU use of the machine to support more simultaneous encodes.

 

Using the ffmpeg build from the other thread, listed here: https://github.com/MediaBrowser/MediaBrowser/wiki/GPU-Tanscoding, I could not beat that.

 

So with the reference stuff out of the way, can you guys try something similar to get a reference for your own machine?

Then try any GPU-enabled command-line programs you can find that run faster and/or use less CPU.

 

Here are a few programs to try:

https://github.com/FFmpeg/FFmpeg/blob/master/README.md

www.tetrachromesoftware.com

www.mediacoderhq.com/cli/  (grab the trial version) and see if you can figure out which internal program is being called from the codec folder.

gstreamer.freedesktop.org

and of course any other GPU-enabled encoders with a command line that we could somehow use.

 

Thanks,

Carlo

Edited by cayars

CBers

I only play transcoded files to my Chromecast, so I get the following in the transcode log file:

ffmpeg.exe -fflags +genpts -i file:"\\Server\TV\Marvel's Agent Carter\Season 1\Marvel's Agent Carter - 1x06 - A Sin to Err.mkv" -map_metadata -1 -threads 0 -map 0:0 -map 0:1 -map -0:s -codec:v:0 libx264 -pix_fmt yuv420p -preset superfast -crf 18 -maxrate 3027772 -bufsize 6055544 -vsync vfr -profile:v high -level 41 -force_key_frames expr:gte(t,n_forced*6) -vf "scale=trunc(min(iw\,1920)/2)*2:trunc(min((iw/dar)\,1080)/2)*2" -copyts -flags -global_header -codec:a:0 aac -strict experimental -ac 6 -ab 320000 -af "adelay=1,aresample=async=1" -hls_time 6 -start_number 0 -hls_list_size 0 -y "C:\Users\Media\AppData\Roaming\MediaBrowser-Server\transcoding-temp\17013990f0fd07d0118aca99252521f3.m3u8"

This does not include all of the flags that you mention above.

 

Thoughts ??

 


Edited by CBers

It will be different for each device and/or movie depending on the source and destination.  

 

The two important things are, first, to get an MP4 file generated, since that is what GPU encoding will primarily work for, and second, to test these other programs to see whether they run faster than the native software-only encoding.

 

Carlo

 

PS just use Internet Explorer or similar to get it to encode.  No need to use any "special" devices. :)


If I got this right (likely not), here is the output from using QSTranscode:

Common transcoding time is  109.52 sec
MFX session 0 transcoding PASSED:
Processing time: 109.52 sec
Number of processed frames: 111165
 
So ~1000 fps? Unfortunately it doesn't seem to output a detailed/normal log file, so I can't see the fps directly.
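Since QSTranscode only prints a summary, the average fps can be worked out from it. A minimal Python sketch, assuming the summary lines always look like the excerpt above:

```python
import re


def fps_from_summary(log_text):
    """Average fps = processed frames / processing time, from QSTranscode's summary."""
    time_match = re.search(r"Processing time:\s*([\d.]+)\s*sec", log_text)
    frames_match = re.search(r"Number of processed frames:\s*(\d+)", log_text)
    if time_match is None or frames_match is None:
        raise ValueError("summary lines not found")
    return int(frames_match.group(1)) / float(time_match.group(1))


SUMMARY = """Common transcoding time is  109.52 sec
MFX session 0 transcoding PASSED:
Processing time: 109.52 sec
Number of processed frames: 111165"""

print(f"{fps_from_summary(SUMMARY):.0f} fps")  # prints: 1015 fps
```

So roughly 1000 fps does check out, provided the frame count covers the whole file.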

dark_slayer

I'll add some results probably tomorrow

 

Moving my GTX 660 to the server soon (been saying that for a while though). I'll then have an unused i7-3770K iGPU. Also, I only need NVENC for Limelight. If I were able to offload a transcode to NVENC and QS, I wouldn't need it to be faster than 200 fps. I'd really only need about 40-60 fps (though in at least some short tests with h264_qsv it can do 300 for me).

 

I know this type of complexity in your intercept may either take a long time or never happen, but I hope it isn't discounted on the premise that it needs to beat software superfast. While that would be true for a server that only transcodes, mine does much more, with more to come. I'd love to help, but I know about as much about coding as a cat does about the Apollo moon landing (maybe less).


Hi,

 

Just a couple of thoughts/questions here. I admit I'm partially making sure I'm on the same page... by all means yell at me if I'm not!

 

It seems like the purpose here is to get QuickSync and NVENC working in ffmpeg - so outside of MB, then once it's all working set up encoding.xml so that ffmpeg inside MB works as well ... correct? This makes sense to me, just making sure I'm getting the purpose (i.e. get ffmpeg working with QS and NVENC standalone).

 

FYI, regarding the post I made above about QSTranscode rates: I re-ran it, and it does seem right. The file is 1:01:49 long, so 111165 frames is about right (29.97 fps). So processing in ~110 sec really is ~1000 fps. Feel free to disagree, but this seems awesome to me. BTW, this is on Intel HD Graphics 4000. Given this result, does it make sense to see if we can get the ffmpeg that comes with QSTranscode working (perhaps with a bit of help/pointers from Babgvant)? I can't seem to get that ffmpeg working, but that could be me... it does seem to show promise, agreed?
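The sanity check in that paragraph is easy to reproduce. A Python sketch, assuming "29.97 fps" means the exact NTSC rate of 30000/1001; the 109.52 s figure is the processing time from the QSTranscode summary posted earlier.

```python
from fractions import Fraction

NTSC_FPS = Fraction(30000, 1001)  # "29.97 fps"


def duration_seconds(hms):
    """Convert 'H:MM:SS' to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def expected_frames(hms, fps=NTSC_FPS):
    """Frame count a file of this duration should have at `fps`."""
    return duration_seconds(hms) * fps


duration = duration_seconds("1:01:49")       # 3709 s
frames = float(expected_frames("1:01:49"))   # about 111159 vs 111165 reported
realtime_factor = duration / 109.52          # seconds of video per second of processing
print(f"{frames:.0f} frames expected, {realtime_factor:.1f}x real time")
```

The ~6-frame gap is well under one second of video, so it is within the rounding of the displayed duration, and the run works out to roughly 34x real time.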

 

On the NVENC front, has anyone had any luck finding an (ffmpeg-based?) transcoder? I haven't so far. I have downloaded the NVENC SDK, but the samples fail to run correctly (they do build in Visual Studio). Very likely me though.

 

And one last thought - for testing, should we have a few video files that we share, so folks all use the same files? I could offer up some DropBox space, if that helps.

 

Thanks!!!


dark_slayer

And one last thought - for testing, should we have a few video files that we share, so folks all use the same files? I could offer up some DropBox space, if that helps.

I think for us to do this casual test properly we should certainly share a few 5-minute clips. I should be able to host a VC1 clip in my Dropbox as well.

 

On the NVENC front, I thought mjb2000's ffmpeg pulled in support for that. I never tested it, and my gaming box's SSD is in a "lost" state after a sudden power loss :(  Until I RMA that or move my GPU over to the server I won't be able to try NVENC. Kicking myself for not trying it sooner.


Crap! Sorry to hear about your GPU ... :(

 

Well, it seems to work - please don't take that the wrong way. I only say "seems" because of the frame rate. I admit, I'm not sure what to expect from NVENC vs. QuickSync, but if things are all working as expected QuickSync is ~5x (or more) faster? Hard to believe that, though it could be true. I'm trying to build the NVENC samples (from the SDK), to see if I can get a feel for this.

 

Make sense? By all means yell if I'm out to lunch - don't need to spend a bunch of time getting this to build if it doesn't make sense.


FYI, check this out - and see Table 5.

http://developer.download.nvidia.com/compute/nvenc/v5.0_beta/NVENC_DA-06209-001_v06.pdf

 

I am running a 750 Ti (first-generation Maxwell), so at CBR (as MB uses, right?) I'd expect to see something like 800+ FPS. That would be awesome ... ;). Not sure if it's real or not, but it does suggest the much lower values I'm seeing may not be quite right.


dark_slayer

FYI, check this out - and see Table 5.

http://developer.download.nvidia.com/compute/nvenc/v5.0_beta/NVENC_DA-06209-001_v06.pdf

 

I am running a 750 Ti (first generation Maxwell), so at CBR (as MB uses, right?) I expect to see something like 800+ FPS. That would be awesome ... ;). Not sure if it's real or not, but it does say the much lower values I'm seeing may not be quite right.

No, I hadn't tested it at all. I didn't see what fps you were getting, but it should be up there along with the specs in the PDF. It does great real-time transcoding for the NVIDIA GameStream protocol. Perhaps it isn't fully implemented in that version of ffmpeg.

Edited by dark_slayer

Completely agree with you - I get the feeling that a portion of the processing is being offloaded, but not fully there yet. That's OK, we'll work through it.

 

My frame rates tie to CPU load - which does also tend to say we're not offloading enough / fully. Let's see though. But my frame rates range from ~ 60 to 200 fps ... still quite low (as I can get almost this same range without NVENC).

 

Thanks!


Good info guys, keep it coming!

 

I'm pulling down QSTranscode now to try that out.

 

The last couple of days I've been feeling lousy but made some progress in a different direction.  I got thinking about doing distributed encoding (using more than one computer) so I've been laying out the low-level stuff to do this and making progress.

 

Carlo


Speaking of test videos I was going to mention this earlier but forgot about it.

 

Some type of public domain test video would be ideal.

 

 

Ideally, 1080p would be good for a source file.

 

Any suggestions?


Hi,

 

I'm all for the test videos - I can put in my DropBox, or someone else by all means. Whatever folks want is good with me.

 

FYI, just a quick test here today, using ffmpeg from mjb2000 - and a few notes,

- testing all done with input and output video at 720x388 (just for a quick test, to start). Bit rate ~ 1.5 -> 1.0 Mbps.

- this is done on two different machines, so don't compare the CPU numbers directly (they just help to show how things are running)

- QuickSync is Intel HD Graphics 4600, NVENC is 750 Ti

- note that presets don't seem to work for either h264_qsv or libnvenc (or at least, no impact it seems)

 

 

Results:

- h264_qsv: ~630 fps, 60% GPU load, 60% CPU load (hangs when running ... :(). With QSTranscode it seems that GPU load is ~85% and fps may be more like 900, which does align and suggests this can still be optimized a bit?

- libnvenc: ~180 fps, 5% GPU load, 60% CPU load. This is much lower than advertised by NVIDIA, and perhaps suggests there is more to be offloaded to the GPU?

 

By all means comment!


This should be good.  It's 2:16 in length.

 

www.dvdloc8.com/clip.php?movieid=19420&clipid=1

 

Download and unzip to get:

 

Transformers - Revenge of the Fallen - Teaser.mp4

 

It has lots of good scenes in it to be able to check quality.

 

Let's make this our "reference file".

 

Carlo


Results:

- h264_qsv: ~ 630 fps, 60% GPU load, 60% CPU load (hangs when running ... :(). With QSTranscode it seems that GPU load is ~ 85%, and fps may be more like 900 ... which does align, and just says this can be optimized a bit yet?

- libnvenc: ~ 180 fps, 5% GPU load, 60% CPU load. This is much lower than advertised by NVidia, and perhaps just says there is more to be offloaded to the GPU?

 

By all means comment!

 

Could you by chance run these same tests against the Transformers trailer I just posted?  Also, give us an overview of file sizes after encoding with CPU, QS and NVENC, as well as how each looked after conversion/transcode.

 

I agree with you on the surface from what I've seen of the ffmpeg with QS/NVENC built in.  Seems like there is a LOT of room left for optimization.  We are on the "cutting edge" testing this stuff for real-time encoding so it's only natural for us to experience this type of thing.

 

I've got the feeling that we aren't going to be able to use ffmpeg for QS right off the bat and might need a different program to achieve the FPS we need.  This might mean the intercept/proxy method won't work as a simple drop-in replacement, since MB3 constantly reads information back from it.  However, I don't think it would be too hard to add a "test flag" plus some additional code, which I could write for MB3, that would let the transcoding hand-off be handled differently when the flag is set.  I could make this extremely simple for Luke to implement: it would function as normal, or, with the flag set, run a different set of code that Luke would not need to worry about.  But let's cross that bridge when the time comes, after we have a working solution worth pursuing.

 

For now we need to figure out what command line tools work best and give us the best FPS with the least CPU use. We of course need to verify the size of the end result and that our output files follow any bitrates we give it and that the quality is good enough for us to use.  Boy that's a lot. :)
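For the "does it follow the bitrate" check, the average bitrate can be computed from file size and duration. A minimal Python sketch; the 10% tolerance is an arbitrary assumption, and an average says nothing about peaks, so `-maxrate`/`-bufsize` compliance would need a finer-grained check (for example per-packet sizes from ffprobe).

```python
import os


def average_bitrate_bps(size_bytes, duration_seconds):
    """Average overall bitrate in bits per second."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return size_bytes * 8 / duration_seconds


def meets_target(size_bytes, duration_seconds, target_bps, tolerance=0.10):
    """True if the average bitrate is at most `tolerance` above `target_bps`."""
    return average_bitrate_bps(size_bytes, duration_seconds) <= target_bps * (1 + tolerance)


def file_bitrate_bps(path, duration_seconds):
    """Average bitrate of a file on disk; the duration must come from elsewhere."""
    return average_bitrate_bps(os.path.getsize(path), duration_seconds)
```

For example, a 2:16 (136 s) encode at exactly 3 Mbit/s overall would be about 51,000,000 bytes (136 x 3,000,000 / 8).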

 

We are off to a good start!

 

Carlo

Edited by cayars

Agree with you 100%! This is new stuff, cutting edge as you say - I'm actually impressed by where we are at currently, just need to keep working at it. Not a bad thing at all.

 

Will test with the file tonight - but to that end ... what output resolution and bitrate (video and audio)?

 

Thanks!


Good point.  I believe that trailer is in 1080p.

 

So let's put transcoding to work like it would in the real world.  See if you can transcode down to 720p with a 3 Mbit/s overall bitrate.

 

That should "flex" the process a bit and give us something of a real world challenge.

 

Carlo

 

PS: I picked those settings because on my Plex servers I see this as a common setting for transcodes.  I think people with lower-end Rokus or limited bandwidth find that 720p at 3 Mbit/s works, and it is as good as any to start with.  This should help us figure out what happens when reducing bitrate and when reducing resolution, which are both common when streaming.
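To keep everyone's runs comparable, the 720p / 3 Mbit/s test could be expressed as one argument list per encoder. A Python sketch under stated assumptions: the 192 kbit/s audio share is made up (the thread only fixes the overall rate), `libnvenc` is the encoder name in the mjb2000 build discussed above (current FFmpeg calls it `h264_nvenc`), and `-bufsize` at twice the rate mirrors the ratio in MB3's own commands (808001 / 1616002). Older builds also needed `-strict experimental` for the native AAC encoder, as in CBers' command.

```python
TOTAL_BPS = 3_000_000   # 3 Mbit/s overall, per the suggested test
AUDIO_BPS = 192_000     # assumed audio share; not specified in the thread
VIDEO_BPS = TOTAL_BPS - AUDIO_BPS

ENCODERS = {"cpu": "libx264", "qsv": "h264_qsv", "nvenc": "libnvenc"}


def build_args(source, output, encoder):
    """ffmpeg arguments for a 720p, ~3 Mbit/s MP4 using the chosen encoder."""
    return [
        "-i", source,
        "-codec:v:0", ENCODERS[encoder],
        "-vf", "scale=-2:720",          # 720 lines high, width kept even
        "-b:v", str(VIDEO_BPS),
        "-maxrate", str(VIDEO_BPS),
        "-bufsize", str(2 * VIDEO_BPS),
        "-codec:a:0", "aac",
        "-b:a", str(AUDIO_BPS),
        "-f", "mp4",
        "-y", output,
    ]
```

Building the `cpu`, `qsv` and `nvenc` variants from one function keeps everything but the encoder identical, which is what makes the resulting file sizes and fps comparable.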
