Some months ago I started some ffmpeg experiments that I would like to slowly write about here in my blog.
It’s not the first time I’ve tinkered with ffmpeg, but this time I was trying to fix a problem I like to call “the VOD bias”. So allow me to start from there.
There is a lot of multimedia FLOSS out there, from libraries to complete tool suites, with ffmpeg being by far -and by merit- the most recognized of them all. However, it’s pretty common for all those tools and libraries to be designed with file-based use cases in mind. That’s just common sense, and it usually works great for most people. But since I work in the Live Streaming field, it means trouble, and it actually bites me quite often.
In Live Streaming, your challenge is to generate a constant, uninterrupted multimedia stream: the output of your work must have no gaps, no stops, no discontinuities. There are many reasons for that, only visible once you’re already working in this field -like the usual absolute disregard of responsibility from players when anything happens to the stream, overly sensitive hardware, overly anxious multimedia consumers, synthetic metrics that exaggerate the consequences of any glitch and turn it into a business problem, the draconian 24/7/365 uptime regime, etc-. But very, very rarely are such reasons part of the rationale behind multimedia software. There are of course many examples, but a paradigmatic one is ffmpeg’s behaviour when its job is to produce a stream -NOT a file-.
You see, ffmpeg is able to create mostly anything multimedia related. You can, for example, feed ffmpeg an input multimedia stream and do stuff with it. You can also convert an input file into an output stream -looped or not-, and you can even create a stream from the several multimedia sources ffmpeg allows you to use -like hardware inputs, or “source filters” that create media on the fly-. It quite accurately calls itself a “media converter”. But surprisingly for people who don’t work in live streaming, ffmpeg always works with a file rationale -not a stream one-, even when it allows you to do so many things with streams.
The consequences of this are quite a few. For example, if you tell ffmpeg to create an output stream with x and y characteristics, let’s say using another stream as input, and you set parameters like “constant bitrate”, “constant framerate”, “x frames per second”, etc, it will gladly do all of that with excellent performance, allowing you to do lots and lots of stuff with your multimedia pipeline… while the input stream keeps coming. If your input stream has a gap, or whoever is transmitting that stream to your ffmpeg instance loses their network connection, or there’s some glitch somewhere in the network and your input pays the price, then ffmpeg will produce no output: sometimes it will just write garbage -input in, input out-, but often it will not write anything at all. It usually also just stops working, notifying you about some error. But it will not honor all those parameters stated before, like “constant framerate” or “x frames per second”: it only honors them while there’s constant input to work with. It DOES have the ability to forge video and audio frames, and COULD do it in such cases to honor the requested output regime, but it will never do it. And that’s actually by design.
If you think about it, that is of course -again- common sense: “media converter”, remember? Not “stream generator”, not “live streamer”, even when it can be used for those tasks. “Why would I output something if I have no input?”. But even if common sense may win you lots of battles, it’s always contextual, and in the context of Live Streaming the output regime is at least as important as the input one.
This is not a piece written against ffmpeg, of course. We’re all blessed by projects like that. I’m using ffmpeg as a representation of a bigger problem: most multimedia tools are designed with similar rationales. Their canonical inputs and outputs are files, “streaming” today also means “a video available online, in a format you can consume with any web player on any device” -think of podcasts or youtube videos-, and the common name for such a thing today is “Video On Demand”, or “VOD”. Live Streaming has different problems than VOD, and most FLOSS multimedia tools are biased towards VOD use cases. That’s my point, and my experience. The consequence is that professional Live Streaming tools end up being proprietary, very expensive, and often delivered as hardware appliances instead of software.
So, with that in mind, a few months ago I made some experiments trying to modify ffmpeg so it can solve more Live Streaming use cases. In particular, I started with the continuity problem: I wanted my ffmpeg output stream to keep flowing even when its input stream had gaps or disconnections.
So I tried some hacks here and there, but the ffmpeg code is actually quite complex -because multimedia inner workings often are, and ffmpeg is so powerful that it also comes with a complexity price-, so I had to lower my expectations: since I was already familiar with filter creation, I made a new filter called “clockfiller”, intended to fill the output with cloned frames while there’s no input video. This is what I would expect from the “fps” filter -and, I believe, what it actually does in some cases-, but sadly the VOD bias makes it stop while there’s no input. So my filter would have its own clock, and with its clock ticks it would be able to detect an “input down” situation, so it could start filling the output with copies of the last frame at a regular pace. The consequence would be: when the input is down, the output is a video showing a static image, and it stays that way until the input comes back.
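To make the idea a bit more concrete, here’s a tiny standalone simulation of that clock logic. It is not the real filter code -the actual clockfiller lives inside libavfilter and has to play by its internal API-; the fake input source and the names are invented just for illustration:

#include <stdio.h>
#include <stdbool.h>

#define FPS 30

/* Pretend input: delivers a fresh frame on most ticks, but goes silent
 * between t = 2s and t = 4s, like a dropped network connection.
 * Invented for this example. */
static bool frame_arrived(int tick, int *frame_out)
{
    double t = (double)tick / FPS;
    if (t >= 2.0 && t < 4.0)
        return false;          /* input is down */
    *frame_out = tick;         /* otherwise a fresh frame shows up */
    return true;
}

int main(void)
{
    int last_frame = -1;

    /* Our own clock: one tick per output frame, no matter what the input does. */
    for (int tick = 0; tick < 5 * FPS; tick++) {
        int fresh;
        if (frame_arrived(tick, &fresh)) {
            last_frame = fresh;
            printf("tick %3d: pass through frame %d\n", tick, last_frame);
        } else if (last_frame >= 0) {
            /* No input at this clock tick: keep the output regime alive
             * by emitting a copy of the last frame we saw. */
            printf("tick %3d: input down, duplicating frame %d\n", tick, last_frame);
        }
    }
    return 0;
}

The real filter follows the same idea, only with actual video frames and timestamps instead of printf and a loop counter.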
So my first step was to create an unstable input. I did it with a simple one-liner bash loop and ffmpeg itself:
while true ; do \
timeout 16 ffmpeg -hide_banner -y \
-f lavfi -re -i 'testsrc=s=640x360:r=30,format=yuv420p,fps=fps=30' \
-f lavfi -re -i "anullsrc" \
-c:v h264 -g 10 -strict_gop 1 -b:v 8M -r:v 30 \
-c:a aac -ar 48k -ac 2 -b:a 64k -t 15 -f mpegts \
'udp://127.0.0.1:12345?overrun_nonfatal=1&fifo_size=1000000000&buffer_size=1000000000' ; \
echo "[$(date)] - sleeping 15 secs" ; \
sleep 15 ; \
echo "[$(date)] - running again"; \
done
Then I recorded that input, also with ffmpeg -as a reference of the problem I wanted to solve-, like this:
ffmpeg -hide_banner -y -re -i \
'udp://127.0.0.1:12345?overrun_nonfatal=1&fifo_size=10M&buffer_size=10M&listen=1' \
-map 0 -c:v h264 -preset slow -pix_fmt yuv420p -t 30 test_no_clockfiller_360.mp4
Here’s the video:
Note how the number immediately jumps back to zero after 14. The bash loop waits for 15 seconds, but during that time ffmpeg was not generating any output. That’s part of the problem I described before when talking about the VOD bias.
So then I tested my “clockfiller” filter -in a custom-built ffmpeg-, like this:
./ffmpeg -hide_banner -y -re -i \
'udp://127.0.0.1:12345?overrun_nonfatal=1&fifo_size=10M&buffer_size=10M&listen=1' \
-map 0:v -vf "format=yuv420p,clockfiller" \
-c:v h264 -preset slow -pix_fmt yuv420p -t 30 \
test_clockfiller_360.mp4
And here’s the video:
This time, while the input is stopped, there are still output frames. That’s exactly the proof of concept I wanted to achieve.
However, it’s not a complete success. I was very happy with the recorded result, but then I tested it live -instead of saving to a file and playing it back later- and could see that ffmpeg doesn’t output anything while the input is down: my filter just buffers frames until ffmpeg decides to start pulling them again. This has to do with ffmpeg’s inner workings and its pipeline of tasks and events, which a filter is subject to and can’t escape from.
This was a cool experiment with some degree of success, and I wanted to share it in case someone is working on similar stuff. So my next target will be to stop using the ffmpeg binary itself and use its underlying libraries instead, in a different, custom program that does what I need -something roughly along the lines of the sketch below-.
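To give an idea of the direction, here’s a very rough sketch of the shape such a program could take. It is nothing close to a working pipeline -the decoding, duplication and re-encoding are left as comments, and the non-blocking read flag is a best-effort hint that not every demuxer honors-. The point it tries to show is the key difference with the ffmpeg binary: the program owns the clock, instead of being driven by whatever the input delivers.

#include <stdio.h>
#include <libavformat/avformat.h>
#include <libavutil/time.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <input url>\n", argv[0]);
        return 1;
    }

    avformat_network_init();

    AVFormatContext *in = NULL;
    if (avformat_open_input(&in, argv[1], NULL, NULL) < 0)
        return 1;
    if (avformat_find_stream_info(in, NULL) < 0)
        return 1;

    /* Ask the demuxer not to block on reads. Not every demuxer honors this;
     * another option is an interrupt callback with a timeout. */
    in->flags |= AVFMT_FLAG_NONBLOCK;

    AVPacket *pkt = av_packet_alloc();
    const int64_t frame_interval = 1000000 / 30;                /* 30 fps, in microseconds */
    int64_t next_tick = av_gettime_relative() + frame_interval; /* OUR clock, not the input's */

    for (;;) {
        int ret = av_read_frame(in, pkt);
        if (ret >= 0) {
            /* Normal path: decode the packet, remember the last video frame,
             * re-encode and mux it to the output. */
            av_packet_unref(pkt);
        } else if (ret != AVERROR(EAGAIN) && ret != AVERROR_EOF) {
            break;                                              /* real error */
        }

        if (av_gettime_relative() >= next_tick) {
            /* A clock tick happens with or without input: if no fresh frame
             * arrived since the last tick, this is where a copy of the last
             * decoded frame would be sent to the encoder instead. */
            next_tick += frame_interval;
        }
        av_usleep(1000);                                        /* don't busy-loop */
    }

    av_packet_free(&pkt);
    avformat_close_input(&in);
    return 0;
}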