Video decoding in OSH JS Toolkit

One of the components provided by the OSH Javascript Web Client Toolkit is a video viewer that can be used to visualize video streams produced by an OSH node (or other sources). The screenshot above shows the video player wrapped in a dialog and playing a raw H264 stream (including time stamps).

As you may know, handling video in a web browser can be quite a challenge on its own, and even more so when you are trying to synchronize video playback with other data streams (including other videos).

In fact, time synchronization of data streams is one of our main requirements in the Javascript Toolkit, because the goal when one visualizes sensor data is usually to get a temporally coherent operating picture of a scene, whether in real time or when replaying historical data. You can see an example of this in one of our YouTube videos. To achieve this, all sensor data streams must be carefully time stamped with a common clock, and the Javascript client must process and display individual data records in a timely manner in order to reproduce the sequence and pace of events as they were observed in the real world.

In the case of video, time stamping is done by the camera itself or by the sensor driver running in the OSH node. The client is then responsible for playing individual video frames so that they show up at the right time relative to the other sensor data streams we are visualizing. This implies having fine control over the video receiving and decoding process, and this is where we got into trouble with existing video decoding techniques in web browsers.
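
To make the idea concrete, here is a minimal sketch of time-ordered playback across several streams. It is not the Toolkit's actual implementation: the SyncBuffer class, its method names and the stream identifiers are hypothetical, and the sketch only illustrates the principle of holding records until a shared clock reaches their time stamps.

```javascript
// Hypothetical helper: buffers records from several streams and releases
// them in time stamp order once the shared clock has reached them.
class SyncBuffer {
  constructor() {
    this.records = []; // always kept sorted by time stamp
  }

  // Insert a record at its sorted position (records with equal time
  // stamps keep their arrival order).
  push(streamId, timeStamp, data) {
    let i = this.records.length;
    while (i > 0 && this.records[i - 1].timeStamp > timeStamp) {
      i--;
    }
    this.records.splice(i, 0, { streamId, timeStamp, data });
  }

  // Remove and return every record whose time stamp is <= clockTime.
  popDue(clockTime) {
    let n = 0;
    while (n < this.records.length && this.records[n].timeStamp <= clockTime) {
      n++;
    }
    return this.records.splice(0, n);
  }
}

// Usage: records arrive out of order but are released in time order.
const buffer = new SyncBuffer();
buffer.push("video", 2.0, "frame A");
buffer.push("gps", 1.5, "position 1");
buffer.push("video", 2.5, "frame B");
const due = buffer.popDue(2.0); // the GPS record, then frame A
```

In a real player, popDue would be called from a timer or animation callback driven by the common clock, and each released record would be dispatched to the view that renders its stream.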

So here is a discussion about why the following techniques worked or didn’t work for us:

HTML Image Tag

This works well, but only with MJPEG video streams. It was pretty easy to extract the time stamp and video frame data from the websocket packets sent by the OSH server, and to refresh the image using a URL generated from a Blob object.
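
The sketch below shows roughly what this looks like. The packet layout is an assumption made for illustration only (an 8-byte big-endian double holding the time stamp, followed by the JPEG bytes); the actual layout used by the OSH server is not described here. The function names are hypothetical as well.

```javascript
// Split a binary websocket message into a time stamp and a JPEG frame.
// ASSUMED layout: 8-byte big-endian float64 time stamp, then JPEG data.
function parseTimedFrame(arrayBuffer) {
  const view = new DataView(arrayBuffer);
  const timeStamp = view.getFloat64(0, false); // false = big-endian
  const frame = new Uint8Array(arrayBuffer, 8); // remaining bytes
  return { timeStamp, frame };
}

// Browser-only part: display a JPEG frame in an <img> element by wrapping
// the bytes in a Blob and pointing the image at an object URL.
let previousUrl = null;
function showFrame(imgElement, frame) {
  const blob = new Blob([frame], { type: "image/jpeg" });
  const url = URL.createObjectURL(blob);
  imgElement.src = url;
  if (previousUrl !== null) {
    URL.revokeObjectURL(previousUrl); // free the previous frame
  }
  previousUrl = url;
}

// Usage with a hand-built packet (time stamp plus a fake 3-byte frame).
const packet = new ArrayBuffer(11);
new DataView(packet).setFloat64(0, 1479413760.5, false);
new Uint8Array(packet, 8).set([0xff, 0xd8, 0xff]);
const parsed = parseTimedFrame(packet);
```

In the browser, parseTimedFrame would be called from the websocket's onmessage handler (with binaryType set to "arraybuffer"), and showFrame would be called once the frame's time stamp is due.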

Obviously this technique didn’t work for displaying videos encoded with the other codecs we are interested in such as H264.

HTML Video Tag

This is the first technique we tried. The only format that’s really supported by all browsers using this technique is MP4 (with the H264 codec). To be precise, at the beginning of our experiments, support for fragmented MP4 (which we need for real-time video playback) was spotty, but this seems to have been resolved in most recent browsers.

So our first step was to wrap H264 video streams generated by OSH with the MP4 container format using track fragments. We were able to achieve this with the help of the Mp4Parser Java library (despite the name the library actually allows writing MP4 streams as well!), although we had to modify it for our needs in order to reduce latency and improve support for live streaming in general.

The next step was to play the stream in the browser. Although the video showed up fine, we discovered soon enough that HTML Video doesn’t provide fine control over the playback process, especially when using fragmented MP4 streams. First, the only way you can feed the video player is by giving it a URL, so there was no way to use websockets, which we like much better than HTTP for real-time persistent streams. Second, the decoding process is entirely a black box, and events were not properly generated when using fragmented MP4, so we had no way to know how much caching was done, when playback really started, or at what rate the frames were decoded. This made it impossible to synchronize the video with other data streams.

MediaSource API

So we tried the MediaSource API, thinking that we might be able to feed it raw H264 data and better control the decoding process. As it turns out, that didn’t work so well either, although we haven’t finished exploring all possibilities.

Well, the first thing is that MediaSource clearly doesn’t provide an API that abstracts video codecs and demuxers as a library like FFMPEG does. It’s simply an API allowing better control over the way video data is fetched and provided to the browser’s HTML Video Tag, but it’s still limited to the same video formats, that is to say MP4. So although this technique solved our first problem, as we were now able to feed the player with video obtained from websocket packets, we still didn’t have fine control over how frames were decoded and we still had to extract time stamps from the MP4 format.
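
As an illustration, feeding fragmented MP4 received over a websocket to MediaSource could look like the sketch below. It assumes the server sends the MP4 initialization segment first and then one fragment per binary message; the function names and the websocket URL parameter are hypothetical. The codec string follows the usual "avc1.PPCCLL" convention, built from the profile, constraint and level bytes of an H264 sequence parameter set.

```javascript
// Build the MIME type expected by MediaSource from the three bytes that
// follow the NAL header in an H264 sequence parameter set (SPS).
function mp4MimeType(sps) {
  const hex = (b) => b.toString(16).toUpperCase().padStart(2, "0");
  const codec = "avc1." + hex(sps[1]) + hex(sps[2]) + hex(sps[3]);
  return 'video/mp4; codecs="' + codec + '"';
}

// Browser-only part: append fMP4 fragments to a <video> element as they
// arrive. ASSUMES one initialization segment followed by one fragment
// per websocket message.
function playFragmentedMp4(video, wsUrl, mimeType) {
  if (!MediaSource.isTypeSupported(mimeType)) {
    throw new Error("Unsupported format: " + mimeType);
  }
  const mediaSource = new MediaSource();
  video.src = URL.createObjectURL(mediaSource);
  mediaSource.addEventListener("sourceopen", () => {
    const sourceBuffer = mediaSource.addSourceBuffer(mimeType);
    const queue = [];
    // A SourceBuffer accepts only one append at a time, so queue the rest.
    const appendNext = () => {
      if (!sourceBuffer.updating && queue.length > 0) {
        sourceBuffer.appendBuffer(queue.shift());
      }
    };
    sourceBuffer.addEventListener("updateend", appendNext);
    const ws = new WebSocket(wsUrl);
    ws.binaryType = "arraybuffer";
    ws.onmessage = (event) => {
      queue.push(event.data);
      appendNext();
    };
  });
}

// Usage: SPS header bytes for Baseline profile (0x42), level 3.0 (0x1E).
const mime = mp4MimeType(new Uint8Array([0x67, 0x42, 0xe0, 0x1e]));
```

Note that nothing in this sketch exposes individual frames or their time stamps: once a fragment is appended, decoding and display timing are up to the browser.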

Extracting individual frame time stamps from MP4 is harder than it seems because the format does not provide a time stamp for each frame. Instead, we had to compute the time stamp based on the video creation time (an integer number of seconds since Jan 1st 1904, so not very accurate to start with), the time stamp of movie segments, which is given relative to the start of the video, and the theoretical frame rate. The computation seemed to work OK for clean videos, but things got even more complicated when some frames were missing.
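
The arithmetic can be sketched as follows. The field names are hypothetical, and the sketch assumes that the fragment time is expressed in ticks of a known timescale (as in the tfdt box of a track fragment) and that no frames are missing.

```javascript
// Seconds between the MP4 epoch (1904-01-01T00:00:00Z) and the Unix
// epoch (1970-01-01T00:00:00Z): 24107 days of 86400 seconds.
const MP4_TO_UNIX_EPOCH_SECONDS = 2082844800;

// Estimate the absolute time of a frame, in milliseconds since the Unix
// epoch. This is only as good as the one-second resolution of the
// creation time and the assumption of a constant frame rate.
function estimateFrameTimeMs(info) {
  const creationMs = (info.creationTime - MP4_TO_UNIX_EPOCH_SECONDS) * 1000;
  const fragmentMs = (info.fragmentDecodeTime / info.timescale) * 1000;
  const frameMs = (info.frameIndexInFragment / info.frameRate) * 1000;
  return creationMs + fragmentMs + frameMs;
}

// Usage: 16th frame (index 15) of a fragment that starts 2 s into a
// 30 frames per second video.
const frameTimeMs = estimateFrameTimeMs({
  creationTime: 2082844800 + 1479413760, // seconds since 1904
  fragmentDecodeTime: 180000,            // ticks since start of video
  timescale: 90000,                      // ticks per second
  frameIndexInFragment: 15,
  frameRate: 30
});
```

With a dropped frame, frameIndexInFragment no longer matches the real capture time, which is exactly the case where this estimate breaks down.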

Broadway JS

Broadway JS is a pure Javascript library for decoding H264 streams directly in the browser. It’s actually a port of the Android H264 decoder compiled to Javascript using Emscripten.

This library solved all of the problems above because we were responsible for feeding it with individual H264 packets (actually called NAL units in H264 jargon). This meant we had full control over video data retrieval and decoding speed.
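
For readers unfamiliar with NAL units, the sketch below splits a raw H264 byte stream into individual units. It assumes the stream uses the Annex B byte stream format, where each unit is preceded by a 00 00 01 or 00 00 00 01 start code; how OSH actually frames its packets is not covered here, and the function name is hypothetical.

```javascript
// Split an Annex B H264 byte stream (Uint8Array) into NAL units.
// Returned units are views on the input and exclude the start codes.
function splitNalUnits(bytes) {
  // Locate the first payload byte after every 00 00 01 start code.
  const payloadStarts = [];
  for (let i = 0; i + 2 < bytes.length; i++) {
    if (bytes[i] === 0 && bytes[i + 1] === 0 && bytes[i + 2] === 1) {
      payloadStarts.push(i + 3);
      i += 2; // skip the rest of the start code
    }
  }
  const units = [];
  for (let k = 0; k < payloadStarts.length; k++) {
    const start = payloadStarts[k];
    let end = k + 1 < payloadStarts.length
      ? payloadStarts[k + 1] - 3 // where the next start code begins
      : bytes.length;
    // Drop trailing zeros, including the extra leading zero of a 4-byte
    // start code (a NAL unit never ends with a zero byte).
    while (end > start && bytes[end - 1] === 0) {
      end--;
    }
    units.push(bytes.subarray(start, end));
  }
  return units;
}

// Usage: three tiny units; the low 5 bits of the first byte of each unit
// give its type (7 = SPS, 8 = PPS, 5 = IDR slice).
const stream = new Uint8Array([
  0, 0, 0, 1, 0x67, 0x42,
  0, 0, 1, 0x68, 0xce,
  0, 0, 0, 1, 0x65, 0x88
]);
const nalTypes = splitNalUnits(stream).map((unit) => unit[0] & 0x1f);
```

Each unit can then be handed to the decoder at the moment chosen by the client, which is what gives control over decoding speed.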

This worked very well until we discovered that the library doesn’t support all H264 flavors: it is essentially limited to H264 Baseline Profile, and some of our cameras were generating H264 using Main or High profiles.
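
One way to detect this situation up front is to look at the profile advertised in the stream's sequence parameter set (SPS), as in the sketch below. The function names are hypothetical; the profile_idc values (66 for Baseline, 77 for Main, 88 for Extended, 100 for High) come from the H264 specification.

```javascript
// Names of the most common H264 profiles, keyed by profile_idc.
const H264_PROFILES = { 66: "Baseline", 77: "Main", 88: "Extended", 100: "High" };

// Read the profile from an SPS NAL unit (start code already removed).
// Returns null if the unit is not an SPS.
function readH264Profile(nal) {
  const nalType = nal.length > 0 ? nal[0] & 0x1f : -1;
  if (nal.length < 2 || nalType !== 7) {
    return null;
  }
  const profileIdc = nal[1]; // first byte after the NAL header
  return { profileIdc, name: H264_PROFILES[profileIdc] || "Other" };
}

// True if the stream should be decodable by a Baseline-only decoder.
function isBaselineStream(spsNal) {
  const profile = readH264Profile(spsNal);
  return profile !== null && profile.profileIdc === 66;
}

// Usage with the first bytes of two SPS units.
const baselineSps = new Uint8Array([0x67, 66, 0xe0, 0x1e]);
const highSps = new Uint8Array([0x67, 100, 0x00, 0x28]);
const baselineOk = isBaselineStream(baselineSps);
const highOk = isBaselineStream(highSps);
```

A client could use such a check to pick a decoder per stream, or to report an unsupported camera configuration instead of showing a blank video.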

Another drawback of using such a Javascript library is that it doesn’t benefit from hardware acceleration like a browser’s native video player does. However, don’t think this makes it unusable: performance is surprisingly good, and we are able to decode several Full HD videos at 30 Hz on fairly recent laptops, and even reached 25 Hz on a Nexus 5 Android phone.

FFMPEG JS

Inspired by Broadway JS (see above), the last thing we tried was compiling FFMPEG to Javascript ourselves using Emscripten.

Fortunately, somebody paved the way with the ffmpeg.js project, so we started our fork and modified it to compile only the parts we needed, and in a way that we can feed it H264 data directly as byte arrays from Javascript (i.e. not reading from a file as required by the original ffmpeg.js).

This whole process was actually easier than expected and we were able to integrate it in the current OSH JS Toolkit (look at the OSH.UI.FFMPEGView class to see how it was done, including an implementation with web workers to get better performance when decoding multiple videos simultaneously).

The advantages of this technique are the same as with Broadway JS, except that we are not limited to the H264 Baseline Profile. In fact, even though our current build includes only the H264 decoder, which supports all profiles, we could include any of the codecs supported by FFMPEG (e.g. MPEG2, H265, VP8, VP9, etc.)!

As with Broadway JS, hardware acceleration is not available, but performance is good enough on recent hardware for our application.

1 comment

Brad says:

November 17, 2016 at 8:16 pm

Ffmpeg js looks incredible and could be popular everywhere.

I would love to see example HTML that shows how to display live video given an rtsp URL, and display width and height in a form or hard coded.
