As you may know, handling video in a web browser can be quite a challenge on its own, and even more so when you are trying to synchronize video playback with other data streams (including other videos).
In the case of video, time stamping is done by the camera itself or the sensor driver running in the OSH node. The client is then responsible for playing individual video frames so that they show up at the right time relative to the other sensor data streams that we are visualizing. This requires fine control over how video is received and decoded, and this is where we ran into trouble with existing video decoding techniques in web browsers.
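To make the synchronization requirement more concrete, here is a minimal sketch of a time-stamp-ordered buffer that releases frames (or any other sensor records) once a shared master clock reaches their time stamp. The `SyncBuffer` class and its method names are hypothetical and only illustrate the idea; this is not how the OSH JS Toolkit is implemented.

```javascript
// Buffers time-stamped items and releases those that are due according
// to a shared master clock. All times are in milliseconds.
// NOTE: hypothetical helper for illustration, not an OSH API.
class SyncBuffer {
  constructor() {
    this.items = []; // kept sorted by ascending time stamp
  }

  // Insert an item, keeping the buffer sorted even if packets arrive out of order.
  push(timestamp, payload) {
    let i = this.items.length;
    while (i > 0 && this.items[i - 1].timestamp > timestamp) i--;
    this.items.splice(i, 0, { timestamp, payload });
  }

  // Remove and return every item whose time stamp is <= masterTime.
  popDue(masterTime) {
    let n = 0;
    while (n < this.items.length && this.items[n].timestamp <= masterTime) n++;
    return this.items.splice(0, n);
  }
}
```

In a browser, one buffer per data stream could be polled from a `requestAnimationFrame` callback, drawing whatever `popDue` returns for the current master time.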
So here is a discussion about why the following techniques worked or didn’t work for us:
HTML Image Tag
This works well, but only with MJPEG video streams. It was pretty easy to extract the time stamp and video frame data from websocket packets sent by the OSH server, and to refresh the image data using a URL generated from a Blob object.
Obviously this technique didn’t work for displaying videos encoded with the other codecs we are interested in such as H264.
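The MJPEG handling described above can be sketched roughly as follows. The packet layout used here (an 8-byte big-endian float time stamp followed by the JPEG bytes) is an assumption made for illustration and is not necessarily the actual OSH wire format; the function names are hypothetical as well.

```javascript
// ASSUMPTION: hypothetical packet layout, not the documented OSH format:
//   bytes 0-7 : time stamp (float64, big-endian, seconds)
//   bytes 8.. : one complete JPEG image
function parseFramePacket(buffer) {
  const view = new DataView(buffer);
  return {
    timestamp: view.getFloat64(0, false), // false = big-endian
    jpeg: new Uint8Array(buffer, 8)
  };
}

// Browser only: point the <img> element at a new Blob URL and
// release the previous one so that old frames can be garbage collected.
function showFrame(img, jpegBytes) {
  const previousUrl = img.src;
  img.src = URL.createObjectURL(new Blob([jpegBytes], { type: 'image/jpeg' }));
  if (previousUrl && previousUrl.startsWith('blob:')) {
    URL.revokeObjectURL(previousUrl);
  }
}

// Browser usage (the websocket URL is a placeholder):
//   const ws = new WebSocket('ws://example.invalid/video');
//   ws.binaryType = 'arraybuffer';
//   ws.onmessage = (event) => {
//     const frame = parseFramePacket(event.data);
//     showFrame(document.querySelector('img'), frame.jpeg);
//   };
```

Because each websocket message carries its own time stamp, the client can decide when to call `showFrame` instead of displaying frames as soon as they arrive.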
HTML Video Tag
This is the first technique we tried. The only format that’s really supported by all browsers using this technique is MP4 (with the H264 codec). To be precise, at the beginning of our experiments, support for fragmented MP4 (we need this for real-time video playback) was sparse but this seems to have been solved in most recent browsers by now.
So our first step was to wrap H264 video streams generated by OSH with the MP4 container format using track fragments. We were able to achieve this with the help of the Mp4Parser Java library (despite the name the library actually allows writing MP4 streams as well!), although we had to modify it for our needs in order to reduce latency and improve support for live streaming in general.
The next step was to play the stream in the browser. Although the video showed up fine, we soon discovered that HTML Video doesn’t provide fine control over the playback process, especially when using fragmented MP4 streams. First, the only way to feed the video player is by giving it a URL, so there was no way to use websockets, which we like much better than HTTP for real-time persistent streams. Second, the decoding process is entirely a black box, and events were not properly generated when using fragmented MP4, so we had no way to know how much caching was done, when playback really started, or at what rate the frames were decoded. This made it impossible to synchronize the video with other data streams.
So we tried the MediaSource API, thinking that we might be able to feed it raw H264 data and better control the decoding process. As it turns out, that didn’t work so well either, although we haven’t finished exploring all possibilities.
The first issue is that MediaSource doesn’t provide an API that abstracts video codecs and demuxers the way a library like FFMPEG does. It’s simply an API allowing better control over the way the video data is fetched and provided to the browser’s HTML Video Tag, but it’s still limited to the same video formats, that is to say MP4. So although this technique solved our first problem, since we were now able to feed the player with video data received in websocket packets, we still didn’t have fine control over how frames were decoded, and we still had to extract time stamps from the MP4 format.
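For illustration, feeding fragmented MP4 data received over a websocket into MediaSource could look like the sketch below. `FragmentQueue` and `attachStream` are hypothetical names, and the URL and codec string passed in are placeholders. The queue exists because a `SourceBuffer` throws if `appendBuffer()` is called while a previous append is still in progress.

```javascript
// Queues MP4 fragments and appends them one at a time, waiting for the
// 'updateend' event before handing the next fragment to the SourceBuffer.
// NOTE: hypothetical helper for illustration.
class FragmentQueue {
  constructor(sourceBuffer) {
    this.sourceBuffer = sourceBuffer;
    this.pending = [];
    sourceBuffer.addEventListener('updateend', () => this.flush());
  }

  push(fragment) {
    this.pending.push(fragment);
    this.flush();
  }

  flush() {
    if (this.pending.length > 0 && !this.sourceBuffer.updating) {
      this.sourceBuffer.appendBuffer(this.pending.shift());
    }
  }
}

// Browser only: connect a <video> element to a websocket through MediaSource.
function attachStream(video, wsUrl, mimeType) {
  const mediaSource = new MediaSource();
  video.src = URL.createObjectURL(mediaSource);
  mediaSource.addEventListener('sourceopen', () => {
    const queue = new FragmentQueue(mediaSource.addSourceBuffer(mimeType));
    const ws = new WebSocket(wsUrl);
    ws.binaryType = 'arraybuffer';
    ws.onmessage = (event) => queue.push(event.data);
  });
}
```

A call might look like `attachStream(document.querySelector('video'), 'ws://example.invalid/video', 'video/mp4; codecs="avc1.42E01E"')`. Note that nothing in this sketch exposes per-frame timing, which is the limitation discussed above.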
Extracting individual frame time stamps from MP4 is harder than it seems because the format does not provide a time stamp for each frame. Instead, we had to compute the time stamp based on the video creation time (an odd integer format counting seconds since Jan 1st 1904, so not super accurate to start with), the time stamp of movie segments, which is given relative to the start of the video, and the theoretical frame rate. The computation seemed to work OK for clean videos, but things got even more complicated when some frames were missing.
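The computation described above can be sketched as follows. The function and parameter names are assumptions chosen for illustration: `creationTime` is in seconds since the 1904 epoch, `segmentStart` is the segment's start time in `timescale` ticks per second relative to the start of the video, and `frameIndex` counts frames within the segment.

```javascript
// Milliseconds between the MP4 epoch (1904-01-01 UTC) and the Unix epoch (1970-01-01 UTC).
const MP4_EPOCH_OFFSET_MS = -Date.UTC(1904, 0, 1);

// Estimate the absolute time of a frame, in Unix milliseconds.
// NOTE: hypothetical helper; assumes a constant frame rate and no dropped frames.
function estimateFrameTimeMs(creationTime, segmentStart, timescale, frameIndex, frameRate) {
  const videoStartMs = creationTime * 1000 - MP4_EPOCH_OFFSET_MS;
  const segmentOffsetMs = (segmentStart / timescale) * 1000;
  const frameOffsetMs = (frameIndex / frameRate) * 1000;
  return videoStartMs + segmentOffsetMs + frameOffsetMs;
}
```

The weak point is visible in the last term: `frameIndex / frameRate` is only correct if every frame is present and evenly spaced, and the creation time only has one-second resolution.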
The Broadway JS library solved all of the problems above because we were responsible for feeding it with individual H264 packets (actually called NAL units in H264 jargon). This meant we had full control over video data retrieval and decoding speed.
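As an illustration of what working with individual NAL units involves, here is a minimal sketch that splits an H264 byte stream into NAL units. It assumes the stream uses the Annex B format, where units are separated by `00 00 01` or `00 00 00 01` start codes; whether a given camera or server uses that framing is an assumption, and the function names are hypothetical.

```javascript
// Split an H264 Annex B byte stream (Uint8Array) into NAL units,
// returned as subarrays with the start codes removed.
function splitNalUnits(bytes) {
  const codes = [];  // index where each start code begins
  const starts = []; // index of the first byte after each start code
  for (let i = 0; i + 2 < bytes.length; i++) {
    if (bytes[i] === 0 && bytes[i + 1] === 0 && bytes[i + 2] === 1) {
      // A 4-byte start code has one extra leading zero byte.
      codes.push(i > 0 && bytes[i - 1] === 0 ? i - 1 : i);
      starts.push(i + 3);
      i += 2; // skip the rest of the start code
    }
  }
  return starts.map((start, k) =>
    bytes.subarray(start, k + 1 < codes.length ? codes[k + 1] : bytes.length));
}

// The NAL unit type is stored in the low 5 bits of the first byte
// (for example 7 = sequence parameter set, 8 = picture parameter set, 5 = IDR slice).
function nalUnitType(nal) {
  return nal[0] & 0x1f;
}
```

Each unit returned by `splitNalUnits` can then be passed to the decoder at whatever pace the application chooses.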
This worked very well until we discovered that the library doesn’t support all H264 flavors: it is essentially limited to H264 Baseline Profile, and some of our cameras were generating H264 using Main or High profiles.
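One way to find out which profile a camera produces is to look at the sequence parameter set (SPS) NAL unit, whose second byte is the `profile_idc` field. The sketch below is a hypothetical helper written for illustration; it only maps the most common profile values and ignores the constraint flags that follow.

```javascript
// Common H264 profile_idc values.
const H264_PROFILES = { 66: 'Baseline', 77: 'Main', 88: 'Extended', 100: 'High' };

// Return the profile name declared in an SPS NAL unit (start code already removed).
function h264Profile(spsNal) {
  if ((spsNal[0] & 0x1f) !== 7) {
    throw new Error('not an SPS NAL unit');
  }
  const profileIdc = spsNal[1];
  return H264_PROFILES[profileIdc] || 'Other (' + profileIdc + ')';
}
```

A client could run this check on the first SPS it receives and fall back to another decoder when the result is not `Baseline`.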
This whole process was actually easier than expected and we were able to integrate it in the current OSH JS Toolkit (look at the OSH.UI.FFMPEGView class to see how it was done, including an implementation with web workers to get better performance when decoding multiple videos simultaneously).
The advantages of this technique are the same as with Broadway JS except we are not limited to H264 Baseline Profile. In fact, even though our current build includes only the H264 decoder, which supports all profiles, we could include any of the codecs supported by FFMPEG (e.g. MPEG2, H265, VP8, VP9, etc.)!
As with Broadway JS, hardware acceleration is not feasible, but performance is good enough on recent hardware for our application.