H.264 "zero" latency video encoding and decoding for time-critical applications
Early digital video compression solutions focused primarily on applications that do not require real-time interaction, such as TV broadcast, video-on-demand and DVD playback. In these applications the latency between the source and the decoded video is not important and can easily extend to several seconds. However, in applications with a closed feedback loop, such as video conferencing and videophone, latency is the most crucial aspect of the system, as it determines whether the system will be stable. Codec latency in such systems should be kept as low as possible. Many of these applications require latency below 10 milliseconds, and achieving that with the popular H.264/MPEG-4 AVC (MPEG-4 Part 10) video coding standard takes a radically different approach from traditional implementations.
Latency and zero latency defined
Simply put, video codec latency is defined here as the time lapse between the first pixel of video appearing at the source and the first pixel of decoded video appearing at the destination. Latency-sensitive video applications require this time lapse to be extremely small. How small depends on the application, but as a guideline, latency should be kept below 10 ms. For convenience we will call such low latency "zero" latency. This contrasts with applications that are not latency-sensitive, where latency is orders of magnitude higher.
Figure 1: Latency between source and decoded video
Latency-sensitive video codec applications
In video conferencing and video telephony, noticeable delay makes a conversation impossible unless a walkie-talkie-like protocol is strictly followed, which makes the conversation unnatural and cumbersome. These applications require video codec latency below 33 ms.
Figure 2: Implications of latency in video conferencing
Home networks and other applications
An emerging application with high sensitivity to latency is wireless video networking in the home. This application has recently gained a lot of interest from CE manufacturers and aims to eliminate the HDMI cable between the HDTV set and a video source such as a set-top box, DVD player or game console. A similarly compelling case exists in the computer industry, where the link between a laptop and a flat panel monitor is replaced by a wireless connection.
In these applications user interaction with the remote control, game pad, keyboard or mouse must result in instant screen updates; otherwise the solution is useless. Since transmission at multi-gigabit-per-second rates over a highly unpredictable RF link is impractical, video compression is required. In these applications video codec latency below 10 ms is a critical requirement.
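To see why an uncompressed link lands in the multi-gigabit range, a rough calculation helps. The sketch below assumes a 1920x1080 source at 60 frames per second with 24 bits per pixel. These figures and the function name are illustrative assumptions, not values from the text, and blanking intervals are ignored.

```python
# Rough uncompressed bit rate of a video link.
# Assumptions (not from the article): 1920x1080 active pixels, 60 frames/s,
# 24 bits per pixel, blanking intervals ignored.

def raw_bitrate_bps(width, height, fps, bits_per_pixel):
    """Return the uncompressed video bit rate in bits per second."""
    return width * height * fps * bits_per_pixel

rate = raw_bitrate_bps(1920, 1080, 60, 24)
print(f"{rate / 1e9:.2f} Gbit/s")  # prints "2.99 Gbit/s"
```

Under these assumptions the raw stream is close to 3 Gbit/s before any link overhead, which is why compression is needed on the wireless link.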
Figure 3: Implications of latency in wireless video networking applications
Another example where low latency matters greatly is digital video surveillance for mission-critical applications. The challenge here is to match the inherently low latency of analog video surveillance systems as their digital counterparts replace them. When securing valuables, such as money in a bank, priceless artifacts in a museum, or merchandise in a store, it is important that the area or building where the intrusion occurs is instantly secured.
In multiple-camera tracking, video feeds from several cameras are stitched together chronologically into a single feed, which tracks one or more moving objects of interest. Too much latency in the video feeds makes stitching them together a complicated task and renders the application useless for rapid response. In all these surveillance applications video codec latency below 10 ms is a critical requirement.
Figure 4: Implications of latency in video surveillance applications
Lastly, a less obvious example is electronic newsgathering (ENG). In these applications cameras in the field capture live action and transmit the video for live broadcast to a nearby satellite uplink truck, where it is edited in real time prior to uplinking. Video feeds from multiple cameras and camera panning/zooming actions need to be interpreted in real time by the production crew.
Figure 5: Implications of latency in ENG applications
Very low latency in the video feeds is necessary to provide inherent synchronization among the different video feeds and with the panning/zooming actions of the cameras. In this application video codec latency below 33 ms is highly desirable.
Unexpected benefits of "zero" latency video codecs
"Zero" latency can drastically simplify system design in applications where the latency added by other parts of the system, such as transmitters, receivers, and video capture and rendering subsystems, is negligible. In these cases complicated A/V time stamping and synchronization schemes are not needed, as the extremely low latency of the video stream with respect to the audio stream provides inherent synchronization between the two streams. Such extremely low latency A/V systems strongly mimic the way A/V communication occurs in the natural world, without complicated time stamping and synchronization.
Inside zero latency H.264 video encode-decode processing
In traditional approaches the encoding process starts only when a complete frame of video is present, introducing at least 33 ms of latency (one frame period at roughly 30 frames per second) at the encoder and another 33 ms at the decoder. Combined with multi-pass motion estimation, multi-pass rate control and frame-based noise filtering, traditional implementations can easily exceed 200 ms of encode-decode latency.
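The 33 ms figure is simply one frame period at about 30 frames per second. As a rough illustration, the sketch below computes the frame period and how many video lines arrive within a 10 ms budget. The 30 fps rate, the 1080 active lines and the function names are assumptions chosen for illustration, and blanking is ignored.

```python
# Frame-period arithmetic behind the latency figures.
# Assumptions (not from the article): 30 frames/s, 1080 active lines per
# frame, lines arriving at a uniform rate with blanking ignored.

def frame_period_ms(fps):
    """Time in milliseconds to capture one complete frame."""
    return 1000.0 / fps

def lines_within_budget(budget_ms, fps, lines_per_frame):
    """Whole video lines that arrive within budget_ms (integer arithmetic)."""
    return (budget_ms * fps * lines_per_frame) // 1000

fps = 30
period = frame_period_ms(fps)
print(f"one frame period: {period:.1f} ms")
print(f"frame buffering at encoder and decoder: {2 * period:.1f} ms")
print(f"lines arriving in 10 ms: {lines_within_budget(10, fps, 1080)}")
```

Under these assumptions only about a third of a frame arrives within 10 ms, so a codec that waits for a complete frame before it starts encoding cannot meet a 10 ms budget.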