August 20th, 2009 by Howdy Pierce, Managing Partner
Something I end up explaining relatively often is the various ways you can stream video encapsulated in the Real-time Transport Protocol, or RTP, and still claim to be standards compliant.
Some background: RTP is used primarily to stream either H.264 or MPEG-4 video. RTP is a transport protocol that provides mechanisms to synchronize the presentation of different streams – for instance, audio and video. In that respect, it performs some of the same functions as an MPEG-2 transport or program stream.
RTP – which you can read about in great detail in RFC 3550 – is codec-agnostic. This means it is possible to carry a large number of codec types inside RTP; for each codec, the IETF defines an RTP payload format that specifies the codec-specific details of mapping data from the codec into RTP packets. Payload formats are defined for H.264, MPEG-4 video and audio, and many more. Even VC-1 – the “standardized” form of Windows Media Video – has an RTP payload format.
In my opinion, the standards are a mess in this area. It should be possible to meet all the various requirements on streaming video with one or at most two different methods for streaming. But ultimately standards bodies are committees: Each person puts in a pretty color, and the result comes out grey.
In fact, the original standards situation around MPEG-4 video was so confused that a group of large companies formed the Internet Streaming Media Alliance, or ISMA. ISMA’s role is basically to wade into all the different options presented in the standards and create a meta-standard – currently ISMA 2.0 – that ties a number of other standards documents together and tells you how to build a working system that will interoperate with other systems.
In any event, there are a number of predominant ways to send MPEG-4 or H.264 video using RTP, all of which follow some relevant standards. If you’re writing a decoder, you’ll normally need to address all of them, so here’s a quick overview.
Multicast delivery: RTP over UDP
In an environment where there is one source of a video stream and many viewers, ideally each frame of video and audio would only transit the network once. This is how multicast delivery works. In a multicast network, each viewer must retrieve an SDP file through some unspecified mechanism, which in practice is usually HTTP. Once retrieved, the SDP file gives enough information for the viewer to find the multicast streams on the network and begin playback.
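To make that concrete, here’s what a minimal SDP file for such a session might look like, along with a quick-and-dirty Python sketch of pulling out the fields a player actually needs. The addresses, ports, and payload types are made-up illustrative values, not anything normative:

```python
# Minimal SDP for a multicast session with one video and one audio stream.
# All values here are illustrative.
SDP_EXAMPLE = """\
v=0
o=- 0 0 IN IP4 192.0.2.1
s=Example multicast session
c=IN IP4 239.1.1.1/16
t=0 0
m=video 5004 RTP/AVP 96
a=rtpmap:96 H264/90000
m=audio 5006 RTP/AVP 97
a=rtpmap:97 mpeg4-generic/48000/2
"""

def parse_sdp(sdp: str) -> dict:
    """Extract the multicast address and the RTP port of each media stream."""
    session = {"address": None, "media": []}
    for line in sdp.splitlines():
        if line.startswith("c="):
            # c=IN IP4 <address>[/ttl] -- strip the optional TTL suffix
            session["address"] = line.split()[-1].split("/")[0]
        elif line.startswith("m="):
            kind, port, _proto, payload = line[2:].split()[:4]
            session["media"].append({
                "type": kind,
                "rtp_port": int(port),
                "rtcp_port": int(port) + 1,   # RTCP conventionally uses the next port up
                "payload_type": int(payload),
            })
    return session
```

A real player would also read the a=rtpmap and a=fmtp attribute lines to configure its decoder, but the connection line and media lines are enough to find the streams on the network.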
In the multicast delivery scenario, each individual stream is sent on a pair of UDP ports – one for the data and the second for the related RTP Control Protocol, or RTCP. That means for a video program consisting of a video stream and two audio streams, you’ll actually see packets being delivered to six UDP ports:
- Video data delivered over RTP
- The related RTCP port for the video stream
- Primary audio data delivered over RTP
- The related RTCP port for the primary audio stream
- Secondary audio data delivered over RTP
- The related RTCP port for the secondary audio stream
Timestamps in the RTP headers can be used to synchronize the presentation of the various streams.
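The fixed part of the RTP header is only 12 bytes, so there isn’t much to decoding it. A sketch in Python, with the field layout per RFC 3550, section 5.1:

```python
import struct

def parse_rtp_header(packet: bytes) -> dict:
    """Decode the fixed 12-byte RTP header (RFC 3550, section 5.1)."""
    if len(packet) < 12:
        raise ValueError("packet too short for an RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version":      b0 >> 6,           # always 2 in practice
        "padding":      bool(b0 & 0x20),
        "extension":    bool(b0 & 0x10),
        "csrc_count":   b0 & 0x0F,
        "marker":       bool(b1 & 0x80),   # often flags the last packet of a frame
        "payload_type": b1 & 0x7F,
        "sequence":     seq,               # detects loss and reordering
        "timestamp":    ts,                # media clock units, e.g. 90 kHz for video
        "ssrc":         ssrc,              # identifies the stream's source
    }
```

Note that the timestamp is in media clock units (90 kHz for video), which is why you need RTCP to relate timestamps across streams, as described next.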
As a side note, RTCP is almost vestigial for most applications. It’s specified in RFC 3550 along with RTP. If you’re implementing a decoder you’ll need to listen on the RTCP ports, but you can almost ignore any data sent to you. The exceptions are the sender report, which you’ll need in order to match up the timestamps between the streams, and the BYE, which some sources will send as they tear down a stream.
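Here’s roughly what pulling that timestamp mapping out of a sender report looks like; only the first 20 bytes of the packet matter for synchronization purposes. A sketch, with the layout per RFC 3550, section 6.4.1:

```python
import struct

RTCP_SR = 200  # RTCP packet type for a sender report

def parse_sender_report(packet: bytes) -> dict:
    """Extract the NTP/RTP timestamp pair from an RTCP sender report.

    This pair maps the stream's media clock onto wall-clock time; collect
    one per stream and you can line up audio and video for presentation.
    """
    if len(packet) < 20:
        raise ValueError("packet too short for a sender report")
    b0, pt, _length, ssrc, ntp_msw, ntp_lsw, rtp_ts = struct.unpack(
        "!BBHIIII", packet[:20])
    if pt != RTCP_SR:
        raise ValueError("not a sender report")
    # NTP timestamp: whole seconds (since 1900) plus a 32-bit fraction
    ntp_seconds = ntp_msw + ntp_lsw / 2**32
    return {"ssrc": ssrc, "ntp_time": ntp_seconds, "rtp_timestamp": rtp_ts}
```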
Multicast video delivery works best for live content. Because every viewer is watching the same stream, it’s not possible for individual viewers to pause, seek, rewind, or fast-forward.
Unicast delivery: RTP over UDP
It’s also possible to send unicast video over UDP, with one copy of the video transiting the network for each client. Unicast delivery can be used for both live and stored content. In the stored content case, additional control commands can be used to pause, seek, and enter fast forward and rewind modes.
Normally in this case, the player first establishes a control connection to a server using the Real Time Streaming Protocol, or RTSP. In theory RTSP can be used over either UDP or TCP, but in practice it is almost always used over TCP.
The player is normally started with an rtsp:// URL, and this causes it to connect over TCP to the RTSP server. After some back-and-forth between the player and the RTSP server, during which the server sends the client an SDP file describing the stream, the server begins sending video to the client over UDP. As with the multicast delivery case, a pair of UDP ports is used for each of the elementary streams.
For seekable streams, once the video is playing, the player has additional control using RTSP: It can cause playback to pause, or seek to a different position, or enter fast forward or rewind mode.
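On the wire, RTSP requests look a lot like HTTP: a request line, headers, and a blank line. A minimal sketch of building them in Python – the server URL, track name, and client ports here are hypothetical:

```python
def rtsp_request(method, url, cseq, headers=None):
    """Format one RTSP request (RFC 2326).

    Every request carries a CSeq header; the server echoes it back so the
    client can match responses to requests.
    """
    lines = ["%s %s RTSP/1.0" % (method, url), "CSeq: %d" % cseq]
    for name, value in (headers or {}).items():
        lines.append("%s: %s" % (name, value))
    return "\r\n".join(lines) + "\r\n\r\n"

# The usual startup sequence a player walks through (hypothetical URL):
url = "rtsp://server.example.com/stream"
describe = rtsp_request("DESCRIBE", url, 1, {"Accept": "application/sdp"})
setup = rtsp_request("SETUP", url + "/trackID=1", 2,
                     {"Transport": "RTP/AVP;unicast;client_port=5004-5005"})
play = rtsp_request("PLAY", url, 3, {"Range": "npt=0-"})
```

The DESCRIBE response carries the SDP file mentioned above; SETUP is issued once per elementary stream, and the Transport header is where the client proposes its UDP port pair.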
RTSP Interleaved mode: RTP and RTSP over TCP
I’m not a fan of streaming video over TCP. In the event a packet is lost in the network, it’s usually worse to wait for a retransmission (which is what happens with TCP’s guaranteed delivery) than it is just to allow the resulting video glitch to pass through to the user (which is what happens with UDP).
However, there are a handful of different networking configurations that would block UDP video; in particular, firewalls historically have interacted badly with the two modes of UDP delivery summarized above.
So the RTSP RFC (RFC 2326), in section 10.12, briefly outlines a mode of interleaving the RTP and RTCP packets onto the existing TCP connection being used for RTSP. Each RTP and RTCP packet is given a four-byte prefix and dropped onto the TCP stream. The result is that the player connects to the RTSP server, and all communication flows over a single TCP connection between the two.
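Parsing that framing back out on the player side is only a few lines. The four-byte prefix is an ASCII '$', a one-byte channel identifier (which maps each frame to a stream's RTP or RTCP channel, as negotiated in SETUP), and a 16-bit big-endian length. A sketch, assuming the buffer starts at a frame boundary:

```python
import struct

def parse_interleaved_frame(data: bytes):
    """Split one interleaved frame off the front of a buffered TCP stream.

    Framing per RFC 2326, section 10.12: '$', channel byte, 16-bit length,
    then that many bytes of RTP or RTCP. Returns ((channel, payload), rest),
    or (None, data) if more bytes are needed.
    """
    if len(data) < 4:
        return None, data                      # header not fully buffered yet
    if data[0] != 0x24:                        # ASCII '$'
        raise ValueError("desynchronized: frame must start with '$'")
    channel = data[1]
    length = struct.unpack("!H", data[2:4])[0]
    if len(data) < 4 + length:
        return None, data                      # payload not fully buffered yet
    return (channel, data[4:4 + length]), data[4 + length:]
```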
HTTP Tunneled mode: RTP and RTSP over HTTP over TCP
You would think RTSP Interleaved mode, being designed to transmit video across firewalls, would be the end, but it turns out that many firewalls aren’t configured to allow connections to TCP port 554, the well-known port for an RTSP server.
So Apple invented a method of mapping the entire RTSP Interleaved communication on top of HTTP, meaning the video ultimately flows across TCP port 80. To my knowledge, this HTTP Tunneled mode is not standardized in any official RFC, but it’s so widely implemented that it has become a de-facto standard.
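For completeness, here’s a sketch of the encoding side of the tunnel in Python. The header names in the comment follow Apple’s published tunnelling scheme as commonly implemented, but treat the details as illustrative rather than normative:

```python
import base64

# The tunnel ties two HTTP connections together with a shared cookie
# (illustrative request lines; header names per Apple's published scheme):
#
#   GET /stream HTTP/1.0                 POST /stream HTTP/1.0
#   x-sessioncookie: <random token>      x-sessioncookie: <same token>
#   Accept: application/x-rtsp-tunnelled
#
# RTSP replies and interleaved RTP flow back unencoded on the long-lived
# GET response; RTSP requests from the player are base64-encoded and
# written into the POST body.

def tunnel_encode(rtsp_message: str) -> bytes:
    """Encode one RTSP request for the client-to-server (POST) leg."""
    return base64.b64encode(rtsp_message.encode("ascii"))
```

The base64 encoding is there because some HTTP intermediaries mangle binary request bodies; the server decodes each chunk and processes it exactly as it would a normal RTSP request.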