RTP


RTP Overview

RTP is designed as an end-to-end transport protocol that is suitable for unicast or multicast transmission of real-time data, such as interactive audio or video.

RTP follows the two principles outlined by Clark and Tennenhouse in [CT90]:

Application Level Framing: The key architectural principle is Application Level Framing. The idea is that the application should break down the data into aggregates that are meaningful to the application and that do not depend on a specific network technology. These data aggregates are called Application Data Units (ADUs). The frame boundaries of ADUs should be preserved by lower levels, that is, by the network system.
The rule by which the size of an ADU is chosen states that an application must be able to process each ADU separately and potentially out of order with respect to other ADUs. The loss of some ADUs, even if a retransmission is triggered, therefore does not stop the receiving application from processing other ADUs. For this reason, and to express data loss in terms meaningful to the application, each ADU must contain a name that allows the receiver to understand the place of the ADU in the sequence of ADUs produced by the sender. Hence, RTP data units carry sequence numbers and timestamps, so that a receiver can determine the time and sequence relation between ADUs.
The ADU is also the main unit of error recovery. Because each ADU is a meaningful data entity to the receiving application, the application itself can decide how to cope with a lost data unit: real-time applications, such as digital video broadcasting, might prefer to simply ignore a few lost frames instead of delaying the presentation until the lost frames are retransmitted, whereas file transfer applications cannot accept the loss of a single data unit. In addition, if the application has the choice of how to deal with a lost unit, the sender can decide whether to buffer the data units for potential retransmission, to recompute the lost units if requested again, or to send new data that diminishes the harm done by the loss of the original ADU.
Integrated Layer Processing: Because application level framing breaks down the data in pieces that an application can handle separately from other data units, all processing of a single, complete ADU can be done in one integrated processing step for reasons of efficiency. This engineering principle is called Integrated Layer Processing.
While the authors of [CT90] agree that layered isolation, as employed in conventional protocol stacks, is suitable for the network layer and below, they argue that many of the functions of the transport layer and above could be structured in a way that would permit the use of the more efficient Integrated Layer Processing.
Especially for RISC systems, whose performance is substantially limited by the costs of memory cycles, an integrated processing loop is more efficient than several isolated steps that each read the data from memory, possibly process it in some way, and write it back to memory.
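The difference between isolated steps and an integrated loop can be sketched as follows. This is only a structural illustration, not code from [CT90]: the two processing steps (an XOR "decoding" and an additive 16-bit checksum) and the function names are assumptions chosen for brevity, and Python itself will not show the memory-cycle savings that motivate the principle.

```python
from typing import Tuple


def layered(adu: bytes, key: int) -> Tuple[bytes, int]:
    """Two isolated steps: each one walks over the complete ADU."""
    decoded = bytes(b ^ key for b in adu)   # pass 1: read, decode, write back
    checksum = sum(decoded) & 0xFFFF        # pass 2: read the data again
    return decoded, checksum


def integrated(adu: bytes, key: int) -> Tuple[bytes, int]:
    """One loop: every octet is read once and both steps are applied to it."""
    out = bytearray(len(adu))
    checksum = 0
    for i, b in enumerate(adu):
        d = b ^ key          # hypothetical decoding step
        out[i] = d
        checksum += d        # hypothetical checksum step
    return bytes(out), checksum & 0xFFFF
```

Both functions compute the same result; the integrated variant is only possible because the ADU is complete and can be handled independently of other data units.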

RTP is used in combination with other network or transport protocols, typically on top of UDP, as visualised in the figure below. The RTP specification recommends the use of two different generalised destination ports within one RTP session, one port for the reception of RTP payload packets and one for RTCP control packets. If RTP is used in conjunction with RSVP, one RTP session corresponds to two RSVP sessions, which are identified by the addresses of the RTP data and control destination ports, respectively.

Figure: RTP in relation to network and transport protocols.

RTP supplements the data delivery functions of the underlying protocols with the following services:

Payload type identification.
Sequence numbering.
Timestamping.

It should be noted that RTP itself does not make any QoS commitments, does not guarantee reliable, timely, or in-order delivery, and does not enforce any error treatment measures. However, extensions, such as those described by Parnes in [Par96], add reliability to RTP for applications that cannot tolerate packet loss, for instance white-board applications.

The accompanying RTP Control Protocol (RTCP) facilitates monitoring of the delivery QoS, and conveys rudimentary information about the participants of an RTP session.

RTP and RTCP are defined in [RFC1889]. However, this specification serves only as a protocol framework; RTP is intended to be tailored to the needs of particular applications by additional specifications:

Payload format specification documents, which define how a particular payload is to be carried in RTP. Currently, payload format specification RFCs exist for H.261 video streams [RFC2032], for CellB video encoding [RFC2029], for JPEG-compressed video [RFC2035], and for MPEG video [RFC2038].
A profile specification document, which defines a set of payload type codes and their mapping to payload formats, as well as any extension or modification of the RTP message headers that are necessary for a specific class of applications. A profile for audio and video conferences is given in [RFC1890].

RTP Payload Transmission

[RFC1889] introduces the term RTP session to describe a set of participants that communicate via a particular pair of destination transport addresses, for example a multicast IP address together with two UDP ports for control and payload data. Within one session, only a single payload type and the associated control information are transmitted; this rule has several consequences:

Multiple RTP sessions must be used for multimedia applications.
Demultiplexing of the received data streams by the payload type is avoided, following the principle of integrated layer processing.
Different QoS reservations can be established for each session in accordance with the different needs of each medium and each receiver's possibilities.

An RTP packet consists of a fixed header, a variable-length list of sources that contributed to the payload in the packet, a potential header extension as defined in a profile specification, and the actual payload, which is encoded according to a payload format specification.
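The packet layout described above can be made concrete with a small parser. The following is a minimal sketch that follows the 12-octet fixed header layout of [RFC1889] (version, padding, extension, CSRC count, marker, payload type, sequence number, timestamp, SSRC) and the list of contributing sources; the names `RtpHeader` and `parse_rtp_header` are assumptions made for this example, and a header extension, if present, is not decoded.

```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrc: Tuple[int, ...]


def parse_rtp_header(packet: bytes) -> Tuple[RtpHeader, int]:
    """Return the parsed header and the offset of the first octet after the CSRC list."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the 12-octet fixed header")
    b0, b1, seq, ts, ssrc = struct.unpack_from("!BBHII", packet, 0)
    cc = b0 & 0x0F                      # number of contributing sources
    end = 12 + 4 * cc
    if len(packet) < end:
        raise ValueError("truncated list of contributing sources")
    csrc = struct.unpack_from("!%dI" % cc, packet, 12) if cc else ()
    header = RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
        csrc=csrc,
    )
    return header, end
```

If the extension flag is not set, the returned offset marks the start of the payload; otherwise the header extension begins there and must be skipped according to the profile in use.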

The fields of highest importance in the RTP packet header are:

Synchronisation source (SSRC) identifier. A receiver uses the synchronisation source identifier to demultiplex the packets received via one destination port. Therefore, each SSRC identifier must be unique within one RTP session.
All packets from one synchronisation source fit into the same timing and sequence numbering space.
Sequence number. The synchronisation source increments the sequence number by one for each RTP data packet it sends in one session. This number can be used by a receiver to restore the sequence of packets disordered during the transmission, to detect packet loss, and to compute the number of expected and lost packets.
Timestamp. The timestamp describes the sampling time of the first octet of the payload data carried by the packet. The sampling instant must be derived from a clock whose timestamps increase monotonically and linearly in time. The clock frequency that is used for the mapping of timestamps to wall clock time should be laid down in a protocol or payload specification, or could be recomputed with the help of NTP timestamps sent within RTCP packets.
A receiver can use the timestamps for intra- and inter-stream playout synchronisation as well as for inter-arrival jitter estimation.
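As an illustration of how a receiver might use the timestamps, the following sketch implements the inter-arrival jitter estimator given in [RFC1889]: the difference D between the transit times of two consecutive packets is folded into a running estimate with J = J + (|D| - J)/16. The class name and the convention of passing the arrival time in seconds are assumptions for this example, and wrap-around of the 32-bit timestamp is deliberately ignored.

```python
from typing import Optional


class JitterEstimator:
    """Running inter-arrival jitter estimate, in RTP timestamp units."""

    def __init__(self, clock_rate: float) -> None:
        self.clock_rate = clock_rate          # timestamp ticks per second
        self.jitter = 0.0
        self._prev_transit: Optional[float] = None

    def update(self, rtp_timestamp: int, arrival_seconds: float) -> float:
        # Express the arrival time in the same units as the RTP timestamp.
        arrival = arrival_seconds * self.clock_rate
        transit = arrival - rtp_timestamp
        if self._prev_transit is not None:
            d = abs(transit - self._prev_transit)
            self.jitter += (d - self.jitter) / 16.0
        self._prev_transit = transit
        return self.jitter
```

The sender and receiver clocks need not be synchronised for this estimate, because only differences of transit times enter the calculation.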

An RTP data packet does not contain any length indication; an application must either rely on the protocol used beneath RTP to provide framing mechanisms, or, in case the underlying protocol provides a continuous octet stream abstraction, define a method of encapsulating RTP packets in the octet stream.
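One possible encapsulation for an octet stream is a length prefix in front of every packet. The sketch below assumes a 16-bit length field in network byte order; this choice, like the function names `frame` and `deframe`, is an assumption made for illustration and is not prescribed by [RFC1889].

```python
import struct
from typing import List, Tuple


def frame(packet: bytes) -> bytes:
    """Prefix one RTP packet with its length so it can travel in an octet stream."""
    if len(packet) > 0xFFFF:
        raise ValueError("packet too large for a 16-bit length field")
    return struct.pack("!H", len(packet)) + packet


def deframe(buffer: bytes) -> Tuple[List[bytes], bytes]:
    """Split a receive buffer into complete packets and the unconsumed remainder."""
    packets = []
    pos = 0
    while len(buffer) - pos >= 2:
        (length,) = struct.unpack_from("!H", buffer, pos)
        if len(buffer) - pos - 2 < length:
            break                              # packet not yet fully received
        packets.append(buffer[pos + 2:pos + 2 + length])
        pos += 2 + length
    return packets, buffer[pos:]
```

A receiver would append newly read octets to the remainder and call `deframe` again, so that packets split across reads are reassembled.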

Control and Feedback Information

RTCP control packets are sent periodically (with small random time variations to avoid traffic bursts) to the same (multicast) host address as the RTP data packets. Their content facilitates:

QoS monitoring and congestion control. The primary function of RTCP messages is to provide feedback on the quality of the data distribution. The conveyed information can be used for flow and congestion control, to control adaptive encodings, and to detect network faults.
Applications that produce payload data generate RTCP sender reports. These reports contain counters of sent packets and octets that allow other session participants to estimate a sender's data rate.
Applications that have recently received packets issue RTCP receiver reports, which contain the highest sequence number received, loss estimations, jitter measures, and timing information needed to calculate the round-trip delay between the sender and the receiver.
Media synchronisation. Sender reports contain an NTP timestamp and an RTP timestamp. The RTP timestamp describes the same instant as the NTP timestamp but is measured in the same units as the timestamps issued in the data packets of the sender. These two timestamps allow a receiver to synchronise its playout clock rate with the sampling clock rate of the sender. In addition, a receiver can use this RTP/NTP timestamp correspondence information to synchronise the playout offset delay of related streams.
Member identification. RTP data packets and most RTCP packets carry only an SSRC identifier but convey no other context data about a session participant. Therefore, RTCP source description packets are sent by both data senders and receivers. They contain additional information about the session members, especially the obligatory canonical name (CNAME), which identifies a participant throughout all related sessions independently of the SSRC identifiers. The representation of the canonical name must be understandable by both humans and machines. Other conveyed information includes email addresses, telephone numbers, the application name, and optional application-specific information. An application that decides to leave a session must transmit an RTCP BYE packet to indicate that it will no longer participate in the session.
Session size estimation and control traffic scaling. The total control traffic generated by all participants in a session should not exceed a small percentage of the data traffic, typically 5 percent. The control messages from other session members enable a participant to estimate the session size and to adapt its control packet rate, so as to remain within the limit of the control traffic share. So, RTP scales up to large numbers of participants.
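The scaling of the control traffic can be sketched as a simplified version of the interval computation in [RFC1889]. The sketch keeps the 5 percent control share, the minimum interval of 5 seconds, and the random factor between 0.5 and 1.5, but omits the separate bandwidth shares for senders and receivers; the function name and parameter names are assumptions for this example.

```python
import random
from typing import Callable


def rtcp_interval(
    members: int,
    session_bandwidth: float,       # octets per second, data traffic of the session
    avg_rtcp_size: float,           # average size of an RTCP packet in octets
    control_fraction: float = 0.05,
    min_interval: float = 5.0,
    rand: Callable[[], float] = random.random,
) -> float:
    """Seconds to wait before this participant sends its next RTCP packet."""
    rtcp_bandwidth = control_fraction * session_bandwidth
    # All members share the control bandwidth, so the interval grows with the session size.
    interval = max(members * avg_rtcp_size / rtcp_bandwidth, min_interval)
    # Randomise to avoid synchronised bursts of control packets.
    return interval * (0.5 + rand())
```

With a fixed control share, doubling the estimated number of members doubles the interval of every participant, so the aggregate control traffic stays roughly constant.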

Different schemes have been developed to exploit delivery feedback information in order to estimate the network congestion state and to adjust the application bandwidth accordingly. These schemes work for applications that can scale their media quality and their data rates to a given bandwidth value.
Busse, Deffner, and Schulzrinne propose a simple though interesting and workable mechanism [BDS96]. The fraction of lost packets, conveyed by normal RTCP receiver reports, is used as the primary indicator of the congestion state of the network: a data sender classifies the network path to each receiver as UNLOADED, LOADED, or CONGESTED. Depending either on the average classification of all paths or on the worst path, the sender makes an adjustment decision about its sending behaviour: it may increase, hold, or decrease its application bandwidth.
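A loss-based adjustment of this kind might look as follows. The sketch is not the algorithm of [BDS96] itself: the loss thresholds, the additive increase, the multiplicative decrease factor, the bandwidth limits, and the worst-path policy are all assumptions chosen to make the three-state idea concrete.

```python
from typing import Iterable

UNLOADED, LOADED, CONGESTED = "UNLOADED", "LOADED", "CONGESTED"


def classify(loss_fraction: float, lower: float = 0.02, upper: float = 0.04) -> str:
    """Map the loss fraction from a receiver report to a path state (hypothetical thresholds)."""
    if loss_fraction >= upper:
        return CONGESTED
    if loss_fraction >= lower:
        return LOADED
    return UNLOADED


def adjust(
    bandwidth: float,
    loss_fractions: Iterable[float],
    increase: float = 50_000.0,      # additive step, assumed value
    decrease: float = 0.875,         # multiplicative factor, assumed value
    min_bw: float = 64_000.0,
    max_bw: float = 2_000_000.0,
) -> float:
    """Worst-path policy: react to the most heavily loaded receiver path."""
    states = [classify(f) for f in loss_fractions]
    if not states:
        return bandwidth             # no feedback yet: hold
    if CONGESTED in states:
        return max(bandwidth * decrease, min_bw)
    if LOADED in states:
        return bandwidth             # hold
    return min(bandwidth + increase, max_bw)
```

A policy based on the average classification of all paths would replace the membership tests with a vote over `states`.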
Bolot, Turletti, and Wakeman take a similar but more complex approach [BTW94]. Instead of using the information from standard RTCP messages, a sender solicits congestion state information from a subset of the receivers and then decides on appropriate bandwidth adjustments.


tspeuker@cip.informatik.uni-erlangen.de