Multimedia Protocols

Up to this point, we've been discussing methods of exchanging real-time messages in text. There are also real-time messaging systems that allow the exchange of other kinds of data; these include Internet telephones, video conferencing systems, and application-sharing systems. These types of data require a great deal more bandwidth than plain text and often have more security implications.

Multimedia protocols tend to have several common characteristics. First, they normally use more than one port. They use multiple data streams in order to separate data with different characteristics and in order to maximize the efficiency with which they use network resources. Thus, they normally separate audio data from video data and use different channels for data going in different directions. They also separate the actual data from administrative commands, so that the port used to send video is not the same as the port used to say "Stop sending me video, I can't take it any more"; this maximizes the chances that the administrative commands will actually get through. The administrative functions are normally known as call control.

Most multimedia protocols use different lower-level protocols for data and for call control. Data is almost always sent over UDP, while call control is almost always sent over TCP. This is because the data must be delivered with as little delay as possible; it's not important if some packets are lost, as long as all the packets that get through are used as soon as they arrive. The call control, on the other hand, happens less often but must not get lost; it's worth the higher overhead of TCP in order to be guaranteed that commands will arrive.

Multimedia protocols are very difficult to protect adequately with firewalls. It would be hard to support any protocol that involved a large number of channels, going in both directions, and using both connection-oriented and connectionless protocols, but multimedia protocols further complicate the picture by requiring very high performance.

T.120 and H.323

T.120 and H.323^[30] are International Telecommunication Union (ITU) standards for conferencing. T.120 covers file transfer, chat, whiteboard, and application sharing; H.323 covers audio and video conferencing. These are both higher-level standards that use a number of lower-level protocols for various purposes, and you will occasionally hear people talk about Q.931, G.711, H.245, H.261, and H.263 in particular as parts of H.323, and T.122 through T.127 as parts of T.120. For most purposes, you don't need to worry about these lower-level protocols, which are used in conjunction with the higher-level protocols.

Neither the H.323 nor the T.120 standard requires implementors to provide any security. H.323 is used to carry audio and video data that will be presented to the user. Although this presents a risk of information leaks, it's not directly dangerous to the client except in the ways all protocols are dangerous to clients. Because H.323 sets up a large number of incoming data channels, both UDP and TCP, there's a significant risk that allowing H.323 will allow people to attack other, more vulnerable services.

T.120, on the other hand, is inherently dangerous. Both file transfer and application sharing are directly attackable applications.

Packet filtering characteristics of T.120

When running over TCP/IP, T.120 uses a straightforward TCP connection on port 1503. (This is actually specified by T.123, which is the transport standard associated with T.120.)

Direction	SourceAddr.	Dest.Addr.	Protocol	SourcePort	Dest.Port	ACKSet	Notes
In	Ext	Int	TCP	>1023	1503	^[31]	External client contacting internal server
Out	Int	Ext	TCP	1503	>1023	Yes	Internal server answering external client
Out	Int	Ext	TCP	>1023	1503	^[31]	Internal client contacting external server
In	Ext	Int	TCP	1503	>1023	Yes	External server answering internal client
^[31]ACK is not set on the first packet of this type (establishing connection) but will be set on the rest.
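
The four rows above can be read as a small stateless filter. The following is a minimal sketch in Python, intended only to illustrate how the direction, port, and ACK columns combine; the `Packet` class, the `allow_t120` function, and the `allow_inbound_calls` flag are hypothetical names introduced here, and a real packet filter would also match on source and destination addresses.

```python
from dataclasses import dataclass

T120_PORT = 1503  # TCP port specified by T.123


@dataclass(frozen=True)
class Packet:
    direction: str  # "in" or "out", relative to the internal network
    protocol: str   # "tcp" or "udp"
    src_port: int
    dst_port: int
    ack: bool       # True when the TCP ACK bit is set


def allow_t120(pkt: Packet, allow_inbound_calls: bool = False) -> bool:
    """Return True if the packet matches one of the four T.120 table rows."""
    if pkt.protocol != "tcp":
        return False
    high_src = pkt.src_port > 1023
    high_dst = pkt.dst_port > 1023
    # Internal client contacting an external server (any ACK state).
    if pkt.direction == "out" and high_src and pkt.dst_port == T120_PORT:
        return True
    # External server answering: ACK must be set, so a bare SYN is refused.
    if pkt.direction == "in" and pkt.src_port == T120_PORT and high_dst:
        return pkt.ack
    if allow_inbound_calls:
        # External client contacting an internal server (any ACK state).
        if pkt.direction == "in" and high_src and pkt.dst_port == T120_PORT:
            return True
        # Internal server answering: ACK must be set.
        if pkt.direction == "out" and pkt.src_port == T120_PORT and high_dst:
            return pkt.ack
    return False
```

The ACK check is what keeps an outsider from opening a new connection by sending from source port 1503: the first packet of a connection never has ACK set, so it does not match the "answering" rows.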

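Packet filtering characteristics of H.323

H.323 call setup uses a TCP connection to port 1720; the remaining call control and data channels use dynamically allocated ports above 1023.

Direction	SourceAddr.	Dest.Addr.	Protocol	SourcePort	Dest.Port	ACKSet	Notes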
In	Ext	Int	TCP	>1023	1720	^[32]	External caller contacting internal callee
Out	Int	Ext	TCP	1720	>1023	Yes	Internal callee responding to external caller
Out	Int	Ext	TCP	>1023	1720	^[32]	Internal caller contacting external callee
In	Ext	Int	TCP	1720	>1023	Yes	External callee responding to internal caller
Out	Int	Ext	TCP	>1023	>1023	^[32]	Call control for data going internal to external
In	Ext	Int	TCP	>1023	>1023	Yes	Responses to call control for data going internal to external
In	Ext	Int	TCP	>1023	>1023	^[32]	Call control for data going external to internal
Out	Int	Ext	TCP	>1023	>1023	Yes	Responses to call control for data going external to internal
Out	Int	Ext	UDP	>1023	>1023	^[34]	Data going internal to external
In	Ext	Int	UDP	>1023	>1023	^[34]	Data going external to internal
^[32]ACK is not set on the first packet of this type (establishing connection) but will be set on the rest.
^[34]UDP has no ACK equivalent.

The extensive use of dynamically allocated ports makes H.323 very hard to deal with via packet filtering; in fact, Microsoft's instructions for NetMeeting (which is based upon H.323 and mentioned later) suggest allowing all UDP and TCP connections in either direction where both ends are above 1023. This configuration is extremely insecure, and we don't recommend it. However, it is the only way to allow H.323 through a nonstateful packet filtering firewall.
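
To see why the blanket rule is so dangerous, consider what else it admits. The sketch below is illustrative only: the function names are hypothetical, the cutoff is taken as "above 1023" to match the tables, and the two services listed are simply familiar examples of servers that listen on high ports.

```python
def permissive_high_port_rule(protocol: str, src_port: int, dst_port: int) -> bool:
    """The blanket rule: allow TCP or UDP when both ports are above 1023."""
    return protocol in ("tcp", "udp") and src_port > 1023 and dst_port > 1023


# A small, illustrative selection of services that listen above 1023.
HIGH_PORT_SERVICES = {
    "X11 display :0": ("tcp", 6000),
    "NFS": ("udp", 2049),
}


def exposed_services(client_port: int = 40000) -> list[str]:
    """List the example services an outside client could reach under the rule."""
    return sorted(
        name
        for name, (protocol, port) in HIGH_PORT_SERVICES.items()
        if permissive_high_port_rule(protocol, client_port, port)
    )
```

Under this rule, every service in the example table is reachable from outside, because nothing in the rule distinguishes an H.323 data channel from any other traffic between two high ports.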

A stateful packet filter that can monitor the H.323 port negotiation would be capable of allowing only the needed data ports. Note that straightforward tricks like allowing only UDP responses will not work for H.323 because the incoming data streams from the remote host will not meet the normal criteria to be considered a response; the packet filtering must be H.323-aware. Unfortunately, H.323 is not particularly easy to parse, so H.323-aware packet filters are rare, although high-end packet filtering systems do offer them.
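
The bookkeeping half of such a filter can be sketched as follows. This is a simplified model under stated assumptions, not a description of any product: the `PinholeTable` class and its method names are hypothetical, and the hard part (parsing the call control stream to learn which ports were negotiated) is deliberately left out and represented by calls to `channel_negotiated`.

```python
class PinholeTable:
    """Tracks inbound data channels learned from call control (hypothetical)."""

    def __init__(self) -> None:
        # Each entry is (protocol, internal_host, port).
        self._open: set[tuple[str, str, int]] = set()

    def channel_negotiated(self, protocol: str, internal_host: str, port: int) -> None:
        # Would be called by an H.323-aware parser (not shown) when the call
        # control stream announces a port the remote side will send to.
        self._open.add((protocol, internal_host, port))

    def call_ended(self, internal_host: str) -> None:
        # Close every pinhole that was opened for this host.
        self._open = {entry for entry in self._open if entry[1] != internal_host}

    def allow_inbound(self, protocol: str, dst_host: str, dst_port: int) -> bool:
        # Inbound packets are allowed only to ports that were negotiated.
        return (protocol, dst_host, dst_port) in self._open
```

For example, after `channel_negotiated("udp", "10.0.0.5", 49170)`, inbound UDP to that host and port is allowed while every other high port stays closed, and `call_ended("10.0.0.5")` removes the opening again.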

Because H.323 does not have any built-in authentication, allowing H.323 through a packet filter is not very secure, even if you use a dynamic packet filtering system that understands H.323. If you are concerned about transmitting confidential data, or about the security of your clients, you would be better off using a proxy that provides authentication features.

Proxying characteristics of H.323

H.323 has almost every characteristic that makes a protocol hard to proxy: it uses both TCP and UDP, it uses multiple ports, it uses dynamically allocated ports, it creates connections in both directions, and it embeds address information inside packets. The only good news is that the protocol provides a space where clients can specify a desired destination, making it easy for a proxy to figure out where connections should be directed.

One way of getting around the problems with proxying H.323 is to use what the standard calls a Multipoint Control Unit (MCU) and place it in a publicly accessible part of your network. These systems are designed primarily to control many-to-many connections, but they do it by having each person in the conference connect to them. This means that if you put one on a bastion-host network, you can allow both internal and external callers to connect to it, and only to it, and still get conferencing going. If this machine is well configured, it is relatively safe. However, it's not a true proxy. The external users have to be able to connect directly to the multipoint control unit; one multipoint control unit will not connect to another. The end result is that two sites that both use this workaround can't talk to each other; it works only if exactly one site in the conversation uses it. Several systems are available that provide this functionality, under various names.

It is also possible to get true H.323 proxies, which usually provide multipoint control and security features as well. In general, these are special-purpose products, not included with generic proxying packages. As we've pointed out, proxying H.323 is considerable work; it's not a minor modification to a normal proxy. However, vendors like Cisco and Microsoft that offer wide product ranges do offer H.323 proxying as part of specialized video conferencing products.

Network address translation characteristics of H.323

Because H.323 uses embedded IP addresses to set up the server-to-client connections, it will not work with straightforward network address translation. You will need a network address translator that is H.323-aware. These translators are rare because the IP address is not embedded in a fixed location; the network address translator has to actually parse the packets in order to be able to do the translation. This functionality is included in some of the H.323 proxies.
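
The problem can be illustrated with a toy model. In the sketch below, a packet is represented as a Python dictionary and the embedded address as a plain field named `send_media_to`; both are hypothetical simplifications (real H.323 messages are binary encoded, which is exactly why the address is hard to find), and the addresses are examples.

```python
def naive_nat(packet: dict, inside: str, outside: str) -> dict:
    """Rewrite only the IP header source address, as a simple NAT would."""
    rewritten = dict(packet)
    if rewritten["src"] == inside:
        rewritten["src"] = outside
    return rewritten


# Toy call control message: the payload tells the remote side where to send media.
packet = {
    "src": "10.0.0.5",
    "dst": "192.0.2.20",
    "payload": {"send_media_to": "10.0.0.5", "port": 49170},
}

translated = naive_nat(packet, inside="10.0.0.5", outside="198.51.100.7")
# The header now carries the public address, but the payload still names
# the private one, so the remote side would send media to an address it
# cannot reach.
```

An H.323-aware translator would have to locate and rewrite the address inside the payload as well, which requires parsing the message rather than patching a fixed offset.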

Summary of recommendations for T.120 and H.323

Do not allow T.120 through your firewall.
To allow H.323, use a special-purpose H.323 proxy that provides security features.

The Real-Time Transport Protocol (RTP) and the RTP Control Protocol (RTCP)

RTP is an IETF standard for transmitting real-time data (notably, audio and video). The most common use of RTP is as a lower-level protocol in conjunction with H.323. The standard for RTP actually details a pair of protocols; RTP transfers data, and RTCP is the control protocol. Some products that talk about RTP mean RTP in conjunction with RTCP, while others truly mean that they use RTP only, using some other protocol for control.

Packet filtering characteristics of RTP and RTCP

RTP and RTCP may use any underlying protocol. In TCP/IP implementations, they are normally UDP-based; they may use any pair of UDP ports, but RTP is supposed to use an even-numbered port with RTCP at the next higher port number. If an application is given an odd-numbered port for RTP, it is supposed to use the next lower port number for RTP instead and leave the odd port for RTCP, so that they are always at two successive ports with the lower one being even numbered. RTP is assigned port number 5004 and RTCP 5005, but they also often use 24032 and 24033.
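
The pairing convention is easy to express in code. The following is a minimal sketch, assuming the rule in the RTP specification that an odd requested port is rounded down to the even port below it to form the base of the pair; the function name `rtp_port_pair` is hypothetical.

```python
def rtp_port_pair(requested_port: int) -> tuple[int, int]:
    """Return (rtp_port, rtcp_port) following the even/odd convention."""
    if not 2 <= requested_port <= 65535:
        raise ValueError("port must be between 2 and 65535")
    # Round an odd port down to the even port below it.
    rtp_port = requested_port - (requested_port % 2)
    # RTCP always takes the next higher (odd) port.
    return rtp_port, rtp_port + 1
```

Given either 5004 or 5005, the function returns the pair (5004, 5005), so a filter that knows one port of the pair can always derive the other.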

Direction	SourceAddr.	Dest.Addr.	Protocol	SourcePort	Dest.Port	ACKSet	Notes
In	Ext	Int	UDP	>1023	5004^[38]	^[39]	External RTP client to internal server
Out	Int	Ext	UDP	5004^[38]	>1023	^[39]	Internal RTP server to external client
In	Ext	Int	UDP	>1023	5005^[42]	^[39]	External RTCP client to internal server
Out	Int	Ext	UDP	5005^[42]	>1023	^[39]	Internal RTCP server to external client
Out	Int	Ext	UDP	>1023	5004^[38]	^[39]	Internal RTP client to external server
In	Ext	Int	UDP	5004^[38]	>1023	^[39]	External RTP server to internal client
Out	Int	Ext	UDP	>1023	5005^[42]	^[39]	Internal RTCP client to external server
In	Ext	Int	UDP	5005^[42]	>1023	^[39]	External RTCP server to internal client
^[38]Or 24032, or any other port number, preferably even; see text for further explanation.
^[39]UDP has no ACK equivalent.
^[42]Or 24033, or any other port number, preferably odd; see text for further explanation.

Proxying characteristics of RTP and RTCP

RTP and RTCP are straightforward protocols, based on UDP. It would not be particularly difficult for a generic proxy system that supported UDP to allow them, but dedicated proxies for them are not widely available.

Network address translation of RTP and RTCP

RTCP may contain embedded hostnames and/or IP addresses as part of the sender description. This is not used to set up the connection but may reveal information that you wished to conceal. Aside from that, network address translation does not pose a problem for RTP or RTCP.

Summary of recommendations for RTP and RTCP

You are unlikely to encounter RTP and RTCP being used by themselves; they are normally used in conjunction with other protocols as part of a larger package. They are not inherently terribly dangerous, so your approach to them will depend on your approach to the rest of the package.

^[30] In case you're curious, the letters "T" and "H" are the designators for the ITU subcommittees that produced the standards, and subcommittee designators are just given out in alphabetical order. They're not short for anything.