Up to this point, we've been discussing methods of exchanging real-time messages in text. There are also real-time messaging systems that allow the exchange of other kinds of data; these include Internet telephones, video conferencing systems, and application-sharing systems. These types of data require a great deal more bandwidth than plain text and often have more security implications.
Multimedia protocols tend to have several common characteristics. First, they normally use more than one port. They use multiple data streams in order to separate data with different characteristics and in order to maximize the efficiency with which they use network resources. Thus, they normally separate audio data from video data and use different channels for data going in different directions. They also separate the actual data from administrative commands, so that the port used to send video is not the same as the port used to say "Stop sending me video, I can't take it any more"; this maximizes the chances that the administrative commands will actually get through. The administrative functions are normally known as call control.
Most multimedia protocols use different lower-level protocols for data and for call control. Data is almost always sent over UDP, while call control is almost always sent over TCP. This is because the data needs maximum speed; it's not important if some packets are lost, as long as all the packets that get through are used as soon as they arrive. The call control, on the other hand, happens less often but must not get lost; it's worth the higher overhead of TCP in order to be guaranteed that commands will arrive.
Multimedia protocols are very difficult to protect adequately with firewalls. It would be hard to support any protocol that involved a large number of channels, going in both directions, and using both connection-oriented and connectionless protocols, but multimedia protocols further complicate the picture by requiring very high performance.
T.120 and H.323[30] are International Telecommunication Union (ITU) standards for conferencing. T.120 covers file transfer, chat, whiteboard, and application sharing; H.323 covers audio and video conferencing. These are both higher-level standards that use a number of lower-level protocols for various purposes, and you will occasionally hear people talk about Q.931, G.711, H.245, H.261, and H.263 in particular as parts of H.323, and T.122 through T.127 as parts of T.120. For most purposes, you don't need to worry about these lower-level protocols, which are used in conjunction with the higher-level protocols.
Neither the H.323 nor the T.120 standard requires implementors to provide any security. H.323 is used to carry audio and video data that will be presented to the user. Although this presents a risk of information leaks, it's not directly dangerous to the client except in the ways all protocols are dangerous to clients. Because H.323 sets up a large number of incoming data channels, both UDP and TCP, there's a significant risk that allowing H.323 will allow people to attack other, more vulnerable services.
T.120, on the other hand, is inherently dangerous. Both file transfer and application sharing are directly attackable applications.
When running over TCP/IP, T.120 uses a straightforward TCP connection on port 1503. (This is actually specified by T.123, which is the transport standard associated with T.120.)
Direction | SourceAddr. | Dest.Addr. | Protocol | SourcePort | Dest.Port | ACKSet | Notes |
---|---|---|---|---|---|---|---|
In | Ext | Int | TCP | >1023 | 1503 | [31] | External client contacting internal server |
Out | Int | Ext | TCP | 1503 | >1023 | Yes | Internal server answering external client |
Out | Int | Ext | TCP | >1023 | 1503 | [31] | Internal client contacting external server |
In | Ext | Int | TCP | 1503 | >1023 | Yes | External server answering internal client |
[31] ACK is not set on the first packet of this type (establishing connection) but will be set on the rest.
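The four rules in the table can be read as a small matching function. The following is a minimal sketch in Python, not a real firewall configuration; the `Packet` type and the `allow_t120` name are hypothetical and exist only for illustration. Because the table permits the same traffic in both directions, the sketch does not need to distinguish inbound from outbound packets.

```python
from dataclasses import dataclass

T120_PORT = 1503  # TCP port specified by T.123 for T.120 over TCP/IP


@dataclass(frozen=True)
class Packet:
    """Hypothetical, simplified view of a packet header."""
    protocol: str   # "tcp" or "udp"
    src_port: int
    dst_port: int
    ack: bool       # True if the TCP ACK bit is set


def allow_t120(pkt: Packet) -> bool:
    """Return True if pkt matches one of the T.120 rules in the table."""
    if pkt.protocol != "tcp":
        return False
    # Client to server: ACK is clear on the first packet, set on the rest.
    if pkt.src_port > 1023 and pkt.dst_port == T120_PORT:
        return True
    # Server to client: only replies are allowed, so ACK must be set.
    if pkt.src_port == T120_PORT and pkt.dst_port > 1023:
        return pkt.ack
    return False


print(allow_t120(Packet("tcp", 40000, 1503, ack=False)))  # True
print(allow_t120(Packet("tcp", 1503, 40000, ack=False)))  # False
```

A rule set like this only decides whether T.120 traffic passes at all; it cannot tell chat from file transfer inside the connection.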
Because T.120 uses a single TCP connection on a well-defined port, it is quite easy to allow through proxies. However, since T.120 allows both relatively safe uses (chat and whiteboard) and dangerous uses (file transfer and application sharing), it would be wise to have a T.120-aware proxy to enforce some security. Such proxies do not appear to be available yet.
T.120 will work transparently with network address translation.
H.323 uses at least three ports per connection. A TCP connection at port 1720 is used for call setup. In addition, each data stream requires one dynamically allocated TCP port (for call control) and one dynamically allocated UDP port (for data). Audio and video are sent separately, and data streams are one-way; this means that a normal video conference will require no fewer than eight dynamically allocated ports (a TCP control port and a UDP data port for outgoing video, another pair for outgoing audio, another pair for incoming video, and a final pair for incoming audio). Figure 19.3 shows the connections involved in a generic H.323 conference. Note that four of the dynamically allocated ports will be established from the outside to the inside (regardless of which side initiated the conversation).
Direction | SourceAddr. | Dest.Addr. | Protocol | SourcePort | Dest.Port | ACKSet | Notes |
---|---|---|---|---|---|---|---|
In | Ext | Int | TCP | >1023 | 1720 | [32] | External caller contacting internal callee |
Out | Int | Ext | TCP | 1720 | >1023 | Yes | Internal callee responding to external caller |
Out | Int | Ext | TCP | >1023 | 1720 | [32] | Internal caller contacting external callee |
In | Ext | Int | TCP | 1720 | >1023 | Yes | External callee responding to internal caller |
Out | Int | Ext | TCP | >1023 | >1023 | [32] | Call control for data going internal to external |
In | Ext | Int | TCP | >1023 | >1023 | Yes | Responses to call control for data going internal to external |
In | Ext | Int | TCP | >1023 | >1023 | [32] | Call control for data going external to internal |
Out | Int | Ext | TCP | >1023 | >1023 | Yes | Responses to call control for data going external to internal |
Out | Int | Ext | UDP | >1023 | >1023 | [34] | Data going internal to external |
In | Ext | Int | UDP | >1023 | >1023 | [34] | Data going external to internal |
[32] ACK is not set on the first packet of this type (establishing connection) but will be set on the rest. [34] UDP has no ACK equivalent.
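The port arithmetic for a conference can be checked with a short enumeration. This is an illustrative sketch only; the function name and the tuple layout are assumptions made for the example, and the count excludes the fixed call setup connection on TCP port 1720.

```python
def h323_dynamic_ports(media_types=("audio", "video"),
                       directions=("outgoing", "incoming")):
    """List the dynamically allocated ports for one conference.

    Each one-way stream needs a TCP port for call control and a
    UDP port for data.
    """
    ports = []
    for direction in directions:
        for media in media_types:
            ports.append((direction, media, "TCP", "call control"))
            ports.append((direction, media, "UDP", "data"))
    return ports


ports = h323_dynamic_ports()
inbound = [p for p in ports if p[0] == "incoming"]
print(len(ports))    # 8
print(len(inbound))  # 4
```

The four inbound entries are the ones that cause trouble for packet filters, since they are opened from the outside on ports that are not known in advance.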
The extensive use of dynamically allocated ports makes H.323 very hard to deal with via packet filtering; in fact, Microsoft's instructions for NetMeeting (which is based upon H.323 and mentioned later) suggest allowing all UDP and TCP connections in either direction where both ends are above 1024. This configuration is extremely insecure, and we don't recommend it. However, it is the only way to allow H.323 through a nonstateful packet filtering firewall.
A stateful packet filter that can monitor the H.323 port negotiation would be capable of allowing only the needed data ports. Note that straightforward tricks like allowing only UDP responses will not work for H.323 because the incoming data streams from the remote host will not meet the normal criteria to be considered a response; the packet filtering must be H.323-aware. Unfortunately, H.323 is not particularly easy to parse, so H.323-aware packet filters are rare, although high-end packet filtering systems do offer them.
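The bookkeeping side of such a filter can be sketched as a table of temporarily opened ports. This sketch assumes that some other component has already parsed the negotiation and extracted the port numbers, which is the hard part and is not shown. The `PinholeTable` class, its method names, and the example addresses (taken from the documentation address ranges) are all hypothetical.

```python
class PinholeTable:
    """Tracks inbound ports opened for calls that are in progress."""

    def __init__(self):
        self._pinholes = set()

    def open(self, protocol, external_host, internal_host, internal_port):
        """Record a port learned from the (separately parsed) negotiation."""
        self._pinholes.add((protocol, external_host, internal_host, internal_port))

    def close_call(self, external_host, internal_host):
        """Remove every pinhole between two hosts when their call ends."""
        self._pinholes = {
            p for p in self._pinholes
            if (p[1], p[2]) != (external_host, internal_host)
        }

    def allows_inbound(self, protocol, src_host, dst_host, dst_port):
        """Allow only packets that match a negotiated pinhole exactly."""
        return (protocol, src_host, dst_host, dst_port) in self._pinholes


table = PinholeTable()
table.open("udp", "198.51.100.7", "192.0.2.10", 49170)
print(table.allows_inbound("udp", "198.51.100.7", "192.0.2.10", 49170))  # True
print(table.allows_inbound("udp", "203.0.113.9", "192.0.2.10", 49170))   # False
table.close_call("198.51.100.7", "192.0.2.10")
print(table.allows_inbound("udp", "198.51.100.7", "192.0.2.10", 49170))  # False
```

Matching on the remote host as well as the port keeps a negotiated port from becoming an opening for unrelated hosts, which is the difference between this approach and allowing all ports above 1023.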
Because H.323 does not have any built-in authentication, allowing H.323 through a packet filter is not very secure, even if you use a dynamic packet filtering system that understands H.323. If you are concerned about transmitting confidential data, or about the security of your clients, you would be better off using a proxy that provides authentication features.
H.323 has almost every characteristic that makes a protocol hard to proxy; it uses both TCP and UDP, it uses multiple ports, it uses dynamically allocated ports, it creates connections in both directions, and it embeds address information inside packets. The only good news is that the protocol provides a space where clients can specify a desired destination, making it easy for a proxy to figure out where connections should be directed.
One way of getting around the problems with proxying H.323 is to use what the standard calls a Multipoint Control Unit (MCU) and place it in a publicly accessible part of your network. These systems are designed primarily to control many-to-many connections, but they do it by having each person in the conference connect to them. It means that if you put one on a bastion-host network, you can allow both internal and external callers to connect to it, and only to it, and still get conferencing going. If this machine is well configured, it is relatively safe. However, it's not a true proxy. The external users have to be able to connect directly to the multipoint control unit; one multipoint control unit will not connect to another. The end result is that two sites that both use this workaround can't talk to each other. It works only if exactly one site in the conversation uses it. Several systems are available that provide this functionality, under various names.
It is also possible to get true H.323 proxies, which usually provide multipoint control and security features as well. In general, these are special-purpose products, not included with generic proxying packages. As we've pointed out, proxying H.323 is considerable work; it's not a minor modification to a normal proxy. However, vendors like Cisco and Microsoft that offer wide product ranges do offer H.323 proxying as part of specialized video conferencing products.
Because H.323 uses embedded IP addresses to set up the server-to-client connections, it will not work with straightforward network address translation. You will need a network address translator that is H.323-aware. These translators are rare because the IP address is not embedded in a fixed location; the network address translator has to actually parse the packets in order to be able to do the translation. This functionality is included in some of the H.323 proxies.
RTP is an IETF standard for transmitting real-time data (notably, audio and video). The most common use of RTP is as a lower-level protocol in conjunction with H.323. The standard for RTP details a pair of protocols; RTP transfers data, and RTCP is the control protocol. Some products that talk about RTP mean RTP in conjunction with RTCP, while others truly mean that they use RTP only, using some other protocol for control.
RTP and RTCP may use any underlying protocol. In TCP/IP implementations, they are normally UDP-based; they may use any pair of UDP ports, but RTP is supposed to use an even-numbered port with RTCP at the next higher port number. If an application is given an odd-numbered port for RTP, it is supposed to use the next lower (even) port number for RTP instead, with RTCP one above it, so that they are always at two successive ports with the lower one being even numbered. RTP is assigned port number 5004 and RTCP 5005, but they also often use 24032 and 24033.
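The pairing rule can be expressed in a few lines. This is a minimal sketch, with a hypothetical function name, that follows the convention in the RTP specification: RTP takes the even port, RTCP takes the next higher odd port, and an odd starting number is rounded down to the even number below it. The restriction to ports above 1023 is an assumption made for the example, matching the unprivileged ports used in the tables.

```python
from typing import Tuple


def rtp_rtcp_ports(requested_port: int) -> Tuple[int, int]:
    """Return the (RTP, RTCP) port pair for a requested starting port."""
    if not 1024 <= requested_port <= 65535:
        raise ValueError("expected an unprivileged port number")
    rtp_port = requested_port & ~1   # clear the low bit to get an even port
    return rtp_port, rtp_port + 1    # RTCP is the next higher (odd) port


print(rtp_rtcp_ports(5004))   # (5004, 5005)
print(rtp_rtcp_ports(24033))  # (24032, 24033)
```

A packet filter that allows an RTP session therefore normally needs to allow both ports of the pair, not just the one the application was configured with.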
Direction | SourceAddr. | Dest.Addr. | Protocol | SourcePort | Dest.Port | ACKSet | Notes |
---|---|---|---|---|---|---|---|
In | Ext | Int | UDP | >1023 | 5004[38] | [39] | External RTP client to internal server |
Out | Int | Ext | UDP | 5004[38] | >1023 | [39] | Internal RTP server to external client |
In | Ext | Int | UDP | >1023 | 5005[42] | [39] | External RTCP client to internal server |
Out | Int | Ext | UDP | 5005[42] | >1023 | [39] | Internal RTCP server to external client |
Out | Int | Ext | UDP | >1023 | 5004[38] | [39] | Internal RTP client to external server |
In | Ext | Int | UDP | 5004[38] | >1023 | [39] | External RTP server to internal client |
Out | Int | Ext | UDP | >1023 | 5005[42] | [39] | Internal RTCP client to external server |
In | Ext | Int | UDP | 5005[42] | >1023 | [39] | External RTCP server to internal client |
[38] Or 24032, or any other port number, preferably even; see text for further explanation. [39] UDP has no ACK equivalent. [42] Or 24033, or any other port number, preferably odd; see text for further explanation.
RTP and RTCP are straightforward protocols, based on UDP. It would not be particularly difficult for a generic proxy system that supported UDP to allow them, but dedicated proxies for them are not widely available.
RTCP may contain embedded hostnames and/or IP addresses as part of the sender description. This is not used to set up the connection but may reveal information that you wished to conceal. Aside from that, network address translation does not pose a problem for RTP or RTCP.
You are unlikely to encounter RTP and RTCP being used by themselves; they are normally used in conjunction with other protocols as part of a larger package. They are not inherently terribly dangerous, so your approach to them will depend on your approach to the rest of the package.
[30] In case you're curious, the letters "T" and "H" are the designators for the ITU subcommittees that produced the standard, and subcommittee designators are just given out in alphabetical order. They're not short for anything.