Cache Communication Protocols

When we discussed proxying and HTTP, we also discussed caching, which is one of the primary uses of web proxies. Caching is very important as a way of speeding up transfers and reducing the amount of data transferred across crowded links. Once cache servers are set up, the next logical step is to use multiple cache servers and have them coordinate operations. The protocols for doing this are under active development, and it's not at all clear which protocol is going to win out in the long run.

Internet Cache Protocol (ICP)

ICP is the oldest of the cache management protocols in current use and is supported by the largest number of caches, including Netscape Proxy, Harvest, and Squid. The principle behind ICP is that cache servers operate independently, but when a cache server gets a request for a document that it does not have cached, it asks other cache servers for the document and retrieves the document from its source only if no other cache server has it. ICP has a number of drawbacks: it requires a considerable amount of communication between caches, it slows down document retrieval (a cache must wait for its peers to answer before going to the source), it provides no security or authentication, and it searches the cache based only on URL, not on document header information, which may cause it to return incorrect document versions. On the other hand, it has the notable advantage of being both publicly documented (in IETF RFCs 2186 and 2187, which are informational rather than standards-track documents) and in widespread use.
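The lookup order described above (local cache, then peers, then the original source) can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the `Cache` class, its `icp_query` and `get` methods, and the `fetch_origin` callback are hypothetical names invented for illustration, and no real ICP or HTTP traffic is involved.

```python
from typing import Callable, Dict, List, Optional, Tuple


class Cache:
    """A toy cache server that, like ICP, looks documents up by URL only."""

    def __init__(self, name: str, peers: Optional[List["Cache"]] = None):
        self.name = name
        self.store: Dict[str, bytes] = {}
        self.peers: List["Cache"] = peers or []

    def icp_query(self, url: str) -> bool:
        """Answer a peer's query: True for a hit, False for a miss."""
        return url in self.store

    def get(self, url: str,
            fetch_origin: Callable[[str], bytes]) -> Tuple[bytes, str]:
        """Return the document and where it came from."""
        # 1. Serve from the local cache if possible.
        if url in self.store:
            return self.store[url], "local"
        # 2. Otherwise ask each peer; a real cache would then fetch
        #    the document from the answering peer over HTTP.
        for peer in self.peers:
            if peer.icp_query(url):
                doc = peer.store[url]
                self.store[url] = doc
                return doc, "peer:" + peer.name
        # 3. Only if no peer has it, go to the original source.
        doc = fetch_origin(url)
        self.store[url] = doc
        return doc, "origin"
```

Note that every miss costs one round of queries to the peers before the origin is contacted, which is the retrieval delay mentioned above.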

Packet filtering characteristics of ICP

ICP normally uses UDP; the port number is configurable but defaults to 3130. ICP can also be run over TCP, once again at any port. Caches exchange documents via HTTP. Once again, the port used for HTTP is configurable, but it defaults to 3128.

Direction	Source Addr.	Dest. Addr.	Protocol	Source Port	Dest. Port	ACK Set	Notes
In	Ext	Int	UDP	>1023	3130^[9]	^[10]	ICP request or response, external cache to internal cache
Out	Int	Ext	UDP	3130^[9]	>1023	^[10]	ICP request or response, internal cache to external cache
In	Ext	Int	TCP	>1023	3128^[13]	^[14]	HTTP request, external cache to internal cache
Out	Int	Ext	TCP	3128^[13]	>1023	Yes	HTTP response, internal cache to external cache
Out	Int	Ext	TCP	>1023	3128^[13]	^[14]	HTTP request, internal cache to external cache
In	Ext	Int	TCP	3128^[13]	>1023	Yes	HTTP response, external cache to internal cache
^[9]3130 is the standard port number for ICP, but some servers run on different port numbers.
^[10]UDP has no ACK equivalent.
^[13]3128 is the standard port number for intercache HTTP servers, but some servers run on different port numbers.
^[14]ACK is not set on the first packet of this type (establishing connection) but will be set on the rest.
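The table above can be read as a list of match rules. The following is a minimal sketch of that reading, assuming the default ports 3130 and 3128; `Packet`, `RULES`, `port_matches`, and `match_icp` are hypothetical names, and a real packet filter would also check source and destination addresses, which are omitted here.

```python
from dataclasses import dataclass
from typing import Optional

ICP_PORT = 3130         # default ICP port; sites may configure another
CACHE_HTTP_PORT = 3128  # default intercache HTTP port; also configurable
HIGH = "high"           # stands for "any port above 1023"


@dataclass(frozen=True)
class Packet:
    direction: str     # "in" or "out"
    protocol: str      # "udp" or "tcp"
    src_port: int
    dst_port: int
    ack: bool = False  # meaningful for TCP only


# (direction, protocol, source port, dest. port, ACK required, note)
RULES = [
    ("in", "udp", HIGH, ICP_PORT, False, "ICP, external to internal"),
    ("out", "udp", ICP_PORT, HIGH, False, "ICP, internal to external"),
    ("in", "tcp", HIGH, CACHE_HTTP_PORT, False, "HTTP request, external to internal"),
    ("out", "tcp", CACHE_HTTP_PORT, HIGH, True, "HTTP response, internal to external"),
    ("out", "tcp", HIGH, CACHE_HTTP_PORT, False, "HTTP request, internal to external"),
    ("in", "tcp", CACHE_HTTP_PORT, HIGH, True, "HTTP response, external to internal"),
]


def port_matches(spec, port: int) -> bool:
    """A spec is either an exact port number or HIGH."""
    return port > 1023 if spec == HIGH else port == spec


def match_icp(pkt: Packet) -> Optional[str]:
    """Return the note of the first matching rule, or None to reject."""
    for direction, protocol, src, dst, need_ack, note in RULES:
        if (pkt.direction == direction and pkt.protocol == protocol
                and port_matches(src, pkt.src_port)
                and port_matches(dst, pkt.dst_port)
                and (pkt.ack or not need_ack)):
            return note
    return None
```

The rules marked as requiring ACK reject a packet that tries to open a new connection from port 3128, which is the purpose of the "ACK Set" column.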

Proxying characteristics of ICP

ICP, like SMTP and NNTP, is a self-proxying protocol, one that allows for queries to be passed from server to server. In general, if you are configuring ICP in a firewall environment, you will use this facility and set all internal cache servers to peer with a cache server that's part of the firewall and serves as a proxy.

Since ICP is a straightforward protocol (normally UDP-based, optionally TCP-based), it would also be possible to proxy it through a proxy system like SOCKS, provided the proxy can relay the transport in use; the only difficulty is that you would end up with a one-way relationship, since the external cache would not be able to send queries to the internal cache. This would slow down performance without providing any more security than doing self-proxying, and no current implementations support it.

Network address translation characteristics of ICP

ICP does contain embedded IP addresses, but they aren't actually used for anything. It will work without problems through network address translation systems, as long as you configure a static translation (to allow for requests from other peers) and don't mind the fact that the internal address will be visible to anybody watching traffic.
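To show where the embedded addresses sit, here is a sketch that builds an ICP query and reads the address fields back out. It assumes the message layout given in RFC 2186 (a 20-byte header ending in a sender host address, followed for queries by a requester host address and a null-terminated URL); check the field layout against the RFC before relying on it. The function names are hypothetical.

```python
import socket
import struct

ICP_OP_QUERY = 1  # opcode for a query, per RFC 2186
ICP_VERSION = 2

# opcode, version, message length, request number,
# options, option data, sender host address
HEADER = struct.Struct("!BBHIII4s")  # 20 bytes, network byte order


def build_icp_query(url: str, request_number: int,
                    sender_ip: str = "0.0.0.0",
                    requester_ip: str = "0.0.0.0") -> bytes:
    """Build an ICP query; both address fields are embedded IPv4 addresses."""
    payload = socket.inet_aton(requester_ip) + url.encode("ascii") + b"\x00"
    length = HEADER.size + len(payload)
    header = HEADER.pack(ICP_OP_QUERY, ICP_VERSION, length,
                         request_number, 0, 0,
                         socket.inet_aton(sender_ip))
    return header + payload


def embedded_addresses(packet: bytes):
    """Return (sender, requester) addresses carried inside a query."""
    sender = socket.inet_ntoa(packet[16:20])
    requester = socket.inet_ntoa(packet[20:24])
    return sender, requester
```

A network address translation system that rewrites only the IP header leaves these fields untouched, which is why an internal address placed in them remains visible to anybody watching traffic.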

Cache Array Routing Protocol (CARP)

CARP uses a completely different approach. Rather than having caches communicate with each other, CARP does load balancing between multiple cache servers by having a client or a proxy server use different caches for different requests, depending on the URL being requested and published information about the cache server. The information about available cache servers is distributed through HTTP, so CARP adds no extra protocol complexity. For both packet filtering and proxying, CARP is identical to other uses of HTTP. However, CARP does have difficulties with network address translation, since the documents it uses are guaranteed to have IP addresses in them (the addresses of the cache servers). Netscape and Microsoft both support CARP as well as ICP.
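The idea of choosing a cache from the URL, with no communication between caches, can be sketched with a simple highest-score hash. This is not the CARP hash: CARP specifies its own hash function and per-server load factors, and SHA-256 is used here only as a stand-in. The names `score` and `pick_cache`, and the cache names in the usage example, are hypothetical.

```python
import hashlib
from typing import List


def score(url: str, cache_name: str) -> int:
    """Combine the URL and a cache's name into a deterministic score."""
    digest = hashlib.sha256((cache_name + "|" + url).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_cache(url: str, caches: List[str]) -> str:
    """Choose the cache with the highest score for this URL."""
    if not caches:
        raise ValueError("no cache servers available")
    return max(caches, key=lambda name: score(url, name))


# Every client holding the same list of caches picks the same cache
# for a given URL, so each document ends up in only one cache.
caches = ["cache1.example.com", "cache2.example.com", "cache3.example.com"]
chosen = pick_cache("http://www.example.org/index.html", caches)
```

With this kind of scheme, removing a cache from the list changes the choice only for the URLs that were assigned to that cache.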

Web Cache Coordination Protocol (WCCP)

WCCP is a protocol developed by Cisco that takes a third, completely different approach. To use WCCP, you need a router that is placed so that it can intercept all HTTP traffic that should be handled by your cache servers. The router will detect any packet addressed to TCP port 80 at any destination and redirect the packet to a cache server. The cache server then replies directly to the requestor as if the request had been received normally. WCCP is used for communication between the router and the cache servers, so that the router knows what cache servers are currently running, what load each one is running under, and which URLs should be directed to which servers, and can appropriately balance traffic.
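The router's side of this arrangement can be sketched as a table of caches that have recently reported in, plus a redirect decision for traffic to TCP port 80. This is a toy model and not the WCCP algorithm: the `Router` class, the 30-second timeout, and the CRC-based choice of cache are all assumptions made for illustration, and load reporting is left out.

```python
import zlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Router:
    timeout: float = 30.0  # assumed: seconds before a silent cache is dropped
    last_seen: Dict[str, float] = field(default_factory=dict)

    def heard_from(self, cache: str, now: float) -> None:
        """Record an update received from a cache server."""
        self.last_seen[cache] = now

    def live_caches(self, now: float) -> List[str]:
        """Caches that have reported in within the timeout."""
        return sorted(c for c, t in self.last_seen.items()
                      if now - t <= self.timeout)

    def redirect(self, protocol: str, dst_ip: str, dst_port: int,
                 now: float) -> Optional[str]:
        """Return the cache to redirect to, or None to forward normally."""
        if protocol != "tcp" or dst_port != 80:
            return None  # only HTTP traffic is intercepted
        live = self.live_caches(now)
        if not live:
            return None  # no cache running; let the packet through
        # Assumed balancing rule: spread destinations across live caches.
        index = zlib.crc32(dst_ip.encode("ascii")) % len(live)
        return live[index]
```

The point of the sketch is the dependency it makes visible: the redirect decision is only as good as the router's current picture of which caches are alive.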

Packet filtering characteristics of WCCP

WCCP uses UDP at port 2048. In addition, routers that use WCCP redirect HTTP traffic to cache servers by encapsulating it in GRE packets (GRE is a form of IP over IP, discussed in Chapter 4). WCCP uses GRE protocol type hexadecimal 883E. Note that neither UDP nor GRE uses ACK bits.

Direction	Source Addr.	Dest. Addr.	Protocol	Source Port	Dest. Port	Notes
In	Ext	Int	UDP	^[19]	2048	WCCP update, external participant to internal participant
Out	Int	Ext	UDP	^[19]	2048	WCCP update, internal participant to external participant
In	Ext	Int	GRE	^[21]	^[21]	HTTP query redirected by external router to internal cache server
Out	Int	Ext	GRE	^[21]	^[21]	HTTP query redirected by internal router to external cache server
^[19]The WCCP protocol does not define a source port; it is likely to be 2048.
^[21]GRE does not have source or destination ports, only protocol types. WCCP uses protocol type hexadecimal 883E.
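Because GRE has no ports, a filter has to match on the IP protocol number and the GRE protocol type instead. The sketch below assumes the basic GRE header layout (two bytes of flags and version followed by a two-byte protocol type) and IP protocol number 47 for GRE; the function names are hypothetical.

```python
import struct

IPPROTO_GRE = 47        # IP protocol number for GRE
WCCP_GRE_TYPE = 0x883E  # GRE protocol type used for WCCP redirection


def gre_protocol_type(gre_header: bytes) -> int:
    """Extract the protocol type from the start of a GRE header."""
    if len(gre_header) < 4:
        raise ValueError("GRE header is at least 4 bytes long")
    _flags_and_version, protocol_type = struct.unpack("!HH", gre_header[:4])
    return protocol_type


def is_wccp_redirect(ip_protocol: int, ip_payload: bytes) -> bool:
    """True if an IP packet carries WCCP-redirected traffic in GRE."""
    return (ip_protocol == IPPROTO_GRE
            and len(ip_payload) >= 4
            and gre_protocol_type(ip_payload) == WCCP_GRE_TYPE)
```

A filter built this way can admit WCCP redirection without admitting other GRE tunnels, which use different protocol types.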

Proxying characteristics of WCCP

Because WCCP uses both UDP and GRE, it is going to be difficult to proxy. Although UDP proxies have become relatively common, GRE is still unknown territory for proxy servers.

Network address translation characteristics of WCCP

WCCP communications include embedded IP addresses and will not work through network address translation. The architecture of WCCP assumes that your router and your cache servers are near each other (in network terms) in any case.

Summary of Recommendations for Cache Communication Protocols

Cache management should either be private (between internal cache servers) or public (between a bastion host used to access the external world and external caches). Cache management protocols may cross parts of a firewall to reach a bastion host but should not go completely across the firewall between external and internal networks.