Memcached

If you want to be serious about caching, Memcached is a very popular and battle-hardened solution. This cache server is used by big applications, including Facebook and Wikipedia, to scale their websites. Besides simple caching features, it has clustering capabilities that make it possible to set up an efficiently distributed cache system in no time.

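Memcached is a multi-platform service, and libraries for communicating with it are available in many programming languages. There are many Python clients that differ slightly from each other, but the basic usage is usually the same. The simplest interaction with Memcached almost always consists of the following three methods: set(key, value), which saves the value under the given key; get(key), which returns the value for the given key if it exists; and delete(key), which removes the value under the given key if it exists.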

The following code snippet is an example of integration with Memcached using one popular Python package, pymemcache:

from pymemcache.client.base import Client 
 
# set up a client for the Memcached server running on localhost, port 11211
client = Client(('localhost', 11211)) 
 
# cache some value under some key and expire it after 10 seconds 
client.set('some_key', 'some_value', expire=10) 
 
# retrieve value for the same key 
result = client.get('some_key') 

One of the downsides of Memcached is that it is designed to store values either as strings or binary blobs, and this isn't compatible with every native Python type. In fact, it is compatible with only one: strings. This means that more complex types need to be serialized in order to be successfully stored in Memcached. A common serialization choice for simple data structures is JSON. An example of how to use JSON serialization with pymemcache is as follows:

import json 
from pymemcache.client.base import Client 

def json_serializer(key, value):
    # plain strings are stored as-is and marked with flag 1
    if type(value) == str:
        return value, 1
    # everything else is encoded as JSON and marked with flag 2
    return json.dumps(value), 2


def json_deserializer(key, value, flags):
    if flags == 1:
        return value
    if flags == 2:
        return json.loads(value)
    raise Exception("Unknown serialization format")


client = Client(
    ('localhost', 11211),
    serializer=json_serializer,
    deserializer=json_deserializer,
)
client.set('key', {'a': 'b', 'c': 'd'})
result = client.get('key')

Another very common problem when working with any caching service based on key/value storage is how to choose key names.

For cases when you are caching simple function invocations that have basic parameters, the solution is usually simple. Here, you can convert the function name and its arguments into strings and then concatenate them together. The only thing you need to worry about is making sure there are no collisions between keys that have been created for different functions if you are caching in different places within an application.
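The approach described above could be sketched roughly as follows. The make_cache_key() helper is a hypothetical name used only for illustration; it assumes that all arguments are basic types (numbers, short strings) whose repr() is stable. The function's module and qualified name serve as a prefix so that two different functions called with the same arguments do not end up sharing a key.

```python
def make_cache_key(func, *args, **kwargs):
    """Build a cache key from a function's name and its basic arguments.

    Hypothetical helper for illustration; assumes every argument has a
    stable repr() (for example, numbers and short strings).
    """
    # Module plus qualified name keeps keys of different functions apart.
    prefix = f"{func.__module__}.{func.__qualname__}"
    # Positional arguments are joined in call order.
    arg_part = ",".join(repr(arg) for arg in args)
    # Keyword arguments are sorted so that call-site ordering
    # does not produce different keys for the same invocation.
    kwarg_part = ",".join(
        f"{name}={value!r}" for name, value in sorted(kwargs.items())
    )
    return f"{prefix}({arg_part};{kwarg_part})"
```

Note that Memcached keys must not contain whitespace or control characters, so a key built this way from a string argument that contains spaces would still need to be escaped or hashed before use.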

A more problematic case is when cached functions have complex arguments that consist of dictionaries or custom classes. In that case, you will need to find a way to convert invocation signatures into cache keys in a consistent manner.
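One possible way to get a consistent textual form for such arguments is to serialize them to JSON with sorted dictionary keys. The following is only a sketch: canonical_args() is a hypothetical helper, and the fallback for custom classes assumes that an object's identity for caching purposes is fully described by its class name and instance attributes, which will not hold for every class.

```python
import json


def canonical_args(args, kwargs):
    """Serialize call arguments into a deterministic JSON string.

    Hypothetical helper for illustration only.
    """
    def encode_object(obj):
        # Called by json only for objects it cannot serialize itself.
        # Assumption: class name plus instance attributes identify it.
        if hasattr(obj, "__dict__"):
            return {"__class__": type(obj).__name__, **vars(obj)}
        raise TypeError(f"Cannot build a cache key from {type(obj)!r}")

    return json.dumps(
        [args, kwargs],
        sort_keys=True,         # dictionary ordering no longer matters
        separators=(",", ":"),  # no whitespace between JSON tokens
        default=encode_object,
    )
```

With this helper, two calls that pass equal dictionaries with keys in a different order produce the same string, which can then be appended to the function name to form the key.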

The last problem is that Memcached, like many other caching services, does not handle very long key strings well. Shorter is better. Long keys may either reduce performance or simply not fit within the hardcoded service limits (Memcached limits keys to 250 bytes). For instance, if you cache whole SQL queries, the query strings themselves are generally suitable unique identifiers that can be used as keys. On the other hand, complex queries are generally too long to be stored in a caching service such as Memcached. A common practice is to calculate an MD5, SHA, or other hash of the original key and use that as the cache key instead. The Python standard library has a hashlib module that provides implementations of a few popular hash algorithms. One important thing to keep in mind when using hash functions is hash collisions. There is no hash function that guarantees that collisions will never occur, so always be sure to know and mitigate any potential risks.
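As a minimal sketch of this practice, the following function hashes an arbitrarily long string with SHA-256 from hashlib. The hashed_key() name and the "sql" prefix are assumptions made for illustration; any hash algorithm available in hashlib could be substituted.

```python
import hashlib


def hashed_key(raw_key, prefix="sql"):
    """Return a short, fixed-length cache key for an arbitrarily long string.

    Hypothetical helper; the prefix is an illustrative namespace.
    """
    # hashlib works on bytes, so the text is encoded first.
    digest = hashlib.sha256(raw_key.encode("utf-8")).hexdigest()
    # The hex digest is always 64 characters and contains no whitespace.
    return f"{prefix}:{digest}"
```

A query of any length then maps to a key of 68 characters, for example hashed_key("SELECT * FROM users WHERE id = 1"). One way to mitigate collisions, if they matter for your data, is to store the original string alongside the cached value and compare it on every read.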