Non-deterministic caching

Caching non-deterministic functions is trickier that memoization. Due to the fact that every execution of such a function may give different results, it is usually impossible to use previous values for an arbitrarily long amount of time. What you need to do instead is to decide for how long a cached value can be considered valid. After a defined period of time passes, the stored results are considered stale and the cache will need refreshing with a new value.

Non-deterministic functions that are usually a subject of caching often depend on some external state that is hard to track inside your application code. Typical examples of components include the following:

Relational databases, or generally any type of structured data storage engine
Third-party services accessible through network connection (web APIs)
Filesystems

So, in other words, non-deterministic caching is performed in any situation where pre-computed results are used temporarily. These results often represent a state that is consistent with the state of other system components—usually, the backing service.

Note that such an implementation is obviously a trade-off, and is therefore related to the techniques we looked at in the Using architectural trade-offs section. If you resign from running part of your code whenever necessary, and instead use historical results, you are risking using data that is stale or represents an inconsistent state of your system. In this case, you are trading the accuracy and/or completeness for speed and performance.

Of course, such caching is efficient as long as the time taken to interact with the cache is less than the time the cached function takes to execute. If it's faster to simply recalculate the value, by all means do so! That's why setting up a cache has to be done only if it's worth it; setting it up properly has a cost.

Things that can actually be cached are usually the results of interactions with other components of your system. If you want to save time and resources when communicating with the database, it is worth caching expensive queries. If you want to reduce the number of I/O operations, you may want to cache the content of files that are most-frequently accessed.

Techniques for caching non-deterministic functions are actually very similar to those used in caching deterministic ones. The most notable difference is that they usually require the option of invalidating cached values by their age. This means that the lru_cache() decorator from the functools module has limited use; however, it should not be too difficult to extend this function to provide the expiration feature. As this is a very common problem that has been solved numerous times by many developers, you should be able to find multiple libraries on PyPI that have been designed for caching non-deterministic values.

In the next section, we will take a look at cache services.