objgraph is a simple module for creating diagrams of object references that should be useful when hunting memory leaks in Python. It is available on PyPI, but it is not a completely standalone tool and requires Graphviz in order to create memory usage diagrams. For developer-friendly systems like macOS or Linux, you can easily obtain it using your preferred system package manager (for example, brew for macOS, apt-get for Debian/Ubuntu). For Windows, you need to download the Graphviz installer from the project page (refer to http://www.graphviz.org/) and install it manually.
objgraph provides multiple utilities that allow you to list and print various statistics about memory usage and object counts. An example of such utilities in use is shown in the following transcript of interpreter sessions:
>>> import objgraph
>>> objgraph.show_most_common_types()
function                   1910
dict                       1003
wrapper_descriptor         989
tuple                      837
weakref                    742
method_descriptor          683
builtin_function_or_method 666
getset_descriptor          338
set                        323
member_descriptor          305
>>> objgraph.count('list')
266
>>> objgraph.typestats(objgraph.get_leaking_objects())
{'Gt': 1, 'AugLoad': 1, 'GtE': 1, 'Pow': 1, 'tuple': 2, 'AugStore': 1,
 'Store': 1, 'Or': 1, 'IsNot': 1, 'RecursionError': 1, 'Div': 1,
 'LShift': 1, 'Mod': 1, 'Add': 1, 'Invert': 1, 'weakref': 1, 'Not': 1,
 'Sub': 1, 'In': 1, 'NotIn': 1, 'Load': 1, 'NotEq': 1, 'BitAnd': 1,
 'FloorDiv': 1, 'Is': 1, 'RShift': 1, 'MatMult': 1, 'Eq': 1, 'Lt': 1,
 'dict': 341, 'list': 7, 'Param': 1, 'USub': 1, 'BitOr': 1, 'BitXor': 1,
 'And': 1, 'Del': 1, 'UAdd': 1, 'Mult': 1, 'LtE': 1}
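The per-type counts in the preceding transcript can be approximated with the standard library alone, which may help when objgraph is not installed. The following is only a rough sketch: the helper names most_common_types() and count_by_type() are made up for this illustration, it sees only objects tracked by the gc module, and objgraph's own implementation may differ in details.

```python
import gc
from collections import Counter


def most_common_types(limit=10):
    """Return (type name, count) pairs for objects tracked by the gc module."""
    counts = Counter(type(obj).__name__ for obj in gc.get_objects())
    return counts.most_common(limit)


def count_by_type(type_name):
    """Count gc-tracked objects whose type name equals type_name."""
    return sum(
        1 for obj in gc.get_objects()
        if type(obj).__name__ == type_name
    )


if __name__ == "__main__":
    # Print a small table similar in spirit to show_most_common_types()
    for name, count in most_common_types():
        print(f"{name:<30}{count}")
    print("list objects:", count_by_type("list"))
```

Calling count_by_type() before and after a suspicious operation gives a crude indication of which object types keep accumulating.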
As we mentioned previously, objgraph allows you to create diagrams of memory usage patterns and cross-references that link all the objects in the given namespace. The most useful diagramming utilities of that library are objgraph.show_refs() and objgraph.show_backrefs(). They both accept a reference to the object being inspected and save a diagram image to a file using the Graphviz package. Examples of such graphs are presented in Figure 2 and Figure 3. Here is the code that was used to create these diagrams:
from collections import Counter
import objgraph


def graph_references(*objects):
    objgraph.show_refs(
        objects,
        filename='show_refs.png',
        refcounts=True,
        # additional filtering for the sake of brevity
        too_many=5,
        filter=lambda x: not isinstance(x, dict),
    )
    objgraph.show_backrefs(
        objects,
        filename='show_backrefs.png',
        refcounts=True
    )


if __name__ == "__main__":
    quote = """
    People who think they know everything are a
    great annoyance to those of us who do.
    """
    words = quote.lower().strip().split()
    counts = Counter(words)
    graph_references(words, quote, counts)
The following diagram shows all references held by the words, quote, and counts objects:
The following diagram shows only objects that hold references to the objects that we passed to the show_backrefs() function. They are called back references and are really helpful in finding objects that stop other objects from being deallocated:
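Back references can also be inspected without rendering a diagram by asking the gc module which containers refer directly to an object. The sketch below is an illustration only: the Cache class and the holders_of() helper are hypothetical names introduced here, and gc.get_referrers() reports only objects tracked by the garbage collector.

```python
import gc


class Cache:
    """Hypothetical holder that keeps objects alive by storing them."""

    def __init__(self):
        self.items = []

    def remember(self, obj):
        self.items.append(obj)


def holders_of(target):
    """Return the gc-tracked objects that refer directly to target."""
    return gc.get_referrers(target)


payload = {"name": "leaky"}
cache = Cache()
cache.remember(payload)

# The list stored in cache.items is one of the direct referrers of payload
referrers = holders_of(payload)
held_by_cache = any(ref is cache.items for ref in referrers)
print("referenced by cache.items:", held_by_cache)
```

Following such referrers one level at a time is a manual way of walking the same kind of chain that a back reference diagram presents visually.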
In order to show how objgraph may be used in practice, let's review an example of code that may create memory issues under certain versions of Python. As we already noted multiple times in this book, CPython has its own garbage collector that exists independently from its reference counting mechanism. It's not used for general purpose memory management, and its sole purpose is to solve the problem of cyclic references. In many situations, objects may reference each other in a way that would make it impossible to remove them using simple techniques based on tracking the number of references. Here is the simplest example:
x = []
y = [x]
x.append(y)
Such a situation is visually presented in the following diagram. In the preceding case, even if all external references to the x and y objects are removed (for instance, by returning from the local scope of a function), these two objects cannot be removed through reference counting because each of them still holds a reference to the other. This is the situation where the Python garbage collector steps in. It can detect cyclic references to objects and trigger their deallocation if there are no other valid references to these objects outside of the cycle.
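This behavior can be observed directly with the gc and weakref modules. The following is a minimal sketch, assuming CPython: the Node class and the make_cycle() function are hypothetical names, and Node exists only because plain list objects cannot be weakly referenced.

```python
import gc
import weakref


class Node(list):
    """list subclass; unlike plain lists, it supports weak references."""


def make_cycle():
    x = Node()
    y = Node([x])
    x.append(y)
    # A weak reference lets us observe x without keeping it alive
    return weakref.ref(x)


gc.disable()  # keep the automatic cyclic collector out of the way
try:
    probe = make_cycle()
    # The local names x and y are gone, but the cycle keeps both objects alive
    alive_before = probe() is not None
    # An explicit collection detects the unreachable cycle and frees it
    found = gc.collect()
    alive_after = probe() is not None
finally:
    gc.enable()

print("alive before gc.collect():", alive_before)
print("alive after gc.collect():", alive_after)
```

The value returned by gc.collect() is the number of unreachable objects it found, so found also accounts for the two Node instances.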
The real problem starts when at least one of the objects in such a cycle defines a custom __del__() method. It is a custom deallocation handler that is called when the object's reference count finally goes to zero. It can execute arbitrary Python code and thus can also create new references to the objects involved. This is the reason why the garbage collector prior to Python 3.4 could not break reference cycles if at least one of the objects provided a custom __del__() method implementation. PEP 442 introduced safe object finalization, which became part of the language standard starting from Python 3.4. Still, this may be a problem for packages that care about backwards compatibility and target a wide spectrum of Python interpreter versions. The following snippet of code shows the difference in behavior of the cyclic garbage collector in different Python versions:
import gc
import platform
import objgraph


class WithDel(list):
    """ list subclass with custom __del__ implementation """
    def __del__(self):
        pass


def main():
    x = WithDel()
    y = []
    z = []
    x.append(y)
    y.append(z)
    z.append(x)
    del x, y, z
    print("unreachable prior collection: %s" % gc.collect())
    print("unreachable after collection: %s" % len(gc.garbage))
    print("WithDel objects count: %s" % objgraph.count('WithDel'))


if __name__ == "__main__":
    print("Python version: %s" % platform.python_version())
    print()
    main()
The following output of the preceding code, when executed under Python 3.3, shows that the cyclic garbage collector in the older versions of Python cannot collect objects that have the __del__() method defined:
$ python3.3 with_del.py
Python version: 3.3.5

unreachable prior collection: 3
unreachable after collection: 1
WithDel objects count: 1
With a newer version of Python, the garbage collector can safely deal with the finalization of objects, even if they have the __del__() method defined, as follows:
$ python3.5 with_del.py
Python version: 3.5.1

unreachable prior collection: 3
unreachable after collection: 0
WithDel objects count: 0
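On interpreters that implement PEP 442 (Python 3.4 and newer), the same result can be verified with the standard library only. The following sketch is an assumption-laden variant of the preceding script rather than a replacement for it: the finalized list and the build_cycle() function are names introduced purely for illustration, and the __del__() method here records its call so that we can confirm the finalizer really ran.

```python
import gc
import weakref

# Records finalizer calls so that we can confirm __del__() was executed
finalized = []


class WithDel(list):
    """list subclass with a custom __del__ implementation"""

    def __del__(self):
        finalized.append("WithDel")


def build_cycle():
    x = WithDel()
    y = []
    z = []
    x.append(y)
    y.append(z)
    z.append(x)
    # Only a weak reference leaves this function; the cycle is unreachable
    return weakref.ref(x)


probe = build_cycle()
gc.collect()

collected = probe() is None
uncollectable = [obj for obj in gc.garbage if isinstance(obj, WithDel)]

print("WithDel instance collected:", collected)
print("finalizer calls:", finalized)
print("WithDel objects in gc.garbage:", len(uncollectable))
```

Under Python 3.3 and older, the WithDel instance would be expected to end up in gc.garbage instead, which matches the output shown earlier.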
Although custom finalization is no longer a memory threat in the latest Python releases, it still poses a problem for applications that need to work under different environments. As we mentioned earlier, the objgraph.show_refs() and objgraph.show_backrefs() functions allow you to easily spot problematic objects that take part in unbreakable reference cycles. For instance, we can easily modify the main() function to show all back references to the WithDel instances in order to see if we have leaking resources, as follows:
def main():
    x = WithDel()
    y = []
    z = []
    x.append(y)
    y.append(z)
    z.append(x)
    del x, y, z
    print("unreachable prior collection: %s" % gc.collect())
    print("unreachable after collection: %s" % len(gc.garbage))
    print("WithDel objects count: %s" % objgraph.count('WithDel'))
    objgraph.show_backrefs(
        objgraph.by_type('WithDel'),
        filename='after-gc.png'
    )
Running the preceding example under Python 3.3 produces a diagram showing that gc.collect() could not remove the x, y, and z object instances.
Additionally, objgraph highlights all the objects that have the custom __del__() method defined in red to make spotting such issues easier:
In the next section, we will discuss C code memory leaks.