objgraph

objgraph is a simple module for creating diagrams of object references, which is useful when hunting memory leaks in Python. It is available on PyPI, but it is not a completely standalone tool: it requires Graphviz to create memory usage diagrams. On developer-friendly systems such as macOS or Linux, you can easily install Graphviz with your preferred system package manager (for example, brew on macOS or apt-get on Debian/Ubuntu). On Windows, you need to download the Graphviz installer from the project page (http://www.graphviz.org/) and install it manually.

objgraph provides multiple utilities that allow you to list and print various statistics about memory usage and object counts. The following transcript of an interpreter session shows some of these utilities in use:

>>> import objgraph
>>> objgraph.show_most_common_types()
function                   1910
dict                       1003
wrapper_descriptor         989
tuple                      837
weakref                    742
method_descriptor          683
builtin_function_or_method 666
getset_descriptor          338
set                        323
member_descriptor          305
>>> objgraph.count('list')
266
>>> objgraph.typestats(objgraph.get_leaking_objects())
{'Gt': 1, 'AugLoad': 1, 'GtE': 1, 'Pow': 1, 'tuple': 2, 'AugStore': 1, 'Store': 1, 'Or': 1, 'IsNot': 1, 'RecursionError': 1, 'Div': 1, 'LShift': 1, 'Mod': 1, 'Add': 1, 'Invert': 1, 'weakref': 1, 'Not': 1, 'Sub': 1, 'In': 1, 'NotIn': 1, 'Load': 1, 'NotEq': 1, 'BitAnd': 1, 'FloorDiv': 1, 'Is': 1, 'RShift': 1, 'MatMult': 1, 'Eq': 1, 'Lt': 1, 'dict': 341, 'list': 7, 'Param': 1, 'USub': 1, 'BitOr': 1, 'BitXor': 1, 'And': 1, 'Del': 1, 'UAdd': 1, 'Mult': 1, 'LtE': 1}
Note that the object counts reported by objgraph are high even in a fresh interpreter, because many Python built-in functions and types are ordinary Python objects that live in the same process memory. Also, objgraph itself creates some objects that are included in this summary.
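To get a feel for what count() and typestats() compute, the following sketch approximates them with only the standard gc module. It is a minimal illustration that assumes counting the objects tracked by the cyclic garbage collector, grouped by type name, is close enough; count_by_type() and type_stats() are hypothetical names, and objgraph's real implementation differs in details. In particular, objects that the collector never tracks, such as ints and strings, will not be counted by this sketch.

```python
import gc
from collections import Counter


def count_by_type(typename):
    """Count gc-tracked objects whose type name equals typename."""
    return sum(1 for obj in gc.get_objects() if type(obj).__name__ == typename)


def type_stats(objects=None):
    """Map type names to instance counts for the given (or all tracked) objects."""
    if objects is None:
        objects = gc.get_objects()
    return dict(Counter(type(obj).__name__ for obj in objects))


if __name__ == "__main__":
    print(count_by_type("list"))
    # the five most common tracked types, similar in spirit to
    # objgraph.show_most_common_types()
    stats = type_stats()
    print(sorted(stats.items(), key=lambda kv: kv[1], reverse=True)[:5])
```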

As we mentioned previously, objgraph allows you to create diagrams of memory usage patterns and cross-references that link all the objects in the given namespace. The most useful diagramming utilities of the library are objgraph.show_refs() and objgraph.show_backrefs(). Both accept a reference to the object being inspected and save a diagram image to a file using Graphviz. Examples of such graphs are presented in Figure 2 and Figure 3. Here is the code that was used to create these diagrams:

from collections import Counter
import objgraph


def graph_references(*objects):
    objgraph.show_refs(
        objects,
        filename='show_refs.png',
        refcounts=True,
        # additional filtering for the sake of brevity
        too_many=5,
        filter=lambda x: not isinstance(x, dict),
    )
    objgraph.show_backrefs(
        objects,
        filename='show_backrefs.png',
        refcounts=True
    )


if __name__ == "__main__":
    quote = """
    People who think they know everything are a
    great annoyance to those of us who do.
    """
    words = quote.lower().strip().split()
    counts = Counter(words)
    graph_references(words, quote, counts)

The following diagram shows all references held by the words, quote, and counts objects:

Figure 2: An example result of the show_refs() diagram from the graph_references() function

The following diagram shows only the objects that hold references to the objects we passed to the show_backrefs() function. These are called back references, and they are really helpful for finding objects that prevent other objects from being deallocated:

Figure 3: An example result of the show_backrefs() diagram from the graph_references() function
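Back references can also be queried without drawing anything. The sketch below uses the standard gc.get_referrers() function to list the objects that directly refer to a given object, which is the kind of information show_backrefs() visualizes. The Resource class and the find_referrers() and demo() functions are hypothetical. Keep in mind that gc.get_referrers() only sees containers tracked by the cyclic garbage collector, so the result may be incomplete, and the Python documentation recommends it for debugging purposes only.

```python
import gc
import types


class Resource:
    """Hypothetical object that we suspect is being kept alive."""


def find_referrers(target):
    """Return gc-tracked objects that directly refer to target.

    Frame objects are skipped, because the caller's own frame often
    shows up as a referrer and is rarely interesting.
    """
    return [ref for ref in gc.get_referrers(target)
            if not isinstance(ref, types.FrameType)]


def demo():
    leaked = Resource()
    cache = {"key": leaked}   # hypothetical forgotten cache entry
    registry = [leaked]       # hypothetical registry that is never cleaned up
    for ref in find_referrers(leaked):
        print(type(ref).__name__, ref)
    return cache, registry


if __name__ == "__main__":
    demo()
```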
A basic installation of the objgraph package does not install the Graphviz software that is required to generate diagrams in bitmap form. Without Graphviz, objgraph outputs diagrams in DOT format, a special graph description language. Graphviz is a very popular piece of software that is often found in operating system package repositories. You can also download it from https://www.graphviz.org/.
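DOT is plain text, so it is easy to see what a show_refs()-style tool produces before any rendering happens. The following sketch walks the references reported by the standard gc.get_referents() function and emits a DOT graph. The refs_to_dot() function and its max_depth parameter are hypothetical, and this is not how objgraph itself is implemented; objgraph's real output is richer and supports options such as the filter, too_many, and refcounts arguments used earlier.

```python
import gc


def refs_to_dot(root, max_depth=2):
    """Return a DOT graph of the objects reachable from root.

    Edges follow gc.get_referents(), i.e. the references reported by an
    object's C-level traverse function. Nodes at max_depth are drawn
    but not expanded further.
    """
    lines = ["digraph refs {"]
    seen = {id(root)}
    stack = [(root, 0)]
    while stack:
        obj, depth = stack.pop()
        lines.append('  n%d [label="%s"];' % (id(obj), type(obj).__name__))
        if depth >= max_depth:
            continue
        for child in gc.get_referents(obj):
            lines.append("  n%d -> n%d;" % (id(obj), id(child)))
            if id(child) not in seen:
                seen.add(id(child))
                stack.append((child, depth + 1))
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(refs_to_dot(["spam", ["eggs"]]))
```

If Graphviz is installed, saving this output to refs.dot and running dot -Tpng refs.dot -o refs.png renders it as an image.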

In order to show how objgraph may be used in practice, let's review an example of code that may create memory issues under certain versions of Python. As we have already noted several times in this book, CPython has its own garbage collector that exists independently of its reference counting mechanism. It's not used for general-purpose memory management; its sole purpose is to solve the problem of cyclic references. In many situations, objects may reference each other in a way that makes it impossible to remove them using simple techniques based on tracking the number of references. Here is the simplest example:

x = []
y = [x]
x.append(y)

This situation is shown in the following diagram. Even if all external references to the x and y objects are removed (for instance, by returning from the local scope of a function), these two objects cannot be freed through reference counting, because each of them is still referenced by the other. This is where the Python garbage collector steps in: it can detect cyclic references and trigger the deallocation of the objects involved if there are no other valid references to them from outside the cycle:

Figure 4: An example diagram of cyclic references between two objects
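The following sketch makes this behavior observable with the standard gc and weakref modules. It builds a two-object cycle equivalent to the x and y lists above, removes all external references, and checks through a weak reference whether the objects are still alive before and after an explicit gc.collect() call. Because plain lists cannot be weakly referenced, it uses a small hypothetical Node class instead, and it temporarily disables automatic collection so that only the explicit call can free the cycle.

```python
import gc
import weakref


class Node:
    """Hypothetical stand-in for the lists above (lists can't be weakly referenced)."""

    def __init__(self):
        self.other = None


def cycle_survives_refcounting():
    """Return (alive_after_del, alive_after_collect) for a two-node cycle."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep the automatic collector out of the experiment
    try:
        x, y = Node(), Node()
        x.other, y.other = y, x      # x -> y -> x, like x.append(y) above
        probe = weakref.ref(x)       # observe x without keeping it alive
        del x, y                     # drop all external references
        alive_after_del = probe() is not None
        gc.collect()                 # the cycle detector finds and frees the pair
        alive_after_collect = probe() is not None
        return alive_after_del, alive_after_collect
    finally:
        if was_enabled:
            gc.enable()


if __name__ == "__main__":
    print(cycle_survives_refcounting())  # on CPython: (True, False)
```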

The real problem starts when at least one of the objects in such a cycle defines a custom __del__() method. This is a custom deallocation handler that is called when the object's reference count finally drops to zero. It can execute arbitrary Python code and thus can also create new references to the objects involved. This is why the garbage collector in Python versions prior to 3.4 could not break reference cycles if at least one of the objects provided a custom __del__() implementation. PEP 442 introduced safe object finalization to Python, and it became part of CPython starting from Python 3.4. However, this may still be a problem for packages that care about backward compatibility and target a wide range of Python interpreter versions. The following snippet of code shows the difference in behavior of the cyclic garbage collector between Python versions:
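The claim that __del__() can create new references to the object being finalized is easy to demonstrate. In the following sketch, a hypothetical Phoenix class stores self in a global list from inside __del__(), which "resurrects" the instance. On CPython 3.4 and later, the finalizer of a resurrected object is not called a second time when the object is finally freed; the language reference describes this as implementation-dependent, so treat it as CPython behavior rather than a guarantee.

```python
import gc

resurrected = []  # hypothetical global container the finalizer leaks into
del_calls = 0     # how many times Phoenix.__del__ has run


class Phoenix:
    """Hypothetical class whose finalizer resurrects the instance."""

    def __del__(self):
        global del_calls
        del_calls += 1
        resurrected.append(self)  # new strong reference: the object survives


def resurrection_demo():
    """Return (survived_first_del, total_finalizer_calls) for one instance."""
    global del_calls
    del_calls = 0
    resurrected.clear()
    obj = Phoenix()
    del obj               # refcount drops to zero, so __del__ runs
    survived = len(resurrected) == 1
    resurrected.clear()   # drop the resurrected reference; the object is freed
    gc.collect()          # CPython does not call __del__ on it a second time
    return survived, del_calls


if __name__ == "__main__":
    print(resurrection_demo())  # on CPython 3.4+: (True, 1)
```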

import gc
import platform
import objgraph


class WithDel(list):
    """ list subclass with custom __del__ implementation """
    def __del__(self):
        pass


def main():
    x = WithDel()
    y = []
    z = []

    x.append(y)
    y.append(z)
    z.append(x)

    del x, y, z

    print("unreachable prior collection: %s" % gc.collect())
    print("unreachable after collection: %s" % len(gc.garbage))
    print("WithDel objects count:        %s" %
          objgraph.count('WithDel'))


if __name__ == "__main__":
    print("Python version: %s" % platform.python_version())
    print()
    main()

When the preceding code is executed under Python 3.3, the following output shows that the cyclic garbage collector in older versions of Python cannot collect objects that define the __del__() method:

$ python3.3 with_del.py
Python version: 3.3.5

unreachable prior collection: 3
unreachable after collection: 1
WithDel objects count:        1

With a newer version of Python, the garbage collector can safely deal with the finalization of objects, even if they have the __del__() method defined, as follows:

$ python3.5 with_del.py 
Python version: 3.5.1

unreachable prior collection: 3
unreachable after collection: 0
WithDel objects count:        0

Although custom finalization is no longer a memory threat in the latest Python releases, it still poses a problem for applications that need to work under different environments. As we mentioned earlier, the objgraph.show_refs() and objgraph.show_backrefs() functions allow you to easily spot problematic objects that take part in unbreakable reference cycles. For instance, we can easily modify the main() function to show all back references to the WithDel instances in order to see if we have leaking resources, as follows:

def main():
    x = WithDel()
    y = []
    z = []

    x.append(y)
    y.append(z)
    z.append(x)

    del x, y, z

    print("unreachable prior collection: %s" % gc.collect())
    print("unreachable after collection: %s" % len(gc.garbage))
    print("WithDel objects count:        %s" %
          objgraph.count('WithDel'))

    objgraph.show_backrefs(
        objgraph.by_type('WithDel'),
        filename='after-gc.png'
    )

Running the preceding example under Python 3.3 will produce a diagram showing that gc.collect() could not remove the x, y, and z object instances.

Additionally, objgraph highlights in red all objects that define a custom __del__() method, which makes spotting such issues easier:

Figure 5: A diagram showing an example of cyclic references that can't be picked up by the Python garbage collector prior to version 3.4
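As a complementary check that does not require Graphviz, the standard gc.DEBUG_SAVEALL flag makes the collector append every unreachable object to gc.garbage instead of freeing it, so you can look for the same kind of objects that objgraph draws in red: those whose type defines __del__(). The function name below is hypothetical, and this is a debugging sketch rather than a production tool. Note that on Python 3.4 and later the finalizers of these objects may already have run by the time they are saved.

```python
import gc


def unreachable_objects_with_del():
    """Run one full collection with gc.DEBUG_SAVEALL enabled and return the
    unreachable objects whose type defines __del__().
    """
    old_flags = gc.get_debug()
    start = len(gc.garbage)          # leave pre-existing entries alone
    gc.set_debug(old_flags | gc.DEBUG_SAVEALL)
    try:
        gc.collect()                 # unreachable objects land in gc.garbage
        saved = gc.garbage[start:]
    finally:
        gc.set_debug(old_flags)      # restore the previous debug flags
        del gc.garbage[start:]       # stop gc.garbage from pinning them
    return [obj for obj in saved if hasattr(type(obj), "__del__")]
```

Calling this right after the cycle from the WithDel example has been dropped should list the WithDel instance, regardless of which Python version is running.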

In the next section, we will discuss C code memory leaks.