Reference counting

Finally, we come to the important topic of memory management in Python. Python has its own garbage collector, but it is designed only to reclaim groups of objects that reference each other in a cycle, which reference counting alone cannot free. Reference counting is the primary method of managing the deallocation of objects that are no longer needed.

The Python/C API documentation introduces ownership of references to explain how it deals with the deallocation of objects. Objects in Python are never owned; they are always shared. The actual creation of objects is managed by Python's memory manager, the component of the CPython interpreter that is solely responsible for allocating and deallocating memory for objects stored in a private heap. What can be owned instead is a reference to an object.

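Every object in Python that is represented by a reference (a PyObject* pointer) has an associated reference count. When the count drops to zero, no one holds a valid reference to that object and the deallocator associated with its type can be called. The Python/C API provides two macros for increasing and decreasing reference counts: Py_INCREF() and Py_DECREF(). But before we discuss their details, we need to understand the following terms related to reference ownership: passing of ownership (a function that returns an owned reference makes the caller responsible for eventually calling Py_DECREF() on it), borrowed references (the receiver may use the object but does not own the reference and must not call Py_DECREF() on it), and stolen references (a function takes over ownership of a reference passed to it, so the caller no longer owns it).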
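The counting rule described above can be traced without the interpreter. The following is a minimal sketch of a toy reference-counted object in plain C. It is not how CPython implements Py_INCREF() and Py_DECREF(); the names ToyObject, toy_new(), toy_incref(), toy_decref(), and toy_live_objects are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy object; a stand-in for illustration, not PyObject. */
typedef struct ToyObject {
    long refcount;                        /* number of owned references */
    void (*dealloc)(struct ToyObject *);  /* called when refcount hits 0 */
    int value;                            /* payload */
} ToyObject;

/* Counts objects that are currently allocated, so the effect is observable. */
int toy_live_objects = 0;

void toy_dealloc(ToyObject *obj)
{
    toy_live_objects--;
    free(obj);
}

/* Returns a new object; the caller owns the single initial reference. */
ToyObject *toy_new(int value)
{
    ToyObject *obj = malloc(sizeof *obj);
    if (obj == NULL)
        return NULL;
    obj->refcount = 1;
    obj->dealloc = toy_dealloc;
    obj->value = value;
    toy_live_objects++;
    return obj;
}

/* Take an additional owned reference. */
void toy_incref(ToyObject *obj)
{
    obj->refcount++;
}

/* Give up one owned reference; the last release runs the deallocator. */
void toy_decref(ToyObject *obj)
{
    assert(obj != NULL && obj->refcount > 0);
    if (--obj->refcount == 0)
        obj->dealloc(obj);
}
```

In this model, an object created with toy_new() stays alive until every toy_incref() and the initial reference have been matched by a toy_decref() call. After the final toy_decref(), the pointer must not be used again.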

Keeping an eye on reference counts is one of the hardest parts of writing complex extensions. Some non-obvious issues may go unnoticed until the code is run in a multithreaded setup.

The other common problem is caused by the very nature of Python's object model and the fact that some functions return borrowed references. When the reference count goes to 0, the deallocation function is executed. For user-defined classes, it is possible to define a __del__() method that will be called at that moment. That method can run arbitrary Python code, which may in turn affect other objects and their reference counts. The official Python documentation gives the following example of code that may be affected by this problem:

void bug(PyObject *list) { 
    PyObject *item = PyList_GetItem(list, 0); 
 
    PyList_SetItem(list, 1, PyLong_FromLong(0L)); 
    PyObject_Print(item, stdout, 0); /* BUG! */ 
}

It looks completely harmless, but the problem is that we cannot know what elements the list object contains. When PyList_SetItem() sets a new value at the list[1] index, the list releases its reference to the object that was previously stored at that index. If that was the only existing reference, the reference count becomes 0 and the object is deallocated. The object may be an instance of a user-defined class with a custom implementation of the __del__() method. A serious issue occurs if executing that __del__() method removes list[0] from the list. Note that PyList_GetItem() returns a borrowed reference! It does not call Py_INCREF() before returning the reference. So in that code, it is possible that PyObject_Print() will be called with a reference to an object that no longer exists. This typically causes a segmentation fault and crashes the Python interpreter.

The proper approach is to protect borrowed references for the whole time that we need them, because any call made in the meantime, even a seemingly unrelated one, may cause deallocation of that object. The corrected version of the function takes its own reference before the call and releases it afterward:

void no_bug(PyObject *list) { 
    PyObject *item = PyList_GetItem(list, 0); 
 
    Py_INCREF(item); 
    PyList_SetItem(list, 1, PyLong_FromLong(0L)); 
    PyObject_Print(item, stdout, 0); 
    Py_DECREF(item); 
}
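The failure scenario and its fix can also be stepped through with a toy model that needs no interpreter. The sketch below is an assumption-laden illustration, not CPython code: Obj, List, obj_new(), obj_incref(), obj_decref(), list_get_item(), list_set_item(), and read_first_protected() are hypothetical names, and the clear_on_free field stands in for a __del__() method that removes list[0].

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy types; none of this is CPython's real implementation. */
typedef struct Obj Obj;
typedef struct List List;

struct Obj {
    long refcount;        /* number of owned references */
    int value;            /* payload */
    int *freed_flag;      /* optional: set to 1 on deallocation */
    List *clear_on_free;  /* optional: models a __del__() removing list[0] */
};

struct List {
    Obj *items[2];        /* each non-NULL slot owns one reference */
};

void obj_decref(Obj *obj);

/* Returns a new object; the caller owns the single initial reference. */
Obj *obj_new(int value)
{
    Obj *obj = malloc(sizeof *obj);
    if (obj == NULL)
        return NULL;
    obj->refcount = 1;
    obj->value = value;
    obj->freed_flag = NULL;
    obj->clear_on_free = NULL;
    return obj;
}

void obj_incref(Obj *obj) { obj->refcount++; }

/* Returns a borrowed reference: the count is NOT incremented. */
Obj *list_get_item(List *list, size_t i) { return list->items[i]; }

/* Steals the reference to item and releases the previous occupant. */
void list_set_item(List *list, size_t i, Obj *item)
{
    Obj *old = list->items[i];
    list->items[i] = item;
    if (old != NULL)
        obj_decref(old);
}

/* Drops one reference; the last release runs the "finalizer" and frees. */
void obj_decref(Obj *obj)
{
    assert(obj->refcount > 0);
    if (--obj->refcount > 0)
        return;
    if (obj->clear_on_free != NULL)
        list_set_item(obj->clear_on_free, 0, NULL);  /* side effect */
    if (obj->freed_flag != NULL)
        *obj->freed_flag = 1;
    free(obj);
}

/* Mirrors no_bug(): protect the borrowed reference while it is in use. */
int read_first_protected(List *list)
{
    Obj *item = list_get_item(list, 0);  /* borrowed */
    int value;

    obj_incref(item);                    /* now we own a reference */
    list_set_item(list, 1, obj_new(0));  /* may trigger the finalizer */
    value = item->value;                 /* safe: item is still alive */
    obj_decref(item);                    /* release our reference */
    return value;
}
```

If the object in slot 1 has clear_on_free pointing at the list, replacing it removes slot 0 as a side effect. Because read_first_protected() holds its own reference, the first object survives until the final obj_decref(); without the obj_incref() and obj_decref() pair, the read of item->value would touch freed memory.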

In the next section, we will learn how to write extensions using Cython instead of the bare Python/C API.