Finally, we come to the important topic of memory management in Python. Python has its own garbage collector, but it is designed only to solve the issue of cyclic references in the reference counting algorithm. Reference counting is the primary method of managing the deallocation of objects that are no longer needed.
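The division of labor between reference counting and the garbage collector can be observed from pure Python. The following is a minimal sketch that assumes CPython; the Node class and both function names are hypothetical and exist only for this illustration. It uses a weak reference as a probe, because a weak reference does not keep its target alive.

```python
import gc
import weakref


class Node:
    """Minimal object that can point at another Node."""

    def __init__(self):
        self.other = None


def freed_by_refcount():
    """Return True if dropping the only reference frees the object at once."""
    node = Node()
    probe = weakref.ref(node)  # a weak reference does not keep node alive
    del node                   # the reference count reaches zero here
    return probe() is None


def freed_only_by_gc():
    """Return (alive after del, gone after gc.collect()) for a two-node cycle."""
    was_enabled = gc.isenabled()
    gc.disable()               # keep an automatic collection from interfering
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a    # a and b now reference each other
        probe = weakref.ref(a)
        del a, b                   # counts stay above zero because of the cycle
        alive_after_del = probe() is not None
        gc.collect()               # the cycle detector reclaims both nodes
        return alive_after_del, probe() is None
    finally:
        if was_enabled:
            gc.enable()
```

In the first function the object disappears as soon as its last reference is dropped. In the second, the two nodes keep each other alive until the garbage collector runs.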
The Python/C API documentation introduces ownership of references to explain how it deals with the deallocation of objects. Objects in Python are never owned; they are always shared. The actual creation of objects is managed by Python's memory manager. This component of the CPython interpreter is solely responsible for allocating and deallocating memory for objects, which are stored in a private heap. What can be owned instead is a reference to the object.
Every object in Python that is represented by a reference (a PyObject* pointer) has an associated reference count. When the count drops to zero, no one holds a valid reference to that object anymore and the deallocator associated with its type can be called. The Python/C API provides two macros for increasing and decreasing reference counts: Py_INCREF() and Py_DECREF(). But before we discuss their details, we need to understand the following terms related to reference ownership:
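Py_INCREF() and Py_DECREF() are only available from C, but their effect is visible from Python through sys.getrefcount(). The sketch below assumes CPython and uses a hypothetical helper named refcount_deltas(). It compares counts against a baseline instead of printing absolute values, because the absolute number reported by sys.getrefcount() includes temporary references and differs between interpreter versions.

```python
import sys


def refcount_deltas():
    """Return how the reference count of an object changes relative to a baseline."""
    obj = object()
    base = sys.getrefcount(obj)       # baseline; the absolute value is version dependent
    holders = [obj, obj]              # the list stores two additional references
    after_add = sys.getrefcount(obj) - base
    holders.clear()                   # the list releases both references again
    after_clear = sys.getrefcount(obj) - base
    return after_add, after_clear
```

Storing the object twice in a list raises the count by two, and clearing the list brings it back to the baseline.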
- Passing of ownership: Whenever we say that a function passes the ownership of a reference, it means that the function has already increased the reference count and it is the responsibility of the caller to decrease the count when the reference to the object is no longer needed. Most functions that return newly created objects, such as Py_BuildValue(), do that. If that object is going to be returned from our function to another caller, then the ownership is passed again. We do not decrease the reference count in that case because it is no longer our responsibility. This is why the fibonacci_py() function does not call Py_DECREF() on the result variable.
- Borrowed references: Borrowing happens when a function receives a reference to some Python object as an argument. The reference count for such a reference should never be decreased in that function unless it was explicitly increased in its scope. In our fibonacci_py() function, the self and args arguments are such borrowed references, and thus we do not call Py_DECREF() on them. Some Python/C API functions may also return borrowed references. Notable examples are PyTuple_GetItem() and PyList_GetItem(). It is often said that such references are unprotected. Because we do not own them, there is nothing to dispose of, so Py_DECREF() must not be called on them. The exception is when a borrowed reference is going to be returned as our function's return value: it must first be turned into an owned reference with Py_INCREF(). Extra care should also be taken when borrowed references are used as arguments of other Python/C API calls. In some circumstances it may be necessary to protect such a reference with a separate Py_INCREF() before passing it to other functions, and then to call Py_DECREF() when it is no longer needed.
- Stolen references: It is also possible for a Python/C API function to steal a reference instead of borrowing it when it is provided as a call argument. Few functions do this; the two notable examples are PyTuple_SetItem() and PyList_SetItem(). They fully take over responsibility for the reference passed to them. They do not increase the reference count themselves, but will call Py_DECREF() when the reference is no longer needed.
Keeping an eye on reference counts is one of the hardest things about writing complex extensions. Some of the non-obvious issues may not be noticed until the code is run in a multithreaded setup.
The other common problem is caused by the very nature of Python's object model and the fact that some functions return borrowed references. When the reference count goes to 0, the deallocation function is executed. For user-defined classes, it is possible to define a __del__() method that will be called at that moment. This can be any Python code and it is possible that it will affect other objects and their reference counts. The official Python documentation gives the following example of code that may be affected by this problem:
    void bug(PyObject *list)
    {
        PyObject *item = PyList_GetItem(list, 0);

        PyList_SetItem(list, 1, PyLong_FromLong(0L));
        PyObject_Print(item, stdout, 0); /* BUG! */
    }
It looks completely harmless, but the problem is that we cannot know what elements the list object contains. When PyList_SetItem() sets a new value at the list[1] index, the list gives up its reference to the object that was previously stored at that index. If that was the only existing reference, the reference count will become 0 and the object will be deallocated. It is possible that it was an instance of some user-defined class with a custom implementation of the __del__() method. A serious issue will occur if, as a result of that __del__() execution, the element at list[0] is removed from the list. Note that PyList_GetItem() returns a borrowed reference! It does not call Py_INCREF() before returning a reference. So in that code, it is possible that PyObject_Print() will be called with a reference to an object that no longer exists. This is undefined behavior and will typically cause a segmentation fault that crashes the Python interpreter.
The proper approach is to protect borrowed references for the whole time that we need them, because any call in between, even a seemingly unrelated one, may cause deallocation of that object. The corrected version of the function looks as follows:
    void no_bug(PyObject *list)
    {
        PyObject *item = PyList_GetItem(list, 0);

        Py_INCREF(item); /* protect the borrowed reference */
        PyList_SetItem(list, 1, PyLong_FromLong(0L));
        PyObject_Print(item, stdout, 0);
        Py_DECREF(item); /* release our own reference */
    }
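The side effect that makes the unprotected version dangerous can be reproduced from pure Python, where it is safe to observe. The following sketch assumes CPython; the Item and Evictor classes and the replace_second() function are hypothetical names made up for this illustration. A weak reference stands in for the borrowed pointer, and an ordinary local variable plays the role of Py_INCREF(), because binding a name always creates an owned reference.

```python
import weakref


class Item:
    """Stands in for the object stored at list[0]."""


class Evictor:
    """Object whose finalizer removes the first element of a list."""

    def __init__(self, target):
        self.target = target

    def __del__(self):
        # Arbitrary Python code runs here once the reference count hits zero.
        del self.target[0]


def replace_second(protect):
    """Return True if the first item is still alive after list[1] is replaced."""
    items = [Item()]
    items.append(Evictor(items))
    probe = weakref.ref(items[0])         # observes the item without owning it
    held = items[0] if protect else None  # a strong reference, like Py_INCREF()
    items[1] = 0                          # last reference to Evictor is dropped, __del__ runs
    return probe() is not None
```

Without the extra reference, the assignment to items[1] destroys the first item as a side effect. In Python the weak reference simply reports that the object is gone, whereas the C code would be left holding a dangling pointer.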
In the next section, we will learn how to write extensions using Cython instead of bare Python/C API.