A closer look at the Python/C API

Now that we know how to package, compile, and install a custom C extension, and we are sure that it works as expected, it is the right time to discuss our code in detail.

The extension module starts with a single C preprocessor directive that includes the Python.h header file:

#include <Python.h> 

This pulls in the whole Python/C API and is all you need to include to write your extensions. In more realistic cases, your code will require many more preprocessor directives to benefit from the C standard library functions or to integrate other source files. Our example is simple, so no more directives are required.

Next, we have the core of our module as follows:

long long fibonacci(unsigned int n) { 
    if (n < 2) { 
        return 1; 
    } else { 
        return fibonacci(n - 2) + fibonacci(n - 1); 
    } 
} 

The preceding fibonacci() function is the only part of our code that does something useful. It is a pure C implementation that Python cannot call directly. The rest of our example creates the interface layer that exposes it through the Python/C API.
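Because fibonacci() does not depend on anything from Python.h, it can be compiled and checked on its own before any interface code is written. The following is a minimal sketch of such a check; fill_fibonacci() is a hypothetical helper introduced only for illustration and is not part of the extension:

```c
#include <stddef.h>

/* Same recursive definition as in the extension. Note that under this
 * convention both fibonacci(0) and fibonacci(1) evaluate to 1. */
long long fibonacci(unsigned int n) {
    if (n < 2) {
        return 1;
    } else {
        return fibonacci(n - 2) + fibonacci(n - 1);
    }
}

/* Hypothetical helper (not part of the extension): stores
 * fibonacci(0) .. fibonacci(count - 1) in the caller-provided buffer. */
void fill_fibonacci(long long *out, size_t count) {
    for (size_t i = 0; i < count; i++) {
        out[i] = fibonacci((unsigned int)i);
    }
}
```

Called with a six-element buffer, fill_fibonacci() stores the values 1, 1, 2, 3, 5, 8.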

The first step of exposing this code to Python is the creation of a C function that is compatible with the CPython interpreter. In Python, everything is an object. This means that C functions called from Python also need to return real Python objects. The Python/C API provides the PyObject type, and every callable must return a pointer to it. The signature of our function is as follows:

static PyObject* fibonacci_py(PyObject* self, PyObject* args)

Note that the preceding signature does not specify the exact list of arguments. Instead, PyObject* args holds a pointer to the tuple of provided values. The actual validation of the argument list must be performed inside the function body, and this is exactly what fibonacci_py() does. It parses the args tuple, expecting a single integer that fits in a C long, casts that value to unsigned int, and passes it to the fibonacci() function to retrieve the Fibonacci sequence element, as shown in the following code:

static PyObject* fibonacci_py(PyObject* self, PyObject* args) { 
    PyObject *result = NULL; 
    long n; 
 
    if (PyArg_ParseTuple(args, "l", &n)) { 
        result = Py_BuildValue("L", fibonacci((unsigned int)n)); 
    } 
 
    return result; 
} 

The preceding example function has a serious bug, which an experienced developer should spot easily. Try to find it as an exercise in working with C extensions. For now, we leave it as it is for the sake of brevity. We will fix it later when discussing the details of dealing with errors in the Exception handling section.

The "l" string in the PyArg_ParseTuple(args, "l", &n) call means that we expect args to contain only a single long value. In case of failure, PyArg_ParseTuple() returns 0 (false) and stores information about the exception in the per-thread interpreter state. Our function then returns NULL, which tells the interpreter that an exception was raised. The details of exception handling will be described a bit later in the Exception handling section.

The actual signature of the parsing function is int PyArg_ParseTuple(PyObject *args, const char *format, ...), and what goes after the format string is a variable-length list of pointers that receive the parsed values. This is analogous to how the scanf() function from the C standard library works. If our assumption fails and the user provides an incompatible argument list, then PyArg_ParseTuple() will raise the proper exception. This is a very convenient way to encode function signatures once you get used to it, but it has a huge downside when compared to plain Python code. Python call signatures that are implicitly defined by PyArg_ParseTuple() calls cannot be easily inspected inside the Python interpreter. Keep this in mind when using code provided as extensions.
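The scanf() analogy can be illustrated with plain C and no Python headers. The sketch below uses sscanf(), where a format string describes the expected values and the outputs are passed as pointers, in the same spirit as PyArg_ParseTuple(). The helper name parse_count_and_label() is hypothetical. One difference worth noting is that sscanf() returns the number of converted items, while PyArg_ParseTuple() returns only true or false:

```c
#include <stdio.h>

/* Hypothetical helper: parses one long and one word from text.
 * The format string lists the expected values and the outputs are
 * passed as pointers. Returns the number of converted items. */
int parse_count_and_label(const char *text, long *count, char label[16]) {
    /* %15s limits the word to 15 characters plus the terminating NUL. */
    return sscanf(text, "%ld %15s", count, label);
}
```

For the input "42 items" both conversions succeed and the function returns 2. For an input such as "oops" the first conversion fails, nothing is stored, and the function returns 0.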

As already said, Python expects objects to be returned from callables. This means that we cannot return the raw long long value obtained from the fibonacci() function as the result of fibonacci_py(). Such an attempt would not compile cleanly, because C performs no automatic conversion of basic C types to Python objects. The Py_BuildValue(const char *format, ...) function must be used instead. It is the counterpart of PyArg_ParseTuple() and accepts a similar set of format strings. The main difference is that the list of arguments is not a function output but an input, so actual values must be provided instead of pointers.

After fibonacci_py() is defined, most of the heavy work is done. The last step is to perform module initialization and add metadata to our function that will make its usage a bit simpler for users. This is the boilerplate part of our extension code that, for simple examples such as this one, can take more space than the actual functions that we want to expose. In most cases, it simply consists of some static structures and one initialization function that will be executed by the interpreter on module import.

First, we create a static string that will be the content of the Python docstring for the fibonacci_py() function as follows:

static char fibonacci_docs[] = 
    "fibonacci(n): Return nth Fibonacci sequence number " 
    "computed recursively\n"; 

Note that this could be inlined somewhere later in fibonacci_module_methods, but it is good practice to keep docstrings separate and stored in close proximity to the actual function definition that they refer to.

The next part of our definition is the array of PyMethodDef structures that define the methods (functions) that will be available in our module. This structure contains exactly four fields: ml_name (const char*) is the name of the method, ml_meth (PyCFunction) is the pointer to the C implementation of the function, ml_flags (int) holds the flags that indicate the calling convention or binding convention, and ml_doc (const char*) is the pointer to the content of the method/function docstring.

Such an array must always end with a sentinel value of {NULL, NULL, 0, NULL}. This sentinel value simply indicates the end of the array. In our simple case, we created the static PyMethodDef fibonacci_module_methods[] array that contains only two elements (including the sentinel value) as follows:

static PyMethodDef fibonacci_module_methods[] = { 
    {"fibonacci", (PyCFunction)fibonacci_py, 
     METH_VARARGS, fibonacci_docs}, 
    {NULL, NULL, 0, NULL} 
}; 
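The role of the sentinel can be illustrated without Python.h. Since a C array passed as a pointer carries no length, code that consumes it has to walk the entries until it reaches the terminating one. The sketch below uses a hypothetical method_entry structure that only mimics the four-field shape of PyMethodDef. It is not the real structure, and count_methods() is not a Python/C API function:

```c
#include <stddef.h>

/* Hypothetical stand-in with the same four-slot shape as PyMethodDef;
 * it is NOT the real structure from Python.h. */
struct method_entry {
    const char *name;
    void (*function)(void);
    int flags;
    const char *doc;
};

static void noop(void) {}

static struct method_entry example_methods[] = {
    {"first", noop, 1, "First example entry"},
    {"second", noop, 1, "Second example entry"},
    {NULL, NULL, 0, NULL}  /* sentinel: marks the end of the array */
};

/* Walks the array until the sentinel (name == NULL) is reached and
 * returns the number of real entries before it. */
size_t count_methods(const struct method_entry *entries) {
    size_t count = 0;
    while (entries[count].name != NULL) {
        count++;
    }
    return count;
}
```

Here count_methods(example_methods) returns 2. Without the sentinel, the loop would read past the end of the array, which is undefined behavior in C.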

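The first entry of fibonacci_module_methods maps to the PyMethodDef structure as follows: ml_name is "fibonacci", ml_meth is (PyCFunction)fibonacci_py, ml_flags is METH_VARARGS, and ml_doc is fibonacci_docs.

When the array of function definitions is complete, we can create another structure that contains the definition of the whole module. It is described using the PyModuleDef type and contains multiple fields. Some of them are useful only for more complex scenarios, where fine-grained control over the module initialization process is required. Here, we are interested only in the first five of them: m_base (PyModuleDef_Base) should always be initialized with PyModuleDef_HEAD_INIT, m_name (const char*) is the name of the newly created module, m_doc (const char*) is the pointer to the docstring content for the module, m_size (Py_ssize_t) is the size of the memory allocated to keep the module state and can be set to -1 when the module keeps no per-interpreter state, and m_methods (PyMethodDef*) is the pointer to the array of module-level functions.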

The other fields are explained in detail in the official Python documentation (refer to https://docs.python.org/3/c-api/module.html) but are not needed in our example extension. They should be set to NULL if not required and they will be initialized with that value implicitly when not specified. This is why our module description contained in the fibonacci_module_definition variable can take the following simple form:

static struct PyModuleDef fibonacci_module_definition = { 
    PyModuleDef_HEAD_INIT, 
    "fibonacci", 
    "Extension module that provides fibonacci sequence function", 
    -1, 
    fibonacci_module_methods 
}; 
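The implicit initialization of the remaining fields is a property of C itself rather than of the Python/C API: members that are left out of an initializer list are zero-initialized, which for pointers means NULL. A minimal sketch with a hypothetical module_like structure (not the real PyModuleDef) shows the rule:

```c
#include <stddef.h>

/* Hypothetical structure with more fields than we initialize;
 * it only resembles PyModuleDef and is not the real type. */
struct module_like {
    const char *name;
    const char *doc;
    long size;
    void *methods;          /* omitted from the initializer below */
    void *slots;            /* omitted as well */
    void (*cleanup)(void);  /* omitted as well */
};

static struct module_like example_definition = {
    "example",
    "Only the first three fields are given explicitly",
    -1
};

/* Returns 1 when every omitted pointer field is NULL, 0 otherwise. */
int trailing_fields_are_null(const struct module_like *def) {
    return def->methods == NULL
        && def->slots == NULL
        && def->cleanup == NULL;
}
```

For example_definition the check returns 1, even though only the first three fields appear in the initializer.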

The last piece of code that crowns our work is the module initialization function. This must follow a very specific naming convention, so the Python interpreter can easily find it when the dynamic/shared library is loaded. It should be named PyInit_<name>, where <name> is the name of your module. So it is exactly the same string that was used as the m_name field in the PyModuleDef definition and as the first argument of the setuptools.Extension() call. If you don't require a complex initialization process for the module, it takes a very simple form, exactly like in our example:

PyMODINIT_FUNC PyInit_fibonacci(void) { 
    return PyModule_Create(&fibonacci_module_definition); 
} 

PyMODINIT_FUNC is a preprocessor macro that declares the return type of this initialization function as PyObject* and adds any special linkage declarations required by the platform.
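The naming convention means that the symbol the interpreter looks for can be derived from the module name alone. The following sketch builds that symbol name for a plain ASCII module name. The build_init_symbol() helper is hypothetical and greatly simplified; it only illustrates the PyInit_<name> pattern and is not how the import machinery is actually implemented:

```c
#include <stdio.h>

/* Hypothetical helper: writes "PyInit_<module_name>" into buffer.
 * Returns 0 on success and -1 when the result does not fit.
 * Assumes a plain ASCII module name without dots. */
int build_init_symbol(const char *module_name, char *buffer, size_t size) {
    int written = snprintf(buffer, size, "PyInit_%s", module_name);
    return (written >= 0 && (size_t)written < size) ? 0 : -1;
}
```

For the module name "fibonacci" the helper produces "PyInit_fibonacci", which matches the name of the initialization function defined in our extension.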

In the next section, we will look at calling and binding conventions.