The canonical way of defining function complexity is big O notation. This metric describes how an algorithm's running time grows with the size of its input data. For instance, does the algorithm scale linearly with the size of the input, or quadratically?
Manually estimating the big O complexity of an algorithm is the best approach when you want an overview of how its performance relates to the size of the input data. Knowing the complexity of your application's components gives you the ability to detect and focus on the aspects that will significantly slow down your code.
To determine the big O complexity of an algorithm, you remove all constants and low-order terms in order to focus on the portion that really matters as the size of the input data grows. The idea is to categorize the algorithm into one of the following classes, even if only as an approximation:
| Notation   | Type                                         |
|------------|----------------------------------------------|
| O(1)       | Constant; does not depend on the input data  |
| O(n)       | Linear; will grow as n grows                 |
| O(n log n) | Quasi-linear                                 |
| O(n²)      | Quadratic complexity                         |
| O(n³)      | Cubic complexity                             |
| O(n!)      | Factorial complexity                         |
For instance, we already know from Chapter 3, Modern Syntax Elements - Below the Class Level, that a dict lookup has an average complexity of O(1); it is considered constant, regardless of how many elements are in the dict. However, looking through a list for a particular item is O(n).
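We can observe this difference directly with the timeit module from the standard library. The following REPL sketch is only illustrative: the container names and sizes are arbitrary, and the measured times (omitted here) will vary from machine to machine. It compares a membership test against a list with the same test against a dict:

>>> from timeit import timeit
>>> n = 10_000
>>> haystack_list = list(range(n))
>>> haystack_dict = dict.fromkeys(range(n))
>>> timeit('n - 1 in haystack_list', globals=globals(), number=1_000)  # O(n) scan
>>> timeit('n - 1 in haystack_dict', globals=globals(), number=1_000)  # O(1) hash lookup

On a typical run, the dict lookup is faster by orders of magnitude, and the gap widens as n grows.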
To better understand this concept, let's take a look at the following example:
>>> def function(n):
...     for i in range(n):
...         print(i)
...
In the preceding function, the print() call is executed n times. The speed of the loop depends on n, so its complexity expressed in big O notation is O(n).
If the function contains conditional branches, the correct notation to keep is that of the costliest branch, as follows:
>>> def function(n, print_count=False):
...     if print_count:
...         print(f'count: {n}')
...     else:
...         for i in range(n):
...             print(i)
...
In this example, the function can be O(1) or O(n), depending on the value of the print_count argument. The worst case is O(n), so the complexity of the whole function is O(n).
When discussing complexity expressed in big O notation, we usually consider the worst-case scenario. While this is the best method for defining complexity when comparing two independent algorithms, it may not be the best approach in every practical situation. Many algorithms have runtime performance that depends on the statistical characteristics of the input data, or amortize the cost of worst-case operations with clever tricks. This is why, in many cases, it may be better to review your implementation in terms of average complexity or amortized complexity.
For example, take a look at the operation of appending a single element to an instance of Python's list type. We know that list in CPython uses an array with overallocation for its internal storage instead of a linked list (see Chapter 3, Modern Syntax Elements - Below the Class Level, for more information). If the array is already full, appending a new element requires allocating a new array and copying all existing elements (references) to a new area of memory. From the point of view of worst-case complexity, it is clear that the list.append() method has O(n) complexity, which is a bit expensive compared to a typical linked list implementation. We also know, however, that the CPython list type uses overallocation (it allocates more space than is required at a given time) precisely to mitigate the cost of these occasional reallocations. If we evaluate the complexity over a sequence of operations, we see that the average (amortized) complexity of list.append() is O(1), which is actually a great result.
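You can see the overallocation in action by watching how much memory CPython reserves for a list as it grows. The following REPL sketch is only illustrative; the exact byte counts reported by sys.getsizeof() depend on the interpreter version and platform, so the output is omitted. On CPython, the reported size stays flat for several appends and then jumps, which is the occasional reallocation being paid for in advance:

>>> import sys
>>> items = []
>>> for i in range(20):
...     items.append(i)
...     # print the current length and the bytes reserved for the list object
...     print(len(items), sys.getsizeof(items))
...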
When solving problems, we often already know a lot about our input data, including its size or statistical distribution. When optimizing an application, it is always worth using every bit of available knowledge about the input data. This is where another problem with worst-case complexity starts to show up. The big O notation is intended to analyze the limiting behavior of a function as the input tends toward large values or infinity, rather than to offer a reliable performance approximation for real-life data. Asymptotic notation is a great tool for describing the growth rate of a function, but it won't give a direct answer to the simple question of which implementation will take the least time. Worst-case complexity discards the details of both the implementation and its data characteristics in order to show you how your program will behave asymptotically. It works for arbitrarily large inputs that you may not even need to consider.
For instance, let's assume that you have a problem with data consisting of n independent elements. Let's also suppose that you know two different ways of solving this problem: program A and program B. You know that program A requires 100n² operations to complete the task, and program B requires 5n³ operations to provide a solution. Which one would you choose?
When speaking about very large inputs, program A is the better choice because it behaves better asymptotically. It has O(n²) complexity compared to program B's O(n³) complexity. However, by solving the simple inequality 100n² > 5n³, we can find that program B will take fewer operations when n is less than 20. Therefore, if we know a bit more about our input bounds, we can make slightly better decisions.
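The program_a_ops() and program_b_ops() helpers below are hypothetical; they simply encode the operation counts from this example so that we can check the crossover point in the REPL:

>>> def program_a_ops(n):
...     return 100 * n ** 2  # assumed cost model for program A
...
>>> def program_b_ops(n):
...     return 5 * n ** 3  # assumed cost model for program B
...
>>> # True means program A needs fewer operations than program B for that n
>>> [(n, program_a_ops(n) < program_b_ops(n)) for n in (10, 19, 20, 21, 100)]
[(10, False), (19, False), (20, False), (21, True), (100, True)]

Program A starts to win only for inputs larger than 20 elements; below that threshold, the asymptotically worse program B is the cheaper choice.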
In the next section, we will take a look at how to reduce complexity by selecting appropriate data structures.