Once the slow function has been found, it is sometimes necessary to do more profiling work that tests only a part of the program. This is done by manually instrumenting selected parts of the code in a speed test.
For instance, the cProfile module can be used in the form of a decorator, as in the following example:
import time
import tempfile
import cProfile
import pstats


def profile(column='time', limit=3):
    def parametrized_decorator(function):
        def decorated(*args, **kw):
            s = tempfile.mktemp()
            profiler = cProfile.Profile()
            # runcall() returns the wrapped function's result,
            # so the decorator stays transparent to callers
            result = profiler.runcall(function, *args, **kw)
            profiler.dump_stats(s)
            p = pstats.Stats(s)
            print("=" * 5, f"{function.__name__}() profile", "=" * 5)
            p.sort_stats(column).print_stats(limit)
            return result
        return decorated
    return parametrized_decorator


def medium():
    time.sleep(0.01)


@profile('time')
def heavy():
    for i in range(100):
        medium()
        medium()
    time.sleep(2)


@profile('time')
def main():
    for i in range(2):
        heavy()


if __name__ == '__main__':
    main()
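As a side note, since Python 3.8, cProfile.Profile also supports the context manager protocol, which offers a lighter way to instrument a single block of code without writing a decorator or dumping stats to a temporary file. A minimal sketch (the profiled expression is just a placeholder workload):

```python
import cProfile
import io
import pstats

# Since Python 3.8, Profile can be used as a context manager:
# profiling is enabled on entry and disabled on exit.
with cProfile.Profile() as profiler:
    sum(i * i for i in range(10000))  # placeholder workload

# Stats can be built directly from the Profile object,
# without going through a stats file first.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats('time').print_stats(3)
print(stream.getvalue())
```

Note that pstats.Stats accepts a Profile instance directly, so the temporary file used in the decorator above is only one of several ways to hand statistics over to pstats.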
This approach allows for testing only selected parts of the application and sharpens the statistics output. This way, you can collect many isolated and precisely targeted profiles on a single application run, as follows:
$ python3 cprofile_decorator.py
===== heavy() profile =====
Wed Apr 10 03:11:53 2019 /var/folders/jy/wy13kx0s7sb1dx2rfsqdvzdw0000gq/T/tmpyi2wejm5
403 function calls in 4.330 seconds
Ordered by: internal time
List reduced from 4 to 3 due to restriction <3>
ncalls tottime percall cumtime percall filename:lineno(function)
201 4.327 0.022 4.327 0.022 {built-in method time.sleep}
200 0.002 0.000 2.326 0.012 cprofile_decorator.py:24(medium)
1 0.001 0.001 4.330 4.330 cprofile_decorator.py:28(heavy)
===== heavy() profile =====
Wed Apr 10 03:11:57 2019 /var/folders/jy/wy13kx0s7sb1dx2rfsqdvzdw0000gq/T/tmp8mubgwjw
403 function calls in 4.328 seconds
Ordered by: internal time
List reduced from 4 to 3 due to restriction <3>
ncalls tottime percall cumtime percall filename:lineno(function)
201 4.324 0.022 4.324 0.022 {built-in method time.sleep}
200 0.002 0.000 2.325 0.012 cprofile_decorator.py:24(medium)
1 0.001 0.001 4.328 4.328 cprofile_decorator.py:28(heavy)
===== main() profile =====
Wed Apr 10 03:11:57 2019 /var/folders/jy/wy13kx0s7sb1dx2rfsqdvzdw0000gq/T/tmp6c0y2oxj
62 function calls in 8.663 seconds
Ordered by: internal time
List reduced from 27 to 3 due to restriction <3>
ncalls tottime percall cumtime percall filename:lineno(function)
1 8.662 8.662 8.662 8.662 {method 'enable' of '_lsprof.Profiler' objects}
1 0.000 0.000 0.000 0.000 {built-in method posix.lstat}
8 0.000 0.000 0.000 0.000 /usr/local/Cellar/python/3.7.2_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/random.py:224(_randbelow)
But at this stage, having the full list of callees is probably not that interesting, as the function to optimize has already been identified. The only interesting information now is how fast it is, so that we can enhance it.
timeit is a useful module that provides a simple way to measure the execution time of a small code snippet, using the most precise clock the host system provides (time.perf_counter is the default timer in Python 3), as shown in the following example:
>>> from myapp import light
>>> import timeit
>>> t = timeit.Timer('light()', setup='from myapp import light')
>>> t.timeit(number=5)
5.6196951866149902
This module allows you to repeat the call multiple times, and can be easily used to try out isolated code snippets. This is very useful outside the application context—in a prompt, for instance—but is not really handy to use within an existing application.
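One way to make timeit friendlier inside an existing application is to call it programmatically: timeit.timeit() and timeit.repeat() accept a plain callable (and a globals argument for string statements), so there is no need to build import-setup strings. A minimal sketch, where light() is a made-up stand-in for a function from your application:

```python
import timeit

def light():
    # stand-in for a function from your application
    return sum(range(1000))

# Passing a callable avoids quoting code as a string
duration = timeit.timeit(light, number=1000)
print(f"1000 calls took {duration:.6f} s")

# repeat() runs several independent rounds; the minimum is
# usually the most representative figure
best = min(timeit.repeat(light, number=1000, repeat=3))
```

Taking the minimum of several rounds is a common convention, since higher measurements are usually caused by interference from other processes rather than by the code under test.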
The results of timeit should be used with caution. It is a very good tool to objectively compare two short snippets of code, but it also allows you to easily make dangerous mistakes that will lead you to confusing conclusions. Here, for example, is the comparison of two innocent snippets of code with the timeit module that could make you think that string concatenation by addition is faster than the str.join() method:
$ python3 -m timeit -s 'a = map(str, range(1000))' '"".join(a)'
1000000 loops, best of 3: 0.497 usec per loop
$ python3 -m timeit -s 'a = map(str, range(1000)); s=""' 'for i in a: s += i'
10000000 loops, best of 3: 0.0808 usec per loop
From Chapter 3, Modern Syntax Elements - Below the Class Level, we know that string concatenation by addition is not a good pattern. Despite some minor CPython micro-optimizations that were designed exactly for such use cases, it will eventually lead to quadratic runtime. The problem lies in the nuances of the setup argument of the timeit() call (or the -s parameter on the command line) and in how map() and range() behave in Python 3. I won't discuss the details of the problem here, but will leave it to you as an exercise. Anyway, here is the correct way to compare string concatenation by addition with the str.join() idiom under Python 3:
$ python3 -m timeit -s 'a = [str(i) for i in range(10000)]' 's="".join(a)'
10000 loops, best of 3: 128 usec per loop
$ python3 -m timeit -s 'a = [str(i) for i in range(10000)]' '
s = ""
for i in a:
    s += i
'
1000 loops, best of 3: 1.38 msec per loop
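The same comparison can be reproduced programmatically, which also makes it easy to verify that both snippets actually compute the same string. A minimal sketch:

```python
import timeit

a = [str(i) for i in range(10000)]

def with_join():
    return "".join(a)

def with_addition():
    s = ""
    for i in a:
        s += i
    return s

# Both approaches must produce the same result...
assert with_join() == with_addition()

# ...but str.join() should come out clearly ahead on a list this size
t_join = timeit.timeit(with_join, number=100)
t_add = timeit.timeit(with_addition, number=100)
print(f"join: {t_join:.4f} s, addition: {t_add:.4f} s")
```

Because the setup (building the list) lives outside the timed callables, both measurements cover exactly the concatenation work and nothing else.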
Profiling memory usage is explained in the next section.