Other profilers: OProfile and gprof

These two statistical profilers predate perf. Each offers a subset of perf's functionality, but both are still quite popular, so I will mention them only briefly.

OProfile is a kernel profiler that dates from 2002. Originally, it had its own kernel sampling code, but recent versions use the perf_events infrastructure for that purpose. There is more information about it at http://oprofile.sourceforge.net. OProfile consists of a kernel-space component, a user-space daemon, and a set of analysis commands.

OProfile needs these two kernel options to be enabled:

If you are using the Yocto Project, the user-space components are installed as part of the tools-profile image feature. If you are using Buildroot, the package is enabled by BR2_PACKAGE_OPROFILE.

You can collect samples by using this command:

# operf <program>

Wait for your application to finish, or press Ctrl + C to stop profiling. The profile data is stored in <cur-dir>/oprofile_data/samples/current.

Use opreport to generate a profile summary. It has various options, which are documented in the OProfile manual.

gprof is part of the GNU toolchain and was one of the earliest open source code profiling tools. It combines compile-time instrumentation and sampling techniques, using a 100 Hz sample rate. It has the advantage that it does not require kernel support.

To prepare a program for profiling with gprof, you add -pg to the compile and link flags, which injects code into each function prologue that collects information about the call tree. When you run the program, samples are collected and stored in a buffer, which is written to a file named gmon.out when the program terminates.

You use the gprof command to read the samples from gmon.out and the debug information from a copy of the program.

As an example, if you wanted to profile the BusyBox grep applet, you would rebuild BusyBox with the -pg option, run the command, and view the results:

# busybox grep "linux" *
# ls -l gmon.out
-rw-r--r-- 1 root root   473 Nov 24 14:07 gmon.out

Then, you would analyze the captured samples on either the target or the host, using the following:

# gprof busybox
Flat profile:

Each sample counts as 0.01 seconds.
 no time accumulated

  %   cumulative   self              self     total
 time   seconds   seconds    calls  Ts/call  Ts/call  name
 0.00     0.00     0.00      688     0.00     0.00  xrealloc
 0.00     0.00     0.00      345     0.00     0.00  bb_get_chunk_from_file
 0.00     0.00     0.00      345     0.00     0.00  xmalloc_fgetline
 0.00     0.00     0.00        6     0.00     0.00  fclose_if_not_stdin
 0.00     0.00     0.00        6     0.00     0.00  fopen_for_read
 0.00     0.00     0.00        6     0.00     0.00  grep_file
[...]
    Call graph

granularity: each sample hit covers 2 byte(s) no time propagated

index  % time    self  children    called     name
                 0.00    0.00      688/688      bb_get_chunk_from_file [2]
[1]      0.0     0.00    0.00      688      xrealloc [1]
----------------------------------------------------------
                 0.00    0.00      345/345      xmalloc_fgetline [3]
[2]      0.0     0.00    0.00      345      bb_get_chunk_from_file [2]
                 0.00    0.00      688/688      xrealloc [1]
----------------------------------------------------------
                 0.00    0.00      345/345      grep_file [6]
[3]      0.0     0.00    0.00      345      xmalloc_fgetline [3]
                 0.00    0.00      345/345      bb_get_chunk_from_file [2]
----------------------------------------------------------
                 0.00    0.00      6/6          grep_main [12]
[4]      0.0     0.00    0.00      6        fclose_if_not_stdin [4]
[...]

Note that the execution times are all shown as zero, because most of the time was spent in system calls, which are not traced by gprof.