top is a simple tool that doesn't require any special kernel options or symbol tables. There is a basic version in BusyBox, and a more functional version in the procps package, which is available in the Yocto Project and Buildroot. You may also want to consider using htop, which is functionally similar to top but has, some would say, a nicer user interface.
To begin with, focus on the summary line of top, which is the second line if you are using BusyBox and the third line if you are using procps top. Here is an example, using BusyBox top:
```
Mem: 57044K used, 446172K free, 40K shrd, 3352K buff, 34452K cached
CPU:  58% usr   4% sys   0% nic   0% idle  37% io   0% irq   0% sirq
Load average: 0.24 0.06 0.02 2/51 105
  PID  PPID USER     STAT   VSZ %VSZ %CPU COMMAND
  105   104 root     R    27912   6%  61% ffmpeg -i track2.wav
[...]
```
The summary line shows the percentage of time spent running in various states, as shown in this table:
procps | BusyBox | Description
---|---|---
us | usr | User space programs with default nice value
sy | sys | Kernel code
ni | nic | User space programs with non-default nice value
id | idle | Idle
wa | io | I/O wait
hi | irq | Hardware interrupts
si | sirq | Software interrupts
st | - | Steal time: only relevant in virtualized environments
In the preceding example, almost all of the time (58%) is spent in user mode, with a small amount (4%) in system mode, so this is a system that is CPU-bound in user space. The first line after the summary shows that just one application is responsible: ffmpeg. Any efforts towards reducing CPU usage should be directed there.
Here is another example:
```
Mem: 13128K used, 490088K free, 40K shrd, 0K buff, 2788K cached
CPU:   0% usr  99% sys   0% nic   0% idle   0% io   0% irq   0% sirq
Load average: 0.41 0.11 0.04 2/46 97
  PID  PPID USER     STAT   VSZ %VSZ %CPU COMMAND
   92    82 root     R     2152   0% 100% cat /dev/urandom
[...]
```
This system is spending almost all of its time in kernel space as a result of cat reading from /dev/urandom. In this artificial case, profiling cat by itself would not help, but profiling the kernel functions that cat calls might.
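If you want to reproduce a kernel-bound workload like this one for experimentation, you can run something similar yourself. This is a sketch; the redirect to /dev/null is an addition of mine, just to keep the random bytes off the terminal:

```
# Generate a load that spends nearly all of its time in kernel space,
# then watch it in top
cat /dev/urandom > /dev/null &
top
```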
The default view of top shows only processes, so the CPU usage is the total of all the threads in the process. Press H to see information for each thread. Likewise, the time is aggregated across all CPUs. If you are using procps top, you can see a summary per CPU by pressing the 1 key.
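If you prefer command-line options to interactive keys, procps top can also start in thread mode directly; a hedged example, since option support varies between versions:

```
# procps top: show individual threads from the start (same as pressing H)
top -H

# BusyBox top: set the update interval to one second
top -d 1
```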
Imagine that a single user space process is taking up most of the time, and consider how to profile it.
You can profile an application just by using GDB to stop it at arbitrary intervals and see what it is doing. This is the poor man's profiler. It is easy to set up and it is one way of gathering profile data.
The procedure is simple:

1. Attach to the process using gdbserver (for a remote debug) or GDB (for a native debug). The process stops.
2. Look at the function it stopped in. You can use the backtrace GDB command to see the call stack.
3. Type continue so that the program resumes.
4. After a while, interrupt it again (for example, with Ctrl + C) and go back to step 2.

If you repeat steps 2 to 4 several times, you will quickly get an idea of whether it is looping or making progress and, if you repeat them often enough, you will get an idea of where the hotspots in the code are.
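This loop is easy to automate with GDB in batch mode. Here is a minimal sketch, assuming GDB is installed alongside the target process and that the PID to sample is passed as the first argument; the sample count and interval are arbitrary choices:

```
#!/bin/sh
# Poor man's profiler: take ten stack samples, one second apart.
# -batch makes GDB run the commands non-interactively and then
# detach and exit, so the target only stops briefly for each sample.
pid=$1
for i in 1 2 3 4 5 6 7 8 9 10
do
    gdb -batch -ex "set pagination off" -ex "backtrace" -p "$pid" 2> /dev/null
    sleep 1
done | grep '^#' | sort | uniq -c | sort -rn | head
```

Counting identical backtrace lines like this gives a crude ranking of the stack frames that appear most often, which is usually enough to spot a hotspot.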
There is a whole web page dedicated to the idea at http://poormansprofiler.org, together with scripts which make it a little easier. I have used this technique many times over the years with various operating systems and debuggers.
This is an example of statistical profiling, in which you sample the program state at intervals. After a number of samples, you begin to learn the statistical likelihood of the functions being executed. It is surprising how few samples you really need. Other statistical profilers are perf record, OProfile, and gprof.
Sampling using a debugger is intrusive because the program is stopped for a significant period while you collect each sample. Other tools can do the same thing with much lower overhead.
I will now consider how to use perf to do statistical profiling.
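As a quick preview, a basic statistical profiling session with perf looks something like this (a sketch only; the options available depend on kernel support and on how perf was built):

```
# Sample all CPUs, with call graphs, for 10 seconds, then browse the result
perf record -a -g -- sleep 10
perf report
```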