When looking at the entire system, a good place to start is with a simple tool like top, which gives you an overview very quickly. It shows you how much memory is being used, which processes are eating CPU cycles, and how the load is spread across the cores and over time.
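To make the CPU breakdown more concrete, here is a minimal Python sketch of how the user, system, and interrupt percentages can be derived from two samples of /proc/stat, which is where the kernel exposes the per-CPU counters. The function names (parse_proc_stat, cpu_shares, sample_cpu_shares) are my own illustrative choices, not part of any tool, and the sketch assumes the usual Linux field order on the cpu lines.

```python
import time
from typing import Dict

# Field order of the "cpu" lines in /proc/stat (values are cumulative ticks).
FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]


def parse_proc_stat(text: str) -> Dict[str, Dict[str, int]]:
    """Return the tick counters for each CPU line ('cpu', 'cpu0', ...)."""
    counters: Dict[str, Dict[str, int]] = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts or not parts[0].startswith("cpu"):
            continue  # skip intr, ctxt, btime and other non-CPU lines
        values = [int(v) for v in parts[1:1 + len(FIELDS)]]
        counters[parts[0]] = dict(zip(FIELDS, values))
    return counters


def cpu_shares(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, float]:
    """Percentage of elapsed ticks spent in each state between two samples."""
    delta = {name: after[name] - before[name] for name in after}
    total = sum(delta.values())
    if total <= 0:
        return {name: 0.0 for name in delta}
    return {name: 100.0 * ticks / total for name, ticks in delta.items()}


def sample_cpu_shares(interval: float = 1.0, path: str = "/proc/stat"):
    """Take two samples 'interval' seconds apart and return shares per CPU."""
    with open(path) as f:
        first = parse_proc_stat(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = parse_proc_stat(f.read())
    return {cpu: cpu_shares(first[cpu], second[cpu]) for cpu in second}
```

Calling sample_cpu_shares() on a Linux target returns one dictionary per CPU line, so a core that is pinned at a high user or system percentage stands out immediately. The counters are cumulative since boot, which is why two samples and a subtraction are needed.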
If top shows that a single application is using up all the CPU cycles in user space, then you can profile that application using perf.
If two or more processes have high CPU usage, there is probably something coupling them together, perhaps data communication. If a lot of cycles are spent in system calls or handling interrupts, then there may be an issue with the kernel configuration or with a device driver. In either case, you need to start by taking a profile of the whole system, again using perf.
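The triage rules above can be summarized as a small decision function. This is only a sketch: the thresholds BUSY_PROCESS_PCT and KERNEL_HEAVY_PCT are hypothetical values picked for illustration, the function name suggest_next_step is my own, and the 10-second sampling window in the suggested perf commands is likewise arbitrary.

```python
from typing import Dict

# Hypothetical thresholds, chosen only for illustration.
BUSY_PROCESS_PCT = 50.0   # a process at or above this counts as "high CPU"
KERNEL_HEAVY_PCT = 30.0   # combined system + interrupt share considered heavy

WHOLE_SYSTEM = "whole system: perf record -a -g -- sleep 10"


def suggest_next_step(process_cpu: Dict[int, float], kernel_pct: float) -> str:
    """Map what top shows onto a profiling step.

    process_cpu maps PID to that process's CPU percentage; kernel_pct is the
    share of CPU time spent in system calls and interrupt handling.
    """
    busy = sorted(pid for pid, pct in process_cpu.items()
                  if pct >= BUSY_PROCESS_PCT)
    if kernel_pct >= KERNEL_HEAVY_PCT:
        # Time is going into the kernel: profile everything, not one process.
        return WHOLE_SYSTEM
    if len(busy) == 1:
        # One application dominating user space: profile just that process.
        return f"single process: perf record -g -p {busy[0]}"
    if len(busy) >= 2:
        # Several busy processes are probably coupled: profile them together.
        return WHOLE_SYSTEM
    return "no obvious CPU hotspot: look beyond top"
```

For example, suggest_next_step({101: 95.0, 202: 2.0}, 5.0) points at profiling PID 101 alone, whereas two busy processes or a high kernel share both lead to a system-wide profile.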
If you want to find out more about the kernel and the sequencing of events there, you would use Ftrace or LTTng.
There could be other problems that top will not help you with. If you have multi-threaded code and there are problems with lockups, or if you have random data corruption, then Valgrind plus the Helgrind plug-in might be helpful. Memory leaks also fit into this category: I covered memory-related diagnosis in Chapter 11, Managing Memory.