Debugging kernel code

Debugging application code gives you insight into how the code works and what is happening when it misbehaves. You can do the same with the kernel, with some limitations.

You can use kgdb for source level debugging, in a manner similar to remote debugging with gdbserver. There is also a self-hosted kernel debugger, kdb, that is handy for lighter weight tasks such as seeing if an instruction is executed and getting the backtrace to find out how it got there. Finally, there are kernel oops messages and panics, which tell you a lot about the cause of a kernel exception.

Debugging kernel code with kgdb

When looking at kernel code using a source debugger, you must remember that the kernel is a complex system, with real-time behaviors. Don't expect debugging to be as easy as it is for applications. Stepping through code that changes the memory mapping or switches context is likely to produce odd results.

kgdb is the name given to the kernel GDB stubs that have been part of mainline Linux for many years now. There is a user manual in the kernel DocBook and you can find an online version at https://www.kernel.org/doc/htmldocs/kgdb/index.html.

The most widely supported way to connect to kgdb is over the serial interface, which is usually shared with the serial console, so this implementation is called kgdboc, meaning kgdb over console. To work, it requires a platform tty driver that supports I/O polling instead of interrupts, since kgdb has to disable interrupts when communicating with GDB. A few platforms support kgdb over USB, and there have been versions that work over Ethernet but, unfortunately, none of those has found its way into mainline Linux.

The same caveats about optimization and stack frames apply to the kernel, with the limitation that the kernel is written to assume an optimization level of at least -O1. You can override the kernel compile flags by setting KCFLAGS before running make.

These, then, are the kernel configuration options you will need for kernel debugging:

CONFIG_DEBUG_INFO is in the Kernel hacking | Compile-time checks and compiler options | Compile the kernel with debug info menu
CONFIG_FRAME_POINTER may be an option for your architecture, and is in the Kernel hacking | Compile-time checks and compiler options | Compile the kernel with frame pointers menu
CONFIG_KGDB is in the Kernel hacking | KGDB: kernel debugger menu
CONFIG_KGDB_SERIAL_CONSOLE is in the Kernel hacking | KGDB: kernel debugger | KGDB: use kgdb over the serial console menu

In addition to the uImage or zImage compressed kernel image, you will need the kernel image in ELF object format so that GDB can load the symbols into memory. That is the file called vmlinux that is generated in the directory where Linux is built. In the Yocto Project, you can request that a copy be included in the target image, which is convenient for this and other debug tasks. It is built into a package named kernel-vmlinux, which you can install like any other, for example by adding it to the IMAGE_INSTALL_append list. The file is put into the boot directory, with a name like this:

boot/vmlinux-3.14.26ltsi-yocto-standard

In Buildroot, you will find vmlinux in the directory where the kernel was built, which is in output/build/linux-<version string>/vmlinux.

A sample debug session

The best way to show you how it works is with a simple example.

You need to tell kgdb which serial port to use, either through the kernel command line or at runtime via sysfs. For the first option, add kgdboc=<tty>,<baud rate> to the command line, as shown:

kgdboc=ttyO0,115200

For the second option, boot the device up and write the terminal name to the /sys/module/kgdboc/parameters/kgdboc file, as shown:

# echo ttyO0 > /sys/module/kgdboc/parameters/kgdboc

Note that you cannot set the baud rate in this way. If it is the same tty as the console, then the baud rate is set already; if not, use stty or a similar program.

Now you can start GDB on the host, selecting the vmlinux file that matches the running kernel:

$ arm-poky-linux-gnueabi-gdb ~/linux/vmlinux

GDB loads the symbol table from vmlinux and waits for further input.

Next, close any terminal emulator that is attached to the console: you are about to use it for GDB and, if both are active at the same time, some of the debug strings might get corrupted.

Now, you can return to GDB and attempt to connect to kgdb. However, you will find that the response you get from target remote at this time is unhelpful:

(gdb) set remotebaud 115200
(gdb) target remote /dev/ttyUSB0
Remote debugging using /dev/ttyUSB0
Bogus trace status reply from target: qTStatus

The problem is that kgdb is not listening for a connection at this point. You need to interrupt the kernel before you can enter into an interactive GDB session with it. Unfortunately, just typing Ctrl + C in GDB, as you would with an application, does not work. You have to force a trap into the kernel by launching another shell on the target, via ssh, for example, and writing a g to /proc/sysrq-trigger on the target board:

# echo g > /proc/sysrq-trigger

The target stops dead at this point. Now you can connect to kgdb via the serial device at the host end of the cable:

(gdb) set remotebaud 115200
(gdb) target remote /dev/ttyUSB0
Remote debugging using /dev/ttyUSB0
0xc009a59c in arch_kgdb_breakpoint ()

At last, GDB is in charge. You can set breakpoints, examine variables, look at backtraces, and so on. As an example, set a break on sys_sync, as follows:

(gdb) break sys_sync
Breakpoint 1 at 0xc0128a88: file fs/sync.c, line 103.
(gdb) c
Continuing.

Now the target comes back to life. Typing sync on the target calls sys_sync and hits the breakpoint.

[New Thread 87]
[Switching to Thread 87]

Breakpoint 1, sys_sync () at fs/sync.c:103

If you have finished the debug session and want to disable kgdboc, just set the kgdboc terminal to null:

# echo "" >  /sys/module/kgdboc/parameters/kgdboc

Debugging early code

The preceding example works in cases where the code you are interested in is executed when the system is fully booted. If you need to get in early, you can tell the kernel to wait during boot by adding kgdbwait to the command line, after the kgdboc option:

kgdboc=ttyO0,115200 kgdbwait

Now, when you boot, you will see this on the console:

[    1.103415] console [ttyO0] enabled
[    1.108216] kgdb: Registered I/O driver kgdboc.
[    1.113071] kgdb: Waiting for connection from remote gdb...

At this point, you can close the console and connect from GDB in the usual way.

Debugging modules

Debugging kernel modules presents an additional challenge because the code is relocated at runtime and so you need to find out at what address it resides. The information is presented via sysfs. The relocation addresses for each section of the module are stored in /sys/module/<module name>/sections. Note that, since ELF sections begin with a dot, '.', they appear as hidden files and you will have to use ls -a if you want to list them. The important ones are .text, .data, and .bss.

Take as an example a module named mbx:

# cat /sys/module/mbx/sections/.text
0xbf000000
# cat /sys/module/mbx/sections/.data
0xbf0003e8
# cat /sys/module/mbx/sections/.bss
0xbf0005c0

Now you can use these numbers in GDB to load the symbol table for the module at those addresses:

(gdb) add-symbol-file /home/chris/mbx-driver/mbx.ko 0xbf000000 \
-s .data 0xbf0003e8 -s .bss 0xbf0005c0
add symbol table from file "/home/chris/mbx-driver/mbx.ko" at
  .text_addr = 0xbf000000
  .data_addr = 0xbf0003e8
  .bss_addr = 0xbf0005c0
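
Typing the section addresses by hand is error-prone, so it can help to generate the command. The following is a minimal sketch, not part of GDB or the kernel: the helper name add_symbol_file_command is hypothetical, and it assumes you have already copied the addresses from /sys/module/<module name>/sections on the target into a dictionary on the host.

```python
def add_symbol_file_command(ko_path, sections):
    """Build a GDB add-symbol-file command line.

    ko_path  -- path to the module object file on the host
    sections -- dict mapping section names (".text", ".data", ...) to the
                load addresses read from sysfs on the target
    """
    # The .text address is passed positionally, right after the file name.
    parts = ["add-symbol-file", ko_path, f"{sections['.text']:#x}"]
    # Every other section is passed with an explicit "-s <name> <address>".
    for name, addr in sections.items():
        if name != ".text":
            parts += ["-s", name, f"{addr:#x}"]
    return " ".join(parts)


# Addresses taken from the mbx example above.
sections = {".text": 0xbf000000, ".data": 0xbf0003e8, ".bss": 0xbf0005c0}
cmd = add_symbol_file_command("/home/chris/mbx-driver/mbx.ko", sections)
print(cmd)
```

The printed line can be pasted at the (gdb) prompt. Remember that the addresses change each time the module is loaded, so they have to be read again after every insmod.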

Everything should now work as normal: you can set breakpoints and inspect global and local variables in the module just as you can in vmlinux:

(gdb) break mbx_write
Breakpoint 1 at 0xbf00009c: file /home/chris/mbx-driver/mbx.c, line 93.
(gdb) c
Continuing.

Then, force the device driver to call mbx_write and it will hit the breakpoint:

Breakpoint 1, mbx_write (file=0xde7a71c0, buffer=0xadf40 "hello\n\n",
    length=6, offset=0xde73df80)
    at /home/chris/mbx-driver/mbx.c:93

Debugging kernel code with kdb

Although kdb does not have the features of kgdb and GDB, it does have its uses and, being self-hosted, there are no external dependencies to worry about. kdb has a simple command-line interface which you can use on a serial console. You can use it to inspect memory, registers, process lists, dmesg, and even set breakpoints to stop in a certain location.

To configure kdb for access via a serial console, enable kgdb as shown previously and then enable this additional option:

CONFIG_KGDB_KDB, which is in the Kernel hacking | KGDB: kernel debugger | KGDB_KDB: include kdb frontend for kgdb menu

Now, when you force the kernel to a trap, instead of entering into a GDB session, you will see the kdb shell on the console:

# echo g > /proc/sysrq-trigger
[   42.971126] SysRq : DEBUG

Entering kdb (current=0xdf36c080, pid 83) due to Keyboard Entry
kdb>

There are quite a few things you can do in the kdb shell. The help command will print all of the options. Here is an overview.

Getting information:

ps: displays active processes
ps A: displays all processes
lsmod: lists modules
dmesg: displays the kernel log buffer

Breakpoints:

bp: sets a breakpoint
bl: lists breakpoints
bc: clears a breakpoint
bt: prints a backtrace
go: continues execution

Inspect memory and registers:

md: displays memory
rd: displays registers

Here is a quick example of setting a break point:

kdb> bp sys_sync
Instruction(i) BP #0 at 0xc01304ec (sys_sync)
  is enabled  addr at 00000000c01304ec, hardtype=0 installed=0

kdb> go

The kernel returns to life and the console shows the normal bash prompt. If you type sync, it hits the breakpoint and enters kdb again:

Entering kdb (current=0xdf388a80, pid 88) due to Breakpoint @ 0xc01304ec

kdb is not a source-level debugger, so you can't see the source code or step through it line by line. However, you can display a backtrace using the bt command, which is useful to get an idea of program flow and call hierarchy.

When the kernel performs an invalid memory access or executes an illegal instruction, a kernel oops message is written to the kernel log. The most useful part of this is the backtrace, and I want to show you how to use the information there to locate the line of code that caused the fault. I will also address the problem of preserving oops messages if they cause the system to crash.

Looking at an oops

An oops message looks like this:

[   56.225868] Unable to handle kernel NULL pointer dereference at virtual address 00000400
[   56.229038] pgd = cb624000
[   56.229454] [00000400] *pgd=6b715831, *pte=00000000, *ppte=00000000
[   56.231768] Internal error: Oops: 817 [#1] SMP ARM
[   56.232443] Modules linked in: mbx(O)
[   56.233556] CPU: 0 PID: 98 Comm: sh Tainted: G   O  4.1.10 #1
[   56.234234] Hardware name: ARM-Versatile Express
[   56.234810] task: cb709c80 ti: cb71a000 task.ti: cb71a000
[   56.236801] PC is at mbx_write+0x14/0x98 [mbx]
[   56.237303] LR is at __vfs_write+0x20/0xd8
[   56.237559] pc : [<bf0000a0>]    lr : [<c0307154>]  psr: 800f0013
[   56.237559] sp : cb71bef8  ip : bf00008c  fp : 00000000
[   56.238183] r10: 00000000  r9 : cb71a000  r8 : c02107c4
[   56.238485] r7 : cb71bf88  r6 : 000afb98  r5 : 00000006  r4 : 00000000
[   56.238857] r3 : cb71bf88  r2 : 00000006  r1 : 000afb98  r0 : cb61d600
[   56.239276] Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[   56.239685] Control: 10c5387d  Table: 6b624059  DAC: 00000015
[   56.240019] Process sh (pid: 98, stack limit = 0xcb71a220)

PC is at mbx_write+0x14/0x98 [mbx] tells you most of what you want to know: the last instruction was in the mbx_write function in a kernel module named mbx. Furthermore, it was at offset 0x14 bytes from the start of the function, which is 0x98 bytes long.
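
If you collect oops messages from many boards, you may want to extract these fields automatically. Here is a minimal sketch of one way to do it; the function name parse_pc_line and the regular expression are illustrative assumptions based on the ARM oops format shown above, not an official parser.

```python
import re

# Matches "PC is at <symbol>+<offset>/<size>" with an optional " [<module>]".
PC_LINE = re.compile(
    r"PC is at (?P<symbol>[\w.]+)"
    r"\+(?P<offset>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
    r"(?: \[(?P<module>\w+)\])?"
)


def parse_pc_line(line):
    """Return the faulting symbol, offset, size and module, or None."""
    m = PC_LINE.search(line)
    if m is None:
        return None
    return {
        "symbol": m.group("symbol"),
        "offset": int(m.group("offset"), 16),  # bytes from start of function
        "size": int(m.group("size"), 16),      # length of the function
        "module": m.group("module"),           # None if built into vmlinux
    }


pc = parse_pc_line("[   56.236801] PC is at mbx_write+0x14/0x98 [mbx]")
print(pc)
```

The offset and module name are the two values you need later when looking up the faulting instruction with objdump.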

Next, take a look at the backtrace:

[   56.240363] Stack: (0xcb71bef8 to 0xcb71c000)
[   56.240745] bee0:                                                       cb71bf88 cb61d600
[   56.241331] bf00: 00000006 c0307154 00000000 c020a308 cb619d88 00000301 00000000 00000042
[   56.241775] bf20: 00000000 cb61d608 cb709c80 cb709c78 cb71bf60 c0250a54 00000000 cb709ee0
[   56.242190] bf40: 00000003 bef4f658 00000000 cb61d600 cb61d600 00000006 000afb98 cb71bf88
[   56.242605] bf60: c02107c4 c030794c 00000000 00000000 cb61d600 cb61d600 00000006 000afb98
[   56.243025] bf80: c02107c4 c0308174 00000000 00000000 00000000 000ada10 00000001 000afb98
[   56.243493] bfa0: 00000004 c0210640 000ada10 00000001 00000001 000afb98 00000006 00000000
[   56.243952] bfc0: 000ada10 00000001 000afb98 00000004 00000001 00000020 000ae274 00000000
[   56.244420] bfe0: 00000000 bef4f49c 0000fcdc b6f1aedc 600f0010 00000001 00000000 00000000
[   56.245653] [<bf0000a0>] (mbx_write [mbx]) from [<c0307154>] (__vfs_write+0x20/0xd8)
[   56.246368] [<c0307154>] (__vfs_write) from [<c030794c>] (vfs_write+0x90/0x164)
[   56.246843] [<c030794c>] (vfs_write) from [<c0308174>] (SyS_write+0x44/0x9c)
[   56.247265] [<c0308174>] (SyS_write) from [<c0210640>] (ret_fast_syscall+0x0/0x3c)
[   56.247737] Code: e5904090 e3520b01 23a02b01 e1a05002 (e5842400)
[   56.248372] ---[ end trace 999c378e4df13d74 ]---

In this case, we don't learn much more, merely that mbx_write is called from the virtual filesystem code.
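
For longer backtraces, it can be easier to read the call hierarchy as a simple list of function names. The sketch below pulls the names out of the "from" lines; the function name call_chain and the pattern are assumptions that match the ARM backtrace format shown above and may need adjusting for other architectures.

```python
import re

# Matches "[<addr>] (callee [module]) from [<addr>] (caller+0x.../0x...)".
FRAME = re.compile(
    r"\[<[0-9a-f]+>\] \((?P<callee>[\w.]+)(?: \[\w+\])?\) "
    r"from \[<[0-9a-f]+>\] \((?P<caller>[\w.]+)\+"
)


def call_chain(lines):
    """Return function names from the outermost caller to the faulting one."""
    chain = []
    last_caller = None
    for line in lines:
        m = FRAME.search(line)
        if m:
            chain.append(m.group("callee"))
            last_caller = m.group("caller")
    if last_caller is not None:
        chain.append(last_caller)  # the caller of the last frame listed
    return chain[::-1]             # the oops lists the innermost frame first


backtrace = [
    "[   56.245653] [<bf0000a0>] (mbx_write [mbx]) from [<c0307154>] (__vfs_write+0x20/0xd8)",
    "[   56.246368] [<c0307154>] (__vfs_write) from [<c030794c>] (vfs_write+0x90/0x164)",
    "[   56.246843] [<c030794c>] (vfs_write) from [<c0308174>] (SyS_write+0x44/0x9c)",
    "[   56.247265] [<c0308174>] (SyS_write) from [<c0210640>] (ret_fast_syscall+0x0/0x3c)",
]
chain = call_chain(backtrace)
print(" -> ".join(chain))
```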

It would be very nice to find the line of code that relates to mbx_write+0x14, for which we can use objdump. We can see from objdump -S that mbx_write is at offset 0x8c in mbx.ko, so the last instruction executed is at 0x8c + 0x14 = 0xa0. Now, we just need to look at that offset and see what is there:

$ arm-poky-linux-gnueabi-objdump -S mbx.ko
static ssize_t mbx_write(struct file *file,
const char *buffer, size_t length, loff_t * offset)
{
  8c:   e92d4038        push    {r3, r4, r5, lr}
  struct mbx_data *m = (struct mbx_data *)file->private_data;
  90:   e5904090        ldr     r4, [r0, #144]  ; 0x90
  94:   e3520b01        cmp     r2, #1024       ; 0x400
  98:   23a02b01        movcs   r2, #1024       ; 0x400
  if (length > MBX_LEN)
    length = MBX_LEN;
    m->mbx_len = length;
  9c:   e1a05002        mov     r5, r2
  a0:   e5842400        str     r2, [r4, #1024] ; 0x400

This shows the instruction where it stopped. The last line of code is shown here:

m->mbx_len = length;

You can see that m has the type struct mbx_data *. Here is the place where that structure is defined:

#define MBX_LEN 1024
struct mbx_data {
  char mbx[MBX_LEN];
  int mbx_len;
};

So, it looks like the m variable is a null pointer, and that is causing the oops.
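
You can check this conclusion against the faulting address reported at the top of the oops, 00000400. The sketch below models the structure on the host with Python's ctypes module, purely as an illustration; the class name MbxData is hypothetical, and it assumes the host lays out these two fields the same way as the ARM target, which holds for a char array followed by an int.

```python
import ctypes

MBX_LEN = 1024


class MbxData(ctypes.Structure):
    """Host-side model of struct mbx_data from the driver."""
    _fields_ = [
        ("mbx", ctypes.c_char * MBX_LEN),
        ("mbx_len", ctypes.c_int),
    ]


# With m == NULL, writing m->mbx_len touches address 0 + offsetof(mbx_len).
null_pointer = 0
fault_address = null_pointer + MbxData.mbx_len.offset
print(hex(fault_address))
```

The result matches both the virtual address in the first line of the oops and the #1024 offset in the str instruction at 0xa0, with r4 shown as 00000000 in the register dump.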

Preserving the oops

Decoding an oops is only possible if you can capture it in the first place. If the system crashes during boot before the console is enabled, or after a suspend, you won't see it. There are mechanisms to log kernel oops and messages to an MTD partition or to persistent memory, but here is a simple technique that works in many cases and needs little prior thought.

So long as the contents of memory are not corrupted during a reset (and usually they are not), you can reboot into the bootloader and use it to display memory. You need to know the location of the kernel log buffer, remembering that it is a simple ring buffer of text messages. The symbol is __log_buf. Look this up in System.map for the kernel:

$ grep __log_buf System.map
c0f72428 b __log_buf

Then, map that kernel logical address into a physical address that U-Boot can understand by subtracting PAGE_OFFSET, 0xc0000000, and adding the physical start of RAM, 0x80000000 on a BeagleBone, so 0xc0f72428 - 0xc0000000 + 0x80000000 = 0x80f72428.
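
The same arithmetic can be written as a small helper, which is handy if you work with several boards. This is a sketch under the assumptions stated in the text: the default values are the PAGE_OFFSET and BeagleBone RAM start given above, and the function name virt_to_phys is chosen here for illustration. It only applies to kernel logical addresses, which map linearly onto RAM.

```python
PAGE_OFFSET = 0xC0000000  # start of the kernel's linear mapping
RAM_START = 0x80000000    # physical start of RAM on a BeagleBone


def virt_to_phys(vaddr, page_offset=PAGE_OFFSET, ram_start=RAM_START):
    """Convert a kernel logical address to a physical address."""
    if vaddr < page_offset:
        raise ValueError("not a kernel logical address")
    return vaddr - page_offset + ram_start


log_buf_phys = virt_to_phys(0xC0F72428)  # __log_buf from System.map
print(f"{log_buf_phys:x}")
```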

Then use the U-Boot md command to show the log:

U-Boot# md 80f72428
80f72428: 00000000 00000000 00210034 c6000000    ........4.!.....
80f72438: 746f6f42 20676e69 756e694c 6e6f2078    Booting Linux on
80f72448: 79687020 61636973 5043206c 78302055     physical CPU 0x
80f72458: 00000030 00000000 00000000 00730084    0.............s.
80f72468: a6000000 756e694c 65762078 6f697372    ....Linux versio
80f72478: 2e34206e 30312e31 68632820 40736972    n 4.1.10 (chris@
80f72488: 6c697562 29726564 63672820 65762063    builder) (gcc ve
80f72498: 6f697372 2e34206e 20312e39 6f726328    rsion 4.9.1 (cro
80f724a8: 6f747373 4e2d6c6f 2e312047 302e3032    sstool-NG 1.20.0
80f724b8: 20292029 53203123 5720504d 4f206465    ) ) #1 SMP Wed O
80f724c8: 32207463 37312038 3a31353a 47203335    ct 28 17:51:53 G
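
If you capture the md output from the serial port, you can turn it back into bytes on the host rather than reading the ASCII column by eye. The following is a minimal sketch: the function name decode_md_line is hypothetical, and it assumes the format shown above, that is, four 32-bit words per line dumped on a little-endian CPU.

```python
import struct


def decode_md_line(line):
    """Return the 16 raw bytes encoded by one line of U-Boot md output."""
    _address, rest = line.split(":", 1)
    words = rest.split()[:4]  # four hex words, then the ASCII column
    # Each word was read as a little-endian 32-bit value, so pack it the
    # same way to recover the original byte order in memory.
    return b"".join(struct.pack("<I", int(word, 16)) for word in words)


line = "80f72438: 746f6f42 20676e69 756e694c 6e6f2078    Booting Linux on"
data = decode_md_line(line)
print(data.decode("ascii"))
```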

Note

From Linux 3.5 onwards, there is a 16-byte binary header for each line in the kernel log buffer which encodes a timestamp, a log level and other things. There is a discussion about it in the Linux Weekly News titled Toward more reliable logging at https://lwn.net/Articles/492125/.
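
As an illustration, the first line of the dump above can be decoded as such a header. This sketch assumes the layout of struct printk_log as found in kernels of this era (a 64-bit timestamp, three 16-bit lengths, a facility byte, and a byte holding 5 bits of flags and 3 bits of log level), and that the compiler places the flags in the low bits of that last byte on a little-endian target. Check kernel/printk/printk.c for your kernel version before relying on it.

```python
import struct

# Assumed layout: u64 ts_nsec; u16 len; u16 text_len; u16 dict_len;
#                 u8 facility; u8 flags:5, level:3  (16 bytes in total)
HEADER = struct.Struct("<QHHHBB")


def parse_log_header(raw):
    """Decode the 16-byte header at the start of a log buffer record."""
    ts_nsec, rec_len, text_len, dict_len, facility, flags_level = \
        HEADER.unpack(raw[:HEADER.size])
    return {
        "ts_nsec": ts_nsec,            # timestamp in nanoseconds
        "len": rec_len,                # length of the whole record
        "text_len": text_len,          # length of the message text
        "dict_len": dict_len,
        "facility": facility,
        "flags": flags_level & 0x1F,   # low 5 bits (assumed bit order)
        "level": flags_level >> 5,     # high 3 bits: syslog level
    }


# First line of the md dump: 00000000 00000000 00210034 c6000000
words = "00000000 00000000 00210034 c6000000".split()
raw = b"".join(struct.pack("<I", int(word, 16)) for word in words)
header = parse_log_header(raw)
print(header)
```

Under these assumptions, the text length of 33 bytes matches the message "Booting Linux on physical CPU 0x0" that follows the header, and the record length of 52 bytes gives the offset of the next header.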