Source Level Debugging the XNU Kernel

Whether you’re developing a kernel extension, doing vulnerability research, or you have some other need to spelunk into the macOS/iOS kernel, XNU, sometimes you need to attach a debugger. And when you do that, doing it with source code is really nice when possible. Damien DeVille, Snare and probably others have written about this process. Here are some of their articles:

Things have evolved a bit. And while you can probably work it all out from the combined previous work, there are some things that aren’t addressed. So let’s go through it here, from scratch.

Here are our goals:

  • Debug the macOS 10.13.6 kernel from a macOS environment (10.14 kernel sources haven’t yet been released)
  • Work with a virtual machine target so we don’t have to drag around two separate Macs
  • Not only have kernel symbols, but have kernel source as well
  • Be able to pretty print structures from memory of the target machine
  • At breakpoints, display:
    • Source listing
    • Register content
    • Backtrace
    • Current thread stack

Shopping List

Before we can do that, though, here’s a shopping list of stuff you need.

Virtual Machine

Create your macOS guest machine.

  • I recommend a simple username and password like “admin” and “a”
  • You’ll want to enable SSH and maybe automatic user login.
  • Be sure to install VMware tools in the guest
  • Check the OS version with sw_vers. We’re looking for 17G65, and xnu version xnu-4570.71.2:
Figure: Checking the macOS build and kernel version

For ease of SSHing into the machine, you may want to give it a static IP address using VMware’s DHCP, as well as a hostname via /etc/hosts. Here’s how.

  1. Get the MAC address for the virtual machine’s network interface (usually en0). Mine is 00:0c:29:a5:fd:3a
  2. Edit /Library/Preferences/VMware Fusion/vmnet8/dhcpd.conf (substitute vmnet1 if you’re not using VMware’s NAT)
  3. Add a stanza for your VM’s static DHCP lease:
    ####### VMNET DHCP Configuration. End of "DO NOT MODIFY SECTION" #######
    
    host gargleblaster{
            hardware ethernet 00:0c:29:a5:fd:3a;
            fixed-address 192.168.44.10;
    }
    
  4. Add your VM’s hostname to /etc/hosts on your host machine
  5. If you like, set the guest VM’s hostname in Sharing.prefpane as well as on the command line using scutil --set HostName
  6. Copy your ssh key to the vm using ssh-copy-id
  7. Shut down the VM, quit & restart VMware, and boot the VM to test everything out.
  8. Boot the VM into recovery mode and disable SIP using csrutil
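If you set up more than one VM, typing these stanzas by hand gets old. Here’s a minimal sketch that generates the dhcpd.conf stanza and the matching /etc/hosts line. The function names are my own invention, and the values are just the ones from the example above; nothing here touches any files, it only builds the text for you to paste in.

```python
def dhcp_host_stanza(hostname, mac, ip):
    """Return a dhcpd.conf host entry that pins `mac` to `ip`."""
    return (
        "host {name} {{\n"
        "        hardware ethernet {mac};\n"
        "        fixed-address {ip};\n"
        "}}\n"
    ).format(name=hostname, mac=mac.lower(), ip=ip)


def hosts_line(hostname, ip):
    """Return the matching /etc/hosts line for the host machine."""
    return "{ip}\t{name}\n".format(ip=ip, name=hostname)


if __name__ == "__main__":
    # Example values from the steps above; substitute your own.
    print(dhcp_host_stanza("gargleblaster", "00:0c:29:a5:fd:3a", "192.168.44.10"))
    print(hosts_line("gargleblaster", "192.168.44.10"))
```

Paste the stanza below the "DO NOT MODIFY SECTION" marker, since VMware may regenerate everything above it.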

VMware’s GDB stub

The official way of debugging XNU is using its built-in debug stub that communicates using the Kernel Debug Protocol, or KDP. It can work over a variety of transports, including serial, FireWire, and I believe Thunderbolt. But for debugging a VM you’d use UDP, which is super flakey for kernel debugging. lldb frequently gets out of sync or loses contact with the debug server, and the kernel is left in a permanently halted state. I assume this is because the debug server is part of the kernel itself combined with UDP’s nature of being unreliable. So the kernel halts, stops communicating with the debugger, and lldb gives up.

A more reliable mechanism is the “hardware” debugging facility provided by VMware. This lets the VM simulate a hardware debugger beneath the kernel. In this case the kernel doesn’t play a role in its own debugging; it isn’t even “aware” it’s being debugged. This method isn’t 100% reliable, but it’s generally more stable than KDP over UDP. Also you can (usually) break in with a ^C as if you were attached to a normal userspace process. Setting it up is easy:

Go ahead and shut down the VM. Then edit its .vmx file, found in the .vmwarevm bundle. Add the following lines [1] to the file:

debugStub.listen.guest32 = "TRUE"
debugStub.listen.guest64 = "TRUE"

If you want to debug from a machine other than your host (such as another guest VM), you can add the remote listener:

debugStub.listen.guest32.remote = "TRUE"
debugStub.listen.guest64.remote = "TRUE"
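If you’d rather script the .vmx edit than do it by hand, here’s a rough sketch. It takes the file’s text and appends whichever debug stub settings are missing, so running it twice doesn’t duplicate lines. The function name and the case-insensitive key comparison are my own choices, not anything VMware documents. Only run it against a VM that is shut down.

```python
# The two local-listener settings from above. Add the ".remote" variants
# to this list if you need to attach from another machine.
STUB_SETTINGS = [
    ("debugStub.listen.guest32", "TRUE"),
    ("debugStub.listen.guest64", "TRUE"),
]


def add_debug_stub(vmx_text, settings=STUB_SETTINGS):
    """Return vmx_text with any missing debug stub settings appended."""
    existing = set()
    for line in vmx_text.splitlines():
        if "=" in line:
            # Compare keys case-insensitively, just to be safe.
            existing.add(line.split("=", 1)[0].strip().lower())
    out = vmx_text
    if out and not out.endswith("\n"):
        out += "\n"
    for key, value in settings:
        if key.lower() not in existing:
            out += '{0} = "{1}"\n'.format(key, value)
    return out
```

To use it, read your .vmx file into a string, pass it through add_debug_stub(), and write the result back. Keep a backup copy of the original first.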

Kernel Debug Kit

Download the Kernel Debug Kit from the Apple Developer Portal. It’s essential to download the KDK build version that matches the macOS build in your VM. I believe you need to log into the developer portal with an Apple ID, but I don’t think you need to pay for a developer account.

You’ll want to install the KDK in both your host and your guest. Technically you can get by just copying the development kernel to the guest, but it’s as easy to simply install the whole KDK.

In the guest, copy the development kernel from the KDK location to the directory where kernels live:

$ sudo cp /Library/Developer/KDKs/KDK_10.13.6_17G65.kdk/System/Library/Kernels/kernel.development /System/Library/Kernels/

Since the system doesn’t actually boot the kernel, but rather a prelinked kernel cache, you need to invalidate the existing kernel cache, causing it to be rebuilt. The kextcache command does this. It has lots of options, but for simplicity you can just tell it “rebuild everything you know about on the boot volume” [2]:

$ sudo kextcache -i /

It’s worth poking around in the KDK to see what it installed. Have a look in /Library/Developer/KDKs/KDK_10.13.6_17G65.kdk. In it, you’ll find lots of symbol bundles for both the kernel and kexts, which is pretty nice. You don’t need to worry about them, though. LLDB will find them by UUID using Spotlight and load them if and when it needs them. What’s really interesting is the kernel dSYM. In it are tons of Python lldb macros. LLDB loads some of them, but most it does not. They’re mostly undocumented but some are really useful. We’ll look at a few later.

Boot args

Some guides will have you set various boot args in your guest, such as kcsuffix. In my experience you don’t need to do anything special to boot into the development kernel. As long as it’s there (or more importantly, the kernel cache is there) it’ll take priority over the release kernel. Reboot your VM and check the kernel version to be sure you booted your DEVELOPMENT kernel:

admins-Mac:~ admin$ uname -a
Darwin admins-Mac.local 17.7.0 Darwin Kernel Version 17.7.0: Thu Jun 21 22:53:14 PDT 2018; root:xnu-4570.71.2~1/DEVELOPMENT_X86_64 x86_64
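If you reboot a lot, you can check this from a script instead of eyeballing it. Here’s a small sketch that pulls the xnu version and kernel variant out of the version string. The regular expression is my own guess at the format, based only on the output above, so it may need adjusting for other builds.

```python
import re

# Matches the "root:xnu-4570.71.2~1/DEVELOPMENT_X86_64" part of the string.
UNAME_RE = re.compile(
    r"root:(?P<xnu>xnu-[0-9.]+)~\d+/(?P<variant>[A-Z]+)_(?P<arch>\w+)"
)


def kernel_variant(version_string):
    """Return (xnu_version, variant), e.g. ("xnu-4570.71.2", "DEVELOPMENT")."""
    match = UNAME_RE.search(version_string)
    if match is None:
        raise ValueError("unrecognized kernel version string: %r" % version_string)
    return match.group("xnu"), match.group("variant")
```

On the guest, platform.version() from the Python standard library returns the same kernel version string that uname prints, so kernel_variant(platform.version()) should report DEVELOPMENT once you’ve booted the right kernel.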

You can also set various debug flags in the debug= boot arg [3] [4], but you don’t need them. They don’t affect VMware’s gdb server stub in any way. If you’re using KDP in addition to VMware’s debug stub, however, they may be useful. The flags are a bitfield with values that you OR together. For example, debug=0x1 tells the OS to halt at boot time and wait for a debugger. Probably a useful set of flags to start with is debug=0x141. Apple has a partial list of debug flags here. If you can’t find the debug flag that meets your needs, osfmk/kern/debug.c is probably your next best reference. You can also grep for other places in the kernel source where the debug boot arg is checked:

-==< zach@endor:~/src/xnu-4570.71.2 >==-
 (0) $ grep -rn 'PE_parse_boot_argn("debug"' .
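Since the debug= value is a bitfield, it helps to be able to take one apart. Here’s a sketch of a decoder. The flag names and values are the ones I know from Apple’s published (and partial) list; treat the table as incomplete and possibly out of date for your kernel, and check the xnu sources if a bit comes back unnamed.

```python
# Names and values from Apple's partial list of debug= flags.
# This table is an assumption for your build; the xnu sources are the authority.
DEBUG_FLAGS = {
    0x01: "DB_HALT",         # halt at boot and wait for a debugger
    0x02: "DB_PRT",
    0x04: "DB_NMI",
    0x08: "DB_KPRT",
    0x10: "DB_KDB",
    0x20: "DB_SLOG",
    0x40: "DB_ARP",
    0x80: "DB_KDP_BP_DIS",
    0x100: "DB_LOG_PI_SCRN",
}


def decode_debug(value):
    """Return the names of the flags set in a debug= value."""
    names = [name for bit, name in sorted(DEBUG_FLAGS.items()) if value & bit]
    # Report any bits we have no name for as a single hex remainder.
    unknown = value & ~sum(DEBUG_FLAGS)
    if unknown:
        names.append(hex(unknown))
    return names


def encode_debug(names):
    """OR together the values for the given flag names."""
    by_name = {name: bit for bit, name in DEBUG_FLAGS.items()}
    value = 0
    for name in names:
        value |= by_name[name]
    return value
```

For example, decode_debug(0x141) breaks the suggested starting value into DB_HALT, DB_ARP and DB_LOG_PI_SCRN.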

Setting up LLDB

In order to get lldb to understand the thing we’re debugging, we need to give it some configuration. If you don’t have it already, create a ~/.lldb directory to hold some lldb-specific files. Also create an empty ~/.lldbinit file if you don’t have one already.

Put the x86_64_target_definition.py you downloaded earlier in here (the file comes from the lldb source tree, under examples/python). As you go along, any other general lldb tweaks or Python scripts you develop can go in here as well. Then you can source them from your .lldbinit.

There’s a bit of build/kernel version-specific configuration, so it can’t all go in a common .lldbinit. I like to have a git repo to keep up with my various lldb init scripts, but for now, let’s assume you’re creating ~/.lldb/kernel-debugging.

The first thing is we need to tell lldb we’re debugging an x86_64 target. lldb is very flexible and can debug a variety of target architectures, even ones it’s never heard of before. The target definition file describes that architecture. Shouldn’t lldb know about x86_64 out of the box? Yes it should, and it normally does, but unfortunately the remote gdb stub we’re going to be connecting to can’t tell our debugger what architecture it’s debugging. So we tell the debugger ahead of time. Add this to your kernel-specific lldb init script:

settings set plugin.process.gdb-remote.target-definition-file ~/.lldb/x86_64_target_definition.py

Speaking of x86, you’ll probably want to set the disassembly flavor to Intel, rather than AT&T. You know. Because you’re a professional:

settings set target.x86-disassembly-flavor intel

Remember those python scripts in the dSYM bundle? Now we need to tell lldb to load them automatically (or at least whichever ones it decides are necessary).

settings set target.load-script-from-symbol-file true

The dSYMs in the KDK reference kernel source files at whatever location they were when Apple built the kernel. That’ll often be something like /BuildRoot/Library/Caches/com.apple.xbs/something/something. Of course lldb won’t be able to find kernel sources at that path (unless you put them there), so we need to tell it to translate. The following setting works for this build, but the path may be different for other kernels. Look for error messages from lldb.

settings set target.source-map  /BuildRoot/Library/Caches/com.apple.xbs/Sources/xnu/xnu-4570.71.2 /Users/zach/src/xnu-4570.71.2
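If lldb still can’t find sources, it helps to see where a given path ought to end up. Here’s a toy model of the translation: a plain prefix substitution. This is only my mental model of what the source-map setting does, not lldb’s actual implementation, and the function name is made up.

```python
def remap(path, mappings):
    """Apply the first matching (build_prefix, local_prefix) pair to path."""
    for old, new in mappings:
        old = old.rstrip("/")
        # Only match whole path components, not partial directory names.
        if path == old or path.startswith(old + "/"):
            return new.rstrip("/") + path[len(old):]
    return path  # no mapping applied


if __name__ == "__main__":
    source_map = [
        ("/BuildRoot/Library/Caches/com.apple.xbs/Sources/xnu/xnu-4570.71.2",
         "/Users/zach/src/xnu-4570.71.2"),
    ]
    print(remap(
        "/BuildRoot/Library/Caches/com.apple.xbs/Sources/xnu/xnu-4570.71.2"
        "/bsd/kern/sys_generic.c",
        source_map,
    ))
```

If the path this prints doesn’t exist on your disk, the local half of your source-map is probably pointing at the wrong directory.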

We need to have lldb load some super useful macros from the kernel dSYM. It should pick up xnu.py, but there are even more in memory.py.

command script import "/Library/Developer/KDKs/KDK_10.13.6_17G65.kdk/System/Library/Kernels/kernel.development.dSYM/Contents/Resources/Python/lldbmacros/memory.py"

There are lots of other settings you can configure in lldb, and most of them default to something reasonable. Check help settings for a list. Pay attention in particular to settings with “darwin” in the name. In some cases it can be hard to work out what possible values are available for a setting. In that case it may be easiest to consult lldb source.

At this point, your .lldb/kernel-debugging script should look something like:

#Help lldb figure out we're debugging x86_64
settings set plugin.process.gdb-remote.target-definition-file ~/.lldb/x86_64_target_definition.py

#Use a reasonable disassembly syntax
settings set target.x86-disassembly-flavor intel

#Tell lldb to load any lldb scripts and macros hidden inside .dSYM files
settings set target.load-script-from-symbol-file true

#Tell lldb where the source directory really is
settings set target.source-map  /BuildRoot/Library/Caches/com.apple.xbs/Sources/xnu/xnu-4570.71.2 /Users/zach/src/xnu-4570.71.2

#This should get loaded automatically when we set the target executable
#command script import "/Library/Developer/KDKs/KDK_10.13.6_17G65.kdk/System/Library/Kernels/kernel.development.dSYM/Contents/Resources/Python/lldbmacros/xnu.py"

#This does not appear to get loaded automatically, so we load it here.
command script import "/Library/Developer/KDKs/KDK_10.13.6_17G65.kdk/System/Library/Kernels/kernel.development.dSYM/Contents/Resources/Python/lldbmacros/memory.py"

# Load the kernel binary we're going to be debugging.
target create /Library/Developer/KDKs/KDK_10.13.6_17G65.kdk/System/Library/Kernels/kernel.development
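Because so much of that script is tied to one build, you may prefer to generate it. Here’s a sketch that fills in the version-specific parts from a few parameters. The function and parameter names are hypothetical, and the default build root is just the one that worked for this build; as noted above, it may differ for other kernels.

```python
TEMPLATE = """\
settings set plugin.process.gdb-remote.target-definition-file {target_def}
settings set target.x86-disassembly-flavor intel
settings set target.load-script-from-symbol-file true
settings set target.source-map {build_root}/{xnu} {src_root}/{xnu}
command script import "{kernel}.dSYM/Contents/Resources/Python/lldbmacros/memory.py"
target create {kernel}
"""


def kernel_debugging_script(
    os_version,
    build,
    xnu,
    src_root,
    target_def="~/.lldb/x86_64_target_definition.py",
    # Assumption: the prefix Apple built this kernel under; check lldb's errors.
    build_root="/BuildRoot/Library/Caches/com.apple.xbs/Sources/xnu",
):
    """Return the text of an lldb init script for one KDK/kernel build."""
    kernel = (
        "/Library/Developer/KDKs/KDK_{v}_{b}.kdk"
        "/System/Library/Kernels/kernel.development"
    ).format(v=os_version, b=build)
    return TEMPLATE.format(
        target_def=target_def,
        build_root=build_root,
        xnu=xnu,
        src_root=src_root.rstrip("/"),
        kernel=kernel,
    )


if __name__ == "__main__":
    print(kernel_debugging_script("10.13.6", "17G65", "xnu-4570.71.2", "/Users/zach/src"))
```

Redirect the output to ~/.lldb/kernel-debugging, or to one file per build if you juggle several kernels.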

With your VM booted, if you run lldb (with no target binary), you get your basic (lldb) prompt. Now source your kernel debugging script with:

command source ~/.lldb/kernel-debugging

and look out for any errors:

Figure: Sourcing your lldb kernel script

If you’ve set everything up right, you should be able to connect with the gdb-remote command:

(lldb) gdb-remote 8864

And you should break into the kernel, probably in the middle of an idle thread. Assuming lldb found the kernel, symbols, and source code okay, you should see a short source code snippet at your breakpoint:

Figure: Breaking into the kernel

I’m not sure what determines what thread the debugger breaks into. I suspect it’s just chance. If your machine is busy, you may break into a thread that isn’t idle and may even be executing in a kernel extension. In that case you’ll be at a place you don’t have source code for. Try setting a breakpoint on a frequently called kernel function like dofileread() and continuing.

(lldb) breakpoint set -n dofileread

If you break in the kernel proper, and not an extension, you should see source.

The VM should be frozen at this point. Try hitting c to continue running; the VM should be interactive once more. See if ^C breaks in.

To detach from the VM, do:

(lldb) c
Process 1 resuming
(lldb) detach
Process 1 detached
(lldb) target delete
1 targets deleted.
(lldb) quit

I’ve found that my lldb command history doesn’t get saved reliably [5] unless I go through the whole process of detaching, deleting the target, and quitting. Before detaching, you might want to clear any breakpoints with breakpoint delete. Detaching should clear breakpoints, but in my experience it doesn’t always, and then your target can just randomly hang.

Pretty Printing Structures

If you’re able to break on functions by name and display source code, then you should be all set to pretty print structures and other objects from kernel memory. This is especially useful for really large structures that have lots of C macros and conditional defines. Printing them in lldb gives you an easy view of how the structure is actually composed.

Here’s an easy example. Set a breakpoint on dofileread():

(lldb) breakpoint set -n dofileread                                                            
Breakpoint 1: where = kernel`dofileread + 51 at sys_generic.c:359, address = 0xffffff8015f46eb3
(lldb) c

The debugger should hit your breakpoint right away; file reads are a super common operation. When it does, you should see lldb’s view of the function prototype:

kernel`dofileread(ctx=0xffffff8ce861bf00, fp=0xffffff8028c332e8, bufp=140465093751296, nbyte=65536, offset=-1, flags=0, retval=<unavailable>)

Try pretty printing some of the function parameters with the print command. You’ll see that ctx is of type vfs_context_t (which is actually a typedef’d pointer), and fp is of type fileproc *. To print these structures, you need to have lldb interpret them as pointers and dereference them:

(lldb) print ctx
(vfs_context_t) $44 = 0xffffff8ce861bf00
(lldb) print *(vfs_context_t)ctx
(vfs_context) $45 = {
  vc_thread = 0xffffff802a029a10
  vc_ucred = 0xffffff8024d14520
}
(lldb) print fp
(fileproc *) $46 = 0xffffff8028c332e8
(lldb) print *(fileproc *)fp
(fileproc) $47 = {
  f_flags = 0
  f_iocount = 1
  f_fglob = 0xffffff802ffc9960
  f_wset = 0x0000000000000000
}

Here’s a screenshot of it in action:

Figure: Pretty printing structures from kernel memory
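If you find yourself dereferencing pointers all day, you can wrap the pattern in a small lldb command of your own. Here’s a minimal sketch using lldb’s Python scripting interface. The command name (deref) and the file name are my own; save it as something like ~/.lldb/deref_cmd.py and load it with command script import. It only looks up variables in the currently selected frame.

```python
def deref(debugger, command, result, internal_dict):
    """lldb command: print what a pointer variable in the current frame points to."""
    name = command.strip()
    frame = (
        debugger.GetSelectedTarget()
        .GetProcess()
        .GetSelectedThread()
        .GetSelectedFrame()
    )
    value = frame.FindVariable(name)
    if not value.IsValid():
        result.SetError("no variable named %r in the selected frame" % name)
        return
    # Dereference() follows the pointer; str() gives lldb's usual description.
    result.AppendMessage(str(value.Dereference()))


def __lldb_init_module(debugger, internal_dict):
    # lldb calls this when the file is loaded with "command script import".
    debugger.HandleCommand(
        "command script add -f {0}.deref deref".format(__name__)
    )
```

Once it’s loaded, deref fp at the dofileread breakpoint should show roughly the same structure as the print *(fileproc *)fp example above.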

Setting up Voltron

So we’re successfully debugging the kernel with symbols and source code. But lldb doesn’t give us much of a user interface. It’d be nice if we could see some more context like registers, the stack, disassembly at the instruction pointer, and a backtrace for the current thread. You know. The things debuggers do. Well, instead of a user interface, lldb gives you an API. I guess if I had to choose between a half-implemented user interface and a really good API, I’d choose the latter. And that’s how we got Snare’s Voltron.

If you haven’t already, grab Voltron from https://github.com/snare/voltron. You’ll be tempted to install it with pip, but don’t. Use the included install.sh shell script instead [6]. This script figures out what debuggers you have installed and what Python version they use. It also helps resolve a conflict between Voltron’s six dependency and the one installed with the system Python.

When you get done installing it, it should have appended a line similar to the following to your .lldbinit:

command script import /Users/zach/Library/Python/2.7/lib/python/site-packages/voltron/entry.py

Make sure it’s there. Also be sure the bin directory under whatever Python path it used above is added to your shell $PATH. For example:

export PATH=$PATH:$HOME/Library/Python/2.7/bin

Now, in separate terminal windows (or tmux panes or whatever), you can start up separate Voltron views. In your primary pane, start lldb as normal. Then configure your Voltron panes as you choose. From the shell run voltron view registers, for example, to have a view of the registers that gets updated at every breakpoint. Here’s the help output:

$ voltron view -h
usage: voltron view [-h]
                    {backtrace,t,bt,back,registers,r,reg,register,breakpoints,b,bp,break,command,c,cmd,memory,m,mem,disasm,d,dis,stack,s,st}
                    ...

optional arguments:
  -h, --help            show this help message and exit

views:
  valid view types

  {backtrace,t,bt,back,registers,r,reg,register,breakpoints,b,bp,break,command,c,cmd,memory,m,mem,disasm,d,dis,stack,s,st}
                        additional help
    backtrace (t,bt,back)
                        backtrace view
    registers (r,reg,register)
                        register values
    breakpoints (b,bp,break)
                        breakpoints view
    command (c,cmd)     run a command each time the debugger host stops
    memory (m,mem)      display a chunk of memory
    disasm (d,dis)      disassembly view
    stack (s,st)        display a chunk of stack memory

Here’s my setup. Apologies for the giant screenshot.

Figure: Kernel debugging with Voltron

And that’s it. You’re debugging the macOS kernel with symbols, source and Voltron.

Be sure to tweet at me any comments or corrections.


  1. You really only need the 64-bit debug stub, I believe, but I added both. ↩︎

  2. If you want to uninstall the development kernel and go back to the release kernel, you need to:

    • remove the following from /System/Library/:
      • Kernels/kernel.development
      • PrelinkedKernels/prelinkedkernel.development
      • Caches/com.apple.kext.caches/Startup/kernelcache.development
    • Invalidate your kernel cache like before
     ↩︎
  3. You’ll be tempted to enable the non-maskable interrupt, or NMI, debug flag so you can halt the kernel with a keystroke. I’ll save you the trouble. It won’t work with a VM. Your host will catch the NMI every time. I panicked my system a bunch trying to figure this out. I don’t think there’s a way to simulate the NMI in a VM. ↩︎

  4. The debug flags don’t affect VMware’s debug stub in any way. Again, the kernel doesn’t even know about it. They only configure the kernel’s own KDP debug stub. That said, they can be useful since they give you a second way to attach a debugger. For example, if the system panics between breakpoints, you often can’t introspect into the panic context from the VMware stub. But you can attach a second lldb session to the KDP stub to look at the panic. ↩︎

  5. And once you figure out various lldb incantations, you don’t want to figure them out again. So you want your command history. ↩︎

  6. He told me this firsthand. I think he keeps Voltron in PyPI just to screw with noobs. ↩︎