BinTrace - a tool for tracing binaries you don't have the source for.
07 / 11 / 2021

Over the weekend I was struggling with a rev challenge from the recent Hack.lu 2021 CTF. I ended up writing a tool that lets me easily trace a spawned process: how many times, and in which order, given addresses were executed. I made the extra effort of putting it on GitHub, writing a simple test for it and now writing this blog post. I think this is a common thing to do when reversing software, so I decided to share it and briefly go over how it works and how to use it. The cool thing about it is that it can trace any executable address, and you don't need the source of the application - it does dynamic instrumentation of the traced binary, so all the user needs are the addresses of the spots they are interested in. Addresses are given relative to the base of the binary. Working out the real address of those places in memory once the kernel has loaded the process is handled by the tool, so the user doesn't have to worry about it (it works with ASLR enabled too).
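To make the "relative to base" part a bit more concrete, here is a minimal sketch of one common way to find the load base on Linux: reading /proc/<pid>/maps and picking the mapping of the binary that starts at file offset 0. This is an assumption for illustration, not necessarily how bintrace does it, and the names maps_line_base and find_load_base are made up for this example. It also assumes the path contains no spaces and matches the string in the maps file exactly.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parse one line of /proc/<pid>/maps, e.g.
 *   555555554000-555555555000 r--p 00000000 08:01 1234 /tmp/tracee
 * Returns 1 and stores the start address if the line maps file offset 0
 * of `path`, otherwise returns 0. */
int maps_line_base(const char *line, const char *path, unsigned long *base)
{
	unsigned long start, end, offset;
	char perms[5], name[4096];

	/* device and inode are skipped; anonymous mappings have no name */
	if (sscanf(line, "%lx-%lx %4s %lx %*s %*s %4095s",
		   &start, &end, perms, &offset, name) != 5)
		return 0;
	if (offset != 0 || strcmp(name, path) != 0)
		return 0;

	*base = start;
	return 1;
}

/* Hypothetical helper: scan /proc/<pid>/maps for the mapping of `path`.
 * Returns 0 on success, -1 if the file can't be read or nothing matches. */
int find_load_base(int pid, const char *path, unsigned long *base)
{
	char file[64], line[4400];
	int found = 0;
	FILE *f;

	snprintf(file, sizeof(file), "/proc/%d/maps", pid);
	f = fopen(file, "r");
	if (!f)
		return -1;
	while (!found && fgets(line, sizeof(line), f))
		found = maps_line_base(line, path, base);
	fclose(f);

	return found ? 0 : -1;
}
```

With a base found this way, the absolute address of a traced spot is simply base + relative address, and because the base is read after the process is loaded, ASLR doesn't matter.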

The tool is rather simple. Its usage looks like this:

$ bintrace <binary path> <addr0> <addr1> ... <addrN>
          

This will run the binary found at the given path and trace the specified addresses. After the tracee exits, it prints the trace history in order, along with how many times each address was hit.

Let me show you what the output may look like with an example. Let's say we have the following program (I attach the source code in C, but it's only there for clarity; bintrace doesn't need it to work).

#include <stdio.h>

int add(int a, int b);
int sub(int a, int b);

int main(void)
{
	add(1,2);
	add(2,3);
	sub(add(3,4), sub(7,1));

	return 0;
}

int add(int a, int b)
{
	return a+b;
}

int sub(int a, int b)
{
	return a-b;
}
          

So the program has two functions - add and sub. It calls them in this order: add, add, sub, add, sub (strictly speaking, the evaluation order doesn't have to be like that for the "nested calls" used to compute the arguments, but my compiler generated this order, so we will stick with it for the purpose of this presentation). With objdump I can easily obtain the addresses of the add and sub functions (in a "real life" situation they would have been found by reverse engineering or something). In my case these addresses are: add@11bf and sub@11d3. As stated previously, bintrace handles determining the load address of the binary, so we just give it these relative addresses. The command line is:

$ bintrace tracee 0x11bf 0x11d3

Created process: 85825
Process 85825 exited with 0

Trace history dump:
0x5555555551bf: 2
0x5555555551d3: 1
0x5555555551bf: 1
0x5555555551d3: 1
          

Calls are listed in the order they occurred. The load base address for the binary was 0x555555554000 in this case, so after subtracting it we get: 0x11bf x2, 0x11d3, 0x11bf, 0x11d3. Which is what we expected, right? (add, add, sub, add, sub). If there are multiple hits in a row for a given address, the entry is "collapsed" and its counter is increased, so it takes less space.
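The "collapsing" is basically run-length encoding of the hit sequence. Below is a small sketch of how such a history could be kept. The struct layout, the fixed HISTORY_MAX size and the record_hit name are my assumptions for this example, not bintrace's actual data structures.

```c
#include <stddef.h>

#define HISTORY_MAX 1024 /* arbitrary limit chosen for this sketch */

struct trace_entry {
	unsigned long addr;  /* absolute address that was hit */
	unsigned long count; /* number of consecutive hits at addr */
};

struct trace_history {
	struct trace_entry entries[HISTORY_MAX];
	size_t len;
};

/* Record a hit at `rel` (address relative to the load base `base`).
 * A hit at the same address as the previous entry only bumps its counter.
 * Returns 0 on success, -1 if the history is full. */
int record_hit(struct trace_history *h, unsigned long base, unsigned long rel)
{
	unsigned long addr = base + rel;

	if (h->len > 0 && h->entries[h->len - 1].addr == addr) {
		h->entries[h->len - 1].count++;
		return 0;
	}
	if (h->len == HISTORY_MAX)
		return -1;

	h->entries[h->len].addr = addr;
	h->entries[h->len].count = 1;
	h->len++;
	return 0;
}
```

Feeding it the hits from the example (0x11bf, 0x11bf, 0x11d3, 0x11bf, 0x11d3 with base 0x555555554000) produces four entries, the first one with a count of 2, which matches the dump above.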

bintrace works by inserting breakpoints at the requested addresses. When the process hits one, bintrace patches it back to the original value, executes the original instruction and puts the breakpoint back so it can be hit again. A very trivial algorithm. Tracing is done with ptrace (so for now it only works on Linux, and only on x86 because I patch binaries with int3, but it could be ported to ARM in like 15 minutes or so). I might add this feature to pwndbg so it can be used more easily when pwning, I guess.
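For the curious, here is a sketch of just the byte-patching part of that algorithm, written as pure functions so it can be tried without a tracee. It assumes little-endian x86, where the lowest byte of a word read with PTRACE_PEEKTEXT is the byte at the requested address. The function names are hypothetical, and the real code in bintrace may be organized differently.

```c
#define INT3_OPCODE 0xccUL /* one-byte x86 breakpoint instruction */

/* Word to write back with PTRACE_POKETEXT to arm a breakpoint:
 * the first byte becomes int3, the remaining bytes are untouched. */
unsigned long word_with_breakpoint(unsigned long original_word)
{
	return (original_word & ~0xffUL) | INT3_OPCODE;
}

/* Word to write back to disarm the breakpoint again, given the
 * original first byte that was saved when arming it. */
unsigned long word_with_original(unsigned long patched_word,
				 unsigned char original_byte)
{
	return (patched_word & ~0xffUL) | original_byte;
}

/* When int3 traps, the instruction pointer is already one byte past it,
 * so the breakpoint address (and the value to rewind the instruction
 * pointer to) is ip - 1. */
unsigned long breakpoint_address(unsigned long ip_after_trap)
{
	return ip_after_trap - 1;
}

/* Rough order of ptrace requests around these helpers (not shown here):
 *   arm:   PTRACE_PEEKTEXT, then PTRACE_POKETEXT with word_with_breakpoint()
 *   on hit: PTRACE_GETREGS, rewind the instruction pointer,
 *           PTRACE_SETREGS, PTRACE_POKETEXT with word_with_original(),
 *           PTRACE_SINGLESTEP, re-arm, PTRACE_CONT
 */
```

Single-stepping over the restored instruction before re-arming is what lets the same address be hit (and counted) again later.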

The source is on my GitHub. It is ~450 LOC of C code, and for "testing" there is a Python 3 script. I will happily accept improvements via pull requests ;]. There are some quality-of-life features I have in mind, but they are not that crucial, and I have better things to do lmao.