June 23, 2011, 11:55 a.m.
posted by newmy
Sometimes just looking at the external behavior of a binary is not enough. We need to be able to break it down and find out what's going on internally so we can understand the true nature of the program. This is not an easy process. In the best-case scenario, you will be working in low-level C; most of the time, though, you will be working in assembly or whatever low-level language the object code was created in. Make sure you brush up on these skills before you begin this process.
In the Unix environment (and in Windows via the dumper utility in the Cygwin package), you can write out the entire memory space of an active program to a file. In Unix parlance, this is known as a core dump. Core dumps are extremely valuable because they capture what the program was doing and what it was storing at that moment. To dump a process's memory to a core file in Linux, you can use the kill command with a special signal:
$ kill -s SEGV <processid>
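If the above command doesn't produce a core file, check the environment and make sure core dumps are allowed. In Linux this is done using the ulimit command; ulimit -c unlimited, for example, lifts the size limit on core files. The limit has to be in effect for the target process itself, so set it in the shell you start the program from.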
A SIGSEGV signal tells the process that a segmentation violation has occurred. Because the default action for this signal is to terminate the process and dump core, the kernel writes the contents of the process's memory out as it dies. On Linux, the resulting core file is itself an ELF file containing a snapshot of what was stored in memory at the time of termination, so you can perform the same types of analysis on it that you would apply to any binary file. Start off with a quick strings run to see if you can locate any relevant text that may have been obfuscated in the binary on disk but decoded in memory at run time. You can then load the core dump into a debugger such as GDB together with the binary (gdb <binary> <corefile>) and examine what the program was doing safely, without letting it run again.
Objdump is the GNU Binutils disassembler. It is a very powerful tool that you can use to take an object file and break it down into its assembly instructions. In addition, if you are lucky enough to be working with a file that still has its symbols and debugging information, objdump can recover enough detail to help you re-create parts of the C code by hand. Let's take a look at some of objdump's options in detail:
$ objdump <options> <filename>

The options discussed below are --demangle[=style] (-C), --debugging (-g), --disassemble (-d), --source (-S), and --info (-i).
When a binary is compiled from C++ source, the names of the functions are changed, or "mangled." This is an artifact of the way C++ works: you can have two completely different functions with exactly the same name, as long as they live in different namespaces or classes or take different parameters. To keep them apart, the compiler encodes the namespace, class, and parameter types into the symbol name; under the Itanium C++ ABI used by GCC, for example, foo::bar(int) becomes _ZN3foo3barEi. Unfortunately, this makes it very difficult for a human looking at the code after the fact to understand what in the world is going on. The --demangle (-C) option is the best way to get the names back into a human-readable form that you can make sense of; the optional style argument selects which compiler's mangling scheme to decode.
If a binary was compiled by GCC with the -g flag, it contains special debugging metadata (in the STABS or DWARF format) describing its functions, variables, and types. Running the strip command removes this metadata, but as long as it is still present, --debugging (-g) makes objdump parse it and print what it can using a C-like syntax. What you get back is declarations rather than the program's logic, so this feature is by no means a slam dunk, but it does offer a great place to start if you need to reconstruct the source by hand.
In accordance with the von Neumann architecture, a program's instructions and data are stored in the same memory, and in a binary the two are intertwined. For a human looking at the raw bytes, determining which is which is darn near impossible. Using --disassemble (-d), objdump relies on the file's section information to work out which parts hold instructions and which hold data, and only disassembles the parts that hold instructions. This can be extraordinarily useful for paring large executables down into manageable code segments.
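If you are lucky enough to have a binary that hasn't been stripped, was built with debugging information, and whose original source files are still available on your machine, --source (-S) will interleave the source lines with the disassembly, showing you which instructions came from which line of code. The source itself is not stored in the binary; objdump reads it from the file paths recorded in the debugging information. The usefulness of this feature can't be overstated; if you can tie the machine code back to the original source, you'll have everything you need to perform an accurate and complete analysis.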
You aren't always going to be dealing with executables for well-known platforms. The day may come when you are handed an RS/6000 running AIX that has been rooted. The --info (-i) flag lists all the object formats and architectures that your build of objdump can handle. This is useful because, as long as your copy supports the target, you won't have to re-create an entire environment to analyze a binary from some exotic platform. You can do it from the comfort of your Linux box.
If you are interested in a commercial application that can help your reverse engineering efforts, I cannot recommend IDA Pro enough. It automates many of the tasks I have discussed previously and adds others, such as creating graphs of the program's dependencies and execution flow. The software has a signature database that can help identify common library functions in the program, letting you fence off those areas of the code and focus on what is unique to the binary. In addition, it supports a wide variety of binary formats, including ELF (the Linux format) and Win32 PE executables.
As mentioned earlier, GDB can be an extremely effective tool for determining postmortem what an application was doing. If you can create a core file, GDB will allow you to navigate the file and poke through the memory contents. When running the program under GDB, you can also set breakpoints or catchpoints (such as catch syscall) on calls like open and connect to see which files and network sockets the application accesses, and when. The one downside to GDB, however, is its steep learning curve. It is a program that is now over 20 years old, and it shows in the sheer breadth of functionality it holds. There is a fairly large time commitment required to learn its ins and outs, but this curve can be mitigated somewhat by using an external GUI such as DDD to help organize what you are trying to do.