reverse_engineering | 4 min read | 2026-05-26

Binary Analysis Fundamentals

How to approach an unknown binary, recognize x86-64 disassembly patterns, and build a mental model of the code.

Tags: reverse-engineering, x86, disassembly, analysis

Starting with the binary

When you open an unknown binary in a disassembler, it is tempting to start reading instructions immediately. Resist that and look at the metadata first.

# Binary format
file target_binary

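# Linked libraries (ldd can run code from the target, so use it only on trusted files)
ldd target_binary

# Safer alternative for untrusted files
readelf -d target_binary | grep NEEDED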

# Embedded strings
strings target_binary | grep -i "error\|fail\|password\|key\|flag"

# Dynamic symbols (imports and exports)
nm -D target_binary

# Section headers
readelf -S target_binary

Strings alone can tell you a lot. Error messages reveal what the program does. Library function names tell you what APIs it uses. Format strings show you the shape of internal data.
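
The strings-and-grep step above can also be scripted, which helps when you want to triage many files at once. The following is a minimal sketch, assuming plain ASCII strings and the same keyword list as the grep pattern; the function names extract_strings and interesting_strings are made up for this example.

```python
import re

# Keywords mirror the grep pattern used above; extend as needed.
KEYWORDS = re.compile(rb"error|fail|password|key|flag", re.IGNORECASE)


def extract_strings(data: bytes, min_len: int = 4) -> list[bytes]:
    """Return runs of printable ASCII that are at least min_len bytes long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return re.findall(pattern, data)


def interesting_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Keep only the strings that match one of the triage keywords."""
    return [
        s.decode("ascii")
        for s in extract_strings(data, min_len)
        if KEYWORDS.search(s)
    ]
```

To try it on a file, read the bytes with open(path, "rb").read() and pass them to interesting_strings. This sketch only handles ASCII, so wide (UTF-16) strings will be missed.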

Recognizing function prologues

Most functions start with a prologue that sets up the stack frame. Optimized builds often omit the frame pointer, so expect variations:

// Standard x86-64 prologue
push rbp
mov rbp, rsp
sub rsp, 0x30        // allocate 48 bytes for local variables

// No frame pointer (common in optimized builds)
sub rsp, 0x18        // stack adjustment only; leaf functions may skip even this and use the 128-byte red zone below rsp

// With canary (stack protection)
push rbp
mov rbp, rsp
sub rsp, 0x30
mov rax, fs:[0x28]   // load stack canary
mov [rbp-0x8], rax   // store on stack

The stack canary pattern (fs:[0x28]) strongly suggests the binary was compiled with -fstack-protector or one of its variants. The stored copy is compared against fs:[0x28] again before the function returns. If a buffer overflow overwrote it, the function calls __stack_chk_fail and the program aborts.
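
If you have raw code bytes (for example, a dumped .text section), you can search for these patterns directly. The sketch below assumes the common GCC/Clang encodings of the instructions shown above; the helper names are hypothetical. A byte scan like this can produce false positives and will miss alternative encodings or prologues that begin with endbr64, so treat the results as hints.

```python
# Byte encodings of the instructions shown above (x86-64).
PUSH_RBP_MOV_RBP_RSP = bytes.fromhex("554889e5")    # push rbp; mov rbp, rsp
LOAD_CANARY = bytes.fromhex("64488b042528000000")   # mov rax, fs:[0x28]


def find_all(code: bytes, pattern: bytes) -> list[int]:
    """Return every offset at which pattern occurs in code."""
    offsets = []
    pos = code.find(pattern)
    while pos != -1:
        offsets.append(pos)
        pos = code.find(pattern, pos + 1)
    return offsets


def summarize_prologues(code: bytes, window: int = 32) -> list[tuple[int, bool]]:
    """Pair each frame-pointer prologue offset with a flag that is True
    when a canary load appears within `window` bytes of it."""
    results = []
    for off in find_all(code, PUSH_RBP_MOV_RBP_RSP):
        nearby = code[off:off + window]
        results.append((off, LOAD_CANARY in nearby))
    return results
```

The fixed window is a crude stand-in for real function boundaries. A disassembler's function list gives better results when it is available.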

Calling conventions

On x86-64 Linux (System V ABI), function arguments go in registers:

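Argument   Integer/Pointer   Float
1st        RDI               XMM0
2nd        RSI               XMM1
3rd        RDX               XMM2
4th        RCX               XMM3
5th        R8                XMM4
6th        R9                XMM5
7th        Stack             XMM6
8th        Stack             XMM7
9th+       Stack             Stack
Return     RAX               XMM0

Integer and floating-point arguments are counted separately, so each column fills independently.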

So when you see:

mov edi, 0x10         // first arg: 16
call malloc           // malloc(16)
mov [rbp-0x18], rax   // store returned pointer in local variable

You can read it as malloc(16): it allocates 16 bytes and stores the returned pointer in a local variable.
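
When annotating a call site, it helps to work out which register each argument should be in. Here is a small sketch of the System V assignment for simple scalar arguments. It assumes six integer registers and eight vector registers (xmm0 through xmm7) drawn from separate pools, and it ignores structs, long double, and variadic calls. The function name assign_args is invented for illustration.

```python
INT_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]
FLOAT_REGS = [f"xmm{i}" for i in range(8)]  # xmm0..xmm7


def assign_args(kinds: list[str]) -> list[str]:
    """Map each argument kind ("int" or "float") to a register or "stack".

    Integer/pointer and floating-point arguments draw from separate pools,
    so a float that is third overall can still land in xmm0.
    """
    int_iter, float_iter = iter(INT_REGS), iter(FLOAT_REGS)
    locations = []
    for kind in kinds:
        if kind not in ("int", "float"):
            raise ValueError(f"unsupported argument kind: {kind!r}")
        pool = int_iter if kind == "int" else float_iter
        # Once a pool is exhausted, further arguments of that kind go on the stack.
        locations.append(next(pool, "stack"))
    return locations
```

For a hypothetical signature like f(char *buf, double scale, int n), you would call assign_args(["int", "float", "int"]) and then look for writes to those registers just before the call.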

Common patterns

If-else:

cmp eax, 5
jne .else_branch
// then-block code
jmp .after_if
.else_branch:
// else-block code
.after_if:

For loop:

mov ecx, 0            // i = 0
.loop_start:
cmp ecx, 100          // i < 100
jge .loop_end
// loop body
inc ecx               // i++
jmp .loop_start
.loop_end:

Switch statement (jump table):

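cmp eax, 7            // bounds check: valid cases are 0 through 7
ja .default_case      // unsigned compare, so negative values also go to default
lea rdx, [rip + .jump_table]          // rdx = address of the table
movsxd rax, dword ptr [rdx + rax*4]   // load the signed 32-bit offset for this case
add rax, rdx          // target = table address + offset
jmp rax               // jump to the case handler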
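
When the disassembler does not recover the switch for you, the case targets can be computed by hand from the table bytes. The following sketch assumes the table layout shown above: little-endian signed 32-bit offsets relative to the table's own address. The function name and the addresses in the usage note are illustrative.

```python
import struct


def resolve_jump_table(table_bytes: bytes, table_addr: int, num_cases: int) -> list[int]:
    """Decode num_cases signed 32-bit offsets and add the table address to each."""
    offsets = struct.unpack_from(f"<{num_cases}i", table_bytes)
    # Mask to 64 bits to mirror the wraparound of `add rax, rdx`.
    return [(table_addr + off) & 0xFFFFFFFFFFFFFFFF for off in offsets]
```

For the snippet above, num_cases would be 8, because the bounds check accepts indices 0 through 7. Other compilers and targets use absolute pointers or differently sized entries, so check the load instruction before reusing this.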

Virtual function call (C++ vtable):

mov rax, [rdi]        // load vtable pointer from object
call [rax + 0x18]     // call 4th virtual function (index 3)
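
To find out which function a virtual call reaches, you can follow the same two loads against a memory dump. This is a sketch under the assumptions of 8-byte pointers, a vtable pointer at offset 0 of the object, and a dump that starts at a known base address. The helper names are hypothetical.

```python
import struct


def read_u64(mem: bytes, base: int, addr: int) -> int:
    """Read a little-endian 64-bit value at virtual address addr."""
    return struct.unpack_from("<Q", mem, addr - base)[0]


def resolve_virtual_call(mem: bytes, base: int, obj_addr: int, call_offset: int) -> int:
    """Follow object -> vtable -> slot, as the two instructions above do."""
    vtable = read_u64(mem, base, obj_addr)            # mov rax, [rdi]
    return read_u64(mem, base, vtable + call_offset)  # call [rax + call_offset]
```

The slot index is call_offset divided by the pointer size, which is how 0x18 maps to index 3 in the example above.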

Working with stripped binaries

Most release binaries are stripped, meaning the static symbol table and debug information are removed. Internal function names are gone and you see only addresses, although dynamic symbols for imported library functions remain. IDA Pro and Ghidra will auto-detect function boundaries and let you rename functions as you figure out what they do.

The approach:

  1. Find main by looking at the entry point: _start calls __libc_start_main with main as its first argument (in RDI)
  2. Look for strings referenced by functions to guess their purpose
  3. Look for library calls (printf, malloc, open, socket) to understand the high-level behavior
  4. Name functions as you understand them: check_password, parse_config, send_response
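
Step 1 starts from the entry point address, which is stored in the ELF header. The sketch below reads it directly, assuming a little-endian 64-bit ELF file; the function name elf64_entry_point is made up for this example, and readelf -h reports the same value.

```python
import struct

ELF_MAGIC = b"\x7fELF"


def elf64_entry_point(header: bytes) -> int:
    """Return e_entry from a little-endian ELF64 header."""
    if len(header) < 0x20 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    if header[4] != 2:  # EI_CLASS: 2 means ELFCLASS64
        raise ValueError("only 64-bit ELF is handled in this sketch")
    if header[5] != 1:  # EI_DATA: 1 means little-endian
        raise ValueError("only little-endian ELF is handled in this sketch")
    # e_entry follows e_ident (16 bytes), e_type, e_machine, and e_version.
    (entry,) = struct.unpack_from("<Q", header, 0x18)
    return entry
```

Pass it the first 64 bytes of the file. For position-independent executables the value is relative to the load base, so add the base address when working in a debugger.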

Dealing with optimization

Compiler optimizations can make disassembly harder to read. Common transformations:

  • Strength reduction: multiplication by constants becomes shifts and adds (x * 5 becomes lea eax, [rax + rax*4])
  • Inlining: small functions are copied into callers, so you won't find them as separate functions
  • Loop unrolling: the loop body is duplicated 2-4x to reduce branch overhead
  • Tail call optimization: the last call in a function becomes a jmp, so the caller's stack frame is reused and the caller is missing from backtraces

At -O0, the disassembly closely matches the source code. At -O2 or -O3, the compiler rearranges code for performance and the mapping to source becomes less obvious. Start with -O0 builds when learning, then work up to optimized binaries.
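
To convince yourself that a strength-reduced sequence really is the multiplication you suspect, you can model it and compare. This is a small sketch for the x * 5 example from the list above, assuming 32-bit wraparound to match the eax destination. The function names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def times5_lea(x: int) -> int:
    """Model `lea eax, [rax + rax*4]`: x + x*4, truncated to 32 bits."""
    return (x + (x << 2)) & MASK32


def times5_mul(x: int) -> int:
    """The source-level operation the compiler replaced."""
    return (x * 5) & MASK32
```

Comparing the two over a handful of edge values (0, 1, 0x7FFFFFFF, 0xFFFFFFFF) is usually enough to confirm a guess before you rename the operation in your notes.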

Tools of the trade

  • Ghidra: Free, open source, excellent decompiler. Start here.
  • IDA Pro: Industry standard, expensive, best database format for collaborative RE (my preferred tool)
  • Binary Ninja: Good middle ground, nice API for scripting analysis
  • radare2/rizin: Command line, free, steep learning curve but powerful
  • GDB/LLDB: Dynamic analysis, breakpoints, memory inspection at runtime
  • strace/ltrace: Trace system calls and library calls without disassembling anything
  • x64dbg (x32dbg for 32-bit targets): Free Windows debugger. If IDA Pro isn't available to you, I'd use this.

[Illustration: a glitchy blue-screen themed image of retro computers. Caption: Sometimes the machine tells you the story sideways.]