On a side note, I laughed when I looked back at that other article, because it starts the same way: "oh, hey, a friend is about to learn x86 assembly, so I thought I would write this quick article!" So I guess the lesson here is: Parents, talk to your kids about assembly language... or else their friends will! }:-)
Without further ado, I'll get this thing moving with...
mov
Much of reverse engineering entails following the flow of data backward and forward as it moves through registers and memory. The mov instruction is the most commonly used instruction and the instruction you'll most often have to read to know where data is going.
lea
lea stands for load effective address. The lea instruction is supposed to give you a pointer to something rather than dereferencing the pointer and giving you the actual data. In reality, though, it just computes the sum or other expression in the square brackets and stores the result in the destination register, without touching memory. Take, for example, the following instruction:
lea eax, [ebp-218h]
The eax register in this case will receive ebp minus 0x218, which is the address of some local variable. Compare this with:
mov eax, [ebp-218h]
Which actually dereferences ebp-0x218 to retrieve the contents of that local variable in the function stack frame and puts that value into eax.
Since the lea instruction in all reality just computes the value of the expression in the brackets, it can also be used to evaluate complex expressions involving multiplication and addition. If you see some values that can't possibly be addresses getting used with the lea instruction, you might be right. The program may be merely computing a value rather than working with memory addresses.
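To make the arithmetic concrete, here is a minimal Python sketch of the computation lea performs. The function name and the register values are made up for illustration, and it only models the 32-bit base + index*scale + displacement form.

```python
MASK32 = 0xFFFFFFFF  # keep results within 32 bits, as a 32-bit register would


def effective_address(base=0, index=0, scale=1, disp=0):
    """Model of a 32-bit effective address: base + index*scale + disp."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 only encodes scale factors of 1, 2, 4, or 8")
    return (base + index * scale + disp) & MASK32


# lea eax, [ebp-218h], with a hypothetical ebp value
ebp = 0x0012FF80
eax = effective_address(base=ebp, disp=-0x218)
print(hex(eax))  # address of the local variable, nothing is read from memory

# lea eax, [ecx+ecx*4] is plain arithmetic: ecx * 5
ecx = 7
print(effective_address(base=ecx, index=ecx, scale=4))
```

The second call is the "can't possibly be an address" case: the result is just a multiplication by five that happens to be written in address syntax.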
push
Pushes its operand onto the stack, usually to pass an argument to a function call.
Some compilers will also emit code to push an immediate operand (a constant value, e.g. 0) and then pop it to a register, like this:
push 4 ; Put the number 4 on the stack
pop eax ; The number 4 winds up in eax
call
The processor pushes the address of the next instruction and transfers control to a procedure of the programmer's choosing. This is kind of equivalent to lines 3-5 below:
1 push arg2 ; Push function arguments as normal
2 push arg1
3 push offset L_nextinstr ; Save the address of the next instruction on the stack
4 jmp procedure ; Transfer control
5 L_nextinstr:
6 test eax, eax ; Resume normal stuff like checking return value
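The "push the return address, then jump" idea can be modeled in a few lines of Python. This is only a toy sketch: the TinyCpu class and both addresses are invented for illustration, and it tracks nothing but an instruction pointer and a stack.

```python
class TinyCpu:
    """Toy model with just an instruction pointer and a stack."""

    def __init__(self):
        self.eip = 0
        self.stack = []

    def push(self, value):
        self.stack.append(value)

    def pop(self):
        return self.stack.pop()

    def call(self, target, next_instr):
        self.push(next_instr)  # save the address of the next instruction
        self.eip = target      # then transfer control, exactly like jmp

    def ret(self):
        self.eip = self.pop()  # resume at the saved address


cpu = TinyCpu()
cpu.push(2)  # arg2
cpu.push(1)  # arg1
cpu.call(target=0x401000, next_instr=0x400105)  # hypothetical addresses
assert cpu.stack[-1] == 0x400105  # return address sits on top of the arguments
cpu.ret()
print(hex(cpu.eip))
```

Note that after ret the two arguments are still on the stack. Who removes them depends on the calling convention.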
jmp
This is another way to transfer control, usually within a procedure, but sometimes to a procedure.
retn N
When you see a return instruction followed by a number, the function is cleaning its own arguments off the stack, and the number is how many bytes of arguments it discards. That usually means the function is stdcall (the standard calling convention for Microsoft Win32 APIs).
add
I mention this instruction now because it is used in the other calling convention, cdecl, to efficiently forget about function parameters pushed on the stack:
add esp, 8
Obviously its usual use is plain arithmetic, but when it is used with the stack pointer as above, you know the preceding function call was to a cdecl function.
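As a rough illustration of the difference between the two conventions, the following Python sketch traces the stack pointer through a single call. The function name and the starting esp value are hypothetical, and every argument is assumed to be 4 bytes.

```python
def trace_esp(convention, arg_count, esp=0x0012FF00):
    """Return (step, esp) pairs for one call with 4-byte arguments.

    The starting esp is an arbitrary, made-up value.
    """
    if convention not in ("cdecl", "stdcall"):
        raise ValueError("unknown convention: %r" % convention)
    arg_bytes = 4 * arg_count
    steps = []
    esp -= arg_bytes
    steps.append(("caller pushes arguments", esp))
    esp -= 4
    steps.append(("call pushes return address", esp))
    if convention == "stdcall":
        # The callee pops the return address and the arguments in one go
        esp += 4 + arg_bytes
        steps.append(("callee: retn %d" % arg_bytes, esp))
    else:
        # The callee pops only the return address; the caller cleans up
        esp += 4
        steps.append(("callee: retn", esp))
        esp += arg_bytes
        steps.append(("caller: add esp, %d" % arg_bytes, esp))
    return steps


for convention in ("cdecl", "stdcall"):
    print(convention)
    for step, esp in trace_esp(convention, 2):
        print("  %-28s esp = %#010x" % (step, esp))
```

Both traces end with esp back where it started. The only difference is which side of the call does the cleanup, which is exactly what the retn N and add esp, N patterns tell you.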
cmp
Compare two operands: Subtract the second operand from the first operand and set EFLAGS as if this were an arithmetic subtraction instruction.
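A small Python sketch of that subtraction may help when reading the conditional jumps that follow a cmp. It models only four of the flags for 32-bit operands, and the function name is made up for this example.

```python
MASK32 = 0xFFFFFFFF
SIGN32 = 0x80000000


def cmp_flags(a, b):
    """Flags that cmp a, b would set for 32-bit operands (ZF, SF, CF, OF only)."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32  # the difference is computed, then thrown away
    return {
        "ZF": result == 0,                            # operands are equal
        "SF": bool(result & SIGN32),                  # result looks negative
        "CF": a < b,                                  # unsigned borrow
        "OF": bool((a ^ b) & (a ^ result) & SIGN32),  # signed overflow
    }


print(cmp_flags(5, 5))
print(cmp_flags(1, 2))
print(cmp_flags(0x80000000, 1))
```

Unsigned jumps such as jb and ja look at CF and ZF, while signed jumps such as jl and jg look at SF, OF, and ZF, which is why both CF and OF are worth tracking.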
test
Logical comparison. From Intel's manual: "Computes the bit-wise logical AND of first operand... and the second operand... and sets [EFLAGS accordingly]."
More
If you're unsure about what an instruction does, RTFM: http://www.intel.com/products/processor/manuals/
Intel's manuals are the definitive guide to how Intel's processors parse and execute instructions. They are organized as follows:
Volume 1: Basic Architecture
Volume 2: Instruction Set Reference
Volume 3: System Programming Guide
If you wonder about a particular instruction, you'll find it in volume 2 (Instruction Set Reference). If you want to learn about the x86 execution environment, volume 1 (Basic Architecture) is your friend. And if you're writing a bootloader, an operating system, or a hypervisor, volume 3 (System Programming Guide) is for you.
Misc
If you're interested in tabulating the most common instructions using IDAPython, here is a snippet.
from collections import defaultdict

import idautils  # Functions, Chunks, Heads
import idc       # print_insn_mnem (named GetMnem in older IDAPython releases)


def _for_each_instr(callback, outputs=None, parms=None):
    """Do <callback> for each instruction.

    Call callback() providing fva, chunk start va, instr addr, and outputs/
    parameters.
    """
    for fva in idautils.Functions():
        for (va_start, va_end) in idautils.Chunks(fva):
            for head in idautils.Heads(va_start, va_end):
                callback(fva, va_start, head, outputs, parms)


def enum_mnemonics():
    """Return (mnemonic, count) pairs, most frequent first."""
    mnems = defaultdict(int)

    def enum_mnemonics_callback(fva, chunkva, head, unused1, unused2):
        mnems[idc.print_insn_mnem(head)] += 1

    _for_each_instr(enum_mnemonics_callback)
    # Sort by count, highest first
    return sorted(mnems.items(), key=lambda kv: kv[1], reverse=True)