x86-64 (Intel) Assembly - NASM & MASM
Assembly can be used to write code for x86 CPUs, such as those made by Intel and AMD, which are common in desktop computers. Other CPU families that are often found in mobile devices, such as ARM, use an entirely different assembly language with different constraints and features.
x86 32-bit Assembly
Basic Concepts
- Endianness
Endianness is a concept in computer memory architecture that describes the order in which bytes are stored.
Big endian storage means the most significant byte (MSB) of data is stored at the lowest memory address, and the least significant byte (LSB) at the highest. For example, in big endian format, the 4-byte number 0x11223344 is stored as the bytes 0x11, 0x22, 0x33, 0x44, starting with 0x11 at the lowest address. Little endian storage does the opposite, placing the LSB at the lowest memory address and the MSB at the highest, so the same number is stored as 0x44, 0x33, 0x22, 0x11. x86 CPUs use little endian format, while big endian is the conventional byte order for network protocols.
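As a minimal sketch of the little endian layout described above, the following 32-bit Linux NASM program reads back only the byte at the lowest address of a stored dword. The label `value` and the use of the exit status to show the result are assumptions made for illustration.

```nasm
; Build (assumed): nasm -f elf32 endian.asm && ld -m elf_i386 endian.o -o endian
global _start

section .data
value:  dd 0x11223344           ; laid out in memory as 44 33 22 11

section .text
_start:
    movzx ebx, byte [value]     ; byte at the lowest address is the LSB: EBX = 0x44
    mov   eax, 1                ; sys_exit (32-bit Linux)
    int   0x80                  ; exit status = 0x44 (68 decimal)
```

Running the program and then `echo $?` should show the byte that was found at the lowest address.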
- Signedness
In computer memory, we often use 4 bytes to store numbers (the typical size of an int in C). If we only need non-negative numbers, we treat the value as "unsigned", and the range is 0 to 4,294,967,295. If we also need negative numbers, we treat it as "signed": the range is -2,147,483,648 to 2,147,483,647, and the most significant bit indicates whether the number is negative.
Negative numbers are stored using two's complement. To convert a positive number to its negative equivalent, we invert all the bits and add 1. For example, 42 (0x0000002a) becomes -42 by inverting the bits (0xffffffd5) and adding 1, resulting in 0xffffffd6.
Alternatively, we can think of it as a circular counter. Counting down from 0 wraps around, so -1 is 0xffffffff. For -2, we subtract 1 again and get 0xfffffffe, and so on.
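The invert-and-add-one rule can be sketched in a few instructions. This is an illustrative 32-bit Linux NASM program, not part of the original notes; it uses NOT, ADD and NEG, which are described later in this document.

```nasm
; Build (assumed): nasm -f elf32 neg.asm && ld -m elf_i386 neg.o -o neg
global _start

section .text
_start:
    mov eax, 42         ; EAX = 0x0000002a
    not eax             ; invert all bits: EAX = 0xffffffd5
    add eax, 1          ; add 1:           EAX = 0xffffffd6 (-42)

    mov ecx, 42
    neg ecx             ; NEG performs both steps at once: ECX = 0xffffffd6

    mov eax, 1          ; sys_exit
    xor ebx, ebx        ; exit status 0
    int 0x80
```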
- CPU registers
Because accessing memory (RAM) is typically a slow process for a CPU, processors always contain a number of registers, which are small storage locations inside the processor, where data can be accessed very quickly.
In 32-bit x86 processors, registers can store 32 bits (4 bytes). The Extended Instruction Pointer (EIP) always points to the memory location of the next CPU instruction to be executed.
There are 8 general-purpose registers, such as EAX, EBX, ECX, and EDX. The lower parts of these four registers can also be accessed under their own names: AX, BX, CX, and DX are the low 16 bits, and each of those is split again into a high byte (AH, BH, CH, DH) and a low byte (AL, BL, CL, DL).
EAX = 0x11223344 can be broken down as follows:
EAX = 0x11223344
AX = 0x3344
AH = 0x33
AL = 0x44
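The breakdown above can be reproduced with a short sketch. It is a hypothetical 32-bit Linux NASM program; the values in the comments follow directly from the 0x11223344 example.

```nasm
; Build (assumed): nasm -f elf32 subreg.asm && ld -m elf_i386 subreg.o -o subreg
global _start

section .text
_start:
    mov eax, 0x11223344
    mov bx, ax          ; BX = 0x3344 (low 16 bits of EAX)
    mov cl, ah          ; CL = 0x33   (high byte of AX)
    mov dl, al          ; DL = 0x44   (low byte of AX)
    mov al, 0xff        ; writing AL changes only the lowest byte: EAX = 0x112233ff

    mov eax, 1          ; sys_exit
    xor ebx, ebx
    int 0x80
```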
The general-purpose registers and their conventional uses are the following (ESP and EBP are covered in the Stack Memory section):
EAX (accumulator): Arithmetical and logical instructions
EBX (base): Base pointer for memory addresses
ECX (counter): Loop, shift, and rotation counter
EDX (data): I/O port addressing, multiplication, and division
ESI (source index): Pointer addressing of data and source in string copy operations
EDI (destination index): Pointer addressing of data and destination in string copy operations
There's also the EFLAGS register, which holds special flags like Carry flag (CF), Parity flag (PF), Zero flag (ZF), Sign flag (SF), and Overflow flag (OF) to indicate the status of certain operations in the program.
Segment registers are another group of registers but are not covered in this explanation.
- Stack Memory
The stack is a part of the computer's memory (RAM). It's used by the CPU to temporarily store data and addresses while executing programs. The stack is a region of memory reserved for managing function calls, storing local variables, and maintaining the execution context of a program.
The "stack pointer" (ESP) is a special register that keeps track of the most recently used location on the stack (the top of the stack) by storing a pointer to it.
The stack grows towards lower memory addresses, so adding data to the stack makes ESP point to a lower address, and removing data makes it point to a higher one.
Think of the stack as a Last In First Out (LIFO) store, where the last thing you put in is the first thing you take out.
Data in assembly language is often referred to by its size:
"byte" is 1 byte.
"word" is 2 bytes.
"dword" (double word) is 4 bytes.
"qword" (quad word) is 8 bytes.
Since the stack is in constant flux during the execution of a thread, it can become difficult for a function to locate its stack frame, which stores the required arguments, local variables, and the return address. EBP, the base pointer, solves this by providing a fixed reference point: it is set to the value of ESP when the function is entered, so arguments and local variables can be found at constant offsets from EBP.
- Heap Memory
The heap is also a part of the computer's memory (RAM). However, unlike the stack, which is used for managing function calls and storing local variables with a LIFO (Last In First Out) structure, the heap is a region of memory used for dynamic memory allocation. It's called the "heap" because it's like a pool of memory where programs can request and release memory blocks as needed.
In the heap, you can allocate and deallocate memory at any time, and the memory is not managed in a strict order like the stack. It's up to the programmer to manage memory allocation and deallocation in the heap. Common programming languages provide functions or mechanisms (e.g., malloc and free in C/C++) to interact with the heap, allowing you to request and release memory dynamically during program execution.
Common Assembly Instructions
- nop
NOP means "No Operation." When the CPU executes NOP, it does nothing. Its opcode is 0x90, but we don't usually need to know opcodes. In 32-bit code, NOP is actually a shorthand for the XCHG EAX, EAX instruction. This instruction swaps the value of the EAX register with itself, so it has no effect.
- xchg
XCHG swaps the values of two registers (or of a register and a memory location). For example, XCHG ECX, EAX would swap the values in the ECX and EAX registers.
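A small sketch of NOP and XCHG, written as an illustrative 32-bit Linux NASM program:

```nasm
; Build (assumed): nasm -f elf32 xchg.asm && ld -m elf_i386 xchg.o -o xchg
global _start

section .text
_start:
    mov  eax, 1
    mov  ecx, 2
    xchg ecx, eax       ; now EAX = 2 and ECX = 1
    nop                 ; does nothing
    xchg eax, eax       ; also does nothing: EAX is swapped with itself

    mov  eax, 1         ; sys_exit
    xor  ebx, ebx
    int  0x80
```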
- mov
The MOV instruction can be used to load a value into a register or a memory location. It allows the following combinations:
Load an immediate value into a register or memory address
Load a value from one register to another register
Load a value from a memory location to a register and vice-versa
! We can't copy a value directly from one memory location to another. In general, an instruction cannot take two memory operands, so the value has to pass through a register.
Example:
In assembly language, square brackets [] are utilized to indicate indirect memory access, for example:
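The MOV combinations listed above, together with the square-bracket notation, might look as follows. This is a sketch for 32-bit Linux NASM; the variable name `var` is hypothetical.

```nasm
; Build (assumed): nasm -f elf32 mov.asm && ld -m elf_i386 mov.o -o mov
global _start

section .data
var:    dd 0x12345678

section .text
_start:
    mov eax, 0x10           ; immediate -> register
    mov ebx, eax            ; register  -> register
    mov ecx, var            ; no brackets: ECX = the address of var
    mov edx, [var]          ; brackets: EDX = the value stored at var (0x12345678)
    mov [var], eax          ; register  -> memory: var = 0x10
    mov dword [var], 0x20   ; immediate -> memory (the size must be stated)
    mov esi, [ecx]          ; indirect access through a register: ESI = 0x20
    ; "mov [var], [ecx]" would be rejected: memory-to-memory is not allowed

    mov eax, 1              ; sys_exit
    xor ebx, ebx
    int 0x80
```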
- lea
The Load Effective Address (LEA) instruction is similar to MOV, but it doesn't dereference the memory address. It only loads the address itself.
Other example:
LEA is typically used to calculate a memory address and store it in a register. However, it can also be used to simply perform arithmetic operations.
The lea instruction can be thought of as the equivalent of the C/C++ "address-of" operator &:
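The difference between MOV and LEA, and the use of LEA for plain arithmetic, can be sketched like this (32-bit Linux NASM, with a hypothetical `array` label):

```nasm
; Build (assumed): nasm -f elf32 lea.asm && ld -m elf_i386 lea.o -o lea
global _start

section .data
array:  dd 10, 20, 30, 40

section .text
_start:
    mov ebx, array
    mov eax, [ebx + 8]      ; MOV dereferences: EAX = 30 (third element)
    lea ecx, [ebx + 8]      ; LEA only computes: ECX = array + 8, like &array[2] in C
    mov edx, 5
    lea edx, [edx + edx*4]  ; arithmetic only: EDX = 5 + 5*4 = 25

    mov eax, 1              ; sys_exit
    xor ebx, ebx
    int 0x80
```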
- push & pop
The PUSH and POP instructions will directly operate on the stack memory. By convention, PUSH and POP will work on the memory at the top of the stack, pointed to by ESP.
The PUSH instruction puts a value on the top of the stack, which results in two things. First, ESP is decreased by four bytes (as the stack grows towards lower memory addresses). Second, the value we specified is stored at the memory address ESP now points to.
For example, if ESP points to the memory address 0x7f000020 at the beginning, a PUSH lowers ESP to 0x7f00001c and stores the pushed value at that address.
The POP instruction does the reverse of PUSH. It takes four bytes from the stack, puts it into the specified register, and then increases the value of ESP by four, causing the stack to shrink.
To summarize, we can use the PUSH instruction to store data on the stack (on the memory ESP points to), and we can use POP to retrieve data from the stack.
For example, push eax would push the value of the eax register onto the stack and pop ebx would pop the value at the top of the stack into the ebx register.
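A minimal sketch of PUSH and POP working together (32-bit Linux NASM). The comments describe the relative movement of ESP; the actual address in ESP depends on the system and is not assumed here.

```nasm
; Build (assumed): nasm -f elf32 stack.asm && ld -m elf_i386 stack.o -o stack
global _start

section .text
_start:
    mov  eax, 0x11223344
    push eax            ; ESP -= 4, then [ESP] = 0x11223344
    push dword 0x55     ; ESP -= 4, then [ESP] = 0x00000055
    pop  ebx            ; EBX = 0x55, ESP += 4 (last in, first out)
    pop  ecx            ; ECX = 0x11223344, ESP is back at its starting value

    mov  eax, 1         ; sys_exit
    xor  ebx, ebx
    int  0x80
```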
There is also the leave instruction, which is used to clean up before exiting a subroutine or function. When executed, it first moves the value of the base pointer register (EBP) to the stack pointer register (ESP). It then pops the saved value of the base pointer register from the stack, restoring it to its previous value. This instruction is often used to free the space reserved on the stack by the function, such as local variables or shadow space. Essentially, the leave instruction performs the same task as mov esp, ebp followed by pop ebp.
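To show where leave fits, here is a sketch of a function that builds a stack frame with EBP and tears it down again. The function name `_myfunction` and the single local variable are illustrative assumptions (32-bit Linux NASM).

```nasm
; Build (assumed): nasm -f elf32 frame.asm && ld -m elf_i386 frame.o -o frame
global _start

section .text
_myfunction:
    push ebp                ; save the caller's base pointer
    mov  ebp, esp           ; EBP now marks the start of this frame
    sub  esp, 8             ; reserve room for two dword locals
    mov  dword [ebp-4], 5   ; first local variable
    mov  eax, [ebp-4]       ; locals are addressed relative to EBP
    leave                   ; same as: mov esp, ebp / pop ebp
    ret

_start:
    call _myfunction        ; returns with EAX = 5
    mov  ebx, eax
    mov  eax, 1             ; sys_exit
    int  0x80               ; exit status 5
```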
- inc & dec
The INC instruction increases the value of a register, or a value found at a memory address, by one.
The DEC instruction will decrease a value by one. It's the pair of INC and can be used the same way.
- add & sub
ADD and SUB are used to add and subtract two values and store the result in the first operand.
The destination of the calculation is always the register or memory location specified on the left (the left operand).
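The INC/DEC and ADD/SUB descriptions above can be sketched together. This is an illustrative 32-bit Linux NASM program with a hypothetical `counter` variable.

```nasm
; Build (assumed): nasm -f elf32 arith.asm && ld -m elf_i386 arith.o -o arith
global _start

section .data
counter: dd 10

section .text
_start:
    mov eax, 5
    inc eax                 ; EAX = 6
    dec eax                 ; EAX = 5
    inc dword [counter]     ; memory operand needs a size: counter = 11

    mov ebx, 3
    add eax, ebx            ; left operand is the destination: EAX = 8
    sub eax, 2              ; EAX = 6
    add [counter], eax      ; destination can be memory: counter = 17

    mov eax, 1              ; sys_exit
    xor ebx, ebx
    int 0x80
```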
- mul & div
MUL multiplies the value found in EAX by the multiplier we specify and stores the result in EDX:EAX.
This is because the result might not fit into EAX, so another register is also used. EDX will store the high-order bits, while EAX stores the low-order bits.
! If the multiplier is not 32 bits wide (a dword or a whole register) but smaller, such as a byte or a sub-register like BX, different registers are used: a byte multiplier is multiplied by AL with the result in AX, and a word multiplier is multiplied by AX with the result in DX:AX.
DIV works like MUL in reverse and stores both the quotient and the remainder:
dword --> EDX:EAX / divisor = EAX (quotient), EDX (remainder)
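A sketch of the 32-bit forms of MUL and DIV (illustrative 32-bit Linux NASM). Note that EDX is cleared before DIV because it is the upper half of the dividend.

```nasm
; Build (assumed): nasm -f elf32 muldiv.asm && ld -m elf_i386 muldiv.o -o muldiv
global _start

section .text
_start:
    mov eax, 0x80000000
    mov ecx, 4
    mul ecx             ; EDX:EAX = 0x2_00000000, so EDX = 2 and EAX = 0

    mov edx, 0          ; upper 32 bits of the dividend
    mov eax, 17         ; lower 32 bits of the dividend
    mov ecx, 5
    div ecx             ; 17 / 5: EAX = 3 (quotient), EDX = 2 (remainder)

    mov eax, 1          ; sys_exit
    xor ebx, ebx
    int 0x80
```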
- Bitwise Logic Operations
Assembly has an instruction for each of the four basic logic operations: AND, OR, XOR, and NOT. All of them take two operands, except NOT, which takes one.
These operations can be used with registers, memory addresses, or immediate values, except that both operands cannot be memory locations. For NOT, you can invert either a memory location or a register. The result is always stored in the first operand.
Examples:
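The four logic operations might be used as in the following sketch (32-bit Linux NASM; the binary literals are chosen only to make the results easy to read).

```nasm
; Build (assumed): nasm -f elf32 logic.asm && ld -m elf_i386 logic.o -o logic
global _start

section .text
_start:
    mov eax, 0b1100
    and eax, 0b1010     ; EAX = 0b1000
    mov ebx, 0b1100
    or  ebx, 0b1010     ; EBX = 0b1110
    mov ecx, 0b1100
    xor ecx, 0b1010     ; ECX = 0b0110
    xor edx, edx        ; common idiom for zeroing a register: EDX = 0
    not edx             ; invert every bit: EDX = 0xffffffff

    mov eax, 1          ; sys_exit
    xor ebx, ebx
    int 0x80
```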
- jmp (Control Flow)
JMP stands for jump. This instruction will direct the instruction pointer (EIP) to a new memory location where execution will continue.
There are two main types of jumps in assembly language:
Relative Jump: This jump is based on a relative offset from the current value of the instruction pointer (EIP). In 32-bit code, the offset is encoded as an 8-bit value (short jump) or a 32-bit value (near jump). This type of jump moves to a new location based on how far it is from the current spot. It's like saying, "Jump 10 steps forward from where you are." In source code, you usually write "JMP label," and the assembler calculates how many bytes to jump.
Absolute Jump: This jump goes directly to a specific place in memory. It's like having a map with an address, and you say, "Go to this exact address." In code, you might see "JMP EAX," where EAX holds the address you want to go to.
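Both kinds of jump can be sketched in one small program. The labels `skip` and `target` are hypothetical (32-bit Linux NASM).

```nasm
; Build (assumed): nasm -f elf32 jmp.asm && ld -m elf_i386 jmp.o -o jmp
global _start

section .text
_start:
    jmp short skip      ; relative jump: encoded as an offset from the next instruction
    mov ebx, 1          ; skipped
skip:
    mov eax, target     ; EAX = address of the label
    jmp eax             ; absolute jump to the address held in EAX
    mov ebx, 2          ; skipped
target:
    mov ebx, 0          ; exit status 0
    mov eax, 1          ; sys_exit
    int 0x80
```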
- call & ret (Control Flow)
CALL is an instruction in assembly language that allows you to jump to another part of your code, like a function, while also saving the memory address of the instruction right after the CALL on the stack.
For example, if you have a CALL instruction in your code to go to a function, it will save the address of the instruction right after the CALL on the stack. This is called the return address.
When the function is done executing, you can use the RET instruction to go back to where you were before the CALL. RET takes the address from the top of the stack (the return address) and sets the instruction pointer (EIP) to that address. It also increases the value of the stack pointer (ESP).
So, CALL is like a jump with a bonus feature that saves where you should return to, and RET is used to go back to that saved location after the function is done.
For example, suppose the '_myfunction' function is located at 0x08049000 and the instruction following the CALL is at 0x0804900b. The CALL instruction changes EIP to 0x08049000 and saves 0x0804900b (the return address) on the stack, so ESP is decreased by four bytes. When '_myfunction' is done executing, it reaches the RET instruction. RET takes the address from the top of the stack (0x0804900b) and sets EIP to it, effectively returning to the instruction right after the original CALL. It also increases ESP by four, since the saved address has been removed from the stack.
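A sketch of the CALL/RET pair described above (32-bit Linux NASM). The actual addresses depend on the assembler and linker, so the ones in the text should be read as an illustration only.

```nasm
; Build (assumed): nasm -f elf32 call.asm && ld -m elf_i386 call.o -o call
global _start

section .text
_myfunction:
    mov eax, 1234       ; function body
    ret                 ; pop the return address into EIP, ESP += 4

_start:
    call _myfunction    ; push the address of the next instruction, then jump
    mov ebx, 0          ; <- the return address points here
    mov eax, 1          ; sys_exit
    int 0x80
```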
TEST instruction:
TEST performs a bitwise AND operation between two values, which can be registers, memory addresses, or immediate values. Both operands can't be memory locations at the same time, so we can't test two memory addresses directly.
It does not store the numeric result but sets flags in the EFLAGS register.
The CF (carry flag) and OF (overflow flag) are always set to zero.
The ZF (zero flag) is set to "1" if the result is zero; otherwise, it's set to "0."
TEST is commonly used to check if a value is zero by testing it against itself (for example, TEST EAX, EAX). The AND of a value with itself is zero only when the value itself is zero, in which case ZF becomes "1."
Conditional jumps like JZ (jump if zero) and JNZ (jump if non-zero) depend on the ZF flag.
JZ jumps if ZF is "1," while JNZ jumps if ZF is "0."
Example of a loop:
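A loop built from TEST and JNZ might look like the sketch below (32-bit Linux NASM, hypothetical label `again`). DEC already updates ZF on its own, so the TEST is kept here only to illustrate the idiom.

```nasm
; Build (assumed): nasm -f elf32 loop.asm && ld -m elf_i386 loop.o -o loop
global _start

section .text
_start:
    mov  ecx, 5         ; loop counter
    xor  ebx, ebx       ; EBX = 0, counts the iterations
again:
    inc  ebx            ; loop body
    dec  ecx
    test ecx, ecx       ; ZF = 1 only when ECX is zero
    jnz  again          ; repeat while ECX != 0

    mov  eax, 1         ; sys_exit
    int  0x80           ; exit status = EBX = 5
```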
CMP instruction:
CMP performs a subtraction operation between the two values but does not store the result anywhere. It does this by subtracting the numbers in the background and discarding the result.
The two values can be registers, memory addresses, or immediate values. Both operands can't be memory locations at the same time, so we can't cmp two memory addresses directly.
If the two numbers are equal, the ZF is set to 1.
CMP also updates the other status flags: CF indicates a borrow when the operands are treated as unsigned numbers, while OF and SF (the sign flag, which holds the sign of the result) together describe the outcome for signed numbers.
Once the comparison is done, we can use plenty of Jxx instructions to make a decision (https://web.itu.edu.tr/kesgin/mul06/intel/instr/jxx.html).
Example:
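A sketch of CMP followed by a conditional jump (32-bit Linux NASM). The labels `less` and `done` are hypothetical; JL is the signed "jump if less" instruction.

```nasm
; Build (assumed): nasm -f elf32 cmp.asm && ld -m elf_i386 cmp.o -o cmp
global _start

section .text
_start:
    mov eax, 7
    cmp eax, 10         ; computes 7 - 10 and sets the flags, EAX is unchanged
    jl  less            ; signed "less than": taken, because 7 < 10
    mov ebx, 1          ; reached only when EAX >= 10
    jmp done
less:
    mov ebx, 2
done:
    mov eax, 1          ; sys_exit
    int 0x80            ; exit status = EBX = 2
```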
x86 64-bit Assembly
The x64 Architecture
- Backward Compatibility
The x64 architecture is mostly backward-compatible with x86. This means that you can run x86 code on an x64 processor, and it will work in compatibility mode as if it were running on an x86 CPU. Almost all the instructions you learned for x86 are also available in 64-bit code.
- Larger Address Space
One of the most significant improvements in x64 architecture is the ability to address memory using 64 bits. In contrast to 32-bit CPUs, which have a limited addressable memory space, x64 processors can access and utilize a much larger address space. This allows for handling vast amounts of memory, which is crucial for modern computing tasks.
- 64-bit Data Processing
x64 processors can perform calculations on 64-bit data, known as a quadword, in a single instruction. This means larger values can be processed without splitting them across two 32-bit registers.
- Register Changes
https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/x64-architecture
Instruction Pointer (EIP): In x64, the EIP register is renamed to RIP and is extended to 64 bits in size. It always holds a 64-bit memory address since x64 operates in a 64-bit address space. The concept of EIP is no longer applicable in the x64 architecture.
EFLAGS (Flags Register): The EFLAGS register in x86 is extended and renamed to RFLAGS in x64.
General-Purpose Registers: All the general-purpose registers in x64 are extended to 64 bits in length. Here's how they transform:
EAX becomes RAX
EBX becomes RBX
ECX becomes RCX
EDX becomes RDX
ESI becomes RSI
EDI becomes RDI
ESP becomes RSP
EBP becomes RBP
New Registers: In addition to the above changes, x64 introduces eight new 64-bit size general-purpose registers, named R8 through R15. These new registers provide additional resources for more complex calculations and data manipulation.
Accessing Lower Parts: You can still use the lower 32 bits of the general-purpose registers (e.g., EAX, EBX) as 32-bit value placeholders. Additionally, you can access the lower parts of the new 64-bit registers directly. For instance, R8D for the lower 32 bits, R8W for the lower 16 bits, and R8B for the lowest 8 bits. However, there is no direct way to access the second-lowest byte of these registers.
Zeroing Behavior: When an instruction writes to the lower 32 bits of a general-purpose register (like EAX), the upper 32 bits are always zeroed out in x64. This ensures that the upper part of the register does not affect calculations. However, when writing to the lower 16- or 8-bit parts, this zeroing behavior does not apply and the rest of the register is left unchanged, as in x86.
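The zeroing rule can be seen in a short sketch (64-bit Linux NASM). The register values in the comments follow from the rule described above.

```nasm
; Build (assumed): nasm -f elf64 zero.asm && ld zero.o -o zero
global _start

section .text
_start:
    mov rax, 0x1122334455667788
    mov eax, 0xaabbccdd         ; 32-bit write: RAX = 0x00000000aabbccdd
    mov rbx, 0x1122334455667788
    mov bx, 0xffff              ; 16-bit write: RBX = 0x112233445566ffff
    mov r8, 0x1122334455667788
    mov r8b, 0                  ; 8-bit write:  R8  = 0x1122334455667700

    mov eax, 60                 ; sys_exit (64-bit Linux)
    xor edi, edi                ; exit status 0
    syscall
```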
New in x64 Assembly
In x64 assembly, most of the assembly instructions you've learned for x86 still work, but they have been extended to work with new 64-bit registers and 64-bit memory addressing. Here's a brief explanation with code snippets:
- MOV Instruction for 64-bit Precision
In x64, you can use the MOV instruction to move a 64-bit value from a register into a memory address. For example, MOV [RAX], RCX will move the 64-bit value stored in RCX into the memory address pointed to by RAX, where the memory address size is 64-bit.
MOV [RAX], RCX
- MOV for 64-bit Immediate Values
The MOV instruction is the only one that can work with 64-bit immediate values, and only when the destination is a register. Immediate values written directly to memory are limited to 32 bits. For instance, to move the value 0x1122334455667788 into RAX, you can use:
MOV RAX, 0x1122334455667788
- PUSH for 32-bit Size Values
The PUSH instruction in x64 can only take immediate values of up to 32 bits, which are sign-extended to 64 bits. If you want to put a 64-bit value onto the stack, you first need to move it into a register or memory address and then use PUSH on that register or memory operand. Each PUSH or POP of a 64-bit operand moves RSP by eight bytes.
To further understand how push and pop operate in x64:
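A sketch of PUSH and POP in 64-bit code, assuming Linux and NASM:

```nasm
; Build (assumed): nasm -f elf64 push64.asm && ld push64.o -o push64
global _start

section .text
_start:
    push 0x11223344             ; 32-bit immediate, sign-extended; RSP -= 8
    mov  rax, 0x1122334455667788
    push rax                    ; full 64-bit value through a register; RSP -= 8
    pop  rbx                    ; RBX = 0x1122334455667788, RSP += 8
    pop  rcx                    ; RCX = 0x0000000011223344, RSP += 8

    mov  eax, 60                ; sys_exit
    xor  edi, edi
    syscall
```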
- RIP Relative Addressing
In x64, you can use the instruction pointer, RIP, to address a memory location. This is called RIP-relative addressing and was not possible in x86. For example, in x86, the instruction MOV EAX, [EIP+0x10] would be invalid, but in x64, you can do the following:
MOV RAX, [RIP+0x10]
Here, we move a value into RAX based on the current value of RIP. This is an example of RIP-relative addressing, which has no x86 equivalent.
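In NASM source code, RIP-relative addressing is usually requested with the `rel` keyword and a label, and the assembler works out the displacement. A minimal sketch, with a hypothetical `value` label:

```nasm
; Build (assumed): nasm -f elf64 riprel.asm && ld riprel.o -o riprel
global _start

section .data
value:  dq 0x1122334455667788

section .text
_start:
    mov rax, [rel value]    ; encoded as [RIP + displacement]; RAX = 0x1122334455667788
    lea rsi, [rel value]    ; RSI = address of value, computed relative to RIP

    mov eax, 60             ; sys_exit
    xor edi, edi
    syscall
```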
- Assembly File Structure
The structure of your assembly file remains the same between x86 and x64 architectures. You still need a .text section and the _start label defined.
Shifting, Rotation, and Memory Setting Assembly Instructions
- SAL and SAR:
SAL (Shift Arithmetic Left) and SAR (Shift Arithmetic Right) shift the bits of a value in a register or memory location to the left or right by a specified number of bits, respectively. SAL is equivalent to multiplication by 2^n, and SAR is equivalent to signed division by 2^n (rounding towards negative infinity), where n is the number of bits shifted.
- SHL and SHR
SHL (Shift Logical Left) is another name for SAL, while SHR (Shift Logical Right) is slightly different from SAR. SAR preserves the sign bit (most significant bit), making it suitable for signed operations, while SHR fills with zeros from the left.
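The difference between the arithmetic and the logical right shift shows up with negative values, as in this sketch (64-bit Linux NASM, using 32-bit registers):

```nasm
; Build (assumed): nasm -f elf64 shift.asm && ld shift.o -o shift
global _start

section .text
_start:
    mov eax, 3
    shl eax, 2          ; 3 * 2^2: EAX = 12
    mov ebx, -16        ; 0xfffffff0
    sar ebx, 2          ; sign bit preserved: EBX = 0xfffffffc (-4)
    mov ecx, -16
    shr ecx, 2          ; zeros shifted in from the left: ECX = 0x3ffffffc

    mov eax, 60         ; sys_exit
    xor edi, edi
    syscall
```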
- Rotate Left (ROL) and Rotate Right (ROR)
These instructions are very similar to the previously mentioned shift instructions. Both move bits in a register or memory location left or right; however, in this case, the bits don't "fall out" of the destination, but rotate as the name suggests.
During a left rotation, the most significant bit is moved into the place of the least significant bit.
Each bit is moved left, and the one "exiting" at the left returns on the right.
In case of right rotation, it works the same way, but in the opposite direction.
Similarly to shifting, we can rotate either a register or memory location.
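A sketch of ROL and ROR (64-bit Linux NASM, using 32-bit registers). The rotation count can be an immediate value or the CL register.

```nasm
; Build (assumed): nasm -f elf64 rotate.asm && ld rotate.o -o rotate
global _start

section .text
_start:
    mov eax, 0x80000001
    rol eax, 1          ; MSB wraps around to bit 0: EAX = 0x00000003
    mov ebx, 0x80000001
    ror ebx, 1          ; LSB wraps around to the MSB: EBX = 0xc0000000
    mov edx, 0x11223344
    mov cl, 8
    rol edx, cl         ; rotate by one byte: EDX = 0x22334411

    mov eax, 60         ; sys_exit
    xor edi, edi
    syscall
```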
- Store String (STOSx)
STOSx is an instruction for storing a byte, word, dword, or qword in a memory location. "x" indicates the data size and can be either "B", "W", "D", or "Q", and the value to store is taken from AL, AX, EAX, or RAX respectively. RDI holds the target memory address, and after storage, RDI is incremented by the number of bytes stored (or decremented, if the direction flag DF is set).
- REP STOS
REP is a prefix used with instructions like STOS to repeat them "n" times, with the number of repetitions specified in the RCX register. It's often used in operations like memset to set a memory range to a specific value.
- Example
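As a sketch of REP STOS used like memset, the following 64-bit Linux NASM program fills a 16-byte buffer. The buffer name and fill value are assumptions made for illustration.

```nasm
; Build (assumed): nasm -f elf64 stos.asm && ld stos.o -o stos
global _start

section .bss
buffer: resb 16

section .text
_start:
    cld                     ; DF = 0, so RDI counts upwards
    lea rdi, [rel buffer]   ; destination address
    mov al, 0x41            ; byte to store ('A')
    mov rcx, 16             ; number of repetitions
    rep stosb               ; comparable to memset(buffer, 'A', 16)
    ; afterwards RDI = buffer + 16 and RCX = 0

    mov eax, 60             ; sys_exit
    xor edi, edi
    syscall
```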
Hello World Program
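A minimal Hello World sketch for 64-bit Linux, written for NASM. It assumes the Linux system call numbers 1 (write) and 60 (exit) and can be built with the 64-bit commands from the Assemblers section.

```nasm
; Build (assumed): nasm -f elf64 test64.asm && ld test64.o -o test64
global _start

section .data
msg:    db "Hello, World!", 10      ; text followed by a newline
len     equ $ - msg                 ; length of the message in bytes

section .text
_start:
    mov eax, 1              ; sys_write
    mov edi, 1              ; file descriptor 1 = stdout
    lea rsi, [rel msg]      ; address of the text
    mov edx, len            ; number of bytes to write
    syscall

    mov eax, 60             ; sys_exit
    xor edi, edi            ; exit status 0
    syscall
```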
Assemblers
NASM
NASM (Netwide Assembler) is a cross-platform assembler for the x86 and x86-64 architectures. It is not tied to a specific operating system: it runs on multiple platforms and can produce several object file formats, such as ELF, Mach-O, and Windows COFF.
To assemble and link a 32-bit program:
nasm -f elf32 nop.asm
ld -m elf_i386 nop.o -o nop
To assemble and link a 64-bit program:
nasm -f elf64 test64.asm
ld test64.o -o test64
MASM
MASM, or Microsoft Macro Assembler, is an assembler that uses the Intel syntax. It is part of the Visual Studio suite and can assemble assembly files inside a Visual Studio project.
In a VS project, click Source > Add item > program.asm, then right-click the item > Properties > select Item Type MASM
- Program Structure
- Variable Declaration
To declare variables in MASM assembly, one must specify them within the data section of the file, which is created using the .data directive.
VarName directive VarValue
Where VarName is the variable name to be declared and directive is one of MASM's data declaration directives listed below:
word - Unsigned 16-bit value (word).
sword - Signed 16-bit integer value.
dword - Unsigned 32-bit value (double word).
sdword - Signed 32-bit integer value.
qword - Unsigned 64-bit value (quad word).
sqword - Signed 64-bit integer value.
oword - 128-bit value (octal word).
tbyte - Unsigned 80-bit value.
real4 - 32-bit floating point value.
real8 - 64-bit floating point value.
real10 - 80-bit floating point value.
byte - Unsigned 8-bit value.
sbyte - Signed 8-bit integer value.
Then, VarValue represents the variable's value.
Example:
It's possible to initialize a variable with a hexadecimal value using the h suffix:
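A sketch of variable declarations in 64-bit MASM, including a hexadecimal initializer. All variable names are hypothetical, and `main` is assumed to be the program's entry point (for example, linked with the C runtime or selected with the linker's /ENTRY option).

```masm
.data
MyByte   byte   255
MySigned sdword -42
MyDword  dword  1000
MyHex    dword  0FFh                ; hex value: h suffix, and a leading 0 before a letter digit
MyQword  qword  1122334455667788h
MyReal   real8  3.14

.code
main proc
    mov eax, MyDword        ; EAX = 1000
    add eax, MyHex          ; EAX = 1000 + 255 = 1255
    ret
main endp
end
```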
- String Declaration
One can declare a string using the byte MASM directive:
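! When a string is written in quotes, the byte directive stores the character code of each character, so it is unnecessary to include the h suffix. Numeric values placed next to the string, such as a terminating 0 or a line feed (0Ah), follow the usual rules.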
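A sketch of string declarations with the byte directive (64-bit MASM). The variable names are hypothetical, and `main` is assumed to be the entry point.

```masm
.data
MyString byte "Hello, MASM!", 0                 ; one byte per character, 0 terminates the string
TwoLines byte "Line 1", 0Dh, 0Ah, "Line 2", 0   ; numeric bytes mixed with text

.code
main proc
    lea rax, MyString
    mov cl, byte ptr [rax]  ; CL = 'H' (48h), the first character
    xor eax, eax            ; return 0
    ret
main endp
end
```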
- Example of call in MASM
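A minimal sketch of call and ret between two MASM procedures. The procedure name `Helper` is hypothetical, and `main` is assumed to be the entry point.

```masm
.code
Helper proc
    mov eax, 7          ; value handed back to the caller
    ret                 ; return to the instruction after the call
Helper endp

main proc
    call Helper         ; EAX = 7 when control comes back
    ret
main endp
end
```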
- Example of jmp in MASM
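A minimal sketch of an unconditional jump in MASM. The label `Skip` is hypothetical, and `main` is assumed to be the entry point.

```masm
.code
main proc
    mov eax, 1
    jmp Skip            ; continue at the label below
    mov eax, 2          ; never executed
Skip:
    ret                 ; EAX is still 1
main endp
end
```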
- See Assembly of C code
In Visual Studio, click on Debug > Windows > Disassembly (available while debugging)
- Memory Access Specifiers
In MASM assembly language, memory access specifiers are used to indicate the size and the type of data being accessed in memory. These specifiers act like type-casting in a programming language.
Quadword Pointer - qword ptr:
Doubleword Pointer - dword ptr:
Byte Pointer - byte ptr:
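The three specifiers can be sketched on a single variable (64-bit MASM). The variable name `Value` is hypothetical and `main` is assumed to be the entry point; the partial reads follow from the little endian layout.

```masm
.data
Value qword 1122334455667788h

.code
main proc
    lea r8, Value
    mov rax, qword ptr [r8]     ; RAX = 1122334455667788h
    mov ecx, dword ptr [r8]     ; ECX = 55667788h (lowest four bytes)
    mov dl,  byte ptr [r8]      ; DL  = 88h (lowest byte)
    mov byte ptr [r8], 0        ; size is required here: Value = 1122334455667700h
    xor eax, eax                ; return 0
    ret
main endp
end
```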
- Calling Functions
Calling functions in assembly can happen in one of the following scenarios:
1. The first scenario involves calling an assembly function from another assembly function. This is done using the call instruction to jump to the callee, with the ret instruction used to return to the caller.
2. The second scenario involves calling an assembly function from C. To import an assembly function to a C file, the function prototype should be defined in the C file with the extern keyword. This informs the compiler that the function is already defined in another file, such as an .asm file.
Example:
3. The third scenario involves calling a C function from within an assembly file. To do this, the assembly code must first declare the C function using the externdef directive. This tells the MASM assembler that the symbol (i.e. function) is defined in another module. The externdef directive has the following syntax:
externdef symbol_name:type ; symbol_name is the name of the function to be defined, and type specifies the type of the function
Example:
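The second and third scenarios can be sketched in one MASM file. The procedure name `AsmSayHello` and the C prototype in the comment are hypothetical; `puts` is the C runtime function, and the stack adjustment follows the shadow space rules described later in this document.

```masm
; C side (assumed): extern int AsmSayHello(void);
; In a C++ file the prototype would additionally need extern "C".

externdef puts:proc         ; C function defined in another module

.data
Msg byte "Hello from MASM", 0

.code
AsmSayHello proc
    sub  rsp, 40            ; shadow space for puts, keeps RSP 16-byte aligned
    lea  rcx, Msg           ; first parameter: pointer to the string
    call puts
    add  rsp, 40
    ret                     ; the return value of puts is left in EAX
AsmSayHello endp
end
```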
- Passing Parameters
Once you have created an assembly procedure (function) and want to call it, you need to pass the arguments to it.
Register Parameters: In the Microsoft x64 calling convention, the first four parameters are passed through registers RCX, RDX, R8, and R9, respectively. These are known as register parameters because they are passed in CPU registers.
Stack Parameters: If a procedure requires more than four parameters, the additional ones are placed on the stack. These are known as stack parameters. It's important to note that the stack must be 16-byte aligned when the call is made.
Stack Parameter Offset: The fifth parameter (the 5th procedure parameter) is located at a specific offset from the rsp register, and the exact offset depends on the function's calling convention. For example, on entry to the callee, the fifth parameter is usually located at [rsp + 40], which is 40 bytes beyond the current stack pointer: 8 bytes are taken by the function's return address, and 32 bytes are reserved on the stack for the first four parameters (8 bytes each).
Example:
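A sketch of a procedure that reads four register parameters and one stack parameter (64-bit MASM). The procedure name `SumFive` and the C prototype in the comment are hypothetical.

```masm
; C side (assumed):
; extern long long SumFive(long long a, long long b, long long c, long long d, long long e);

.code
SumFive proc
    mov rax, rcx                    ; 1st parameter
    add rax, rdx                    ; 2nd parameter
    add rax, r8                     ; 3rd parameter
    add rax, r9                     ; 4th parameter
    add rax, qword ptr [rsp+40]     ; 5th parameter: above the return address (8) and 32 reserved bytes
    ret                             ; sum is returned in RAX
SumFive endp
end
```

Because the procedure does not change RSP before reading the fifth parameter, the offset of 40 bytes applies directly.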
- Returning Value
When a 64-bit MASM procedure returns a value, it passes the output in the RAX register. Before executing the ret instruction, the procedure places the value in RAX, which is where the caller expects to find it.
The following AddtwoNumbers procedure takes two parameters and returns their sum:
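The original listing is not available here, so the following is a minimal sketch of what such a procedure could look like, assuming two 64-bit integer parameters passed in RCX and RDX:

```masm
; C side (assumed): extern long long AddtwoNumbers(long long a, long long b);

.code
AddtwoNumbers proc
    mov rax, rcx        ; first parameter
    add rax, rdx        ; add the second parameter
    ret                 ; the sum is returned in RAX
AddtwoNumbers endp
end
```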
- Shadow Space
Even in cases where a procedure requires four or fewer parameters, there should be 32 bytes reserved on the stack for these parameters.
This reserved part of memory is called the shadow space and is subtracted from the stack pointer register (RSP) at the start of the caller function.
In the Microsoft calling convention, when calling a procedure, the caller is responsible for allocating the shadow space for its callee. Ideally, the caller should also allocate an additional 8 bytes, even if the callee takes no parameters. These 8 bytes correspond in size to the return address of the callee; because the CALL instruction pushes the return address itself, their practical role is to keep RSP 16-byte aligned at the call.
The reserved area is not always just 40 bytes (the size of the register parameters plus the return address); it also has to include 8 bytes for every stack parameter passed to the callee. It is worth mentioning that reserving shadow space is a complicated matter that depends on the calling convention of the functions and interacts with another topic known as stack alignment.
The size of the shadow space can often be determined using the following rule:
Shadow Space Size = 32 + 8 + [8 * (number of stack parameters)]
- Preserving RSP's Value
Additionally, before subtracting the size of the shadow space from the stack pointer register, one should preserve its value so that it can be restored later and the stack stays aligned. This is usually done using two instructions: push rbp, followed by mov rbp, rsp.
Together, these two instructions set up a standard stack frame for a subroutine in 64-bit assembly. Typically, after these two instructions are executed, a sub rsp, <shadow space> instruction will follow, which reserves memory for the shadow space.
Then, before the procedure returns, one should undo the changes done to the base pointer and the stack pointer registers, rbp and rsp respectively, with mov rsp, rbp followed by pop rbp.
! Using this method, while executing the procedure's instructions, the rbp register should be left untouched because it holds the original value of the stack pointer. If that value is lost, RSP cannot be restored correctly, and the program may crash when the procedure returns.
- Example of function with Shadow Space reserved
The following AsmCallFunction procedure, calls printf with five different parameters, allocating shadow space for it:
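The original listing is not available here, so this is a sketch under stated assumptions: the format string and the four integer values are made up for illustration, and the reserved size follows the rule above with one stack parameter (32 + 8 + 8 = 48). Depending on the toolchain version, linking printf from MASM may require an additional library such as legacy_stdio_definitions.lib.

```masm
externdef printf:proc               ; C function defined in another module

.data
Fmt byte "%d %d %d %d", 10, 0       ; hypothetical format string, 10 = newline

.code
AsmCallFunction proc
    push rbp                        ; preserve the caller's base pointer
    mov  rbp, rsp                   ; remember the original stack pointer
    sub  rsp, 48                    ; 32 + 8 + 8 * 1 stack parameter

    lea  rcx, Fmt                   ; 1st parameter: format string
    mov  edx, 1                     ; 2nd parameter
    mov  r8d, 2                     ; 3rd parameter
    mov  r9d, 3                     ; 4th parameter
    mov  qword ptr [rsp+32], 4      ; 5th parameter: just above the 32 reserved bytes
    call printf

    mov  rsp, rbp                   ; release the reserved stack space
    pop  rbp                        ; restore the caller's base pointer
    ret
AsmCallFunction endp
end
```

The caller stores the fifth parameter at [rsp+32]; once CALL has pushed the 8-byte return address, the callee sees the same slot at [rsp+40].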