x86-64 (Intel) Assembly - NASM & MASM

http://www.jegerlehner.ch/intel/IntelCodeTable.pdf
https://www.felixcloutier.com/x86/
https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html

Assembly can be used to write code for x86 CPUs, such as those made by Intel and AMD, which are common in desktop computers. Other CPU architectures, such as ARM (often found in mobile devices), use an entirely different assembly language with different constraints and features.

x86 32-bit Assembly

Basic Concepts

- Endianness

Endianness is a concept in computer memory architecture that describes the order in which bytes are stored.

Big endian storage means the most significant byte (MSB) of data is stored at the lowest memory address, and the least significant byte (LSB) at the highest. For example, in big endian format, the 4-byte number 0x11223344 is stored starting with 0x11 at the lowest address. Little endian storage does the opposite, placing the LSB at the lowest memory address and the MSB at the highest, so 0x11223344 would start with 0x44 at the lowest address. x86 CPUs use little endian format, while big endian is the conventional byte order for network protocols (network byte order).
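The two byte orders can be sketched in C as follows. This is only an illustration: the helper names store_le32 and store_be32 are made up for this example, and the bytes are written explicitly so the result does not depend on the machine running the code.

```c
#include <stdint.h>

/* Little endian: the lowest address (out[0]) receives the LSB. */
void store_le32(uint8_t out[4], uint32_t value) {
    out[0] = (uint8_t)(value & 0xFFu);
    out[1] = (uint8_t)((value >> 8) & 0xFFu);
    out[2] = (uint8_t)((value >> 16) & 0xFFu);
    out[3] = (uint8_t)((value >> 24) & 0xFFu);
}

/* Big endian (network byte order): the lowest address receives the MSB. */
void store_be32(uint8_t out[4], uint32_t value) {
    out[0] = (uint8_t)((value >> 24) & 0xFFu);
    out[1] = (uint8_t)((value >> 16) & 0xFFu);
    out[2] = (uint8_t)((value >> 8) & 0xFFu);
    out[3] = (uint8_t)(value & 0xFFu);
}
```

Storing 0x11223344 with store_le32 yields the bytes 44 33 22 11, which is the layout an x86 CPU uses in memory; store_be32 yields 11 22 33 44.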

- Signedness

In computer memory, we usually use 4 bytes to store numbers (which is the typical size of an int in C). If we only want to store non-negative numbers, we treat the value as "unsigned", and the range is from 0 to 4,294,967,295. If we also need negative numbers, we treat the value as "signed": the most significant bit indicates whether the number is negative, and the range is from -2,147,483,648 to 2,147,483,647.

Negative numbers are stored using two's complement. To convert a positive number to its negative equivalent, we invert all the bits and add 1. For example, 42 becomes -42 by inverting the bits and adding 1, resulting in 0xffffffd6 in hexadecimal.

Alternatively, we can think of it as a circular counter. Counting down from 0 wraps around to the largest bit pattern, so -1 is 0xffffffff. For -2, we subtract 1 again and get 0xfffffffe, and so on.
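The "invert and add 1" rule can be checked with a few lines of C. This is a minimal sketch; the function name twos_complement_negate is made up for illustration, and unsigned arithmetic is used so that the wrap-around is well defined.

```c
#include <stdint.h>

/* Negate a 32-bit value the way the text describes:
   invert all bits, then add 1 (wrapping at 32 bits). */
uint32_t twos_complement_negate(uint32_t value) {
    return (uint32_t)(~value + 1u);
}
```

Under these assumptions, twos_complement_negate(42) gives 0xffffffd6, and negating 1 and 2 gives 0xffffffff and 0xfffffffe, matching the circular-counter view.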

- CPU registers

Because accessing memory (RAM) is typically a slow process for a CPU, processors always contain a number of registers, which are small storage locations inside the processor, where data can be accessed very quickly.

In 32-bit x86 processors, registers can store 32 bits (4 bytes). The Extended Instruction Pointer (EIP) always holds the memory address of the next CPU instruction to be executed.

There are 8 general-purpose registers with specific names like EAX, EBX, ECX, and EDX. These registers can also be divided into smaller parts like AX, BX, CX, and DX for easier reference.

EAX = 0x11223344 can be broken down as follows:

  • EAX = 0x11223344

  • AX = 0x3344

  • AH = 0x33

  • AL = 0x44
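The breakdown above is just masking and shifting. A small C sketch follows; the helper names reg_ax, reg_ah, and reg_al are hypothetical and only model how the sub-registers overlap EAX.

```c
#include <stdint.h>

/* AX is the low 16 bits of EAX. */
uint16_t reg_ax(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }

/* AH is bits 8..15 of EAX (the high byte of AX). */
uint8_t reg_ah(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }

/* AL is the low 8 bits of EAX (the low byte of AX). */
uint8_t reg_al(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }
```

With EAX = 0x11223344 these return 0x3344, 0x33, and 0x44. Note that there is no register name for the upper 16 bits (0x1122) on their own.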

The general-purpose registers and their conventional uses are the following (the two remaining ones, ESP and EBP, are described under Stack Memory):

  • EAX (accumulator): Arithmetical and logical instructions

  • EBX (base): Base pointer for memory addresses

  • ECX (counter): Loop, shift, and rotation counter

  • EDX (data): I/O port addressing, multiplication, and division

  • ESI (source index): Pointer addressing of data and source in string copy operations

  • EDI (destination index): Pointer addressing of data and destination in string copy operations

There's also the EFLAGS register, which holds special flags like Carry flag (CF), Parity flag (PF), Zero flag (ZF), Sign flag (SF), and Overflow flag (OF) to indicate the status of certain operations in the program.

Segment registers are another group of registers but are not covered in this explanation.

- Stack Memory

The stack is a part of the computer's memory (RAM). It's used by the CPU to temporarily store data and addresses while executing programs. The stack is a region of memory reserved for managing function calls, storing local variables, and maintaining the execution context of a program.

The stack is like a special memory space used by functions to store their temporary information.

The "stack pointer" (ESP) is a special register that points to the top of the stack memory: it keeps track of the most recently used location on the stack by storing its address.

The stack grows towards lower memory addresses, so when you add data to the stack, ESP moves down to a lower address.

Think of the stack as a Last In First Out (LIFO) store, where the last thing you put in is the first thing you take out.

Data in assembly language is often referred to by their sizes:

  • "byte" is 1 byte.

  • "word" is 2 bytes.

  • "dword" (double word) is 4 bytes.

  • "qword" (quad word) is 8 bytes.

Since the stack is in constant flux during the execution of a thread, it can become difficult for a function to locate its stack frame, which stores the required arguments, local variables, and the return address. EBP, the base pointer, solves this by holding a fixed reference address for the current stack frame, copied from ESP when the function is entered. Arguments and local variables can then be addressed at constant offsets from EBP even while ESP keeps changing.

- Heap Memory

The heap is also a part of the computer's memory (RAM). However, unlike the stack, which is used for managing function calls and storing local variables with a LIFO (Last In First Out) structure, the heap is a region of memory used for dynamic memory allocation. It's called the "heap" because it's like a pool of memory where programs can request and release memory blocks as needed.

In the heap, you can allocate and deallocate memory at any time, and the memory is not managed in a strict order like the stack. It's up to the programmer to manage memory allocation and deallocation in the heap. Common programming languages provide functions or mechanisms (e.g., malloc and free in C/C++) to interact with the heap, allowing you to request and release memory dynamically during program execution.

Common Assembly Instructions

- nop

NOP means "No Operation." When the CPU executes NOP, it does nothing. It's represented as 0x90 in opcode, but we don't usually need to know opcodes. NOP is actually a shorthand for the XCHG EAX, EAX instruction. This instruction swaps the values in the EAX register with itself, but since it's the same register, it has no effect.

- xchg

XCHG swaps the values of its two operands, which can be two registers or a register and a memory location. For example, XCHG ECX, EAX would swap the values in the ECX and EAX registers.

- mov

The MOV instruction can be used to load a value into a register or a memory location. It allows the following combinations:

  • Load an immediate value into a register or memory address

  • Load a value from one register to another register

  • Load a value from a memory location to a register and vice-versa

! We can't copy a value directly from one memory location to another. In general, a single instruction can't use two memory operands.

Example:

MOV EAX, 1                  ; put the immediate value, "1", into EAX
MOV ECX, 0x42424242         ; put "0x42424242" into ECX
MOV DWORD [EAX], 3          ; put the number 3 as a dword (double word - 4-bytes) into the memory address pointed to by EAX, so it will put 00000003
MOV BYTE [EBX], 0x10        ; put one byte value, "0x10" into the memory location pointed to by EBX
MOV EAX, ECX                ; copy the value of ECX into EAX
MOV DWORD [EDI], EAX        ; put the value stored in EAX into the memory location pointed to by EDI
MOV DWORD [EAX], EAX        ; put the value stored in EAX into the memory location pointed to by EAX
MOV EBX, DWORD [EDI + 0x10] ; take the value of EDI, add 0x10 to it, and copy the dword at that memory address into EBX

mov eax, [memory_address]   ; Load the value from a memory address into eax
add eax, 10                 ; Perform an operation on the data in eax
mov [memory_address], eax   ; Store the modified value back to the memory address

In assembly language, square brackets [] are utilized to indicate indirect memory access, for example:

mov [ebx], al     ; move the value in AL to the memory location pointed to by the EBX register

- lea

The Load Effective Address (LEA) instruction is similar to MOV, but it doesn't dereference the memory address. It only loads the address itself.

LEA EBX, [ECX + 0x10]  ; adds 0x10 to ECX and puts the result in EBX. If ECX is 0x41000000, then EBX becomes 0x41000010.
MOV EBX, [ECX + 0x10]  ; read the value from the memory address ECX + 0x10 and store it in EBX. If the memory address 0x41000010 holds the value 0x5, then EBX becomes 0x5.

Another example:

LEA EAX, [ECX + 2*EAX + 0x10] ; store the result of the calculation ECX + 2*EAX + 0x10 in EAX.

The previous instruction is typically used to calculate a memory address and store that in a register. However, it can also be used to simply perform arithmetic operations.
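The difference between LEA and MOV maps onto the difference between &table[i] and table[i] in C. The sketch below is an analogy rather than an exact model of the instructions, and all function names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* LEA-like: compute the address of table[index] without reading memory. */
const uint32_t *lea_like(const uint32_t *table, size_t index) {
    return &table[index];
}

/* MOV-like: compute the same address, then load the dword stored there. */
uint32_t mov_like(const uint32_t *table, size_t index) {
    return table[index];
}

/* LEA used as plain arithmetic: ECX + 2*EAX + 0x10, wrapping at 32 bits. */
uint32_t lea_arith(uint32_t ecx, uint32_t eax) {
    return ecx + 2u * eax + 0x10u;
}
```

For instance, lea_arith(0x41000000, 0) returns 0x41000010, the same value EBX receives in the LEA example with ECX = 0x41000000.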

The lea instruction can be thought of as the equivalent of the C/C++ "address-of" operator &:

StringVar byte 'String Variable', 0       ; A dummy string variable 
lea rcx, StringVar                        ; Load the address of the StringVar variable into RCX. RCX is now equal to &StringVar[0]

- push & pop

The PUSH and POP instructions will directly operate on the stack memory. By convention, PUSH and POP will work on the memory at the top of the stack, pointed to by ESP.

The PUSH instruction puts a value on the top of the stack, which does two things. First, ESP is decreased by four bytes (as the stack grows towards lower memory addresses). Then, the value we specified is stored at the memory address ESP now points to.

For example, if ESP points to the memory address 0x7f000020 at the beginning:

PUSH 1           ; put the value "0x1" on the stack. ESP is decreased by four bytes to 0x7f00001c and the value "0x1" is stored at that address.
PUSH EAX         ; Assuming that EAX contains "0x0", ESP is again decreased by four bytes and the value "0x0" is stored at the address 0x7f000018.
PUSH DWORD [ESP] ; push the dword that ESP points to. ESP points to 0x7f000018 and the dword stored there is "0x0", so "0x0" is pushed. ESP is again decreased by four bytes (to 0x7f000014) and the value is stored there.

The POP instruction does the reverse of PUSH. It takes four bytes from the stack, puts it into the specified register, and then increases the value of ESP by four, causing the stack to shrink.

POP EAX ; take four bytes from the stack and put them into EAX. The value of ESP will be increased by four bytes.
POP EBX ; take four bytes from the stack and put them into EBX. The value of ESP will be increased by four bytes.
POP ECX ; take four bytes from the stack and put them into ECX. The value of ESP will be increased by four bytes.

To summarize, we can use the PUSH instruction to store data on the stack (on the memory ESP points to), and we can use POP to retrieve data from the stack.

For example, push eax would push the value of the eax register onto the stack and pop ebx would pop the value at the top of the stack into the ebx register.
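The PUSH and POP behavior described above can be modeled with a toy stack in C. This is a hypothetical sketch: ToyStack and its functions are invented names, esp is a byte offset into a small array instead of a real address, and there is no overflow or underflow checking.

```c
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 64u

/* Toy model: `memory` stands in for the stack region; `esp` is an offset
   that starts at the high end because the stack grows downward. */
typedef struct {
    uint8_t  memory[STACK_BYTES];
    uint32_t esp;
} ToyStack;

void toy_stack_init(ToyStack *s) {
    memset(s->memory, 0, sizeof s->memory);
    s->esp = STACK_BYTES;
}

/* PUSH: decrease ESP by 4, then store the dword at the new top. */
void toy_push(ToyStack *s, uint32_t value) {
    s->esp -= 4;
    memcpy(&s->memory[s->esp], &value, sizeof value);
}

/* POP: read the dword at the top, then increase ESP by 4. */
uint32_t toy_pop(ToyStack *s) {
    uint32_t value;
    memcpy(&value, &s->memory[s->esp], sizeof value);
    s->esp += 4;
    return value;
}
```

Pushing 1 and then 0 moves esp from 64 to 60 to 56; the two pops then return 0 first and 1 second (LIFO) and bring esp back to 64.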

There is also the leave instruction, which is used to clean up before exiting a subroutine or function. When executed, it first moves the value of the base pointer register (EBP) to the stack pointer register (ESP). It then pops the value of the base pointer register from the stack, restoring it to its previous value. This releases the stack space that the function reserved for its local variables. Essentially, the leave instruction performs the same task as the following instructions:

mov esp, ebp
pop ebp

- inc & dec

The INC instruction increases the value of a register or a value found at a memory address.

INC EAX         ; increase the value inside EAX by one
INC BYTE [EAX]  ; increase the value of the byte found at the memory address specified by EAX. If EAX has the value of 0x7f0000, it will look up the byte found at 0x7f0000 and increase it by one. Thus, if we have "0x1" at 0x7f0000, it will be increased to "0x2".
INC DWORD [EAX] ; increase the value of the dword found at that memory address

The DEC instruction will decrease a value by one. It's the pair of INC and can be used the same way.

- add & sub

ADD and SUB add or subtract two values and store the result in a register or memory location.

The destination of the calculation is always the left operand, which can be a register or a memory location.

ADD EAX, EAX          ; EAX = EAX + EAX
ADD ECX, 4            ; ECX = ECX + 4
ADD DWORD [ESP], EAX  ; memory[ESP] = memory[ESP] + EAX
SUB EAX, EDX          ; EAX = EAX - EDX
SUB EBX, 0x10         ; EBX = EBX - 0x10

add rax, rbx          ; add the value in RBX to the value in RAX and store the result in RAX
add rax, [rcx]        ; add the value in the memory location at RCX to the value in RAX and store the result in RAX
add qword [rax], 10   ; add the value 10 to the qword at the memory location in RAX and store the result in that memory location (the size must be given, since neither operand implies it)

- mul & div

MUL multiplies the value found in EAX by the operand we specify (a register or a memory location) and stores the result in EDX:EAX.

This is because the result might not fit into EAX, so another register is also used. EDX will store the high order bits, while EAX the low order bits.

MUL ECX          ; EDX:EAX = EAX * ECX - If EAX=0x80000000 and ECX=0x8, the result is 0x400000000. As it's stored in EDX:EAX, EDX will contain 0x4 and EAX contains 0x00000000.
MUL DWORD [EDX]  ; EDX:EAX = EAX * memory[EDX] - Multiplies EAX by the 4-byte value at the memory location pointed to by EDX.

! If the multiplier is smaller than 32 bits, different implicit registers are used. With a byte operand, AX = AL * operand. With a word operand (e.g., a sub-register like CX), DX:AX = AX * operand.

DIV works like MUL in reverse and stores both the quotient and the remainder:

dword --> EDX:EAX / divisor = EAX (quotient), EDX (remainder)
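The EDX:EAX pairing can be modeled in C with 64-bit arithmetic. This is a sketch under stated assumptions: RegPair, mul32, and div32 are hypothetical names, only the unsigned 32-bit forms are modeled, and div32 assumes a non-zero divisor and a quotient that fits in 32 bits (the real DIV raises a divide error otherwise).

```c
#include <stdint.h>

/* Holds the two result registers of a 32-bit MUL or DIV. */
typedef struct {
    uint32_t eax;
    uint32_t edx;
} RegPair;

/* MUL r/m32: EDX:EAX = EAX * operand. */
RegPair mul32(uint32_t eax, uint32_t operand) {
    uint64_t product = (uint64_t)eax * operand;
    RegPair r;
    r.eax = (uint32_t)product;          /* low-order 32 bits  */
    r.edx = (uint32_t)(product >> 32);  /* high-order 32 bits */
    return r;
}

/* DIV r/m32: EAX = quotient, EDX = remainder of EDX:EAX / divisor. */
RegPair div32(uint32_t edx, uint32_t eax, uint32_t divisor) {
    uint64_t dividend = ((uint64_t)edx << 32) | eax;
    RegPair r;
    r.eax = (uint32_t)(dividend / divisor);
    r.edx = (uint32_t)(dividend % divisor);
    return r;
}
```

mul32(0x80000000, 8) reproduces the example above (EDX = 0x4, EAX = 0x0), and div32(0, 17, 5) leaves the quotient 3 in eax and the remainder 2 in edx.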

- Bitwise Logic Operations

Assembly has an instruction for all four basic logic operations called AND, OR, XOR, and NOT. All instructions take two operands, except NOT, which takes one.

These operations can be used with registers, memory addresses, or immediate values, except that both operands can't be memory locations at the same time. For NOT, you can invert either a memory location or a register. The result is always stored in the first operand.

Examples:

AND EAX, ECX  ; EAX = EAX and ECX
XOR EAX, EAX  ; EAX = EAX xor EAX (=0)
NOT ECX       ; ECX = not ECX
AND ECX, 0x11 ; ECX = ECX and 0x11

- jmp (Control Flow)

JMP stands for jump. This instruction will direct the instruction pointer (EIP) to a new memory location where execution will continue.

There are two main types of jumps in assembly language:

Relative Jump: This jump is based on a signed offset (displacement) from the address of the instruction that follows the jump. In 32-bit code, the displacement is encoded as 8 bits (short jump) or 32 bits (near jump). It's like saying, "Jump 10 steps forward from where you are." In source code, you normally write "JMP label" and the assembler computes the displacement for you.

Absolute Jump: This jump goes directly to a specific place in memory. It's like having a map with an address, and you say, "Go to this exact address." In code, you might see "JMP EAX," where EAX holds the address you want to go to.
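For a relative jump, the CPU adds the signed displacement to the address of the instruction that follows the jump. A minimal C sketch of that calculation is shown below; the function name and parameters are made up for illustration, and the instruction length is passed in explicitly (in 32-bit code a short JMP is 2 bytes long and a near JMP is 5 bytes long).

```c
#include <stdint.h>

/* Target = address of the next instruction + signed displacement.
   The arithmetic wraps at 32 bits, like EIP does. */
uint32_t relative_jump_target(uint32_t jmp_address,
                              uint32_t jmp_length,
                              int32_t displacement) {
    uint32_t next_instruction = jmp_address + jmp_length;
    return next_instruction + (uint32_t)displacement;
}
```

For a 2-byte jump located at 0x08049000 with displacement 0x10, the target is 0x08049012. A negative displacement jumps backward, which is how loops are encoded.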

- call & ret (Control Flow)

CALL is an instruction in assembly language that allows you to jump to another part of your code, like a function, while also saving the memory address of the instruction right after the CALL on the stack.

For example, if you have a CALL instruction in your code to go to a function, it will save the address of the instruction right after the CALL on the stack. This is called the return address.

When the function is done executing, you can use the RET instruction to go back to where you were before the CALL. RET takes the address from the top of the stack (the return address) and sets the instruction pointer (EIP) to that address. It also increases the value of the stack pointer (ESP).

So, CALL is like a jump with a bonus feature that saves where you should return to, and RET is used to go back to that saved location after the function is done.

For example, suppose a CALL instruction targets the '_myfunction' function at address 0x08049000, and the instruction following the CALL is at 0x0804900b. The CALL decreases ESP by four bytes, saves 0x0804900b (the return address) on the stack, and changes EIP to 0x08049000. When '_myfunction' is done executing, it reaches the RET instruction. RET takes the address from the top of the stack, which is 0x0804900b, and sets EIP to it, effectively returning to the instruction right after the original CALL. It also increases ESP by four bytes, since the return address has been removed from the stack.

TEST instruction:

  • TEST performs a bitwise AND operation between two values, which can be registers, memory addresses, or immediate values. Two of them can't be memory locations at the same time, so we can't test two memory addresses directly.

  • It does not store the numeric result but sets flags in the EFLAGS register.

  • The CF (carry flag) and OF (overflow flag) are always set to zero.

  • The ZF (zero flag) is set to "1" if the result is zero; otherwise, it's set to "0."

  • TEST is commonly used to check if a specific value is zero by comparing it to itself, thanks to the AND operation. If the result is zero, ZF becomes "1."

Conditional jumps like JZ (jump if zero) and JNZ (jump if non-zero) depend on the ZF flag.

JZ jumps if ZF is "1," while JNZ jumps if ZF is "0."

Example of a loop:

init_loop:
  MOV ECX, 3 ; Initialize i to 3
  MOV EAX, 10 ; Initialize j to 10

loop_start:      ; "loop" is itself an instruction mnemonic, so a different label name is used
  TEST ECX, ECX ; Check if i is zero
  JZ continue_here ; If i is zero, exit the loop
  INC EAX ; Increment j
  DEC ECX ; Decrement i
  JMP loop_start ; Jump back to the beginning of the loop

continue_here:
; ... (rest of the code)

CMP instruction:

  • CMP performs a subtraction operation between the two values but does not store the result anywhere. It does this by subtracting the numbers in the background and discarding the result.

  • The two values can be registers, memory addresses, or immediate values. Two of them can't be memory locations at the same time, so we can't cmp two memory addresses directly.

  • If the two numbers are equal, the ZF is set to 1

  • It also sets CF when the unsigned subtraction needs a borrow (the first value is below the second), OF when the signed subtraction overflows, and SF (sign flag) to the sign bit of the result.

Once the comparison is done, we can use plenty of Jxx instructions to make a decision (https://web.itu.edu.tr/kesgin/mul06/intel/instr/jxx.html).
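To make the flag rules concrete, here is a small C model of CMP for 32-bit operands together with three of the Jxx conditions. It is a sketch only: the Flags struct and the function names are hypothetical, and flags other than ZF, CF, SF, and OF are ignored.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool zf;  /* zero flag     */
    bool cf;  /* carry flag    */
    bool sf;  /* sign flag     */
    bool of;  /* overflow flag */
} Flags;

/* Flags produced by CMP a, b: compute a - b and discard the result. */
Flags cmp32(uint32_t a, uint32_t b) {
    uint32_t diff = a - b;
    Flags f;
    f.zf = (diff == 0u);
    f.cf = (a < b);                               /* unsigned borrow */
    f.sf = ((diff >> 31) != 0u);                  /* sign bit of the result */
    f.of = ((((a ^ b) & (a ^ diff)) >> 31) != 0u); /* signed overflow */
    return f;
}

bool ja_taken(Flags f) { return !f.cf && !f.zf; } /* above (unsigned) */
bool jb_taken(Flags f) { return f.cf; }           /* below (unsigned) */
bool je_taken(Flags f) { return f.zf; }           /* equal */
```

Comparing 3 with 4 sets CF and SF, so JB would be taken and JA would not. Comparing 3 with 3 sets ZF, so JE would be taken.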

Example:

init:
  MOV EAX, 3  ; Load the immediate value, "3," into EAX 

branching:
  CMP EAX, 4 ; Compare EAX with 4
  JA add_5   ; Jump to add_5 if CF=0 and ZF=0, i.e. EAX is above 4 (unsigned). Not taken here, since 3 is below 4
  CMP EAX, 3 ; Compare EAX with 3
  JB add_6   ; Jump to add_6 if CF=1, i.e. EAX is below 3 (unsigned). Not taken here
  JE add_1   ; Jump to add_1 if ZF=1 (both values are equal). Taken here, since EAX is 3
  JMP end    ; Relative jump to the end part of the code
  
add_5:
  ADD EAX, 5 ; Increment the value in EAX by 5
  JMP end    ; Relative jump to the end part of the code
add_6:
  ADD EAX, 6 ; Increment the value in EAX by 6
  JMP end    ; Relative jump to the end part of the code
add_1:
  INC EAX    ; Increment the value inside EAX by one
  JMP end    ; Relative jump to the end part of the code
  
end:
  MOV EAX, 4 ; Load the immediate value, "4," into EAX

x86 64-bit Assembly

The x64 Architecture

- Backward Compatibility

The x64 architecture is mostly backward-compatible with x86. This means that you can run x86 code on an x64 processor, and it will work in compatibility mode as if it's an x86 CPU. Almost all the instructions you learned for x86 will still work on x64.

- Larger Address Space

One of the most significant improvements in x64 architecture is the ability to address memory using 64 bits. In contrast to 32-bit CPUs, which have a limited addressable memory space, x64 processors can access and utilize a much larger address space. This allows for handling vast amounts of memory, which is crucial for modern computing tasks.

- 64-bit Data Processing

x64 processors can perform calculations on 64-bit size data, known as a quadword. This means they can process larger data sets with greater precision and efficiency.

- Register Changes

https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/x64-architecture

Instruction Pointer (EIP): In x64, the EIP register is renamed to RIP and is extended to 64 bits in size. It always holds a 64-bit memory address since x64 operates in a 64-bit address space. The concept of EIP is no longer applicable in the x64 architecture.

EFLAGS (Flags Register): The EFLAGS register in x86 is extended and renamed to RFLAGS in x64.

General-Purpose Registers: All the general-purpose registers in x64 are extended to 64 bits in length. Here's how they transform:

  • EAX becomes RAX

  • EBX becomes RBX

  • ECX becomes RCX

  • EDX becomes RDX

  • ESI becomes RSI

  • EDI becomes RDI

  • ESP becomes RSP

  • EBP becomes RBP

New Registers: In addition to the above changes, x64 introduces eight new 64-bit size general-purpose registers, named R8 through R15. These new registers provide additional resources for more complex calculations and data manipulation.

Accessing Lower Parts: You can still use the lower 32 bits of the general-purpose registers (e.g., EAX, EBX) as 32-bit value placeholders. Additionally, you can access the lower parts of the new 64-bit registers directly. For instance, R8D for the lower 32 bits, R8W for the lower 16 bits, and R8B for the lowest 8 bits. However, there is no direct way to access the second-lowest byte of these registers.

Zeroing Behavior: When working with the lower 32 bits of the general-purpose registers (like EAX), the higher 32 bits are always zeroed out in x64. This ensures that the upper part of the register does not affect calculations. However, when working with lower 16- or 8-bit parts, this zeroing behavior does not apply, similar to x86.
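The zeroing rule can be expressed as bit operations on a 64-bit value. The following C sketch is a model only; the function names are made up, and each function returns what the full 64-bit register would contain after the write.

```c
#include <stdint.h>

/* Writing a 32-bit sub-register (e.g. MOV EAX, imm32):
   the upper 32 bits of the 64-bit register become zero. */
uint64_t write_low32(uint64_t reg, uint32_t value) {
    (void)reg;  /* the previous contents are discarded entirely */
    return (uint64_t)value;
}

/* Writing a 16-bit sub-register (e.g. MOV AX, imm16): bits 16..63 are kept. */
uint64_t write_low16(uint64_t reg, uint16_t value) {
    return (reg & ~(uint64_t)0xFFFFu) | value;
}

/* Writing the low byte (e.g. MOV AL, imm8): bits 8..63 are kept. */
uint64_t write_low8(uint64_t reg, uint8_t value) {
    return (reg & ~(uint64_t)0xFFu) | value;
}
```

Starting from RAX = 0x1122334455667788, a 32-bit write of 0xAABBCCDD leaves 0x00000000AABBCCDD, while a 16-bit write of 0xAABB leaves 0x112233445566AABB.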

New in x64 Assembly

In x64 assembly, most of the assembly instructions you've learned for x86 still work, but they have been extended to work with new 64-bit registers and 64-bit memory addressing. Here's a brief explanation with code snippets:

- MOV Instruction for 64-bit Precision

In x64, you can use the MOV instruction to move a 64-bit value from a register into a memory address. For example, MOV [RAX], RCX will move the 64-bit value stored in RCX into the memory address pointed to by RAX, where the memory address size is 64-bit.

MOV [RAX], RCX

- MOV for 64-bit Immediate Values

The MOV instruction is the only one that can work with 64-bit immediate values, and only when the destination is a register. Other immediates, including those stored directly to memory, are limited to 32 bits and are sign-extended to 64 bits. For instance, to move the value 0x1122334455667788 into RAX, you can use:

MOV RAX, 0x1122334455667788

- PUSH for 32-bit Size Values

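The PUSH instruction in x64 can only take an immediate value of up to 32 bits, which is sign-extended to 64 bits before being pushed. Registers and memory operands are pushed as full 64-bit values. If you want to put a 64-bit constant onto the stack, you first need to move it into a register or memory address and then use PUSH to push it onto the stack.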

mov rax, qword [my_value]  ; Move the 64-bit value into a register (RAX)
push rax                   ; Push the value onto the stack

To further understand how push and pop operate in x64:

SimpleAsmFunc PROC

    push 3              ; Push value 3 onto the stack, rsp - 8 (1 qword 8 bytes)
    push 2              ; Push value 2 onto the stack, rsp - 16 (2 qword 16 bytes)
    push 1              ; Push value 1 onto the stack, rsp - 24 (3 qword 24 bytes)

    mov eax, [rsp]      ; Move the value at rsp (1) into eax, eax = 1
    mov edx, [rsp + 8]  ; Move the value at rsp + 8 (2) into edx, edx = 2 (edx is used instead of ebx because rbx is callee-saved in the Windows x64 calling convention)
    mov ecx, [rsp + 16] ; Move the value at rsp + 16 (3) into ecx, ecx = 3

    add rsp, 24         ; Adjust the stack pointer to "pop" the pushed values (3 pushes x 8 bytes each = 24 bytes)

    ret                 ; Return from the function

SimpleAsmFunc ENDP

- RIP Relative Addressing

In x64, you can use the instruction pointer, RIP, to address a memory location. This is called RIP-relative addressing and was not possible in x86. For example, in x86, the instruction MOV EAX, [EIP+0x10] would be invalid, but in x64, you can do the following:

MOV RAX, [RIP+0x10]

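Here, we move a value into RAX from an address computed relative to the current value of RIP. This is an example of RIP-relative addressing, which has no x86 equivalent. In practice, you rarely write the offset by hand: in NASM, for example, MOV RAX, [rel my_label] asks the assembler to compute the RIP-relative displacement to a label (my_label is a placeholder name).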

- Assembly File Structure

The structure of your assembly file remains the same between x86 and x64 architectures. You still need a .text section and the _start label defined.

Shifting, Rotation, and Memory Setting Assembly Instructions

- SAL and SAR:

SAL (Shift Arithmetic Left) and SAR (Shift Arithmetic Right) shift the bits of a value in a register or memory location to the left or right by a specified number of bits. SAL is equivalent to multiplication by 2^n (as long as the result still fits), and SAR is equivalent to signed division by 2^n rounded towards negative infinity, where n is the number of bits shifted.

- SHL and SHR

SHL (Shift Logical Left) is another name for SAL, while SHR (Shift Logical Right) is slightly different from SAR. SAR preserves the sign bit (most significant bit), making it suitable for signed operations, while SHR fills with zeros from the left.
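The difference between SHR and SAR shows up only when the sign bit is set. The C sketch below models both for 32-bit operands. The function names are made up; SAR is written with explicit masks because right-shifting a negative signed integer is implementation-defined in C; and the count is masked to 5 bits, as the CPU does for 32-bit operands.

```c
#include <stdint.h>

/* SHR: logical right shift, vacated high bits are filled with zeros. */
uint32_t shr32(uint32_t value, unsigned count) {
    count &= 31u;
    return value >> count;
}

/* SAR: arithmetic right shift, vacated high bits are filled with
   copies of the original sign bit. */
uint32_t sar32(uint32_t value, unsigned count) {
    uint32_t shifted;
    count &= 31u;
    shifted = value >> count;
    if (value & 0x80000000u) {
        /* Set the top `count` bits to 1. */
        shifted |= (uint32_t)~(0xFFFFFFFFu >> count);
    }
    return shifted;
}
```

Shifting 0x80000000 right by 4 gives 0x08000000 with shr32 but 0xF8000000 with sar32. Shifting 0xFFFFFFF9 (-7) right by 1 with sar32 gives 0xFFFFFFFC (-4), which illustrates the rounding towards negative infinity.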

- Rotate Left (ROL) and Rotate Right (ROR)

These instructions are very similar to the previously mentioned shift instructions. Both shift bits in the register or memory location left or right; however, in this case, the bits don't "fall out" of the destination, but rotate as the name suggests.

During left rotation scenarios, the most significant bit is moved in place of the least significant bit.

Each bit is moved left, and the one "exiting" at the left returns on the right.

In case of right rotation, it works the same way, but in the opposite direction.

Similarly to shifting, we can rotate either a register or memory location.

ROL r/mX, C ; The "X" can be 8, 16, 32, or 64, representing the register size. "C" represents the number of bits to rotate.
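A 32-bit rotation can be modeled in C by combining two shifts. This is a minimal sketch with made-up function names; the count is reduced modulo 32, and a count of zero is handled separately because shifting a 32-bit value by 32 is undefined in C.

```c
#include <stdint.h>

/* ROL: bits leaving on the left re-enter on the right. */
uint32_t rol32(uint32_t value, unsigned count) {
    count &= 31u;
    if (count == 0u) {
        return value;
    }
    return (uint32_t)((value << count) | (value >> (32u - count)));
}

/* ROR: bits leaving on the right re-enter on the left. */
uint32_t ror32(uint32_t value, unsigned count) {
    count &= 31u;
    if (count == 0u) {
        return value;
    }
    return (uint32_t)((value >> count) | (value << (32u - count)));
}
```

Rotating 0x80000001 left by one bit gives 0x00000003: the most significant bit moved into the least significant position instead of being lost.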

- Store String (STOSx)

STOSx is an instruction for storing a byte, word, dword, or qword in a memory location. "x" indicates the data size and can be "B", "W", "D", or "Q". The value to store is taken from AL, AX, EAX, or RAX respectively. RDI holds the target memory address, and after the store, RDI is incremented by the number of bytes stored (or decremented if the direction flag, DF, is set).

- REP STOS

REP is a prefix used with instructions like STOS to repeat them "n" times, with the number of repetitions specified in the RCX register. It's often used in operations like memset to set a memory range to a specific value.

- Example

SUB RSP, 0x100       ; Allocating 0x100 bytes on the stack, decreasing RSP by 0x100
LEA RDI, [RSP]       ; Load the effective address of RSP into RDI. RDI is the destination of the STOS operation
MOV RCX, 0x100       ; Enter 0x100 into RCX, which will be the counter for the repeat
MOV RAX, 0x1         ; Set RAX to 0x1, the value to store
REP STOSB            ; Repeat STOSB (store one byte) operation RCX times, which will store one byte repeatedly.
NOP                  ; No operation (placeholder)
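The effect of REP STOSB in the example above is the same as a simple fill loop, which is why it is associated with memset. The C sketch below models it under the assumption that the direction flag is clear; the function name and parameters are made up and simply mirror the registers involved.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of REP STOSB with DF = 0: store AL at [RDI], advance RDI by one,
   and decrement RCX until RCX reaches zero. Returns the final RDI. */
uint8_t *rep_stosb(uint8_t *rdi, uint8_t al, size_t rcx) {
    while (rcx != 0) {
        *rdi = al;  /* STOSB: store one byte */
        rdi++;      /* RDI moves to the next byte */
        rcx--;      /* REP: one repetition consumed */
    }
    return rdi;
}
```

Calling rep_stosb(buffer, 0x1, 0x100) on a 0x100-byte buffer fills every byte with 0x01 and returns a pointer just past the end of the buffer.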

Hello World Program

global _start

section .text

_start:
  ; write system call
  MOV RDI, 1                ; standard output
  MOV RSI, hello_text       ; address of "Hello world!"
  MOV RDX, hello_text_len   ; length of "Hello world!"
  MOV RAX, 1                ; write syscall number
  SYSCALL

  ; exit system call
  MOV RDI, 0                ; success
  MOV RAX, 60               ; exit syscall number
  SYSCALL

section .data
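  hello_text db "Hello world!", 10   ; 10 = newline; a trailing 0 here would also be written to the output, since the length covers the whole definition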
  hello_text_len equ $ - hello_text

Assemblers

NASM

NASM is a cross-platform assembler for the x86 family, covering 16-bit, 32-bit, and 64-bit (x86-64) code. It runs on several operating systems and can produce multiple object file formats (such as ELF, Win32/Win64 COFF, and Mach-O), so it is not tied to a specific platform. It does not target other architectures such as ARM.

section .text

global _start

_start:
  NOP
  NOP

To compile 32-bit:

nasm -f elf32 nop.asm

ld -m elf_i386 nop.o -o nop

To compile 64-bit:

nasm -f elf64 test64.asm

ld test64.o -o test64

MASM

MASM or Microsoft Macro Assembler is an assembler that uses the Intel syntax to assemble assembly code in a Visual Studio project, as it is a part of the Visual Studio suite.

In a VS project, click Source > Add item > program.asm, then right-click the item > Properties > select Item Type: MASM

- Program Structure

; Data section: contains variable and memory values, adding this section is optional 
; Variables can be declared below the ".data" directive
.data


; Code section: contains the assembly code/functions
; Assembly functions can be declared below the ".code" directive
.code


; MASM function declaration
main PROC ; Start of function "main"
     
      ; Assembly code of "main"
      
      ret ; Return from "main"     
main ENDP ; End of function "main"    


; The "end" directive marks the end of the source file
end

- Variable Declaration

To declare variables in MASM assembly, one must specify them within the data section of the file, which is created using the .data directive.

VarName directive VarValue

Where VarName is the variable name to be declared and directive is one of MASM's data declaration directives listed below:

  • word - Unsigned 16-bit value (word).

  • sword - Signed 16-bit integer value.

  • dword - Unsigned 32-bit value (double word).

  • sdword - Signed 32-bit integer value.

  • qword - Unsigned 64-bit value (quad word).

  • sqword - Signed 64-bit integer value.

  • oword - 128-bit value (octa word).

  • tbyte - Unsigned 80-bit value.

  • real4 - 32-bit floating point value.

  • real8 - 64-bit floating point value.

  • real10 - 80-bit floating point value.

  • byte - Unsigned 8-bit value.

  • sbyte - Signed 8-bit integer value.

Then, VarValue represents the variable's value.

Example:

WordVariable      word         2
sWordVariable     sword       -2
FloatVariable     real8       3.1

It's possible to initialize a variable with a hexadecimal value using the h suffix:

DwordVariable     dword       10h       ; this is 10 in hex, which is 16 in decimal

- String Declaration

One can declare a string using the byte MASM directive:

! Numeric values in a byte directive are decimal by default, so character codes such as 10 need no suffix. Use the h suffix only for hexadecimal values (e.g., 0Ah)

StringVar   byte 'This is a string', 0                        ; we add "0" to null-terminate the string
StringVar2  byte 'This is a string with a new line', 10, 0    ; "10" is the decimal code of the new line character (0Ah in hexadecimal)

- Example of call in MASM

.code

DummyProc     PROC
      mov rcx, 3        ; dummy code
      add rbx, 2
      sub esi, 1
      ret               ; return execution back to "main"
DummyProc     ENDP


main          PROC
      call DummyProc    ; calling "DummyProc"
      ret               ; function "main" is terminated
main          ENDP

end

- Example of jmp in MASM

.code

main PROC
      add eax, 2              ; dummy code
      xor ax, 5
      mov bx, ax
      jmp LabelName           ; Jump to execute 'LabelName' 
      mov eax, 100            ; These instructions won't get executed
      mov ebx, 100
LabelName:
      xor eax, eax            ; LabelName's code
      sub ebx, 2      
      ret
main ENDP

end

- See Assembly of C code

In Visual Studio, click on Debug > Windows > Disassembly

- Memory Access Specifiers

In MASM assembly language, memory access specifiers are used to indicate the size and the type of data being accessed in memory. These specifiers act like type casts in a programming language.

Quadword Pointer - qword ptr:

mov rax, qword ptr [rbx]         ; Access the 64-bit integer value stored at the memory location pointed to by the rbx register
mov rax, qword ptr [rsp + 32h]   ; Access the 64-bit integer value stored at an offset of 32h bytes from the rsp register.

Doubleword Pointer - dword ptr:

mov dword ptr [ebx], 12345678    ; Example 1: stores a 32-bit integer value in memory at the address held in ebx
mov eax, dword ptr [edx + 4]     ; Example 2: loads the 32-bit integer value stored at an offset of 4 bytes from the address in edx into the eax register

Byte Pointer - byte ptr:

mov al, byte ptr [edx + 2]       ; Example 1: loads an 8-bit integer value from memory into the al register
mov byte ptr [ebx + 8], 55h      ; Example 2: stores the 8-bit value 55h in the single byte of memory at an offset of 8 bytes from the address in the ebx register
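
The type-casting analogy can be sketched in C. The helpers below (load_qword, load_dword and load_byte are hypothetical names chosen for this illustration) read 8, 4 and 1 bytes from the same address, roughly what the three specifiers select in a mov. This is only an approximation of the idea, not what MASM does internally.

```c
#include <stdint.h>
#include <string.h>

/* Each helper reads a different width from the same address, much like
   qword ptr, dword ptr and byte ptr select the operand size in MASM.
   memcpy is used instead of a pointer cast to avoid alignment and
   aliasing problems in C. */

uint64_t load_qword(const void *addr) {   /* mov rax, qword ptr [addr] */
    uint64_t value;
    memcpy(&value, addr, sizeof value);
    return value;
}

uint32_t load_dword(const void *addr) {   /* mov eax, dword ptr [addr] */
    uint32_t value;
    memcpy(&value, addr, sizeof value);
    return value;
}

uint8_t load_byte(const void *addr) {     /* mov al, byte ptr [addr] */
    uint8_t value;
    memcpy(&value, addr, sizeof value);
    return value;
}
```

On a little-endian x86 machine, with the bytes 88h 77h 66h 55h 44h 33h 22h 11h stored in memory in that order, load_qword returns 1122334455667788h, load_dword returns 55667788h and load_byte returns 88h.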

- Calling Functions

Calling functions in assembly can happen in one of the following scenarios:

1. The first scenario involves calling an assembly function from another assembly function. This is done using the call instruction to jump to the callee, with the ret instruction used to return to the caller.

2. The second scenario involves calling an assembly function from C. To use an assembly function in a C file, declare its prototype in the C file with the extern keyword. This informs the compiler that the function is defined in another file, such as an .asm file.

Example:

/*
      main.c file
*/

#include <stdio.h>

extern void SimpleAsmFunc(void); // SimpleAsmFunc's prototype. Parameters and the function's return data type are covered in a later section

int main (){
      printf("[i] Calling 'SimpleAsmFunc' ... ");
      SimpleAsmFunc();
      printf("[+] Done");
      return 0;
}

; The asm file that includes the definition of 'SimpleAsmFunc'

.code

SimpleAsmFunc PROC
      xor rcx, rcx      ; SimpleAsmFunc's code
      add rcx, 2
      ret
SimpleAsmFunc ENDP

end

3. The third scenario involves calling a C function from within an assembly file. To do this, the assembly code must first declare the C function using the externdef directive. This tells the MASM assembler that the symbol (i.e. function) is defined in another module. The externdef directive has the following syntax:

externdef symbol_name:type ; symbol_name is the name of the function to be defined, and type specifies the type of the function

Example:

/*
      main.c file
*/

#include <stdio.h>

// Dummy C function
void SimpleCFunc() {

	int i = 100;
	i = i * (i + 7) >> 3;
	i += i/2;

	if (i > 100)
		i -= 20;
	else
		i += 20;
}


int main() {
	// You can declare "AsmFunc" with extern here and call it
	return 0;
}

; The asm file that calls 'SimpleCFunc'

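externdef SimpleCFunc:proc 	; Using externdef to declare "SimpleCFunc" as a procedure defined in another file

.code 

AsmFunc PROC

      sub rsp, 40		; Reserve 32 bytes of shadow space plus 8 bytes to keep RSP 16-byte aligned (see Shadow Space)
      call SimpleCFunc		; Calling SimpleCFunc
      add rsp, 40		; Release the reserved stack space
      ret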

AsmFunc ENDP

end

- Passing Parameters

Once you have created an assembly procedure (function) and want to call it, you need to pass arguments to it. The rules below are those of the Microsoft x64 calling convention.

Register Parameters: The first four integer or pointer parameters are passed through the registers RCX, RDX, R8, and R9, respectively. These are known as register parameters because they are passed in CPU registers.

Stack Parameters: If a procedure requires more than four parameters, the remaining ones are placed on the stack. These are known as stack parameters. The stack must be 16-byte aligned at the point of the call instruction.

Stack Parameter Offset: The fifth parameter is located at a fixed offset from the rsp register, and that offset is set by the calling convention. At the callee's first instruction (before it pushes anything or adjusts rsp), the fifth parameter is at [rsp + 40], that is, 40 bytes above the stack pointer: 8 bytes hold the return address pushed by call, and the next 32 bytes are reserved for the first four parameters (8 bytes each). Each further parameter follows 8 bytes later.

Example:

AsmFunc11Parms PROC

    ; RCX => Parm1
    ; RDX => Parm2
    ; R8  => Parm3
    ; R9  => Parm4

    mov rax, qword ptr [rsp + 40]  ; Parm5
    mov rax, qword ptr [rsp + 48]  ; Parm6
    mov rax, qword ptr [rsp + 56]  ; Parm7
    mov rax, qword ptr [rsp + 64]  ; Parm8
    mov rax, qword ptr [rsp + 72]  ; Parm9
    mov rax, qword ptr [rsp + 80]  ; Parm10
    mov rax, qword ptr [rsp + 88]  ; Parm11

    ret

AsmFunc11Parms ENDP

#include <Windows.h>

extern int AsmFunc11Parms(PVOID Parm1, PVOID Parm2, PVOID Parm3, PVOID Parm4, PVOID Parm5, PVOID Parm6, PVOID Parm7, PVOID Parm8, PVOID Parm9, PVOID Parm10, PVOID Parm11);

int main() {
	AsmFunc11Parms((PVOID)1, (PVOID)2, (PVOID)3, (PVOID)4, (PVOID)5, (PVOID)6, (PVOID)7, (PVOID)8, (PVOID)9, (PVOID)10, (PVOID)11);
	return 0;
}
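
The offsets used in AsmFunc11Parms follow a simple pattern, which the small C helper below tries to capture. It is only a sketch: stack_param_offset is a hypothetical name, and it assumes the Microsoft x64 convention with the offset measured at the callee's first instruction, before any push or sub changes RSP.

```c
#include <stddef.h>

/* Offset from RSP, measured at the callee's first instruction (before any
   push or sub), of the n-th parameter under the Microsoft x64 convention.
   Parameters 1-4 travel in RCX, RDX, R8 and R9, so only n >= 5 has a
   stack slot the callee is expected to read. Returns 0 for n < 5.
   stack_param_offset is a hypothetical helper, not part of any API. */
size_t stack_param_offset(unsigned n) {
    const size_t return_address = 8;   /* pushed by the call instruction */
    const size_t shadow_space   = 32;  /* home space for RCX, RDX, R8, R9 */
    if (n < 5)
        return 0;
    return return_address + shadow_space + 8 * (size_t)(n - 5);
}
```

For n = 5 the helper returns 40 and for n = 11 it returns 88, matching the first and last offsets in the listing.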

- Returning Value

When a 64-bit MASM procedure returns a value, it places that value in the RAX register before executing the ret instruction. After the procedure returns, the caller reads the result from RAX.

The following AddtwoNumbers procedure takes two parameters and returns their sum:

AddtwoNumbers PROC
    mov rax, rcx    ; Move the 1st parameter to RAX
    add rax, rdx    ; Add the 2nd parameter to the value in RAX
    ret             ; return (RAX here is RCX + RDX)
AddtwoNumbers ENDP
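
For comparison, here is a C sketch of the same behavior. The extern prototype in the comment is an assumption about how a C file might declare the procedure, and add_two_numbers_ref is a hypothetical reference function, not part of the original code.

```c
#include <stdint.h>

/* How a C file could declare the MASM procedure (assumed prototype):
       extern uint64_t AddtwoNumbers(uint64_t a, uint64_t b);
   Under the Microsoft x64 convention a arrives in RCX, b in RDX, and the
   result comes back in RAX. */

/* Plain C reference with the same behavior, for comparison. Unsigned
   64-bit addition wraps around on overflow, just like add rax, rdx. */
uint64_t add_two_numbers_ref(uint64_t a, uint64_t b) {
    return a + b;
}
```

Comparing the assembly procedure against a reference like this is one simple way to check that the parameters and return value are wired up as expected.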

- Shadow Space

Even when a procedure takes four or fewer parameters, 32 bytes should be reserved on the stack for the register parameters.

This reserved memory is called the shadow space. It is allocated by subtracting its size from the stack pointer register (RSP), usually at the start of the caller function.

In the Microsoft calling convention, the caller is responsible for allocating the shadow space for its callee, even if the callee takes no parameters. The return address of the callee does not need to be reserved by hand because the call instruction pushes it automatically. An additional 8 bytes are still commonly reserved, and they act as padding that helps keep RSP 16-byte aligned at the call.

The reserved area is not limited to these 40 bytes (32 bytes for the register parameters plus 8 extra bytes); it must also hold every stack parameter passed to the callee, at 8 bytes each. Reserving this space correctly depends on the calling convention and interacts with another topic, stack alignment: the total should leave RSP a multiple of 16 when call executes.

The size of the reserved area can often be determined using the following rule:

Shadow Space Size = 32 + 8 + [8 * (number of stack parameters)]
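
The rule can be written as a small C helper. The sketch below also adds a rounding step that the rule itself does not include: it assumes the procedure begins with push rbp, which leaves RSP 16-byte aligned, so the amount subtracted from RSP afterwards should itself be a multiple of 16. Both function names are hypothetical.

```c
#include <stddef.h>

/* The rule of thumb from the text: 32 bytes for the four register
   parameters, 8 extra bytes, and 8 bytes per stack parameter. */
size_t rule_of_thumb_size(size_t stack_params) {
    return 32 + 8 + 8 * stack_params;
}

/* Rounds the rule's result up to a multiple of 16. Assumption: the
   procedure starts with push rbp, which leaves RSP 16-byte aligned, so the
   value given to sub rsp must itself be a multiple of 16. */
size_t aligned_frame_size(size_t stack_params) {
    size_t size = rule_of_thumb_size(stack_params);
    return (size + 15) & ~(size_t)15;
}
```

With one stack parameter both helpers return 48. With two stack parameters the rule gives 56, which the rounding step raises to 64.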

- Preserving RSP's Value

Additionally, before subtracting the size of the shadow space from the stack pointer register, one should preserve the stack pointer's value so it can be restored before returning. This is usually done using two instructions:

; Pushes the value of the base pointer register to the stack, thus saving it there. The base pointer register is typically used to reference variables on the stack within a subroutine. By pushing its current value onto the stack, the subroutine can save the previous value of rbp and restore it later before returning to the caller.
push rbp 

; Copies the current value of the stack pointer register into the base pointer register.
mov rbp, rsp 

Together, these two instructions set up a standard stack frame for a subroutine in 64-bit assembly. Typically, after these two instructions are executed, sub rsp, <shadow space> instruction will follow, which reserves memory for the shadow space.

Then, before the procedure returns, one should undo the changes done to the base pointer and the stack pointer registers, rbp and rsp respectively:

; mov rsp, rbp - Restores the stack pointer to its original position before the subroutine was called.
; pop rbp - Pops the previous value of the base pointer register off the stack and restores it. This is the value that was saved at the beginning of the subroutine using the push rbp instruction.
leave

! Using this method, the rbp register should be left untouched while the procedure's instructions execute, because it holds the value the stack pointer must be restored to. If that information (RSP's original value) is lost, the program may crash when the procedure returns.

- Example of function with Shadow Space reserved

The following AsmCallFunction procedure calls printf with five parameters, allocating shadow space for the call:

.data

String1     byte    '[i] AsmCallFunction => This is a string: "%s" | This is a dword: %d | This is a word: %d | This is a byte: 0x%0.2X', 10, 0          
String2     byte    'Hello World!', 0
DwordVar    dword   213483
WordVar     word    23
ByteVar     byte    10

externdef printf:PROC		    ; Declaring an external value as a procedure (printf in C) 

.code

AsmCallFunction PROC

    push rbp                        ; Saving rbp and setting up AsmCallFunction's stack frame
    mov rbp, rsp
    sub rsp, 48                     ; 32 bytes (shadow space for 4 register parameters) + 8 bytes (5th parameter) + 8 bytes (padding that keeps RSP 16-byte aligned)
    
    lea rcx, String1                ; 1st parameter 
    lea rdx, String2                ; 2nd parameter
    
    xor r8, r8                      ; 3rd parameter
    mov r8d, DwordVar   
    
    xor r9, r9                      ; 4th parameter
    mov r9w, WordVar    
    
    xor rax, rax                    ; 5th parameter   
    mov al, ByteVar      
    mov qword ptr [rsp + 32], rax   ; Store the full zero-extended 8-byte slot, since printf reads more than one byte. '32' is the size of the shadow space printf expects for its register parameters, so the 5th parameter starts at this offset from RSP

    call printf                     ; Calling printf

    leave                           ; "mov rsp, rbp" & "pop rbp"
    ret


AsmCallFunction ENDP

end
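
To see what the procedure is expected to produce, the same call can be sketched in C. The version below uses snprintf instead of printf so the text lands in a buffer that can be inspected. format_message is a hypothetical name, and the casts reflect how each value is widened before it reaches the variadic function.

```c
#include <stdio.h>
#include <stdint.h>

/* Same data as the .data section of the MASM example. */
static const char     String2[] = "Hello World!";
static const uint32_t DwordVar  = 213483;
static const uint16_t WordVar   = 23;
static const uint8_t  ByteVar   = 10;

/* Formats the message into buf instead of printing it, so the result can
   be inspected. Returns what snprintf returns. format_message is a
   hypothetical name used only for this sketch. */
int format_message(char *buf, size_t size) {
    return snprintf(buf, size,
                    "[i] AsmCallFunction => This is a string: \"%s\" | "
                    "This is a dword: %d | This is a word: %d | "
                    "This is a byte: 0x%0.2X\n",   /* 1st parameter: RCX */
                    String2,             /* 2nd parameter: RDX */
                    (int)DwordVar,       /* 3rd parameter: R8  */
                    (int)WordVar,        /* 4th parameter: R9  */
                    (unsigned)ByteVar);  /* 5th parameter: [rsp + 32] */
}
```

The buffer should end with "This is a byte: 0x0A" followed by a new line, since ByteVar holds 10.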
