Understanding Assembly
Assembly language is a low-level programming language that closely corresponds to machine code, using mnemonics to represent CPU instructions. It provides direct control over hardware and memory, making it essential for tasks requiring granular analysis and manipulation. Assembly language is foundational for cybersecurity because it enables deep introspection and manipulation of software behavior. Mastery of assembly equips professionals to reverse engineer binaries, dissect malware, and develop exploits, bridging the gap between high-level abstractions and hardware-level execution. This low-level expertise is crucial for both defending systems and understanding adversary tactics.
x86
CISC (Complex Instruction Set Computer)
Here are some really good explanations of registers, instructions, the stack and calling conventions:
https://www.cs.virginia.edu/~evans/cs216/guides/x86.html
https://www.eecg.utoronto.ca/~amza/www.mindsec.com/files/x86regs.html
https://sonictk.github.io/asm_tutorial/
https://github.com/sonictk/asm_tutorial/tree/master
CPU instructions to memory: MOV, PUSH, POP, LEA
CPU instructions to I/O devices: IN, INS, INB, OUT, OUTS, OUTB
https://redirect.cs.umbc.edu/~cpatel2/links/310/slides/chap11_lect08_IO1.pdf
Registers:
General registers EAX EBX ECX EDX
Segment registers CS DS ES FS GS SS
Index and pointers ESI EDI EBP EIP ESP
Indicator EFLAGS
General registers
As the name says, the general registers are the ones we use most of the time; most instructions operate on them. Each of them can be broken down into 16-bit and 8-bit registers.
32 bits : EAX EBX ECX EDX
16 bits : AX BX CX DX
8 bits : AH AL BH BL CH CL DH DL
The “H” and “L” suffixes on the 8-bit registers stand for the high byte and low byte of the 16-bit register. With this out of the way, let’s see the main use of each one:
EAX,AX,AH,AL : Called the Accumulator register.
It is used for I/O port access, arithmetic, interrupt calls, etc.
EBX,BX,BH,BL : Called the Base register.
It is used as a base pointer for memory access.
Gets some interrupt return values.
ECX,CX,CH,CL : Called the Counter register.
It is used as a loop counter and for shifts.
Gets some interrupt values.
EDX,DX,DH,DL : Called the Data register.
It is used for I/O port access, arithmetic, some interrupt calls.
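The way the smaller registers overlap the larger ones can be sketched with plain bit masks. This is only a minimal Python model, not real register access, and the names sub_registers and write_al are made up for the illustration.

```python
def sub_registers(eax):
    """Return the AX, AH and AL views of a 32-bit EAX value."""
    eax &= 0xFFFFFFFF            # keep only 32 bits
    ax = eax & 0xFFFF            # AX is the low 16 bits of EAX
    return {"EAX": eax, "AX": ax, "AH": (ax >> 8) & 0xFF, "AL": ax & 0xFF}


def write_al(eax, value):
    """Model of `mov al, value`: only the low byte of EAX changes."""
    return (eax & 0xFFFFFF00) | (value & 0xFF)


views = sub_registers(0x12345678)
print({name: hex(v) for name, v in views.items()})
# {'EAX': '0x12345678', 'AX': '0x5678', 'AH': '0x56', 'AL': '0x78'}
print(hex(write_al(0x12345678, 0xFF)))  # 0x123456ff
```

The same masks apply to EBX, ECX and EDX.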
Indexes and pointers
Index and pointer registers hold the offset part of an address. They have various uses, but each register has a specific function. They are sometimes used with a segment register to point to a far address (in a 1 MB range). The registers with an “E” prefix can only be used in protected mode.
ES:EDI EDI DI : Destination index register
Used for string, memory array copying and setting and
for far pointer addressing with ES
DS:ESI ESI SI : Source index register
Used for string and memory array copying
SS:EBP EBP BP : Stack Base pointer register
Holds the base address of the current stack frame
SS:ESP ESP SP : Stack pointer register
Holds the top address of the stack
CS:EIP EIP IP : Instruction Pointer
Holds the offset of the next instruction
It cannot be written directly with MOV; it only changes through control-flow instructions such as JMP, CALL and RET
Instructions
Arithmetic and logic instructions: ADD, SUB, INC, DEC, IMUL, IDIV, AND, OR, XOR, NOT, NEG, SHL and SHR
They are easy to explain from a logic gate and electronic circuit perspective.
- MOV: copies data between registers, memory and immediate constants. A single MOV cannot copy from memory to memory.
- PUSH: decrements the stack pointer (the stack grows) and stores its operand at the new top of the stack.
- POP: loads the value at the top of the stack into the register passed as parameter, then increments the stack pointer (the stack shrinks).
- LEA: Load Effective Address. It computes the address described by a memory operand without reading memory, for example the address of the top of the stack (RSP).
- LEA rax, [rsp]: saves the address of the top of the stack (not the value stored there) into RAX.
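The difference between PUSH, POP, MOV and LEA is easier to see on a toy stack. The sketch below is a simplified Python model that assumes a 64-bit stack starting at the arbitrary address 0x1000; the class TinyStack and its method names are hypothetical.

```python
class TinyStack:
    """Toy model of a downward-growing 64-bit stack (hypothetical helper)."""

    def __init__(self, top=0x1000):
        self.rsp = top       # stack pointer
        self.memory = {}     # address -> 64-bit value

    def push(self, value):
        self.rsp -= 8        # the stack grows toward lower addresses
        self.memory[self.rsp] = value & 0xFFFFFFFFFFFFFFFF

    def pop(self):
        value = self.memory[self.rsp]
        self.rsp += 8        # the stack shrinks
        return value

    def lea_rax_rsp(self):
        return self.rsp                 # lea rax, [rsp]: the address itself

    def mov_rax_rsp(self):
        return self.memory[self.rsp]    # mov rax, [rsp]: the value stored there


stack = TinyStack()
stack.push(0x41)
print(hex(stack.lea_rax_rsp()), hex(stack.mov_rax_rsp()))  # 0xff8 0x41
```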
Branching:
- JMP: jumps unconditionally to the address passed as argument, meaning that RIP now points to that address.
- CALL: the same as JMP, but before jumping it pushes the return address (the address of the instruction after the CALL) onto the stack.
- INT: a software interrupt works like a CALL, except that the target address is not given directly: you pass an index into the interrupt table (set up by the BIOS in real mode, or by the operating system in protected mode) and the CPU jumps to the handler stored at that index.
- INT 0x80 is used for system calls on 32-bit Linux; it plays the same role as the modern and faster SYSCALL instruction.
- RET: the counterpart of CALL. RET pops the return address from the top of the stack and jumps to it, so before the subroutine finishes you need to clean up the stack so that the return address pushed by CALL is at the top again.
- CMP: compares its two operands by subtracting the second from the first, discarding the result and setting the condition codes (ZF, SF, CF, OF) in the EFLAGS register accordingly. The conditional jump instructions that follow check these EFLAGS bits and act on them, namely:
- JE: jump when equal
- JNE: jump when not equal
- JZ: jump when last result was zero
- JG: jump when greater than
- JGE: jump when greater than or equal to
- JL: jump when less than
- JLE: jump when less than or equal to
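How CMP feeds the conditional jumps can be sketched by computing the flags by hand. This is a small Python model for 32-bit operands; cmp_flags and jump_taken are made-up names, and only the jumps listed above are covered.

```python
def cmp_flags(a, b, bits=32):
    """Flags left behind by `cmp a, b` (a - b, result discarded)."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    result = (a - b) & mask
    return {
        "ZF": result == 0,                              # operands were equal
        "SF": bool(result & sign),                      # result looks negative
        "CF": a < b,                                    # unsigned borrow
        "OF": bool((a ^ b) & (a ^ result) & sign),      # signed overflow
    }


def jump_taken(mnemonic, flags):
    """Condition tested by each signed conditional jump."""
    zf, sf, of = flags["ZF"], flags["SF"], flags["OF"]
    conditions = {
        "JE": zf, "JZ": zf, "JNE": not zf,
        "JG": (not zf) and sf == of,
        "JGE": sf == of,
        "JL": sf != of,
        "JLE": zf or sf != of,
    }
    return conditions[mnemonic]


print(jump_taken("JL", cmp_flags(-1, 1)))  # True: -1 is less than 1 as a signed value
```

JE and JZ test the same bit (ZF), which is why assemblers treat them as two names for one instruction.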
Additional Mnemonics:
- LEAVE: used in the subroutine epilogue, just before RET. It copies the base pointer back into the stack pointer (discarding the local stack frame) and then pops the saved base pointer.
- PUSHA and POPA: the same as PUSH and POP, but they save and restore all the general purpose registers at once instead of just the one passed as argument. They are only available in 16-bit and 32-bit modes, not in 64-bit mode.
- LOOP: a conditional jump for building loops: it decrements RCX and jumps to the target while RCX is not 0.
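To see how CALL, the usual prologue, LEAVE and RET fit together, here is a rough Python model of the stack frame life cycle. The class ToyCpu, the start address 0x1000 and the sample addresses are all assumptions made for the example.

```python
class ToyCpu:
    """Hypothetical model of rsp, rbp and rip during a subroutine call."""

    def __init__(self):
        self.rsp = 0x1000
        self.rbp = 0
        self.rip = 0
        self.mem = {}

    def push(self, value):
        self.rsp -= 8
        self.mem[self.rsp] = value

    def pop(self):
        value = self.mem[self.rsp]
        self.rsp += 8
        return value

    def call(self, target, return_address):
        self.push(return_address)   # CALL saves where to come back to
        self.rip = target

    def prologue(self, locals_size):
        self.push(self.rbp)         # push rbp
        self.rbp = self.rsp         # mov rbp, rsp
        self.rsp -= locals_size     # sub rsp, locals_size

    def leave(self):
        self.rsp = self.rbp         # mov rsp, rbp
        self.rbp = self.pop()       # pop rbp

    def ret(self):
        self.rip = self.pop()       # pop the return address into rip


cpu = ToyCpu()
cpu.rbp = 0x2000                    # pretend the caller already has a frame
cpu.call(0x401000, return_address=0x400123)
cpu.prologue(32)
cpu.leave()
cpu.ret()
print(hex(cpu.rip), hex(cpu.rsp), hex(cpu.rbp))  # 0x400123 0x1000 0x2000
```

After RET everything is back where it was before the CALL, which is exactly what the epilogue is for.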
Memory Layout
Other program segments
You may have noticed in the diagram above that in addition to the stack and heap, there are separate, distinct regions of memory in the program’s address space that I drew, namely the bss, data and text segments. If you recall the Hello, world section example, you might also have realized that there are labels in that example that line up with the names of these program segments.
Well, that’s no coincidence; it turns out that those labels define what goes into those regions, and these regions exist as a result of the PE executable file format, which is a standardized format that Microsoft dictates for how an executable’s memory segments should be arranged in order for the operating system’s loader to be able to load it into memory.
Crossing the platforms
On Linux, the executable file format is known as the Executable and Linkable Format, or ELF (OS X uses its own format, Mach-O). While there are differences between these and the PE file format used by Windows, the segments we’ll be dealing with here exist on all of them, and function the same way. On a fun note, Fabian Giesen has an amusing tweet regarding the name.
Let’s go over what each of the major segments are for.
- Block Started by Symbol (BSS): This segment of memory is meant for variables that are uninitialized (e.g. int a;). The name, as is the case with many things in assembly, has a very long history.
- Data segment: This segment of memory contains constants or initialized data (e.g. int a = 5; or const int a = 5;). This is fairly straightforward.
- Text segment, also known as the code segment: This contains the actual assembled instructions, hence the name.
Virtual Memory Address System
If you’ve never heard of the term virtual memory before, this section is for you. Basically, every time you’ve stepped into a debugger and seen a hexadecimal address for your pointers, your stack variables, or even your program itself, this has all been a lie; the addresses you see there are not the actual addresses that the memory resides at.
What actually happens here is that every time a user-mode application asks for an allocation of memory by the operating system, whether from calling HeapAlloc, reserving it on the stack through int a[5] or even doing it in assembly, the operating system does a translation from a physical address of a real, physical addressable location on the hardware to a virtual address, which it then provides you with to perform your operations as you deem fit. Processes are mapped into a virtual address space, and memory is reserved by the operating system for that space, but only in virtual memory; the physical hardware could still have memory unmapped to that address space until the user-mode process actually requests an allocation.
If the memory is already reserved and available for use, the virtual addresses handed back to the user-mode process are already backed by physical memory and can be used right away. If the physical memory is not available yet (e.g. because the page has never been touched or has been swapped out), the OS will page in memory from whatever physical storage is available, whether that be DDR RAM, the hard drive, etc.
The translation between the virtual memory addresses that applications use and the physical memory addresses of the hardware is done using a special piece of hardware known as the Memory Management Unit, or MMU. As you might expect, if every single memory access required a full translation operation, this would prove costly. Hence, there is another specialized piece of hardware known as the Translation Look-aside Buffer, or TLB. It also sits on the CPU and caches the results of previous address translations, so that repeated accesses to the same pages can skip the full lookup.
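The lookup performed by the MMU and the TLB can be sketched in a few lines. This is a deliberately simplified Python model: it assumes 4 KiB pages and a single flat page table stored in a dict (real x86-64 hardware walks a multi-level table), and the class ToyMmu is hypothetical.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


class ToyMmu:
    """Hypothetical single-level page table with a tiny TLB in front of it."""

    def __init__(self, page_table):
        self.page_table = page_table   # virtual page number -> physical frame number
        self.tlb = {}                  # cache of translations already done
        self.tlb_hits = 0

    def translate(self, virtual_address):
        page, offset = divmod(virtual_address, PAGE_SIZE)
        if page in self.tlb:
            self.tlb_hits += 1         # fast path: translation was cached
            frame = self.tlb[page]
        else:
            if page not in self.page_table:
                raise LookupError(f"page fault at {virtual_address:#x}")
            frame = self.page_table[page]
            self.tlb[page] = frame     # remember it for next time
        return frame * PAGE_SIZE + offset


mmu = ToyMmu({0x400: 0x12})
print(hex(mmu.translate(0x400ABC)))    # 0x12abc
print(hex(mmu.translate(0x400010)))    # 0x12010, served from the TLB
```

Only the page number is translated; the offset inside the page is the same in the virtual and the physical address.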
Calling Conventions
EFLAGS register
x64
Basically the same, but the registers start with R instead of E: RAX, RBX, RSP, RIP,… There are also eight new general purpose registers, R8 to R15.
In RFLAGS we basically use the same flags as in EFLAGS, because bits 32-63 are reserved.
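When a debugger shows a raw EFLAGS or RFLAGS value, the individual flags can be picked out by bit position. The sketch below covers only the most common status and control flags; decode_flags is a made-up helper name.

```python
# Bit positions of the most common flags in EFLAGS / the low half of RFLAGS.
FLAG_BITS = {"CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7,
             "TF": 8, "IF": 9, "DF": 10, "OF": 11}


def decode_flags(value):
    """Return the names of the flags that are set in a raw flags value."""
    return [name for name, bit in FLAG_BITS.items() if (value >> bit) & 1]


print(decode_flags(0x246))  # ['PF', 'ZF', 'IF']
```

Bit 1 is always set and has no name, which is why 0x246 decodes to only three flags.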
Calling Convention
Recall that some calling conventions require parameters to be passed on the stack on x86. On x64, most calling conventions pass parameters through registers. For example, on Windows x64, there is only one calling convention and the first four parameters are passed through RCX, RDX, R8, and R9; the remaining ones are pushed on the stack from right to left. On Linux (System V AMD64 ABI), the first six parameters are passed in RDI, RSI, RDX, RCX, R8, and R9.
ntdll.dll:
*************************************************************
* FUNCTION
*************************************************************
int __fastcall NtAllocateVirtualMemory (int pHandle , voi
assume GS_OFFSET = 0xff00000000
int EAX:4 <RETURN>
int ECX:4 pHandle
void * RDX:8 allocAddr
int R8D:4 some
void * R9:8 allocSize
int Stack[0x28]:4 mem
int Stack[0x30]:4 page
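The register assignment above can be turned into a small lookup. This Python sketch only handles integer and pointer arguments and reports where the callee finds them on entry, that is, right after CALL has pushed the return address; the function name integer_arg_locations and the "win64"/"sysv" labels are assumptions for the example.

```python
WIN64_REGS = ["RCX", "RDX", "R8", "R9"]
SYSV_REGS = ["RDI", "RSI", "RDX", "RCX", "R8", "R9"]


def integer_arg_locations(count, abi):
    """Where the callee finds `count` integer/pointer arguments on entry."""
    if abi == "win64":
        # stack arguments sit above the return address and 32 bytes of shadow space
        regs, first_stack_offset = WIN64_REGS, 0x28
    elif abi == "sysv":
        # stack arguments sit right above the return address
        regs, first_stack_offset = SYSV_REGS, 0x08
    else:
        raise ValueError(f"unknown ABI: {abi}")
    locations = list(regs[:count])
    for extra in range(max(0, count - len(regs))):
        locations.append(f"[rsp+{first_stack_offset + 8 * extra:#x}]")
    return locations


print(integer_arg_locations(6, "win64"))
# ['RCX', 'RDX', 'R8', 'R9', '[rsp+0x28]', '[rsp+0x30]']
```

The two stack offsets for Windows match the Stack[0x28] and Stack[0x30] entries in the NtAllocateVirtualMemory listing.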
Bits, bytes and sizes
At this point, it’s also probably a good idea to give a quick refresher for the non-Win32 API programmers out there who might not be as familiar with the nomenclature of the Windows data types.
Bit: 0 or 1. The smallest unit of information; the smallest addressable unit of memory is the byte.
Nibble: 4 bits.
Byte: 8 bits. In C/C++, you might be familiar with the term char.
WORD: On the x64 architecture, the word size is 16 bits. In C/C++, you might be more familiar with the term short.
DWORD: Short for “double word”, this means 2 × 16 bit words, which means 32 bits. In C/C++, you might be more familiar with the term int.
QWORD: Short for “quad word”, this means 4 × 16 bit words, which totals 64 bits. This term is used in NASM syntax as well.
oword: Short for “octa-word” this means 8 × 16 bit words, which totals 128 bits. This term is used in NASM syntax.
yword: Also used only in NASM syntax, this refers to 256 bits in terms of size (i.e. the size of ymm register.)
float: This means 32 bits for a single-precision floating point under the IEEE 754 standards.
double: This means 64 bits for a double-precision floating point under the IEEE 754 standard. This is the same size as a quad word.
Pointers: On the x64 ISA, pointers are all 64-bit addresses.
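These sizes can be double-checked with Python's struct module, which uses the same fixed widths when a little-endian format is requested. The table SIZE_CODES and the helper size_in_bits are made up for this sketch; the byte order shown is the little-endian layout x86 uses in memory.

```python
import struct

# struct format codes with the same widths as the types listed above
SIZE_CODES = {"BYTE": "B", "WORD": "H", "DWORD": "I", "QWORD": "Q",
              "float": "f", "double": "d"}


def size_in_bits(name):
    """Width of a type in bits, using struct's standard (fixed) sizes."""
    return struct.calcsize("<" + SIZE_CODES[name]) * 8


for name in SIZE_CODES:
    print(name, size_in_bits(name))

# A DWORD as it is laid out in memory on x86: least significant byte first.
print(struct.pack("<I", 0x12345678).hex())  # 78563412
```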
Ring 0
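There are certain instructions that only work at Ring 0, the most privileged Intel privilege level.
- HLT (Halt)
- MOV to/from Control Registers (CR0, CR4)
- IN/OUT (Input/Output): governed by the I/O privilege level (IOPL), so normally unavailable to user-mode code
- CLI/STI
- LGDT/LLDT (Load Global/Local Descriptor Table)
- SMI (System Management Interrupt): strictly an interrupt rather than an instruction; it switches the CPU into System Management Mode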
Hello world in ASM
https://neetx.github.io/posts/windows-asm-hello-world/
test.asm
; wsl nasm -f win64 test.asm
; link test.obj /defaultlib:Kernel32.lib /subsystem:console /entry:main /LARGEADDRESSAWARE:NO /out:test.exe
global main
; kernel32 functions to use in printcustom
extern GetStdHandle
extern WriteFile
section .text
printcustom:
push rbp
mov rbp, rsp
sub rsp, 48 ; 32 bytes of shadow space + 8 bytes for the 5th argument + 8 bytes of padding
; the shadow space is mandatory in the Windows x64 calling convention for every call,
; and arguments beyond the 4th go on the stack right above it
mov [rbp + 16], rcx ; save 1st argument (buffer address) in the home slot provided by the caller
mov [rbp + 24], rdx ; save 2nd argument (address of the length), GetStdHandle may clobber rcx, rdx, r8 and r9
mov ecx, 0fffffff5h ; STD_OUTPUT_HANDLE (-11)
call GetStdHandle
mov rcx, rax ; 1st argument StdOut Handle
mov rdx, [rbp + 16] ; 2nd argument buffer address
mov r8, [rbp + 24]
mov r8d, [r8] ; 3rd argument number of bytes to write
mov r9, BytesWritten ; 4th argument bytes written
mov qword [rsp + 32], 00h ; 5th argument (lpOverlapped = NULL) right above the shadow space
call WriteFile
;mov rsp, rbp ; restores base pointer into stack pointer (frees the local stack space)
;pop rbp ; restores the caller's base pointer
leave ; with leave we can achieve both previous instructions
ret ; pops the return address (pushed by call in main) and puts it in rip
main:
sub rsp, 40 ; shadow space for the calls below, also realigns rsp to 16 bytes
mov rcx, Buffer
mov rdx, BytesToWrite
call printcustom ; call pushes the return address into the stack and sets location as current rip
mov rcx, newline
mov rdx, newlineBytes
call printcustom
mov rcx, Buffer2
mov rdx, BytesToWrite2
call printcustom
ExitProgram:
xor eax, eax
add rsp, 40 ; release the shadow space before returning
ret
section .data
Buffer: db 'Hello, World!', 00h
BytesToWrite: dq 0eh
Buffer2: db 0x40,0x41,0x42,0x00 ; msfvenom -f powershell
BytesToWrite2: dq 4h
newline: db 0ah
newlineBytes: dq 1h
section .bss
BytesWritten: resd 01h
Assembly tool
When assembling with NASM, the -f option (for example -f win64) plays a crucial role in how the output file is structured. Here’s a breakdown of what happens:
Binary Format (-f bin)
- In binary format (-f bin), NASM generates a flat binary file containing only the raw machine code for the instructions and data you wrote in your assembly code file (.asm).
- This format lacks the sections, headers and symbols needed for linking, so it cannot be passed to a linker to create an executable file.
- It’s often used for specific purposes like embedding raw machine code into another program (for example shellcode) or for learning the low-level instruction encoding.
Win64 Format (-f win64)
- When you use -f win64, NASM assembles the code for the 64-bit Windows platform (x86-64) and creates an object file in COFF, the object format from which the linker builds Portable Executable (PE) files.
- This format is much richer than the binary format and includes several crucial elements:
- Sections: Your assembly code is organized into different sections (like .text for code, .data for data) based on their purpose.
- Headers: The object file contains a header and a section table describing how the file is organized. The entry point and the dependencies are added to the PE headers later, by the linker.
- Symbols: NASM records the symbols your code defines (global) and references (extern), which is what linking relies on and is also helpful for debugging.
- This object file can then be linked with other object files and libraries to create a final executable file (.exe) or a Dynamic-Link Library (DLL) on Windows.
Key Points:
-f bin provides a flat file with just raw machine code and no object-file structure.
-f win64 creates a COFF object file suitable for linking and generating Windows (PE) executables or DLLs.
While -f win64 generates sections, it still doesn’t include the .idata section at this stage because import information is determined during linking.
Import Information and .idata in ELF:
- Just like with -f win64, using -f elf64 during assembly won’t create any import sections yet (and ELF has no .idata section at all).
- The ELF format also has a mechanism for handling external library dependencies, but it uses a different structure compared to PE.
- In ELF, the linker is responsible for generating the import information and its corresponding sections based on the program’s dependencies on shared libraries. These sections might include names like .got (Global Offset Table) and .plt (Procedure Linkage Table) which play a similar role to the IAT in PE format.
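A quick way to tell these file types apart is to look at their first bytes. The sketch below is only a heuristic on well-known magic values (MZ for PE images, 0x7F ELF for ELF files, and the machine type 0x8664 that starts an x86-64 COFF object); the function name identify is made up.

```python
def identify(header):
    """Guess the file type from the first bytes of a file (heuristic only)."""
    if header.startswith(b"\x7fELF"):
        return "ELF"
    if header.startswith(b"MZ"):
        return "PE image (.exe/.dll)"
    if header[:2] == b"\x64\x86":       # IMAGE_FILE_MACHINE_AMD64, little-endian
        return "COFF object for x86-64 (.obj)"
    return "unknown (a flat -f bin file has no header at all)"


print(identify(b"\x7fELF\x02\x01\x01"))   # ELF
print(identify(b"MZ\x90\x00"))            # PE image (.exe/.dll)
print(identify(b"\x64\x86\x03\x00"))      # COFF object for x86-64 (.obj)
```

To try it on a real file, read a few bytes first, for example identify(open("test.obj", "rb").read(4)).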
Examples:
Assembling into shellcode:
nasm test.asm (does not work with external names, file has to specify BITS 32 or BITS 64)
This can be useful for studying, getting opcodes and disassembling bytes:
ndisasm -b 64 test (-b flag depends on the BITS 64 instruction of the assembly code)
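Once the raw bytes are on disk, they usually need to be turned into an escaped string before being pasted into a loader or a test harness. A minimal sketch, assuming the three bytes 31 C0 C3 (the encoding of xor eax, eax followed by ret) as sample input; to_c_string is a hypothetical helper.

```python
def to_c_string(code):
    """Format raw machine code as a C-style escaped string."""
    return "".join(f"\\x{byte:02x}" for byte in code)


# Sample input: xor eax, eax ; ret
code = bytes.fromhex("31c0c3")
print(to_c_string(code))   # \x31\xc0\xc3
print(len(code), "bytes")  # 3 bytes
```

For a real file, replace the sample with the output of the assembler, for example code = open("test", "rb").read().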
Assembling into obj (with PE headers):
C:\Users\nobody\Desktop\assembly>nasm -f win64 test.asm
Linking (setting up IAT in the .idata section):
C:\Users\nobody\Desktop\assembly>link test.obj "C:\Program Files (x86)\Windows Kits\10\lib\10.0.18362.0\um\x64\kernel32.lib" /subsystem:console /entry:main /LARGEADDRESSAWARE:NO /out:test.exe
also: link test.obj /defaultlib:Kernel32.lib /subsystem:console /entry:main /LARGEADDRESSAWARE:NO /out:test.exe
you can also link and provide permissions to specific sections (in addition to other flags, as well)
link test.obj /entry:main /section:.custom,RWE /out:main.exe
https://academy.hackthebox.com/module/227/section/2493
Tools:
link is included in the Visual Studio C/C++ toolchain
nasm (https://nasm.us), natively on Windows or through WSL, Linux or macOS
Linux
On Linux you can link with the ld command from the binutils package
# Assemble the .asm file to an object file
nasm -f elf64 test.asm -o test.o
# Link the object file to create executable
ld test.o -o main
# Assemble 32bit
nasm -f elf32 test.asm -o test.o
# Link
ld -m elf_i386 test.o -o main
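full link (choosing the entry point):
ld test.o -o main --entry=main
As far as I know GNU ld has no --section-flags option; the permissions of a custom section can be declared in the NASM source instead, e.g. section .custom write exec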