Packing, shellcode execution and in-memory Dynamic-Link Library (DLL) loading are very common in the malware scene. Extracting the real payload with static analysis alone can be quite tedious. A dynamic approach can give the reverser a nearly generic method to deobfuscate stage n+1. This is what TEHTRIS offers with its open-source “Shellcode extraction” tool. The tool does not claim to be 100% reliable, but it provides a quick win in most cases.
This article describes the capabilities of this tool, made by TEHTRIS, which helps to extract payloads from a Portable Executable (PE) file. It aims to provide a minimalist sandboxing environment that tracks Self-Modifying Code (SMC) with limited memory consumption.
The Pin [0] Application Programming Interface (API) already offers SMC detection callbacks [1]:
typedef VOID(* LEVEL_PINCLIENT::SMC_CALLBACK) (ADDRINT traceStartAddress, ADDRINT traceEndAddress, VOID *v)
Unfortunately, this API does not always detect SMC in memory pages allocated with kernel32!VirtualAlloc. In addition, more features are needed to help the reverser:
- Automatic dump of the shellcode
- Dump of the PE file in case of manual DLL loading (outside kernel32!LoadLibrary)
- Dump of the trace that performs the decryption
- Log of the Original Entry Point (OEP)
When a payload is detected, the reverser has immediate access to it and to the address and dump of the decryption routine. This can save time when decrypting payloads and generating signatures. Manual debugging and static analysis are time-consuming, and an analyst cannot always afford them when a quick reaction is critical.
The code can be found in a GitHub repository [2].
Analysis
A generic example of obfuscation tricks commonly found in the wild has been generated for demonstration purposes.
Code
A test sample has been compiled with mingw32, then packed with UPX [5]:
#include <windows.h>
#include <stdio.h>

int main()
{
    // Payload XOR-ed with 1; sizeof() also counts the terminating NUL byte
    char shellcode[] = "\x91\x91\x91\xc2";

    // Allocate a readable, writable and executable memory region
    LPVOID addressPointer = VirtualAlloc(NULL, sizeof(shellcode), MEM_COMMIT, PAGE_EXECUTE_READWRITE);
    if (!addressPointer) {
        printf("Fail to allocate\n");
        return 0;
    }

    // Decode the payload into the new region (XOR each byte with 1)
    for (size_t i = 0; i < sizeof(shellcode); i++) {
        ((BYTE *)addressPointer)[i] = shellcode[i] ^ 1;
    }

    // Create a thread whose start routine is the shellcode address
    CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)addressPointer, NULL, 0, NULL);

    // Sleep for a second to wait for the thread
    Sleep(1000);
    return 0;
}
There are two layers of obfuscation: UPX and a XOR with 1. To trigger an SMC detection, the shellcode is executed in a separate thread.
The payload consists only of nop/ret instructions: \x91\x91\x91\xc2 before the XOR, \x90\x90\x90\xc3 after it. Although this example is trivial, the same trick can be enough to hide strings and defeat YARA rules written against malicious code. Combined with anti-debug techniques, it can make the analysis very tedious.
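The decoding step can be reproduced outside the sample. The following is a minimal, portable sketch of the XOR layer only; the names `xor_decode` and `encoded_payload` are hypothetical helpers introduced for illustration and do not come from the tool.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Decode a buffer by XOR-ing every byte with a single-byte key.
// XOR is its own inverse, so the same function also re-encodes.
std::vector<std::uint8_t> xor_decode(const std::vector<std::uint8_t>& data,
                                     std::uint8_t key) {
    std::vector<std::uint8_t> out;
    out.reserve(data.size());
    for (std::uint8_t byte : data) {
        out.push_back(static_cast<std::uint8_t>(byte ^ key));
    }
    return out;
}

// The four encoded payload bytes embedded in the test sample.
std::vector<std::uint8_t> encoded_payload() {
    return {0x91, 0x91, 0x91, 0xc2};
}
```

Calling `xor_decode(encoded_payload(), 1)` yields the bytes 0x90 0x90 0x90 0xc3, that is three nop instructions followed by a ret.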
Behaviour
The in-memory manipulations take place as follows:

Demo on the example sample
It is time to show the Shellcode Extraction Tool in action against the previous sample. To analyse, simply run the script:
quickShellcodeDetector /home/user/samples/shellcode.exe /dev/shm
The logs show that the UPX-packed sample has been unpacked and that the shellcode has been correctly decrypted. The full log of the tool is reproduced below. Every dumped file is prefixed with the Process IDentifier (PID) of the process, to avoid collisions with subprocesses that use the same shellcode address:
[INFO] ShellcodeDetector.cpp:503 Starting program: pid=7360
[INFO] ShellcodeDetector.cpp:446 IMG_LOAD: F:\pin-3.18-98332-gaebd7b1e6-msvc-windows\source\tools\QuickDetector\shellcode.exe addr=0x00370000 id=1 size=0x1c000
[INFO] ShellcodeDetector.cpp:446 IMG_LOAD: C:\Windows\SysWOW64\KernelBase.dll addr=0x76d80000 id=2 size=0x214000
[INFO] ShellcodeDetector.cpp:446 IMG_LOAD: C:\Windows\SysWOW64\kernel32.dll addr=0x77300000 id=3 size=0xf0000
[INFO] ShellcodeDetector.cpp:446 IMG_LOAD: C:\Windows\SysWOW64\ntdll.dll addr=0x77560000 id=4 size=0x1a3000
[INFO] ShellcodeDetector.cpp:446 IMG_LOAD: C:\Windows\SysWOW64\apphelp.dll addr=0x74c80000 id=5 size=0x9f000
[INFO] ShellcodeDetector.cpp:113 Found obfuscation routine at: 0x37105e (F:\pin-3.18-98332-gaebd7b1e6-msvc-windows\source\tools\QuickDetector\shellcode.exe+0x105e)
[INFO] ShellcodeDetector.cpp:115 Dumping trace into: C:\Users\user\AppData\Local\Temp\ShellcodeDetector\0x1cc0_0x0037105e.trc
[INFO] ShellcodeDetector.cpp:239 Dumping ShellCode: C:\Users\user\AppData\Local\Temp\ShellcodeDetector\0x1cc0_0x0b230000.bin ep=0x00000000 size=0x1000
[INFO] ShellcodeDetector.cpp:254 Dumped: 4096 bytes
[INFO] ShellcodeDetector.cpp:446 IMG_LOAD: C:\Windows\SysWOW64\kernel.appcore.dll addr=0x74000000 id=6 size=0xf000
[INFO] ShellcodeDetector.cpp:446 IMG_LOAD: C:\Windows\SysWOW64\msvcrt.dll addr=0x77400000 id=7 size=0xbf000
[INFO] ShellcodeDetector.cpp:446 IMG_LOAD: C:\Windows\SysWOW64\rpcrt4.dll addr=0x76150000 id=8 size=0xc0000
[INFO] ShellcodeDetector.cpp:463 Done in 3 seconds
[INFO] ShellcodeDetector.cpp:467 Freing image: 1
[INFO] ShellcodeDetector.cpp:467 Freing image: 2
[INFO] ShellcodeDetector.cpp:467 Freing image: 3
[INFO] ShellcodeDetector.cpp:467 Freing image: 4
[INFO] ShellcodeDetector.cpp:467 Freing image: 5
[INFO] ShellcodeDetector.cpp:467 Freing image: 6
[INFO] ShellcodeDetector.cpp:467 Freing image: 7
[INFO] ShellcodeDetector.cpp:467 Freing image: 8
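The dump names in this log combine the PID (7360 in decimal, 0x1cc0 in hexadecimal) with the address of the dumped object. A minimal sketch of such a naming scheme follows; the format string is an assumption inferred from the two file names in the log, and `dump_name` is a hypothetical helper, not a function of the tool.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Build a dump file name of the form <pid>_<address>.<ext>, mimicking the
// names seen in the log. The exact format string is an assumption.
std::string dump_name(std::uint32_t pid, std::uint32_t address,
                      const std::string& ext) {
    char buf[64];
    // snprintf never writes past the buffer and always NUL-terminates it.
    std::snprintf(buf, sizeof(buf), "0x%x_0x%08x.%s",
                  static_cast<unsigned>(pid),
                  static_cast<unsigned>(address),
                  ext.c_str());
    return std::string(buf);
}
```

With the values from the log, `dump_name(7360, 0x0b230000, "bin")` gives `0x1cc0_0x0b230000.bin`. Because the PID is part of the name, two processes that allocate their shellcode at the same address still produce distinct files.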
Both the obfuscation routine and the payload have been found. The log file indicates the address of the trace which deobfuscates the payload:
Found obfuscation routine at: 0x37105e (F:\pin-3.18-98332-gaebd7b1e6-msvc-windows\source\tools\QuickDetector\shellcode.exe+0x105e)
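The address in this log line is the load address of shellcode.exe (0x00370000 in the IMG_LOAD line) plus the offset 0x105e. A minimal sketch of that conversion is given below; `va_to_rva` is a hypothetical helper introduced for illustration, not part of the tool.

```cpp
#include <cassert>
#include <cstdint>

// Convert a virtual address inside a loaded image into an offset from the
// image base. Returns false when the address lies outside the image.
bool va_to_rva(std::uint64_t va, std::uint64_t image_base,
               std::uint64_t image_size, std::uint64_t* rva) {
    if (va < image_base || va - image_base >= image_size) {
        return false;
    }
    *rva = va - image_base;
    return true;
}
```

With the values from the log (base 0x00370000, size 0x1c000), the address 0x37105e maps to the offset 0x105e, while the shellcode address 0x0b230000 is rejected because it does not belong to any loaded image.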
The trace found at shellcode.exe+0x105e is exported in the 0x1cc0_0x0037105e.trc file:

The IDA [4] listing at the matching Relative Virtual Address (RVA) 0x105e (virtual address 0x37105e) helps to find the xor 1 (this is the same trace as the one extracted by the tool):

The log file indicates the address of the shellcode along with its entry point, relative to the base address of the memory allocation:
[INFO] ShellcodeDetector.cpp:239 Dumping ShellCode: C:\Users\user\AppData\Local\Temp\ShellcodeDetector\0x1cc0_0x0b230000.bin ep=0x00000000 size=0x1000
This is the extracted shellcode, found at address 0x0b230000 and exported in the 0x1cc0_0x0b230000.bin file:

The shellcode size is 4096 bytes because the memory allocation is rounded up to the page size.
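The rounding can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the usual 4096-byte page size of x86 Windows; `round_up_to_page` is a hypothetical helper, not a function of the tool.

```cpp
#include <cassert>
#include <cstddef>

// Round a requested allocation size up to the next multiple of the page
// size. The page size must be a non-zero power of two.
std::size_t round_up_to_page(std::size_t size, std::size_t page_size) {
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (size + page_size - 1) & ~(page_size - 1);
}
```

The sample requests `sizeof(shellcode)` bytes, which is 5 (four payload bytes plus the string terminator), and `round_up_to_page(5, 4096)` returns 4096, matching the size=0x1000 reported in the log.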
Architecture
The tool is based on Pin and does not need any other dependency. The following diagram describes the overall architecture, a simple Virtual Machine (VM) based setup driven through the VirtualBox [3] CLI.

The commands are passed through the VirtualBox Guest Additions, which is not ideal for stealth. This should be replaced by an agent in the future.
Tool internals
The tool tracks every memory access while keeping memory consumption minimal. To achieve this, it logs, for each trace, its write/execute (W/X) accesses to the memory blocks delimited by kernel32!VirtualQuery.
This shortcut avoids duplicating the allocated memory and comparing the copies.
Only the trace and memory region are logged. The following structures are used to log the events:
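typedef struct _TRACEACCESS {
    UINT32 access_type;   // type of access (W/X)
    ADDRINT membase;      // base address of the accessed memory block
} TRACEACCESS, *PTRACEACCESS;

typedef struct _TRACE {
    ADDRINT address;      // address of the trace
    USIZE length;         // length of the trace
    size_t accessnb;      // number of entries in access
    PTRACEACCESS access;  // accesses logged for this trace
} *PTRACE;

typedef struct _MEMACCESS {
    size_t tracenb;       // number of entries in trace
    PTRACE trace;         // logged traces
} MEMACCESS, *PMEMACCESS;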
These structures are populated by the W/X callbacks.

The collision detection is performed in real time, so that the memory dump is taken as soon as a written page is executed for the first time. In most cases, this ensures that the payload is fully deobfuscated when it is dumped.
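The write-then-execute collision can be illustrated with a simplified model. This is only a sketch under stated assumptions, not the tool's implementation: the class `WxTracker` and its methods are hypothetical, standard containers replace the structures above, and memory regions are identified by their base address only.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Simplified model of W/X collision detection on memory regions.
class WxTracker {
public:
    // Record that the trace starting at trace_addr wrote into the region.
    void on_write(std::uint64_t trace_addr, std::uint64_t region_base) {
        writers_[region_base].insert(trace_addr);
    }

    // Called when code is about to execute inside the region. Returns the
    // writer traces the first time a previously written region is executed,
    // and an empty list in every other case.
    std::vector<std::uint64_t> on_execute(std::uint64_t region_base) {
        auto it = writers_.find(region_base);
        if (it == writers_.end() || dumped_.count(region_base) != 0) {
            return {};
        }
        dumped_.insert(region_base);  // report each region only once
        return std::vector<std::uint64_t>(it->second.begin(), it->second.end());
    }

private:
    std::map<std::uint64_t, std::set<std::uint64_t>> writers_;  // region -> writer traces
    std::set<std::uint64_t> dumped_;                            // regions already reported
};
```

Replaying the demo, a write from the trace at 0x37105e into the region at 0x0b230000 followed by an execution of that region returns 0x37105e, which corresponds to the moment where the tool would dump both the trace and the shellcode. Only addresses are stored, so no copy of the region content is kept.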
A very simple yet efficient PE parser is included to parse the PE header and then dump the DLL.
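To give an idea of what such a parser has to check, here is a minimal sketch that validates the headers of a PE image held in a byte buffer and reads the fields useful for a dump. It is not the parser shipped with the tool: `PeInfo`, `read_le` and `parse_pe_header` are hypothetical names, and only the standard header offsets of the PE format are assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct PeInfo {
    bool is_pe32_plus;              // true for 64-bit (PE32+) images
    std::uint32_t entry_point_rva;  // AddressOfEntryPoint
    std::uint32_t size_of_image;    // SizeOfImage
};

// Read an nbytes-wide little-endian value (nbytes <= 4) with bounds checking.
static bool read_le(const std::vector<std::uint8_t>& buf, std::size_t off,
                    unsigned nbytes, std::uint32_t* out) {
    if (off > buf.size() || buf.size() - off < nbytes) return false;
    std::uint32_t value = 0;
    for (unsigned i = 0; i < nbytes; ++i) {
        value |= static_cast<std::uint32_t>(buf[off + i]) << (8 * i);
    }
    *out = value;
    return true;
}

// Validate the DOS and PE signatures, then read two optional-header fields.
bool parse_pe_header(const std::vector<std::uint8_t>& buf, PeInfo* info) {
    std::uint32_t mz = 0, e_lfanew = 0, signature = 0, magic = 0;
    if (!read_le(buf, 0, 2, &mz) || mz != 0x5A4D) return false;   // "MZ"
    if (!read_le(buf, 0x3C, 4, &e_lfanew)) return false;          // offset of the PE header
    if (!read_le(buf, e_lfanew, 4, &signature) || signature != 0x00004550) {
        return false;                                              // "PE\0\0"
    }
    // The optional header follows the 4-byte signature and 20-byte COFF header.
    const std::size_t opt = static_cast<std::size_t>(e_lfanew) + 4 + 20;
    if (!read_le(buf, opt, 2, &magic)) return false;
    if (magic != 0x10B && magic != 0x20B) return false;           // PE32 or PE32+
    info->is_pe32_plus = (magic == 0x20B);
    return read_le(buf, opt + 16, 4, &info->entry_point_rva) &&
           read_le(buf, opt + 56, 4, &info->size_of_image);
}
```

SizeOfImage gives the number of bytes to dump starting at the region base, which is why a header-only parser can be enough to extract a manually loaded DLL.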
Conclusion
This tool is quite handy when an analyst needs to extract an obfuscated payload from a sample with few protections. Many bypasses and detections of the tool remain possible; this could be improved in the future.
Bibliography
[2] https://github.com/tehtris-hub/ShellCodeDetector
[3] https://www.virtualbox.org/