Malware Optimization: Entropy Reduction & Compile Settings

Binary Entropy Reduction

Entropy measures the degree of randomness in a given data set.

Shannon entropy, computed over the byte values of a file, yields a value between 0 and 8 (bits per byte). The more random the data, the higher the entropy value.
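For byte data, the standard Shannon entropy formula is shown below, where p_i is the fraction of bytes equal to i. Terms with p_i = 0 are taken as 0. The minimum of 0 occurs when a single byte value fills the whole buffer (p = 1). The maximum of log2(256) = 8 occurs when all 256 values are equally frequent (every p_i = 1/256).

$$
H = -\sum_{i=0}^{255} p_i \log_2 p_i, \qquad p_i = \frac{\text{number of bytes equal to } i}{\text{total number of bytes}}, \qquad 0 \le H \le \log_2 256 = 8
$$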

Malware generally has higher entropy than ordinary files because it often contains encrypted or packed data.
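The effect of packing can be illustrated with a small sketch. Here zlib compression stands in for a packer; this is an assumption for illustration, since real packers differ. The word list and seed are arbitrary and hypothetical. Compression removes redundancy, so the packed bytes are spread much more evenly over the 256 possible values than the plain text.

```python
import math
import random
import zlib
from collections import Counter

def entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, computed from the byte histogram."""
    if not data:
        return 0.0
    n = len(data)
    return sum(-(c / n) * math.log2(c / n) for c in sorted(Counter(data).values()))

# Hypothetical "ordinary" content: pseudo-random English words, seeded for reproducibility.
words = ["file", "data", "user", "window", "memory", "process", "thread", "handle",
         "buffer", "section", "header", "string", "value", "system", "library"]
rng = random.Random(1234)
plain = " ".join(rng.choice(words) for _ in range(20000)).encode()

# zlib compression stands in for packing: redundancy is removed,
# so the output looks much closer to uniformly random bytes.
packed = zlib.compress(plain, 9)

print(f"plain  ({len(plain)} bytes): {entropy(plain):.5f}")
print(f"packed ({len(packed)} bytes): {entropy(packed):.5f}")
```

The plain text uses only about two dozen distinct byte values, so its entropy cannot exceed log2 of that count. The compressed output has no such restriction.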

To calculate a file's entropy, either for the whole file or per PE section, we can use the following Python script:

import sys
import math
import pefile

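# output the entropy (in bits per byte, 0 to 8) of specific data 'buffer'
def calc_entropy(buffer):
    if isinstance(buffer, str):
        buffer = buffer.encode()
    if len(buffer) == 0:
        return 0.0  # empty input: nothing to measure (also avoids a division by zero)
    entropy = 0.0
    for x in range(256):
        # probability of byte value x occurring in the buffer
        p = buffer.count(x) / len(buffer)
        if p > 0:
            entropy -= p * math.log2(p)
    return entropy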

# print help
def printhelp():
    print("[i] Usage:\n" + 
          "\t- python.exe EntropyCalc.py <filename>\n" +
          "\t- python.exe EntropyCalc.py <-pe> <pe filename>")
    sys.exit(1)
    
# output the entropy of the file as a whole
def calc_file_entropy(filename):
    try:
        with open(filename, "rb") as f:
            buf = f.read()
            entropy = calc_entropy(buf)
            print(f"Entropy Of {filename} As A Whole File Is : {entropy:.5f}")
    except FileNotFoundError:
        print(f"[!] Error: \"{filename}\" Is Not A Valid File")
        printhelp()

# if input is only the file's name
if len(sys.argv) == 2:
    calc_file_entropy(sys.argv[1])
    sys.exit(0)

# else if input is with the "-pe" argument
elif len(sys.argv) == 3 and sys.argv[1] == "-pe":
    try:
        PEfile = pefile.PE(sys.argv[2])
        print(f"[i] Parsing {sys.argv[2]}'s PE Section Headers ... ")
        for section in PEfile.sections:
            name = section.Name.rstrip(b'\x00').decode(errors="replace")
            entropy = calc_entropy(section.get_data())
            # choose a color from the section name (the byte sum is stable across runs, unlike hash())
            color_code = 31 + (sum(name.encode()) % 6)
            print(f"\t>>> \033[{color_code}m\"{name}\"\033[0m Scored Entropy Of Value: \033[{color_code}m{entropy:.5f}\033[0m")
    except FileNotFoundError:
        print(f"[!] Error: \"{sys.argv[2]}\" Is Not A Valid File")
        printhelp()
    except pefile.PEFormatError:
        print(f"[!] Error: \"{sys.argv[2]}\" Is Not A Valid PE File")
        printhelp()
            
else:
    printhelp()
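A quick way to sanity-check calc_entropy is to feed it buffers whose entropy is known exactly. The function is restated here so the block runs on its own. A buffer of one repeated byte should score 0, two byte values in equal proportion should score 1, and a buffer in which every byte value appears equally often should reach the maximum of 8.

```python
import math

def calc_entropy(buffer: bytes) -> float:
    """Shannon entropy in bits per byte (same formula as the script above)."""
    if not buffer:
        return 0.0
    entropy = 0.0
    for x in range(256):
        p = buffer.count(x) / len(buffer)
        if p > 0:
            entropy -= p * math.log2(p)
    return entropy

# A single repeated byte value: no randomness at all.
print(f"{calc_entropy(bytes(1000)):.5f}")        # 0.00000
# Two values in equal proportion: exactly one bit per byte.
print(f"{calc_entropy(b'AB' * 500):.5f}")        # 1.00000
# Every byte value exactly once (uniform distribution): the maximum.
print(f"{calc_entropy(bytes(range(256))):.5f}")  # 8.00000
```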

- Algorithm Selection

When encrypting, we can choose an algorithm that does not change the overall entropy of the data. However, such algorithms are usually cryptographically weak.
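XOR with a single constant byte illustrates this trade-off. The mapping v -> v ^ key is one-to-one on 0..255, so it only relabels the byte histogram and leaves the Shannon entropy unchanged. The sketch below checks this. The key 0x5A and the sample text are arbitrary, hypothetical choices.

```python
import math
from collections import Counter

def entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, computed from the byte histogram."""
    if not data:
        return 0.0
    n = len(data)
    # Sorting the counts makes the floating-point sum depend only on the histogram's shape.
    return sum(-(c / n) * math.log2(c / n) for c in sorted(Counter(data).values()))

def xor_single_byte(data: bytes, key: int) -> bytes:
    # v -> v ^ key is a one-to-one mapping on 0..255, so byte counts are only relabeled.
    return bytes(b ^ key for b in data)

sample = b"Entropy depends on how often each byte value occurs, not on which values they are. " * 20
encoded = xor_single_byte(sample, 0x5A)  # 0x5A is an arbitrary example key

print(f"before: {entropy(sample):.5f}")
print(f"after : {entropy(encoded):.5f}")
```

Because the histogram survives unchanged, simple frequency analysis can recover the key. For example, one can guess that the most common encoded byte corresponds to a space or 0x00. This is why such schemes are considered weak.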

Another solution is to use obfuscation algorithms such as IPv4fuscation. Their output is structured and ordered (dotted-decimal IPv4 address strings), so it does not significantly increase entropy.

- English Strings

Inserting English strings can reduce entropy because English letters consist of only 26 characters. Counting upper and lower case, that gives just 26 * 2 = 52 possible values for each stored byte, far fewer than the 256 values a byte can hold.

The downside is that these strings can be used as signatures to later detect the malware.
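The following sketch shows the effect on a whole buffer. Two assumptions apply: bytes(range(256)) * 16 stands in for encrypted or packed data (its entropy is exactly 8), and the repeated sentence is hypothetical filler text. The text's entropy is bounded by log2 of the number of distinct byte values it uses. Appending it to the high-entropy block pulls the combined value below 8.

```python
import math
from collections import Counter

def entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, computed from the byte histogram."""
    if not data:
        return 0.0
    n = len(data)
    return sum(-(c / n) * math.log2(c / n) for c in sorted(Counter(data).values()))

# Stand-in for encrypted/packed data: every byte value equally often.
high_entropy = bytes(range(256)) * 16  # 4096 bytes
# Hypothetical filler text; ordinary English prose behaves similarly.
english = (b"The quick brown fox jumps over the lazy dog while the farmer "
           b"reads the morning newspaper on the porch. ") * 40

print(f"random-like : {entropy(high_entropy):.5f}")
print(f"english     : {entropy(english):.5f}")
# log2 of the number of distinct byte values in the text is an upper bound on its entropy
print(f"text bound  : {math.log2(len(set(english))):.5f}")
print(f"combined    : {entropy(high_entropy + english):.5f}")
```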

- CRT Library Removal

Removing the CRT library can significantly reduce the entropy of the final binary. The settings required for this are covered in the Malware Compiling Settings section.

- Automated Tools for entropy reduction

Malware Compiling Settings

Modifying Visual Studio's compiler settings can change the produced binary, for example by reducing its size, lowering its entropy, and increasing its compatibility.

- Multi-threaded (/MT)

By default, the Runtime Library option in Visual Studio is set to "Multi-threaded DLL (/MD)". With this option, the CRT library DLLs are linked dynamically, which means they are loaded at runtime. Because these DLLs may not be installed on the target machine, this can cause compatibility issues.

To link the CRT statically instead, change the option to: Properties > C/C++ > Code Generation > Runtime Library > Multi-threaded (/MT)

After removing the CRT Library, the program can only be compiled in Release mode.

- Disable C++ Exceptions

C++ exception handling relies on CRT support. Since the CRT library is no longer linked, this option is unnecessary and should be disabled: Properties > C/C++ > Code Generation > Enable C++ Exceptions > No

- Disable Whole Program Optimization

Whole Program Optimization should be disabled to prevent the compiler from performing optimizations that may affect the stack: Properties > C/C++ > Optimization > Whole Program Optimization > No

- Disable Debug Info

To remove the debugging information and the manifest file that are generated by default:

  • Properties > Linker > Debugging > Generate Debug Info > No

  • Properties > Linker > Manifest File > Generate Manifest > No

- Ignore All Default Libraries

To prevent the linker from linking the default libraries (including the CRT) into the program: Properties > Linker > Input > Ignore All Default Libraries > Yes (/NODEFAULTLIB)

! This may result in build errors. It is the user's responsibility to provide any required functions that these default libraries would normally supply.

- Setting Entry Point Symbol

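The CRT's default entry point, mainCRTStartup, is no longer available, which causes an unresolved external symbol error. To fix this, set the entry point to main: Properties > Linker > Advanced > Entry Point > main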

- Disable Security Check

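With buffer security checks enabled, the compiler inserts calls to "__security_check_cookie", which is provided by the CRT. To resolve the resulting unresolved external symbol error: Properties > C/C++ > Code Generation > Security Check > Disable Security Check (/GS-)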

- Disable SDL Checks

To resolve the "overriding '/sdl' with '/GS-'" warning: Properties > C/C++ > General > SDL checks > No (/sdl-)

- Hiding The Console Window

To build the program as a GUI application so that no console window is created at startup: Properties > Linker > System > SubSystem > Windows (/SUBSYSTEM:WINDOWS)

CRT Library Functions Replacements

Once the CRT library is removed, we must write our own versions of the CRT functions the program uses, such as printf, strlen, strcat, and memcpy.

Ready-made replacements can be found in VX-API: https://github.com/vxunderground/VX-API

- Printf Replacement

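#include <Windows.h>

// wsprintfA is exported by user32.dll, so the program must link against user32.lib.
// wsprintfA writes at most 1024 bytes, which matches the buffer size below.
// When no arguments follow STR, MSVC's traditional preprocessor drops the trailing comma.
#define PRINTA( STR, ... )                                                                  \
    do {                                                                                    \
        LPSTR buf = (LPSTR)HeapAlloc( GetProcessHeap(), HEAP_ZERO_MEMORY, 1024 );           \
        if ( buf != NULL ) {                                                                \
            int len = wsprintfA( buf, STR, __VA_ARGS__ );                                   \
            WriteConsoleA( GetStdHandle( STD_OUTPUT_HANDLE ), buf, len, NULL, NULL );       \
            HeapFree( GetProcessHeap(), 0, buf );                                           \
        }                                                                                   \
    } while (0)


int main() {
    PRINTA("Hello World ! \n");
    return 0;
}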

- Intrinsic Function Usage

To make the compiler use our own implementations instead of the CRT's exported versions, we can use Microsoft's intrinsic-related pragmas.

For example, a custom version of the memset function can be provided to the compiler as follows, using the #pragma intrinsic and #pragma function directives:

#include <Windows.h>

// Declare memset as an external function so the pragmas below can refer to it.
extern void* __cdecl memset(void*, int, size_t);

// Microsoft-specific pragmas:
//  - #pragma intrinsic(memset) lets the compiler treat memset as a built-in (intrinsic) function.
//  - #pragma function(memset), which takes effect after it, forces the compiler to emit real calls
//    to memset instead of inlining it. This is what allows the custom definition below: defining
//    memset while it is still treated as an intrinsic can fail with error C2169.
#pragma intrinsic(memset)
#pragma function(memset)

void* __cdecl memset(void* Destination, int Value, size_t Size) {
	// Byte-by-byte fill, equivalent to the CRT's memset.
	// 'volatile' keeps the optimizer from turning this loop back into a call to memset.
	volatile unsigned char* p = (volatile unsigned char*)Destination;
	while (Size > 0) {
		*p = (unsigned char)Value;
		p++;
		Size--;
	}
	return Destination;
}


int main() {

	PVOID pBuff = HeapAlloc(GetProcessHeap(), 0, 0x100);
	if (pBuff == NULL)
		return -1;

	// ZeroMemory expands to RtlZeroMemory, which winnt.h defines as a memset call,
	// so this uses our version of 'memset' instead of the CRT library's version.
	ZeroMemory(pBuff, 0x100);

	HeapFree(GetProcessHeap(), 0, pBuff);

	return 0;
}
