Thursday, October 20, 2016

This one weird trick for decoding DLL malware strings

TL;DR: argtracker and ctypes. It's the ctypes part that surprised me. Read on to see why.

This procedure can make light work of decoding strings in a DLL that has a horrifying string decoder or contains a metric ton of strings. The first stage leans on code that's already out there, with a bit of duct tape to get to the second stage; the second stage is to load your malware and call into it. There's just one stick-in-the-mud limitation: it has to be a file you can load into your address space using LoadLibrary, such as a DLL. Otherwise, you have to use a different kind of tool (I'll discuss this later).

First of all, gather all the strings you want to decode. Jay Smith wrote a very cool tool for this that uses Vivisect to emulate code and locate arguments. It's called argtracker. Don't duplicate it like I was starting to do with idaapi. Please, for the love of all that is lazy, just download it and get it installed.

The IDAPython script below is basically the code from the FireEye blog with a second function added to print out all the encoded strings so you can feed them to the second stage of this procedure. If your strings aren't printable prior to decoding, then you'll need to change this up a bit.

import idautils                  # CodeRefsTo
from idc import GetString        # Reads a string out of the IDB
import vivisect
import flare.argtracker as c_argtracker
import flare.jayutils as c_jayutils

# Obtain the address where each argument is referenced by the decoder along
# with the offset that was referenced
def get_first_push_arg(decoder):
    ret = []
    vw = c_jayutils.loadWorkspace(c_jayutils.getInputFilePath())
    tracker = c_argtracker.ArgTracker(vw)
    xrefs = idautils.CodeRefsTo(decoder, 1)
    for xref in xrefs:
        argslist = tracker.getPushArgs(xref, 1)
        for argdict in argslist:
            va_at, offset = argdict[1]
            ret.append(argdict[1])
    return ret

# Now go get each string
def print_va_off_and_contents(pushed_args):
    print('refva, off, argcontents')
    for (va_at, offset) in pushed_args:
        print(hex(va_at) + ', ' + hex(offset) + ', ' + GetString(offset, -1, 0))
        # https://www.hex-rays.com/products/ida/support/idadoc/283.shtml
        # 0 <= ASCSTR_C
        # 3 <= ASCSTR_UNICODE

Provide your decoder's virtual address to get_first_push_arg, and then supply the returned list to print_va_off_and_contents to get something you can massage into shape for the second stage. Yes, I know, I'm using print instead of Python's logging module. The title of this blog was actually going to have the word "lazy" in it. Maybe it still should. Anyway...
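If "massage into shape" sounds like more typing than you signed up for, here is a minimal sketch of that step. It assumes you copied the stage 1 output from IDA's output window as plain text in exactly the 'refva, off, argcontents' layout printed above. The function name parse_stage1_output and the offsets in the sample are made up for illustration; only the addresses and strings mirror the example list used later in this post.

```python
def parse_stage1_output(text):
    """Turn 'refva, off, argcontents' lines into [[refva, string], ...]."""
    strings = []
    for line in text.splitlines():
        # Skip blank lines and the header row printed by stage 1
        if not line.strip() or line.startswith('refva'):
            continue
        # Split only twice so commas inside the string itself survive
        refva, off, contents = line.split(', ', 2)
        # Python 2 may print long values with a trailing 'L'; drop it
        strings.append([int(refva.rstrip('L'), 16), contents])
    return strings

# Sample text; the middle column holds placeholder offsets
sample = """refva, off, argcontents
0x10001234, 0x1000a000, ABCdef
0x10005678, 0x1000a010, ZYX990"""

for va, s in parse_stage1_output(sample):
    print(hex(va) + ' ' + s)
```

The result has the same [address, string] shape as the strings list in the second stage, so you can assign it directly instead of pasting by hand.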

Second and final step: load the malware and call its decoder. The interesting thing I learned is that Python ctypes can call non-exported functions. What a happy surprise! First, you define a function prototype; then you obtain a callable by instantiating that prototype with the address in your binary where the function lives. ctypes provides prototype factories for stdcall (WINFUNCTYPE, Windows only) and cdecl (CFUNCTYPE). We're using stdcall. Here's a convenient snippet along with the string decoding goodness.

from ctypes import *

# Modify all this
offset = 0x4321                             # Decoder RVA (offset from the DLL's load address)
strings = [                                 # Populate from stage 1 (above)
    [0x10001234, "ABCdef"],
    [0x10005678, "ZYX990"],
    ...
]
dll = cdll.my_malware_dll                   # Modify to load your DLL
prototype = WINFUNCTYPE(c_char_p, c_char_p) # Stdcall, accepts & returns char*

# Leave this alone
string_decoder_addr = dll._handle + offset
decode = prototype(string_decoder_addr)

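# Written for Python 2, where str is a byte string. Under Python 3, c_char_p
# takes and returns bytes, so pass b"..." and .decode() the result.
for (va, s) in strings:
    print(hex(va) + ' ' + s + ' -> ' + decode(s))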
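If you want to convince yourself that a prototype plus a bare address really gives you a callable, without loading anything nasty, here is a small self-contained sketch. It uses CFUNCTYPE so it runs on any platform, and it fakes "a function living at some address" by wrapping a Python lambda in a ctypes callback and reading its address back. The names native_stub and call_by_address are just for this illustration; with real malware the address comes from the DLL's load address plus the decoder's offset, as in the snippet above.

```python
from ctypes import CFUNCTYPE, c_int, c_void_p, cast

# Prototype: cdecl function that accepts an int and returns an int
prototype = CFUNCTYPE(c_int, c_int)

# Stand-in for a function that lives somewhere in memory: ctypes builds a
# native thunk around the Python callable, and cast() exposes its address.
# Keep native_stub referenced so the thunk is not garbage collected.
native_stub = prototype(lambda n: n + 1)
address = cast(native_stub, c_void_p).value

# Same move as the decoder snippet: bind the prototype to a plain integer
call_by_address = prototype(address)
print(call_by_address(41))
```

Nothing in that last call knows the function's name or whether it was ever exported, which is exactly why the trick works on a decoder buried inside a DLL.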

Simple, dimple. Paste the strings from IDA Pro into this script, ctypes loads and calls into the malware, and Bob's your uncle. For extra credit, you can update this script to emit another script that will create the appropriate comments or bookmarks in IDA Pro. This ctypes procedure works great for DLLs. Unfortunately, next time, it'll probably be an EXE and not a DLL. For those cases, you'll have to adapt this to a different tool, such as flare-dbg, to control malware execution and feed it the strings you want to decode. I'll talk more about tools and techniques for this another time.
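For the extra-credit item, here is one rough way it could look: a helper that takes (refva, decoded string) pairs and emits IDAPython source that comments each call site. The helper name emit_comment_script and the decoded values are hypothetical. The emitted script assumes the older idc.MakeComm(ea, comment) call, matching the GetString-era API used in stage 1; newer IDA releases spell this idc.set_cmt(ea, comment, repeatable), so adjust to taste.

```python
def emit_comment_script(decoded):
    """Build IDAPython source that comments each decoder call site.

    decoded: list of (refva, plaintext) pairs, refva as printed by stage 1.
    """
    lines = ['import idc', '']
    for va, plaintext in decoded:
        # %r quotes and escapes the text so each emitted line stays valid Python
        lines.append('idc.MakeComm(0x%x, %r)' % (va, 'decoded: ' + plaintext))
    return '\n'.join(lines) + '\n'

# Hypothetical decoded values, purely for illustration
script = emit_comment_script([(0x10001234, 'hello'), (0x10005678, 'world')])
print(script)
```

Save the output to a file and run it from IDA's script menu to apply the comments.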
