Friday, July 19, 2019

Snooping Again

Back At It

As a red teamer in 2016, I found it painful to dredge up command, output, and IP address details about red team operations that had already concluded.

In 2016, I wrote about researching and developing a Microsoft Detours DLL to hook cmd.exe and its subordinate processes and log console I/O with a few system details. Cool project, but mostly experimental with several deficiencies.

But I'm writing to share that I've improved the tool, learned a few things, and am releasing the updated tool on GitHub for your red teaming pleasure.

Deficiencies, You Say?

When I first wrote this, Detours Express was only licensed for non-commercial use on 32-bit platforms. It was possible to use my DLL with the 32-bit cmd.exe, but if a 64-bit executable was invoked, it couldn't execute with the hooking DLL attached. My DLL also created a new log per process, which was inconvenient to review. Lastly, some CLI-driven tools including PowerShell use different Windows APIs to interact with the console, so their output never made it into the logs.

Two events motivated me to revisit and rectify these.

First, as of April 23, 2018, Galen Hunt and the Detours Team indicated that Detours 4.0.1 was freely available on GitHub, supporting "x86, x64 and other Windows-compatible processors (IA64 and ARM). It includes support for either 32-bit or 64-bit processes." This is great news! It allows malware analysts, red teamers, researchers, and developers, to instrument and extend many Windows userspace applications almost arbitrarily without any licensing encumbrances.  I'm using my old project to extol the joys of using Detours on both x86 and x64 for free.

Second, in a 2019 discussion, I overheard some red teamers wishing for a way to log time/date stamped console output, and those same red teamers also indicated it would be helpful to have IP address information available for the same reasons as I stated above. Cmder is a powerful console with logging, but I don't believe it has time/date stamps or IP address logging. Furthermore, I think its output contains ANSI color escape sequences that hinder those logs from being readily reviewed and presented.

Bringing it Up to Date

In the aforementioned blog post, I wrote in detail about using the Microsoft Detours' traceapi sample to observe API usage, form hypotheses, and arrive at discoveries about how one might hook and modify behavior. I did the same thing to learn that PowerShell uses ReadConsoleInput and WriteConsoleOutput to do its work instead of simple ReadConsole/WriteConsole.

I furthermore read the documentation for DetourCreateProcessWithDlls and used SysInternals' Process Monitor to figure out how to let Detours use a rundll32 helper process to load the correct architecture of my DLL into new subprocesses.

And finally, I arrived at a scheme that uses environment variables to establish a single log file path for a given command interpreter and its subprocesses to write to. Consequently, one may look in a single log file to review the command line session.

Oh, and in response to some friendly suggestions, I now crudely prevent the IP address information from being displayed for each and every command entered, provided it has not changed within a given CLI session.

Here's some log output so you can see roughly what it looks like.

Nested logging of cmd.exe/powershell.exe


Deficiencies that Will Remain

Alas, various CLI applications mix and match line endings, resulting in stray ^M characters. Perhaps smart I/O transformation on the part of my logging apparatus could eliminate this, but I find it simple and convenient to post-process the log files. For example, Vim has the following edit mode command:

:%s/^M//g

What if Cmder Just Adds These Features?

I'm certainly not competing with the fine folks who write Cmder. If they take the trouble to to add and support the relevant state logic and configuration settings to achieve these same ends, that would be pretty cool and useful. My project remains a fun study into how to research and instrument programs, and a nice example of the usefulness of mixed-architecture Detours usage.

The Link, Please?

Oh yeah, so here is the link to my project:

https://github.com/strictlymike/cmdlog

Enjoy!

Friday, May 18, 2018

Vimifier!


If you've seen my blog, you know I love Vim. And if you've worked with me, you know I hate when I have to use editors like Microsoft Word because I have to take my fingers off the home row so much to select text, frequently change its format, and generally get around and edit what I'm working on. I worked on a 72-page report last week where I lamented this extensively. Over the week, I started to think: what stops me from turning the keystrokes I want to type into the keystrokes I am typing?

For instance, when I want to move the cursor between words, I use Ctrl+Left and Ctrl+Right. And when I want to select several words, I hold down the shift key. I started to wonder how hard it might be to create a keyboard hook that would translate Vim-style shortcuts (like b, w, v, ...) into their Windows hotkey equivalents (Ctrl+Left, Ctrl+Right, Shift, ...).

Well, I tried it out, and yeah it's a bit of work, but it's worth it! I don't have screenshots or GIFs because it's hard to make it evident what keystrokes I'm using, but if you're a tinkerer then take a look and try it out. I think you'll be amused!

Get the source code under my GitHub profile at https://github.com/strictlymike/vimifier.

If you need a bare bones compiler to get this built, the quickest shortcut I know of is to grab the Microsoft Visual C++ Compiler for Python 2.7.

Thursday, December 21, 2017

Getting Into RE

This has been done before, but I find myself producing this information repeatedly, so what the hell, here's a blog article about it: how to get started in reverse engineering (RE). You'll need a VM (VirtualBox is free and works for me).

I'll first promote the resources that I used because that's what worked for me. Then I'll talk about how to get practice via a certain CTF, and share some resources that I believe have been useful to others.

PMA

That stands for Practical Malware Analysis. This book is already showing its age, but I still think it is the best all-in-one resource to learn reverse engineering fundamentals.

The approach I took that helped me really absorb the material was:
  1. Read the book through and just absorb it;
  2. Go back through for the labs, reviewing each chapter as necessary;
  3. When a lab takes more than a certain time (start with 30 minutes), use the back of the book for the answer;
  4. If you don't see the connection between what you've seen so far and the answer in the back of the book, read the extended answer to see how they got that; and,
  5. Always read the extended answer to glean any techniques that might be more efficient than the road you took.
It takes discipline to remember that you have limited time and you need to move through the labs if you are to learn and improve. If you're banging your head into the wall, you're that much closer to giving up, which is only okay if you have determined that this is simply not interesting to you anymore.

If you don't know assembly language well and you think it is hindering your ability to move through PMA, then suspend that process and take some time with...

x86 Assembly Language

I will suggest two main roads for learning x86 (and x64) assembly language, and a couple of other references to support them. The first and most accessible main resource is the one that a lot of my colleagues have said helped them: http://opensecuritytraining.info/

A lot of reverse engineers, both aspiring and established, say that Xeno's courses are where they learned assembly language, and it went really well for them, so with it being available online for free, I have to put it out there.

As for myself, my main resource for learning x86 assembly language was Richard Blum's book, Professional Assembly Language Programming. The book teaches with GNU tools, so it uses the AT&T syntax which is largely unpopular with the RE crowd, but on the upside, the GNU tools are superlatively easy to acquire and use on most Linux distributions.

Aside from those, the first chapter (x86/x64) of Practical Reverse Engineering reinforced and clarified some essential concepts for me.

Finally, for the definitive RTFM experience, Volume 2 of Intel's processor manuals contains the instruction set reference, which you can use to look up weird instructions you come across. If you're using IDA Pro, there is also an auto-comment mode in IDA Pro that may help remind you if you are just getting started.

Debugging

Tarik Soulami's book Inside Windows Debugging is an outstanding read about not just WinDbg but Windows internals. I can't recommend it emphatically enough.

FLARE-On Challenges

If you're done with PMA and ready for some practice, the FLARE-On Challenge binaries archived at http://flare-on.com/ pose a unique training opportunity for two reasons: first, because they deliberately mimic real malware; and second, because they are all accompanied by solution write-ups on fireeye.com:
When used for training, I suggest approaching these incrementally: attack level 1 of each, level 2 of each, level 3 of each, in turn. I also suggest treating them like the PMA labs: if you exceed a certain duration analyzing them, peek at the solution write-up and see if that gives you a shove in the right direction.

Things I Did That You Don't Have To

The first book I read about RE-related things was actually not PMA; it was The Shellcoder's Handbook, 1st Edition (2nd Edition is here). This was not a gentle introduction. But looking back, it taught me a lot of things that I refer back to very frequently. So maybe it was more formative for me than I even remember.

I also continue to get a lot out of reading Kyle Loudon's book, Mastering Algorithms in C. The book talks about all those things I consider to be magic, especially crypto and compression.

Other Resources


See the ellipsis? I'm going to tack things onto this article as I learn about them. If you know of one, hollar. It's easy for me to remember what helped ME, but I could use a reminder of what others have found helpful.

Friday, November 3, 2017

x86 Assembly Language Brush-Up

A buddy of mine is reviewing x86 assembly, so I thought I would write a brief primer on common x86 assembly language instructions. If you aren't yet familiar with the x86 register file, check out Remember the Registers first for a super-quick overview.

On a side-note, I laughed when I looked back at that other article, because it starts the same way: "oh, hey, a friend is about to learn x86 assembly, so I thought I would write this quick article!" So I guess the lesson here is: Parents, talk to your kids about assembly language... or else their friends will! }:-)

With no further ado, I'll get this thing moving with...

mov

Much of reverse engineering entails following the flow of data backward and forward as it moves through registers and memory. The mov instruction is the most commonly used instruction and the instruction you'll most often have to read to know where data is going.

lea

lea stands for load effective address. The lea instruction is supposed to give you a pointer to something rather than dereferencing the pointer and giving you the actual data. In reality, though, it just computes the sum or other expression in the square brackets and moves it to the specified location. Take, for example, the following instruction:

lea eax, [ebp-218h]

The eax register in this case will receive ebp minus 0x218, which is the address of some local variable. Compare this with:

mov eax, [ebp-218h]

Which actually dereferences ebp-0x218 to retrieve the contents of that local variable in the function stack frame and puts that value into eax.

Since the lea instruction in all reality just computes the value of the expression in the brackets, it can also be used to evaluate complex expressions involving multiplication and addition. If you see some values that can't possibly be addresses getting used with the lea instruction, you might be right. The program may be merely computing a value rather than working with memory addresses.

push

Data goes on the stack, usually for a function call.

Some compilers will also emit code to push an immediate operand (a constant value, e.g. 0) and then pop it to a register, like this:

push 4 ; Put the number 4 on the stack
pop eax ; The number 4 winds up in eax

call

The processor pushes the address of the next instruction and transfers control to a procedure of the programmer's choosing. This is kind of equivalent to lines 3-5 below:

1   push arg2               ; Push function arguments as normal
2   push arg1
3   push offset L_nextinstr ; Save the address of the next instruction on the stack
4   jmp procedure           ; Transfer control
5 L_nextinstr:
6   test eax, eax           ; Resume normal stuff like checking return value
7 

jmp

This is another way to transfer control, usually within a procedure, but sometimes to a procedure.

retn N

When you see a return instruction followed by a number, the function is cleaning up its own stack, which means it is stdcall (the standard calling convention for Microsoft Win32 APIs).

add

I mention this instruction now because it is used in the other calling convention, cdecl, to efficiently forget about function parameters pushed on the stack:

add esp, 8

Obviously its usual use is plain arithmetic, but when it is used with the stack pointer as above, you know the preceding function call was to a cdecl function.

cmp

Compare two operands: Subtract the second operand from the first operand and set EFLAGS as if this were an arithmetic subtraction instruction.

test

Logical comparison. From Intel's manual: "Computes the bit-wise logical AND of first operand... and the second operand... and sets [EFLAGS accordingly]."

More

If you're unsure about what an instruction does, RTFM: http://www.intel.com/products/processor/manuals/

Intel's manuals are the definitive guide to how Intel's processors parse and execute instructions. They are organized as follows:

Volume 1: Basic Architecture
Volume 2: Instruction Set Reference
Volume 3: System Programming Guide

If you wonder about a particular instruction, you'll find it in volume 2 (Instruction Set Reference). If you want to learn about the x86 execution environment, volume 1 (Basic Architecture) is your friend. And if you're writing a bootloader, an operating system, or a hypervisor, volume 3 (System Programming Guide) is for you.

Misc

If you're interested in tabulating the most common instructions using IDAPython, here is a snippet.

from collections import defaultdict

def _for_each_instr(callback, outputs=None, parms=None):
    """Do <callback> for each instruction.

    Call callback() providing fva, chunk start va, instr addr, and outputs/
    parameters.
    """
    for fva in Functions():
        for (va_start, va_end) in Chunks(fva):
            for head in Heads(va_start, va_end):
                callback(fva, va_start, head, outputs, parms)

def enum_mnemonics():
    mnems = defaultdict(int)
    def enum_mnemonics_callback(fva, chunkva, head, unused1, unused2):
        mnems[GetMnem(head)] += 1

    _for_each_instr(enum_mnemonics_callback)

    mnems_sorted = sorted(mnems.iteritems(), key=lambda(k,v):v, reverse=True)

    return mnems_sorted

Wednesday, October 4, 2017

Free Shortcuts in IDA Pro

I'm tired of reassigning everything to the same hotkey in IDA Pro because I don't know which hotkeys are free. Here are the IDA Pro keyboard shortcuts I know of that aren't in use in IDA Pro so far. Only one- and two-key shortcuts are included. I have omitted shortcuts used by BinDiff and flare-ida, because I don't want to collide with those.

  • ` (backtick)
  • , (comma)
  • <>{}[] (left and right angle bracket, brace, and square bracket)
  • Most (but not all) of the top row: !@#$%^&*()+=
  • I
  • J
  • Alt+F4,F5, and F8
  • Alt+E, F, N, O, U, V, W, Z
  • Ctrl+0, 4, 5, 7, 8, 9
  • Ctrl+H
  • Ctrl+Y
  • Shift+A-C
  • Shift+F-O
  • Shift+Q
  • Shift+S-Z
I got these by visiting Options -> Shortcuts... in a recent version of IDA Pro. If you notice any others, or notable collisions, please comment or message and I will update.

Thursday, September 28, 2017

Two Great Tastes: IDA + WinDbg

Setting up remote debugging with IDA+WinDbg can be difficult when it doesn't work right off the bat, because the errors don't jog the right thought process for you to fix the setup and get it working. This causes some people to walk away from the whole thing, which is unfortunate. This setup is SOOOOO useful that it's worth slogging through the pain to get it working. The value of having graph view and IDA's annotation capabilities on-hand while debugging cannot be overstated.

Here, I'll emphasize one thing that could stand to be better emphasized in Hex-Rays' own documentation: you have to be using the same version of WinDbg on each side. And I'll indicate some ways to isolate end-to-end (E2E) issues. Note that the system with IDA Pro on it is referred to here as the analysis system (it's where you do your analysis of the code), and the system where you run malware is referred to as the target system.

Pointers

  • Resolve any end-to-end (E2E) issues first (firewalls, networking, etc.)
  • Lock IDA Pro into using the same version of WinDbg as is on your target system
  • Use WinDbg itself to verify that there are no E2E issues

Algorithm

This is exactly how to set up a remote debugging setup with IDA Pro and WinDbg. Here are the steps:
  1. Both systems: Ensure your analysis and target machines can access each other over the network
    1. If they are VMs, you may need to adjust them to ensure they are both host-only
    2. You might need to mess with firewall settings
    3. If you are using FakeNet-NG, you might need to add an exception for dbgsrv.exe
  2. Target system: Locate (install, if necessary) WinDbg on your target system.
  3. Target -> Analysis system: If you haven't installed the same version of WinDbg to both systems, then simply copy the entire x86 directory where you located WinDbg on the target system, onto your analysis system. It doesn't matter where you place this.
  4. Analysis system: Edit ida.cfg to set DBGTOOLS to point to the x86 directory
    1. Use double backslashes, e.g. DBGTOOLS = "C:\\Program Files (x86)\\Windows Kits\\10\\Debuggers\\x86\\";
  5. Target system: Start the WinDbg debug server
    1. "C:\path\to\dbgsrv.exe" -t tcp:port=9999
  6. Analysis system: Test by trying to connect remotely with WinDbg itself - if this doesn't work, then you've got end-to-end issues to resolve before IDA will work
  7. Analysis system: configure your IDB to use WinDbg:
    1. Debugger -> Switch debugger... (select Windbg debugger and click OK)
    2. Debugger -> Options...
      1. Application: path\on\your\target\system\to\binary.exe
      2. Input file: path\on\your\target\system\to\binary.exe
      3. Directory: path\on\your\target\system\to
      4. Parameters: command-line arguments you want passed to the malware (if any)
      5. Connection string: tcp:server=TARGETSYSTEMNAME,port=9999
      6. Click OK 
  8. Analysis system: Click on an instruction and hit F4 to "run to" that instruction, or set a breakpoint and strike F9
  9. Disregard warnings as applicable ;-)

Troubleshooting

You may want to audit your user and system PATH environment variables to ensure that they don't include the x86 directory of a conflicting version of WinDbg, or the x64 directory for that matter.

If you get "Could not initialize WinDbg engine 0x7f / The specified procedure could not be found... You might try adding the path to that x86 directory to your system path and closing/reopening IDA. I also find that certain Python scripts seem to cause IDA Pro to emit this error, so you might also try closing/reopening IDA, initiating your debug session, and only THEN loading any ancillary IDAPython scripts you were using.

Miscellany

As of 2011, Hex-Rays indicated that this would not work with the x64 tools.

Saturday, August 19, 2017

Done to Death

I saw this tweet from @redteamwrangler today:


This question is a pretty easy fallback for interviewers and it might be getting a little old for us. But it set me to thinking about how tediously I could answer this without using the Internet or any reference materials on my machine. Some of these details might be a little off or just downright wrong, but I'll display my ignorance. It was a fun exercise since it isn't in the context of any actual interview :-)

What happens when you type www.google.com into a browser and press return? Supposing you're using Internet Explorer for Windows on an wired (Ethernet) connection:

  • 8259a or emulated 8259a keyboard controller emits some scancodes
  • Keyboard interrupt is sent to the CPU
  • Interrupt service routine acknowledges interrupt, potentially moves one or more scancodes into a buffer
  • Or a delayed procedure call (DPC) does
  • ntoskrnl/win32k determine which thread corresponds to the foreground window and deliver a series of window messages of type WM_KEYDOWN/WM_KEYUP ending with one having virtual key code VK_ENTER
  • Window procedure for the browser URL bar (which is a window object) is called with a window message of type WM_KEYUP with virtual key code VK_ENTER
  • Window procedure has a switch statement / jmp table in it that accounts for this particular window message (WM_KEYUP) and maybe a sub-case for VK_ENTER
  • Probably takes the accumulated buffer so far (L"www.google.com") and passes it to a function
  • Probably uses a library like WinInet to do the real stuff
    • Probably calls InternetOpen() to get a handle of type HINTERNET
    • Probably calls InternetOpenUrlW() or HttpSomethingSomething() to get another HINTERNET handle
      • Probably reads the registry or uses a cached value for the HTTP User-Agent field that it provides here
      • Probably uses WinSock2 for TCP
      • Probably calls ws2_32!WSAStartup() if it hasn't been called yet
      • Checks proxy settings for the user and optionally establishes a connection with the corresponding hostname and implements HTTP proxy requests instead of direct HTTP requests
      • Parses the URL for the hostname, protocol scheme, any explicit port specification, URI, query parameters, etc.
        • Probably uses urlmon!InternetCrackUrlW() for this (or is that in wininet? I think it's in urlmon)
        • If no protocol scheme is specified, uses http:// which has a default port of 80
        • If https:// is specified, a default port of 443
      • Issues a DNS A (IPv4 address) request and/or AAAA (IPv6 address) request for the name
        • Probably calls ws2_32!gethostbyname() to do this
          • Thread consults DNS resolver cache service (if running) for name, probably via IPC
            • DNS resolver cache either returns the name or...
            • Probably uses dnsapi.dll which exports some function that...
              • Checks DNS configuration (probably the registry) to get primary, secondary, tertiary, etc. DNS servers
              • Calls ws2_32!inet_aton() to convert human-readable configuration to IPv4 or IPv6 addresses
              • Creates a socket object via ws2_32!socket()
              • Creates an in_addr object to communicate with the DNS server
              • Uses ws2_32!sendto() to use AF_INET/IPPROTO_UDP connectionlessly querying the server
                • Network layer (hmm, getting hand wavy) consults routing table to determine what interface packet should go through and whether it must visit a gateway
                • Network card device driver creates and fills out an object that the kernel uses to describe network datagrams (packets)
                • Network card device driver initiates I/O request with NIC via PCI registers or other hardware interface to provide datagram to be transmitted
                  • NIC takes the medium and transmits Ethernet frames bearing the octets that were given to it
                  • If another host transmits at the same time, the two hosts use the binary exponential backoff algorithm to wait until the medium is clear
                  • A router is likely the gateway; it accepts the packets and creates new packets to send across one or more other networks until the DNS server receives them
                  • DNS server UDP stack handles incoming packet, provides it to UDP-based DNS service e.g. bind which is bound to port 53
                  • Bind parses the packets, potentially forwards the request if the desired names are not in its zone file, and returns the response
              • The DNS resolver cache service receives and parses DNS reply/replies and returns the answer to the DNS client
          • If the DNS resolver cache service wasn't running, ws2_32!gethostbyname() probably does most of this itself
      • Since you only typed "www.google.com", it's plaintext HTTP on the default port, so 80
      • Establishes a TCP connection with the resulting host number and port
        • ws2_32!socket() to get a socket object
        • ws2_32!connect() with AF_INET and IPPROTO_TCP to connect
          • tcpip.sys is probably involved here
          • Again the network card and Ethernet medium stuff
          • TCP three-way handshake, window negotiation, etc.
            • The client sends a SYN TCP segment
            • The server returns a SYN,ACK TCP segment
            • The client returns an ACK TCP segment
          • TCP data transmission
            • The client sends a SYN,PSH TCP segment pushing data
            • Something like "GET / HTTP/1.1\nHost: www.google.com\nUser-Agent: ..."
  • Google's web server does some thinking and returns a response
  • The client receives a 3xx redirect response and gets directed to go to https://www.google.com/
  • Uses Microsoft schannel (secure channel) library to negotiate ciphers, parse the server's security certificate, and transmit data over TLS
  • Starts with ClientHello message, ServerHello, etc.
  • Obtains HTTP response, something like "HTTP 200 OK\n..." with some HTML in the HTTP body
  • HTML links to images, maybe JavaScript, etc., resulting in Cross-Origin (CORS) processing and follow-on requests
  • Invokes the JScript scripting engine for JScript and rendering engine to display the content
  • Uses graphics primitives and likely renders into a buffer that it furnishes to win32k.sys through GDI calls
  • GDI manages framebuffer of all windows including the foreground window
  • Monitor dispays the framebuffer
  • Photons fly into your eyes
  • Optic nerve and brain adjust for upside-down image arriving at retina
  • Person realizes then went to Google and says, "crap, I meant to go to Bing." Orrrrrr maybe not, haha.

If I had more time, I'd draw this out a little more, but I had to quit eventually. And the more I do this, the more I bump into all the things I don't know. A couple things I'd like to know more about:

  • What does "network layer" mean on Windows? I could use ETW with syscall stackwalking enabled to follow the ws2_32!connect() call into the kernel, or Windows Internals might just tell me.
  • What kernel object represents a packet in the Windows kernel? A packet buffer? I forgot :-( 
  • How does the networking stack give a packet to the NIC to transmit it? My device driver fu is ageing.