Bailey's on the Rocks, by Michael Bailey<br />
<br />
Snooping Again (posted 2019-07-19)<br />
<h2>
Back At It</h2>
As a red teamer in 2016, I found it painful to dredge up command, output, and IP address details about red team operations that had already concluded.<br />
<br />
In 2016, I wrote about <a href="http://baileysoriginalirishtech.blogspot.com/2016/02/snooping-on-myself-for-change.html" target="_blank">researching and developing a Microsoft Detours DLL</a> to hook cmd.exe and its subordinate processes and log console I/O with a few system details. Cool project, but mostly experimental with several deficiencies.<br />
<br />
But I'm writing to share that I've improved the tool, learned a few things, and am releasing the updated tool on GitHub for your red teaming pleasure.<br />
<h2>
Deficiencies, You Say?</h2>
When I first wrote this, Detours Express was only licensed for non-commercial use on 32-bit platforms. It was possible to use my DLL with the 32-bit cmd.exe, but if a 64-bit executable was invoked, it couldn't execute with the hooking DLL attached. My DLL also created a new log per process, which was inconvenient to review. Lastly, some CLI-driven tools including PowerShell use different Windows APIs to interact with the console, so their output never made it into the logs.<br />
<br />
Two events motivated me to revisit and rectify these.<br />
<br />
First, as of April 23, 2018, Galen Hunt and the Detours Team indicated that Detours 4.0.1 was freely available on GitHub, supporting "x86, x64 and other Windows-compatible processors (IA64 and ARM). It includes support for either 32-bit or 64-bit processes." This is great news! It allows malware analysts, red teamers, researchers, and developers to instrument and extend many Windows userspace applications almost arbitrarily without any licensing encumbrances. I'm using my old project to extol the joys of using Detours on both x86 and x64 for free.<br />
<br />
Second, in a 2019 discussion, I overheard some red teamers wishing for a way to log time/date-stamped console output. They also indicated it would be helpful to have IP address information available, for the same reasons I stated above. <a href="https://cmder.net/" target="_blank">Cmder</a> is a powerful console with logging, but I don't believe it has time/date stamps or IP address logging. Furthermore, I think its output contains ANSI color escape sequences that hinder those logs from being readily reviewed and presented.<br />
<h2>
Bringing it Up to Date</h2>
In the aforementioned blog post, I wrote in detail about using Microsoft Detours' traceapi sample to observe API usage, form hypotheses, and arrive at discoveries about how one might hook and modify behavior. I did the same thing to learn that PowerShell uses ReadConsoleInput and WriteConsoleOutput to do its work instead of simple ReadConsole/WriteConsole.<br />
<br />
I furthermore read the documentation for <a href="https://github.com/Microsoft/Detours/wiki/DetourCreateProcessWithDlls" target="_blank">DetourCreateProcessWithDlls</a> and used SysInternals' Process Monitor to figure out how to let Detours use a rundll32 helper process to load the correct architecture of my DLL into new subprocesses.<br />
<br />
And finally, I arrived at a scheme that uses environment variables to establish a single log file path for a given command interpreter and its subprocesses to write to. Consequently, one may look in a single log file to review the command line session.<br />
<br />
Oh, and in response to some friendly suggestions, I now crudely prevent the IP address information from being displayed for each and every command entered, provided it has not changed within a given CLI session.<br />
<br />
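The two ideas above (one log path shared through an environment variable, and IP details repeated only when they change) can be sketched in a few lines of Python. This is a minimal illustration and not code from cmdlog: the variable name CMDLOG_EXAMPLE_PATH, the file naming, and the function and class names are all hypothetical.<br />

```python
import os
import tempfile
import time

# Hypothetical variable name for this sketch; not taken from cmdlog.
LOG_ENV_VAR = "CMDLOG_EXAMPLE_PATH"


def resolve_log_path(environ=os.environ):
    """Return the session log path, creating the setting on first use.

    The outermost interpreter finds the variable unset and picks a path.
    Child processes inherit the environment, so they find the variable
    already set and reuse the same file.
    """
    path = environ.get(LOG_ENV_VAR)
    if path is None:
        name = "session-%s-%d.log" % (time.strftime("%Y%m%d-%H%M%S"), os.getpid())
        path = os.path.join(tempfile.gettempdir(), name)
        environ[LOG_ENV_VAR] = path
    return path


class IpNoteFilter(object):
    """Emit IP address details only when they differ from the last ones seen."""

    def __init__(self):
        self._last = None

    def note_for(self, ip_info):
        """Return a log line for ip_info, or None if it has not changed."""
        if ip_info == self._last:
            return None
        self._last = ip_info
        return "[ip] %s" % ip_info
```

Calling resolve_log_path() twice in the same process tree yields the same path, which is the whole point: every nested interpreter appends to one file.<br />
<br />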
Here's some log output so you can see roughly what it looks like.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiOP3LB0VJpy0E2Tx1Iffx4fz1PwubSwHfRJ23p_LRMswSQtg2pd8LLeWoYybhxmE26ijCU6xTLZxFfcryKlRcBcAsFfGKR19_tEjO8aNAi_82-v9UNbIo8WOlIakt9I2oTG3VyKsox7v52/s1600/tmp.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="1030" data-original-width="983" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiOP3LB0VJpy0E2Tx1Iffx4fz1PwubSwHfRJ23p_LRMswSQtg2pd8LLeWoYybhxmE26ijCU6xTLZxFfcryKlRcBcAsFfGKR19_tEjO8aNAi_82-v9UNbIo8WOlIakt9I2oTG3VyKsox7v52/s400/tmp.png" width="381" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Nested logging of cmd.exe/powershell.exe</td></tr>
</tbody></table>
<h2>
Deficiencies that Will Remain</h2>
Alas, various CLI applications mix and match line endings, resulting in stray ^M characters. Perhaps smart I/O transformation on the part of my logging apparatus could eliminate this, but I find it simple and convenient to post-process the log files. For example, Vim has the following command-line mode command (type the ^M as Ctrl+V followed by Ctrl+M, or Ctrl+Q followed by Ctrl+M where Ctrl+V is mapped to paste):<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">:%s/^M//g</span><br />
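<br />
The same cleanup can be scripted when there are many logs to process. The sketch below assumes the stray characters are carriage returns (which is what ^M denotes) and, like the Vim command, simply removes every one of them. The function names are made up for illustration.<br />

```python
def strip_carriage_returns(data):
    """Remove every carriage return (shown as ^M in Vim) from a bytes object."""
    return data.replace(b"\r", b"")


def clean_log(src_path, dst_path):
    """Write a copy of src_path to dst_path with all carriage returns removed."""
    # Binary mode avoids any newline translation by Python itself.
    with open(src_path, "rb") as src:
        cleaned = strip_carriage_returns(src.read())
    with open(dst_path, "wb") as dst:
        dst.write(cleaned)
```

Note that the result uses bare line feeds as line endings, so open it in an editor that handles Unix-style files.<br />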
<h2>
What if Cmder Just Adds These Features?</h2>
I'm certainly not competing with the fine folks who write Cmder. If they take the trouble to add and support the relevant state logic and configuration settings to achieve these same ends, that would be pretty cool and useful. My project remains a fun study into how to research and instrument programs, and a nice example of the usefulness of mixed-architecture Detours usage.<br />
<h2>
The Link, Please?</h2>
Oh yeah, so here is the link to my project:<br />
<br />
<a href="https://github.com/strictlymike/cmdlog">https://github.com/strictlymike/cmdlog</a><br />
<br />
Enjoy!<br />
<style>
code {
font-family: Courier;
margin:.75em 0;
border:1px solid #596;
border-width:1px 1px;
padding:5px 15px;
display: block;
background-color: #dedede;
white-space: pre;
}
</style>
<br />
Vimifier! (posted 2018-05-18)<br />
<style>
code {
font-family: Courier;
margin:.75em 0;
border:1px solid #596;
border-width:1px 1px;
padding:5px 15px;
display: block;
background-color: #dedede;
white-space: pre;
}
</style><br />
<div>
If you've seen my blog, you know I love <a href="http://baileysoriginalirishtech.blogspot.com/search/label/Vim" target="_blank">Vim</a>. And if you've worked with me, you know I hate when I have to use editors like Microsoft Word because I have to take my fingers off the home row so much to select text, frequently change its format, and generally get around and edit what I'm working on. I worked on a 72-page report last week where I lamented this extensively. Over the week, I started to think: what stops me from turning the keystrokes I <i>want</i> to type into the keystrokes I <i>am</i> typing?</div>
<div>
<br /></div>
<div>
For instance, when I want to move the cursor between words, I use Ctrl+Left and Ctrl+Right. And when I want to select several words, I hold down the shift key. I started to wonder how hard it might be to create a keyboard hook that would translate Vim-style shortcuts (like b, w, v, ...) into their Windows hotkey equivalents (Ctrl+Left, Ctrl+Right, Shift, ...).</div>
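<div>
At its core this is a lookup from a Vim-style key to the Windows key chord that does the same job. Here is a rough Python sketch of that idea, not the actual vimifier code: the table holds only the examples mentioned above, and the names are invented for illustration.</div>

```python
# Vim-style motion -> Windows key chord, limited to the examples in the text.
MOTIONS = {
    "b": ("Ctrl", "Left"),   # back one word
    "w": ("Ctrl", "Right"),  # forward one word
}


class Translator(object):
    """Translate Vim-style keys into Windows chords, tracking visual mode."""

    def __init__(self):
        self.selecting = False  # True while "v" (visual mode) is active

    def translate(self, key):
        """Return the chord to send for key, or None if nothing is sent."""
        if key == "v":
            # Toggle selection; later motions are sent with Shift held.
            self.selecting = not self.selecting
            return None
        chord = MOTIONS.get(key)
        if chord is None:
            return None
        return (("Shift",) + chord) if self.selecting else chord
```

<div>
A real keyboard hook would also have to swallow the original keystroke and inject the chord, which this sketch leaves out.</div>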
<div>
<br /></div>
<div>
Well, I tried it out, and yeah it's a bit of work, but it's worth it! I don't have screenshots or GIFs because it's hard to make it evident what keystrokes I'm using, but if you're a tinkerer then take a look and try it out. I think you'll be amused!</div>
<div>
<br /></div>
<div>
Get the source code under my GitHub profile at <a href="https://github.com/strictlymike/vimifier">https://github.com/strictlymike/vimifier</a>.</div>
<div>
<br /></div>
<div>
If you need a bare bones compiler to get this built, the quickest shortcut I know of is to grab the <a href="https://aka.ms/vcpython27" target="_blank">Microsoft Visual C++ Compiler for Python 2.7</a>.</div>
<br />
Getting Into RE (posted 2017-12-21, updated 2018-01-13)<br />
This has been done before, but I find myself producing this information repeatedly, so what the hell, here's a blog article about it: how to get started in reverse engineering (RE). You'll need a VM (<a href="https://www.virtualbox.org/" target="_blank">VirtualBox</a> is free and works for me).<br />
<br />
I'll first promote the resources that I used because that's what worked for me. Then I'll talk about how to get practice via a certain CTF, and share some resources that I believe have been useful to others.<br />
<br />
<h3>
PMA</h3>
<div>
That stands for <a href="http://nostarch.com/malware" target="_blank">Practical Malware Analysis</a>. This book is already showing its age, but I still think it is the best all-in-one resource to learn reverse engineering fundamentals.</div>
<div>
<br /></div>
<div>
The approach I took that helped me really absorb the material was:</div>
<div>
<ol>
<li>Read the book through and just absorb it;</li>
<li>Go back through for the labs, reviewing each chapter as necessary;</li>
<li>When a lab takes more than a certain time (start with 30 minutes), use the back of the book for the answer;</li>
<li>If you don't see the connection between what you've seen so far and the answer in the back of the book, read the extended answer to see how they got that; and,</li>
<li>Always read the extended answer to glean any techniques that might be more efficient than the road you took.</li>
</ol>
</div>
<div>
It takes discipline to remember that you have limited time and you need to move through the labs if you are to learn and improve. If you're banging your head into the wall, you're that much closer to giving up, which is only okay if you have determined that this is simply not interesting to you anymore.</div>
<div>
<br /></div>
<div>
If you don't know assembly language well and you think it is hindering your ability to move through PMA, then suspend that process and take some time with...<br />
<br /></div>
<h3>
x86 Assembly Language</h3>
<div>
I will suggest two main roads for learning x86 (and x64) assembly language, and a couple of other references to support them. The first and most accessible main resource is the one that a lot of my colleagues have said helped them: http://opensecuritytraining.info/</div>
<div>
<br /></div>
<div>
A lot of reverse engineers, both aspiring and established, say that <a href="http://twitter.com/XenoKovah" target="_blank">Xeno's</a> courses are where they learned assembly language, and it went really well for them, so with it being available online for free, I have to put it out there.</div>
<div>
<br /></div>
<div>
As for myself, my main resource for learning x86 assembly language was Richard Blum's book, <a href="http://www.wiley.com/WileyCDA/WileyTitle/productCd-0764579010.html" target="_blank">Professional Assembly Language Programming</a>. The book teaches with GNU tools, so it uses the AT&T syntax which is largely unpopular with the RE crowd, but on the upside, the GNU tools are superlatively easy to acquire and use on most Linux distributions.</div>
<div>
<br /></div>
<div>
Aside from those, the first chapter (x86/x64) of <a href="https://www.amazon.com/Practical-Reverse-Engineering-Reversing-Obfuscation/dp/1118787315" target="_blank">Practical Reverse Engineering</a> reinforced and clarified some essential concepts for me.<br />
<br /></div>
<div>
Finally, for the definitive RTFM experience, Volume 2 of <a href="http://www.intel.com/products/processor/manuals/" target="_blank">Intel's processor manuals</a> contains the instruction set reference, which you can use to look up weird instructions you come across. If you're using IDA Pro, there is also an <a href="https://reverseengineering.stackexchange.com/questions/13513/are-there-ida-scripts-plugins-to-translate-comment-instructions-to-with-pseudoco" target="_blank">auto-comment mode in IDA Pro</a> that may help remind you if you are just getting started.<br />
<br /></div>
<h3>
Debugging</h3>
<div>
Tarik Soulami's book Inside Windows Debugging is an outstanding read about not just WinDbg but Windows internals. I can't recommend it emphatically enough.<br />
<br /></div>
<h3>
FLARE-On Challenges</h3>
<div>
If you're done with PMA and ready for some practice, the FLARE-On Challenge binaries archived at <a href="http://flare-on.com/">http://flare-on.com/</a> pose a unique training opportunity for two reasons: first, because they deliberately mimic real malware; and second, because they are all accompanied by solution write-ups on fireeye.com:</div>
<div>
<ul>
<li>2014: <a href="https://www.fireeye.com/blog/threat-research/2014/11/the_flare_on_challen.html" target="_blank">part 1</a> and <a href="https://www.fireeye.com/blog/threat-research/2014/11/flare_on_challengep.html" target="_blank">part 2</a> </li>
<li>2015: <a href="https://www.fireeye.com/blog/threat-research/2015/09/flare-on_challenges.html">https://www.fireeye.com/blog/threat-research/2015/09/flare-on_challenges.html</a></li>
<li>2016: <a href="https://www.fireeye.com/blog/threat-research/2016/11/2016_flare-on_challe.html">https://www.fireeye.com/blog/threat-research/2016/11/2016_flare-on_challe.html</a></li>
<li>2017: <a href="https://www.fireeye.com/blog/threat-research/2017/10/2017-flare-on-challenge-solutions.html">https://www.fireeye.com/blog/threat-research/2017/10/2017-flare-on-challenge-solutions.html</a></li>
<ul>
<li>I wrote challenge 7 and the associated <a href="https://www.fireeye.com/content/dam/fireeye-www/global/en/blog/threat-research/Flare-On%202017/Challenge7.pdf" target="_blank">solution write-up</a>, you should check it out! :-)</li>
</ul>
</ul>
<div>
When used for training, I suggest approaching these incrementally: attack level 1 of each, level 2 of each, level 3 of each, in turn. I also suggest treating them like the PMA labs: if you exceed a certain duration analyzing them, peek at the solution write-up and see if that gives you a shove in the right direction.<br />
<br />
<h3>
Things I Did That You Don't Have To</h3>
<div>
The first book I read about RE-related things was actually not PMA; it was The Shellcoder's Handbook, 1st Edition (2nd Edition is <a href="https://www.amazon.com/Shellcoders-Handbook-Discovering-Exploiting-Security/dp/047008023X" target="_blank">here</a>). This was not a gentle introduction. But looking back, it taught me a lot of things that I refer back to very frequently. So maybe it was more formative for me than I even remember.</div>
</div>
</div>
<div>
<br /></div>
<div>
I also continue to get a lot out of reading Kyle Loudon's book, <a href="http://shop.oreilly.com/product/9781565924536.do" target="_blank">Mastering Algorithms in C</a>. The book talks about all those things I consider to be magic, especially crypto and compression.<br />
<br /></div>
<h3>
Other Resources</h3>
<div>
<ul>
<li><a href="https://twitter.com/malwareunicorn" target="_blank">@malwareunicorn</a> has made available her materials for learning about RE, and it looks like an interesting way to get started: <a href="https://securedorg.github.io/RE101/">https://securedorg.github.io/RE101/</a></li>
<li>2017.12.27: A vulnerability researcher told me that his "Aha! book" was <a href="https://www.wiley.com/en-us/Reversing%3A+Secrets+of+Reverse+Engineering+-p-9780764574818" target="_blank">Reversing: Secrets of Reverse Engineering</a>. I took a look at this while visiting a bookstore and it looks not only informative but interesting.</li>
<li>...</li>
</ul>
</div>
<div>
<br /></div>
<div>
See the ellipsis? I'm going to tack things onto this article as I learn about them. If you know of one, holler. It's easy for me to remember what helped ME, but I could use a reminder of what others have found helpful.</div>
<br />
x86 Assembly Language Brush-Up (posted 2017-11-03)<br />
A buddy of mine is reviewing x86 assembly, so I thought I would write a brief primer on common x86 assembly language instructions. If you aren't yet familiar with the x86 <a href="https://en.wikipedia.org/wiki/Register_file" target="_blank">register file</a>, check out <a href="http://baileysoriginalirishtech.blogspot.com/2017/02/remember-registers.html" target="_blank">Remember the Registers</a> first for a super-quick overview.<br />
<br />
On a side-note, I laughed when I looked back at that other article, because it starts the same way: "oh, hey, a friend is about to learn x86 assembly, so I thought I would write this quick article!" So I guess the lesson here is: Parents, talk to your kids about assembly language... or else their friends will! }:-)<br />
<br />
With no further ado, I'll get this thing moving with...<br />
<h2>
mov</h2>
<div>
Much of reverse engineering entails following the flow of data backward and forward as it moves through registers and memory. The mov instruction is the most commonly used instruction <b>and</b> the instruction you'll most often have to read to know where data is going.</div>
<h2>
lea</h2>
<div>
lea stands for load effective address. The lea instruction is supposed to give you a pointer to something rather than dereferencing the pointer and giving you the actual data. In reality, though, it just computes the sum or other expression in the square brackets and moves it to the specified location. Take, for example, the following instruction:</div>
<div>
<br /></div>
<pre class="vimCodeElement"><span class="Identifier">lea</span> <span class="Identifier">eax</span>, [<span class="Identifier">ebp</span>-<span class="Constant">218</span><span class="Identifier">h</span>]
</pre>
<div>
<br /></div>
<div>
The eax register in this case will receive ebp minus 0x218, which is the address of some local variable. Compare this with:</div>
<div>
<br /></div>
<pre class="vimCodeElement"><span class="Identifier">mov</span> <span class="Identifier">eax</span>, [<span class="Identifier">ebp</span>-<span class="Constant">218</span><span class="Identifier">h</span>]
</pre>
<div>
<br /></div>
<div>
Which actually dereferences ebp-0x218 to retrieve the <i>contents</i> of that local variable in the function stack frame and puts that value into eax.</div>
<div>
<br />
Since the lea instruction in all reality just computes the value of the expression in the brackets, it can also be used to evaluate complex expressions involving multiplication and addition. If you see some values that <i>can't possibly</i> be addresses getting used with the lea instruction, you might be right. The program may be merely computing a value rather than working with memory addresses.</div>
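<div>
To make the difference concrete, here is a toy Python model (an illustration only, with made-up addresses and values): lea returns the computed address, while mov reads the 32-bit little-endian value stored at that address.</div>

```python
import struct

MASK32 = 0xFFFFFFFF


def lea(base, disp=0, index=0, scale=1):
    """Compute base + index*scale + disp, as lea does, without touching memory."""
    return (base + index * scale + disp) & MASK32


def mov_load(memory, base, disp=0):
    """Read the 32-bit little-endian value at [base + disp], as a mov load does."""
    return struct.unpack_from("<I", memory, lea(base, disp))[0]


# A tiny fake address space with one local variable at ebp-218h.
memory = bytearray(0x1000)
ebp = 0x800
struct.pack_into("<I", memory, ebp - 0x218, 0xDEADBEEF)

address = lea(ebp, -0x218)             # lea eax, [ebp-218h]
value = mov_load(memory, ebp, -0x218)  # mov eax, [ebp-218h]

# lea used for plain arithmetic: lea eax, [ecx+ecx*4] computes ecx * 5.
ecx = 7
times_five = lea(ecx, index=ecx, scale=4)
```

<div>
Here address holds where the local lives, value holds what is stored there, and times_five shows lea doing arithmetic that has nothing to do with pointers.</div>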
<h2>
push</h2>
<div>
push decrements the stack pointer and stores its operand at the new top of the stack, usually to pass an argument to a function call.<br />
<br />
Some compilers will also emit code to push an immediate operand (a constant value, e.g. 0) and then pop it to a register, like this:</div>
<div>
<br /></div>
<div>
<pre class="vimCodeElement"><span class="Identifier">push</span> <span class="Constant">4</span> <span class="Comment">; Put the number 4 on the stack</span>
<span class="Identifier">pop</span> <span class="Identifier">eax</span> <span class="Comment">; The number 4 winds up in eax</span></pre>
</div>
<div>
<h2>
call</h2>
</div>
<div>
The processor pushes the address of the next instruction and transfers control to a procedure of the programmer's choosing. This is kind of equivalent to lines 3-5 below:<br />
<br />
<pre class="vimCodeElement"><span class="LineNr" id="L1">1 </span> <span class="Identifier">push</span> <span class="Identifier">arg2</span> <span class="Comment">; Push function arguments as normal</span>
<span class="LineNr" id="L2">2 </span> <span class="Identifier">push</span> <span class="Identifier">arg1</span>
<span class="LineNr" id="L3">3 </span> <span class="Identifier">push</span> <span class="Identifier">offset</span> <span class="Identifier">L_nextinstr</span> <span class="Comment">; Save the address of the next instruction on the stack</span>
<span class="LineNr" id="L4">4 </span> <span class="Identifier">jmp</span> <span class="Identifier">procedure</span> <span class="Comment">; Transfer control</span>
<span class="LineNr" id="L5">5 </span><span class="Identifier">L_nextinstr</span>:
<span class="LineNr" id="L6">6 </span> <span class="Identifier">test</span> <span class="Identifier">eax</span>, <span class="Identifier">eax</span> <span class="Comment">; Resume normal stuff like checking return value</span>
<span class="LineNr" id="L7">7 </span></pre>
<h2>
jmp</h2>
</div>
<div>
This is another way to transfer control, usually within a procedure, but sometimes to a procedure. Unlike call, it does not push a return address.<br />
<h2>
retn N</h2>
<div>
When you see a return instruction followed by a number, the function is cleaning up its own stack arguments, which usually means it is stdcall (the standard calling convention for Microsoft Win32 APIs).<br />
<h2>
add</h2>
I mention this instruction now because it is used in the other calling convention, cdecl, to efficiently forget about function parameters pushed on the stack:<br />
<br />
<pre class="vimCodeElement"><span class="Identifier">add</span> <span class="Identifier">esp</span>, <span class="Constant">8</span>
</pre>
<br />
Obviously its usual use is plain arithmetic, but when it is used with the stack pointer as above immediately after a call, the function just called was most likely a cdecl function.</div>
</div>
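<div>
The stack bookkeeping behind call, retn N, and add esp can be modeled in a few lines of Python. This is a toy sketch with arbitrary addresses and values, and it only tracks esp and the stack contents; the jump itself is not modeled.</div>

```python
class Stack32(object):
    """Minimal model of a 32-bit stack that grows toward lower addresses."""

    def __init__(self, esp=0x1000):
        self.esp = esp
        self.mem = {}

    def push(self, value):
        self.esp -= 4
        self.mem[self.esp] = value & 0xFFFFFFFF

    def pop(self):
        value = self.mem[self.esp]
        self.esp += 4
        return value


def call(stack, return_address):
    """call: push the address of the next instruction (the jump is not modeled)."""
    stack.push(return_address)


def retn(stack, n=0):
    """retn N: pop the return address, then release N bytes of arguments."""
    return_address = stack.pop()
    stack.esp += n
    return return_address


# stdcall: the callee cleans up with retn 8.
s = Stack32()
s.push(2)            # push arg2
s.push(1)            # push arg1
call(s, 0x401005)
stdcall_ret = retn(s, 8)
stdcall_esp = s.esp  # back to 0x1000 with no help from the caller

# cdecl: the callee uses plain retn, so the caller adjusts esp.
c = Stack32()
c.push(2)
c.push(1)
call(c, 0x401005)
cdecl_ret = retn(c)
esp_before_cleanup = c.esp  # 0xFF8: both arguments are still on the stack
c.esp += 8                  # add esp, 8
```

<div>
In both cases esp ends up where it started; the only difference is which side of the call does the cleanup.</div>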
<div>
<h2>
cmp</h2>
</div>
<div>
Compare two operands: Subtract the second operand from the first operand and set EFLAGS as if this were an arithmetic subtraction instruction, but discard the result so that neither operand is modified.</div>
<div>
<h2>
test</h2>
</div>
<div>
Logical comparison. From Intel's manual: "Computes the bit-wise logical AND of first operand... and the second operand... and sets [EFLAGS accordingly]."</div>
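<div>
Here is a small Python sketch of what the two instructions compute for 32-bit operands. It models only ZF, SF, and CF (OF, PF, and AF are left out to keep it short), and the function names are just for illustration.</div>

```python
MASK32 = 0xFFFFFFFF


def flags_after_cmp(a, b):
    """Flags set by cmp a, b: computed from a - b, result discarded."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    return {
        "ZF": result == 0,         # operands are equal
        "SF": bool(result >> 31),  # top bit of the difference
        "CF": a < b,               # unsigned borrow
    }


def flags_after_test(a, b):
    """Flags set by test a, b: computed from a AND b, result discarded."""
    result = a & b & MASK32
    return {
        "ZF": result == 0,         # no bits in common
        "SF": bool(result >> 31),  # top bit of the AND
        "CF": False,               # test always clears CF
    }
```

<div>
flags_after_test(eax, eax)["ZF"] is true exactly when eax is zero, which is why test eax, eax followed by jz is such a common way to check a return value.</div>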
<div>
<h2>
More</h2>
</div>
<div>
If you're unsure about what an instruction does, RTFM: <a href="http://www.intel.com/products/processor/manuals/">http://www.intel.com/products/processor/manuals/</a></div>
<div>
<br /></div>
<div>
Intel's manuals are the definitive guide to how Intel's processors parse and execute instructions. They are organized as follows:</div>
<div>
<br /></div>
<div>
Volume 1: Basic Architecture</div>
<div>
Volume 2: Instruction Set Reference</div>
<div>
Volume 3: System Programming Guide</div>
<div>
<br />
If you wonder about a particular instruction, you'll find it in volume 2 (Instruction Set Reference). If you want to learn about the x86 execution environment, volume 1 (Basic Architecture) is your friend. And if you're writing a bootloader, an operating system, or a hypervisor, volume 3 (System Programming Guide) is for you.<br />
<h2>
Misc</h2>
</div>
<div>
If you're interested in tabulating the most common instructions using IDAPython, here is a snippet.</div>
<div>
<br /></div>
<pre class="vimCodeElement"><span class="PreProc">from</span> collections <span class="PreProc">import</span> defaultdict

<span class="PreProc">import</span> idautils
<span class="PreProc">import</span> idc


<span class="Statement">def</span> <span class="Identifier">_for_each_instr</span>(callback, outputs=<span class="Identifier">None</span>, parms=<span class="Identifier">None</span>):
    <span class="Constant">"""Do &lt;callback&gt; for each instruction.</span>

<span class="Constant">    Call callback() providing fva, chunk start va, instr addr, and outputs/</span>
<span class="Constant">    parameters.</span>
<span class="Constant">    """</span>
    <span class="Statement">for</span> fva <span class="Statement">in</span> idautils.Functions():
        <span class="Statement">for</span> (va_start, va_end) <span class="Statement">in</span> idautils.Chunks(fva):
            <span class="Statement">for</span> head <span class="Statement">in</span> idautils.Heads(va_start, va_end):
                callback(fva, va_start, head, outputs, parms)


<span class="Statement">def</span> <span class="Identifier">enum_mnemonics</span>():
    <span class="Constant">"""Return (mnemonic, count) pairs, most frequent first."""</span>
    mnems = defaultdict(<span class="Identifier">int</span>)

    <span class="Statement">def</span> <span class="Identifier">enum_mnemonics_callback</span>(fva, chunkva, head, unused1, unused2):
        <span class="Comment"># On IDA 7.4 and later, use idc.print_insn_mnem(head) instead</span>
        mnems[idc.GetMnem(head)] += <span class="Constant">1</span>

    _for_each_instr(enum_mnemonics_callback)
    <span class="Comment"># Sort by count, descending (works on Python 2 and 3)</span>
    mnems_sorted = <span class="Identifier">sorted</span>(mnems.items(), key=<span class="Statement">lambda</span> kv: kv[<span class="Constant">1</span>], reverse=<span class="Identifier">True</span>)
    <span class="Statement">return</span> mnems_sorted
</pre>
<div>
<br /></div>
<br />
Free Shortcuts in IDA Pro (posted 2017-10-04)<br />
I'm tired of reassigning everything to the same hotkey in IDA Pro because I don't know which hotkeys are free. Here are the keyboard shortcuts I know of that aren't in use in IDA Pro so far. Only one- and two-key shortcuts are included. I have omitted shortcuts used by <a href="https://www.zynamics.com/bindiff.html" target="_blank">BinDiff</a> and <a href="https://github.com/fireeye/flare-ida" target="_blank">flare-ida</a>, because I don't want to collide with those.<style>
code {
font-family: Courier;
margin:.75em 0;
border:1px solid #596;
border-width:1px 1px;
padding:5px 15px;
display: block;
background-color: #dedede;
white-space: pre;
}
</style><br />
<div>
<br /></div>
<div>
<ul>
<li>` (backtick)</li>
<li>, (comma)</li>
<li><>{}[] (left and right angle bracket, brace, and square bracket)</li>
<li>Most (but <i>not</i> all) of the top row: !@#$%^&*()+=</li>
<li>I</li>
<li>J</li>
<li>Alt+F4, F5, and F8</li>
<li>Alt+E, F, N, O, U, V, W, Z</li>
<li>Ctrl+0, 4, 5, 7, 8, 9</li>
<li>Ctrl+H</li>
<li>Ctrl+Y</li>
<li>Shift+A-C</li>
<li>Shift+F-O</li>
<li>Shift+Q</li>
<li>Shift+S-Z</li>
</ul>
<div>
I got these by visiting <u>O</u>ptions -> <u>S</u>hortcuts... in a recent version of IDA Pro. If you notice any others, or notable collisions, please comment or message and I will update.</div>
</div>
<br />
Two Great Tastes: IDA + WinDbg (posted 2017-09-28)<br />
Setting up remote debugging with IDA+WinDbg can be difficult when it doesn't work right off the bat, because the errors don't jog the right thought process for you to fix the setup and get it working. This causes some people to walk away from the whole thing, which is unfortunate. This setup is SOOOOO useful that it's worth slogging through the pain to get it working. The value of having graph view and IDA's annotation capabilities on-hand while debugging cannot be overstated.<br />
<div>
<br /></div>
<div>
Here, I'll emphasize one thing that could stand to be better emphasized in <a href="https://www.hex-rays.com/products/ida/support/tutorials/debugging_windbg.pdf" target="_blank">Hex-Rays' own documentation</a>: you have to be using the same version of WinDbg on each side. And I'll indicate some ways to isolate end-to-end (E2E) issues. Note that the system with IDA Pro on it is referred to here as the <i>analysis</i> system (it's where you do your analysis of the code), and the system where you run malware is referred to as the <i>target</i> system.<br />
<div>
<h2>
Pointers</h2>
<div>
<ul>
<li>Resolve any end-to-end (E2E) issues first (firewalls, networking, etc.)</li>
<li>Lock IDA Pro into using the same version of WinDbg as is on your target system</li>
<li>Use WinDbg itself to verify that there are no E2E issues</li>
</ul>
</div>
<div>
<h2>
Algorithm</h2>
<div>
Here is how to set up <i>remote</i> debugging with IDA Pro and WinDbg, step by step:</div>
<div>
<ol>
<li>Both systems: Ensure your analysis and target machines can access each other over the network</li>
<ol>
<li>If they are VMs, you may need to adjust them to ensure they are both host-only</li>
<li>You might need to mess with firewall settings</li>
<li>If you are using FakeNet-NG, you might need to add an exception for dbgsrv.exe</li>
</ol>
<li>Target system: Locate (install, if necessary) WinDbg on your target system.</li>
<li>Target -> Analysis system: If you haven't installed the same version of WinDbg on both systems, simply copy the entire x86 directory where you located WinDbg on the target system onto your analysis system. It doesn't matter where you place it.</li>
<li>Analysis system: Edit ida.cfg to set DBGTOOLS to point to the x86 directory</li>
<ol>
<li>Use double backslashes, e.g. DBGTOOLS = "C:\\Program Files (x86)\\Windows Kits\\10\\Debuggers\\x86\\";</li>
</ol>
<li>Target system: Start the WinDbg debug server</li>
<ol>
<li>"C:\path\to\dbgsrv.exe" -t tcp:port=9999</li>
</ol>
<li>Analysis system: Test by trying to connect remotely with WinDbg itself - if this doesn't work, then you've got end-to-end issues to resolve before IDA will work</li>
<li>Analysis system: configure your IDB to use WinDbg:</li>
<ol>
<li>Deb<u>u</u>gger -> Switch d<u>e</u>bugger... (select Windbg d<u>e</u>bugger and click O<u>K</u>)</li>
<li>Deb<u>u</u>gger -> <u>O</u>ptions...</li>
<ol>
<li><u>A</u>pplication: path\on\your\target\system\to\binary.exe</li>
<li><u>I</u>nput file: path\on\your\target\system\to\binary.exe</li>
<li><u>D</u>irectory: path\on\your\target\system\to</li>
<li><u>P</u>arameters: command-line arguments you want passed to the malware (if any)</li>
<li><u>C</u>onnection string: tcp:server=TARGETSYSTEMNAME,port=9999</li>
<li>Click O<u>K</u> </li>
</ol>
</ol>
<li>Analysis system: Click on an instruction and hit F4 to "run to" that instruction, or set a breakpoint and strike F9</li>
<li>Disregard warnings as applicable ;-)</li>
</ol>
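<div>
When chasing end-to-end issues, it can help to confirm that the debug server's TCP port is reachable at all before involving WinDbg or IDA. The following Python sketch does only that: it is not a substitute for the WinDbg test in step 6, and the host name and port in the usage note are the placeholder values from the steps above.</div>

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Refused, filtered, unresolved, or timed out.
        return False
    conn.close()
    return True
```

<div>
For example, if can_reach("TARGETSYSTEMNAME", 9999) returns False when run on the analysis system, the problem lies with networking, firewall settings, or dbgsrv.exe not running, rather than with IDA.</div>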
<h2>
Troubleshooting</h2>
<div>
You may want to audit your user and system PATH environment variables to ensure that they don't include the x86 directory of a conflicting version of WinDbg, or the x64 directory for that matter.</div>
<div>
<br /></div>
<div>
If you get "Could not initialize WinDbg engine 0x7f / The specified procedure could not be found...", you <i>might</i> try adding the path to that x86 directory to your system path and closing/reopening IDA. I also find that certain Python scripts seem to cause IDA Pro to emit this error, so you might also try closing/reopening IDA, initiating your debug session, and only THEN loading any ancillary IDAPython scripts you were using.</div>
<h2>
Miscellany</h2>
</div>
<div>
As of 2011, Hex-Rays indicated that <a href="https://www.hex-rays.com/products/ida/support/tutorials/debugging_windbg.pdf" target="_blank">this would not work with the x64 tools</a>.</div>
</div>
</div>
</div>
<br />
Done to Death (posted 2017-08-19, updated 2018-01-13)<br />
I saw <a href="https://twitter.com/redteamwrangler/status/898570341294850048" target="_blank">this tweet from @redteamwrangler</a> today:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgF_PfsN-E6nahdUWuYwtmlEv1ISmh0JRbDU2lQY7npQ_ugLPQeSI2qKeUSQ0qvJsdyYKUeDvfklGaInsGsOrbcBZukmmk_mSA2DQJBmfqUgaM_ekznBzSm_coy0ErNsXWNdTqC1q3T6HQT/s1600/typing_google.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="331" data-original-width="637" height="206" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgF_PfsN-E6nahdUWuYwtmlEv1ISmh0JRbDU2lQY7npQ_ugLPQeSI2qKeUSQ0qvJsdyYKUeDvfklGaInsGsOrbcBZukmmk_mSA2DQJBmfqUgaM_ekznBzSm_coy0ErNsXWNdTqC1q3T6HQT/s400/typing_google.png" width="400" /></a></div>
<br />
This question is a pretty easy fallback for interviewers and it might be getting a little old for us. But it set me to thinking about how tediously I could answer this without using the Internet or any reference materials on my machine. Some of these details might be a little off or just downright wrong, but I'll display my ignorance. It was a fun exercise since it isn't in the context of any actual interview :-)<br />
<br />
What happens when you type www.google.com into a browser and press return? Supposing you're using Internet Explorer for Windows on a wired (Ethernet) connection:<br />
<br />
<ul>
<li>8042 or emulated 8042 keyboard controller emits some scancodes</li>
<li>Keyboard interrupt is sent to the CPU by way of the interrupt controller (8259A PIC or APIC)</li>
<li>Interrupt service routine acknowledges interrupt, potentially moves one or more scancodes into a buffer</li>
<li>Or a delayed procedure call (DPC) does</li>
<li>ntoskrnl/win32k determine which thread corresponds to the foreground window and deliver a series of window messages of type WM_KEYDOWN/WM_KEYUP ending with one having virtual key code VK_RETURN</li>
<li>Window procedure for the browser URL bar (which is a window object) is called with a window message of type WM_KEYUP with virtual key code VK_RETURN</li>
<li>Window procedure has a switch statement / jmp table in it that accounts for this particular window message (WM_KEYUP) and maybe a sub-case for VK_RETURN</li>
<li>Probably takes the accumulated buffer so far (L"www.google.com") and passes it to a function</li>
<li>Probably uses a library like WinInet to do the real stuff</li>
<ul>
<li>Probably calls InternetOpen() to get a handle of type HINTERNET</li>
<li>Probably calls InternetOpenUrlW() or HttpSomethingSomething() to get another HINTERNET handle</li>
<ul>
<li>Probably reads the registry or uses a cached value for the HTTP User-Agent field that it provides here</li>
<li>Probably uses WinSock2 for TCP</li>
<li>Probably calls ws2_32!WSAStartup() if it hasn't been called yet</li>
<li>Checks proxy settings for the user and optionally establishes a connection with the corresponding hostname and implements HTTP proxy requests instead of direct HTTP requests</li>
<li>Parses the URL for the hostname, protocol scheme, any explicit port specification, URI, query parameters, etc.</li>
<ul>
<li>Probably uses wininet!InternetCrackUrlW() for this</li>
<li>If no protocol scheme is specified, uses http:// which has a default port of 80</li>
<li>If https:// is specified, a default port of 443</li>
</ul>
<li>Issues a DNS A (IPv4 address) request and/or AAAA (IPv6 address) request for the name</li>
<ul>
<li>Probably calls ws2_32!gethostbyname() to do this</li>
<ul>
<li>Thread consults DNS resolver cache service (if running) for name, probably via IPC</li>
<ul>
<li>DNS resolver cache either returns the name or...</li>
<li>Probably uses dnsapi.dll which exports some function that...</li>
<ul>
<li>Checks DNS configuration (probably the registry) to get primary, secondary, tertiary, etc. DNS servers</li>
<li>Calls ws2_32!inet_pton() (or inet_addr() for IPv4 only) to convert human-readable configuration to IPv4 or IPv6 addresses</li>
<li>Creates a socket object via ws2_32!socket()</li>
<li>Fills out a sockaddr_in structure (wrapping an in_addr) to address the DNS server</li>
<li>Uses ws2_32!sendto() on an AF_INET/IPPROTO_UDP socket to query the server connectionlessly</li>
<ul>
<li>Network layer (hmm, getting hand wavy) consults routing table to determine what interface packet should go through and whether it must visit a gateway</li>
<li>Network card device driver creates and fills out an object that the kernel uses to describe network datagrams (packets)</li>
<li>Network card device driver initiates I/O request with NIC via PCI registers or other hardware interface to provide datagram to be transmitted</li>
<ul>
<li>NIC takes the medium and transmits Ethernet frames bearing the octets that were given to it</li>
<li>If another host transmits at the same time, the two hosts use the binary exponential backoff algorithm to wait until the medium is clear</li>
<li>A router is likely the gateway; it accepts the packets and creates new packets to send across one or more other networks until the DNS server receives them</li>
<li>DNS server UDP stack handles incoming packet, provides it to UDP-based DNS service e.g. BIND which is bound to port 53</li>
<li>BIND parses the packets, potentially forwards the request if the desired names are not in its zone file, and returns the response</li>
</ul>
</ul>
<li>The DNS resolver cache service receives and parses DNS reply/replies and returns the answer to the DNS client</li>
</ul>
</ul>
<li>If the DNS resolver cache service wasn't running, ws2_32!gethostbyname() probably does most of this itself</li>
</ul>
</ul>
<li>Since you only typed "www.google.com", it's plaintext HTTP on the default port, so 80</li>
<li>Establishes a TCP connection with the resulting IP address and port</li>
<ul>
<li>ws2_32!socket() to get a socket object</li>
<li>ws2_32!connect() with AF_INET and IPPROTO_TCP to connect</li>
<ul>
<li>tcpip.sys is probably involved here</li>
<li>Again the network card and Ethernet medium stuff</li>
<li>TCP three-way handshake, window negotiation, etc.</li>
<ul>
<li>The client sends a SYN TCP segment</li>
<li>The server returns a SYN,ACK TCP segment</li>
<li>The client returns an ACK TCP segment</li>
</ul>
<li>TCP data transmission</li>
<ul>
<li>The client sends a PSH,ACK TCP segment pushing data</li>
<li>Something like "GET / HTTP/1.1\r\nHost: www.google.com\r\nUser-Agent: ..."</li>
</ul>
</ul>
</ul>
</ul>
</ul>
<li>Google's web server does some thinking and returns a response</li>
<li>The client receives a 3xx redirect response and gets directed to go to https://www.google.com/</li>
<li>Uses Microsoft schannel (secure channel) library to negotiate ciphers, parse the server's security certificate, and transmit data over TLS</li>
<li>Starts with ClientHello message, ServerHello, etc.</li>
<li>Obtains HTTP response, something like "HTTP/1.1 200 OK\r\n..." with some HTML in the HTTP body</li>
<li>HTML links to images, maybe JavaScript, etc., resulting in Cross-Origin Resource Sharing (CORS) processing and follow-on requests</li>
<li>Invokes the JScript scripting engine for JavaScript and the rendering engine to display the content</li>
<li>Uses graphics primitives and likely renders into a buffer that it furnishes to win32k.sys through GDI calls</li>
<li>GDI manages framebuffer of all windows including the foreground window</li>
<li>Monitor displays the framebuffer</li>
<li>Photons fly into your eyes</li>
<li>Optic nerve and brain adjust for upside-down image arriving at retina</li>
<li>Person realizes they went to Google and says, "crap, I meant to go to Bing." Orrrrrr maybe not, haha.</li>
</ul>
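<br />
The URL-parsing and request-building steps in the list above can be sketched in a few lines of Python. This is only an illustration built on the standard library's urllib.parse; it is not what Internet Explorer or WinInet actually does, and the function name and User-Agent string are made up for the example:<br />

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def build_get_request(typed, user_agent="ExampleBrowser/1.0"):
    """Turn what the user typed into (host, port, raw HTTP/1.1 request bytes)."""
    # No scheme typed? Assume plaintext http://, as in the walkthrough.
    if "://" not in typed:
        typed = "http://" + typed
    parts = urlsplit(typed)
    host = parts.hostname
    default_port = DEFAULT_PORTS[parts.scheme]
    port = parts.port or default_port
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # The Host header only carries the port when it is not the default.
    host_header = host if port == default_port else "{}:{}".format(host, port)
    lines = [
        "GET {} HTTP/1.1".format(target),
        "Host: {}".format(host_header),
        "User-Agent: {}".format(user_agent),
        "Connection: close",
        "",
        "",
    ]
    # HTTP header lines end in CRLF, and a blank line ends the header block.
    return host, port, "\r\n".join(lines).encode("ascii")
```

For plaintext HTTP, the returned host and port could be handed to socket.create_connection() and the bytes to sendall(); the https:// case would additionally need a TLS wrapper, which is left out here.<br />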
<br />
If I had more time, I'd draw this out a little more, but I had to quit eventually. And the more I do this, the more I bump into all the things I don't know. A couple things I'd like to know more about:<br />
<br />
<ul>
<li>What does "network layer" mean on Windows? I could use ETW with syscall stackwalking enabled to follow the ws2_32!connect() call into the kernel, or Windows Internals might just tell me.</li>
<li>What kernel object represents a packet in the Windows kernel? A packet buffer? I forgot :-( </li>
<li>How does the networking stack give a packet to the NIC to transmit it? My device driver fu is ageing.</li>
</ul>
<br />
<br />
<style>
code {
font-family: Courier;
margin:.75em 0;
border:1px solid #596;
border-width:1px 1px;
padding:5px 15px;
display: block;
background-color: #dedede;
white-space: pre;
}
</style>Michael Baileyhttp://www.blogger.com/profile/03334775875024877919noreply@blogger.com0tag:blogger.com,1999:blog-5433555458120308127.post-43025809259627758942017-04-23T17:53:00.000-07:002017-04-23T17:53:08.262-07:00So, de beency bouncy burger, eh? <style>
code {
font-family: Courier;
margin:.75em 0;
border:1px solid #596;
border-width:1px 1px;
padding:5px 15px;
display: block;
background-color: #dedede;
white-space: pre;
}
</style><br />
<div>
A colleague found this decoder called CyberChef, which I wish to bookmark and share with you. You may find it useful, too. I hear it can be downloaded as a standalone web page if you wish to audit it and then use it privately for opsec reasons, offline analysis, etc.</div>
<div>
<br /></div>
<div>
<a href="https://gchq.github.io/CyberChef/">https://gchq.github.io/CyberChef/</a></div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhSJB3TP4NSyYX4rSVn1if0rxHNcQs0YJH5zXhc49woZ9AeKptQF0H_om9bZZ9o9jYIYyruvIEWPvl1mrFF_2wKz9R9UWn4mnKRdrOGIW4SQPclwqsBL1fNNTPgqGks3zdl1LjI_kMm-Kb8/s1600/Swedish_poser.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhSJB3TP4NSyYX4rSVn1if0rxHNcQs0YJH5zXhc49woZ9AeKptQF0H_om9bZZ9o9jYIYyruvIEWPvl1mrFF_2wKz9R9UWn4mnKRdrOGIW4SQPclwqsBL1fNNTPgqGks3zdl1LjI_kMm-Kb8/s400/Swedish_poser.jpg" width="280" /></a></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
Børk, børk, børk!</div>
Michael Baileyhttp://www.blogger.com/profile/03334775875024877919noreply@blogger.com0tag:blogger.com,1999:blog-5433555458120308127.post-67796469422670845622017-04-23T15:08:00.000-07:002017-05-02T21:17:55.068-07:00Truthful ProgrammEEng<span class="pl-s">In this article I'll share how to approach complex logic problems by borrowing from the Electrical Engineering (EE) discipline. </span>I'll discuss how I've used techniques from digital logic circuit design to write efficient and accurate code for hypervisor startup and for packet redirection.<br />
<br />
This approach can reduce unnecessary nested/multiple conditional statements down to a one-liner. It applies to software engineering scenarios when you can express your problem in terms of a set of strictly Boolean (true/false) conditions.<br />
<br />
If you find yourself contemplating a tangled web of conditional (if/else) code or you find it difficult to understand what conditions contribute to the decision you want your software to make, take a step back and consider applying this technique from digital logic circuit design. If nothing else, the first three or five steps will get you on the right track to write the correct conditional logic (in if/else style) to solve your problem. If you can carry this process to its conclusion, then you can arrive at the most optimized one-liner to solve your logic problem.<br />
<br />
It consists of the following steps:<br />
<ol>
<li>Figure out what Boolean (true/false) conditions are necessary to the decision at hand</li>
<li>Define a truth table that specifies all the possible conditions</li>
<li>Mark which conditions you want to yield a TRUE result</li>
<li>Turn the TRUE cases (EEs call these minterms) into logical expressions</li>
<li>Logical-OR those expressions together to render a decision (EEs call this the canonical sum-of-products, or CSOP, logic function)</li>
<li>Simplify if possible</li>
</ol>
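<div>
To make the steps concrete, here is a rough Python sketch that builds a canonical sum-of-products function from a truth table. The helper names (minterms, csop) and the toy "exactly one of two conditions" decision are invented for this illustration:</div>

```python
from itertools import product


def minterms(decision, n_inputs):
    """Steps 2-3: enumerate every input combination and keep the TRUE rows."""
    return [row for row in product((False, True), repeat=n_inputs)
            if decision(*row)]


def csop(rows):
    """Steps 4-5: AND the literals of each TRUE row, then OR the rows together."""
    def evaluate(*inputs):
        return any(
            all(value == literal for value, literal in zip(inputs, row))
            for row in rows
        )
    return evaluate


# Step 1, toy version: act when exactly one of two conditions holds.
def exactly_one(a, b):
    return (a and not b) or (b and not a)


rows = minterms(exactly_one, 2)   # the TRUE rows of the truth table
decide = csop(rows)               # the unsimplified logic function
```

<div>
Step 6 is where a human (or a K-map) earns their keep: for this toy case the two minterms simplify to a single inequality test, a != b.</div>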
<span class="pl-s">
</span>
<br />
<div>
<span class="pl-s">Here's a hypervisor programming example followed by a network filtering example to demonstrate the process.</span></div>
<span class="pl-s">
</span>
<br />
<h2>
<span class="pl-s">
Starting a VT-x Hypervisor</span></h2>
<span class="pl-s">
</span>
<br />
<div>
<span class="pl-s">I needed to write startup code for a tiny hypervisor project using Intel Virtualization Technology (VT-x). I knew nothing about VT-x, but I did know that Intel documented the corresponding VMX instruction set and all its requirements in their <a href="http://www.intel.com/products/processor/manuals/" target="_blank">processor manual</a>, so I started there. Most of the publicly available code I could get my hands on seemed to test a few one-off control flags to decide whether it was okay to enter VMX root operation, ignoring the exact specifications that Intel provided in more modern documentation. I decided to implement my own check based on the manual.</span><br />
<span class="pl-s"><br /></span>
<span class="pl-s">The Intel manual describes some requirements for control registers CR0 and CR4 to allow the VMXON instruction to work. Here is some not-very-succinct text from the Intel manual (in case this makes your eyes bleed, I'll provide a more succinct summary next):</span><br />
<span class="pl-s"><br /></span></div>
<div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiB6mhuVReTcISXzJftyC6H_meSS6S-R3_C3HN5GvfGMGR3mCKDESD6Pype-JNutWfk-nluerNPKUhiV6UdCNwSNsH0VmjwHnBa2f1tVJ5cmOrsVaVvbNugfPI2pR7igeXSYGulqCCsKt4-/s1600/vmx.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="480" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiB6mhuVReTcISXzJftyC6H_meSS6S-R3_C3HN5GvfGMGR3mCKDESD6Pype-JNutWfk-nluerNPKUhiV6UdCNwSNsH0VmjwHnBa2f1tVJ5cmOrsVaVvbNugfPI2pR7igeXSYGulqCCsKt4-/s640/vmx.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Clear as mud.</td></tr>
</tbody></table>
<span class="pl-s"><br /></span>
<span class="pl-s">In summary, there is a FIXED0 and a FIXED1 model-specific register (MSR) for CR0, and another pair of these for CR4. The most important part of the text above is the last sentence starting with "Thus, each bit...". This amounts to the following possible values of FIXED0:FIXED1 along with their corresponding meaning:</span><br />
<span class="pl-s"><br /></span>
<br />
<table border="1">
<tbody>
<tr><td><b>F0</b></td><td><b>F1</b></td><td><b>CR must be</b></td></tr>
<tr><td>0</td><td>0</td><td>fixed to 0</td></tr>
<tr><td>1</td><td>1</td><td>fixed to 1</td></tr>
<tr><td>0</td><td>1</td><td>don't care</td></tr>
</tbody></table>
<span class="pl-s"><br /></span>
<span class="pl-s">Notice that one case is missing from Intel's description: </span>$F0:F1 = 10$. When we create a truth table including $CR$ as an input, this will amount to two missing / don't-care cases (one for $CR=0$ and another for $CR=1$).<br />
<br />
<span class="pl-s">So, expanding this out to take into account the two possible values of any bit in the control register we are checking, we have the following truth table for the possible values of $F0:F1$ and $CR$:</span><br />
<span class="pl-s"><br /></span>
<br />
<table border="1">
<tbody>
<tr><td><b>F0</b></td><td><b>F1</b></td><td><b>CR</b></td><td><b>Invalid</b></td></tr>
<tr><td>0</td><td>0</td><td>0</td><td>FALSE</td></tr>
<tr><td>0</td><td>0</td><td>1</td><td>TRUE (control register bit should be 0 but is 1)</td></tr>
<tr><td>1</td><td>1</td><td>0</td><td>TRUE (control register bit should be 1 but is 0)</td></tr>
<tr><td>1</td><td>1</td><td>1</td><td>FALSE</td></tr>
<tr><td>0</td><td>1</td><td>0</td><td>FALSE</td></tr>
<tr><td>0</td><td>1</td><td>1</td><td>FALSE</td></tr>
</tbody></table>
<span class="pl-s"><br /></span>
<span class="pl-s">Since there are two cases where a given control register bit can be in an invalid state, that means there are two logical cases, or minterms, either of which should result in a verdict of true (i.e. it is true that the control register is in an invalid state). Here's the same table from above, but formatted a little bit more like a truth table, with minterms called out.</span><br />
<span class="pl-s"><br /></span>
<br />
<table border="1">
<tbody>
<tr><td><b>F0</b></td><td><b>F1</b></td><td><b>CR</b></td><td><b>Invalid</b></td><td><b>Minterms</b></td></tr>
<tr><td>0</td><td>0</td><td>0</td><td>0</td><td></td></tr>
<tr><td>0</td><td>0</td><td>1</td><td>1</td><td>$\overline{F0}\cdot\overline{F1}\cdot CR$</td></tr>
<tr><td>1</td><td>1</td><td>0</td><td>1</td><td>$F0 \cdot F1 \cdot \overline{CR}$</td></tr>
<tr><td>1</td><td>1</td><td>1</td><td>0</td><td></td></tr>
<tr><td>0</td><td>1</td><td>0</td><td>0</td><td></td></tr>
<tr><td>0</td><td>1</td><td>1</td><td>0</td><td></td></tr>
</tbody></table>
<span class="pl-s"><br /></span>
<span class="pl-s">This amounts to the following sum of products logic function:</span><br />
<span class="pl-s"><br /></span>
<span class="pl-s">$$Invalid(F0,F1,CR)=\overline{F0} \cdot \overline{F1} \cdot CR+F0 \cdot F1 \cdot \overline{CR}$$</span><br />
<span class="pl-s"><br /></span>
<span class="pl-s">Bitwise operators apply the same Boolean function to every bit position independently, so what is true for one bit in a bit vector is true for all bits. In C code, each negated variable (e.g. $\overline{F0}$) amounts to inverting all the bits in the integer, each product becomes a bitwise AND, and the sum becomes a bitwise OR. So, the C code for this would look like:</span><br />
<span class="pl-s"><br /></span>
<br />
<pre class="vimCodeElement"><span class="pl-s"><span class="Type">int</span> is_invalid(<span class="Type">int</span> fixed0, <span class="Type">int</span> fixed1, <span class="Type">int</span> cr)
{
<span class="Statement">return</span> ((~fixed0 & ~fixed1 & cr) | (fixed0 & fixed1 & ~cr));
}
</span></pre>
<span class="pl-s"><br /></span>
<span class="pl-s">Boolean algebraic simplification can be applied to optimize this function to result in fewer logical operators, and fewer instruction opcodes emitted by the compiler. Optimization is left as an exercise for the student (hint: XOR is involved) ;-)</span></div>
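<div>
As a sanity check, the unsimplified function can be replayed against the truth table. The following is a hedged Python transliteration of the C function above, assuming non-negative integer inputs (Python's ~ yields a negative number, but ANDing with a non-negative value brings the result back to a non-negative bit mask). The name invalid_bits and the multi-bit sample values are illustrative only, not real MSR contents:</div>

```python
def invalid_bits(fixed0, fixed1, cr):
    """Return a mask of control-register bits that violate FIXED0/FIXED1."""
    must_be_0_but_set = ~fixed0 & ~fixed1 & cr
    must_be_1_but_clear = fixed0 & fixed1 & ~cr
    return must_be_0_but_set | must_be_1_but_clear


# The six documented rows: (F0, F1, CR) -> Invalid
TRUTH_TABLE = {
    (0, 0, 0): 0,
    (0, 0, 1): 1,
    (1, 1, 0): 1,
    (1, 1, 1): 0,
    (0, 1, 0): 0,
    (0, 1, 1): 0,
}


def matches_truth_table():
    """Check the one-liner against every documented row, one bit at a time."""
    return all(invalid_bits(f0, f1, cr) == expected
               for (f0, f1, cr), expected in TRUTH_TABLE.items())


# Several bits at once: bit 0 must be 1 but is 0, bit 3 must be 0 but is 1,
# and bits 1-2 are don't-care.
sample_mask = invalid_bits(0b0001, 0b0111, 0b1010)
```

<div>
A non-zero return value means at least one bit is out of spec, and the set bits say which ones.</div>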
<span class="pl-s">
</span>
<br />
<h2>
<span class="pl-s">
Redirecting Network Traffic</span></h2>
<span class="pl-s">
</span>
<br />
<div>
<span class="pl-s">More recently, I've been writing logic for the Linux Diverter in <a href="https://github.com/fireeye/flare-fakenet-ng" target="_blank">FakeNet-NG</a>. This project is the successor to the original <a href="https://practicalmalwareanalysis.com/fakenet/" target="_blank">FakeNet</a> tool distributed by Siko and Tank to let people simulate a network in a single virtual machine. The NG version was originally developed to make it possible to use FakeNet on newer versions of Windows, but it would be nice for it to work on Linux, too.</span><br />
<span class="pl-s"><br /></span>
<span class="pl-s">So, I am writing a Linux "Diverter", which is a component that manages how packets are redirected and manipulated to simulate the network. This is a really fun project in which I'm using <a href="https://github.com/kti/python-netfilterqueue" target="_blank">python-netfilterqueue</a> to catch and redirect packets so we can observe traffic sent to any port in the system, even ports where no service is currently bound.</span><br />
<span class="pl-s"><br /></span>
<span class="pl-s">The specification for the Linux Diverter includes redirecting traffic sent to unbound ports into a single listener (similar to the <a href="http://www.inetsim.org/" target="_blank">INetSim</a> dummy listener). </span>We want the Linux version of FakeNet-NG to be used in two modes:<br />
<ul><span class="pl-s">
<li>SingleHost: the malware and the network simulation tool are running on the same machine, like the legacy version of <a href="https://practicalmalwareanalysis.com/fakenet/" target="_blank">FakeNet</a>.</li>
<li>MultiHost: the malware and the network simulation tool are running on different machines, like INetSim.</li>
</span></ul>
<span class="pl-s">The conditions for whether a packet must be redirected boil down to a logic function of four inputs:</span></div>
<span class="pl-s">
</span>
<br />
<div>
<ul><span class="pl-s">
<li>(A) Is the source address local?</li>
<li>(B) Is the destination address local?</li>
<li>(C) Is the source port bound by a FakeNet-NG listener?</li>
<li>(D) Is the destination port bound by a FakeNet-NG listener?</li>
</span></ul>
</div>
<span class="pl-s">
</span>
<br />
<div>
The criteria for changing the destination port of a packet are as follows:<br />
<ol><span class="pl-s">
<li>When using FakeNet-NG in MultiHost mode (like INetSim), if a foreign host is trying to talk to us or any other host, ensure that unbound ports get redirected to a listener.</li>
<li>When using FakeNet-NG in SingleHost mode (like legacy FakeNet), ensure outbound destination packets are redirected <i>except </i>when the packet is a response from a FakeNet-NG listener (in other words, the packet originated from a bound port).</li>
</span></ol>
<span class="pl-s">The truth table for a decision function that redirects traffic destined for local, unbound ports to a single dummy listener in both SingleHost and MultiHost scenarios is as follows, with local IPs and bound ports in bold:</span><br />
<span class="pl-s"><br /></span>
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjasmaYK2g-d_N32XHTuox-_2AJcX5Ab8sSuk0lG7DhyphenhyphenPug25_KyGqGiAXb_jQUzD3rUvFuUQU9JGVClwEeV1D_mcIOf91OtavhIdZ6ALbtdDKyv03sHQk7g71Rq2KIHSM8J11qPvjcnXlr/s1600/fakenet2.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="368" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjasmaYK2g-d_N32XHTuox-_2AJcX5Ab8sSuk0lG7DhyphenhyphenPug25_KyGqGiAXb_jQUzD3rUvFuUQU9JGVClwEeV1D_mcIOf91OtavhIdZ6ALbtdDKyv03sHQk7g71Rq2KIHSM8J11qPvjcnXlr/s400/fakenet2.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Sixteen cases</td></tr>
</tbody></table>
<span class="pl-s"><br /></span><span class="pl-s">Here, we translated all the different human-comprehensible conditions like "it's coming from a foreign host, don't care what port, and it's arriving at the local host to an unbound port" into a set of Boolean conditions that can be either true (1) or false (0). It's convenient to use letters like A, B, C, and D to represent the inputs (see the first row of the table above). The minterms of this function are the cases when it returns true, namely:</span><br />
<span class="pl-s"><br /></span>
<br />
<ol>
<li>$\overline{A} \cdot \overline{B} \cdot \overline{C} \cdot \overline{D}$</li>
<li>$\overline{A} \cdot \overline{B} \cdot C \cdot \overline{D}$</li>
<li>$\overline{A} \cdot B \cdot \overline{C} \cdot \overline{D}$</li>
<li>$\overline{A} \cdot B \cdot C \cdot \overline{D}$</li>
<li>$A \cdot \overline{B} \cdot \overline{C} \cdot \overline{D}$</li>
<li>$A \cdot B \cdot \overline{C} \cdot \overline{D}$</li>
</ol>
<span class="pl-s"><br /></span>
<span class="pl-s">We can see that a few terms can be simplified, such as (1) and (2). Since A, B, and D are false and C can be either true or false, these two really amount to $\overline{A} \cdot \overline{B} \cdot \overline{D}$. Applying the same reasoning to (3) and (4), and then to (5) and (6), we arrive at a fairly simple function of these three disjunctive terms:</span><br />
<div>
<span class="pl-s"><br /></span>
<br />
<ul><span class="pl-s">
<li>$\overline{A} \cdot \overline{B} \cdot \overline{D}$</li>
<li>$\overline{A} \cdot B \cdot \overline{D}$</li>
<li>$A \cdot \overline{C} \cdot \overline{D}$</li>
</span></ul>
<span class="pl-s">From there, you can just get rid of B and collapse the first two cases into one:</span><br />
<span class="pl-s"></span><br />
<ul><span class="pl-s">
<li>$\overline{A} \cdot \overline{D}$</li>
<li>$A \cdot \overline{C} \cdot \overline{D}$</li>
</span></ul>
<span class="pl-s">And from there, you can take one step further by looking more closely at the truth table to find the optimal logic. </span>But there are six minterms here, so it might be time to bust out a more powerful tool: the ol' Karnaugh map. A K-map allows you to visually locate all the adjacent groups (pairs, quads, etc.) of logical "yes" outputs and arrive at the simplest possible logic function to yield the desired output. If you're not familiar, you should definitely <a href="https://en.wikipedia.org/wiki/Karnaugh_map" target="_blank">check this out</a>.<br />
<span class="pl-s"><br /></span>
<span class="pl-s">Here's the K-map for our logic function:</span><br />
<span class="pl-s"><br /></span>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgkvrqucdECXETqHK-6F4oxtQM9lz3TvkRMC6oTaIpAAEu1QMy2KOzJGwnNsVKNqaHqBGGKQClTtKzukI6hFOgQ2BG6E_3n6h3Vd5cWqY1TN0F8GI4lu8naGYdP4vKcEPYNgLrEj0n8r2eI/s1600/IMAG0640.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="300" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgkvrqucdECXETqHK-6F4oxtQM9lz3TvkRMC6oTaIpAAEu1QMy2KOzJGwnNsVKNqaHqBGGKQClTtKzukI6hFOgQ2BG6E_3n6h3Vd5cWqY1TN0F8GI4lu8naGYdP4vKcEPYNgLrEj0n8r2eI/s400/IMAG0640.jpg" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Short and sweet: two disjunctive terms in three variables</td></tr>
</tbody></table>
<span class="pl-s"><br /></span>
<span class="pl-s">The K-map here has only two groups of adjacent minterms: those that occur when $A$ and $D$ are zero, and those that occur when $C$ and $D$ are zero. This results in two disjunctive terms, each requiring only two input variables (three distinct variables in total):</span><br />
<ul><span class="pl-s">
<li>$\overline{A} \cdot \overline{D}$</li>
<li>$\overline{C} \cdot \overline{D}$</li>
</span></ul>
<span class="pl-s">So, the minimal sum of products logic function would be:</span><br />
<span class="pl-s"><br /></span>
<span class="pl-s">$$R(A, B, C, D) = \overline{A}\overline{D}+\overline{C}\overline{D}$$</span><br />
<span class="pl-s">Or, in Python:</span></div>
<span class="pl-s"><br /></span>
<br />
<pre class="vimCodeElement"><span class="pl-s"> <span class="Statement">return</span> ((<span class="Statement">not</span> src_local <span class="Statement">and</span> <span class="Statement">not</span> dport_bound) <span class="Statement">or</span>
(<span class="Statement">not</span> sport_bound <span class="Statement">and</span> <span class="Statement">not</span> dport_bound))
</span></pre>
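<div>
With only sixteen possible inputs, it is cheap to confirm that the minimized expression agrees with the six original minterms. Here is a small sketch of that check; the function names are mine and are not taken from the FakeNet-NG source:</div>

```python
from itertools import product

# The six minterms from the truth table, as (A, B, C, D) rows:
# A = source address local, B = destination address local,
# C = source port bound,    D = destination port bound.
MINTERMS = {
    (0, 0, 0, 0), (0, 0, 1, 0), (0, 1, 0, 0),
    (0, 1, 1, 0), (1, 0, 0, 0), (1, 1, 0, 0),
}


def redirect_canonical(a, b, c, d):
    """Canonical sum of products: TRUE exactly on the listed minterms."""
    return (a, b, c, d) in MINTERMS


def redirect_minimal(a, b, c, d):
    """Minimal form read off the K-map: A'D' + C'D' (B drops out entirely)."""
    return (not a and not d) or (not c and not d)


def forms_agree():
    """Compare both forms over all sixteen input combinations."""
    return all(
        redirect_canonical(*row) == redirect_minimal(*row)
        for row in product((0, 1), repeat=4)
    )
```

<div>
The same sixteen rows double as a ready-made list of test cases for the real decision function.</div>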
<span class="pl-s"><br /></span>
<br />
<h2>
<span class="pl-s">
Done and done.</span></h2>
</div>
<span class="pl-s">
</span>
<br />
<div>
<span class="pl-s">The next time you find yourself rolling around in freaky nested conditionals and arriving at the wrong logic, try approaching it the way an electrical engineer would: express the problem in terms of a set of Boolean conditions and then figure out how you want each case to go. If you want the simplest result, you can use Boolean algebra or learn to use a K-map to handle the rest. But even if you stop at the truth table, at least this process will help you sort out logic that you might otherwise be confused about and ensure that you know all the test cases you'll need to include to properly test your code.</span></div>
<span class="pl-s">
</span>Michael Baileyhttp://www.blogger.com/profile/03334775875024877919noreply@blogger.com0tag:blogger.com,1999:blog-5433555458120308127.post-74021261220029759442017-04-23T07:47:00.001-07:002017-04-23T18:26:45.208-07:00MathJaxI fixed MathJax for mobile on my blog. I also made it retrieve the scripts via cloudflare over https rather than in the plain from mathjax.org, in case any of my readers live in repressive regimes that punish people for reading beautifully rendered equations.<br />
<div>
<br /></div>
<div>
This might be useful to other blogger.commers who like LaTeX/MathJax and want to see it render on the mobile version of their blog:<br />
<a href="http://stackoverflow.com/questions/42592013/blog-with-mathjax-seen-on-a-cellphone">http://stackoverflow.com/questions/42592013/blog-with-mathjax-seen-on-a-cellphone</a></div>
<div>
<br /></div>
<div>
Okay. Carry on.</div>
Michael Baileyhttp://www.blogger.com/profile/03334775875024877919noreply@blogger.com0tag:blogger.com,1999:blog-5433555458120308127.post-18988192527367123872017-04-22T08:09:00.000-07:002018-01-13T11:02:30.436-08:00Chasing Heisenbugs: Troubleshooting Frustratingly Unresponsive SystemsI did some work in January 2016 on automated performance profiling and diagnosis. As <a href="https://twitter.com/arvanaghi" target="_blank">@arvanaghi</a> pointed out, this can be useful for investigating observables resulting from potentially malicious activity. So, I'm figuring out where I left off by turning my documentation into a blog post. What I wrote is pretty stuffy, but since I am sharing it in blog format, I will take some artistic license here and there. Without further ado, I present to you:<br />
<div>
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjoOvaWWG1ljBuOdLLhuYaIWwqkvq2hKJbhmHzyrgO5I67s4YP0QuXhs-iOh5gpBW6C4Smy-b3wIdPVuXKgVTSD2rCml99mlukWQm76ewP9G27wg-veWilqErleTXdkUQ6_jzpeQoSe-4Wh/s1600/heisen.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="201" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjoOvaWWG1ljBuOdLLhuYaIWwqkvq2hKJbhmHzyrgO5I67s4YP0QuXhs-iOh5gpBW6C4Smy-b3wIdPVuXKgVTSD2rCml99mlukWQm76ewP9G27wg-veWilqErleTXdkUQ6_jzpeQoSe-4Wh/s640/heisen.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">I'm just going to paste the whole thing here and draw emoticons on it</td></tr>
</tbody></table>
<div>
<br /></div>
<h2>
Scope</h2>
<div>
Several challenges make performance profiling and diagnosis of deployed applications a difficult task:</div>
<div>
<div>
<ul>
<li>Difficulty reproducing intermittent issues</li>
<li>Investigation inhibited by user interface (UI) latency due to resource consumption</li>
<li>Unavailability of appropriate diagnostic tools on user machines</li>
<li>Inability of laypeople to execute technically detailed diagnostic tasks</li>
<li>Unavailability of live artifacts in cases of dead memory dump analysis</li>
</ul>
</div>
</div>
<div>
My studies have included the following two types of disruptive resource utilization:</div>
<div>
<div>
<ul>
<li>User interface latency due to prolonged high CPU utilization</li>
<li>User interface latency due to prolonged high I/O utilization</li>
</ul>
</div>
</div>
<div>
I'll just be talking about CPU here.<br />
<br />
Where applicable, this article will primarily discuss automated means for solving the above problems. Tools can be configured to trigger only when issues appear that are specific to the software under investigation and that the diagnostic software is meant to address. Where feasible, I will share partial software specifications, rapidly prototyped proofs of concept (PoCs), empirical results, and discussion of considerations relevant to production environments.</div>
<div>
<br /></div>
<h2>
Prolonged High CPU Utilization</h2>
<div>
Automated diagnostics in general can be divided into two classes: (1) those that are meant to identify conditions of interest (e.g. high CPU utilization); and (2) those that are meant to collect diagnostic information relevant to that condition. Each is treated subsequently.</div>
<div>
<br /></div>
<h3>
Identifying Conditions of Interest</h3>
<div>
For purposes of this discussion, two classes of CPU utilization will be used: single-CPU percent utilization and system-wide percent CPU utilization. Single-CPU percent utilization is defined to be the percent time spent in both user and kernel mode (Equation 1); system-wide CPU utilization is defined to be the same figure, divided by the number of CPUs in the system (Equation 2). For example, if a process uses 100% of a single logical CPU in a four-CPU system, its system-wide CPU utilization is 25%. System-wide CPU utilization is the figure that is displayed by applications such as taskmgr.exe and tasklist.exe.<br />
<div>
</div>
</div>
<div>
<br /></div>
<div>
$$u_1=\frac{\Delta t_{user} + \Delta t_{kernel}}{\Delta t}$$</div>
<div style="text-align: right;">
(Eq. 1)</div>
<div>
<br /></div>
<div>
$$u_2=\frac{u_1}{n_{cpus}}$$</div>
<div style="text-align: right;">
(Eq. 2)</div>
<div>
<br /></div>
<div>
High CPU utilization can now be defined from the perspective of user experience. A single-threaded application can consume at most 100% of the CPU time on a single CPU (e.g. on a two-CPU system, at most 50% of system CPU resources). Multi-threaded applications have a much higher potential impact on <i>whole-system CPU utilization</i> because they can create enough CPU-intensive threads to run all logical CPUs at 100%. For purposes of this article, system-wide CPU utilization of 85% or greater will constitute high CPU utilization.</div>
<div>
<br /></div>
<div>
As for prolonged high CPU utilization, that is a subjective matter. From a user experience perspective, this can vary depending upon the user. For purposes of this article, high CPU utilization lasting 5 seconds or greater will be considered to be prolonged high CPU utilization. In practice, engineers might also need to consider how to classify and measure spikes in CPU utilization that occur frequently but for a shorter time than might constitute "prolonged" high CPU utilization; however, these considerations are left out of the scope of this article.<br />
<br />
<div>
I've implemented a Windows PoC (<a href="https://gist.github.com/strictlymike/2176540fbfc3e688d539f8077b6f2e53" target="_blank">trigger.cpp</a>) to assess the percent CPU utilization (both single-CPU and system-wide) for a given process. I don't know of any APIs that directly report process-wide or thread-specific CPU utilization, but Windows does expose the GetProcessTimes() API (and its per-thread counterpart, GetThreadTimes()), which can be used to determine how much time a process or thread has spent executing in user and kernel space over its life. I've used this to measure the change in kernel and user execution times versus the progression of real time, as measured using the combination of the QueryPerformanceCounter() and QueryPerformanceFrequency() functions. Figure 1 shows the PoC in operation, providing processor utilization updates that closely track the output of Windows Task Manager. The legacy Sysinternals CPUSTRES.EXE tool was used to exercise the PoC.</div>
</div>
<div>
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi0sHzzhfFKL356TAn-6HQbN2KV5UzyWoVr2TAUEthFL7LDaRBSe1DeRiuw59jf8czoZo5Xh_4VLdqVcyFBXztvJaXyBiWdHhgoI3I5hNQ0FCtnKsx6TrrwyCel3NsxdYVuxLAt2Sflxhyphenhyphenk/s1600/2016.01.31-00-TriggerInitialOperationalLevel.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="212" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi0sHzzhfFKL356TAn-6HQbN2KV5UzyWoVr2TAUEthFL7LDaRBSe1DeRiuw59jf8czoZo5Xh_4VLdqVcyFBXztvJaXyBiWdHhgoI3I5hNQ0FCtnKsx6TrrwyCel3NsxdYVuxLAt2Sflxhyphenhyphenk/s400/2016.01.31-00-TriggerInitialOperationalLevel.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Fig. 1: CPU utilization tool</td></tr>
</tbody></table>
<div>
<br /></div>
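<div>
The measurement approach described before Figure 1 can be sketched in portable form. This is not the trigger.cpp PoC: as an assumption for illustration, it samples the current process with Python's os.times() (standing in for GetProcessTimes()) and time.perf_counter() (standing in for the QueryPerformanceCounter() family), then applies Equations 1 and 2. The function names are hypothetical.</div>

```python
import os
import time


def cpu_seconds():
    """User + kernel CPU time consumed by this process so far, in seconds."""
    t = os.times()
    return t.user + t.system


def utilization(cpu_before, cpu_after, wall_before, wall_after, n_cpus):
    """Return (u1, u2) per Equations 1 and 2, as fractions (1.0 == 100%)."""
    elapsed = wall_after - wall_before
    if elapsed <= 0:
        raise ValueError("wall-clock interval must be positive")
    u1 = (cpu_after - cpu_before) / elapsed  # Eq. 1: single-CPU utilization
    u2 = u1 / n_cpus                         # Eq. 2: system-wide utilization
    return u1, u2


def sample_self(interval=0.2):
    """Measure this process's utilization over one sampling interval."""
    cpu0, wall0 = cpu_seconds(), time.perf_counter()
    deadline = wall0 + interval
    while time.perf_counter() < deadline:    # busy loop so there is something to measure
        pass
    cpu1, wall1 = cpu_seconds(), time.perf_counter()
    return utilization(cpu0, cpu1, wall0, wall1, os.cpu_count() or 1)


if __name__ == "__main__":
    u1, u2 = sample_self()
    print(f"single-CPU: {u1:.0%}  system-wide: {u2:.0%}")
```

<div>
Feeding utilization() two seconds of CPU time over a two-second interval on a four-CPU system reproduces the 100% single-CPU and 25% system-wide figures from the example above.</div>
<div>
<br /></div>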
<div>
There's one more thing to think about. If a diagnostic utility is executed indefinitely, it would be nice to make it consolidate successive distinct CPU utilization events into a single diagnostic event.</div>
<div>
<br /></div>
<div>
For example, the CPU utilization graph in Figure 2 below depicts a single high-CPU event lasting from $t=50s$ through $t=170s$. Although there are two dips in utilization around $t=110s$ and $t=150s$, this would likely be considered a single high-CPU event from both an engineering and a user experience perspective. Therefore, rather than terminating and restarting monitoring to record two events, a more coherent view might be obtained by recording a single event.</div>
<div>
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg4raPl3ydfE10CdOPTbHVTm8eVB2n0zVD8QcxivA6zF68OZ0p_YT8jGQM-vh9P29gBacWt4Uwo9uVhcl1aiCdmIPfmVanFcbJMT1SAF5oSlDvVl4ooLKB1YXi-nI_jXkYVc1e9P2-uhX4B/s1600/graph.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="231" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg4raPl3ydfE10CdOPTbHVTm8eVB2n0zVD8QcxivA6zF68OZ0p_YT8jGQM-vh9P29gBacWt4Uwo9uVhcl1aiCdmIPfmVanFcbJMT1SAF5oSlDvVl4ooLKB1YXi-nI_jXkYVc1e9P2-uhX4B/s400/graph.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Fig. 2: Single high CPU utilization event with two short dips</td></tr>
</tbody></table>
<div>
<br /></div>
<div>
A dip in utilization might also represent a transition from one phase of high CPU utilization to another, in which the target process performs markedly different activities than prior to the dip. This information can be preserved within a single diagnostic event for later identification provided that sufficiently robust data are collected.</div>
<div>
<br /></div>
<div>
One way to prevent continual collection of instantaneous events and to coalesce temporally connected events together is to define a finite state machine that implements hysteresis. Thresholds can be defined and adhered to in order to satisfy both the definition of "prolonged" high CPU utilization and the requirement that diagnostic information is not collected multiple times for a single "event". Such a state machine could facilitate a delay before diagnostics are terminated and launched again, which can in turn prevent the processing, storage, and/or transmission of excessive diagnostic reports representing a single event. Figure 3 depicts a finite state machine (FSM) for determining when to initiate and terminate diagnostic information collection.</div>
<div>
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjRfhlYCOuoYN630pfAq9-eWDej_wGGZ4qc2rFqksWfyKGH8-MiY-aVTVJ7IUSm4ylQrLMLYiYrbKRfBXR6LTyoVKwk6xh7vGyYTatZAykVeokRhpAWdDXFMpbmA3_HhfQj9nnSMdOERlZE/s1600/2016.01.31-02-FSM_barely.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="341" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjRfhlYCOuoYN630pfAq9-eWDej_wGGZ4qc2rFqksWfyKGH8-MiY-aVTVJ7IUSm4ylQrLMLYiYrbKRfBXR6LTyoVKwk6xh7vGyYTatZAykVeokRhpAWdDXFMpbmA3_HhfQj9nnSMdOERlZE/s400/2016.01.31-02-FSM_barely.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Fig. 3: It's an FSM. Barely.</td></tr>
</tbody></table>
<div>
<br /></div>
<div>
The state machine in Figure 3 above would be evaluated every time diagnostic information is sampled, and would operate as follows:</div>
<div>
<ol>
<li>The machine begins in the Normal state.</li>
<li>Normal state is promoted to High CPU at any iteration when CPU utilization exceeds the threshold (85% for purposes of this article).</li>
<li>When the state is High CPU, it can advance to Diagnosis or regress to Normal, as follows:</li>
<ol>
<li>If CPU utilization falls back below the threshold utilization value before the threshold number of seconds has elapsed, then this does not qualify as a "prolonged" high CPU utilization event, and the state is demoted to Normal;</li>
<li>If CPU utilization remains above the threshold utilization value for the threshold number of seconds, then diagnostics are initiated and state is promoted to Diagnosis.</li>
</ol>
<li>The Diagnosis state is used in combination with the Normal-Wait state to avoid continually repeating the diagnostic process over short periods. When the state is Diagnosis, it can either advance to Normal-Wait, or remain at Diagnosis. The condition for advancing to Normal-Wait is realized when CPU utilization for the target process falls below the threshold value.</li>
<li>When the state is Normal-Wait, the next transition can be either a regression to Diagnosis, no change, or an advance back to the Normal state:</li>
<ol>
<li>If the CPU utilization of the target process returns to high utilization before the time threshold expires, the timer is reset and the state regresses to Diagnosis. In this case, diagnostic information collection continues.</li>
<li>If CPU utilization remains low but the threshold duration has not elapsed, the state machine remains in Normal-Wait.</li>
<li>If the CPU utilization of the target process remains below the threshold value for the threshold duration, then diagnostics are terminated, the state transitions back to Normal, and the machine can return to a state in which it may again consider escalating to further diagnostics of subsequent events.</li>
</ol>
</ol>
</div>
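<div>
The transitions listed above can be sketched as a small class. This is a minimal illustration rather than the PoC's code: the class and parameter names are hypothetical, and the 5-second duration for leaving Normal-Wait is an assumed value, since only the 85% and 5-second "prolonged" thresholds are defined in this article.</div>

```python
from enum import Enum


class State(Enum):
    NORMAL = "Normal"
    HIGH_CPU = "High CPU"
    DIAGNOSIS = "Diagnosis"
    NORMAL_WAIT = "Normal-Wait"


class CpuMonitor:
    def __init__(self, threshold=0.85, prolonged_secs=5.0, cooldown_secs=5.0):
        self.threshold = threshold
        self.prolonged_secs = prolonged_secs
        self.cooldown_secs = cooldown_secs  # assumed value, not from the article
        self.state = State.NORMAL
        self.since = None  # time at which the current timed state was entered

    def step(self, utilization, now):
        """Evaluate one sample; utilization is a fraction, now is in seconds."""
        high = utilization >= self.threshold
        if self.state is State.NORMAL:
            if high:
                self.state, self.since = State.HIGH_CPU, now
        elif self.state is State.HIGH_CPU:
            if not high:
                self.state, self.since = State.NORMAL, None
            elif now - self.since >= self.prolonged_secs:
                # Prolonged high CPU: diagnostics would be started here.
                self.state, self.since = State.DIAGNOSIS, None
        elif self.state is State.DIAGNOSIS:
            if not high:
                self.state, self.since = State.NORMAL_WAIT, now
        elif self.state is State.NORMAL_WAIT:
            if high:
                # Same event resumes; keep collecting.
                self.state, self.since = State.DIAGNOSIS, None
            elif now - self.since >= self.cooldown_secs:
                # Event is over: diagnostics would be stopped here.
                self.state, self.since = State.NORMAL, None
        return self.state
```

<div>
Calling step() once per sample with the current utilization and a timestamp is enough to drive it; a short dip while in Diagnosis only moves the machine to Normal-Wait and back, so the dip does not split one event into two reports.</div>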
<div>
The accuracy of this machine in identifying the correct conditions to initiate and terminate diagnostic information collection could be improved by incorporating fuzzy evaluation of application conditions, such as by using a moving average of CPU utilization or by omitting statistical outliers from evaluation. Other definitions, thresholds, and behaviors described above may be refined further depending upon the specific scenario. Such refinements are beyond the scope of this brief study.</div>
<div>
<br /></div>
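<div>
As one possible shape for the moving-average refinement mentioned above, the sketch below smooths raw samples before they are compared against the threshold. The class name and the window size are assumptions chosen for illustration.</div>

```python
from collections import deque


class MovingAverage:
    """Average of the most recent `window` samples."""

    def __init__(self, window=5):
        self.samples = deque(maxlen=window)  # old samples fall off automatically

    def add(self, value):
        """Record a sample and return the current smoothed value."""
        self.samples.append(value)
        return sum(self.samples) / len(self.samples)
```

<div>
A single low sample in a run of high ones then lowers the smoothed value only partially, which makes a one-sample dip less likely to cross the threshold.</div>
<div>
<br /></div>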
<h3>
Collecting Diagnostic Information</h3>
<div>
When prolonged high CPU utilization occurs, the high-level question on the user's mind is: WHAT THE HECK IS MY COMPUTER DOINGGGGGG????!!</div>
<div>
<br /></div>
<div>
And to answer this question, we can investigate where the application is spending its time. Which, incidentally, is available to us through exposed OS APIs.</div>
<div>
<br /></div>
<div>
To address where the application is spending its time, two pieces of information are relevant:</div>
<div>
<div>
<ol>
<li>What threads are consuming the greatest degree of CPU resources, and</li>
<li>What instructions are being executed in each thread?</li>
</ol>
</div>
</div>
<div>
<div>
This information may allow engineers to identify which threads and subroutines or basic blocks are consuming the most significant CPU resources. In order to obtain this information, an automated diagnostic application must first enumerate and identify running threads. Because threads may be created and destroyed at any time, the diagnostic application must continually obtain the list of threads and then collect CPU utilization and instruction pointer values per thread. The result may be that threads appear and disappear throughout the timeline of the diagnostic data.</div>
<div>
<br /></div>
<div>
Ideally, output would be to a binary or XML file that could be loaded into a user interface for coherent display and browsing of data. In this study and the associated PoC, information will be collected over a fixed number of iterations (i.e. 100 samples) and displayed on the console before terminating.</div>
<div>
<br /></div>
<div>
For purposes of better understanding the origin of each thread, it can be useful to obtain module information and determine whether the thread entry point falls between the base and end addresses of any module. If it does, then a slightly more informative name, such as modname.dll+0xNNNN, can be displayed. Note that I said <i>slightly</i> more informative. Sometimes this just points to a C runtime thread start stub. But it's still worth having.</div>
<div>
<br /></div>
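<div>
The module lookup described above amounts to a range check per module. A minimal sketch, assuming the module list has already been collected as (name, base address, size) tuples; the function name and the tuple layout are hypothetical:</div>

```python
def describe_address(address, modules):
    """Return 'module+0xOFFSET' if address lies inside a module, else bare hex.

    modules is an iterable of (name, base, size) tuples.
    """
    for name, base, size in modules:
        if base <= address < base + size:
            return f"{name}+0x{address - base:X}"
    return f"0x{address:X}"
```

<div>
With a made-up base address of 0x400000 for CPUSTRES.EXE, an entry point of 0x401D7B would be reported as CPUSTRES.EXE+0x1D7B, while an address outside every module falls back to plain hexadecimal.</div>
<div>
<br /></div>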
<div>
In the PoC, data is displayed by sorting the application's threads by percent utilization and displaying the most significant offenders last. Figure 4 shows threads from a legacy SysInternals CPU load simulation tool, CPUSTRES.EXE, sorted in order of CPU utilization.</div>
</div>
<div>
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiJ31ABafRISS1slFVbHSPqiUjhC-ROIAINVl318VddOAZuzm7fDPDut71Q4fwYoUiq-RTUrAwraSs0_PJJUZyAh6YRieK2qXgkyHPdlzP9gVOTlmBC39tCunAfr1Av1pqpjLWuliqcO-ZX/s1600/2016.01.31-01-InspectInitialOperationalLevel.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="340" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiJ31ABafRISS1slFVbHSPqiUjhC-ROIAINVl318VddOAZuzm7fDPDut71Q4fwYoUiq-RTUrAwraSs0_PJJUZyAh6YRieK2qXgkyHPdlzP9gVOTlmBC39tCunAfr1Av1pqpjLWuliqcO-ZX/s640/2016.01.31-01-InspectInitialOperationalLevel.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Fig. 4: Threads sorted in ascending order of CPU utilization</td></tr>
</tbody></table>
<div>
<br /></div>
<div>
Although this answers the high-level question of what the program is doing (i.e., executing the thread whose start routine is located at CPUSTRES.EXE+0x1D7B in two separate threads), it does not indicate specifically what part of each routine is executing at a given time.</div>
<div>
<br /></div>
<div>
To answer more specific questions about performance, two techniques are available:</div>
<div>
<div>
<ul>
<li>Execution tracing</li>
<li>Instruction pointer sampling</li>
</ul>
</div>
</div>
<div>
Execution tracing can observe detailed instruction execution either by single-stepping via the Windows debug APIs or by using processor-specific tracing facilities. Instruction pointer sampling, on the other hand, can be implemented quickly, albeit at a cost to diagnostic detail. Even so, this method offers improved performance since single-stepping is ssssllllloooowwwwwwwwwwww.</div>
<div>
<br /></div>
<div>
This PoC (<a href="https://gist.github.com/strictlymike/c7495013a6e505f02d0f8e260b13435c" target="_blank">inspect.cpp</a>) implements instruction pointer sampling by suspending each thread with the SuspendThread() function and obtaining the control portion of the associated thread context via the GetThreadContext() function. Figure 5 depicts the PoC enumerating instruction pointers for several threads within taskmgr.exe. Notably, thread 3996 is executing a different instruction in sample 7 than it is in sample 8, whereas most threads in the output remain at the same instruction pointer across various samples, perhaps waiting on blocking Windows API calls.</div>
<div>
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgi4hVL4HKIl7RRN11OLC_pPIaakdJkSxAJWp68_4wP1gTiEnoBRq4fmAEUDciiUBm5C44qoYF1ml7N7EDQ2zXrr8EdA7cgqCCHV3uNSSYhDQTjb7Cx40URNgZgLbyrXNS99E0eN_iat7UF/s1600/2016.01.31-03-InspectRipOperational.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="486" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgi4hVL4HKIl7RRN11OLC_pPIaakdJkSxAJWp68_4wP1gTiEnoBRq4fmAEUDciiUBm5C44qoYF1ml7N7EDQ2zXrr8EdA7cgqCCHV3uNSSYhDQTjb7Cx40URNgZgLbyrXNS99E0eN_iat7UF/s640/2016.01.31-03-InspectRipOperational.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Fig. 5: Instruction pointer for thread 3996</td></tr>
</tbody></table>
<div>
<br /></div>
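<div>
Once samples like those in Figure 5 have been collected, separating threads whose instruction pointer moves from threads that stay put is a simple grouping exercise. The sketch below assumes each sample is stored as a dictionary mapping thread ID to instruction pointer; that storage format, the function names, and the addresses used in any example are hypothetical and not taken from inspect.cpp.</div>

```python
from collections import defaultdict


def distinct_ips_per_thread(samples):
    """Map each thread ID to the set of instruction pointers seen for it."""
    seen = defaultdict(set)
    for sample in samples:
        for tid, ip in sample.items():
            seen[tid].add(ip)
    return seen


def moving_threads(samples):
    """Thread IDs whose instruction pointer differed between samples."""
    per_thread = distinct_ips_per_thread(samples)
    return sorted(tid for tid, ips in per_thread.items() if len(ips) > 1)
```

<div>
An unchanged instruction pointer is only a hint that a thread is blocked; a thread spinning in a very tight loop could also be caught at the same address more than once.</div>
<div>
<br /></div>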
<div>
This information can provide limited additional context as to what the application is doing. More useful information might include a stack trace (provided frame pointers are not omitted for anti-reverse engineering or optimization reasons).</div>
<div>
<br /></div>
<h2>
Conclusion</h2>
<div>
It's nice to be able to identify and collect information about events like this. But CPU utilization is only one part of the problem, and there are other ways of detecting UI latency than measuring per-process statistics. Also, what of inherent instrumentation available for collecting diagnostic information? What of kernrate, which is mentioned in the Windows Internals book and is <a href="http://www.nynaeve.net/?p=45" target="_blank">covered here</a>? It looks as if this can be used instead of custom diagnostics, as long as there are sufficient CPU resources to launch it (either via starting a new process or by invoking the APIs that it uses to initiate logging). Would kernrate.exe (or the underlying APIs) suffer too much resource starvation to be useful in the automatically detected conditions I outlined above? In addition to this, what ETW information might give us a better glimpse into what is happening when a system becomes frustratingly unresponsive?</div>
<div>
<br /></div>
<div>
These are the questions I want to dig into to arrive at a fully automated system for pinpointing the reasons for slowness and arriving at human-readable explanations for what is happening.</div>
Michael Baileyhttp://www.blogger.com/profile/03334775875024877919noreply@blogger.com1tag:blogger.com,1999:blog-5433555458120308127.post-86481494793143447362017-04-19T13:51:00.002-07:002018-01-13T10:52:16.204-08:00Troubleshooting NetfilterI'm developing a Linux "Diverter" to handle packets for <a href="https://github.com/fireeye/flare-fakenet-ng" target="_blank">FakeNet-NG</a>, and I've run into some mind-bending issues. Here is a fun one.<style>
code {
font-family: Courier;
margin:.75em 0;
border:1px solid #596;
border-width:1px 1px;
padding:5px 15px;
display: block;
background-color: #dedede;
white-space: pre;
}
</style><br />
<div>
<br /></div>
<div>
I needed to make FakeNet-NG respond when clients use it as their gateway to talk to arbitrary IP addresses. This is done easily enough: </div>
<div>
<div>
<br /></div>
<div>
<pre class="cmd">iptables -t nat -I PREROUTING -j REDIRECT
</pre>
</div>
<div>
<br /></div>
<div>
At the same time, I needed to make it possible for clients asking for arbitrary ports (where no service was bound) to be redirected to a dummy service. And I needed to write pcaps, produce logging, and allow other on-the-fly decisions to be made. This I did using <a href="https://github.com/kti/python-netfilterqueue" target="_blank">python-netfilterqueue</a> and dpkt to mangle port numbers on the way in, fix them on the way out, and recalculate checksums as necessary.<br />
<br />
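To make "mangle port numbers and recalculate checksums" concrete, here is a standard-library-only sketch of rewriting the destination port of a raw TCP segment and recomputing the TCP checksum over the IPv4 pseudo-header. It is not the Diverter code, which uses dpkt for this; the function names are hypothetical and the segment is assumed to start at the TCP header.<br />
<br />

```python
import socket
import struct


def internet_checksum(data):
    """Ones'-complement sum of 16-bit words (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:   # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def tcp_checksum(src_ip, dst_ip, segment):
    """Checksum of a TCP segment including the IPv4 pseudo-header."""
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, socket.IPPROTO_TCP, len(segment)))
    return internet_checksum(pseudo + segment)


def rewrite_dst_port(src_ip, dst_ip, segment, new_port):
    """Return a copy of the segment with a new destination port and checksum."""
    seg = bytearray(segment)
    struct.pack_into("!H", seg, 2, new_port)  # bytes 2-3: destination port
    struct.pack_into("!H", seg, 16, 0)        # zero the checksum field first
    struct.pack_into("!H", seg, 16, tcp_checksum(src_ip, dst_ip, bytes(seg)))
    return bytes(seg)
```

A receiver validates the result by summing the pseudo-header and segment again with the checksum field in place, which is why tcp_checksum() over a correctly patched segment comes out as zero.<br />
<br />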
These solutions each worked great. But as I learned while demonstrating this functionality, they just didn't work <i>at the same time</i>:<br />
<br />
<pre class="cmd">root@ubuntu:/home/mykill# echo fdsa | nc -v 5.5.5.5 45678
nc: connect to 5.5.5.5 port 45678 (tcp) failed: Connection timed out
</pre>
</div>
<div>
<br /></div>
<div>
I compared pcaps from successful and unsuccessful conversations between the client system and an arbitrary IP address (say, 5.5.5.5). In successful cases (where my packet mangling code was inactive), the FakeNet system responded with whatever IP the client asked to talk to, and the two systems successfully finished the TCP three-way handshake necessary to establish a connection and exchange information. But when my packet mangling code was active, the FakeNet system responded with a SYN/ACK erroneously bearing its own IP address, and the client responded with an RST.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhK09-Q27dYn_qSheJXTQMagtN1qknT2GY6y8bagZrb9rP75XmLF0B1k8nr1RN10nnKTdPtULS486Oiu3Bd_W500OQaZASxGU43RoVZ0fiUwUbOQyRJMZXlaBh7EGStkIZ0CxyMZ6WcgCXY/s1600/tmp.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="424" data-original-width="500" height="271" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhK09-Q27dYn_qSheJXTQMagtN1qknT2GY6y8bagZrb9rP75XmLF0B1k8nr1RN10nnKTdPtULS486Oiu3Bd_W500OQaZASxGU43RoVZ0fiUwUbOQyRJMZXlaBh7EGStkIZ0CxyMZ6WcgCXY/s320/tmp.png" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">RST is TCP-ese for "Sit down, I wasn't even talking to you."</td></tr>
</tbody></table>
<div class="separator" style="clear: both; text-align: center;">
</div>
<br /></div>
<div>
This behavior led me to the suspicion that my packet mangling activity was preventing the system from recognizing and fixing up response packets so that their IP addresses would match the IP address of the incoming packet (say, 5.5.5.5).</div>
<div>
<br />
To investigate this, I started by looking at <a href="http://lxr.linux.no/#linux+v4.10.1/net/netfilter/xt_REDIRECT.c" target="_blank">net/netfilter/xt_REDIRECT.c</a> with the goal of learning whether the kernel was using things like the TCP port numbers I was mangling to try to keep track of what packets to fix up. I found that in the case of IPv4, redirect_tg4() calls <a href="http://lxr.linux.no/#linux+v4.10.1/net/netfilter/nf_nat_redirect.c#L32" target="_blank">nf_nat_redirect_ipv4() in nf_nat_redirect.c</a> which unconditionally accesses conntrack information in the skb (short for socket buffer, i.e. the packet), finally calling <a href="http://lxr.linux.no/#linux+v4.10.1/net/netfilter/nf_nat_core.c#L409" target="_blank">nf_nat_setup_info() in nf_nat_core.c</a>. The latter function manipulates the destination IP address and calculates a "tuple" and "inverse tuple" that will be used to identify corresponding packets by their endpoint (and other protocol characteristics) and fix up any fields that were mangled by the NAT logic.<br />
<br />
I was surprised conntrack was involved because I hadn't needed to use the -m conntrack argument to implement redirection. To confirm what I was seeing, I used lsmod to peek at the dependencies among Netfilter modules. Sure enough, I found that xt_REDIRECT.ko (which implements the REDIRECT target in my iptables rule) relies on nf_nat.ko, which itself relies on nf_conntrack.ko.<br />
<br /></div>
<div>
I still didn't have the full picture, but it seemed more and more like I was on to something. Perhaps the system was calculating a "tuple" based on the TCP destination port of the incoming packet, my code was modifying the TCP destination port, and then the system was getting a whack at the response packet before I had a chance to fix up its TCP source port to something that would result in a match.<br />
<br />
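That hypothesis can be illustrated with a toy model. The sketch below is not how the kernel implements NAT; it only mimics the idea of remembering an inverse tuple for each redirected flow and matching replies against it. The class, the addresses, and the dummy service port 1337 are all made up for illustration.<br />
<br />

```python
from collections import namedtuple

FlowTuple = namedtuple("FlowTuple", "src_ip src_port dst_ip dst_port")


def invert(t):
    """The tuple a reply to t is expected to carry."""
    return FlowTuple(t.dst_ip, t.dst_port, t.src_ip, t.src_port)


class ToyNat:
    """Toy REDIRECT: remembers the expected reply tuple for each flow."""

    def __init__(self, local_ip):
        self.local_ip = local_ip
        self.expected_replies = {}

    def redirect(self, pkt):
        """Inbound: rewrite the destination IP and remember how to undo it."""
        rewritten = pkt._replace(dst_ip=self.local_ip)
        self.expected_replies[invert(rewritten)] = pkt.dst_ip
        return rewritten

    def fix_reply(self, reply):
        """Outbound: restore the original IP only if the reply tuple matches."""
        original_ip = self.expected_replies.get(reply)
        return reply._replace(src_ip=original_ip) if original_ip else reply
```

In this model, a reply that still carries the mangled source port (1337) misses the lookup and leaves with the local IP address, just like the SYN/ACK that drew an RST; a reply whose source port has been restored to 45678 before the lookup is matched and leaves with the IP address the client originally asked for.<br />
<br />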
I wanted to figure out when the REDIRECT logic was executing versus when my own logic was executing so I could confirm that hypothesis. While I puzzled over this, I happened upon some <a href="http://netfilter.org/documentation/HOWTO/netfilter-hacking-HOWTO-4.html" target="_blank">relevant documentation</a> that led me to believe I might be correct about the use of TCP ports (rather than, say, socket ownership) to track connections:<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjIEHf5SdcTRv8vHQ8VtcdNBxxc-Ie_6dvIyiCdodV0oIl8q2iu2Fq3pY8yPVTi3rOLqolwjnIV9i9cV0f4ofHr7axrztVCI-1xo9FKHm2Hlz85PoE-KyNfrO366r_FNQwY-CBbvQ5Z_x5i/s1600/nat.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="420" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjIEHf5SdcTRv8vHQ8VtcdNBxxc-Ie_6dvIyiCdodV0oIl8q2iu2Fq3pY8yPVTi3rOLqolwjnIV9i9cV0f4ofHr7axrztVCI-1xo9FKHm2Hlz85PoE-KyNfrO366r_FNQwY-CBbvQ5Z_x5i/s640/nat.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Yeah dummy. It's the port.</td></tr>
</tbody></table>
<br />
This documentation also answered my question of when the NAT tuple calculations occur:<br />
<blockquote class="tr_bq">
<i>Connection tracking hooks into high-priority NF_IP_LOCAL_OUT and NF_IP_PRE_ROUTING hooks, in order to see packets before they enter the system.</i></blockquote>
These chains were consistent with the registration structures in xt_REDIRECT.c, which further indicated that the hooks were specific to the nat table (naturally):<br />
<br />
<pre class="vimCodeElement"><span class="Type">static</span> <span class="Type">struct</span> xt_target redirect_tg_reg[] __read_mostly = {
.
.
.
{
.name = <span class="Constant">"REDIRECT"</span>,
.family = NFPROTO_IPV4,
.revision = <span class="Constant">0</span>,
.table = <span class="Constant"><span style="background-color: red;">"nat"</span></span>,
.target = redirect_tg4,
.checkentry = redirect_tg4_check,
.targetsize = <span class="Statement">sizeof</span>(<span class="Type">struct</span> nf_nat_ipv4_multi_range_compat),
.hooks = <span style="background-color: red;">(<span class="Constant">1</span> << NF_INET_PRE_ROUTING) |</span>
<span style="background-color: red;">(<span class="Constant">1</span> << NF_INET_LOCAL_OUT)</span>,
.me = THIS_MODULE,
},
};
</pre>
<br />
At this point, I really wanted a way to beat Netfilter's OUTPUT/nat hook to the punch. I needed to fix up the source port of the response packet and see if I could induce Netfilter to calculate correct inverse-tuples and fix up the source IPs in my response packets again. But the documentation says Netfilter implements its connection tracking using <i>high-priority </i>hooks in the NF_IP_LOCAL_OUT and NF_IP_PRE_ROUTING chains. That sounds a lot like Netfilter gets first dibs. I sat in uffish thought until I remembered this very detailed diagram (<a href="https://upload.wikimedia.org/wikipedia/commons/3/37/Netfilter-packet-flow.svg" target="_blank">click to enlarge</a>):<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://upload.wikimedia.org/wikipedia/commons/3/37/Netfilter-packet-flow.svg" style="margin-left: auto; margin-right: auto;" target="_blank"><img border="0" height="187" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhywxiJAtI1dQL1K01hK73Nf3CGQkM5u8BT67bSjiDcFe1Q11MLQHCRhbDbyuwzgvfBCmMdqL4utINIar7QmAsk28Uwf4E4NHYEnJZCgFZYeb_VA9zEOZg4hOlJgTtKuCL8Ek12U5T-OERL/s640/nf.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">You are here. No, wait.</td></tr>
</tbody></table>
<br />
If this graphic was correct, I was about to get in before Netfilter -- by changing my fix-up hook to run in the OUTPUT/raw chain and table. I gave it a shot, and...<br />
<br />
<pre class="cmd">root@ubuntu:/home/mykill# echo asdf | nc -v 5.5.5.5 9999
Connection to 5.5.5.5 9999 port [tcp/*] succeeded!
asdf
</pre>
</div>
<div>
<br /></div>
<div>
VICTORY!<br />
<br />
It was a hard-fought battle, but I actually was able to confirm my suspicions thanks to the very readable code in the Netfilter portion of the kernel and some very helpful documentation. It's fun to be working on Linux again!</div>
</div>
Michael Baileyhttp://www.blogger.com/profile/03334775875024877919noreply@blogger.com1tag:blogger.com,1999:blog-5433555458120308127.post-65974062582063749382017-02-03T21:19:00.000-08:002017-04-23T17:51:41.840-07:00Remember the Registers (or, the ABCs of x86)I have a friend who is about to learn assembly language on x86. This means understanding how processor registers are used. So, I'm sharing some mnemonics for remembering the registers and their roles on x86. Maybe they will help you, too.<br />
<br />
A register, by the way, is just a memory location that is directly (and quickly!) accessible by the processor, usually located in the processor circuitry as opposed to being accessible at a memory address (although this depends on the hardware).<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjPpE4un6ViikMRyB1oaspWUhsLzh10108LXWF0oXHrwULnck9JWkc4Ruz4vOWpbo1s7hqkA0ezIUmYDlOTHLMJOK6Vh4xPnAVeKUbxydJVJCeDguiMoyzFu-ngKBeeQ8stfvi3QqyB-GE8/s1600/kids-toy-cash-register-pretend-supermarket-bruin-play-99223.jpeg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjPpE4un6ViikMRyB1oaspWUhsLzh10108LXWF0oXHrwULnck9JWkc4Ruz4vOWpbo1s7hqkA0ezIUmYDlOTHLMJOK6Vh4xPnAVeKUbxydJVJCeDguiMoyzFu-ngKBeeQ8stfvi3QqyB-GE8/s1600/kids-toy-cash-register-pretend-supermarket-bruin-play-99223.jpeg" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Oh. Okay, so not one of these. Got it.</td></tr>
</tbody></table>
<br />
<br />
I'll completely omit floating point (and MMX/SSE) stuff, but I'll briefly mention amd64 for the sake of example.<br />
<h2>
A, B, C, and D</h2>
<div>
There are four general purpose registers whose names are basically A, B, C, and D. On old 16-bit Intel processors, they were named AX, BX, CX, and DX. From each of these, you could also access the low eight bits (e.g. AL, as in "A low") and the high eight bits (e.g. AH, as in "A high"), and you still can.<br />
<br />
When 32-bit came along, they prefixed them all with an E (meaning "extended"), so now they are EAX, EBX, ECX, and EDX.<br />
<br />
And when AMD devised their derivative amd64 architecture, they prefixed them all with an R, presumably meaning "really-friggin'-extended". So, on amd64, they are named RAX, RBX, RCX, and RDX.</div>
<div>
<br /></div>
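<div>
The way the smaller names overlay the larger ones can be shown with plain bit masks. This is only a model of the naming scheme in Python, with a hypothetical function name; it does not touch real registers.</div>

```python
def sub_registers(rax):
    """Views of a 64-bit 'A' register value under each of its names."""
    return {
        "RAX": rax & 0xFFFFFFFFFFFFFFFF,  # all 64 bits
        "EAX": rax & 0xFFFFFFFF,          # low 32 bits
        "AX": rax & 0xFFFF,               # low 16 bits
        "AH": (rax >> 8) & 0xFF,          # bits 8-15 ("A high")
        "AL": rax & 0xFF,                 # bits 0-7 ("A low")
    }
```

<div>
Note that AH and AL are the high and low halves of AX, not of EAX or RAX.</div>
<div>
<br /></div>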
<div>
The "A" register has a few special roles, but most importantly it is used to hold the return value (i.e. the result) when functions return to their callers.</div>
<div>
<br /></div>
<div>
The "C" register sometimes has a special role, too, that I'll describe next.</div>
<div>
<br /></div>
<div>
So much for A, B, C, and D.</div>
<h2>
String</h2>
<div>
Several string instructions (e.g. cmps, movs, stos, and lods) have implied operands:</div>
<div>
<ul>
<li>E<u><b>S</b></u>I = <b><u>S</u></b>ource</li>
<li>E<b><u>D</u></b>I = <b><u>D</u></b>estination</li>
</ul>
</div>
<div>
<ul>
<li>E<b><u>C</u></b>X = <b><u>C</u></b>ounter</li>
<li>EAX = Value to write (sometimes applicable, sometimes not)</li>
</ul>
</div>
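<div>
To see the implied operands at work, here is a rough Python model of two common idioms, "rep movsb" and "rep stosb", with the direction flag assumed clear (addresses increment). Memory is modeled as a bytearray and the function names are hypothetical.</div>

```python
def rep_movsb(memory, esi, edi, ecx):
    """Copy ECX bytes from [ESI] to [EDI]; returns the updated registers."""
    while ecx != 0:
        memory[edi] = memory[esi]  # movsb: one byte from source to destination
        esi += 1
        edi += 1
        ecx -= 1                   # rep: count down until ECX reaches zero
    return esi, edi, ecx


def rep_stosb(memory, edi, ecx, al):
    """Store the byte in AL to [EDI], ECX times; returns the updated registers."""
    while ecx != 0:
        memory[edi] = al           # stosb: the value to write comes from AL
        edi += 1
        ecx -= 1
    return edi, ecx
```

<div>
The first behaves much like memcpy and the second much like memset, which is a handy way to recognize them in disassembly.</div>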
<h2>
Stack</h2>
<div>
The stack starts at a high address in memory and as things are added to it, it grows down to lower addresses. The processor keeps track with a stack pointer, and then functions can further use a "base" pointer to point to the boundary between the caller's stack (and two other pieces of data), and their own local variables.<br />
<ul>
<li>E<b><u>SP</u></b> = <b><u>S</u></b>tack <b><u>P</u></b>ointer</li>
<li>E<b><u>BP</u></b> = <b><u>B</u></b>ase <b><u>P</u></b>ointer</li>
</ul>
</div>
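<div>
The downward growth is easy to model: a 32-bit push subtracts 4 from the stack pointer and then stores, and a pop loads and then adds 4. A small sketch with a bytearray standing in for memory (hypothetical function names, little-endian as on x86):</div>

```python
import struct


def push(memory, esp, value):
    """Model of a 32-bit push: decrement ESP, then store at the new ESP."""
    esp -= 4
    struct.pack_into("<I", memory, esp, value)
    return esp


def pop(memory, esp):
    """Model of a 32-bit pop: load from ESP, then increment ESP."""
    (value,) = struct.unpack_from("<I", memory, esp)
    return value, esp + 4
```

<div>
Starting with the stack pointer at the top of a 32-byte buffer, two pushes leave it at 24, and the most recently pushed value sits at the lowest address.</div>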
<h2>
Instruction Pointer</h2>
<div>
So special that it's all alone under its own heading:<br />
<ul>
<li>E<b><u>IP</u></b> = <b><u>I</u></b>nstruction <b><u>P</u></b>ointer</li>
</ul>
This register points to the instruction that is about to be executed.</div>
<h2>
Extra Credit: Segment Registers</h2>
<div>
Remember in Windows 95 when blue screens used to report "CS:EIP = xxxxxxx"? Neither do I, let's pretend I didn't admit that. Anyway, that <b><u>CS</u></b> is a segment selector indicating the location of the <b><u>c</u></b>ode <b><u>s</u></b>egment. Modern operating systems use a flat model, so they're mostly set the same.</div>
<div>
<br />
<ul>
<li><b><u>CS</u></b> = <b><u>C</u></b>ode <b><u>S</u></b>egment</li>
<li><b><u>DS</u></b> = <b><u>D</u></b>ata <b><u>S</u></b>egment</li>
<li><b><u>ES</u></b> = "<b><u>E</u></b>xtra" <b><u>S</u></b>egment (meh)</li>
<li><b><u>SS</u></b> = <b><u>S</u></b>tack <b><u>S</u></b>egment</li>
<li><u><b>F</b>S</u> & <u><b>G</b>S</u> = Even more extra segments, using the letters that follow C, D, and E, namely <b><u>F</u></b> and <u style="font-weight: bold;">G</u> <b><u>S</u></b>egments</li>
</ul>
</div>
<div>
These registers are 16 bits long and contain integers called segment selectors; the CPU uses each selector to look up a segment descriptor in the Global or the Local Descriptor Table (the GDT or the LDT), and that descriptor gives the base address and size of the memory segment. Operating systems commonly use a "flat" model instead of a segmented model, so most of these typically describe the same base and limit. Since segmentation is largely a non-issue now, the extra segment registers are sometimes used to point to interesting structures. For example: FS => Thread Environment Block (TEB) in userspace Windows 32-bit applications.</div>
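<div>
Each 16-bit value in a segment register (a segment selector) packs three fields: a table index in the upper 13 bits, one bit choosing the GDT or the LDT, and a 2-bit requested privilege level. Here is a small sketch that splits one apart; decode_selector is a name I made up, and 0x3B is just a sample input (it is a value I have seen in FS for 32-bit user-mode Windows threads, but treat that as an assumption).</div>

```python
def decode_selector(sel):
    """Split a 16-bit segment selector into its three fields."""
    sel &= 0xFFFF
    return {
        "index": sel >> 3,                       # entry number in the table
        "table": "LDT" if sel & 0x4 else "GDT",  # TI (table indicator) bit
        "rpl": sel & 0x3,                        # requested privilege level
    }


print(decode_selector(0x3B))  # {'index': 7, 'table': 'GDT', 'rpl': 3}
```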
<div>
<h2>
Summary</h2>
<div>
In review:</div>
<br />
<ul>
<li>E<b><u>A</u></b>X, E<b><u>B</u></b>X, E<b><u>C</u></b>X, E<b><u>D</u></b>X = <b><u>A</u></b>, <b><u>B</u></b>, <b><u>C</u></b>, <b><u>D</u></b>; Note that the 'A' register holds function return values</li>
<li>E<b><u>S</u></b>I, E<b><u>D</u></b>I = <b><u>S</u></b>ource, <b><u>D</u></b>estination (for string operations) - E<b><u>C</u></b>X may be the <b><u>c</u></b>ounter and EAX may be used, too.</li>
<li>E<b><u>SP</u></b>, E<b><u>BP</u></b> = <b><u>S</u></b>tack <b><u>P</u></b>ointer, <b><u>B</u></b>ase <b><u>P</u></b>ointer</li>
<li>E<b><u>IP</u></b> = <b><u>I</u></b>nstruction <b><u>P</u></b>ointer</li>
<li><u><b>CS</b></u>, <u><b>DS</b></u>, <u><b>SS</b></u>, <u><b>ES</b></u>, <u><b>FS</b></u>, <u><b>GS</b></u> = <b><u>C</u></b>ode, <b><u>D</u></b>ata, <b><u>S</u></b>tack, and <b><u>E</u></b>xtra segments, followed by <b><u>F</u></b> and <u style="font-weight: bold;">G</u> <u><b>S</b></u>egments</li>
</ul>
<br />
For the authoritative reference on all of this, see the <a href="http://www.intel.com/products/processor/manuals/" target="_blank">Intel processor manuals</a>.<br />
<br />
Happy hackin' :)</div>
Michael Baileyhttp://www.blogger.com/profile/03334775875024877919noreply@blogger.com0tag:blogger.com,1999:blog-5433555458120308127.post-37161058201053496852016-11-11T22:46:00.002-08:002017-04-23T17:53:24.425-07:00Word.I hate the mouse, so it makes sense that I would especially hate copying and pasting from the command line. I usually just need one word of screen output to go about my business and do what I need to do, but this requires activating the control box (Alt+Space), activating the <u>E</u>dit menu, activating the Mar<u>k</u> command and then dragging the cursor over the text of interest. Woe be to me if I screw up the text selection, because then I get to start all over again. And then there's when Windows doesn't respond consistently to the Alt+Space keystrokes I'm sending in order to activate the control box (yes, this happens; no, it's not human error).<br />
<div>
<br /></div>
<div>
I hate this so much that I wrote a pair of programs (line and word) to resolve it. Here are three examples of using them: getting all IP addresses, copying one of several paths to the clipboard, and isolating a filename in a paragraph of output.<style>
code {
font-family: Courier;
margin:.75em 0;
border:1px solid #596;
border-width:1px 1px;
padding:5px 15px;
display: block;
background-color: #dedede;
white-space: pre;
}
</style></div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiDhkS_2tjVVTEVIR51T4Gs9xx3xjyHOxFiWCkZWy57ZARcg8Fyklaq5JnR_QhqKuD65kkhCeo3mNi2R4aFb6jc2TIWw8wS6aM8yjCWVPHuiac54clUVk6vB2mNtXSsVsvIAQsiJ0njlJbB/s1600/word.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="402" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiDhkS_2tjVVTEVIR51T4Gs9xx3xjyHOxFiWCkZWy57ZARcg8Fyklaq5JnR_QhqKuD65kkhCeo3mNi2R4aFb6jc2TIWw8wS6aM8yjCWVPHuiac54clUVk6vB2mNtXSsVsvIAQsiJ0njlJbB/s640/word.png" width="640" /></a></div>
<div>
<br /></div>
<div>
If you invoke line or word without an argument, it will print the output with line numbers. An argument will isolate the lines or words you specify. Word accepts negative indices like Python. Line does not, because I didn't want to hog memory reading all of stdin into a buffer, though it wouldn't be hard to add, and it's not unreasonable since this is mostly for use with small buffers. Neither program accepts ranges, because I didn't have lots of time to spend.</div>
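<div>
To illustrate the trade-off described above, here is a rough Python sketch of the two behaviors. It is not the pipe-slice code; the function names mirror the tools, but zero-based indexing and the sample input are my assumptions for the sake of the example.</div>

```python
import itertools


def word(text, index):
    """Return one whitespace-delimited word; negative indices count from the end."""
    return text.split()[index]


def line(stream, index):
    """Return one line from an iterable of lines without buffering all of them."""
    if index < 0:
        # Counting from the end would mean reading all input first.
        raise ValueError("negative line indices are not supported")
    return next(itertools.islice(stream, index, index + 1)).rstrip("\n")


# Hypothetical command output, one string per line.
sample = [
    "Volume in drive C\n",
    "Directory of C:\\Users\\me\n",
    "report.txt notes.txt\n",
]
print(word(line(iter(sample), 1), -1))  # C:\Users\me
```

<div>
A word is sliced out of a single line that is already in memory, so negative indices cost nothing there; a negative line index is the case that would force the whole stream into a buffer.</div>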
<div>
<br /></div>
<div>
You can get the code here: <a href="https://github.com/strictlymike/pipe-slice" target="_blank">https://github.com/strictlymike/pipe-slice</a></div>
<div>
<br /></div>
Michael Baileyhttp://www.blogger.com/profile/03334775875024877919noreply@blogger.com3tag:blogger.com,1999:blog-5433555458120308127.post-43918424980659756072016-11-05T12:24:00.000-07:002017-04-23T17:53:40.043-07:00Teachers<style>
code {
font-family: Courier;
margin:.75em 0;
border:1px solid #596;
border-width:1px 1px;
padding:5px 15px;
display: block;
background-color: #dedede;
white-space: pre;
}
</style><br />
<div>
<div>
We've moved into our new home, and I was putting away all my computing books. This time, I decided to put them in a deliberate order, starting with the books that I read as a kid and moving on through the system layers, my education, my work, and my dabbling. Each time I see these books, they retell the story of how my curiosity grew into a professional pursuit, and they remind me how happy I am that I was able to connect with such an interesting profession. They make me think of my teachers...</div>
<div>
<br /></div>
<div>
During the summers, I spent a lot of time at my grandmother's house. One summer, my dad brought an old Compaq portable that the army had retired, and he set it up in the play room in her basement. I recall that the monitor supported b&w, amber, and green. Naturally, I preferred green. I used this to run newps.exe (The New Print Shop - you can play it on <a href="https://archive.org/details/msdos_broderbund_print_shop" target="_blank">archive.org</a>).<br />
<br />
<div>
<iframe allowfullscreen="" frameborder="0" height="320" src="https://archive.org/details/msdos_broderbund_print_shop" width="570"></iframe>
</div>
<br />
<br />
Newps allowed me to create small monochrome images (probably 64x64 pixels) and store a few dozen of them to a "folder". I learned that if I scrolled through the images by holding the down arrow, they would display sequentially, which allowed me to build short animations. I used this to create my masterpiece: a hand-dithered animation of my cousin, winking. Wow.</div>
<div>
<br /></div>
<div>
My dad also sometimes took me to the office with him and let me play on the computer. I remember playing some sort of DOS game and accidentally winding up in the GW-BASIC interpreter. I must have broken the game, because I saw a clubs suit, little faces, Greek letters, and a mess of other inscrutable data pour out onto the screen. My dad told me these were called "characters", and that programmers probably used some of those characters to write the program. This idea blew my mind - the concept that something visual, graphical, animated - was <i>written</i> by someone. After that, every time I came in contact with a computer, I was preoccupied with working out what my dad must have meant when he said that software could be "written".</div>
<div>
<br /></div>
<div>
My dad later brought home an NEC laptop with Windows 3.11, and I stared closely at the dithering that was used to produce a realistically colored eel using 256 colors. Thinking of that thick laptop with the tiny screen reminds me that my dad was the first person to tell me to RTFM. When I wanted to understand how to play Rodent's Revenge, draw with MS Paint, or use a DOS command, he told me to read the help file.</div>
<div>
<br /></div>
<div>
<iframe allowfullscreen="" frameborder="0" height="320" src="https://www.youtube.com/embed/-r6CnPzTXKE" width="570"></iframe>
</div>
<div>
<br /></div>
<div>
After a while, CDs became more and more common, and the games I wanted to play were not available on floppy disks. I begged my dad for a CD-ROM drive until, one day, he walked in and threw one on my bed. When I asked him to install it, he told me to do it myself, and when I asked him how, he said "read the instructions." As a 13-year-old, I felt helpless and sorry for myself for about a split second. That was the day I learned about config.sys, emm386.exe, autoexec.bat, and mscdex.exe.</div>
<div>
<br /></div>
<div>
And then, RTFM I did. I somehow wound up with a copy of Dan Gookin's DOS for Dummies, which introduced me to EDIT.COM, boot disks, AUTOEXEC.BAT, and CONFIG.SYS. That summer, I became addicted to reading HELP.EXE while listening to techno all night. I unnecessarily used RAM drives to "accelerate" my file access. I used the VSAFE.COM TSR to protect myself from viruses (uh, right). I hid files from my family by placing them inside of compressed volume files (*.CVF) and applying file attributes to hide them. I had nothing worth hiding, but I just wanted to be able to hide stuff. Out of ignorance or unavailability of BIOS features, I wrote a batch file that used CHOICE.COM to password protect my computer and prevent my dad from taking up space with Internet Explorer 3.0 when I was a diehard Netscape Navigator user. And of course, I used filemgr.exe to delete Net Nanny. (<a href="https://www.netnanny.com/" target="_blank">Holy crap, are they still in business?!</a>)</div>
<div>
<br /></div>
<div>
Then when I was in high school, my dad gave me a computer that went with me to my mom's house and had Windows 3.11 on it. My friend asked if it had a modem and I said yeah, but no AOL. He decided to come over, and the minute he saw it, he connected it to the phone line, popped up Terminal, and dialed into the DEC VAX at the University of Wisconsin - Milwaukee (<a href="http://www.uwm.edu/" target="_blank">UWM</a>). From there, he ran telnet and connected to jgsdos.brooktrout.com, port 5000, and it was then that I learned about Zolstead's MUD (Multi-User Dungeon). The fact that a buddy could just walk up to a computer he had never seen before and connect over a phone line to a computer and then to another computer from there, once again blew my mind.</div>
<div>
<br /></div>
<div>
After that, it mushroomed. I found QBASIC.EXE and learned to write software. Awful software! But software, nonetheless. I started reading about C and my dad got me books about that. My friend introduced me to his buddy who, at the age of 14 or 15, seemed to already know everything about computers. While evading my girlfriend's parents, I hid at the public library and checked out an <a href="https://www.amazon.com/Interactive-Course-Fast-Mastery/dp/1571690638" target="_blank">800-page beginner's book on C++</a>. I read the entire book before it was due back at the library and took copious notes. I was so intent on learning C++ that I wanted to buy a compiler, but Visual Studio was way too expensive, so I bought Borland C++ 3.1, for $30.</div>
<div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://winworldpc.com/res/img/screenshots/30-34fa6e1775d9b8c9e0217326fd93582c-Borland%20CPP%203.1%20-%20DOS%20IDE.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="218" src="https://winworldpc.com/res/img/screenshots/30-34fa6e1775d9b8c9e0217326fd93582c-Borland%20CPP%203.1%20-%20DOS%20IDE.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">High tech.</td></tr>
</tbody></table>
After that, I started looking around for where I could get more of this. I audited classes at UWM and attended two semesters of C++ courses at MSOE. When I realized that this was what college could be like, I just barely pulled out of my academic nose dive and persuaded the MSOE admissions staff that I was worth making an exception for.</div>
<div>
<br /></div>
<div>
I fondly remember many dozens of teachers and moments on the journey. My grandmother, who did me the incredible favor of teaching me to type. It was one of those boring basics that made everything else easier. John Carmack, who innovated video games and made me wonder about the connection between mathematics and 3D graphics. My cousin, who patiently explained, multiple times, the difference between RAM, CPU, hard disks, and floppy disks; let me use his AOL account; introduced me to GIFs; and repeatedly reminded me how to get my computer to run Gorillas.<br />
<br />
<iframe allowfullscreen="" frameborder="0" height="320" src="https://www.youtube.com/embed/UDc3ZEKl-Wc" width="570"></iframe>
<br />
<br />
Then there were the teachers of my professional career. Professors, University staff, and in later years, other students. My boss at my first job. The friend who saw me reading about shellcode and Linux drivers and decided to pull me into a community of hackers. The people I worked with there. And as a consultant. And as a reverse engineer.</div>
<div>
<br /></div>
<div>
Now, I see almost everyone I meet or know or even read about as a teacher. It's what we all are, because we're doing what we love and we want to connect other people with something we feel is very powerful. These books remind me of all the teachers I ever had. The MS-DOS 5.0 user manual, Patterson & Hennessy's Computer Organization & Design, The Shellcoder's Handbook, Mastering Algorithms with C, Rootkits... My dad, my friends, my colleagues, and my co-workers.</div>
<div>
<br /></div>
<div>
To this day, I am trying to figure out what the sequence of events has to be to elicit and maintain a learner's interest in some discipline of study. The learners I am interested in are young, old, and in-between. Here are some of my thoughts on how to engage them:</div>
<div>
<ul>
<li>Get the learner past the "boring basics" that make everything else available, possible, and achievable.</li>
<li>Create opportunities for "magic moments" by exposing the learner to as many facets of our world as are available to you; when you see a "magic moment" take place, figure out how to encourage and help create more of them - gently though, because the moment you find yourself leading the horse to water, you will be disappointed to find that it is no longer thirsty.</li>
<li>Never let yourself get in the way of a learning opportunity. When you are asked to facilitate an obsession, provide the materials and let the learner do the work.</li>
<li>When we are willing to invest precious resources (time, education, or equipment), we show that we take the learner's studies seriously, and the learner has an opportunity to take themselves and their studies more seriously as well, which can lead to more focus and more courage.</li>
<li>Always be open to explaining how things work.</li>
</ul>
</div>
<div>
Maybe if I'm lucky, I will think of more things to add to this list.</div>
</div>
<div>
<br /></div>
Michael Baileyhttp://www.blogger.com/profile/03334775875024877919noreply@blogger.com0tag:blogger.com,1999:blog-5433555458120308127.post-23455701710547182482016-10-23T11:19:00.000-07:002018-01-14T12:56:24.912-08:00The case for reinventing the wheelI really like to use IDA Pro as my debugger, and shellcode is no exception. Initially I couldn't see why anyone would ever write their own loader for analyzing shellcode. Siko et al released shellcode_launcher.exe along with the <a href="https://practicalmalwareanalysis.com/labs/" target="_blank">Practical Malware Analysis labs</a>, so why rewrite that code? shellcode_launcher.exe does the work of ReadFile / VirtualAlloc / VirtualProtect, et cetera, so I just make that my database and pull in the VirtualAlloc'd memory using IDA Pro's memory snapshot facilities. Then, I go to town.<br />
<br />
Well, I changed my tune when I discovered that VirtualAlloc was not receptive to my suggestions for where to allocate memory. (WinDbg: bp &lt;callsite&gt;; g; ed esp &lt;lpAddress&gt;; p). Without a consistent shellcode base address, none of my annotations from the IDA memory snapshot I took were lining up with the actual shellcode in subsequent debug sessions.<br />
<br />
Edit January 14th, 2018: At this point, we have a choose-your-own-adventure on our hands:<br />
<br />
<ul>
<li>If you use remote debugging, and/or you like to see IDA Pro annotations superimposed over your debugger session, and your shellcode itself allocates additional memory and executes code there, then you might be better off reading my fireeye.com blog article titled <a href="https://www.fireeye.com/blog/threat-research/2018/01/debugging-complex-malware-that-executes-code-on-the-heap.html" target="_blank">Debugging Complex Malware that Executes Code on the Heap</a>.</li>
<li>If you don't use remote debugging, then you might be satisfied capturing snapshots of your debugging VM at critical points in the debug session so you can iteratively debug and understand the shellcode.</li>
<li>Finally, if your shellcode does not execute additional code on the heap and you just want to give it a uniform memory map in which to iteratively debug it, then read on...</li>
</ul>
<div>
For simple cases, you can reinvent the wheel and write your own shellcode loader to force shellcode to live at the same virtual address each time you debug it. But no need to start from scratch; here's the path of least resistance...</div>
<br />
Assuming you have the shellcode as a raw binary, use xxd's feature of outputting shellcode as a C include file:<br />
<div>
<br /></div>
<div>
<pre class="cmd">xxd -i myshellcode > myshellcode.c</pre>
</div>
<div>
<br />
That gives you a hexdump in C form:<br />
<br />
<pre class="vimCodeElement"><span class="Type">unsigned</span> <span class="Type">char</span> myshellcode[] = {
<span class="Constant">0x55</span>, <span class="Constant">0x8b</span>, <span class="Constant">0xec</span>, <span class="Constant">0x01</span>, <span class="Constant">0x23</span>, <span class="Constant">0x45</span>, <span class="Constant">0x67</span>, <span class="Constant">0x89</span>, <span class="Constant">0xab</span>, <span class="Constant">0xcd</span>, <span class="Constant">0xef</span>, <span class="Constant">0x01</span>,
<span class="Constant">0x01</span>, <span class="Constant">0x23</span>, <span class="Constant">0x45</span>, <span class="Constant">0x67</span>, <span class="Constant">0x89</span>, <span class="Constant">0xab</span>, <span class="Constant">0xcd</span>, <span class="Constant">0xef</span>, <span class="Constant">0x01</span>, <span class="Constant">0x23</span>, <span class="Constant">0x45</span>, <span class="Constant">0x67</span>,
...
<span class="Constant">0x01</span>, <span class="Constant">0x23</span>, <span class="Constant">0x45</span>, <span class="Constant">0x67</span>, <span class="Constant">0x89</span>, <span class="Constant">0xab</span>, <span class="Constant">0xcd</span>, <span class="Constant">0xef</span>, <span class="Constant">0x01</span>, <span class="Constant">0x23</span>, <span class="Constant">0x45</span>, <span class="Constant">0x67</span>,
};
<span class="Type">unsigned</span> <span class="Type">int</span> myshellcode_len = <span class="Constant">4242</span>;
</pre>
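<div>
If xxd isn't handy (it ships with Vim), the same kind of include file is easy to generate yourself. The following is a minimal sketch that mimics the general shape of xxd -i output; c_include is a name I made up, and the five sample bytes are placeholders rather than real shellcode.</div>

```python
import re


def c_include(name, data, per_line=12):
    """Render bytes as a C array plus a length variable, in the style of `xxd -i`."""
    ident = re.sub(r"\W", "_", name)  # make the name a valid C identifier
    rows = []
    for i in range(0, len(data), per_line):
        chunk = bytearray(data[i:i + per_line])
        rows.append("  " + ", ".join("0x%02x" % b for b in chunk))
    body = ",\n".join(rows)
    return "unsigned char %s[] = {\n%s\n};\nunsigned int %s_len = %d;\n" % (
        ident, body, ident, len(data))


print(c_include("myshellcode", b"\x55\x8b\xec\x90\xc3"))
```

<div>
Redirect the output to myshellcode.c and the rest of the procedure is unchanged.</div>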
<br />
So now, write your loader:<br />
<br />
<pre class="vimCodeElement"><span class="PreProc">#include </span><span class="Constant">"myshellcode.c"</span>
<span class="Type">typedef</span> <span class="Type">void</span> (*fptr)(<span class="Type">void</span>);
<span class="Type">int</span>
main(<span class="Type">void</span>)
{
    fptr sc = (fptr) myshellcode;
    __asm <span class="Type">int</span> <span class="Constant">3</span> ; Safety - so I don't execute this on my analysis box (or worse!)
    sc();
}
</pre>
<br />
Then do this (in an SDK prompt):<br />
<br />
<pre class="cmd">cl.exe loader.c</pre>
<br />
If you don't have Visual Studio, just get <a href="https://www.microsoft.com/en-us/download/details.aspx?id=44266" target="_blank">Microsoft's compiler for Python 2.7</a>.<br />
<br />
After compiling and linking, you'll get this:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgiDLwECUxlhVZ5vPXnuhTMYCOzylJDeWQebEL3IE5GBYZpncRLppgioXY4jbCcQaXcnhE9itTEw2CvrlN7NeMNHk5vht1uZx0YTZYDWbPzyt_StkYgnQ6urLDvqj9C0I3xti7_QtCZSGt0/s1600/loader.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="243" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgiDLwECUxlhVZ5vPXnuhTMYCOzylJDeWQebEL3IE5GBYZpncRLppgioXY4jbCcQaXcnhE9itTEw2CvrlN7NeMNHk5vht1uZx0YTZYDWbPzyt_StkYgnQ6urLDvqj9C0I3xti7_QtCZSGt0/s640/loader.png" width="640" /></a></div>
<br />
I was worried about execute permissions when calling into my shellcode, but happily, it Just Works, perhaps because I ran cl.exe directly without using Visual Studio to specify its usual flags. The program loads in IDA as a PE-COFF, it can be debugged using IDA's debugging plugins, and the shellcode is always at the same address (in my case, unk_40A000). Therefore, you can annotate the shellcode without using the IDA Pro memory snapshot facilities to save it from a debugger session, and (this is the important part) without worrying that VirtualAlloc will return a different address during the next debug session, rendering your annotations less useful to you. The same applies to breakpoints: they will actually work, from session to session. That makes life easier.</div>
Michael Baileyhttp://www.blogger.com/profile/03334775875024877919noreply@blogger.com0tag:blogger.com,1999:blog-5433555458120308127.post-77023891495478887182016-10-20T06:16:00.001-07:002017-04-23T18:24:46.968-07:00This one weird trick for decoding DLL malware stringsTL;DR: <a href="https://www.fireeye.com/blog/threat-research/2015/11/flare_ida_pro_script.html" target="_blank">argtracker</a> and ctypes. It's the ctypes part that surprised me. Read on to see why.<br />
<br />
This procedure can make light work of decoding strings in a DLL that has a horrifying string decoder or contains a metric ton of strings. The first stage leans on code that's already out there, with a bit of duct tape to get to the second stage; the second stage is to load your malware and call into it. There's just one stick-in-the-mud limitation: it has to be a file you can load into your address space using LoadLibrary, such as a DLL. Otherwise, you have to use a different kind of tool (I'll discuss this later).<br />
<div>
<br /></div>
<div>
First of all, gather all the strings you want to decode. Jay Smith wrote a very cool tool for this that uses Vivisect to emulate code and locate arguments. It's called <a href="https://www.fireeye.com/blog/threat-research/2015/11/flare_ida_pro_script.html" target="_blank">argtracker</a>. Don't duplicate it like I was starting to do with idaapi. Please, for the love of all that is lazy, just <a href="https://github.com/fireeye/flare-ida" target="_blank">download it</a> and get it installed.</div>
<div>
<br /></div>
<div>
The IDA Python script below is basically the code from the FireEye blog with a second function added to print all the encoded strings out so you can feed them to the second stage of this procedure. If your strings aren't printable prior to decoding, then you'll need to change this up a bit.</div>
<div>
<br /></div>
<pre class="vimCodeElement"><span class="PreProc">import</span> vivisect
<span class="PreProc">import</span> flare.argtracker <span class="Statement">as</span> c_argtracker
<span class="PreProc">import</span> flare.jayutils <span class="Statement">as</span> c_jayutils
<span class="Comment"># Obtain the address where each argument is referenced by the decoder along</span>
<span class="Comment"># with the offset that was referenced</span>
<span class="Statement">def</span> <span class="Identifier">get_first_push_arg</span>(decoder):
    ret = []
    vw = c_jayutils.loadWorkspace(c_jayutils.getInputFilePath())
    tracker = c_argtracker.ArgTracker(vw)
    xrefs = idautils.CodeRefsTo(decoder, <span class="Constant">1</span>)
    <span class="Statement">for</span> xref <span class="Statement">in</span> xrefs:
        argslist = tracker.getPushArgs(xref, <span class="Constant">1</span>)
        <span class="Statement">for</span> argdict <span class="Statement">in</span> argslist:
            va_at, offset = argdict[<span class="Constant">1</span>]
            ret.append(argdict[<span class="Constant">1</span>])
    <span class="Statement">return</span> ret
<span class="Comment"># Now go get each string</span>
<span class="Statement">def</span> <span class="Identifier">print_va_off_and_contents</span>(pushed_args):
    <span class="Identifier">print</span>(<span class="Constant">'refva, off, argcontents'</span>)
    <span class="Statement">for</span> (va_at, offset) <span class="Statement">in</span> pushed_args:
        <span class="Identifier">print</span>(<span class="Identifier">hex</span>(va_at) + <span class="Constant">', '</span> + <span class="Identifier">hex</span>(offset) + <span class="Constant">', '</span> + GetString(offset, -<span class="Constant">1</span>, <span class="Constant">0</span>))
<span class="Comment"># <a href="https://www.hex-rays.com/products/ida/support/idadoc/283.shtml">https://www.hex-rays.com/products/ida/support/idadoc/283.shtml</a></span>
<span class="Comment"># 0 <= ASCSTR_C</span>
<span class="Comment"># 3 <= ASCSTR_UNICODE</span>
</pre>
<div>
<br /></div>
<div>
Provide your decoder's virtual address to get_first_push_arg, and then supply the returned list to print_va_off_and_contents to get something you can massage into shape for the second stage. Yes, I know, I'm using print instead of Python's <a href="https://docs.python.org/2/library/logging.html" target="_blank">logging module</a>. The title of this blog was actually going to have the word "lazy" in it. Maybe it still should. Anyway...<br />
<br />
Second and final step: load the malware and call its decoder. The interesting thing I learned is that Python ctypes <a href="https://mail.python.org/pipermail/python-list/2010-January/563313.html" target="_blank">can call non-exported functions</a>. What a happy surprise! First, you have to define a function prototype, then you obtain a callable by hooking that prototype to an address in your binary where the function lives. There are prototypes for stdcall (WINFUNCTYPE) and cdecl (CFUNCTYPE). We're using stdcall. Here's a convenient snippet along with the string decoding goodness.</div>
<div>
<br />
<pre class="vimCodeElement"><span class="PreProc">from</span> ctypes <span class="PreProc">import</span> *
<span class="Comment"># Modify all this</span>
offset = <span class="Constant">0x4321</span> <span class="Comment"># Decoder offset in your mal DLL</span>
strings = [ <span class="Comment"># Populate from stage 1 (above)</span>
[<span class="Constant">0x10001234</span>, <span class="Constant">"ABCdef"</span>],
[<span class="Constant">0x10005678</span>, <span class="Constant">"ZYX990"</span>],
...
]
dll = cdll.my_malware_dll <span class="Comment"># Modify to load your DLL</span>
prototype = WINFUNCTYPE(c_char_p, c_char_p) <span class="Comment"># Stdcall, accepts & returns char*</span>
<span class="Comment"># Leave this alone</span>
string_decoder_addr = dll._handle + offset
decode = prototype(string_decoder_addr)
<span class="Statement">for</span> (va, s) <span class="Statement">in</span> strings:
    <span class="Identifier">print</span>(<span class="Identifier">hex</span>(va) + <span class="Constant">' '</span> + s + <span class="Constant">' -> '</span> + decode(s))
</pre>
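<div>
If you want to convince yourself of the prototype-plus-address trick without loading any malware, the sketch below exercises the same mechanism on any platform. The stand_in callback is purely my own invention: it plays the part of a non-exported function, and the only thing kept from it is a raw integer address, much like dll._handle + offset above. It uses CFUNCTYPE (cdecl) so it runs anywhere; on Windows, WINFUNCTYPE would give you stdcall.</div>

```python
import ctypes

# cdecl prototype for: int f(int)
PROTO = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int)


@PROTO
def stand_in(x):
    # Plays the role of a function that exists in memory but is not exported.
    return x * 2


# Keep nothing but a raw integer address, as with dll._handle + offset.
addr = ctypes.cast(stand_in, ctypes.c_void_p).value

# Prototype + integer address -> callable foreign function.
by_address = PROTO(addr)
print(by_address(21))  # 42
```

<div>
The important line is the last assignment: ctypes does not care whether the address came from an export table, only that the prototype matches what actually lives there.</div>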
<br />
Simple, dimple. Paste the strings from IDA Pro into this script, ctypes loads and calls into the malware, and Bob's your uncle. For extra credit, you can update this script to emit another script that will create the appropriate comments or bookmarks in IDA Pro. This ctypes procedure works great for DLLs. Unfortunately, next time, it'll probably be an EXE and not a DLL. For those cases, you'll have to adapt this to a different tool, such as <a href="https://www.fireeye.com/blog/threat-research/2015/12/flare_script_series.html" target="_blank">flare-dbg</a>, to control malware execution and feed it the strings you want to decode. I'll talk more about tools and techniques for this another time.</div>
Michael Baileyhttp://www.blogger.com/profile/03334775875024877919noreply@blogger.com0tag:blogger.com,1999:blog-5433555458120308127.post-73641429697619672972016-10-13T21:40:00.001-07:002017-04-23T18:23:14.578-07:00Script Kitties Early Trick or Treat, Part 2<style>
code {
font-family: Courier;
margin:.75em 0;
border:1px solid #596;
border-width:1px 1px;
padding:5px 15px;
display: block;
background-color: #dedede;
white-space: pre;
}
</style><br />
<div>
I promised a treat. Well, as scripts go, this will probably be like the time you went trick-or-treating as a kid and the old couple gave you three pennies and then you walked down the street and realized the pennies seemed to smell bad, but hey, it's money you didn't have before, so what the hay. It's not quite that bad, it's just I wrote it in 2006 and I didn't do much to bring it into modern times. But, here we go...</div>
<div>
<br /></div>
<div>
In 2006, PowerShell was just about to be released and around the same time I was thinking, darn it, wouldn't it be easier to experiment with VBScript if they had given me a command line? So I made one.</div>
<div>
<br /></div>
<div>
As it turns out, some malware is written in VBScript, so this came in handy a while back for me to decode a few lazily "encoded" strings that were assembled using the VBScript Chr() function and string concatenation. It let me figure out what COM objects were being created and move on with my life, so maybe it'll be useful to you.<br />
<br />
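<div>
For the simplest cases you don't even need a script engine. Here is a small Python sketch that statically folds a chain of Chr() calls and quoted literals joined with &amp;. This is a different approach from my tool (which actually evaluates the script), fold_chr is a made-up name, and it deliberately ignores ChrW, arithmetic inside Chr(), and doubled-quote escapes.</div>

```python
import re


def fold_chr(expr):
    """Collapse Chr(n) calls and "literals" in a VBScript concatenation into one string."""
    parts = re.findall(r'Chr\(\s*(\d+)\s*\)|"([^"]*)"', expr, flags=re.IGNORECASE)
    return "".join(chr(int(code)) if code else literal for code, literal in parts)


# Hypothetical obfuscated ProgID of the kind described above.
print(fold_chr('Chr(87) & Chr(83) & "cript" & Chr(46) & "Shell"'))  # WScript.Shell
```

<div>
Anything fancier than that is where an interactive evaluator earns its keep.</div>
<br />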
I also added the ability to switch to JScript, because people also write malware in JScript, so hell, why not.<br />
<br />
Here's a little demo:</div>
<div>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj_zwgWGa8vvVGlhOfh5vGpHDYTlwE15lFmUeW3dvSu7kvFOm2uqxHqFtX4JBdbXBd4WMbqv_f3AZfPdSMkPituAoNT-rhhakvUn_3Z2MhMuxk1S6e7TVqmlttm8N1gc5OvOvbOqwbcV8U8/s1600/eval.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="302" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj_zwgWGa8vvVGlhOfh5vGpHDYTlwE15lFmUeW3dvSu7kvFOm2uqxHqFtX4JBdbXBd4WMbqv_f3AZfPdSMkPituAoNT-rhhakvUn_3Z2MhMuxk1S6e7TVqmlttm8N1gc5OvOvbOqwbcV8U8/s640/eval.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">You're wishing I just gave you the pennies now, aren't you.</td></tr>
</tbody></table>
<br />
Yeah, that's it. If you look at the code, you'll find out why it stinks just like those pennies. But it serves its purpose. So, enjoy!<br />
<br />
Here's the code: <a href="https://github.com/strictlymike/eval-hta" target="_blank">https://github.com/strictlymike/eval-hta</a><br />
</div>
Michael Baileyhttp://www.blogger.com/profile/03334775875024877919noreply@blogger.com0tag:blogger.com,1999:blog-5433555458120308127.post-9078360028640984372016-09-14T06:57:00.002-07:002017-04-23T18:22:24.400-07:00Script Kitties Early Trick or Treat, Part 1Some of my old sysadmin tricks became useful again when I analyzed some malware targeting Windows Script Host (WSH). In this article I'll share a trick, and in the next, I'll share a treat.<style>
code {
font-family: Courier;
margin:.75em 0;
border:1px solid #596;
border-width:1px 1px;
padding:5px 15px;
display: block;
background-color: #dedede;
white-space: pre;
}
</style><br />
<div>
<br /></div>
<div>
When logic gets hairy, both developers and malware analysts open a debugger to get more information. But what can be done when the target platform is WSH? As it happens, there are debuggers for this, too, and they can be had by installing either Microsoft Office or Microsoft Visual Studio in your dynamic analysis VM. To invoke the debugger, use the /X switch of either cscript.exe or wscript.exe, e.g.:</div>
<div>
<br /></div>
<pre class="cmd">wscript.exe /X rat3ie.vbs
</pre>
<div>
<br /></div>
<div>
Here's the Visual Studio debugger, halting on line 1 of a craptacular VBScript RAT:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjxQtEFtBUqnO8SWgjsUjYfOKgSzzpptFn1zkomYerwWN3zTC4HySuuu1H41POC6g62eOw6GrWlMsiIqgBajqomu3EiEaQENJA5q6LDL5ng4405HYRflaMgOQL2fuOqqz5NBIppWchdk41F/s1600/wsh1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="337" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjxQtEFtBUqnO8SWgjsUjYfOKgSzzpptFn1zkomYerwWN3zTC4HySuuu1H41POC6g62eOw6GrWlMsiIqgBajqomu3EiEaQENJA5q6LDL5ng4405HYRflaMgOQL2fuOqqz5NBIppWchdk41F/s640/wsh1.png" width="640" /></a></div>
<br />
This lets you view local variables in the Locals tab (at bottom), set breakpoints, and step through code.<br />
<br />
That's all for this little nugget. Next time, I'll post a tool I wrote in 2006 that came in handy for conveniently and interactively evaluating VBScript and JScript to de-obfuscate strings and experiment with malware functionality.</div>
"Advanced" OllyDbg Scripting (published 2016-09-06, updated 2017-04-23)<br />
<div>
Alternative possibilities:</div>
<div>
<ul>
<li>I'm daft;</li>
<li>OllyDbg's "Warn when breakpoint is outside the code section" option can't (always?) be truly disabled in odbg110; or,</li>
<li>This is not the droid (i.e. option) that I'm looking for.</li>
</ul>
<div>
In any case:<br />
<br /></div>
</div>
<pre class="vimCodeElement"><span class="Statement">Set</span> sh <span class="Statement">=</span> <span class="Identifier">CreateObject</span><span class="Statement">(</span><span class="Constant">"WScript.Shell"</span><span class="Statement">)</span>
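<span class="Comment">' Send Alt+Y to the focused window every 100 ms so the warning dialog is dismissed whenever it appears</span>
<span class="Statement">While</span> <span class="Constant">True</span>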
<span class="Statement">Call</span> sh<span class="Statement">.</span><span class="Statement">SendKeys</span><span class="Statement">(</span><span class="Constant">"%Y"</span><span class="Statement">)</span>
<span class="Statement">Call</span> WScript<span class="Statement">.</span>Sleep<span class="Statement">(</span><span class="Constant">100</span><span class="Statement">)</span>
<span class="Statement">Wend</span>
</pre>
<div>
<br /></div>
<div>
And goodbye to this dialog when attempting to find the OEP by tracing into:</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhmBPEWmMEg2ZtkJWyf4iauv-WRxgBFX9x_wLExbZXiw9wMNRrSvQcQsqIEhK1BB2pDCskeJyL3NLW5xHx5cZXsQcs3oni-Px_aNvfuVjX0pbYgNdEHfV84njYI1YCZOXdAPTejhHZXSWto/s1600/olly.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="157" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhmBPEWmMEg2ZtkJWyf4iauv-WRxgBFX9x_wLExbZXiw9wMNRrSvQcQsqIEhK1BB2pDCskeJyL3NLW5xHx5cZXsQcs3oni-Px_aNvfuVjX0pbYgNdEHfV84njYI1YCZOXdAPTejhHZXSWto/s400/olly.png" width="400" /></a></div>
<div>
<br /></div>
<div>
Next episode, we answer the question: did OllyDump ever finish? ;-)<br />
<br />
Edit 10/14/2016: It never finished, so I ended up doing it manually by catching the unpacker in a memcpy and dumping its payload from poi(esp+4). You live, you learn.</div>
<style>
code {
font-family: Courier;
margin:.75em 0;
border:1px solid #596;
border-width:1px 1px;
padding:5px 15px;
display: block;
background-color: #dedede;
white-space: pre;
}
</style>
Process Monitoring for the Curious and Paranoid (published 2016-08-29, updated 2017-04-23)<br />
It's been months since I had time for any of this, but I've been thinking for a long time about what I would discover if I were to monitor process creation with some sort of balloon notification. Between coming to bed late one night, some scraps of time here and there to document it, and a day home to polish it off while my sick daughter naps, here's a useful tool. I want to emphasize, it's hacky, but for a busy father's casual/opportunistic research, it's enough to play jazz.<br />
<h2>
Objective</h2>
I would like to expediently answer a few questions, including:<br />
<div>
<ul>
<li>Is a new process the reason why my mouse pointer changed to the wait icon?</li>
<li>Was a new process responsible for my computer slowing down?</li>
<li>How often do new processes start, anyway?</li>
<li>What are some commonly executed processes that I haven't noticed yet?</li>
<li>Does this process run any sub-processes?</li>
<li>Is there any process associated with that pop-up, or is it an already-running process?</li>
</ul>
<div>
Poring over my event logs is the wrong answer because eventvwr is slow to pop up and navigate, so when I am experiencing slowness, it doesn't allow me to get up-to-the-moment answers. Also, it can be tedious and time consuming to go back and find the right event, and my boss doesn't pay me to stare at event logs. And then how do I know that this event occurred at the same time as the phenomenon I'm observing?</div>
<div>
<br /></div>
<div>
What I want is a way to casually take note of interesting process creation events throughout the day without really spending time on it.</div>
<div>
<h2>
Alternative Solutions</h2>
I've had a few options rolling around in my head for a while:<br />
<ul>
<li><a href="https://msdn.microsoft.com/en-us/library/aa394649(v=vs.85).aspx" target="_blank">Instance creation event query</a> on <span style="font-family: inherit;">Win32_Process</span> creation - Around 2005 I experimented with this and found that it cannot catch short-lived processes because they are created and destroyed between polling intervals which must last, at minimum, one second.</li>
<li><span style="font-family: inherit;">Win32_ProcessTrace</span> - I started out with this, but alas, its events do not contain full image name information, so I needed to query the OS for further information, and again, short-lived processes result in information loss.</li>
<li>Monitoring event logs - Event ID 4688 provides image names, but advanced configuration is required to obtain full command lines. Alternatively, <a href="https://technet.microsoft.com/en-us/sysinternals/sysmon" target="_blank">SysInternals' Sysmon</a> logs this information by default. WMI or other methods could be used to notify on event creation.</li>
<li>The Windows kernel exports <a href="https://msdn.microsoft.com/en-us/library/windows/hardware/ff542860(v=vs.85).aspx" target="_blank">PsSetCreateProcessNotifyRoutineEx</a>, which provides access to a convenient <a href="https://msdn.microsoft.com/en-us/library/windows/hardware/ff559960(v=vs.85).aspx" target="_blank">PS_CREATE_NOTIFY_INFO</a> structure containing the full image name and command line. Alas, this requires either purchasing and protecting an expensive driver signing certificate, or leaving a kernel code execution vulnerability unpatched so as to inject a driver as described in the <a href="https://www.fireeye.com/blog/threat-research/2016/03/lessons-from-operation-russian-doll.html" target="_blank">whitepaper I published in February</a>.</li>
</ul>
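<div>
The polling limitation in the first bullet can be illustrated with a toy model. This is only a sketch: it assumes polls land at exact multiples of the interval, it involves no WMI at all, and the function name seen_by_polling is hypothetical.</div>

```python
import math


def seen_by_polling(start, end, interval=1.0):
    """Return True if a process alive during [start, end) exists at some poll tick.

    Poll ticks are assumed to occur at 0, interval, 2*interval, ...
    """
    # First poll tick at or after the moment the process starts.
    first_tick = math.ceil(start / interval) * interval
    return first_tick < end


# (start, end) lifetimes in seconds; the first one lives entirely between two polls.
for start, end in [(0.2, 0.7), (0.2, 1.5), (3.0, 3.01)]:
    verdict = "seen" if seen_by_polling(start, end) else "missed"
    print("process alive %.2f-%.2f s: %s" % (start, end, verdict))
```

<div>
Anything that starts and exits between two ticks never shows up in a snapshot diff, which is exactly the class of short-lived recon commands that matters here.</div>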
</div>
<div>
<h2>
Implementation</h2>
Unfortunately, mucking with drivers is not lazy enough for me. Since short-lived processes are important (they are commonly used as part of post-exploitation / recon), a Win32_Process instance creation query won't work. For ease of use, I've created a first draft solution by Frankensteining two C# StackOverflow answers together to use systray balloon notifications with WMI's Win32_ProcessTrace. I put this on the Internet so I could compile it and use it to see what was going on with my work computer.<br />
<br />
<a href="https://gist.github.com/strictlymike/46d717a929e38460b5774476878db125" target="_blank">Here's the gist of it</a><br />
<br />
It's lazy, but for casual/opportunistic research, it's enough to play jazz. It doesn't capture command-line arguments and doesn't always capture the full image name, because it just uses Win32_ProcessTrace and then the .NET System.Diagnostics classes to get process information after the fact.<br />
<br />
Alas, it bothers me not to have full image names or command-line arguments. The best source of information I know of in userspace is event logs, but since I had trouble getting the info I needed on advanced logging configuration for my Windows 8.1 box, I just installed Sysmon. Now what?<br />
<br />
<a href="https://gist.github.com/strictlymike/b0ce3ea54686da4fb10f14fc1adf30a2" target="_blank">Another gist</a><br />
<br />
As it turns out, it is necessary to <a href="http://stackoverflow.com/questions/2382896/how-to-collect-the-new-applications-and-services-logs-found-on-windows-7-or-wi" target="_blank">modify the registry</a> and restart the Windows Management Instrumentation service (and its dependent services) to make this work. I added a Microsoft-Windows-Sysmon/Operational key to HKLM\SYSTEM\CurrentControlSet\Services\EventLog and restarted the winmgmt service, and it all came together.<br />
<br />
This gist has a detailed console view along with the systray notification so that I don't necessarily have to open eventvwr to see more details. Here's how it looks:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjvWkBOTC30r7M4aei1N3rtT74iRySxK3auXTwvXlghVTOWk7i6wd0zdszdHUAxSJH_QYwdcJn8P99lD6ZI_7eNYHQxdyuDd9F7yk08uTJuhzYgS9GI0KHuahDsCxKkI1UOu8OkrOFys4jb/s1600/ptray.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjvWkBOTC30r7M4aei1N3rtT74iRySxK3auXTwvXlghVTOWk7i6wd0zdszdHUAxSJH_QYwdcJn8P99lD6ZI_7eNYHQxdyuDd9F7yk08uTJuhzYgS9GI0KHuahDsCxKkI1UOu8OkrOFys4jb/s400/ptray.png" width="292" /></a></div>
<br />
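<div>
To give an idea of how fields such as the image path and command line can be pulled out of a Sysmon record once it arrives as text, here is a small Python sketch. It assumes the event message is a series of "Name: value" lines, which is how Sysmon process-creation (Event ID 1) records are typically rendered; the sample text is fabricated for illustration, and parse_event_message is a hypothetical helper, not something taken from the gist.</div>

```python
def parse_event_message(message):
    """Split 'Name: value' lines of an event message into a dict."""
    fields = {}
    for line in message.splitlines():
        name, sep, value = line.partition(": ")
        if sep:  # skip lines such as the "Process Create:" banner
            fields[name.strip()] = value.strip()
    return fields


# Fabricated sample, shaped like a Sysmon process-creation message.
SAMPLE = (
    "Process Create:\r\n"
    "ProcessId: 4242\r\n"
    "Image: C:\\Windows\\System32\\netsh.exe\r\n"
    "CommandLine: netsh.exe interface show interface\r\n"
    "ParentImage: C:\\Program Files\\ExampleVPN\\vpn.exe\r\n"
)

event = parse_event_message(SAMPLE)
print(event["Image"], "<-", event["ParentImage"])
```
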
<h2>
Observations</h2>
Here are a few startling events and associated discoveries sure to send a chill down your spine, all from tracking down process activity during my journey:</div>
<div>
<ul>
<li>netsh.exe just ran. Is this some post-exploitation alteration of my firewall rules? No, a certain VPN client executes netsh.exe to get its work done. This was not the only software I caught doing that.</li>
<li>Windows Remote Assistance COM Server (raserver.exe) executed and terminated immediately. What interfaces does this provide? Could this be post-exploitation enabling of remote assistance for future access? No. It's a <a href="https://support.microsoft.com/en-us/kb/939039" target="_blank">scheduled task that triggers upon group policy updates</a> so remote assistance knows to update its configuration.</li>
<li>reg.exe just got run by cmd.exe. Holy schnikes, now I'm truly pwned. Is the parent process a backdoor executing persistence or other post-exploitation commands? Nope. It's just some endpoint management software that IT confirms they deploy and manage.</li>
<li>Added 9/7/16: Heart rate increases as I read C:\Windows\system32\rundll32.exe C:\Windows\system32\inetcpl.cpl,ClearMyTracksByProcess Flags:525568 WinX:0 WinY:0 IEFrame:0000000000000000. Then I remember I just closed an IE in-private window. Take a look at the parent process command line and see: "C:\Program Files\Internet Explorer\IEXPLORE.EXE" -private. WHEW!</li>
</ul>
<div>
<h2>
Value</h2>
So, as you can see, sometimes situational awareness is not all it's cracked up to be! As most DFIR people and hunters are aware, there is plenty of noise just sitting there waiting to alarm you.<br />
<br />
Even so, I think this tool could be useful for noticing anomalous process observables such as:<br />
<br />
<ul>
<li>Post-exploitation commands a la <a href="https://www.amazon.com/Rtfm-Red-Team-Field-Manual/dp/1494295504" target="_blank">RTFM</a>, e.g. whoami, net.exe, netsh.exe, and so on</li>
<li>svchost.exe executing from within your user profile</li>
<li><a href="https://blog.fortinet.com/2016/02/17/a-closer-look-at-locky-ransomware-2" target="_blank">Ransomware deleting shadow copies using vssadmin.exe</a></li>
</ul>
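<div>
The observables in this list could be checked mechanically. The following Python sketch shows one way to do it, assuming the image path and command line are already in hand (for example from Sysmon); the rule set, the reason strings, and the name flag_process are all illustrative assumptions rather than part of ptray.</div>

```python
import ntpath

# Commands commonly seen in post-exploitation recon (from the list above).
RECON_TOOLS = {"whoami.exe", "net.exe", "netsh.exe"}


def flag_process(image, command_line=""):
    """Return a list of reasons a new process looks worth a second glance."""
    reasons = []
    name = ntpath.basename(image).lower()
    path = image.lower()
    cmd = command_line.lower()
    if name in RECON_TOOLS:
        reasons.append("recon-style command")
    if name == "svchost.exe" and "\\users\\" in path:
        reasons.append("svchost.exe running from a user profile")
    if name == "vssadmin.exe" and "delete" in cmd and "shadows" in cmd:
        reasons.append("shadow copy deletion")
    return reasons


print(flag_process("C:\\Users\\bob\\AppData\\Roaming\\svchost.exe"))
print(flag_process("C:\\Windows\\System32\\vssadmin.exe",
                   "vssadmin.exe Delete Shadows /All /Quiet"))
```

<div>
As the observations earlier suggest, rules like these are noisy (VPN clients run netsh.exe too), so they are a prompt to look closer rather than a verdict.</div>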
<div>
This tool can increase your awareness of what applications are responsible for certain behaviors, such as the Get Windows 10 user prompts that everybody loved so much. It can also raise your awareness of cases where security policies do not appear to be doing their job, such as application whitelisting. If you're a paranoid or curious power user, this may all be useful to you. In case it is, here are those gists again:</div>
</div>
</div>
</div>
<div>
<ul>
<li><a href="https://gist.github.com/strictlymike/46d717a929e38460b5774476878db125" target="_blank">ptray 1: Win32_ProcessTrace with augmented info from .NET FCL</a></li>
<li><a href="https://gist.github.com/strictlymike/b0ce3ea54686da4fb10f14fc1adf30a2" target="_blank">ptray 2: Win32_NTLogEvent instance creation against SysInternals Sysmon (requires registry change)</a></li>
</ul>
</div>
<div>
If I polish this up into something nicer, I'll try to update this article with the link.</div>
TIL: Accessing memory in another process under Linux (published 2016-03-30, updated 2017-04-23)<br />
Today it hit home for me that I am now a "Windows guy", because I couldn't remember the name for the select or epoll syscalls, only muttering "WaitForMultipleObjects?" and scratching my head. It hit home further when I couldn't think of anything other than ptrace for accessing another process's data. Granted, my friend says I've always been a Windows guy and I should get over it. But I really only started learning about how computers work when I began working with Linux, so this bothered me. Hence, I took a little walk down syscalls.h in 3.7.1 to see what would jog my memory or what new things I would find. Indeed, I did find something interesting and relevant.<br />
<br />
include/linux/syscalls.h:<br />
<div>
<pre class="vimCodeElement"><span class="LineNr" id="L856">856 </span>asmlinkage <span class="Type">long</span> sys_process_vm_readv(pid_t pid,
<span class="LineNr" id="L857">857 </span> <span class="Type">const</span> <span class="Type">struct</span> iovec __user *lvec,
<span class="LineNr" id="L858">858 </span> <span class="Type">unsigned</span> <span class="Type">long</span> liovcnt,
<span class="LineNr" id="L859">859 </span> <span class="Type">const</span> <span class="Type">struct</span> iovec __user *rvec,
<span class="LineNr" id="L860">860 </span> <span class="Type">unsigned</span> <span class="Type">long</span> riovcnt,
<span class="LineNr" id="L861">861 </span> <span class="Type">unsigned</span> <span class="Type">long</span> flags);
<span class="LineNr" id="L862">862 </span>asmlinkage <span class="Type">long</span> sys_process_vm_writev(pid_t pid,
<span class="LineNr" id="L863">863 </span> <span class="Type">const</span> <span class="Type">struct</span> iovec __user *lvec,
<span class="LineNr" id="L864">864 </span> <span class="Type">unsigned</span> <span class="Type">long</span> liovcnt,
<span class="LineNr" id="L865">865 </span> <span class="Type">const</span> <span class="Type">struct</span> iovec __user *rvec,
<span class="LineNr" id="L866">866 </span> <span class="Type">unsigned</span> <span class="Type">long</span> riovcnt,
<span class="LineNr" id="L867">867 </span> <span class="Type">unsigned</span> <span class="Type">long</span> flags);
</pre>
</div>
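<div>
To show how those parameters fit together, here is a hedged Python sketch that calls the libc wrapper for process_vm_readv through ctypes. It assumes a Linux system whose C library exports process_vm_readv; the IOVec class and the read_remote helper are names invented for this example, and reading another user's process still needs the same permissions as ptrace.</div>

```python
import ctypes
import os


class IOVec(ctypes.Structure):
    """Mirror of struct iovec: a base address and a length."""
    _fields_ = [("iov_base", ctypes.c_void_p), ("iov_len", ctypes.c_size_t)]


def read_remote(pid, address, length):
    """Read `length` bytes at `address` in process `pid` via process_vm_readv."""
    libc = ctypes.CDLL(None, use_errno=True)
    fn = libc.process_vm_readv
    fn.restype = ctypes.c_ssize_t
    fn.argtypes = [ctypes.c_int,
                   ctypes.POINTER(IOVec), ctypes.c_ulong,   # lvec, liovcnt
                   ctypes.POINTER(IOVec), ctypes.c_ulong,   # rvec, riovcnt
                   ctypes.c_ulong]                          # flags (must be 0)
    buf = ctypes.create_string_buffer(length)
    local = IOVec(ctypes.addressof(buf), length)   # where the data lands in this process
    remote = IOVec(address, length)                # where it comes from in the target
    n = fn(pid, ctypes.byref(local), 1, ctypes.byref(remote), 1, 0)
    if n < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return buf.raw[:n]
```

<div>
A quick way to try it without a second process is to point it at a buffer in the calling process itself, e.g. read_remote(os.getpid(), ctypes.addressof(some_buffer), 5). Some sandboxes block the syscall outright, in which case the call raises OSError.</div>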
<div>
<br />
<div>
And <a href="http://lxr.free-electrons.com/source/include/linux/syscalls.h?v=3.8#L871" target="_blank">here</a> is a bookmark to the relevant file in LXR.<br />
<br />
It's been a long time since I hacked on Linux, but I wonder what other interesting things have been added since I went over to the dark side (or came back to it, depending upon how you look at it).</div>
</div>
Beasting it (published 2016-03-24, updated 2017-04-23)<br />
<div>
Wherein, I share brute force tools based on treating strings like numbers.</div>
<div>
<br /></div>
<div>
Working in offensive security has opened my mind to the fact that hacks don't have to be beautiful. So in working a couple CTFs recently, brute force has readily come to mind for me (as you can see from <a href="http://baileysoriginalirishtech.blogspot.com/2015/08/flare-on-2015-2-write-up-part-1.html" target="_blank">other</a> <a href="http://baileysoriginalirishtech.blogspot.com/2015/10/flare-on-2015-2-write-up-part-2.html" target="_blank">articles</a> on my blog). This recently happened again, when I had an opportunity to run a brute force over the network (yes, very slow, but it was a small character set, so why not) in tandem with working out the real solution, as well as in BCTF 2016 where we were asked to calculate a string whose SHA-256 hash begins with 20 cleared bits.</div>
<div>
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEivD1BdJiGmv28rLPXMuY2y5BWMRkzfyN1S7viBceQeSLqfCfBjnGMhPwjJxX929TZ2mH-bDz_ED18YqexgEs5TA5EHI6Bx9HtNg_rFyrh-NqLeaeig7TbwJpXPg0CaMXKIIWNyhhOulSoE/s1600/brutify.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="221" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEivD1BdJiGmv28rLPXMuY2y5BWMRkzfyN1S7viBceQeSLqfCfBjnGMhPwjJxX929TZ2mH-bDz_ED18YqexgEs5TA5EHI6Bx9HtNg_rFyrh-NqLeaeig7TbwJpXPg0CaMXKIIWNyhhOulSoE/s400/brutify.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Sure, what the hay</td></tr>
</tbody></table>
<div>
<div>
<br /></div>
<div>
I wrote a tool in Python, and another in C++, to treat strings as numbers of radix equal to the number of characters in the set of valid characters for the problem. By incrementing each "digit" of the string, and rolling over to the next when necessary, an incrementable string class iterates through all possible values for strings of that length and character set. Both tools use this to brute force a solution using strings of increasing length until either the solution is found or the sequence terminates.</div>
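<div>
The string-as-number idea can be sketched in a few lines of Python. This is an illustration of the technique only, not the brutiful code: the names increment and candidates are made up here, and the least significant "digit" is assumed to be the leftmost character.</div>

```python
def increment(digits, radix):
    """Add one to a little-endian digit list in place.

    Returns False when every digit rolls over, i.e. the sequence is exhausted.
    """
    for i in range(len(digits)):
        digits[i] += 1
        if digits[i] < radix:
            return True
        digits[i] = 0  # roll over and carry into the next digit
    return False


def candidates(charset, max_len):
    """Yield every string over charset, shortest first, up to max_len characters."""
    for length in range(1, max_len + 1):
        digits = [0] * length
        while True:
            yield "".join(charset[d] for d in digits)
            if not increment(digits, len(charset)):
                break


for s in candidates("ab", 2):
    print(s)
```

<div>
A brute forcer then just feeds each candidate to an evaluator callback and stops at the first one that satisfies the problem.</div>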
<div>
<br />
<div>
<div>
It seems that these sorts of things arise semi-frequently in CTFs, so I generalized these into a single-source-file "framework", polished them up a little bit, and am sharing them below.</div>
</div>
<br /></div>
<div>
As an example, here is the amount of C++ code I would have needed to write using &lt;openssl/sha.h&gt; and linking with -lcrypto to brute force the hash in the BetaFour challenge using this framework. It includes an evaluator callback (try_a_value) that determines whether the current brute force buffer value satisfies the problem, and two supporting functions to hash the value and to determine whether the hash begins with 20 bits of zeroes (it assumes little-endian).</div>
<div>
<br /></div>
<div>
<pre class="vimCodeElement"><span class="Type">bool</span>
try_a_value(<span class="Type">unsigned</span> <span class="Type">char</span> *val)
{
<span class="Type">unsigned</span> <span class="Type">char</span> md[SHA256_DIGEST_LENGTH];
PDEBUG(<span class="Constant">"Trying </span><span class="Special">%s</span><span class="Special">\n</span><span class="Constant">"</span>, val);
hash(val, md);
<span class="Statement">return</span> first20bits0(md);
}
<span class="Type">bool</span>
first20bits0(<span class="Type">unsigned</span> <span class="Type">char</span> *md) { <span class="Statement">return</span> !(*((<span class="Type">uint32_t</span> *)md) & <span class="Constant">0x00f0ffff</span>); }
<span class="Comment">/*</span><span class="Comment"> Calculate SHA-256 digest of string </span><span class="Comment">*/</span>
<span class="Type">void</span>
hash(<span class="Type">unsigned</span> <span class="Type">char</span> *startingwith, <span class="Type">unsigned</span> <span class="Type">char</span> *md) {
SHA256_CTX ctx;
<span class="Type">unsigned</span> <span class="Type">char</span> *data = startingwith;
<span class="Type">int</span> len = strlen((<span class="Type">const</span> <span class="Type">char</span> *)startingwith);
SHA256_Init(&ctx);
SHA256_Update(&ctx, data, len);
SHA256_Final(md, &ctx);
}
</pre>
</div>
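<div>
For comparison, the same check can be written without the little-endian assumption by testing the digest bytes directly. This Python sketch uses hashlib; the function names mirror the C++ above but the code is an illustration, not part of the framework.</div>

```python
import hashlib


def first_20_bits_zero(digest):
    """True if the first 20 bits (two bytes plus the next high nibble) are zero."""
    return digest[0] == 0 and digest[1] == 0 and (digest[2] & 0xF0) == 0


def try_a_value(candidate):
    """Evaluator callback: does SHA-256(candidate) begin with 20 cleared bits?"""
    return first_20_bits_zero(hashlib.sha256(candidate).digest())


print(try_a_value(b"abc"))
```

<div>
On average one candidate in 2^20 (roughly a million) passes, which is why this is comfortably within brute force range.</div>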
<div>
<br /></div>
<div>
Only 24 lines including whitespace and comments. This will make it easier for me to work on such challenges in the future, so in the spirit of openness and nerdy hackery, I thought I would share it.<br />
<br />
<div>
Brutiful C++ and Python brute force tools for Windows and Linux:</div>
<div>
</div>
<br />
<div>
<a href="https://github.com/strictlymike/brutiful">https://github.com/strictlymike/brutiful</a></div>
</div>
</div>
<style>
code {
font-family: Courier;
margin:.75em 0;
border:1px solid #596;
border-width:1px 1px;
padding:5px 15px;
display: block;
background-color: #dedede;
white-space: pre;
}
</style>
CPUID, SMSW, and Other Delights (published 2016-03-18, updated 2017-04-23)<br />
I wrote a quick and dirty utility to collect info a la <a href="https://en.wikipedia.org/wiki/Blue_Pill_%28software%29#Red_Pill" target="_blank">redpill</a>, nopill (props to Danny Quist but I can't find that whitepaper anymore!), etc. Nothing really novel about it, but I thought others may find it useful for researchy scenarios. I used it to investigate a hypervisor running on an Intel microprocessor, so each output line includes an indication of whether <a href="https://en.wikipedia.org/wiki/X86_virtualization" target="_blank">VMX</a> appears to be supported. My intent was to train a Bayes learner to identify systems that are lying about whether they support VMX (thus likely detecting a hypervisor), similar to a previous project of mine, except in the course of this project, that became so very unnecessary.<br />
<br />
Here is a snippet of its output:<br />
<div>
<br /></div>
<div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjecqPJeSxjaX4BhVYyrHjw225IIrMYtWGMVcz578FBMMZp97Q3CburF5vLJ75geRU4YQAnAlQvGb7jHvQhsXxrKNhAxS_VAwzL24Z9FdWLxCQ8tN5qe7ICpOkpTBKimlZCwuqNiskAdj5t/s1600/cpuid1.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="202" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjecqPJeSxjaX4BhVYyrHjw225IIrMYtWGMVcz578FBMMZp97Q3CburF5vLJ75geRU4YQAnAlQvGb7jHvQhsXxrKNhAxS_VAwzL24Z9FdWLxCQ8tN5qe7ICpOkpTBKimlZCwuqNiskAdj5t/s400/cpuid1.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">So hex. Much flashy.</td></tr>
</tbody></table>
This tool works by creating and affinitizing a thread to each logical CPU in the system, executing a few <a href="https://msdn.microsoft.com/en-us/library/26td21ds.aspx" target="_blank">compiler intrinsics</a> and assembly functions, and outputting the desired information for each CPU. Like most research code I post, this tool is only as complete as I needed it to be for my own purposes. Therefore, it does not support 32-bit platforms, does not collect the value of the SLDT instruction from each processor, and is not meant for AMD microprocessors. If you can tolerate all that, then the source code is here:</div>
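<div>
As a pointer to what the VMX indication boils down to: CPUID leaf 1 reports VMX support in ECX bit 5, and ECX bit 31 is the bit hypervisors conventionally set to announce themselves. The Python sketch below only decodes an ECX value obtained elsewhere (it does not execute CPUID), and the function name and sample values are made up for illustration.</div>

```python
VMX_BIT = 1 << 5           # CPUID.1:ECX bit 5, VMX supported
HYPERVISOR_BIT = 1 << 31   # CPUID.1:ECX bit 31, hypervisor present


def describe_leaf1_ecx(ecx):
    """Decode the two virtualization-related bits of CPUID leaf 1 ECX."""
    return {
        "vmx": bool(ecx & VMX_BIT),
        "hypervisor_present": bool(ecx & HYPERVISOR_BIT),
    }


# Made-up ECX values: VMX advertised, then hypervisor bit set with VMX hidden.
for sample in (0x00000020, 0x80000000):
    print(hex(sample), describe_leaf1_ecx(sample))
```
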
<div>
<br /></div>
<div>
<a href="https://github.com/strictlymike/cpuinfo">https://github.com/strictlymike/cpuinfo</a></div>
<div>
<br />
For more information about CPUID and using microprocessor instructions, I thought the book <a href="http://www.wiley.com/WileyCDA/WileyTitle/productCd-0764579010.html" target="_blank">Professional Assembly Language Programming</a> was very helpful, and of course, the <a href="http://www.intel.com/products/processor/manuals/" target="_blank">Intel 64 and IA-32 microprocessor manuals</a> are the authoritative reference on all things Intel x86 and x64.</div>
Extreme Rubber Ducking (published 2016-03-09, updated 2018-01-13)<br />
<blockquote class="tr_bq">
<i>"And now for something... completely different."</i></blockquote>
<div>
<br /></div>
<div>
In 2013, I started to think about how I could barely remember calculus. This made me a sad panda, so I started reviewing undergraduate math and then whipped out my electrical engineering textbooks (yes, I kept those). That gave way to a comprehensive review of my undergraduate coursework that is still in progress.</div>
<div>
<br /></div>
<div>
In pounding through old textbooks without any teachers or tutors, I've learned that <i>preparing</i> to ask for help is actually a great way to solve problems independently. It forces me to walk through my case like a lawyer and rigorously present my assertions so any contradictions are laid bare. I find it most effective when I write it down and commit to posting it on a forum or asking a friend if I don't manage to figure it out by myself. This is like a slightly more stringent form of <a href="https://en.wikipedia.org/wiki/Rubber_duck_debugging" target="_blank">Rubber Duck Debugging</a>.</div>
<div>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><img alt="https://upload.wikimedia.org/wikipedia/commons/8/8e/Rubber_duckies_So_many_ducks.jpg" class="shrinkToFit" height="300" src="https://upload.wikimedia.org/wikipedia/commons/8/8e/Rubber_duckies_So_many_ducks.jpg" style="margin-left: auto; margin-right: auto;" width="400" /></td></tr>
<tr><td class="tr-caption" style="text-align: center;">I like to use several ducks at once. It's much more powerful that way.</td></tr>
</tbody></table>
Here, I share two examples of this in a context that is unusual to my blog: circuit analysis. In the first scenario, I was wrong, and in the second, it was the book's fault. Thanks, book! Great job!</div>
<h2>
Currently Going Nowhere</h2>
<div>
I first got stuck on a problem that entailed analyzing multiple circuit nodes as a single "supernode". The book did not work an example with three nodes and a dependent source, so I worked the problem repeatedly, getting the same wrong answer each time. After looking at the math over half a dozen times, I concluded that I was misunderstanding how to apply the new concept in this special case (three nodes, dependent source). I tried reading several articles about supernodes and it seemed like I was doing this correctly. I finally gave in and prepared to phone a friend.</div>
<div>
<br /></div>
<div>
For depicting this circuit, I found a pretty cool tool provided by DigiKey called <a href="http://www.digikey.com/schemeit/" target="_blank">SchemeIt</a>. I threw together this schematic:</div>
<div>
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEguP2UZzgtXo_auB9iIUCW3k4RfJZ6YwRCk_2nj3c3BiLOlkwcBmAUKo1FQP7A4sHSIWL6Q_vmMwaR9oT30ssCY8QcMiVck5i8_w8eUe66pYdjem4lsshiP38sWtTs9VzIApizSsgymC3Kv/s1600/schemeit-project.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="268" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEguP2UZzgtXo_auB9iIUCW3k4RfJZ6YwRCk_2nj3c3BiLOlkwcBmAUKo1FQP7A4sHSIWL6Q_vmMwaR9oT30ssCY8QcMiVck5i8_w8eUe66pYdjem4lsshiP38sWtTs9VzIApizSsgymC3Kv/s320/schemeit-project.png" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">In the words of my EE professor: Very simple.</td></tr>
</tbody></table>
<div>
Then I began mounting my case. I started with describing the currents that enter and exit the supernode (nodes 1, 2, and 3). My discussion didn't get very far:</div>
<blockquote class="tr_bq">
<blockquote class="tr_bq">
Apply Kirchhoff's Current Law to the supernode:</blockquote>
<blockquote class="tr_bq">
First, I define $i_1$ to be the same as $i$, $i_2$ to be the current from node 2 down to the reference node (through the 4-ohm resistor), $i_3$ to be the current from node 3 down to the reference node (through the 3-ohm resistor), and $i_4$ to be the current flowing from node 1 through the 6-ohm resistor to node 3. Hence,</blockquote>
<blockquote class="tr_bq">
$i_4 = i_1 + i_2 + i_3$</blockquote>
<blockquote class="tr_bq">
Wait.</blockquote>
<blockquote class="tr_bq">
Wait, wait, wait.</blockquote>
<blockquote class="tr_bq">
This is a flawed equation. It doesn't take into account the fact that $i_4$ both leaves AND enters the supernode.</blockquote>
</blockquote>
<div>
So, as far as the supernode was concerned, the current $i_4$ was going... Nowhere. It was both exiting and entering the node, so from the perspective of the supernode, $i_4$ cancelled itself out. I realized this because I took the time to carefully make my case and then the contradiction stood right out: "$i_4$ [is] the current flowing <b><u>from</u></b> node 1 through the 6-ohm resistor <b><u>to</u> </b>node 3" (both part of the same supernode). Onward!</div>
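<div>
The cancellation can be spelled out with a little bookkeeping. As a sketch, count currents leaving a node as positive (a sign convention assumed here, not taken from the book), and let $S_1$, $S_2$, and $S_3$ stand for the sum of all the other currents leaving nodes 1, 2, and 3 (placeholder symbols introduced only for this illustration). Kirchhoff's Current Law at nodes 1 and 3 then reads $S_1 + i_4 = 0$ and $S_3 - i_4 = 0$, because $i_4$ leaves node 1 and enters node 3. Adding the equations for all three nodes gives $S_1 + S_2 + S_3 + (i_4 - i_4) = S_1 + S_2 + S_3 = 0$, so any branch lying entirely inside the supernode drops out and only the currents crossing its boundary remain.</div>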
<h2>
This Practice Problem is Not Operational</h2>
<div>
I later ran into an issue applying the circuit equivalent model of an operational amplifier (an op-amp). I ran through the same process. First, I drew my rendition of the original schematic and my equivalent model. I transformed the circuit using the "non-ideal" model wherein the op-amp's inverting and non-inverting terminals are connected through an input resistance, and the op-amp's output terminal is modeled as a voltage-controlled voltage source in series with a small output resistance.</div>
<div>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgg3E4nxDy82WMHOJn0NUZGYebF7eSU_cRcU_WDbFKR1pmxJi0YnOXLQ3mYFKjzKLN022uqslr5IvUd6XTQeHmKC1cLz4zJ0kDoIrP7JRcTqJQkkBdqIzG5y4mycudAzu23VG4vzHOiQQ4t/s1600/tmp.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="312" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgg3E4nxDy82WMHOJn0NUZGYebF7eSU_cRcU_WDbFKR1pmxJi0YnOXLQ3mYFKjzKLN022uqslr5IvUd6XTQeHmKC1cLz4zJ0kDoIrP7JRcTqJQkkBdqIzG5y4mycudAzu23VG4vzHOiQQ4t/s640/tmp.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Equivalent. See? Very simple.</td></tr>
</tbody></table>
<br />
<br /></div>
<div>
The exercise was to find (a) the closed-loop gain $v_o/v_s$, and (b) the output current $i_o$ when $v_s = 1 V$. I went to work explaining myself, walking through the application of Ohm's law to define current-voltage relationships, Kirchhoff's Voltage Law, mesh analysis, etc., until I arrived at a system of equations. I punched it all into GNU Octave and got those same old familiar answers:</div>
<div>
<br /></div>
<div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi3QsDFrDufSou4X2Zgk_CuKH-Etog_OxS4kULA9nX0Piw9MF6CAoB9JQ-w3mj-iHSEx3xdIZS7n4mZ6NX9N_Aej4oZu2UeZTnAVppkgpYRIr4UHsk2FsPS4_j-YH2s0hSgazp4eSlVx5Q0/s1600/tmp.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="456" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi3QsDFrDufSou4X2Zgk_CuKH-Etog_OxS4kULA9nX0Piw9MF6CAoB9JQ-w3mj-iHSEx3xdIZS7n4mZ6NX9N_Aej4oZu2UeZTnAVppkgpYRIr4UHsk2FsPS4_j-YH2s0hSgazp4eSlVx5Q0/s640/tmp.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Matlab, eat yer heart out!</td></tr>
</tbody></table>
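Solving the system is the mechanical part. In Octave that is typically a one-liner with the backslash operator. As a rough equivalent in plain Python with no third-party libraries, Gauss-Jordan elimination does the same job. The coefficient matrix below is hypothetical (it is not the one in the screenshot), and <code>solve_linear_system</code> is a name made up for this sketch.<br />

```python
from fractions import Fraction


def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], held as exact rationals to avoid round-off.
    m = [[Fraction(v) for v in row] + [Fraction(rhs)]
         for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick the row with the largest pivot candidate in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Scale the pivot row so the pivot becomes 1.
        pivot_val = m[col][col]
        m[col] = [v / pivot_val for v in m[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [rv - factor * cv for rv, cv in zip(m[r], m[col])]
    return [row[n] for row in m]


# Hypothetical mesh-style system: resistances in ohms, source in volts.
A = [[7, -2, 0],
     [-2, 9, -4],
     [0, -4, 6]]
b = [10, 0, 0]

currents = solve_linear_system(A, b)
print([float(i) for i in currents])
```

Multiplying <code>A</code> by the returned vector and comparing against <code>b</code> is a cheap way to confirm the solver (and the typing) before blaming the circuit equations.<br />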
From the 3x1 solution vector in the Octave output, I concluded $i_3 = -6.5 \times 10^{-4} A$, or -650 uA. Substituting that into the remaining equations, I got:<br />
<blockquote class="tr_bq">
$i_o = - (-650 uA) = +650 uA$</blockquote>
However, the book's answer was that $i_o = -362 mA$.<br />
<br />
For the life of me, I couldn't find my error by talking it through. Before submitting the question to a friend or a forum, I thought I would work a few more examples and then revisit this one. I turned the page and worked the next example, which turned out to be a re-working of the same problem. The end of that example reads:<br />
<blockquote class="tr_bq">
<i>"This, again, is close to the value of 0.649 mA [aka 649uA] obtained in Practice Prob. 5.1 with the nonideal model."</i></blockquote>
Wait... That's... Not what the book said on the previous page! But that <i>is</i> what I got, every time I worked that problem! I'd been working this problem over and over, and it was a MISPRINT. ARRRRRRGH!!<br />
<h2>
Lessons Learned</h2>
<div>
I learned two things from these exercises:</div>
<div>
<ol>
<li>You can become more self-reliant in solving problems if you discipline yourself to write up the details like you're truly about to defend your thought process to someone else; and,</li>
<li>You're not always wrong ;-)</li>
</ol>
</div>
The glory of this process is that often I can use it to ferret out my own stupid mistakes without ever having to share them with anyone (unless I decide to write a ducking blog article about it).<br />
<br />
This experience ties in closely with something I've observed professionally: having to write about your work forces you to explain why it is correct, which tends to surface ideas about how it could have been done better. Which is to say, whether you're still working out issues or you think you're already all the way there, reporting on your work tends to improve the outcome.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh3ICKSvjEWeyfXlCHXN-n9jCr6R-mcpZetuzJ9MS9PhGqVGDXI996xH6X7ksH6hquyUwCQSOmYBRRoiZBOim0ZxU_uPnUrNgavODC21bAvNQOU2porWjI_8u4BYAx-m4BUXGu9r35oR9H8/s1600/PBScouple012.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="221" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh3ICKSvjEWeyfXlCHXN-n9jCr6R-mcpZetuzJ9MS9PhGqVGDXI996xH6X7ksH6hquyUwCQSOmYBRRoiZBOim0ZxU_uPnUrNgavODC21bAvNQOU2porWjI_8u4BYAx-m4BUXGu9r35oR9H8/s400/PBScouple012.jpg" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">You can celebrate by taking your ducky for a bubble bath!</td></tr>
</tbody></table>
<br />
In addition to the moral of the story, I also wanted to point out the following:</div>
<div>
<ol>
<li><a href="https://www.gnu.org/software/octave/" target="_blank">GNU Octave</a> is super useful</li>
<li><a href="http://digikey.com/schemeit/" target="_blank">SchemeIt</a> ain't too terrible, either</li>
</ol>
<div>
So there.</div>
</div>
Michael Baileyhttp://www.blogger.com/profile/03334775875024877919noreply@blogger.com0