Sunday, April 23, 2017

Truthful ProgrammEEng

In this article I'll share how to approach complex logic problems by borrowing from the Electrical Engineering (EE) discipline. I'll discuss how I've used techniques from digital logic circuit design to write efficient and accurate code for hypervisor startup and for packet redirection.

This approach can reduce unnecessary nested/multiple conditional statements down to a one-liner. It applies to software engineering scenarios when you can express your problem in terms of a set of strictly Boolean (true/false) conditions.

If you find yourself contemplating a tangled web of conditional (if/else) code or you find it difficult to understand what conditions contribute to the decision you want your software to make, take a step back and consider applying this technique from digital logic circuit design. If nothing else, the first three or five steps will get you on the right track to write the correct conditional logic (in if/else style) to solve your problem. If you can carry this process to its conclusion, then you can arrive at the most optimized one-liner to solve your logic problem.

It consists of the following steps:
  1. Figure out what Boolean (true/false) conditions are necessary to the decision at hand
  2. Define a truth table that specifies all the possible conditions
  3. Mark which conditions you want to yield in a TRUE result
  4. Turn the TRUE cases (EEs call these minterms) into logical expressions
  5. Logical-OR those expressions together to render a decision (EEs call this the canonical sum-of-products, or CSOP, logic function)
  6. Simplify if possible

Here's a hypervisor programming example followed by a network filtering example to demonstrate the process.

Starting a VT-x Hypervisor


I needed to write startup code for a tiny hypervisor project using Intel Virtualization Technology Extensions (VT-x). I knew nothing about VT-x, but I did know that Intel documented the corresponding VMX instruction set and all its requirements in their processor manual, so I started there. Most of the publicly available code I could get my hands on seemed to specifically check one-off control flags to check if it was okay to enter VMX root operation, ignoring the exact specifications that Intel provided in more modern documentation. I decided to implement my own check based on the data sheet.

The Intel manual describes some requirements for control registers CR0 and CR4 to allow the VMXON instruction to work. Here is some not-very-succinct text from the Intel manual (in case this makes your eyes bleed, I'll provide a more succinct summary next):

Clear as mud.

In summary, there is a FIXED0 and a FIXED1 model-specific register (MSR) for CR0, and another pair of these for CR4. The most important part of the text above is the last sentence starting with "Thus, each bit...". This amounts to the following possible values of FIXED0:FIXED1 along with their corresponding meaning:


F0F1CR must be
00fixed to 0
11fixed to 1
01don't care

Notice that one case is missing from Intel's description: $F0:F1 = 10$. When we create a truth table including $CR$ as an input, this will amount to two missing / don't-care cases (one for $CR=0$ and another for $CR=1$).

So, expanding this out to take into account the two possible values of any bit in the control register we are checking, we have the following truth table for the possible values of $F0:F1$ and $CR$:


F0F1CRInvalid
000FALSE
001TRUE (control register bit should be 0 but is 1)
110TRUE (control register bit should be 1 but is 0)
111FALSE
010FALSE
011FALSE

Since there are two cases where a given control register bit can be in an invalid state, that means there are two logical cases, or minterms, either of which should result in a verdict of true (i.e. it is true that the control register is in an invalid state). Here's the same table from above, but formatted a little bit more like a truth table, with minterms called out.


F0F1CRInvalidMinterms
0000
0011$\overline{F0}\cdot\overline{F1}\cdot CR$
1101$F0 \cdot F1 \cdot \overline{CR}$
1110
0100
0110

This amounts to the following sum of products logic function:

$$Invalid(F0,F1,CR)=\overline{F0} \cdot \overline{F1} \cdot CR+F0 \cdot F1 \cdot \overline{CR}$$

Boolean algebra tells us that what is true for one bit in a bit vector is true for all bits. So, in C code, each negated variable (e.g. $\overline{F0}$) amounts to inverting all the bits in the integer. So, the C code for this would look like:


int is_invalid(int fixed0, int fixed1, int cr)
{
    return ((~fixed0 & ~fixed1 & cr) | (fixed0 & fixed1 & ~cr));
}

Boolean algebraic simplification can be applied to optimize this function to result in fewer logical operators, and fewer instruction opcodes emitted by the compiler. Optimization is left as an exercise for the student (hint: XOR is involved) ;-)

Redirecting Network Traffic


More recently, I've been writing logic for the Linux Diverter in FakeNet-NG. This project is the successor to the original FakeNet tool distributed by Siko and Tank to let people simulate a network in a single virtual machine. The NG version was originally developed to make it possible to use FakeNet on newer versions of Windows, but it would be nice for it to work on Linux, too.

So, I am writing a Linux "Diverter", which is a component that manages how packets are redirected and manipulated to simulate the network. This is a really fun project in which I'm using python-netfilterqueue to catch and redirect packets so we can observe traffic sent to any port in the system, even ports where no service is currently bound.

The specification for the Linux Diverter includes redirecting traffic sent to unbound ports into a single listener (similar to the INetSim dummy listener). We want the Linux version of FakeNet-NG to be used in two modes:
  • SingleHost: the malware and the network simulation tool are running on the same machine, like the legacy version of FakeNet.
  • MultiHost: the malware and the network simulation tool are running on different machines, like INetSim.
The conditions for whether a packet must be redirected boil down to a logic function of four inputs:

  • (A) Is the source address local?
  • (B) Is the destination address local?
  • (C) Is the source port bound by a FakeNet-NG listener?
  • (D) Is the destination port bound by a FakeNet-NG listener?

The criteria for changing the destination port of a packet are as follows:
  1. When using FakeNet-NG in MultiHost mode (like INetSim), if a foreign host is trying to talk to us or any other host, ensure that unbound ports get redirected to a listener.
  2. When using FakeNet-NG in SingleHost mode (like legacy FakeNet), ensure outbound destination packets are redirected except when the packet is a response from a FakeNet-NG listener (in other words, the packet originated from a bound port).
The truth table for a decision function that redirects traffic destined for local, unbound ports to a single dummy listener in both SingleHost and MultiHost scenarios is as follows, with local IPs and bound ports in bold:


Sixteen cases

Here, we translated all the different human-comprehensible conditions like "it's coming from a foreign host, don't care what port, and it's arriving at the local host to an unbound port" into a set of Boolean conditions that can be either true (1) or false (0). It's convenient to use letters like A, B, C, and D to represent the inputs (see the first row of the table above). The minterms of this function are the cases when it returns true, namely:


  1. $\overline{A} \cdot \overline{B} \cdot \overline{C} \cdot \overline{D}$
  2. $\overline{A} \cdot \overline{B} \cdot C \cdot \overline{D}$
  3. $\overline{A} \cdot B \cdot \overline{C} \cdot \overline{D}$
  4. $\overline{A} \cdot B \cdot C \cdot \overline{D}$
  5. $A \cdot \overline{B} \cdot \overline{C} \cdot \overline{D}$
  6. $A \cdot B \cdot \overline{C} \cdot \overline{D}$

We can see that a few terms can be simplified, such as (1) and (2). Since A, B, and D are false and C can be either true or false, these two really amount to A' B' D'. We can do this to arrive at a fairly simple function of these three disjunctive terms:


  • $\overline{A} \cdot \overline{B} \cdot \overline{D}$
  • $\overline{A} \cdot B \cdot \overline{D}$
  • $A \cdot \overline{C} \cdot \overline{D}$
From there, you can just get rid of B and collapse the first two cases into one:

  • $\overline{A} \cdot \overline{D}$
  • $A \cdot \overline{C} \cdot \overline{D}$
And from there, you can take one step further by looking more closely at the truth table to find the optimal logic. But there are six minterms here, so it might be time to bust out a more powerful tool: the ol' Karnaugh map. A K-map allows you to visually locate all the adjacent groups (pairs, quads, etc.) of logical "yes" outputs and arrive at the simplest possible logic function to yield the desired output. If you're not familiar, you should definitely check this out.

Here's the K-map for our logic function:


Short and sweet: two disjunctive terms in three variables

The K-map here has only two groups of adjacent minterms: those that occur when $A$ and $D$ are zero, and those that occur when $C$ and $D$ are zero. This results in two disjunctive terms, each requiring only two input variables apiece (three in total):
  • $\overline{A} \cdot \overline{D}$
  • $\overline{C} \cdot \overline{D}$
So, the minimal sum of products logic function would be:

$$R(A, B, C, D) = \overline{A}\overline{D}+\overline{C}\overline{D}$$
Or, in Python:


        return ((not src_local and not dport_bound) or
                (not sport_bound and not dport_bound))


Done and done.


The next time you find yourself rolling around in freaky nested conditionals and arriving at the wrong logic, try approaching it the way an electrical engineer would: express the problem in terms of a set of Boolean conditions and then figure out how you want each case to go. If you want the simplest result, you can use Boolean algebra or learn to use a K-map to handle the rest. But even if you stop at the truth table, at least this process will help you sort out logic that you might otherwise be confused about and ensure that you know all the test cases you'll need to include to properly test your code.

No comments:

Post a Comment