Usage

Creating a New Project

The application must be launched as an Eclipse application. To do this, right-click the project and select Run As->Eclipse Application (screenshot). The first time you launch the project, the Eclipse Welcome Screen is displayed (screenshot). To open the DOC interface, select the Window menu, choose Open Perspective->Other..., then select Obfuscation Detection.

At the DOC interface (screenshot), create a new project for interpretation by selecting File->New Project... and then choosing the Obfuscation Detection project wizard (screenshot). At the next screen, enter a project name (any name will do) and select an assembly file (for a sample file, choose DOC_WORKSPACE\Samples\sample1.asm). After creating the project, double-click the assembly file in the Navigator view on the left. The assembly file is displayed (screenshot).

Running the Interpreter

To interpret the file using the abstract stack graph with value-set analysis, select Interpreter->VSA-ASG Obfuscation Analysis. For a small file like sample1.asm, the interpretation should be very quick.

Navigating the Resulting File

After abstract interpretation completes, the interface is updated to include several annotations to the assembly listing, a listing of register values, and a representation of the abstract stack (screenshot). Selecting an instruction in the assembly listing updates the register and stack display. By moving through the instructions line by line, you can see how each instruction affects the state of the system.
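
To make the idea of per-instruction state concrete, here is a minimal sketch in Python. It is not DOC's implementation: the `step` function, the three-instruction program, and the flat dictionary standing in for the abstract stack graph are all assumptions made for illustration. It records one state per instruction, which is roughly what the interface shows when you select a line.

```python
TOP = "T"  # placeholder for an unknown component, as shown in the DOC display


def step(state, instr):
    """Return the state after one instruction (tiny hypothetical subset)."""
    regs = dict(state["regs"])
    stack = dict(state["stack"])
    op, *args = instr
    if op == "push":
        # Pushing an immediate creates a new stack node and points esp at it.
        node = len(stack) + 1
        stack[node] = (args[0], TOP)
        regs["esp"] = (TOP, frozenset({node}))
    elif op == "mov":
        # mov reg, imm stores a number; mov reg, reg copies the abstract value.
        dst, src = args
        regs[dst] = regs[src] if isinstance(src, str) else (src, TOP)
    return {"regs": regs, "stack": stack}


# Hypothetical program: push 5 / mov ebx, 0Ah / mov eax, esp
program = [("push", 5), ("mov", "ebx", 0xA), ("mov", "eax", "esp")]

state = {"regs": {r: (TOP, TOP) for r in ("eax", "ebx", "esp")}, "stack": {}}
trace = []  # trace[i] is the state after instruction i
for instr in program:
    state = step(state, instr)
    trace.append(state)

for instr, snapshot in zip(program, trace):
    print(instr, snapshot["regs"], snapshot["stack"])
```

Each entry in `trace` plays the role of the register and stack panes for one selected instruction. A real abstract stack graph also tracks edges between nodes, which this sketch leaves out.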

The bottom of the screen consists of tabs for valid call-return sites, obfuscated call sites, and obfuscated return sites. From here, you can easily find obfuscated calls and returns.

Understanding the Results

Most of the user interface should be intuitive. One area that needs some explanation is how the register and stack values are displayed.

On a real system, each value is represented as a 32- or 64-bit number. In an abstract interpreter, each real value is represented by an abstract value. For DOC, that abstract value is a combination of a reduced interval congruence (RIC) and a stack location. This is covered briefly here; for more information, please see our published papers.

A register can hold either a number or a memory address. To represent a number, DOC uses the RIC. An RIC concisely represents a set of numbers. Thus, if an instruction places a number in a register, that number is represented with an RIC. On the other hand, the instruction might place a memory address in the register. If the memory address belongs to the stack (for example, with an instruction like MOV eax, esp), the interpreter has no way of knowing exactly what that address is. It does, however, know which node in the abstract stack graph the address refers to. Therefore, each value in DOC consists of an RIC and a stack node. DOC does not model the heap, so if memory on the heap is accessed, DOC cannot determine the location.
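
As a rough illustration of how an RIC can stand for a set of numbers, the sketch below assumes the form commonly used in the value-set analysis literature, a × [b, c] + d, meaning every a·k + d for b ≤ k ≤ c. The class and field names are hypothetical, and DOC's internal representation may differ.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class RIC:
    """Hypothetical RIC: the set {stride * k + offset | low <= k <= high}."""
    stride: int
    low: int
    high: int
    offset: int

    def values(self):
        # Enumerate the concrete numbers this RIC stands for.
        return {self.stride * k + self.offset
                for k in range(self.low, self.high + 1)}


def constant(n):
    """An RIC that represents exactly one number."""
    return RIC(stride=0, low=0, high=0, offset=n)


@dataclass(frozen=True)
class AbstractValue:
    """Hypothetical pairing of a numeric part and a stack part."""
    ric: Optional[RIC]                # None stands in for T (no numeric value)
    nodes: Optional[FrozenSet[int]]   # None stands in for T (no stack nodes)


# Four numbers, 4 apart, starting at 8.
print(sorted(RIC(4, 0, 3, 8).values()))      # [8, 12, 16, 20]

# A register holding the number Ah and no stack reference.
ebx = AbstractValue(ric=constant(0xA), nodes=None)
print(sorted(ebx.ric.values()))               # [10]
```

The point of the compact form is that four fields describe the whole set, however many numbers it contains.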

To illustrate how the values work, look at the screenshot. The value in register ebx is (a, T). The left-side value (in hex) represents the numerical value stored in this register; the right-side value references one or more nodes in the abstract stack graph. In the screenshot, ebx has the value Ah. The T (top) on the right-hand side means ebx doesn't reference any nodes in the stack graph.

As another example, look at register esp. It is equal to (T, {1}). The left side (the RIC portion) is T, so there is no numerical value assigned to this register. The register is equal to the set of stack nodes {1}. We can look up the value stored at that stack location in the stack graph. In this case, the value stored at stack node 1 is (5, T). Thus, if we dereferenced esp, we would get the value 5.
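
The lookup just described can be sketched in a few lines of Python. The dictionaries and the `dereference` helper are hypothetical stand-ins for the register and stack panes, populated with the values from the screenshot; they are not part of DOC.

```python
TOP = "T"  # stands in for the T shown in the DOC display

# Register pane: (numeric part, stack-node part)
registers = {
    "ebx": (0xA, TOP),             # the number Ah, no stack reference
    "esp": (TOP, frozenset({1})),  # no number, refers to stack node 1
    "eax": (TOP, TOP),             # nothing known
}

# Stack pane: the value stored at each stack node
stack = {1: (5, TOP)}


def dereference(reg):
    """Return the values stored at the stack nodes a register refers to."""
    _, nodes = registers[reg]
    if nodes == TOP:
        # No known stack node (the heap is not modeled), so the result
        # is unknown.
        return [(TOP, TOP)]
    return [stack[n] for n in sorted(nodes)]


print(dereference("esp"))  # [(5, 'T')]
print(dereference("eax"))  # [('T', 'T')]
```

The result is a list because a register may refer to more than one stack node, in which case a dereference could yield any of the stored values.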

Many of the registers are given the value (T, T). This means that the value is unknown.