Creating External Library (DLL) Stubs

One problem with interpreting an executable is deciding how to handle calls into external libraries, such as Windows DLLs. Two possible approaches are:

  1. When a DLL call is made, interpret the DLL function. This can work if the machine doing the interpreting has the DLL on disk. To do this, the DLL must be disassembled and the function being called must be passed through the interpreter.

    A drawback of this approach is that many DLL functions rely on other DLLs, causing a cascading effect requiring many DLL functions to be disassembled and interpreted. Even very simple programs can require several DLLs to be included.

  2. Create stub functions that stand in for the actual DLL functions. Each stub should closely approximate the behavior of the real function, but without the overhead of pulling in other DLLs. A stub might be written in assembly or in a pseudo-language created specifically for this purpose. Stubs can be written by hand or generated automatically.

DOC handles DLLs using the second method.
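The cascading effect that makes the first approach expensive can be illustrated with a small sketch. The import graph below is hypothetical and purely illustrative (it is not measured from real binaries), and required_modules is a name invented for this example. It computes every module that would have to be disassembled and interpreted for one program.

```python
from collections import deque

# Hypothetical import graph: each module maps to the DLLs it imports from.
# These entries are illustrative assumptions, not data from real binaries.
IMPORTS = {
    "app.exe": ["user32.dll", "kernel32.dll"],
    "user32.dll": ["gdi32.dll", "kernel32.dll", "ntdll.dll"],
    "gdi32.dll": ["kernel32.dll", "ntdll.dll"],
    "kernel32.dll": ["ntdll.dll"],
    "ntdll.dll": [],
}


def required_modules(root, imports):
    """Return every module reachable from root through its imports."""
    seen = set()
    queue = deque([root])
    while queue:
        module = queue.popleft()
        for dep in imports.get(module, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return sorted(seen)


print(required_modules("app.exe", IMPORTS))
```

In this made-up graph the program names only two DLLs directly, yet four end up in the result. With stubs, only the DLLs the program calls directly need a stand-in.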

To create a stub DLL for DOC, two files must be created: a .exports file and a .asm file. The base name of both files must match the name of the DLL being stubbed. For example, if you are stubbing kernel32.dll, the two files must be named kernel32.asm and kernel32.exports. When DOC encounters a call to a function inside kernel32.dll, it looks for these two files to find the stub for that function.
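The naming rule can be sketched as a small helper. This is only an illustration of the convention described above, not DOC's actual lookup code; the function name stub_paths and the imports_dir parameter are assumptions made for the example.

```python
from pathlib import Path


def stub_paths(dll_name, imports_dir):
    """Return the (.asm, .exports) paths expected for a given DLL name.

    imports_dir is a hypothetical parameter naming the directory
    that holds the stub files.
    """
    stem = Path(dll_name).stem  # "kernel32.dll" -> "kernel32"
    base = Path(imports_dir)
    return base / f"{stem}.asm", base / f"{stem}.exports"


asm_path, exports_path = stub_paths("kernel32.dll", "Imports")
print(asm_path.name, exports_path.name)
```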

The stubbing process loosely mimics how real DLLs work. The .exports file lists the exported functions and acts like the exports section of a DLL: it provides a mapping from function name to function address. DOC uses this mapping to find where the stub for a particular function is located within the .asm file.
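The name-to-address mapping can be pictured as a simple lookup table. The on-disk layout of a .exports file is not described here, so the sketch below assumes a hypothetical format of one "name address" pair per line with hexadecimal addresses. The function names and addresses in the sample are made up, and the real files may be laid out differently.

```python
def parse_exports(text):
    """Build a name -> address table from an assumed 'name hex-address' format."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, address = line.split()
        table[name] = int(address, 16)
    return table


# Made-up sample content in the assumed format.
sample = "GetTickCount 0x1000\nSleep 0x1010\n"
exports = parse_exports(sample)
print(hex(exports["Sleep"]))
```

A dictionary fits here because the only operation needed is resolving a function name to the address where its stub begins.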

Once the address is known, DOC finds the stub within the .asm file and interprets it. The stub is written in ordinary assembly, with one exception: the addition of the UNDEF instruction. UNDEF is used when the DLL function returns, or writes to memory, a value that cannot be determined during abstract interpretation. For example, imagine a function that reads an integer from a socket and returns what it reads. It is not possible to know during abstract interpretation what will be read, so the function should return an undefined value. In this case, the stub would contain the instruction "UNDEF eax" (eax holds the return value).
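One way to picture the effect of UNDEF is as an operation that replaces a register's abstract value with an "unknown" marker. The sketch below is a toy model, not DOC's interpreter: the register state is a plain dictionary, instructions are hypothetical (opcode, operands) tuples, and only UNDEF and a simplified MOV are handled.

```python
class Undefined:
    """Marker for a value the abstract interpreter cannot determine."""

    def __repr__(self):
        return "UNDEF"


UNDEFINED = Undefined()


def step(registers, instruction):
    """Apply one instruction and return the new register state."""
    opcode, operands = instruction
    new_state = dict(registers)
    if opcode == "UNDEF":
        (dest,) = operands
        new_state[dest] = UNDEFINED  # the value is now unknown
    elif opcode == "MOV":
        dest, src = operands
        if isinstance(src, str):
            # Register source: copy its abstract value (unknown if unset).
            new_state[dest] = registers.get(src, UNDEFINED)
        else:
            # Immediate source: the value is known exactly.
            new_state[dest] = src
    else:
        raise ValueError(f"unsupported opcode: {opcode}")
    return new_state


state = step({"eax": 5}, ("UNDEF", ("eax",)))
state = step(state, ("MOV", ("ebx", "eax")))
print(state)
```

In this model the unknown value propagates: after eax is marked undefined, copying it into ebx makes ebx undefined as well.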

Take a look at the stub files in DOC_WORKSPACE/Imports for example stubs. These files are by no means complete. I created them by hand, but there are far too many DLLs to stub them all manually. An automated method needs to be developed.