EECS 677 - Secure Software Evaluation (Research)

Coding Project: LLVM Analysis

Due: October 11th, 11:59 PM

For this assignment, you will create a standalone program named graph (in the language of your choice). The code should be substantially your own. Although you may use auxiliary libraries, the fundamental tasks described must be performed by code you wrote for this assignment. You may work with 1 partner on this assignment. For credit on the assignment, each submission should be no more than incidentally similar in design to any other and each member of a submitting team must be able to explain their solution in detail. Each member of the team should be indicated in a file called COLLAB.txt included in the submitted tarball, which should also disclose any references or third-party code used as part of completing the assignment.

The analysis program will perform some basic analysis over an input file in a restricted subset of LLVM that we covered in class. In particular, the only pseudoinstructions that you need to cover are:

alloca
getelementpointer
load
store
ret
icmp
br
add
sub
div
mul
phi
call
define

Additionally, your analysis should accomodate programs with both global and local variable allocations. You may assume that all variables are of scalar integer type.

Assignment Details

This assignment consists of CFG inference, i.e. inducing a control-flow graph over an input.

The basic idea of is to construct the control-flow graph of each function declared in a given input file and output those control-flow graphs in the graphviz dot format. For each function in the input, the program will output a graphviz control-flow graph. The nodes should have record shape. For the sake of grading, it is sufficient to simply output the correct structure of the CFG with the correct edge labels. The labels of each node will not be graded (however it is probably a good idea to print out the llvm instructions within each node label for your own debugging). Edges should be directed and have a label according to their position in the branch, starting at index 0. For example, consider the input

Example 1 Input:

define i32 @main(i32 %argc) {
        %noArgs = icmp eq i32 %argc, 1
        br i1 %noArgs, label %lbl_t, label %lbl_f
lbl_t:
        %varT = add i32 1, 0
        br label %end
lbl_f:
        %varF = add i32 2, 0
        br label %end
end:
        %var = phi i32 [%varT, %lbl_t], [%varF, %lbl_f]
        ret i32 %var
}

The output would be of the form

Example 1 Output:

digraph {
        Node0 [shape=record,label=""]
        Node0 -> Node1 [label=0];
        Node0 -> Node2 [label=1];
        Node1 [shape=record,label=""]
        Node1 -> Node3 [label=0];
        Node2 [shape=record,label=""]
        Node2 -> Node3 [label=0];
        Node3 [shape=record,label=""]
}

Note that for this behavior to be correct, the output must be valid in the dot format.

Your Submission

You should submit a gzipped tarball named P2.tgz containing a single directory named P2. Within P2 you should include whatever code you need to complete the project as well as your COLLAB.txt.

For the purpose of grading, your code should be able to be tested with the following sequence of commands on the cycle servers:

tar -xf P2.tgz
cd P2
make
./graph test1.ll
dot main.dot -o main.png -Tpng

where test1.ll is a file containing a main function.

Command-line compliance

As described above, you may use any coding language of your choice to complete this assignment. However, the above sequence of commands must work without error for full credit. For example, if you are using python, you might include a Makefile that immediately exits 0 (since there is no code to build). Furthermore, you should either include a helper script named graph that launches the python interpreter with its argument.