Coding Project: LLVM Analysis
For this assignment, you will create a standalone program named analysis
(in the language of your choice).
The code should be substantially your own. Although you may use auxiliary libraries, the fundamental tasks described must
be performed by code you wrote for this assignment. You may work with 1 partner on this assignment. For credit on the assignment,
each submission should be no more than incidentally similar in design to any other and each member of a submitting team must
be able to explain their solution in detail.
Each member of the team should be indicated in a file called COLLAB.txt
included in the submitted tarball, which should also disclose any references or
third-party code used as part of completing the assignment.
The analysis program will perform some basic analysis over an input file in a restricted subset of LLVM that we covered in class. In particular, the only pseudoinstructions that you need to cover are:
- alloca
- load
- store
- ret
- icmp
- br
- add
- sub
- div
- mul
- call
- define
Additionally, your analysis should accomodate programs with both global and local variable allocations. You may assume that all variables are of scalar integer type. Although they are not instructions, you should also handle phi nodes in your program!
Assignment Details
This assignment consists of two parts, CFG inference and static dataflow:CFG Inference
The basic idea of this part is to construct the control-flow graph of each function declared in a given input file and output those control-flow graphs in the graphviz dot format. For each function in the input, the program will output a graphviz control-flow graph. The nodes have record shape and an empty label. Edges should be directed and have a label according to their position in the branch. For example, consider the input
define i32 @main(i32 %argc) { %noArgs = icmp eq i32 %argc, 1 br i1 %noArgs, label %lbl_t, label %lbl_f lbl_t: %varT = add i32 1, 0 br label %end lbl_f: %varF = add i32 2, 0 br label %end end: %var = phi i32 [%varT, %lbl_t], [%varF, %lbl_f] ret i32 %var }
The output would be of the form
digraph { Node0 [shape=record,label=""] Node0 -> Node1 [label=1]; Node0 -> Node2 [label=2]; Node1 [shape=record,label=""] Node1 -> Node3 [label=1]; Node2 [shape=record,label=""] Node2 -> Node3 [label=1]; Node3 [shape=record,label=""] }
Static Dataflow
This part is optional for undergraduate students. It is required for graduate studentsFor this part, your goal will be to detect if a flow-sensitive, static dataflow analysis over the control flow graph detects a flow from the result of invoking a function SOURCE to the (single) argument of a function SINK in any function in the input. If so, the program should exit with code 0 after printing the single line
LEAKIf there is no flow, the program should exit with code 0 after printing the single line
NO LEAKFor example, consider the following input:
define i32 @main(i32 %0, i8** %1) { %res = call i32 @SOURCE () call i32 @SINK (i32 %res) ret i32 0 } declare dso_local i32 @SOURCE() declare dso_local i32 @SINK(i32)
This program SHOULD print LEAK.
Note that the analysis is over the control-flow graph. Thus, should tagged data from SOURCE enter any function call, it should be considered a leak. For example, consider the program:define i32 @foo(i32 %0, i8** %1) { ret i32 7 } define i32 @main(i32 %0, i8** %1) { %res = call i32 @SOURCE () call i32 @foo (i32 %res) ret i32 0 } declare dso_local i32 @SOURCE() declare dso_local i32 @SINK(i32)
This program SHOULD print LEAK, even though there is no (interprocedural) flow.
Building and running your code
This section describes how your project will be graded.Build scripts
Grading will proceed by downloading your submission and running the following command to build your code:tar -xf <submission.tgz> cd c1 make
Please ensure that this sequence of commands is sufficient for building your code on the cycle servers (if your build requires special accomodations, please contact Drew). You are certainly welcome to use a dynamic language, in which case you should include a makefile that is simply a no-op. If the above sequence results in a broken build your submission will be awarded 0 points.
For CFG Inference, your code will be graded by correctness in adherence to the following command-line format:
Your program should take the following two arguments:-
-i <infile>
where<infile>
is replaced by the name of an input file in the LLVM human-readable bitcode format (i.e. a .ll file). -g <outdir>
where<outdir>
is replaced by the name of a target output directory in the graphviz format. For each function <F> declared in <infile>, the program will add a file named <F>.dot in the directory <outdir>.
callTest.ll
. Then the command-line
./analysis -i callTest.ll -g mydir
For Static Dataflow, your code will be graded by correctness in adherence to the following command-line format:
-
-i <infile>
where<infile>
is replaced by the name of an input file in the LLVM human-readable bitcode format (i.e. a .ll file). -s
In other words, when the -s switch is provided, static analysis is performed on the input and LEAK or NO LEAK is printed to the terminal in accordance with the leaking behavior of ANY function in <infile>