Coding Project: LLVM Analysis
Due: December 18th, 4:00 PM
For this assignment, you will create a standalone program named dataflow
(in the language of your choice).
The code should be substantially your own. Although you may use auxiliary libraries, the fundamental tasks described must
be performed by code you wrote for this assignment. You may work with 1 partner on this assignment. For credit on the assignment,
each submission should be no more than incidentally similar in design to any other and each member of a submitting team must
be able to explain their solution in detail.
Each member of the team should be indicated in a file called COLLAB.txt
included in the submitted tarball, which should also disclose any references or
third-party code used as part of completing the assignment.
The analysis program will perform some basic analysis over an input file in the same format used for previous assignments (detailed here).
Assignment Overview
This assignment will build upon your previous work in building control-flow graphs, but this time you will analyze CFGs to look for dataflows. If a dataflow is detected, your program should output the single line FLOW. If there is no dataflow, your program should output the single line NO FLOW.
Analysis Details
Your analysis should consist of a flow-sensitive, context-insensitive, intraprocedureal dataflow analysis. To simplify the assignment, you may assume that there will be no function calls in the program, with the exception of the special SOURCE and SINK functions, described below. You do not need to worry about implicit dataflows, dead code, or aliasing (there will be no pointer assignments or non-scalar operations).
SOURCE and SINK
The analysis will consider the only source of secret data to be an externally-defined special function named SOURCE and the only sink of data to be an externally-defined special function named SINK (these functions have no other behavior). These two functions will be the only externally-defined functions. file.Consider the following examples:
Example 1
define i32 @main() #0 { %aVar = alloca i32 %bVar = alloca i32 %secret = call i32 () @SOURCE() store i32 %secret, ptr %aVar %a1 = load i32, ptr %aVar store i32 %a1, ptr %bVar %b1 = load i32, ptr %bVar call void @SINK(i32 %b1) ret i32 0 } declare i32 @SOURCE() declare void @SINK(i32)
Example 2
define i32 @main() { %aVar = alloca i32 %secret = call i32 () @SOURCE() store i32 %secret, ptr %aVar store i32 0, ptr %aVar %a1 = load i32, ptr %aVar call void @SINK(i32 %a1) ret i32 0 } declare i32 @SOURCE() declare void @SINK(i32 )
aVar
does get a secret value, that value is overwritten by the constant
0 and thus there is no flow from SOURCE to SINK.
Example 3
define i32 @main() { %aVar = alloca i32 store i32 0, ptr %aVar %a1 = load i32, ptr %aVar %ifTruth = icmp ne i32 %a1, 0 br i1 %ifTruth, label %ifBody, label %afterIf ifBody: %secret = call i32 (...) @SOURCE() store i32 %secret, ptr %aVar br label %afterIf afterIf: %a2 = load i32, ptr %aVar call void @SINK(i32 %a2) ret i32 0 } declare i32 @SOURCE(...) declare void @SINK(i32)
aVar
is only set in the true branch of an if statement, it represents one
path of dataflow from SOURCE to SINK. Also note that the FLOW here is
a false positive - the body of the if statement will never actually
be executed because aVar's value must be non-zero, and aVar's value
is always set to 0. That's ok - you should not worry about tracking
concrete data values. Only track whether or not a variable is part of
a direct dataflow from SOURCE.
Your Submission
You should submit a gzipped tarball named P3.tgz containing a single directory named P3. Within P3 you should include whatever code you need to complete the project as well as your COLLAB.txt.
For the purpose of grading, your code should be able to be tested with the following sequence of commands on the cycle servers:
tar -xf P3.tgz cd P3 make ./dataflow test1.ll
test1.ll
is a file containing code in our analysis target language
again, referenced here.
Command-line compliance
As described above, you may use any coding language of your choice to complete
this assignment. However, the above sequence of commands must work without
error for full credit. For example, if you are using python, you might include
a Makefile that immediately exits 0 (since there is no code to build). Furthermore,
you should either include a helper script named graph
that
launches the python interpreter with its argument.