Coding Project: LLVM Analysis

Due: December 18th, 4:00 PM

For this assignment, you will create a standalone program named dataflow (in the language of your choice). The code should be substantially your own. Although you may use auxiliary libraries, the fundamental tasks described must be performed by code you wrote for this assignment. You may work with 1 partner on this assignment. For credit on the assignment, each submission should be no more than incidentally similar in design to any other and each member of a submitting team must be able to explain their solution in detail. Each member of the team should be indicated in a file called COLLAB.txt included in the submitted tarball, which should also disclose any references or third-party code used as part of completing the assignment.

The analysis program will perform some basic analysis over an input file in the same format used for previous assignments (detailed here).

Assignment Overview

This assignment will build upon your previous work in building control-flow graphs, but this time you will analyze CFGs to look for dataflows. If a dataflow is detected, your program should output the single line FLOW. If there is no dataflow, your program should output the single line NO FLOW.

Analysis Details

Your analysis should consist of a flow-sensitive, context-insensitive, intraprocedureal dataflow analysis. To simplify the assignment, you may assume that there will be no function calls in the program, with the exception of the special SOURCE and SINK functions, described below. You do not need to worry about implicit dataflows, dead code, or aliasing (there will be no pointer assignments or non-scalar operations).

SOURCE and SINK
The analysis will consider the only source of secret data to be an externally-defined special function named SOURCE and the only sink of data to be an externally-defined special function named SINK (these functions have no other behavior). These two functions will be the only externally-defined functions. file.

Consider the following examples:

Example 1

define i32 @main() #0 {
  %aVar = alloca i32
  %bVar = alloca i32
  %secret = call i32 () @SOURCE()
  store i32 %secret, ptr %aVar
  %a1 = load i32, ptr %aVar
  store i32 %a1, ptr %bVar
  %b1 = load i32, ptr %bVar
  call void @SINK(i32 %b1)
  ret i32 0
}

declare i32 @SOURCE()
declare void @SINK(i32)
This input code must print out FLOW, because the analysis should detect that the return from SOURCE (named %secret here) flows through the variable a to the variable b to SINK.

Example 2

define i32 @main() {
  %aVar = alloca i32
  %secret = call i32 () @SOURCE()
  store i32 %secret, ptr %aVar
  store i32 0, ptr %aVar
  %a1 = load i32, ptr %aVar
  call void @SINK(i32 %a1)
  ret i32 0
}

declare i32 @SOURCE()
declare void @SINK(i32 )
This code must print out NO FLOW, because although aVar does get a secret value, that value is overwritten by the constant 0 and thus there is no flow from SOURCE to SINK.

Example 3

define i32 @main() {
  %aVar = alloca i32
  store i32 0, ptr %aVar
  %a1 = load i32, ptr %aVar
  %ifTruth = icmp ne i32 %a1, 0
  br i1 %ifTruth, label %ifBody, label %afterIf

ifBody:
  %secret = call i32 (...) @SOURCE()
  store i32 %secret, ptr %aVar
  br label %afterIf

afterIf:
  %a2 = load i32, ptr %aVar
  call void @SINK(i32 %a2)
  ret i32 0
}

declare i32 @SOURCE(...)
declare void @SINK(i32)
This input code must print out FLOW, because although aVar is only set in the true branch of an if statement, it represents one path of dataflow from SOURCE to SINK. Also note that the FLOW here is a false positive - the body of the if statement will never actually be executed because aVar's value must be non-zero, and aVar's value is always set to 0. That's ok - you should not worry about tracking concrete data values. Only track whether or not a variable is part of a direct dataflow from SOURCE.

Your Submission

You should submit a gzipped tarball named P3.tgz containing a single directory named P3. Within P3 you should include whatever code you need to complete the project as well as your COLLAB.txt.

For the purpose of grading, your code should be able to be tested with the following sequence of commands on the cycle servers:

tar -xf P3.tgz
cd P3
make
./dataflow test1.ll
where test1.ll is a file containing code in our analysis target language again, referenced here.

Command-line compliance

As described above, you may use any coding language of your choice to complete this assignment. However, the above sequence of commands must work without error for full credit. For example, if you are using python, you might include a Makefile that immediately exits 0 (since there is no code to build). Furthermore, you should either include a helper script named graph that launches the python interpreter with its argument.