bic: A C interpreter and API explorer
This is a project that allows developers to explore and test C APIs using a read-eval-print loop, also known as a REPL.
Dependencies
BIC’s run-time dependencies are as follows:
To build BIC, you’ll need:
Please ensure you have these installed before building bic. The following command should install these on a Debian/Ubuntu system:
apt-get install build-essential libreadline-dev autoconf-archive libgmp-dev expect flex bison automake m4 libtool pkg-config
You can also use the following command to install the required dependencies via Homebrew on a macOS system:
brew install bison flex gmp readline autoconf-archive
Installation
You can compile and install bic with the following commands:
autoreconf -i
./configure --enable-debug
make
make install
When building on a macOS system, change the configure line to:
YACC="$(brew --prefix bison)/bin/bison -y" ./configure --enable-debug
Docker
You can use Docker to build and run bic. Build the image with the following command:
docker build -t bic https://github.com/hexagonal-sun/bic.git#master
Once the image is built you can then run bic with:
docker run -i bic
Arch Linux
If you are using Arch Linux, you can install bic from AUR:
yay -S bic
Usage
REPL
When invoking bic with no arguments the user is presented with a REPL prompt:
BIC>
Here you can type C statements and #include various system headers to provide access to different APIs on the system. Statements can be entered directly into the REPL; there is no need to define a function for them to be evaluated. Say we wish to execute the following C program:
#include <stdio.h>
int main()
{
FILE *f = fopen("out.txt", "w");
fputs("Hello, world!\n", f);
return 0;
}
We can do this on the REPL with BIC using the following commands:
BIC> #include <stdio.h>
BIC> FILE *f;
f
BIC> f = fopen("test.txt", "w");
BIC> fputs("Hello, World!\n", f);
1
BIC>
This will cause bic to call out to the C library's fopen() and fputs() functions to create a file and write the hello world string into it. If you now exit bic, you should see a file test.txt in the current working directory with the string Hello, World!\n contained within it.
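The session above does not check the value returned by fopen(). As a minimal sketch, the same steps can be written as a standalone function with error handling. The name write_greeting is a hypothetical helper chosen for illustration and is not part of bic:

```c
#include <stdio.h>

/* Hypothetical helper (not part of bic): writes the greeting to `path`.
 * Returns 0 on success and -1 if any step fails. */
int write_greeting(const char *path)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;

    /* fputs() returns EOF on failure and a non-negative value otherwise. */
    int rc = (fputs("Hello, World!\n", f) == EOF) ? -1 : 0;

    /* fclose() flushes buffered output, so its result matters too. */
    if (fclose(f) == EOF)
        rc = -1;

    return rc;
}
```

In the REPL the same checks can be made by eye: the non-negative value printed after the fputs() call indicates that the write succeeded.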
Notice that after evaluating an expression bic will print the result of evaluation. This can be useful for testing out simple expressions:
BIC> 2 * 8 + fileno(f);
19
The Inspector
You can use bic to obtain information about any variable or type that has been declared by prefixing its name with a ?. This special syntax only works in the REPL but will allow you to obtain various characteristics about types and variables. For example:
BIC> #include <stdio.h>
BIC> ?stdout
stdout is a pointer to a struct _IO_FILE.
value of stdout is 0x7ff1325bc5c0.
sizeof(stdout) = 8 bytes.
stdout was declared at: /usr/include/stdio.h:138.
Startup Files
When the REPL starts, bic checks whether ~/.bic exists. If it does, the file is automatically evaluated and the resulting environment is used by the REPL. This can be useful for defining functions or variables that are commonly used. For instance, say our ~/.bic file contains:
#include <stdio.h>
int increment(int a)
{
return a + 1;
}
puts("Good morning, Dave.");
When we launch the REPL we get:
$ bic
Good morning, Dave.
BIC> increment(2);
3
Evaluating Files
If you pass bic a source file as a command-line argument, along with -s, it will evaluate the file by calling its main() function. For example, suppose we have the file test.c that contains the following:
#include <stdio.h>
int factorial(int n)
{
if (!n)
{
return 1;
}
return n * factorial(n - 1);
}
int main()
{
printf("Factorial of 4 is: %d\n", factorial(4));
return 0;
}
We can then invoke bic with -s test.c to evaluate it:
$ bic -s test.c
Factorial of 4 is: 24
Passing Arguments
If you wish to pass arguments to a C file, append them to bic’s command line. Once bic has processed the -s argument, all other arguments are treated as parameters to be passed to the program. These parameters are created as argc and argv variables and passed to main(). The value of argv[0] is the name of the C file that bic is executing. Consider the following C program:
#include <stdio.h>
int main(int argc, char *argv[])
{
for (int i = 0; i < argc; i++)
printf("argv[%d] = %s\n", i, argv[i]);
return 0;
}
If we don’t pass any arguments:
$ bic -s test.c
argv[0] = test.c
Whereas if we invoke bic with more arguments, they are passed to the program:
$ bic -s test.c -a foo -s bar a b c
argv[0] = test.c
argv[1] = -a
argv[2] = foo
argv[3] = -s
argv[4] = bar
argv[5] = a
argv[6] = b
argv[7] = c
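The splitting rule described in this section (everything after the first -s FILE pair belongs to the program, and FILE becomes argv[0]) can be sketched as follows. This is an illustration only: program_args is a hypothetical helper, not bic's actual option parser, which may handle other flags differently.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not bic's real option handling: locate the first
 * "-s FILE" pair in a bic-style command line and return the argument
 * vector that the evaluated program would see. *prog_argc receives the
 * number of entries. Returns NULL when no "-s FILE" pair is present.
 */
char **program_args(int argc, char **argv, int *prog_argc)
{
    for (int i = 1; i + 1 < argc; i++) {
        if (strcmp(argv[i], "-s") == 0) {
            /* argv[i + 1] is the source file; it becomes argv[0]. */
            *prog_argc = argc - (i + 1);
            return &argv[i + 1];
        }
    }

    *prog_argc = 0;
    return NULL;
}
```

Because only the first -s is consumed, a later -s (as in the invocation above) is passed through to the program untouched.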
Dropping Into a REPL
You can also use a special expression, <REPL>;, in your source code to make bic drop you into the REPL at a particular point in the file evaluation.
Exploring external libraries with the REPL
You can use bic to explore the APIs of libraries other than libc. Suppose we wish to explore the Capstone library: we pass in a -l option to make bic load that library when it starts.
Notice that when bic prints a compound data type (a struct or a union), it shows all member names and their corresponding values.
Implementation Overview
Tree Objects
At the heart of bic’s implementation is the tree object. These are generic objects that can be used to represent an entire program as well as the current evaluator state. They are implemented in tree.h and tree.c. Each tree type is defined in c.lang. The c.lang file is a lisp-like specification of:
- Object name, for example T_ADD.
- A human readable name, such as Addition.
- A property name prefix, such as tADD.
- A list of properties for this type, such as LHS and RHS.
The code to create an object with the above set of attributes would be:
(deftype T_ADD "Addition" "tADD"
("LHS" "RHS"))
Once defined, we can use this object in our C code in the following way:
tree make_increment(tree number)
{
tree add = tree_make(T_ADD);
tADD_LHS(add) = number;
tADD_RHS(add) = tree_make_const_int(1);
return add;
}
Notice that a set of accessor macros, tADD_LHS() and tADD_RHS(), have been generated for us to access the different property slots. When --enable-debug is set during compilation, each of these macros expands to include a type check. For example, setting the tADD_LHS property of an object first verifies that the object is indeed an instance of T_ADD.
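To make the idea of a checked accessor concrete, here is a minimal sketch. It assumes a much simplified node layout: struct tree_node, tree_check() and tINT_VAL() are hypothetical stand-ins, and the tree_make() and tADD_* definitions below are illustrative re-implementations, not the code that bic's generators actually emit.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for bic's generated definitions. */
enum tree_type { T_INTEGER, T_ADD };

typedef struct tree_node *tree;

struct tree_node {
    enum tree_type type;
    long value;    /* used by T_INTEGER */
    tree lhs, rhs; /* used by T_ADD */
};

/* Returns the node unchanged after verifying its type tag. */
tree tree_check(tree t, enum tree_type expected)
{
    assert(t != NULL && t->type == expected);
    return t;
}

/* Each accessor checks the tag first; the result is still assignable. */
#define tINT_VAL(t) (tree_check((t), T_INTEGER)->value)
#define tADD_LHS(t) (tree_check((t), T_ADD)->lhs)
#define tADD_RHS(t) (tree_check((t), T_ADD)->rhs)

/* Allocates a zeroed node of the requested type. */
tree tree_make(enum tree_type type)
{
    tree t = calloc(1, sizeof(*t));
    if (t == NULL)
        abort();
    t->type = type;
    return t;
}
```

With these definitions, tADD_LHS(x) = y on a node that is not a T_ADD trips the assertion instead of silently writing to the wrong slot. A non-debug build could define the same macros without the tree_check() call.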
The c.lang file is read by numerous source-to-source compilers that generate code snippets. These utilities include:
- gentype: Generates a list of tree object types.
- gentree: Generates a structure that contains all the property data for tree objects.
- genctypes: Generates a list of C-Type tree objects; these represent the fundamental data types in C.
- genaccess: Generates accessor macros for tree object properties.
- gengc: Generates a mark function for each tree object, which allows the garbage collector to traverse object trees.
- gendump: Generates code to dump out tree objects recursively.
- gendot: Generates a dot file for a given tree hierarchy, allowing it to be visualised.
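As an illustration of what a per-type mark function does, the sketch below walks a node's property slots and flags every node it can reach. The node layout, the reachable flag and the name mark_tree are all assumptions made for this example; the code gengc produces is not shown in this document and may look quite different.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified node layout for illustration only. */
enum tree_type { T_INTEGER, T_ADD };

struct tree_node {
    enum tree_type type;
    bool reachable;              /* set during the mark phase */
    struct tree_node *lhs, *rhs; /* property slots of a T_ADD */
};

/* Marks `t` and everything reachable through its property slots. */
void mark_tree(struct tree_node *t)
{
    /* Stop at empty slots and at nodes that were already visited. */
    if (t == NULL || t->reachable)
        return;

    t->reachable = true;

    switch (t->type) {
    case T_INTEGER:
        break; /* leaf: no child slots to follow */
    case T_ADD:
        mark_tree(t->lhs);
        mark_tree(t->rhs);
        break;
    }
}
```

A collector built this way would call the mark function on each root and then reclaim every node whose flag is still clear.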
Evaluator
The output of the lexer & parser is a tree object hierarchy which is then passed into the evaluator (evaluator.c). The evaluator will then recursively evaluate each tree element, updating internal evaluator state, thereby executing a program.
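The recursive scheme can be sketched with a toy expression tree. This is not the code in evaluator.c: the node types N_INT, N_ADD and N_MUL, struct node and eval() are hypothetical, and the sketch handles only integer arithmetic with no evaluator state.

```c
#include <stdlib.h>

/* Hypothetical node kinds for a toy expression tree. */
enum node_type { N_INT, N_ADD, N_MUL };

struct node {
    enum node_type type;
    long value;                   /* used by N_INT */
    const struct node *lhs, *rhs; /* used by N_ADD and N_MUL */
};

/* Evaluates a node by first evaluating its children, then combining them. */
long eval(const struct node *n)
{
    switch (n->type) {
    case N_INT:
        return n->value;
    case N_ADD:
        return eval(n->lhs) + eval(n->rhs);
    case N_MUL:
        return eval(n->lhs) * eval(n->rhs);
    }

    abort(); /* unknown node type */
}
```

For example, a tree for 2 * 8 + 3 is an N_ADD node whose left child is an N_MUL of 2 and 8 and whose right child is the integer 3. Evaluating the root visits the multiplication first and yields 19.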
Calls to functions external to the evaluator are handled in a platform-dependent way. Currently x86_64 and aarch64 are the only supported platforms and the code to handle this is in the x86_64 and aarch64 folders respectively. This works by taking a function call tree object (represented by a T_FN_CALL) from the evaluator with all arguments evaluated and marshalling them into a simple linked list. This list is then traversed in assembly to move each value into the correct register according to the x86_64 or aarch64 calling convention, before branching to the function address.
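The marshalling step can be approximated in portable C. The sketch below is an assumption-laden stand-in for the assembly: struct arg, generic_fn, call_with_args() and sum3() are hypothetical names, only long arguments are supported, and the call goes through a cast function pointer rather than by loading registers directly. The call is well defined only when the target really has the signature selected by the argument count.

```c
#include <stddef.h>

/* One marshalled argument; the list layout bic uses may differ. */
struct arg {
    long value;
    struct arg *next;
};

typedef void (*generic_fn)(void);

#define MAX_ARGS 3

/* Example target with a known signature, used to exercise the dispatcher. */
long sum3(long a, long b, long c)
{
    return a + b + c;
}

/*
 * Walks the list, copies each value into a slot, then calls `fn` through
 * a pointer type of matching arity. Returns 0 on success and -1 when the
 * list holds more than MAX_ARGS values.
 */
int call_with_args(generic_fn fn, const struct arg *args, long *result)
{
    long slot[MAX_ARGS] = {0};
    int n = 0;

    for (const struct arg *a = args; a != NULL; a = a->next) {
        if (n == MAX_ARGS)
            return -1;
        slot[n++] = a->value;
    }

    switch (n) {
    case 0:
        *result = ((long (*)(void))fn)();
        break;
    case 1:
        *result = ((long (*)(long))fn)(slot[0]);
        break;
    case 2:
        *result = ((long (*)(long, long))fn)(slot[0], slot[1]);
        break;
    case 3:
        *result = ((long (*)(long, long, long))fn)(slot[0], slot[1], slot[2]);
        break;
    }
    return 0;
}
```

Doing this step in assembly avoids needing one case per signature: the values can be placed into whichever registers the calling convention dictates, whatever the argument types and count.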
Parser & Lexer
The parser and lexer are implemented in parser.m4 and lex.m4 respectively. After passing through M4 the output is two bison parsers and two flex lexers.
The reason for two parsers is that the grammar for a C REPL is very different from that of a C file. For example, we want the user to be able to type in statements to be evaluated on the REPL without the need for wrapping them in a function. However, a statement outside a function body isn’t valid C, so we don’t want the user to be able to write bare statements in a C file. To achieve this we have two different sets of grammar rules, which produce two parsers. Most of the grammar rules overlap, so we use a single M4 file to take care of the differences.