Abstract
C and C++ are among the most popular programming languages used to implement browsers, runtime libraries, Internet of Things devices, and operating system kernels. Given how
critical these devices and their software are, it is essential to identify
security vulnerabilities in the software before adversaries find them. One of the
most common low-level vulnerabilities in C/C++ programs involves the
misuse of variadic functions. Variadic functions accept a variable number of arguments,
which they often pass on to other functions. Misusing variadic functions can lead to memory
safety violations and mismatched function arguments, or can even enable remote code
execution. The most common attack vectors involve providing input that forces
a function in the program to assume it has received more arguments than were
actually passed. This allows the attacker to read and possibly write values on the
control stack, and in effect dynamically patch the code while it is running.
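To illustrate the attack pattern described above, the following C sketch uses a hypothetical wrapper, `format_msg`, that forwards its variadic arguments to the standard `vsnprintf`; it is not code from this dissertation. Because `vsnprintf` consumes one argument per conversion specifier in the format string, passing attacker-controlled text as the format lets the attacker decide how many arguments are read.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical wrapper: forwards its variadic arguments to vsnprintf.
   vsnprintf reads one argument per conversion in `fmt`; it has no way to
   know how many arguments the caller actually passed. */
static int format_msg(char *buf, size_t size, const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    return n;
}

/* UNSAFE pattern (shown only as a comment):
       format_msg(buf, size, user_input);
   Input such as "%x %x %x" reads three arguments that were never passed,
   and "%n" can additionally write through one of those stray values.

   Safe pattern: a constant format string, with user input passed as data. */
static int log_user_input(char *buf, size_t size, const char *user_input) {
    return format_msg(buf, size, "input: %s", user_input);
}
```

With the safe wrapper, input such as `"%x %x"` is copied literally into the buffer instead of being interpreted as conversions.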
The goal of this research is to develop a theory and a proof-of-concept tool for
the automated detection of variadic functions in stripped binaries, and of the actual
number of arguments passed to each such function. This capability is intended to enable future automated
patching of vulnerable variadic functions. We implemented an automated
tool, called Detector, for identifying variadic functions to assist software developers
and security analysts in pinpointing and repairing vulnerable code. The approach
presented in this dissertation focuses on analyzing stripped binaries, which are binaries
with all debug and symbol data removed. Such binaries reflect the difficulty
of fixing security vulnerabilities in legacy code and third-party libraries, although
they can also represent newly developed software. The target
binaries were compiled with three different compilers, GCC, Clang, and ICC, for both
the popular Intel x86 and x64 architectures.
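One behavioral signal such an analysis can look for on x64 is sketched below. Under the System V AMD64 calling convention, a caller of a variadic function places an upper bound on the number of vector registers it used in `al`, and GCC and Clang commonly emit a prologue for functions that call `va_start` which tests `al` and, if it is nonzero, stores `xmm0` through `xmm7` to the stack. The C heuristic below checks for that pattern. It is a hypothetical simplification, not Detector's algorithm; the function names are illustrative, it assumes disassembly already normalized to single-spaced Intel syntax (e.g. `test al, al`), and it will miss code shapes produced by other compilers or optimization settings.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns 0-7 if `ins` is an Intel-syntax movaps that stores xmm0..xmm7
   to memory, e.g. "movaps XMMWORD PTR [rbp-0x80], xmm0"; otherwise -1. */
static int xmm_spill_reg(const char *ins) {
    if (strncmp(ins, "movaps", 6) != 0) return -1;
    const char *comma = strrchr(ins, ',');
    const char *bracket = strchr(ins, '[');
    /* the memory operand must be the destination (before the comma) */
    if (comma == NULL || bracket == NULL || bracket > comma) return -1;
    const char *src = comma + 1;
    while (*src == ' ') src++;
    if (strncmp(src, "xmm", 3) != 0) return -1;
    if (src[3] < '0' || src[3] > '7' || src[4] != '\0') return -1;
    return src[3] - '0';
}

/* Hypothetical syntactic heuristic: flag a function as likely variadic when,
   within its first `window` instructions, "test al, al" is followed by
   stores of all eight vector argument registers xmm0..xmm7. */
static bool looks_variadic(const char *const *insns, size_t n, size_t window) {
    bool saw_test_al = false;
    unsigned spilled = 0; /* bit i is set once xmm<i> has been stored */
    for (size_t i = 0; i < n && i < window; i++) {
        if (strcmp(insns[i], "test al, al") == 0) {
            saw_test_al = true;
        } else if (saw_test_al) {
            int reg = xmm_spill_reg(insns[i]);
            if (reg >= 0) spilled |= 1u << reg;
        }
    }
    return saw_test_al && spilled == 0xFFu;
}
```

Requiring all eight spills after the `al` test keeps the sketch conservative; a real analysis would also need the caller side (how many argument registers are set before each call) to recover the number of arguments actually passed.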
Our major contribution in this research is the use of syntactic and semantic analysis
to detect variadic functions based on their behavior in stripped binary code.
Our experimental results indicate that Detector is more accurate than other existing tools. The average of precision and recall exceeds 99% for GCC on both x64 and x86,
is around 98% for Clang on x64, and is around 94% for ICC on x64.
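The summary metric above can be made concrete with a small C sketch. It assumes that "average of precision and recall" means their arithmetic mean (not the F1 score, which is their harmonic mean); the type and function names are illustrative, and any counts used with it here are hypothetical rather than results from this evaluation.

```c
/* Hypothetical confusion-matrix counts for one compiler/architecture pair. */
typedef struct {
    unsigned tp; /* variadic functions correctly reported */
    unsigned fp; /* non-variadic functions wrongly reported as variadic */
    unsigned fn; /* variadic functions that were missed */
} detection_counts;

/* precision = TP / (TP + FP); defined as 0 when nothing was reported */
static double precision(detection_counts c) {
    unsigned reported = c.tp + c.fp;
    return reported ? (double)c.tp / reported : 0.0;
}

/* recall = TP / (TP + FN); defined as 0 when there is nothing to find */
static double recall(detection_counts c) {
    unsigned actual = c.tp + c.fn;
    return actual ? (double)c.tp / actual : 0.0;
}

/* Arithmetic mean of precision and recall (assumed interpretation). */
static double avg_precision_recall(detection_counts c) {
    return (precision(c) + recall(c)) / 2.0;
}
```

For example, 3 true positives, 1 false positive, and no false negatives give a precision of 0.75, a recall of 1.0, and an average of 0.875.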