Back

Dynamic Binary Analysis with Intel Pin

Intro to Intel Pin

Dynamic Binary Instrumentation (DBI) is a technique for analyzing a running program by dynamically injecting analysis code. The added analysis code, or instrumentation code, is run in the context of the instrumented program with access to real, runtime values. DBI is a powerful technique since it does not require the source code for a program, as opposed to static analysis methods. In addition, it can instrument programs that generate code dynamically. To security researchers, DBI frameworks are invaluable tools as they allow for efficient ways to perform fuzzing, control flow analysis, and vulnerability detection with minimal overhead.

For this blog, I’ll explore Intel’s Pin tool and Linux system call hooking. Pin offers a comprehensive framework for creating pin tools to instrument at differing levels of granularity. You can find links to the Pin documentation in the references section. Also check out Gal Diskin’s slides from BlackHat for a more hands on overview of Pin’s functionality.

Identifying Linux System Calls

The main function of our pin tool example will be to intercept and identify the system calls made by a program. For reference, we can view the Linux x86_64 system call table here: https://blog.rchapman.org/posts/Linux_System_Call_Table_for_x86_64/.

This table will help to identify the system calls by the mapped system call number.

One of the advantages of DBI is that we do not need the source code for analysis. For the sake of simplicity, the python script below will be our target for instrumentation. We know that it returns the response of a GET request to Google.

import urllib2
page = urllib2.urlopen("https://www.google.com").read()

We can use the strace tool to see the system calls made.

# strace python http.py
execve("/usr/bin/python", ["python", "http.py"], [/* 19 vars */]) = 0
[TRUNCATED]
socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 3
sendto(3, "GET / HTTP/1.1\r\nAccept-Encoding:"..., 117, 0, NULL, 0) = 117
recvfrom(3, "HTTP/1.1 200 OK\r\nDate: Mon, 15 M"..., 8192, 0, NULL, NULL) = 1418
recvfrom(3, "d\"><meta content=\"@GoogleDoodles"..., 7422, 0, NULL, NULL) = 2836
recvfrom(3, "ocation,b=a.href.indexOf(\"#\");if"..., 4586, 0, NULL, NULL) = 4586
recvfrom(3, "b\" value=\"Google Search\" name=\"b"..., 8192, 0, NULL, NULL) = 3154
recvfrom(3, "", 5038, 0, NULL, NULL)    = 0
recvfrom(3, "", 8192, 0, NULL, NULL)    = 0
close(3)                                = 0
[TRUNCATED]

The strace output above gives us an abundance of information to work with, but we will focus on the system calls we want to intercept: sendto and recvfrom. These system calls are used to transmit messages to and from sockets. We can see the arguments provided to both of the system calls and we will try to read those same arguments with our pin tool.

Hooking sendto and recvfrom

The Pin API for system calls starts with two main functions: PIN_AddSyscallEntryFunction and PIN_AddSyscallExitFunction. These functions register callback functions for before and after the execution of the system call, respectively. The registered callback functions allow us to add instrumentation code before and after every system call is executed.

PIN_AddSyscallEntryFunction(&syscallEntryCallback, NULL);
PIN_AddSyscallExitFunction(&syscallExitCallback, NULL);

We can get the system call number with the PIN_GetSyscallNumber function. This function will get the system call number in the current context. Likewise, we can get the arguments for the current system call with PIN_GetSyscallArgument where ‘i’ is the ordinal number of the argument value.

//sendto: 44, recvfrom: 45
PIN_GetSyscallNumber(ctxt, std);
PIN_GetSyscallArgument(ctxt, std, i);

By referencing the man pages for our intercepted system calls we know that the second argument holds a pointer to a buffer containing the message contents to be sent or received. The third argument is the length of that buffer. Once we intercept our system call, we can read the value of the buffer with the code below.

ADDRINT buf = PIN_GetSyscallArgument(ctxt, std, 1);
ADDRINT len = PIN_GetSyscallArgument(ctxt, std, 2);
int buflen = (int)len;
char *bufptr = (char *)buf;
for (int i = 0; i < buflen; i++, bufptr++) {
    fprintf(stdout, "%c", *bufptr);
}

The buffer pointer is our starting point and we walk “byte-by-byte” dereferencing the buffer pointer to read the value at each point until we hit the end length. Putting it all together, we can see some of the results below.

#../../../pin -t obj-intel64/syscalltest.so -- python http.py
call PIN_AddSyscallEntryFunction
call PIN_AddSyscallExitFunction
call PIN_StartProgram()
[TRUNCATED]
systemcall sendto: 44
buffer start: 0x7ff81ef26eb4
length: 117
GET / HTTP/1.1
Accept-Encoding: identity
Host: www.google.com
Connection: close
User-Agent: Python-urllib/2.7
[TRUNCATED]
systemcall recvfrom: 45
buffer start: 0x5644e5db7934
length: 8192
emtype="https://schema.org/WebPage" lang="en"><head><meta content="Search the world's information, including webpages, images, videos and more. Google has many special features to help you find exactly what you're looking for." name="description"><meta content="noodp" name="robots"><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"><meta content="/images/branding/googleg/1x/googleg_standard_color_128dp.png" itemprop="image"><title>Google</title><script>
[TRUNCATED]

The output of the example is far from clean but it does contain the information we want to intercept, the GET request and response. We can identify the system calls associated with network communications and even see the values of the arguments passed back and forth. Imagine if our binary from before sent login credentials in a GET request. We can retrieve that information.

systemcall sendto: 44
buffer start: 0x7f3b3dcf61c4
length: 146
GET /login?user=admin&pass=badpass HTTP/1.1
Accept-Encoding: identity
Host: www.notarealhost.com
Connection: close
User-Agent: Python-urllib/2.7

This example only scrapes the surface of the functionality that the Pin framework has to offer. In the future, I hope to create more complex tools for fuzzing.

You can find the example code at https://github.com/NetSPI/Pin.

References

Discover how the NetSPI BAS solution helps organizations validate the efficacy of existing security controls and understand their Security Posture and Readiness.

X