Dynamic Binary Analysis with Intel Pin
Intro to Intel Pin
Dynamic Binary Instrumentation (DBI) is a technique for analyzing a running program by injecting analysis code into it at runtime. The added analysis code, or instrumentation code, runs in the context of the instrumented program with access to real runtime values. DBI is powerful because it does not require the source code for a program, unlike source-level static analysis. In addition, it can instrument programs that generate code dynamically. To security researchers, DBI frameworks are invaluable tools because they support fuzzing, control flow analysis, and vulnerability detection without recompiling the target.
In this post, I’ll explore Intel’s Pin tool and Linux system call hooking. Pin offers a comprehensive framework for creating pin tools that instrument at different levels of granularity. You can find links to the Pin documentation in the references section. Also check out Gal Diskin’s slides from BlackHat for a more hands-on overview of Pin’s functionality.
Identifying Linux System Calls
The main function of our pin tool example will be to intercept and identify the system calls made by a program. For reference, we can view the Linux x86_64 system call table here: https://blog.rchapman.org/posts/Linux_System_Call_Table_for_x86_64/.
This table will help to identify the system calls by the mapped system call number.
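As a minimal sketch of that lookup, the table can be reduced to the handful of calls that show up in this post. The names `kSyscallNames` and `syscallName` are hypothetical helpers, not part of Pin, and the table is deliberately partial; the numbers are the x86_64 values from the table linked above.

```cpp
#include <map>
#include <string>

// Partial x86_64 Linux syscall table (hypothetical helper, not part of Pin).
// Only the calls that appear in this post are listed.
static const std::map<long, std::string> kSyscallNames = {
    {3, "close"},
    {41, "socket"},
    {44, "sendto"},
    {45, "recvfrom"},
    {59, "execve"},
};

// Returns the syscall name, or "unknown" for numbers missing from the
// partial table above.
std::string syscallName(long number) {
    auto it = kSyscallNames.find(number);
    if (it == kSyscallNames.end()) {
        return "unknown";
    }
    return it->second;
}
```

A full tool would likely generate this table from the kernel headers rather than maintain it by hand, since the numbers differ between architectures.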
One of the advantages of DBI is that we do not need the source code for analysis. For the sake of simplicity, though, the Python script below will be our target for instrumentation. We know that it returns the response of a GET request to Google.
import urllib2
page = urllib2.urlopen("https://www.google.com").read()
We can use the strace tool to see the system calls made.
# strace python http.py
execve("/usr/bin/python", ["python", "http.py"], [/* 19 vars */]) = 0
[TRUNCATED]
socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 3
sendto(3, "GET / HTTP/1.1\r\nAccept-Encoding:"..., 117, 0, NULL, 0) = 117
recvfrom(3, "HTTP/1.1 200 OK\r\nDate: Mon, 15 M"..., 8192, 0, NULL, NULL) = 1418
recvfrom(3, "d\"><meta content=\"@GoogleDoodles"..., 7422, 0, NULL, NULL) = 2836
recvfrom(3, "ocation,b=a.href.indexOf(\"#\");if"..., 4586, 0, NULL, NULL) = 4586
recvfrom(3, "b\" value=\"Google Search\" name=\"b"..., 8192, 0, NULL, NULL) = 3154
recvfrom(3, "", 5038, 0, NULL, NULL) = 0
recvfrom(3, "", 8192, 0, NULL, NULL) = 0
close(3) = 0
[TRUNCATED]
The strace output above gives us an abundance of information to work with, but we will focus on the system calls we want to intercept: sendto and recvfrom. These system calls are used to transmit messages to and from sockets. We can see the arguments provided to both of the system calls and we will try to read those same arguments with our pin tool.
Hooking sendto and recvfrom
The Pin API for system calls starts with two main functions: PIN_AddSyscallEntryFunction and PIN_AddSyscallExitFunction. These functions register callback functions for before and after the execution of the system call, respectively. The registered callback functions allow us to add instrumentation code before and after every system call is executed.
PIN_AddSyscallEntryFunction(&syscallEntryCallback, NULL);
PIN_AddSyscallExitFunction(&syscallExitCallback, NULL);
We can get the system call number with the PIN_GetSyscallNumber function, which reads it from the current context. Likewise, we can get the arguments for the current system call with PIN_GetSyscallArgument, where ‘i’ is the zero-based ordinal number of the argument. Both functions take the ctxt and std values that Pin passes to the registered callbacks.
//sendto: 44, recvfrom: 45
PIN_GetSyscallNumber(ctxt, std);
PIN_GetSyscallArgument(ctxt, std, i);
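One detail worth keeping in mind when using these calls: at the entry of recvfrom the kernel has not filled the buffer yet, and the number of bytes actually transferred is only known from the return value at exit. A common way to handle this is to record the arguments at entry and consume them at exit. The sketch below models that bookkeeping in plain C++ so it compiles without Pin. `PendingCall`, `onEntry`, and `onExit` are hypothetical names; in a real pin tool the values would presumably come from PIN_GetSyscallNumber and PIN_GetSyscallArgument in the entry callback and from PIN_GetSyscallReturn in the exit callback, keyed by the thread ID passed to both.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical record of a syscall that has entered but not yet returned.
struct PendingCall {
    long number;        // syscall number seen at entry
    std::uint64_t buf;  // second argument: address of the message buffer
    std::uint64_t len;  // third argument: size of that buffer in bytes
};

// Assumption: at most one in-flight syscall per thread, keyed by thread ID.
static std::unordered_map<unsigned, PendingCall> pending;

constexpr long kSendto = 44;
constexpr long kRecvfrom = 45;

// Entry side: remember only the calls we want to inspect.
void onEntry(unsigned tid, long number, std::uint64_t buf, std::uint64_t len) {
    if (number == kSendto || number == kRecvfrom) {
        pending[tid] = PendingCall{number, buf, len};
    }
}

// Exit side: returns how many bytes of the buffer hold valid data, or 0 if
// the call was not tracked, failed, or transferred nothing. On success the
// recorded entry arguments are copied to *out (if out is not null).
std::uint64_t onExit(unsigned tid, std::int64_t ret, PendingCall *out) {
    auto it = pending.find(tid);
    if (it == pending.end()) {
        return 0;
    }
    PendingCall call = it->second;
    pending.erase(it);
    if (ret <= 0) {
        return 0;
    }
    if (out != nullptr) {
        *out = call;
    }
    std::uint64_t transferred = static_cast<std::uint64_t>(ret);
    // Never report more bytes than the buffer could hold.
    return transferred < call.len ? transferred : call.len;
}
```

With this split, the first recvfrom in the strace output above would report 1418 valid bytes rather than the full 8192-byte buffer size.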
By referencing the man pages for our intercepted system calls we know that the second argument holds a pointer to a buffer containing the message contents to be sent or received. The third argument is the length of that buffer. Once we intercept our system call, we can read the value of the buffer with the code below.
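// Second argument (index 1): pointer to the message buffer
ADDRINT buf = PIN_GetSyscallArgument(ctxt, std, 1);
// Third argument (index 2): length of that buffer in bytes
ADDRINT len = PIN_GetSyscallArgument(ctxt, std, 2);
int buflen = (int)len;
char *bufptr = (char *)buf;
// Print the buffer one byte at a time
for (int i = 0; i < buflen; i++, bufptr++) {
    fprintf(stdout, "%c", *bufptr);
}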
The buffer pointer is our starting point, and we walk “byte-by-byte”, dereferencing the pointer to read the value at each position until we have read the full length. Putting it all together, we can see some of the results below.
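Printing every byte with "%c" works for text protocols, but binary data or a very large buffer can make the terminal output hard to read. One possible variation, sketched below, escapes non-printable bytes and caps how much is shown. `formatBuffer` and its `maxBytes` default are hypothetical, not part of the example tool. It also assumes the pointer is readable; Pin offers PIN_SafeCopy for copying application memory that might not be mapped, which a real tool may prefer before formatting.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdio>
#include <string>

// Renders up to maxBytes of a buffer as printable text (hypothetical helper).
// CR and LF are shown as \r and \n, other non-printable bytes as \xNN.
// A trailing "..." marks output that was cut short.
std::string formatBuffer(const char *buf, std::size_t len,
                         std::size_t maxBytes = 256) {
    std::string out;
    if (buf == nullptr) {
        return out;
    }
    std::size_t n = len < maxBytes ? len : maxBytes;
    for (std::size_t i = 0; i < n; ++i) {
        unsigned char c = static_cast<unsigned char>(buf[i]);
        if (c == '\n') {
            out += "\\n";
        } else if (c == '\r') {
            out += "\\r";
        } else if (std::isprint(c)) {
            out += static_cast<char>(c);
        } else {
            char hex[5];  // backslash, 'x', two hex digits, terminator
            std::snprintf(hex, sizeof hex, "\\x%02x", static_cast<unsigned>(c));
            out += hex;
        }
    }
    if (n < len) {
        out += "...";
    }
    return out;
}
```

In the loop from the post, the per-byte fprintf could then be replaced by a single print of `formatBuffer(bufptr, buflen)`.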
#../../../pin -t obj-intel64/syscalltest.so -- python http.py
call PIN_AddSyscallEntryFunction
call PIN_AddSyscallExitFunction
call PIN_StartProgram()
[TRUNCATED]
systemcall sendto: 44 buffer start: 0x7ff81ef26eb4 length: 117
GET / HTTP/1.1
Accept-Encoding: identity
Host: www.google.com
Connection: close
User-Agent: Python-urllib/2.7
[TRUNCATED]
systemcall recvfrom: 45 buffer start: 0x5644e5db7934 length: 8192
emtype="https://schema.org/WebPage" lang="en"><head><meta content="Search the world's information, including webpages, images, videos and more. Google has many special features to help you find exactly what you're looking for." name="description"><meta content="noodp" name="robots"><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"><meta content="/images/branding/googleg/1x/googleg_standard_color_128dp.png" itemprop="image"><title>Google</title><script>
[TRUNCATED]
The output of the example is far from clean, but it does contain the information we want to intercept: the GET request and the response. We can identify the system calls associated with network communication and even see the values of the arguments passed back and forth. Imagine if our binary from before sent login credentials in a plaintext GET request. We could retrieve that information.
systemcall sendto: 44 buffer start: 0x7f3b3dcf61c4 length: 146
GET /login?user=admin&pass=badpass HTTP/1.1
Accept-Encoding: identity
Host: www.notarealhost.com
Connection: close
User-Agent: Python-urllib/2.7
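Once a request like this has been captured, pulling the parameters out of it is ordinary string handling. The sketch below is one hedged way to do it: `queryParams` is a hypothetical helper that only looks at the first request line, does no percent-decoding, and is not part of the example tool.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Extracts key=value pairs from the query string of an HTTP request line
// such as "GET /login?user=admin&pass=badpass HTTP/1.1" (hypothetical helper).
// Values are returned as-is, without percent-decoding.
std::map<std::string, std::string> queryParams(const std::string &request) {
    const std::size_t npos = std::string::npos;
    std::map<std::string, std::string> params;

    // Only the first line of the request carries the path and query.
    std::string line = request.substr(0, request.find("\r\n"));
    std::size_t q = line.find('?');
    if (q == npos) {
        return params;
    }
    // The query ends at the space before the protocol version, if present.
    std::size_t end = line.find(' ', q);
    std::string query =
        line.substr(q + 1, end == npos ? npos : end - q - 1);

    // Split on '&', then split each pair on the first '='.
    std::size_t pos = 0;
    while (pos <= query.size()) {
        std::size_t amp = query.find('&', pos);
        if (amp == npos) {
            amp = query.size();
        }
        std::string pair = query.substr(pos, amp - pos);
        std::size_t eq = pair.find('=');
        if (eq != npos && eq > 0) {
            params[pair.substr(0, eq)] = pair.substr(eq + 1);
        }
        pos = amp + 1;
    }
    return params;
}
```

Applied to the captured buffer above, this would yield the `user` and `pass` entries from the request line.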
This example only scratches the surface of the functionality that the Pin framework has to offer. In the future, I hope to create more complex tools for fuzzing.
You can find the example code at https://github.com/NetSPI/Pin.
References
- https://media.blackhat.com/bh-us-11/Diskin/BH_US_11_Diskin_Binary_Instrumentation_Slides.pdf
- https://software.intel.com/sites/landingpage/pintool/docs/81205/Pin/html/
- https://software.intel.com/sites/landingpage/pintool/docs/81205/Pin/html/group__PIN__SYSCALL__API.html
- https://blog.rchapman.org/posts/Linux_System_Call_Table_for_x86_64/
- https://linux.die.net/man/2/sendto
- https://linux.die.net/man/2/recvfrom