After discovering strace, and while using it on itself, I came across the ptrace system call, and it's really amazing!
I've managed to implement a simple strace-like program using the ptrace syscall, which works on my 64-bit laptop. This whole low-level stuff is pretty new to me, and this mini project was a cool introduction to it :).
ptrace is a Linux system call that allows a process to spy on another one: its memory, its registers, its execution flow... EVERYTHING!
As far as I know, ptrace is mainly used to build debugging tools like strace or even gdb, but its features also make it a perfect tool to reverse-engineer obscure binaries. I've also heard about a rootkit using it.
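Basically, in the approach used here, ptrace allows a process (the tracer) to spy on a tracee process only if the tracee permits it. To do so, the tracee must call ptrace itself (using the right arguments) to become "traceable". (ptrace also has a PTRACE_ATTACH request for attaching to an already running process, which is not covered here.)
So, imagine you want to spy on ls: how do you make it call ptrace?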
This is done in 2 steps:
1. You make a simple program that calls ptrace() with the PTRACE_TRACEME argument (both are defined in the sys/ptrace.h header file):
ptrace(PTRACE_TRACEME, 0, NULL, NULL);
2. In that same program you call execlp(), which will replace the current process image with one from another binary file (in this case ls):
execlp("/bin/ls", "ls", (char *) NULL);
TADA! By running it, you both call ptrace and execute ls in the same process, making it "traceable".
But managing two processes can be tricky (remember that I'm a newbie). To make it easier, let's launch the tracee process and the tracer in the same program, simply by using the fork function:
#include <stdio.h>
#include <stdlib.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    // duplicate the current process
    pid_t pid = fork();
    if (pid == 0) {
        // This part is only executed by the child process
        // in this case: the TRACEE
        // Allow the parent process to trace it
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        // replace this process image by "/bin/ls"
        execlp("/bin/ls", "ls", (char *) NULL);
        // only reached if execlp() failed
        perror("execlp");
        return EXIT_FAILURE;
    } else {
        // Executed by the TRACER
        int status;
        // wait for child notification
        wait(&status);
        // HERE We can start to spy the child process
        printf("Ready to spy process %d\n", pid);
    }
    return 0;
}
Let's run it:
groogroot@laptop $ gcc main.c
groogroot@laptop $ ls
a.out main.c
groogroot@laptop $ ./a.out
Ready to spy process 6146
a.out main.c
YOOhOO! a.out successfully executed ls, now we can get to the fun part: spying on it!
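Before going further, it can help to check what wait() actually reported. After PTRACE_TRACEME, a successful exec makes the kernel send SIGTRAP to the tracee, so the tracer should see the child as "stopped by SIGTRAP" rather than "exited". The sketch below shows one way to test that; is_trace_stop() is a hypothetical helper name, not part of the program above.

```c
#include <signal.h>
#include <sys/wait.h>

/* is_trace_stop() is a hypothetical helper, not part of the original program.
 * It returns 1 when `status` (as filled in by wait()) says the child is
 * currently stopped by SIGTRAP, and 0 otherwise (exited, killed, ...). */
int is_trace_stop(int status) {
    return WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP;
}
```

In the tracer branch, this could be called right after wait(&status), for example to print an error instead of "Ready to spy" when the child has already exited because the exec failed.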
To properly use ptrace, we need to roughly understand how a system call is made. Here is the basic stuff we need to know to get started:
On an x86_64 CPU, a syscall is made with the asm instruction syscall (no shit! Oo). When this instruction is reached, the kernel takes over from the process and reads its registers to figure out what action the process requested.
Basically, the kernel reads the rax register, which must contain an ID representing a syscall (SYS_open, SYS_read...).
Then, depending on the syscall called, the kernel reads some other registers which act as function arguments (rdi, rsi, rdx, r10, r8 and r9, in that order). A kind person summarized what the kernel expects inside each register for every syscall in this handy table.
Finally, after performing the requested syscall, the kernel sets some register values (rax holds the return value) and lets the process continue its execution.
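To make the register convention a bit more concrete, here is a small sketch that performs a write without going through the usual write() wrapper. It relies on the C library's generic syscall() function, which on x86_64 places the syscall ID in rax and the following arguments in rdi, rsi and rdx before executing the syscall instruction. raw_write() is a hypothetical name chosen for this illustration.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* raw_write() is a hypothetical helper used only for illustration.
 * On x86_64 the call below ends up as:
 *   rax = SYS_write, rdi = fd, rsi = msg, rdx = strlen(msg)
 * The kernel's result comes back in rax; syscall() returns it,
 * or returns -1 and sets errno when the kernel reports an error. */
long raw_write(int fd, const char *msg) {
    return syscall(SYS_write, fd, msg, strlen(msg));
}
```

Calling raw_write(1, "hello\n") should behave like write(1, "hello\n", 6). These are exactly the registers a tracer gets to look at when the tracee stops on a write syscall.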
ptrace provides several ways to spy on a process; here we will use only 3 of them:
- ptrace(PTRACE_SYSCALL, pid, NULL, NULL): restart the stopped pid process and make it stop again when it next enters or exits a system call; the tracer is then notified through wait(). It works like ptrace(PTRACE_CONT, pid, NULL, NULL), which simply resumes the tracee, except that PTRACE_SYSCALL also arranges for the stop at the next syscall entry or exit.
- ptrace(PTRACE_GETREGS, pid, NULL, &regs): copy a snapshot of the tracee's CPU registers (taken when it stopped) into a user_regs_struct structure (defined in sys/user.h).
- ptrace(PTRACE_PEEKTEXT, pid, addr, NULL): return a word (8 bytes on an x86_64 machine) read at the address addr in the memory of the process pid (the tracee).
For the sake of simplicity, I've only made a "strace-like" program that logs the open and write syscalls.
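One detail about PTRACE_PEEKTEXT is easy to miss: since the word it read is the return value, a result of -1 can be either valid data or an error. The ptrace man page suggests clearing errno before the call and checking it afterwards. A minimal sketch of that idiom follows; peek_word() is a hypothetical wrapper name.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>

/* peek_word() is a hypothetical wrapper, not part of the original program.
 * It reads one word at `addr` in the memory of the tracee `pid`.
 * Returns 0 and stores the word in *out on success, -1 on error. */
int peek_word(pid_t pid, unsigned long addr, long *out) {
    errno = 0; /* so that a stale errno is not mistaken for a failure */
    long word = ptrace(PTRACE_PEEKTEXT, pid, (void *) addr, NULL);
    if (word == -1 && errno != 0)
        return -1; /* real error: not a tracee, unreadable address, ... */
    *out = word;   /* here -1 would be genuine data */
    return 0;
}
```

The call only succeeds on a process that is currently traced by the caller and stopped; on any other pid it is expected to fail.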
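The main algorithm is in fact pretty easy:
1. Call ptrace(PTRACE_SYSCALL, pid, NULL, NULL) and wait for a signal from the tracee (which means it reached or exited a syscall).
2. Call ptrace(PTRACE_GETREGS, pid, NULL, &regs) to get the value of the registers.
3. Depending on the syscall ID in rax (saved as orig_rax in the snapshot):
   - write: read the string at the address stored in rsi, using ptrace(PTRACE_PEEKTEXT, pid, addr, NULL)
   - open: read the string at the address stored in rdi, using ptrace(PTRACE_PEEKTEXT, pid, addr, NULL)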
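The stop-and-resume loop in step 1 can be sketched on its own, without looking at any register. The version below only counts how many syscalls the child enters. It uses the PTRACE_O_TRACESYSGOOD option, which makes syscall stops show up as SIGTRAP | 0x80 so they cannot be confused with ordinary signals; that option is an addition for this sketch, as is the count_syscalls() name.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* count_syscalls() is a hypothetical helper, not part of the original program.
 * It runs argv[0] under ptrace and returns how many syscalls the child
 * entered, or -1 if the child could not be started or traced. */
long count_syscalls(char *const argv[]) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* TRACEE: ask to be traced, then become the target program */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(127);
        execvp(argv[0], argv);
        _exit(127); /* only reached if execvp() failed */
    }

    int status;
    /* TRACER: the first stop is the SIGTRAP sent after a successful exec */
    if (waitpid(pid, &status, 0) == -1 || !WIFSTOPPED(status))
        return -1;
    /* ask the kernel to report syscall stops as SIGTRAP | 0x80 */
    if (ptrace(PTRACE_SETOPTIONS, pid, NULL,
               (void *) (long) PTRACE_O_TRACESYSGOOD) == -1) {
        kill(pid, SIGKILL);
        waitpid(pid, &status, 0);
        return -1;
    }

    long entries = 0;
    int at_entry = 1; /* the next syscall stop will be an entry */
    for (;;) {
        /* resume the child until its next syscall entry or exit */
        if (ptrace(PTRACE_SYSCALL, pid, NULL, NULL) == -1)
            return -1;
        if (waitpid(pid, &status, 0) == -1)
            return -1;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            break; /* child is gone */
        if (WIFSTOPPED(status) && WSTOPSIG(status) == (SIGTRAP | 0x80)) {
            if (at_entry)
                entries++;
            at_entry = !at_entry; /* entry and exit stops alternate */
        }
        /* other stops are signals; this sketch does not forward them */
    }
    return entries;
}
```

For example, passing an argv of {"true", NULL} should return a small positive number. The full program that follows keeps its own entry/exit flag instead of this option.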
Finally, the whole implementation looks like this:
#include <errno.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <sys/user.h>
#include <sys/reg.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <string.h>
#include <stdlib.h>

#define MAX_LEN 1000

/**
 * Copy a string from "addr" (in the process "pid") to "buff".
 * Returns the number of bytes copied.
 */
int read_addr_into_buff(const pid_t pid, const unsigned long long addr, char *buff, unsigned int buff_size) {
    unsigned int bytes_read = 0;
    long word;

    memset(buff, '\0', buff_size);
    // this loop reads a string, word by word (8 bytes),
    // and always leaves room for the final '\0'
    while (bytes_read + sizeof(long) < buff_size) {
        // -1 can be valid data, so errno tells errors apart
        errno = 0;
        word = ptrace(PTRACE_PEEKTEXT, pid, (void *) (addr + bytes_read), NULL);
        if (word == -1 && errno != 0)
            break; // unreadable address
        memcpy(buff + bytes_read, &word, sizeof(long));
        bytes_read += sizeof(long);
        // stop as soon as a word contains the end of the string
        if (memchr(&word, '\0', sizeof(long)) != NULL)
            break;
    }
    return bytes_read;
}
int main(int argc, char *argv[]) {
    if (argc < 2) {
        fprintf(stderr, "Missing arguments:\n\t%s <binary> [binary args]\n", argv[0]);
        return EXIT_FAILURE;
    }
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return EXIT_FAILURE;
    }
    if (pid == 0) {
        // launch child process
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        execvp(argv[1], &argv[1]);
        // only reached if execvp() failed
        perror("execvp");
        return EXIT_FAILURE;
    } else {
        char str[MAX_LEN];
        int status;
        int len;
        int entry_flag = 1; // flag to distinguish before/after syscall signals
        struct user_regs_struct regs; // struct representing CPU registers
        // loop on signal produced by child process
        while (1) {
            // wait for child notification
            if (wait(&status) == -1)
                break;
            // quit if child terminated (normally or killed by a signal)
            if (WIFEXITED(status) || WIFSIGNALED(status))
                break;
            // spy registers
            ptrace(PTRACE_GETREGS, pid, NULL, &regs);
            // orig_rax contains the syscall identifier
            switch (regs.orig_rax) {
            case SYS_write:
                if (entry_flag) {
                    // read the data at the address stored in the rsi register
                    read_addr_into_buff(pid, regs.rsi, str, MAX_LEN);
                    // that data is not NUL-terminated: its length is in rdx
                    len = regs.rdx < MAX_LEN ? (int) regs.rdx : MAX_LEN - 1;
                    fprintf(stderr, "WRITE: %.*s\n", len, str);
                }
                entry_flag = !entry_flag;
                break;
            case SYS_open:
                if (entry_flag) {
                    // read string at the address stored in the rdi register
                    read_addr_into_buff(pid, regs.rdi, str, MAX_LEN);
                    fprintf(stderr, "OPEN: %s\n", str);
                }
                entry_flag = !entry_flag;
                break;
            default:
                entry_flag = 1;
                break;
            }
            // Continue child execution, and:
            // - raise a signal when it reaches a syscall,
            // - raise another signal after the syscall execution,
            ptrace(PTRACE_SYSCALL, pid, NULL, NULL);
        }
    }
    return EXIT_SUCCESS;
}
- The wait() function sets an integer value which represents the status of the child process; if it terminated, we exit the while loop.
- read_addr_into_buff() loops on readable words (8 bytes) starting from addr.
- Like the strace software, the tracer writes to stderr.
groogroot@laptop $ gcc main.c -o strace
groogroot@laptop $ ./strace ls
OPEN: /etc/ld.so.cache
OPEN: /lib/x86_64-linux-gnu/libselinux.so.1
OPEN: /lib/x86_64-linux-gnu/libc.so.6
OPEN: /lib/x86_64-linux-gnu/libpcre.so.3
OPEN: /lib/x86_64-linux-gnu/libdl.so.2
OPEN: /lib/x86_64-linux-gnu/libpthread.so.0
OPEN: /proc/filesystems
OPEN: /usr/lib/locale/locale-archive
OPEN: .
WRITE: main.c stracemain.c strace
YOUHOU ! It Works !
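A caveat when reproducing this on a newer system: the OPEN lines may not show up at all. A likely reason (an assumption, not something observed in the run above) is that recent C libraries implement open() with the openat syscall, whose path is the second argument (rsi) instead of the first (rdi). The sketch below, with the hypothetical name path_arg_index(), shows how a tracer could pick the right argument for both.

```c
#include <sys/syscall.h>

/* path_arg_index() is a hypothetical helper, not part of the original program.
 * It returns the 0-based position of the path argument for the syscalls
 * handled here, or -1 for any other syscall.
 * On x86_64, position 0 is rdi and position 1 is rsi. */
int path_arg_index(long syscall_id) {
    switch (syscall_id) {
#ifdef SYS_open /* not defined on every architecture */
    case SYS_open:   /* open(path, flags, mode) */
        return 0;
#endif
    case SYS_openat: /* openat(dirfd, path, flags, mode) */
        return 1;
    default:
        return -1;
    }
}
```

In the tracer's switch, this would amount to adding a case SYS_openat that reads the string from regs.rsi rather than regs.rdi.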
Yeah! I've learnt a lot!
I had no idea it could be so easy to inspect processes: the kernel does all the work for us!
This whole thing was new to me, and I'm also far from being an experienced C programmer, so if your eyes are bleeding right now, please tell me why by commenting on this code snippet. :)
The ptrace man page is so big that it became confusing; these articles helped me a lot: