After discovering strace, and while using it on itself, I came across the ptrace system call, and it's really amazing!

I've managed to implement a simple strace-like program using the ptrace syscall, and it works on my 64-bit laptop. This whole low-level stuff is pretty new to me, and this mini project was a cool introduction to it :).

Ptrace? What is it?

ptrace is a Linux system call that allows a process to spy on another one: its memory, its registers, its execution flow... EVERYTHING!

As far as I know, ptrace is mainly used to build debugging tools like strace or even gdb, but its features make it a perfect tool to reverse-engineer obscure binaries. I've also heard about a rootkit using it.

How does it work?

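Basically, ptrace allows a process (the tracer) to spy on another process (the tracee). The simplest way to set this up, and the one used here, is to have the tracee opt in: it calls ptrace itself (with the right arguments) to become "traceable". (A tracer can also attach to an already running process with PTRACE_ATTACH, but I won't cover that here.)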

Make a process "traceable"

So, imagine you want to spy on ls: how do you make it call ptrace? This is done in 2 steps:

  1. you write a simple program that calls ptrace() with the PTRACE_TRACEME argument (both are declared in the sys/ptrace.h header file):

     ptrace(PTRACE_TRACEME, 0, NULL, NULL);
    
  2. In that same program you call execlp(), which replaces the current process image with one from another binary file (in this case ls):

      execlp("/bin/ls", "ls", (char *) NULL);
    

TADA! By running it, you both call ptrace and execute ls in the same process, making it "traceable".

But managing two processes can be tricky (remember that I'm a newbie). To make it easier, let's launch the tracee and the tracer from the same program, simply by using the fork function:

#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    // duplicate the current process
    pid_t pid = fork();
    if (pid == 0) {
        // This part is only executed by the child process
        // in this case: the TRACEE
        
        // Allow the parent process to trace it
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);

        // replace this process image by "/bin/ls"
        execlp("/bin/ls", "ls", (char *) NULL);

    } else {
        // Executed by the TRACER

        int status;
        // wait for child notification
        wait(&status);

        // HERE We can start to spy the child process
        printf("Ready to spy process %d\n", pid); 
    }
    return 0;
}

Let's run it:

groogroot@laptop $ gcc main.c
groogroot@laptop $ ls
a.out main.c
groogroot@laptop $ ./a.out
Ready to spy process 6146
a.out main.c

YOOhOO! a.out successfully executed ls, now we can get to the fun part: spying on it!

Spying a process

What we need to know about syscalls:

To properly use ptrace, we need to roughly understand how a system call is made. Here is the basic stuff we need to know to get started:

On an x86_64 CPU, a syscall is made with the asm instruction syscall (no shit! Oo). When this instruction is reached, the CPU switches to the kernel, which reads the process's registers to figure out what action was requested.
Basically, the kernel reads the rax register, which must contain an ID representing a syscall (SYS_open, SYS_read ...).

Then, depending on the syscall, the kernel reads some other registers which act as function arguments (rdi, rsi and rdx hold the first three). A kind person summarized what the kernel expects inside each register for every syscall in this handy table.

Finally, after performing the requested syscall, the kernel puts the return value in rax and lets the process continue its execution.

Some ptrace features:

ptrace provides several ways to spy on a process; here we will use only 3 of them:

  • ptrace(PTRACE_SYSCALL, pid, NULL, NULL): restart the stopped tracee pid, like PTRACE_CONT does, but arrange for it to stop again the next time it enters or exits a system call. The tracer learns about each stop through wait().
  • ptrace(PTRACE_GETREGS, pid, NULL, &regs): copy a snapshot of the tracee's CPU registers (as they were when it stopped) into a user_regs_struct structure (defined in sys/user.h).
  • ptrace(PTRACE_PEEKTEXT, pid, addr, NULL): return a word (8 bytes on an x86_64 machine) read at the address addr in the memory of the process pid (the tracee).

Let's put it all together:

For the sake of simplicity, I've only made a "strace-like" program that logs the open and write syscalls.
The main algorithm is in fact pretty easy:

  1. call ptrace(PTRACE_SYSCALL, pid, NULL, NULL) and wait for the tracee to stop (which means it reached or exited a syscall).
  2. call ptrace(PTRACE_GETREGS, pid, NULL, &regs) to get the registers, and look at orig_rax (the value rax had when the syscall was made, since rax itself holds the return value once the syscall is done):
    • If orig_rax == SYS_open, the tracee tries to open a file:
      • read the string containing the filename (its starting address is stored inside the register rdi), using ptrace(PTRACE_PEEKTEXT, pid, addr, NULL)
      • print it
    • If orig_rax == SYS_write, the tracee tries to write into a file:
      • read the string to be written by the tracee, which is stored at the address held in the register rsi, using ptrace(PTRACE_PEEKTEXT, pid, addr, NULL)
      • print it
  3. go back to step 1

Finally, the whole implementation looks like this:

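#include <errno.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <sys/user.h>
#include <sys/reg.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <string.h>
#include <stdlib.h>
#define MAX_LEN 1000

/**
 * Copy a string from "addr" (from the process "pid") to "buff".
 * Stops at the first '\0' byte, on a read error, or when "buff" is full.
 */
int read_addr_into_buff(const pid_t pid, const unsigned long long addr, char * buff, unsigned int buff_size){
    unsigned int bytes_read = 0;
    long word;
    memset(buff, '\0', buff_size);
    // this loop reads a string, word by word (8 bytes), and always keeps
    // the end of buff as a terminating '\0'
    while (bytes_read + sizeof(long) < buff_size) {
        // PEEKTEXT returns the word itself, so -1 can be valid data:
        // errno is the only way to detect a failure
        errno = 0;
        word = ptrace(PTRACE_PEEKTEXT, pid, (void *) (addr + bytes_read), NULL);
        if (word == -1 && errno != 0)
            break;
        // memcpy avoids writing through a misaligned long pointer
        memcpy(buff + bytes_read, &word, sizeof(long));
        bytes_read += sizeof(long);
        // stop as soon as the word contains the end of the string
        if (memchr(&word, '\0', sizeof(long)) != NULL)
            break;
    }
    return bytes_read;
}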

int main(int argc, char* argv[]){
    if (argc < 2) {
        fprintf(stderr, "Missing arguments:\n\t%s <binary> [binary args]\n", argv[0]);
        return EXIT_FAILURE;
    }

    pid_t pid = fork();
    if (pid == 0) {
        // launch child process
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        execvp(argv[1], &argv[1]);
        // only reached if execvp failed
        perror("execvp");
        _exit(EXIT_FAILURE);
    } else {
        char str[MAX_LEN];
        int status;
        int entry_flag = 1;  // flag to distinguish before/after syscall signals
        struct user_regs_struct regs; // struct representing CPU registers

        // loop on signal produced by child process
        while (1) {
            // wait for child notification
            wait(&status);
            // quit if child terminated (normally, or killed by a signal)
            if (WIFEXITED(status) || WIFSIGNALED(status))
                break;

            // spy registers
            ptrace(PTRACE_GETREGS, pid, NULL, &regs);

            // orig_rax contains the syscall identifier
            switch (regs.orig_rax) {
                case SYS_write:
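                    if (entry_flag) {
                        // read the buffer at the address stored in the rsi register;
                        // it is not '\0'-terminated, so print at most rdx bytes (its length)
                        read_addr_into_buff(pid, regs.rsi, str, MAX_LEN);
                        int len = regs.rdx < MAX_LEN - 1 ? (int) regs.rdx : MAX_LEN - 1;
                        fprintf(stderr, "WRITE: %.*s\n", len, str);
                    }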
                    entry_flag = !entry_flag;
                    break;

                case SYS_open:
                    if (entry_flag) {
                        // read string at the address stored in the rdi register
                        read_addr_into_buff(pid, regs.rdi, str, MAX_LEN);
                        fprintf(stderr, "OPEN: %s\n",  str);
                    }
                    entry_flag = !entry_flag;
                    break;

                default:
                    entry_flag = 1;
                    break;
            }

            // Continue child execution, and:
            // - stop it again when it reaches a syscall,
            // - stop it once more after the syscall execution,
            ptrace(PTRACE_SYSCALL, pid, NULL, NULL);

        }
    }
    return EXIT_SUCCESS;
}

Note:

  • The wait() function sets an integer value which represents the status of the child process; if it terminated, we exit the while loop.
  • The function read_addr_into_buff() loops on readable words (8 bytes) starting from addr.
  • To mimic the strace software, the tracer writes to stderr.
  • To make that piece of code more useful, I've replaced the hardcoded "/bin/ls" path with the main arguments.
  • Because PTRACE_SYSCALL stops the tracee both when it enters and when it exits a syscall, we need to keep track of which stop we are at; this is done through entry_flag.

Let's run it!

groogroot@laptop $ gcc main.c -o strace
groogroot@laptop $ ./strace ls
OPEN: /etc/ld.so.cache
OPEN: /lib/x86_64-linux-gnu/libselinux.so.1
OPEN: /lib/x86_64-linux-gnu/libc.so.6
OPEN: /lib/x86_64-linux-gnu/libpcre.so.3
OPEN: /lib/x86_64-linux-gnu/libdl.so.2
OPEN: /lib/x86_64-linux-gnu/libpthread.so.0
OPEN: /proc/filesystems
OPEN: /usr/lib/locale/locale-archive
OPEN: .
WRITE: main.c strace

main.c strace

YOUHOU! It works!

Conclusion

Yeah! I've learnt a lot!
I had no idea it could be so easy to inspect processes, the kernel does all the work for us!

This whole thing was new to me, and I'm also far from being an experienced C programmer, so if your eyes are bleeding right now, please tell me why by commenting on this code snippet. :)

Source

The ptrace man page is so big that it became confusing; these articles helped me a lot:

  • filippo.io: which lists all the registers needed for each syscall on a 64-bit CPU,
  • linuxjournal: which explained how to use ptrace on a 32-bit CPU.

Nov. 8 2016, 10:30 pm