Everything began when I first saw part of this cool talk about systems programming. I was so impressed by strace that I started tracing every syscall I could catch. Why? Because I did not know anything about the Linux kernel, and now I know a tiny bit more (keep in mind that I'm still a newbie).
System calls (aka syscalls) are the only way for programs to talk to your Linux kernel; you can see them as kernel API functions.
Basically, every time a program interacts with your computer hardware, it uses syscalls.
For instance, when a Python script writes to a file, Python doesn't actually write it itself; it asks the Linux kernel to do it.
To do so, it uses the write() syscall, most likely through one of the C system call wrappers declared in unistd.h, like:
ssize_t write(int fd, const void *buf, size_t count);
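To make this a bit more concrete, here is a minimal sketch in Python, assuming a Linux box. os.write() is Python's thin layer over that same write() wrapper; the write_all helper is a name I made up for illustration, not something from a library.

```python
import os
import tempfile


def write_all(fd: int, data: bytes) -> int:
    """Hypothetical helper: push all of `data` to `fd` with raw os.write() calls.

    write() may accept fewer bytes than requested, so we loop until done.
    """
    total = 0
    while total < len(data):
        total += os.write(fd, data[total:])
    return total


# mkstemp() gives us a raw file descriptor, just like open() does in C.
fd, path = tempfile.mkstemp()
try:
    written = write_all(fd, b"hello kernel\n")
finally:
    os.close(fd)

# Read the file back to check that the kernel really stored our bytes.
with open(path, "rb") as f:
    content = f.read()
os.remove(path)

print(written, content)
```

If you run this script under strace, you should find a write(...) line carrying the "hello kernel" string somewhere in the output.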
You can roughly understand any program's behavior by tracking its syscalls.
Most real-world programs interact with the network, or at least with the filesystem; processes that only compute things are becoming less and less common. By tracing their syscalls, you see in real time what a program is actually doing, without checking any of its source code!
strace is the answer: it can list every syscall made by a process.
Let's try it on a simple program, cat:
# Create an empty file
$ touch empty
# read it, as expected, it's empty
$ cat empty
# Now strace it !
$ strace cat empty
execve("/bin/cat", ["cat", "empty"], [/* 78 vars */]) = 0
[ ... ]
open("empty", O_RDONLY) = 3
[ ... ]
read(3, "", 65536) = 0
close(3) = 0
[ ... ]
+++ exited with 0 +++
OK, first of all, I don't understand most of the output, so I've hidden most of it behind those [ ... ].
But even this "filtered" output is easy to follow: cat opens the file, reads it (0 bytes, since it is empty), and closes it.
Yoohoo! cat reads files! Yeah... So far we've learned nothing.
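Still, the open / read / close sequence from the trace is easy to replay by hand. Here is a rough sketch in Python using the low-level os functions. cat_like is a hypothetical name of mine, and the 65536 buffer size is simply copied from the trace above; this is not how cat is actually implemented.

```python
import os
import tempfile


def cat_like(path: str, bufsize: int = 65536) -> bytes:
    """Hypothetical helper: return a file's content using open/read/close."""
    fd = os.open(path, os.O_RDONLY)       # open("empty", O_RDONLY) = 3
    try:
        chunks = []
        while True:
            chunk = os.read(fd, bufsize)  # read(3, ..., 65536)
            if not chunk:                 # empty result: read returned 0 (end of file)
                break
            chunks.append(chunk)
    finally:
        os.close(fd)                      # close(3) = 0
    return b"".join(chunks)


# Same experiment as above: create an empty file and "cat" it.
fd, empty_path = tempfile.mkstemp()
os.close(fd)
result = cat_like(empty_path)
os.remove(empty_path)

print(result)
```

On recent systems the trace of such a script will probably show openat instead of open, but the idea is the same.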
Let's strace ps. I use it every day, and I still have no clue how it works:
$ strace ps
The output is endless, but most of it is a repetition of something like this:
open("/proc/XXXX/status", O_RDONLY) = YY
read(YY, "Some string"..., 1024) = ZZZ
close(YY)
Let's think... YY is probably a file descriptor integer, ZZZ the number of bytes read by the read syscall, and XXXX kinda looks like a PID, which makes sense with ps...
So, what are those /proc/XXXX/status files?
$ cat /proc/3391/status
Name: firefox
State: S (sleeping)
Tgid: 3391
Ngid: 0
Pid: 3391
PPid: 2831
TracerPid: 0
Uid: 1066 1066 1066 1066
Gid: 65536 65536 65536 65536
[ ... ]
That looks like some stats about a firefox process:
$ pgrep firefox
3391
Yes! It's the same number! So there is a /proc/PID/ directory for at least some processes, containing information about them, and that's how ps fetches that information!
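To check that guess, here is a tiny ps-like sketch in Python, assuming a Linux system with /proc mounted. read_status and list_processes are hypothetical helpers written for this post; the real ps surely does a lot more than this.

```python
import os


def read_status(pid) -> dict:
    """Hypothetical helper: parse /proc/<pid>/status into a {field: value} dict."""
    fields = {}
    with open(f"/proc/{pid}/status", errors="replace") as f:
        for line in f:
            # Lines look like "Name:\tfirefox" or "Pid:\t3391".
            key, _, value = line.partition(":")
            fields[key] = value.strip()
    return fields


def list_processes() -> dict:
    """Hypothetical helper: map each visible PID to its process name."""
    names = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():           # only the numeric directories are PIDs
            continue
        try:
            names[int(entry)] = read_status(entry)["Name"]
        except (FileNotFoundError, ProcessLookupError, PermissionError, KeyError):
            continue                      # the process vanished or is not readable
    return names


processes = list_processes()
me = read_status(os.getpid())
print(me["Name"], me["Pid"], len(processes))
```

Running strace on this script should show the same open, read, close pattern on /proc/XXXX/status that we saw with ps.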
But what are those /proc/* files?
$ man proc
The proc filesystem is a pseudo-filesystem which provides an interface to kernel data structures. It is commonly mounted at /proc.
Cool! The kernel directly provides us with information about running processes through the /proc/ filesystem, and ps just reads it.
This very same man page tells us some more useful stuff.
For instance, if you want to list every environment variable exported by your current shell session, you can now type:
$ strings /proc/self/environ
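The same file can be read from a program too. A minimal sketch in Python, assuming Linux: according to the proc man page the entries in environ are separated by null bytes, which is why strings is handy on the command line. read_environ is a hypothetical helper name, not a standard function.

```python
import os


def read_environ(pid="self") -> dict:
    """Hypothetical helper: parse /proc/<pid>/environ into a {name: value} dict."""
    with open(f"/proc/{pid}/environ", "rb") as f:
        raw = f.read()
    env = {}
    for entry in raw.split(b"\0"):        # entries are NUL-separated
        if not entry:
            continue
        key, _, value = entry.partition(b"=")
        env[os.fsdecode(key)] = os.fsdecode(value)
    return env


env = read_environ()
for name in sorted(env):
    print(f"{name}={env[name]}")
```

Note that this file holds the environment the process was started with, so variables changed later inside the program will not show up there.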
By using strace, you can learn stuff about your kernel, and even debug programs! It's the perfect tool for newbies like me.