Saturday, April 14, 2012

Zombie process

When you run the ps command, those processes (if any) that have a status of Z are called "zombies".
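If you'd rather not eyeball the ps output, a few lines of Python can pick out the Z entries for you. This is only a sketch: it assumes a ps that accepts `-eo pid,ppid,stat,comm` (the procps version on Linux does), and the function names `find_zombies` and `list_zombies` are made up for this example.

```python
import subprocess


def find_zombies(ps_output):
    """Return (pid, ppid, command) for each line whose STAT field starts with Z."""
    zombies = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        fields = line.split(None, 3)         # pid, ppid, stat, rest-of-line
        if len(fields) < 4:
            continue
        pid, ppid, stat, command = fields
        if stat.startswith("Z"):             # "Z", "Z+", "Zs", ...
            zombies.append((int(pid), int(ppid), command.strip()))
    return zombies


def list_zombies():
    """Run ps and report the zombies it shows (assumes the flags above work)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,ppid,stat,comm"],
        capture_output=True, text=True, check=True,
    )
    return find_zombies(result.stdout)
```

Calling `list_zombies()` gives you each zombie's PID together with its PPID, which is the number you'll want later on.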

Naturally, when people see a zombie process, the first thing that many try to do is to kill the zombie, using kill or (horrors!) kill -9. This won't work, however: you can't kill a zombie, it's already dead ;-)
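You can watch kill -9 bounce off a zombie with a small experiment. The sketch below assumes Linux and a Python whose os module provides `os.waitid`; the name `kill_zombie_demo` is invented here. It forks a child that exits right away, waits until the child is dead but not yet collected, sends it SIGKILL, and only then collects its status.

```python
import os
import signal


def kill_zombie_demo(exit_code=3):
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)  # child: die immediately

    # Block until the child has exited, but leave it uncollected (WNOWAIT),
    # so from here on it is a zombie.
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)

    # The signal is accepted without error, but there is nothing left to kill.
    os.kill(pid, signal.SIGKILL)

    # Collect the child now. Its status still shows a normal exit.
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status), os.WEXITSTATUS(status)
```

The status that comes back is the child's own exit code, not "killed by signal 9": the SIGKILL had no effect.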

When a process terminates ("dies"), whether it exits on its own or is told to by a signal such as TERM, a few last tasks still have to be finished on its behalf. These include closing open files and releasing any allocated resources (memory, swap space, that sort of thing). These "housekeeping" tasks are supposed to happen very quickly. Once they're completed, the final thing that has to happen before the process is truly gone is for its exit status to be reported to its parent. This is generally where things go wrong.

Each process is assigned a unique Process ID (PID). Each process also has an associated parent process ID (PPID), which identifies the process that spawned it (a PPID of 1 means that the original parent has already terminated and the process has been inherited by the init process). While the parent is still running, the system keeps track of all the children it has spawned. Their PIDs cannot be reused by other (new) processes until the parent knows for certain that the child process is done.
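To see the PID/PPID relationship for yourself, here is a minimal sketch (the function name `child_reports_ppid` is made up for illustration). The child asks the system for its PPID and passes the answer back to the parent through a pipe.

```python
import os


def child_reports_ppid():
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: report who our parent is, then exit.
        os.close(read_end)
        os.write(write_end, str(os.getppid()).encode())
        os._exit(0)

    # Parent: read the child's report, then collect the child.
    os.close(write_end)
    reported_ppid = int(os.read(read_end, 64).decode())
    os.close(read_end)
    os.waitpid(pid, 0)
    return os.getpid(), pid, reported_ppid
```

The third value returned (the PPID as seen by the child) matches the first (the parent's own PID), and the child's PID is a different number.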

When a child terminates and its housekeeping is complete, its exit status is held for the parent, which collects it by calling wait(). The exit code is one byte (thus we have 256 possible exit codes, zero being the usual indicator of "everything went great"). If this status never gets collected, the process entry is kept (in "zombie" status) in order to reserve its PID number ... the parent hasn't picked up the status yet, and until it does, no new process is allowed to reuse that PID number, just in case.
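The one-byte exit code is easy to check. In this sketch (function name invented for the example) the child exits with whatever code you pass in, the parent dawdles for a moment, and then collects the status with waitpid(). Only the low 8 bits of the code survive the trip.

```python
import os
import time


def exit_status_seen_by_parent(code):
    pid = os.fork()
    if pid == 0:
        os._exit(code)  # child: exit with the requested code

    # Parent: do nothing for a moment. The child has almost certainly
    # exited by now and is sitting around as a zombie.
    time.sleep(0.1)

    # Collecting the status is what finally releases the zombie.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)
```

Passing 300, for example, comes back as 44, because 300 doesn't fit in one byte (300 - 256 = 44).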

To get rid of a zombie, you should try killing its parent (the process whose PID is the zombie's PPID), which will orphan the zombie. The init process will then inherit the zombie, and this should allow the process to finish terminating, since the init process is always in a wait() state (i.e., ready to receive exit status reports from its children).
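Here is a rough sketch of that procedure on Linux, where `/proc/<pid>/stat` holds the state letter and the PPID. Both function names are hypothetical, and the second one really does send TERM to the parent, so treat it as an illustration rather than something to run blindly.

```python
import os
import signal


def parse_stat(stat_line):
    """Extract (pid, state, ppid) from the contents of /proc/<pid>/stat."""
    # The command name sits in parentheses and may itself contain spaces
    # or parentheses, so split on the LAST closing parenthesis.
    head, tail = stat_line.rsplit(")", 1)
    pid = int(head.split("(", 1)[0])
    fields = tail.split()
    return pid, fields[0], int(fields[1])


def terminate_parent_of_zombie(zombie_pid):
    """Send TERM to the parent of a zombie; refuse if the parent is init."""
    with open(f"/proc/{zombie_pid}/stat") as f:
        _, state, ppid = parse_stat(f.read())
    if state != "Z":
        raise ValueError(f"process {zombie_pid} is not a zombie (state {state})")
    if ppid <= 1:
        raise ValueError("the parent is init; do not kill it")
    os.kill(ppid, signal.SIGTERM)
    return ppid
```

The check for a PPID of 1 is there on purpose: that case is not something a signal will fix.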

Generally, though, zombies get cleaned up without any help. The parent eventually gets around to calling wait(), the exit status is collected, and all is well.
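In practice it is usually the parent that clears its zombies, by collecting each child's exit status when it is told a child has died. One common way for a parent to do that is a SIGCHLD handler, sketched below. The function name is made up, and the code assumes it runs in the main thread of a process with no other children (waitpid(-1) collects any child).

```python
import os
import signal
import time


def run_children_and_reap(exit_codes, timeout=5.0):
    reaped = {}

    def on_sigchld(signum, frame):
        # Several exits can be merged into one signal, so keep collecting
        # until no exited child is left.
        while True:
            try:
                pid, status = os.waitpid(-1, os.WNOHANG)
            except ChildProcessError:
                break  # no children at all
            if pid == 0:
                break  # children exist, but none has exited yet
            reaped[pid] = os.WEXITSTATUS(status)

    previous = signal.signal(signal.SIGCHLD, on_sigchld)
    try:
        expected = {}
        for code in exit_codes:
            pid = os.fork()
            if pid == 0:
                os._exit(code)  # child: exit immediately
            expected[pid] = code
        # Wait (with a timeout) for the handler to collect every child.
        deadline = time.monotonic() + timeout
        while len(reaped) < len(expected) and time.monotonic() < deadline:
            time.sleep(0.01)
    finally:
        # Put back whatever handler was installed before.
        signal.signal(signal.SIGCHLD,
                      previous if previous is not None else signal.SIG_DFL)
    return expected, reaped
```

A parent written this way never leaves zombies lying around for long, because every child is collected shortly after it dies.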

If a zombie is already owned by init, though, and it's still sticking around (like the undead are wont to do ;-) ), then the process is almost certainly stuck in a device driver close routine, and will likely remain that way forever. You can reboot to clear out the zombies, but fixing the device driver is the only permanent solution. Killing the parent (init in this case) is definitely not recommended, since init is essential to keeping your system running.
