[Gllug] Unkillable process

Nix nix at esperi.demon.co.uk
Sat Jun 29 16:01:42 UTC 2002


On Fri, 28 Jun 2002, t. clarke said:
> Nix wrote:-
> 
>>
>>On Wed, 26 Jun 2002, t. clarke muttered drunkenly:
>>> This has long struck me as a failing of Unix/linux kernels - the inability
>>> to kill processes in certain states.  Maybe someone involved in kernel
>>> development should look at ways of killing processes from within the kernel
>>> by means of a system-call, rather than sending the process a signal ?
>>
>>The problem is, what happens if that process has kernel locks held?
>>Normally things are in uninterruptible sleep for precisely that reason,
>>and they've normally got locks held because they're doing things to
>>kernel data structures that other processes would be deeply confused by
>>(i.e., because they're enforcing atomicity).
>>
>>So to safely kill a D-state process you'd need to snap its locks and
>>roll back all the changes to data structures it had made: that is, you'd
>>need a transaction management layer deep inside the kernel which all the
>>device drivers use.
>>
>>Think of the performance hit. :(
> 
> I was perfectly sober at the time ! - but maybe showing my ignorance.
> 
> Why would any sane operating system want to allow user processes to modify
> kernel structures? I thought the general idea of robust multi-user OSes was
> to protect the kernel from users and users from each other !!  I have obviously
> missed something, 'cos I thought that generally speaking programs just executed
> code and did system calls, leaving the kernel code to do whatever was requested.

Yes, but a *process* on a Unix system spends part of its time in
userspace and part in kernelspace: the kernel is more like a sort of
weird shared library than anything else. If a process is in D state,
it's in kernel mode, in uninterruptible sleep, and won't be woken by
any signal sent to it (the signals just stay pending until the sleep
ends): and unlike userspace, the kernelspace code is quite permitted
to say `I cannot be interrupted at all, not even by SIGKILL'.

So the process in question is currently inside the kernel :)
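
To make the userspace half of that concrete, here is a minimal Python
sketch (an illustration only; the helper name can_ignore is made up for
this example). It asks the kernel to let the process ignore a signal and
reports whether the request was accepted. On Linux the request is
refused for SIGKILL, which is the privilege userspace lacks and kernel
code has.

```python
import signal

def can_ignore(signum):
    """Return True if this process is allowed to ignore signum."""
    try:
        previous = signal.signal(signum, signal.SIG_IGN)
    except OSError:
        # The kernel rejects the request (EINVAL) for SIGKILL and SIGSTOP.
        return False
    # Put the old disposition back so the check has no lasting effect.
    # (previous is None only if the old handler was not set from Python.)
    if previous is not None:
        signal.signal(signum, previous)
    return True

if __name__ == "__main__":
    for sig in (signal.SIGTERM, signal.SIGUSR1, signal.SIGKILL):
        print(sig.name, "ignorable" if can_ignore(sig) else "not ignorable")
```

This has to run in the main thread, since Python only allows signal
dispositions to be changed there.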

> Does an 'uninterruptible sleep' imply that a process has asked the kernel to
> do something which for some reason it (the kernel) can't complete ??

A process's userspace component (the thing we normally think of when we
say `the process') has asked the kernel to do something on its behalf,
and the kernel has seen fit to mark the process as non-killable until
it finishes fiddling with its data structures / releases a lock /
whatever.  And then the kernel's got stuck (probably blocked on a lock
left held by an earlier oops, or something like that). So it never got
around to saying `OK, this process can be interrupted again', let alone
to returning so that the process's userspace component can actually
*do* anything.
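
If you want to see which processes are stuck like this, a rough sketch
follows, assuming a Linux /proc where the one-letter state is the field
right after the parenthesised command name in /proc/<pid>/stat. The
function names parse_state and d_state_pids are invented for this
example. Note that a process showing up as D briefly during ordinary
disk I/O is normal; it's the ones that stay there that are wedged.

```python
import os

def parse_state(stat_line):
    """Extract the one-letter state from the contents of /proc/<pid>/stat."""
    # The command name sits in parentheses and may itself contain spaces or
    # parentheses, so split on the last ')' rather than on whitespace.
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]

def d_state_pids(proc_root="/proc"):
    """Return the PIDs currently in uninterruptible sleep (state 'D')."""
    pids = []
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue
        try:
            path = os.path.join(proc_root, name, "stat")
            with open(path, errors="replace") as f:
                state = parse_state(f.read())
        except (OSError, IndexError):
            # Process exited while we were looking, or unreadable entry.
            continue
        if state == "D":
            pids.append(int(name))
    return pids

if __name__ == "__main__":
    print(d_state_pids())
```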

If a process is *ever* in user mode and in D state at the same time,
it's a truly massive bug: I've never seen it, and I doubt that in that
state the process would get any time from the scheduler (although I
haven't checked what the scheduler does in this case, or even if it
notices).

-- 
`What happened?'
                 `Nick shipped buggy code!'
                                             `Oh, no dinner for him...'

