[Gllug] Single instance of a process - simple way

Nix nix at esperi.org.uk
Thu Jul 5 20:05:57 UTC 2007


On 5 Jul 2007, Richard Jones told this:
> After about half a day of investigation, I finally worked out what was
> going on.  sshd was writing its PID into a file under /var/run, and
> using that file as a lock file too, to make sure another instance of
> sshd wasn't running.  What it basically did was to look for an
> existing /var/run/sshd.pid (or whatever it was called - I don't
> remember exactly), it would read out the PID from the file first, then
> see if that process existed.

Owch. I thought it might be the lovely misfeature whereby it tries to
grab its well-known port, only something else has already grabbed it
because it's also usable as a dynamic privileged port: but of course by
default that problem only exists for well-known ports above 512. (I had
this problem persistently with statd on a virtual machine with no
entropy until some time post-boot: in the end I jammed some fixed junk
into the entropy pool before any network daemons ran, not as actual
entropy but merely to perturb the port assignments so that statd could
run again.)

> At boot time, that process _did_ exist -- itself, and because the boot
> sequence was predictable, it had exactly the same PID each time.
> Except if you edited the boot sequence at all which would change its
> PID.  When the machine shut down, it didn't remove the PID file, and
> the next time it booted, it'd find itself running with the same PID.
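
(For anyone following along, the check Richard describes boils down to
something like the sketch below. It is in Python purely for brevity and
is not sshd's actual code: the function names are invented for
illustration. The "safer" variant assumes that a pidfile naming our own
PID must be a stale leftover, which covers the boot-time case above but
not PID reuse by some unrelated process.)

```python
import os


def read_pid(pidfile):
    """Return the PID stored in pidfile, or None if missing or garbage."""
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return None
    # Reject 0 and negatives: kill() treats those as process groups.
    return pid if pid > 0 else None


def pid_is_running(pid):
    """True if *some* process currently has this PID (maybe not ours)."""
    try:
        os.kill(pid, 0)  # signal 0 delivers nothing, it only checks
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # it exists, we just may not signal it
    return True


def naive_already_running(pidfile):
    """The flawed check: trusts whatever a stale pidfile says."""
    pid = read_pid(pidfile)
    return pid is not None and pid_is_running(pid)


def safer_already_running(pidfile):
    """As above, but a pidfile holding our own PID is treated as stale."""
    pid = read_pid(pidfile)
    return pid is not None and pid != os.getpid() and pid_is_running(pid)
```

If the file left over from the previous boot happens to contain the PID
the daemon has just been given, naive_already_running() reports a running
instance and the daemon refuses to start, while safer_already_running()
does not. Holding a lock on the pidfile for the daemon's lifetime avoids
depending on PIDs at all, since the kernel drops the lock when the
process dies.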

Hey, pidfiles are bad enough, but just today I had cause to rewrite the
signal-handling code on some hard-realtime financial stuff. The hard
realtime constraints aren't exactly harsh but they're immovable: this is
code which must *must* respond to certain events within three seconds or
swingeing fines result, and it goes to some lengths to do so (avoiding
databases, blocking I/O, and disk access entirely, for instance: all
file I/O is to a (nonswappable) ramfs and it's mlock()ed into memory).

Except that it has to send signals to processes it communicates with
(mostly via nonblocking pipes and network sockets: signal sending is
theoretically blocky but not in practice, at least not often enough to
have been a problem to date), and it finds their PIDs by...

... popen()ing `ps -ef' and grepping the result.

Way to go for high-performance nonblocking I/O, guys. All that effort
and then you blow it in one ten-line routine. I can't imagine the
mindset that led to that (and yes, the whole thing was written by
the same person at the same time: this wasn't and-then-a-maintenance-
programmer-cocked-it-up stuff).

A short time later and the PIDs of the cooperating processes were being
stored in pidfiles in that ramfs. Throughput shot right up.
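
(A rough sketch of the replacement scheme, in Python for brevity; the
real code is not shown here, and the function names, the file layout,
and the choice of SIGUSR1 as the default are all assumptions made for
illustration. Each cooperating process records its own PID in a file at
startup, and a sender reads that one small file rather than forking ps
and parsing its output.)

```python
import os
import signal


def write_pidfile(path):
    """Record our PID: write a temporary file, then rename it into place."""
    tmp = f"{path}.{os.getpid()}.tmp"
    with open(tmp, "w") as f:
        f.write(f"{os.getpid()}\n")
    # rename() within one filesystem is atomic, so a reader never
    # sees a half-written pidfile.
    os.replace(tmp, path)


def read_pidfile(path):
    """Return the PID a peer recorded in its pidfile."""
    with open(path) as f:
        return int(f.read().strip())


def signal_peer(path, sig=signal.SIGUSR1):
    """Look up a peer's PID from its pidfile and send it a signal."""
    os.kill(read_pidfile(path), sig)
```

With the pidfiles on a ramfs, the lookup is one short read from memory
with no fork, no exec, and no disk access. The stale-pidfile problem
discussed earlier in the thread still applies, so a process that exits
should remove its file.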

-- 
`... in the sense that dragons logically follow evolution so they would
 be able to wield metal.' --- Kenneth Eng's colourless green ideas sleep
 furiously
-- 
Gllug mailing list  -  Gllug at gllug.org.uk
http://lists.gllug.org.uk/mailman/listinfo/gllug



