[Gllug] Single instance of a process - simple way
Richard Jones
rich at annexia.org
Thu Jul 5 18:33:17 UTC 2007
On Thu, Jul 05, 2007 at 07:01:29PM +0100, Ziya Suzen wrote:
> I was going to ask if there was a simple shell lock utility where I
> can use to ensure single instance of a process is run.. but I found
> one as I was typing the mail:
>
> "chiark-utils -> with-lock-ex: a simple tool for acquiring a lockfile
> before running another program or script."
>
> Anyway, question is changed a little: now I am wondering if you know
> any other utilities like with-lock-ex. Maybe one with retries and
> time-outs for example.
Perl, Python and so on all have access to the POSIX locking API
(lockf(2) as mentioned by the other reply), so perhaps you can write a
trivial little wrapper around your script which grabs an exclusive
lock on a file under $HOME when the script runs, and releases it when
the script exits.
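Here's a rough sketch of the sort of wrapper I mean, in Python, with
the retries and time-out you asked about. The names acquire_lock and
run_locked, the default time-out and interval, and the exit status 75
are all just my choices for illustration. It is not how with-lock-ex
works, and I haven't hardened it for NFS.

```python
import errno
import fcntl
import subprocess
import time


def acquire_lock(path, timeout=10.0, interval=0.5):
    """Try to take an exclusive lock on path, retrying until timeout.

    Returns the open file object holding the lock, or None on time-out.
    (Hypothetical helper; the defaults are arbitrary.)
    """
    fh = open(path, "a")  # must be writable to take an exclusive lock
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Non-blocking attempt: raises OSError if someone else holds it.
            fcntl.lockf(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return fh
        except OSError as exc:
            if exc.errno not in (errno.EACCES, errno.EAGAIN):
                fh.close()
                raise
        if time.monotonic() >= deadline:
            fh.close()
            return None
        time.sleep(interval)


def run_locked(path, argv, timeout=10.0):
    """Run argv while holding the lock; return its exit status.

    Returns 75 (an arbitrary "try again later" status) if the lock
    could not be obtained within timeout seconds.
    """
    fh = acquire_lock(path, timeout)
    if fh is None:
        return 75
    try:
        return subprocess.call(argv)
    finally:
        fh.close()  # closing the file releases the lock
```

You would call it as something like run_locked(lockfile, [script])
from a two-line wrapper. Since the kernel drops the lock when the
process dies, a crashed job can't leave a stale lock behind, at least
on a local filesystem.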
I'll share a war story with you about this, but first some general
observations. At my last company we had long-running (but
unpredictable) jobs run from cron. They had all sorts of issues, like
sometimes the database would lock up, sometimes there'd be a problem
with the job itself, or the machine they were running on. So we did
use locking to ensure that only one job ran at a time, but that added
a layer of extra problems with locks not releasing properly. (It
didn't particularly help that $HOME in that case was mounted on NFS,
and locking on NFS is very shoddy.) So occasionally we'd find out
that some job or other hadn't run properly for days -- oops.
War story (this was from years ago, and I'm sure the problem has been
fixed by now): I had a machine on which, try as I might, sshd (the SSH
daemon) would not start up on boot, or more accurately would not start
up on the second and subsequent boots. After booting, however, you
could start sshd manually with no problem. It was only when sshd was
started via the boot sequence that it would fail to start. Also, if I
modified the /etc/init.d/sshd script (e.g. to strace the process to
find out where it was dying), it would start fine, but as soon as I
put the original script back, it would no longer start on the second
and subsequent boots!
No messages anywhere, of course ...
After about half a day of investigation, I finally worked out what was
going on. sshd was writing its PID into a file under /var/run, and
using that file as a lock file too, to make sure another instance of
sshd wasn't running. Basically, it would look for an existing
/var/run/sshd.pid (or whatever it was called - I don't remember
exactly), read the PID out of that file, and then check whether a
process with that PID existed.
At boot time, that process _did_ exist -- it was sshd itself. Because
the boot sequence was predictable, sshd got exactly the same PID on
every boot, unless you edited the boot sequence at all, which would
change its PID. When the machine shut down, it didn't remove the PID
file, so on the next boot sshd would find its own PID in the stale
file, conclude that another instance was already running, and exit.
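To make the failure concrete, here is a small Python sketch of that
kind of check. This is my reconstruction of the logic, not sshd's
actual code, and the function names are made up. The "naive" version
is fooled when the stale file happens to hold the caller's own PID;
the "safer" version ignores that one case.

```python
import os


def read_pid(pidfile):
    """Return the PID recorded in pidfile, or None if missing/garbled."""
    try:
        with open(pidfile) as fh:
            pid = int(fh.read().strip())
    except (FileNotFoundError, ValueError):
        return None
    return pid if pid > 0 else None


def pid_is_running(pid):
    """Return True if some process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 only checks existence, sends nothing
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # it exists, but belongs to another user
    return True


def naive_already_running(pidfile):
    """The flawed check: any live process with that PID counts."""
    pid = read_pid(pidfile)
    return pid is not None and pid_is_running(pid)


def safer_already_running(pidfile):
    """Same check, but a file naming our own PID is treated as stale."""
    pid = read_pid(pidfile)
    return pid is not None and pid != os.getpid() and pid_is_running(pid)
```

Even the safer version can still be fooled if some unrelated process
has been handed the recycled PID, which is one more reason to hold a
real lock on the file rather than trusting its contents.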
Rich.
--
Richard Jones
Red Hat
--
Gllug mailing list - Gllug at gllug.org.uk
http://lists.gllug.org.uk/mailman/listinfo/gllug