[Gllug] Disk wait processes, load averages, and {send,fetch,proc}mail
Tethys
tet at accucard.com
Thu Sep 26 13:53:33 UTC 2002
I have a problem. Our main file and print server has a number of processes
hung in a disk wait (uninterruptible sleep) state:
root 22716 0.0 0.0 0 0 ? DW Sep17 0:00 [lockd]
root 23069 0.0 0.0 1968 720 ? D Sep17 0:00 mount accuhost01:
root 23499 0.0 0.0 1972 724 ? D Sep17 0:00 mount /accucard/d
root 24426 0.0 0.0 1972 724 ? D Sep17 0:00 mount /accucard/d
root 25402 0.0 0.0 0 0 ? DW Sep17 0:00 [lockd]
root 25433 0.0 0.0 1972 724 ? D Sep17 0:00 mount /accucard/d
root 25647 0.0 0.0 0 0 ? DW Sep17 0:00 [lockd]
root 26055 0.0 0.0 1972 724 ? D Sep17 0:00 mount /accucard/d
root 26612 0.0 0.0 1968 720 ? D Sep17 0:00 mount accuhost01:
root 32215 0.0 0.0 0 0 ? DW Sep17 0:00 [lockd]
root 13787 0.0 0.0 1972 724 ? D Sep18 0:00 mount /accucard/d
root 18714 0.0 0.0 0 0 ? DW Sep25 0:00 [lockd]
root 18911 0.0 0.0 0 0 ? DW Sep25 0:00 [lockd]
root 19022 0.0 0.0 1972 908 ? D Sep25 0:00 mount -t nfs -a
root 19282 0.0 0.0 0 0 ? DW Sep25 0:00 [lockd]
No, I don't know why they're hanging. But once they're in that state,
I know of nothing short of a reboot that can clear them. Because it's
our main file and print server, rebooting is a politically non-viable
solution at the moment.
So until we get a suitably quiet time, we're stuck with them.
Processes in disk wait state aren't in themselves a problem.
However, Linux counts them towards the load average alongside
the runnable processes. So even though the actual load on the box
is minimal, the load average is hovering around the 15.2 mark.
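A rough way to confirm that the stuck processes account for the load figure is to count the D-state processes and compare that with the 1-minute load average. This is only a sketch, assuming a Linux /proc filesystem; the function names are made up for illustration.

```python
import glob


def state_from_stat(stat_text):
    """Return the one-letter state from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so take everything after the last ')'.
    _, _, after_comm = stat_text.rpartition(")")
    fields = after_comm.split()
    return fields[0] if fields else ""


def count_uninterruptible(stat_texts):
    """Count how many stat entries report state 'D' (uninterruptible sleep)."""
    return sum(1 for text in stat_texts if state_from_stat(text) == "D")


def read_all_stats():
    """Read /proc/<pid>/stat for every process that is still around."""
    texts = []
    for path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(path) as handle:
                texts.append(handle.read())
        except OSError:
            continue  # the process exited while we were scanning
    return texts


if __name__ == "__main__":
    with open("/proc/loadavg") as handle:
        one_minute = float(handle.read().split()[0])
    blocked = count_uninterruptible(read_all_stats())
    print(f"1-minute load average: {one_minute:.2f}")
    print(f"processes in D state:  {blocked}")
```

If the two numbers are close, the load average is almost entirely made up of the hung processes rather than real work.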
Unfortunately, the inflated load average adversely affects sendmail[1],
which stops accepting connections once the load average reaches a
certain threshold:
1247 ? S 0:00 sendmail: rejecting connections on daemon MTA: load average: 15
Now I've tried to raise that threshold with the RefuseLA option. But for some
reason, it isn't working. I've also changed the QueueLA and QueueFactor
parameters, and those definitely *are* now working (verified with the
-d3.30 debugging option).
Furthermore, because sendmail isn't accepting connections, I get:
fetchmail: SMTP connect to localhost failed
fetchmail: can't raise the listener; falling back to /usr/bin/procmail -d %T
This works fine with the caveat that each message is being converted to
have DOS-style CR/LF line endings, rather than just the traditional Unix LF.
This is causing problems for mh, which doesn't play well with the extra
characters.
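As a stopgap, the extra characters could be stripped by a small filter run over each message before mh sees it. This is a minimal sketch, assuming it's acceptable to rewrite every CR/LF pair to a bare LF; the name strip_cr is hypothetical, and where to hook the filter into the delivery path is left open.

```python
import sys


def strip_cr(data: bytes) -> bytes:
    """Convert CR/LF line endings to bare LF, leaving lone CRs untouched."""
    return data.replace(b"\r\n", b"\n")


if __name__ == "__main__":
    # Work on raw bytes so that message bodies in any encoding pass through.
    sys.stdout.buffer.write(strip_cr(sys.stdin.buffer.read()))
```

It treats the symptom rather than the cause, so it's only of use until the conversion itself is tracked down.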
So I guess I have several questions:
1. Should disk wait processes contributing to the load average be
considered a bug?
2. Is there anything I can do about them short of a reboot?
3. Is there any debugging option to sendmail that will show the
current RefuseLA threshold value?
4. Is there any way I can get sendmail to accept connections when
the load average is greater than 12 (either via RefuseLA or some
other method)?
5. Does anyone know why fetchmail/procmail is adding extra CR characters,
and what I can do to change it?
Thanks,
Tet
[1] Despite what the masses claim, the more I play with the newer
versions of sendmail, the more I like it -- you'd have to
have a *very* convincing argument to persuade me to switch
to exim/qmail/postfix/whatever.
--
Gllug mailing list - Gllug at linux.co.uk
http://list.ftech.net/mailman/listinfo/gllug