[dundee] analysing server failure
David R. Baird
dundee at lists.lug.org.uk
Tue Jul 29 10:42:01 2003
Last night, sometime after 21:15, my web server (Red Hat 7.2)
stopped serving web pages and stopped accepting ssh logins. Lots
of other things stopped as well: the hourly logcheck emails, and a
cron job that runs every 5 minutes to check whether the web server
and other daemons are running, restarting them and emailing me if
not. Unfortunately I didn't discover this until 9am today! A
hardware reset brought the machine back up, but I'd like to find
out what happened.
In fact, I have a reasonable suspicion that the problem was an
enormous log file created by mod_jk from the Apache server. It
was 125MB, last modified on Jul 26th. I need to re-configure the
server to not use that module.
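To rule other oversized files in or out, a check along these lines might help. It is only a rough sketch, assuming Python 3 is available on the machine; the function name `largest_files` and the 100 MB default threshold are made up for illustration and are not part of any existing tool.

```python
import os
import shutil

def largest_files(root, min_bytes=100 * 1024 * 1024):
    """Return (size, path) pairs for files under root that are at least
    min_bytes in size, largest first."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if size >= min_bytes:
                found.append((size, path))
    found.sort(reverse=True)
    return found

if __name__ == "__main__":
    root = "/var/log"  # assumed location of the logs
    if os.path.isdir(root):
        usage = shutil.disk_usage(root)
        print("free on %s filesystem: %d MB" % (root, usage.free // (1024 * 1024)))
        for size, path in largest_files(root):
            print("%8d MB  %s" % (size // (1024 * 1024), path))
```

The free-space figure matters as much as the file sizes: a single large log is mainly a problem if it has filled the partition it lives on.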
What I'd like to know is whether I've missed anywhere to look for
useful messages. I've checked all the log files in /var/log, but
all I can get is an estimate of when the machine stopped. Nothing
unusual appears in the messages, secure, cron, maillog, or
httpd/error_log files.
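The estimate of when things stopped can be made a little less manual by looking for the longest silence in a syslog-style file. The following is a sketch under assumptions: it needs Python 3, it expects lines that begin with a timestamp such as "Jul 28 21:15:01" (so it will not parse httpd/error_log, which uses a different format), and the function name `largest_gap` is hypothetical. Syslog timestamps carry no year, so the caller has to supply one.

```python
from datetime import datetime

def largest_gap(lines, year):
    """Return (gap, before, after) for the longest silence between
    consecutive timestamped lines, or None if fewer than two lines
    carry a parseable timestamp."""
    previous = None
    best = None
    for line in lines:
        try:
            # The first 15 characters hold the syslog timestamp;
            # the assumed year is prepended because the log omits it.
            stamp = datetime.strptime("%d %s" % (year, line[:15]),
                                      "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # not a timestamped line; ignore it
        if previous is not None:
            gap = stamp - previous
            if best is None or gap > best[0]:
                best = (gap, previous, stamp)
        previous = stamp
    return best
```

Passing an open file such as /var/log/messages as `lines` returns the length of the longest gap together with the timestamps on either side of it; the earlier timestamp is the last entry written before the silence began.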
Any suggestions?
d.
--
Dr. David R. Baird
ZeroFive Web Design
dave@zerofive.co.uk
+44 [0]1738 447780
http://www.zerofive.co.uk