[dundee] analysing server failure
Andrew Clayton
dundee at lists.lug.org.uk
Thu Jul 31 20:19:05 2003
On Thu, 2003-07-31 at 12:26, David R. Baird wrote:
> The server had only been up for 10 days. Before that, for 2 or 3
> months with no problem.
OK... what kernel version are you running?
>
> I rebooted on the 19th July because a status reporting daemon I
> had installed (uses mrtg to generate nice graphs that are made
> available on a web page) had stopped updating its images, and I
> thought a reboot might restart it (didn't). But that thing
> stopped working late June, so can't be related to the crash.
>
> I have sar, and I've attached a file with some of its reports
> (options -r, -u and -W), for the 28th (up to the crash), and for
> this morning for comparison. I don't see anything suspicious, but
> I have very little idea what to look for. Basically all I'm
> trying to see is whether the machine started swapping memory
> pages, and as far as I can tell it hadn't.
>
Swapping in itself isn't really a problem. Obviously for performance you
never want to have to swap... but generally it is a fact of life, and it is
a complex area of the VM subsystem.
> The systat cron jobs are there.
>
Good.
> grep -i oom /var/log/messages
> grep -i oops /var/log/messages
>
> both return nothing.
>
OK.
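One caveat on those greps: as far as I recall, 2.4 kernels log an OOM kill as "Out of Memory: Killed process ...", which does not contain the string "oom", so a wider pattern is safer. A minimal sketch follows; the function name scan_kernel_log and the pattern list are my own assumptions, so adjust them to whatever your kernel actually logs.

```shell
#!/bin/sh
# scan_kernel_log FILE...
# Print, with line numbers, every line that looks like an OOM kill, an oops,
# a panic or a failed page allocation. The pattern list is an assumption and
# will also match unrelated words that contain "oom" (e.g. "room").
scan_kernel_log() {
    grep -n -i -E 'oom|out of memory|oops|kernel panic|allocation failed' "$@"
}
```

Run it as `scan_kernel_log /var/log/messages`, adding any rotated copies of the log that cover the time of the crash.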
> top I had used before, but I'll start monitoring it and free and
> vmstat routinely now.
>
> The machine is used to serve half a dozen web pages, only 1 of
> which gets any traffic at all. The only other things it does are
> email, and the mysql server talks to my development server.
>
>
> free says
>
> $ free
>              total       used       free     shared    buffers     cached
> Mem:        247828     242384       5444          0      27252     106408
> -/+ buffers/cache:     108724     139104
> Swap:       787176      27532     759644
>
OK... so you have 256MB of physical RAM in your machine and about 768MB of
swap.
> Should I be concerned that swap space is being used? Am I right
No.
>
> that swap space can be used and then not freed for quite a while,
> so used swap space reports don't really tell you much about data
> being swapped from disk to memory?
>
Not exactly...
From your free output above, it appears you are using almost all of your
physical RAM, with only about 5MB free.
However, about 27MB of that is buffers and about 104MB is cache. This
memory can be quickly reclaimed if programs need it.
So the "-/+ buffers/cache" line gives a better indication of memory usage:
about 106MB really in use by programs and about 136MB available.
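The arithmetic behind that line is easy to check by hand: subtract buffers and cache from "used", and add them to "free". A small sketch, where the function name buffers_cache_line is just something I made up for illustration:

```shell
#!/bin/sh
# buffers_cache_line USED FREE BUFFERS CACHED
# All values in kB, taken from the "Mem:" row of free.
# Prints "used-by-programs available", as the "-/+ buffers/cache" row does.
buffers_cache_line() {
    used=$1 free=$2 buffers=$3 cached=$4
    echo "$(( used - buffers - cached )) $(( free + buffers + cached ))"
}

# Figures from the free output quoted above:
buffers_cache_line 242384 5444 27252 106408
```

With your numbers this prints 108724 139104, which matches the second row of your free output.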
2.4 is more "swap happy" than 2.0, and I believe 2.6 is even more so, but
that can be tuned.
Early 2.4 had the "crazy" requirement that swap be at least 2 x the
size of physical RAM, because at any time the entire contents of physical
RAM could be backed by swap. Thankfully this "feature" was removed around
2.4.10.
Another area where your RAM can "disappear" is the slab caches; see
/proc/slabinfo.
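To get a rough idea of which slab caches are the biggest, you can multiply the object count by the object size for each cache. The sketch below assumes column 3 is the total number of objects and column 4 the object size in bytes, which is how I remember the layout; the function name top_slabs is my own. Treat the result as a lower bound, since slabs are allocated in whole pages.

```shell
#!/bin/sh
# top_slabs [N]
# Reads slabinfo-format text on stdin and prints the N largest caches
# (default 10) as "kB name". Header lines are skipped because their third
# column is not a plain number.
top_slabs() {
    awk '$3 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ { printf "%d %s\n", $3 * $4 / 1024, $1 }' |
        sort -rn | head -n "${1:-10}"
}
```

Use it as `top_slabs 5 < /proc/slabinfo`; depending on the kernel you may need to be root to read that file.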
Linux will happily use RAM for buffers and cache, and swap out idle
programs... so do not be alarmed at all that you seem to be
using swap more than you may have thought you needed to.
> Thanks for any help!
>
> dave.
>
>
>
> ______________________________________________________________________
>
> $ sar -r -f sa28
> 07:30:00 PM kbmemfree kbmemused %memused kbmemshrd kbbuffers kbcached kbswpfree kbswpused %swpused
> 07:40:00 PM 26628 221200 89.26 0 30272 71040 752036 35140 4.46
> 07:50:00 PM 26544 221284 89.29 0 30488 70908 752036 35140 4.46
> 08:00:00 PM 26324 221504 89.38 0 30736 70768 752036 35140 4.46
> 08:10:00 PM 26212 221616 89.42 0 31092 70524 752036 35140 4.46
> 08:20:00 PM 26112 221716 89.46 0 31328 70388 752036 35140 4.46
> 08:30:00 PM 26012 221816 89.50 0 31580 70240 752036 35140 4.46
> 08:40:00 PM 25940 221888 89.53 0 31888 70004 752036 35140 4.46
> 08:50:00 PM 25724 222104 89.62 0 32220 69880 752036 35140 4.46
> 09:00:00 PM 25596 222232 89.67 0 32308 69920 752036 35140 4.46
> 09:10:00 PM 25568 222260 89.68 0 32368 69888 752036 35140 4.46
> Average: 12718 235110 94.87 0 27800 97127 753577 33599 4.27
>
> $ sar -r
> 06:30:00 AM kbmemfree kbmemused %memused kbmemshrd kbbuffers kbcached kbswpfree kbswpused %swpused
> 06:40:00 AM 8400 239428 96.61 0 13932 133388 759636 27540 3.50
> 06:50:00 AM 8252 239576 96.67 0 14044 133420 759636 27540 3.50
> 07:00:00 AM 7892 239936 96.82 0 14100 133472 759636 27540 3.50
> 07:10:00 AM 7648 240180 96.91 0 14336 133476 759636 27540 3.50
> 07:20:00 AM 7532 240296 96.96 0 14440 133484 759636 27540 3.50
> 07:30:00 AM 7140 240688 97.12 0 14664 133600 759636 27540 3.50
> 07:40:00 AM 8144 239684 96.71 0 14940 131456 759636 27540 3.50
> 07:50:00 AM 7616 240212 96.93 0 15160 131552 759636 27540 3.50
> 08:00:00 AM 6776 241052 97.27 0 15360 131632 759636 27540 3.50
> 08:10:01 AM 5712 242116 97.70 0 15704 131724 759636 27540 3.50
> 08:20:00 AM 23356 224472 90.58 0 14928 113056 759636 27540 3.50
> 08:30:00 AM 22900 224928 90.76 0 15164 113148 759636 27540 3.50
> 08:40:00 AM 22240 225588 91.03 0 15396 113256 759636 27540 3.50
> 08:50:00 AM 21528 226300 91.31 0 15596 113336 759636 27540 3.50
> 09:00:00 AM 20604 227224 91.69 0 15756 113416 759636 27540 3.50
> 09:10:00 AM 20040 227788 91.91 0 16064 113504 759636 27540 3.50
> 09:20:00 AM 19584 228244 92.10 0 16328 113612 759636 27540 3.50
> 09:30:00 AM 18400 229428 92.58 0 16804 114280 759636 27540 3.50
> 09:40:00 AM 17356 230472 93.00 0 17472 114504 759636 27540 3.50
> 09:50:00 AM 15704 232124 93.66 0 18408 114772 759636 27540 3.50
> 10:00:00 AM 14864 232964 94.00 0 19016 114840 759636 27540 3.50
> 10:10:00 AM 8436 239392 96.60 0 19992 115312 759636 27540 3.50
> 10:20:00 AM 8192 239636 96.69 0 20488 115372 759636 27540 3.50
> 10:30:00 AM 6828 241000 97.24 0 21028 115472 759636 27540 3.50
> 10:40:00 AM 7340 240488 97.04 0 21476 113508 759636 27540 3.50
> 10:50:00 AM 5672 242156 97.71 0 21908 113752 759636 27540 3.50
> 11:00:00 AM 5152 242676 97.92 0 22408 113556 759636 27540 3.50
> 11:10:00 AM 4036 243792 98.37 0 22456 114124 759636 27540 3.50
> 11:20:00 AM 3788 244040 98.47 0 22772 114520 759636 27540 3.50
> 11:30:00 AM 4688 243140 98.11 0 23552 112784 759636 27540 3.50
> 11:40:00 AM 8100 239728 96.73 0 24496 108312 759636 27540 3.50
> 11:50:00 AM 7240 240588 97.08 0 25308 108436 759636 27540 3.50
> Average: 18180 229648 92.66 0 21606 108279 763857 23319 2.96
>
> $ sar -u -f sa28
> 07:30:00 PM CPU %user %nice %system %idle
> 07:40:00 PM all 0.40 0.00 0.45 99.15
> 07:50:00 PM all 0.20 0.00 0.27 99.53
> 08:00:00 PM all 0.19 0.00 0.28 99.53
> 08:10:00 PM all 0.26 0.00 0.40 99.34
> 08:20:00 PM all 0.19 0.01 0.27 99.53
> 08:30:00 PM all 0.17 0.02 0.32 99.49
> 08:40:00 PM all 0.30 0.02 0.36 99.32
> 08:50:00 PM all 0.32 0.05 0.39 99.24
> 09:00:00 PM all 0.18 0.02 0.22 99.59
> 09:10:00 PM all 0.17 0.00 0.23 99.60
> Average: all 0.39 0.01 0.54 99.07
>
> $ sar -u
> 06:30:00 AM CPU %user %nice %system %idle
> 06:40:00 AM all 0.26 0.00 0.23 99.51
> 06:50:00 AM all 0.10 0.00 0.20 99.71
> 07:00:00 AM all 0.12 0.00 0.18 99.70
> 07:10:00 AM all 0.16 0.00 0.22 99.62
> 07:20:00 AM all 0.12 0.00 0.19 99.70
> 07:30:00 AM all 0.21 0.00 0.21 99.58
> 07:40:00 AM all 0.59 0.00 0.32 99.09
> 07:50:00 AM all 0.25 0.00 0.23 99.52
> 08:00:00 AM all 0.12 0.00 0.20 99.68
> 08:10:01 AM all 0.42 0.00 0.28 99.30
> 08:20:00 AM all 0.55 0.00 0.39 99.06
> 08:30:00 AM all 0.25 0.00 0.29 99.46
> 08:40:00 AM all 0.39 0.00 0.26 99.34
> 08:50:00 AM all 0.24 0.00 0.21 99.55
> 09:00:00 AM all 0.33 0.00 0.20 99.47
> 09:10:00 AM all 0.25 0.00 0.27 99.47
> 09:20:00 AM all 0.21 0.00 0.23 99.56
> 09:30:00 AM all 0.16 0.00 0.28 99.55
> 09:40:00 AM all 0.42 0.00 0.38 99.20
> 09:50:00 AM all 0.33 0.02 0.36 99.29
> 10:00:00 AM all 0.22 0.01 0.26 99.51
> 10:10:00 AM all 0.54 0.01 0.42 99.03
> 10:20:00 AM all 0.23 0.00 0.31 99.46
> 10:30:00 AM all 0.24 0.02 0.33 99.40
> 10:40:00 AM all 0.33 0.02 0.34 99.30
> 10:50:00 AM all 0.24 0.00 0.30 99.46
> 11:00:00 AM all 0.36 0.00 0.37 99.27
> 11:10:00 AM all 0.30 0.00 0.38 99.32
> 11:20:00 AM all 0.30 0.01 0.45 99.24
> 11:30:00 AM all 0.31 0.00 0.51 99.18
> 11:40:00 AM all 0.58 0.00 0.58 98.85
> 11:50:00 AM all 0.36 0.00 0.48 99.16
> 12:00:00 PM all 0.35 0.00 0.68 98.97
> Average: all 0.33 0.00 0.52 99.14
>
> $ sar -W -f sa28
> 07:30:00 PM pswpin/s pswpout/s
> 07:40:00 PM 0.00 0.00
> 07:50:00 PM 0.00 0.00
> 08:00:00 PM 0.00 0.00
> 08:10:00 PM 0.00 0.00
> 08:20:00 PM 0.00 0.00
> 08:30:00 PM 0.00 0.00
> 08:40:00 PM 0.00 0.00
> 08:50:00 PM 0.00 0.00
> 09:00:00 PM 0.00 0.00
> 09:10:00 PM 0.00 0.00
> Average: 0.00 0.03
>
> $ sar -W
> 06:30:00 AM pswpin/s pswpout/s
> 06:40:00 AM 0.00 0.00
> 06:50:00 AM 0.00 0.00
> 07:00:00 AM 0.00 0.00
> 07:10:00 AM 0.00 0.00
> 07:20:00 AM 0.00 0.00
> 07:30:00 AM 0.00 0.00
> 07:40:00 AM 0.00 0.00
> 07:50:00 AM 0.00 0.00
> 08:00:00 AM 0.00 0.00
> 08:10:01 AM 0.00 0.00
> 08:20:00 AM 0.00 0.00
> 08:30:00 AM 0.00 0.00
> 08:40:00 AM 0.00 0.00
> 08:50:00 AM 0.00 0.00
> 09:00:00 AM 0.00 0.00
> 09:10:00 AM 0.00 0.00
> 09:20:00 AM 0.01 0.00
> 09:30:00 AM 0.02 0.00
> 09:40:00 AM 0.00 0.00
> 09:50:00 AM 0.00 0.00
> 10:00:00 AM 0.00 0.00
> 10:10:00 AM 0.01 0.00
> 10:20:00 AM 0.00 0.00
> 10:30:00 AM 0.00 0.00
> 10:40:00 AM 0.00 0.00
> 10:50:00 AM 0.00 0.00
> 11:00:00 AM 0.00 0.00
> 11:10:00 AM 0.00 0.00
> 11:20:00 AM 0.00 0.00
> 11:30:00 AM 0.00 0.00
> 11:40:00 AM 0.00 0.00
> 11:50:00 AM 0.00 0.00
> 12:00:00 PM 0.02 0.00
> Average: 0.01 0.07
None of these reports seem to show anything "bad".
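Rather than reading down the columns by eye, you can filter the sar -W output for samples with any swap traffic. A minimal sketch, assuming the two swap rates are always the last two columns; the function name swap_activity is my own invention:

```shell
#!/bin/sh
# swap_activity
# Reads `sar -W` output on stdin and prints every sample whose pswpin/s or
# pswpout/s is non-zero. Header rows are skipped because their last column
# is not a number; the Average row is skipped by name.
swap_activity() {
    awk '$1 != "Average:" && $NF ~ /^[0-9.]+$/ && ($(NF-1) + 0 > 0 || $NF + 0 > 0)'
}
```

For example, `sar -W -f sa28 | swap_activity` should print nothing for the samples you posted from the 28th, and only a handful of tiny page-in values for this morning.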
Perhaps some hardware stressing may be in order, for example
compiling something like a kernel or php repeatedly.
For the kernel, something like this:
Get the kernel src and put it in /tmp,
then cd into the kernel src directory and run
    cp arch/i386/defconfig .config
then
    make oldconfig
Answer y or n to any questions asked; it doesn't really matter which.
Then run
    while true; do make dep && make bzImage && make modules && make clean; done
You could set up a couple of these running from separate directories, or do
something similar with any largish src tree... php also works well for
this.
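A plain "while true" loop carries on after a failed build, so an error can scroll past unnoticed. Here is a sketch of a wrapper that stops at the first failure and tells you which pass it was; the name stress_loop and its arguments are hypothetical, not part of any standard tool.

```shell
#!/bin/sh
# stress_loop MAX COMMAND [ARGS...]
# Runs COMMAND up to MAX times, stops at the first failure and reports which
# pass failed on stderr. Returns 0 only if every pass succeeded.
stress_loop() {
    max=$1
    shift
    pass=1
    while [ "$pass" -le "$max" ]; do
        if ! "$@"; then
            echo "pass $pass failed: $*" >&2
            return 1
        fi
        pass=$((pass + 1))
    done
    echo "completed $max passes"
}
```

From the kernel src directory you could run `stress_loop 20 sh -c 'make dep && make bzImage && make modules && make clean'`. The same source should build the same way every time, so a failure that turns up on a different pass or in a different file each run tends to point at hardware rather than software.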
If you want to test your ram, then look at www.memtest86.com
--
Andrew