[dundee] analysing server failure

Andrew Clayton dundee at lists.lug.org.uk
Thu Jul 31 20:19:05 2003


On Thu, 2003-07-31 at 12:26, David R. Baird wrote:
> The server had only been up for 10 days. Before that, for 2 or 3 
> months with no problem. 

OK... what kernel version are you running?


> 
> I rebooted on the 19th July because a status reporting daemon I 
> had installed (uses mrtg to generate nice graphs that are made 
> available on a web page) had stopped updating its images, and I 
> thought a reboot might restart it (didn't). But that thing 
> stopped working late June, so can't be related to the crash. 
> 
> I have sar, and I've attached a file with some of its reports 
> (options -r, -u and -W), for the 28th (up to the crash), and for 
> this morning for comparison. I don't see anything suspicious, but 
> I have very little idea what to look for. Basically all I'm 
> trying to see is whether the machine started swapping memory 
> pages, and as far as I can tell it hadn't. 
> 

Swapping in itself isn't really a problem. For performance you never
want to have to swap, but it is generally a fact of life, and it is a
complex area of the VM subsystem.


> The systat cron jobs are there. 
> 

Good.


> grep -i oom /var/log/messages
> grep -i oops /var/log/messages
> 
> both return nothing. 
> 

OK.


> top I had used before, but I'll start monitoring it and free and 
> vmstat routinely now. 
> 
> The machine is used to serve half a dozen web pages, only 1 of 
> which gets any traffic at all. The only other things it does are 
> email, and the mysql server talks to my development server. 
> 
> 
> free says 
> 
> $ free
>              total       used       free     shared    buffers     cached
> Mem:        247828     242384       5444          0      27252     106408
> -/+ buffers/cache:     108724     139104
> Swap:       787176      27532     759644
> 

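OK, so you have 256MB of physical RAM in your machine and about 768MB of
swap. (free shows a little under 256MB because memory reserved at boot,
including the kernel's own code, is not counted.)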


> Should I be concerned that swap space is being used? Am I right

No.


>  
> that swap space can be used and then not freed for quite a while, 
> so used swap space reports don't really tell you much about data 
> being swapped from disk to memory? 
> 

Not exactly...

From your free output above, it appears you are using almost all of
your physical RAM, with only 5MB free.

You are using about 25MB for buffers and about 100MB for cache. This
memory can be quickly reclaimed if needed for programs.

So the "-/+ buffers/cache" line gives a better indication of memory
usage: programs are really using about 106MB, leaving about 136MB
available.
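The -/+ buffers/cache figures are just the Mem: line with buffers and
cache moved from "used" to "free": 242384 - 27252 - 106408 = 108724 kB
used by programs, and 5444 + 27252 + 106408 = 139104 kB available. A
minimal sketch of the same arithmetic done from /proc/meminfo
(meminfo_app_usage is a hypothetical helper name, and newer versions of
free count some memory differently, so treat it as an approximation):

```shell
#!/bin/sh
# Hypothetical helper: reproduce the "-/+ buffers/cache" arithmetic of
# free(1) from a meminfo-style file (default /proc/meminfo). Values in kB.
meminfo_app_usage() {
    awk '
        /^MemTotal:/ { total = $2 }
        /^MemFree:/  { free  = $2 }
        /^Buffers:/  { bufs  = $2 }
        /^Cached:/   { cache = $2 }   # anchored, so SwapCached is ignored
        END {
            used = total - free
            printf "used by programs: %d kB\n", used - bufs - cache
            printf "available:        %d kB\n", free + bufs + cache
        }
    ' "${1:-/proc/meminfo}"
}
```

Run it with no argument to read the live /proc/meminfo.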

2.4 is more "swap happy" than 2.0, and 2.6 is, I believe, even more so,
but that can be tuned.
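On 2.6 kernels the main knob for this is /proc/sys/vm/swappiness (0 to
100, default 60; higher values make the kernel more willing to swap out
idle pages in favour of cache). 2.4 has no equivalent file. A small
sketch that reports the current value (show_swappiness is a
hypothetical helper name):

```shell
#!/bin/sh
# Hypothetical helper: print the swappiness setting if the kernel has
# one. The file exists on 2.6 kernels only.
show_swappiness() {
    f="${1:-/proc/sys/vm/swappiness}"
    if [ -r "$f" ]; then
        echo "vm.swappiness = $(cat "$f")"
    else
        echo "no swappiness knob (pre-2.6 kernel?)"
    fi
}
# To make a 2.6 kernel less eager to swap until the next reboot (as root):
#   echo 10 > /proc/sys/vm/swappiness
```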

Early 2.4 kernels had the "crazy" requirement that swap be at least
twice the size of physical RAM, since at any time the entire contents of
physical RAM could be backed by swap. Thankfully this "feature" was
removed around 2.4.10.
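On a kernel old enough to need that rule, checking it is simple
arithmetic on /proc/meminfo. A minimal sketch, assuming a meminfo-style
file with MemTotal and SwapTotal lines in kB (check_swap_rule is a
hypothetical name). With the figures from your free output it would
report ok, since 787176 kB of swap is more than 2 x 247828 = 495656 kB.

```shell
#!/bin/sh
# Hypothetical helper: compare SwapTotal with twice MemTotal, the rule
# that only matters on kernels before about 2.4.10.
check_swap_rule() {
    awk '
        /^MemTotal:/  { mem  = $2 }
        /^SwapTotal:/ { swap = $2 }
        END {
            need = 2 * mem
            if (swap >= need)
                printf "ok: %d kB swap >= %d kB (2 x RAM)\n", swap, need
            else
                printf "short: %d kB swap < %d kB (2 x RAM)\n", swap, need
        }
    ' "${1:-/proc/meminfo}"
}
```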

Another place your RAM can "disappear" to is the kernel's slab caches,
which you can inspect in /proc/slabinfo.
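A rough way to see which slab caches hold the most memory. This is a
minimal sketch (slab_top is a hypothetical name) that multiplies the
number of objects by the object size, columns 3 and 4 in both the 2.4
and 2.6 /proc/slabinfo layouts, and ignores per-slab overhead, so treat
the figures as estimates. On some kernels the file is readable only by
root.

```shell
#!/bin/sh
# Hypothetical helper: estimate memory per slab cache as
# num_objs * objsize and list the largest first.
# Usage: slab_top [slabinfo-file] [how-many]
slab_top() {
    # Skip the version line and any "#" column-header line.
    awk 'NR > 1 && $1 !~ /^#/ { printf "%10d kB  %s\n", $3 * $4 / 1024, $1 }' \
        "${1:-/proc/slabinfo}" | sort -rn | head -n "${2:-10}"
}
```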


Linux will happily use RAM for buffers and cache and push idle programs
out to swap, so don't be alarmed that you seem to be using more swap
than you thought you needed.
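What hurts is how often pages move in and out of swap, not how much swap
is allocated; in your sar -W output pswpin/s and pswpout/s stay at or
near zero. To watch the same thing live, here is a rough sketch
(swap_activity is a hypothetical name) that filters vmstat output down
to the intervals with non-zero si/so. It finds the columns by their
header names, since older vmstat versions print an extra w column.
Remember that the first line vmstat prints is an average since boot.

```shell
#!/bin/sh
# Hypothetical helper: read vmstat output on stdin and print only the
# intervals where pages were swapped in (si) or out (so).
swap_activity() {
    awk '
        # Header lines: find the si and so columns by name, because
        # their position differs between vmstat versions.
        $1 !~ /^[0-9]+$/ {
            for (i = 1; i <= NF; i++) {
                if ($i == "si") si = i
                if ($i == "so") so = i
            }
            next
        }
        # Data lines: report only intervals with real swap traffic.
        si && so && ($si > 0 || $so > 0) {
            print "swapping: si=" $si " so=" $so
        }
    '
}
# Usage: vmstat 5 | swap_activity
```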


> Thanks for any help! 
> 
> dave.
> 
> 
> 
> ______________________________________________________________________
> 
> $ sar -r -f sa28
> 07:30:00 PM kbmemfree kbmemused  %memused kbmemshrd kbbuffers  kbcached kbswpfree kbswpused  %swpused
> 07:40:00 PM     26628    221200     89.26         0     30272     71040    752036     35140      4.46
> 07:50:00 PM     26544    221284     89.29         0     30488     70908    752036     35140      4.46
> 08:00:00 PM     26324    221504     89.38         0     30736     70768    752036     35140      4.46
> 08:10:00 PM     26212    221616     89.42         0     31092     70524    752036     35140      4.46
> 08:20:00 PM     26112    221716     89.46         0     31328     70388    752036     35140      4.46
> 08:30:00 PM     26012    221816     89.50         0     31580     70240    752036     35140      4.46
> 08:40:00 PM     25940    221888     89.53         0     31888     70004    752036     35140      4.46
> 08:50:00 PM     25724    222104     89.62         0     32220     69880    752036     35140      4.46
> 09:00:00 PM     25596    222232     89.67         0     32308     69920    752036     35140      4.46
> 09:10:00 PM     25568    222260     89.68         0     32368     69888    752036     35140      4.46
> Average:        12718    235110     94.87         0     27800     97127    753577     33599      4.27
> 
> $ sar -r
> 06:30:00 AM kbmemfree kbmemused  %memused kbmemshrd kbbuffers  kbcached kbswpfree kbswpused  %swpused
> 06:40:00 AM      8400    239428     96.61         0     13932    133388    759636     27540      3.50
> 06:50:00 AM      8252    239576     96.67         0     14044    133420    759636     27540      3.50
> 07:00:00 AM      7892    239936     96.82         0     14100    133472    759636     27540      3.50
> 07:10:00 AM      7648    240180     96.91         0     14336    133476    759636     27540      3.50
> 07:20:00 AM      7532    240296     96.96         0     14440    133484    759636     27540      3.50
> 07:30:00 AM      7140    240688     97.12         0     14664    133600    759636     27540      3.50
> 07:40:00 AM      8144    239684     96.71         0     14940    131456    759636     27540      3.50
> 07:50:00 AM      7616    240212     96.93         0     15160    131552    759636     27540      3.50
> 08:00:00 AM      6776    241052     97.27         0     15360    131632    759636     27540      3.50
> 08:10:01 AM      5712    242116     97.70         0     15704    131724    759636     27540      3.50
> 08:20:00 AM     23356    224472     90.58         0     14928    113056    759636     27540      3.50
> 08:30:00 AM     22900    224928     90.76         0     15164    113148    759636     27540      3.50
> 08:40:00 AM     22240    225588     91.03         0     15396    113256    759636     27540      3.50
> 08:50:00 AM     21528    226300     91.31         0     15596    113336    759636     27540      3.50
> 09:00:00 AM     20604    227224     91.69         0     15756    113416    759636     27540      3.50
> 09:10:00 AM     20040    227788     91.91         0     16064    113504    759636     27540      3.50
> 09:20:00 AM     19584    228244     92.10         0     16328    113612    759636     27540      3.50
> 09:30:00 AM     18400    229428     92.58         0     16804    114280    759636     27540      3.50
> 09:40:00 AM     17356    230472     93.00         0     17472    114504    759636     27540      3.50
> 09:50:00 AM     15704    232124     93.66         0     18408    114772    759636     27540      3.50
> 10:00:00 AM     14864    232964     94.00         0     19016    114840    759636     27540      3.50
> 10:10:00 AM      8436    239392     96.60         0     19992    115312    759636     27540      3.50
> 10:20:00 AM      8192    239636     96.69         0     20488    115372    759636     27540      3.50
> 10:30:00 AM      6828    241000     97.24         0     21028    115472    759636     27540      3.50
> 10:40:00 AM      7340    240488     97.04         0     21476    113508    759636     27540      3.50
> 10:50:00 AM      5672    242156     97.71         0     21908    113752    759636     27540      3.50
> 11:00:00 AM      5152    242676     97.92         0     22408    113556    759636     27540      3.50
> 11:10:00 AM      4036    243792     98.37         0     22456    114124    759636     27540      3.50
> 11:20:00 AM      3788    244040     98.47         0     22772    114520    759636     27540      3.50
> 11:30:00 AM      4688    243140     98.11         0     23552    112784    759636     27540      3.50
> 11:40:00 AM      8100    239728     96.73         0     24496    108312    759636     27540      3.50
> 11:50:00 AM      7240    240588     97.08         0     25308    108436    759636     27540      3.50
> Average:        18180    229648     92.66         0     21606    108279    763857     23319      2.96
> 
> $ sar -u -f sa28
> 07:30:00 PM       CPU     %user     %nice   %system     %idle
> 07:40:00 PM       all      0.40      0.00      0.45     99.15
> 07:50:00 PM       all      0.20      0.00      0.27     99.53
> 08:00:00 PM       all      0.19      0.00      0.28     99.53
> 08:10:00 PM       all      0.26      0.00      0.40     99.34
> 08:20:00 PM       all      0.19      0.01      0.27     99.53
> 08:30:00 PM       all      0.17      0.02      0.32     99.49
> 08:40:00 PM       all      0.30      0.02      0.36     99.32
> 08:50:00 PM       all      0.32      0.05      0.39     99.24
> 09:00:00 PM       all      0.18      0.02      0.22     99.59
> 09:10:00 PM       all      0.17      0.00      0.23     99.60
> Average:          all      0.39      0.01      0.54     99.07
> 
> $ sar -u
> 06:30:00 AM       CPU     %user     %nice   %system     %idle
> 06:40:00 AM       all      0.26      0.00      0.23     99.51
> 06:50:00 AM       all      0.10      0.00      0.20     99.71
> 07:00:00 AM       all      0.12      0.00      0.18     99.70
> 07:10:00 AM       all      0.16      0.00      0.22     99.62
> 07:20:00 AM       all      0.12      0.00      0.19     99.70
> 07:30:00 AM       all      0.21      0.00      0.21     99.58
> 07:40:00 AM       all      0.59      0.00      0.32     99.09
> 07:50:00 AM       all      0.25      0.00      0.23     99.52
> 08:00:00 AM       all      0.12      0.00      0.20     99.68
> 08:10:01 AM       all      0.42      0.00      0.28     99.30
> 08:20:00 AM       all      0.55      0.00      0.39     99.06
> 08:30:00 AM       all      0.25      0.00      0.29     99.46
> 08:40:00 AM       all      0.39      0.00      0.26     99.34
> 08:50:00 AM       all      0.24      0.00      0.21     99.55
> 09:00:00 AM       all      0.33      0.00      0.20     99.47
> 09:10:00 AM       all      0.25      0.00      0.27     99.47
> 09:20:00 AM       all      0.21      0.00      0.23     99.56
> 09:30:00 AM       all      0.16      0.00      0.28     99.55
> 09:40:00 AM       all      0.42      0.00      0.38     99.20
> 09:50:00 AM       all      0.33      0.02      0.36     99.29
> 10:00:00 AM       all      0.22      0.01      0.26     99.51
> 10:10:00 AM       all      0.54      0.01      0.42     99.03
> 10:20:00 AM       all      0.23      0.00      0.31     99.46
> 10:30:00 AM       all      0.24      0.02      0.33     99.40
> 10:40:00 AM       all      0.33      0.02      0.34     99.30
> 10:50:00 AM       all      0.24      0.00      0.30     99.46
> 11:00:00 AM       all      0.36      0.00      0.37     99.27
> 11:10:00 AM       all      0.30      0.00      0.38     99.32
> 11:20:00 AM       all      0.30      0.01      0.45     99.24
> 11:30:00 AM       all      0.31      0.00      0.51     99.18
> 11:40:00 AM       all      0.58      0.00      0.58     98.85
> 11:50:00 AM       all      0.36      0.00      0.48     99.16
> 12:00:00 PM       all      0.35      0.00      0.68     98.97
> Average:          all      0.33      0.00      0.52     99.14
> 
> $ sar -W -f sa28
> 07:30:00 PM  pswpin/s pswpout/s
> 07:40:00 PM      0.00      0.00
> 07:50:00 PM      0.00      0.00
> 08:00:00 PM      0.00      0.00
> 08:10:00 PM      0.00      0.00
> 08:20:00 PM      0.00      0.00
> 08:30:00 PM      0.00      0.00
> 08:40:00 PM      0.00      0.00
> 08:50:00 PM      0.00      0.00
> 09:00:00 PM      0.00      0.00
> 09:10:00 PM      0.00      0.00
> Average:         0.00      0.03
> 
> $ sar -W
> 06:30:00 AM  pswpin/s pswpout/s
> 06:40:00 AM      0.00      0.00
> 06:50:00 AM      0.00      0.00
> 07:00:00 AM      0.00      0.00
> 07:10:00 AM      0.00      0.00
> 07:20:00 AM      0.00      0.00
> 07:30:00 AM      0.00      0.00
> 07:40:00 AM      0.00      0.00
> 07:50:00 AM      0.00      0.00
> 08:00:00 AM      0.00      0.00
> 08:10:01 AM      0.00      0.00
> 08:20:00 AM      0.00      0.00
> 08:30:00 AM      0.00      0.00
> 08:40:00 AM      0.00      0.00
> 08:50:00 AM      0.00      0.00
> 09:00:00 AM      0.00      0.00
> 09:10:00 AM      0.00      0.00
> 09:20:00 AM      0.01      0.00
> 09:30:00 AM      0.02      0.00
> 09:40:00 AM      0.00      0.00
> 09:50:00 AM      0.00      0.00
> 10:00:00 AM      0.00      0.00
> 10:10:00 AM      0.01      0.00
> 10:20:00 AM      0.00      0.00
> 10:30:00 AM      0.00      0.00
> 10:40:00 AM      0.00      0.00
> 10:50:00 AM      0.00      0.00
> 11:00:00 AM      0.00      0.00
> 11:10:00 AM      0.00      0.00
> 11:20:00 AM      0.00      0.00
> 11:30:00 AM      0.00      0.00
> 11:40:00 AM      0.00      0.00
> 11:50:00 AM      0.00      0.00
> 12:00:00 PM      0.02      0.00
> Average:         0.01      0.07



None of these reports shows anything "bad"; the CPU, for one, is over
98% idle throughout.


Perhaps some hardware stressing may be in order.

Repeatedly compiling something large, such as the kernel or PHP, works
well for this.

For the kernel, something like this:

Get the kernel source and unpack it in /tmp,

then cd into the kernel source directory and run

cp arch/i386/defconfig .config

then

make oldconfig

Answer y or n to any questions asked; it doesn't really matter which.

then 

while true; do make dep && make bzImage && make modules && make clean; done


You could set up a couple of these running from separate directories,
or do something similar with any largish source tree; PHP also works
well for this.
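Left unattended, the loop above keeps going after a failed build, so a
failure can scroll past unnoticed. A classic sign of flaky RAM,
overheating or similar hardware trouble is gcc dying at random points
(often with signal 11) on a build that succeeded before. A hedged sketch
of a variant that stops at the first failure: stress_build is a
hypothetical helper, not an existing tool, and /tmp/linux in the usage
line is a placeholder for wherever you unpacked the source.

```shell
#!/bin/sh
# Hypothetical helper: run a build command in a source tree up to
# $passes times, stopping at the first failure. The output of the most
# recent pass is kept in stress.log inside the tree.
stress_build() {
    dir=$1; passes=$2; shift 2
    i=1
    while [ "$i" -le "$passes" ]; do
        if ! (cd "$dir" && "$@") >"$dir/stress.log" 2>&1; then
            echo "pass $i FAILED, see $dir/stress.log"
            return 1
        fi
        echo "pass $i ok"
        i=$((i + 1))
    done
}
# Example, in an already configured 2.4 kernel tree:
#   stress_build /tmp/linux 50 sh -c 'make dep && make bzImage && make modules && make clean'
```

Because the log lives inside each tree, you can run one copy per
directory in the background without the logs colliding.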


If you want to test your RAM, have a look at www.memtest86.com


--
Andrew