[Gllug] Identifying what processes are waiting on IO
Stephen Nelson-Smith
sanelson at gmail.com
Sun Dec 7 11:09:46 UTC 2008
I have a system whose load is very high for several hours. It's a
quad-core machine, and I'm seeing a load average of around 20.
The system becomes somewhat unresponsive during this time - and
customers are complaining.
Investigating further, I see that the system sees a drop in the amount
of memory used as buffers by the kernel, and I notice that all 4 cores
spend a very large amount of time in iowait.
Looking at the disk performance, I see device saturation - the
percentage of CPU time during which I/O requests were issued to the
device is at 100%.
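
For reference, the saturation figure described here can be derived from
/proc/diskstats by sampling it twice. The following is only a minimal sketch.
It assumes the 2.6-series layout, in which whole-disk lines carry 11 counters
after the major number, minor number and device name, and the tenth counter is
the number of milliseconds the device spent doing I/O. The function names
read_io_ticks and utilisation are made up for illustration.

```python
def read_io_ticks(text):
    """Map device name -> milliseconds spent doing I/O, from diskstats text."""
    ticks = {}
    for line in text.splitlines():
        parts = line.split()
        # Whole-disk lines: major, minor, name, then 11 counters.
        # Shorter lines (e.g. partitions on older kernels) are skipped.
        if len(parts) < 14:
            continue
        # parts[12] is the tenth counter: ms during which I/O was in progress.
        ticks[parts[2]] = int(parts[12])
    return ticks


def utilisation(before, after, elapsed_ms):
    """Percentage of the sampling interval each device was busy with I/O."""
    result = {}
    for dev in after:
        if dev in before:
            busy_ms = after[dev] - before[dev]
            # Cap at 100 to absorb small timing jitter between the samples.
            result[dev] = min(100.0, 100.0 * busy_ms / float(elapsed_ms))
    return result
```

To use it, read /proc/diskstats, wait a few seconds, read it again, and pass
both parsed snapshots plus the elapsed time in milliseconds to utilisation().
A device that stays close to 100 across several samples is saturated in the
sense described above.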
I would like to be able to find out what processes are using the disks,
and waiting on IO - so I can consider either moving them to other
machines or reconfiguring storage in a way that is better suited to
the application profile.
I'm using CentOS 5, so I don't have a kernel with IO accounting, nor do
I have Python 2.5 or 2.6, so I can't use iotop.
I've heard good things about systemtap, but I wouldn't know where to start.
So far all I've done is use lsof and look at the major and minor device
numbers and mount points - but this really doesn't tell me much.
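
One rough approach that needs neither IO accounting nor iotop is to sample
/proc repeatedly and count which processes sit in uninterruptible sleep
(state D), since on Linux those processes count towards the load average and
are usually blocked on disk IO. The following is a minimal sketch, not a tested
tool. The function names are made up for illustration, and although it sticks
to constructs that should also exist in older Python 2 releases, it has not
been tried on the Python that ships with CentOS 5.

```python
import os
import time


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split around the first '(' and the LAST ')'.
    left = text.index("(")
    right = text.rindex(")")
    pid = int(text[:left].strip())
    comm = text[left + 1:right]
    state = text[right + 1:].split()[0]
    return pid, comm, state


def blocked_processes(proc_root="/proc"):
    """List (pid, comm) for processes currently in state D."""
    found = []
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # only numeric entries are processes
        try:
            f = open(os.path.join(proc_root, name, "stat"))
            try:
                pid, comm, state = parse_stat(f.read())
            finally:
                f.close()
        except (IOError, OSError, ValueError, IndexError):
            continue  # process exited, or the stat file was unreadable
        if state == "D":
            found.append((pid, comm))
    return found


def sample(rounds=30, interval=1.0):
    """Count how many times each (pid, comm) was seen in state D."""
    counts = {}
    for _ in range(rounds):
        for key in blocked_processes():
            counts[key] = counts.get(key, 0) + 1
        time.sleep(interval)
    return counts
```

Calling sample() during the busy period and sorting the result by count gives a
crude ranking of the processes that spend the most time blocked. It does not
say which device they are waiting on, so it complements the lsof output rather
than replacing it.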
Any pointers for further troubleshooting gratefully received.
Thanks,
S.
--
Gllug mailing list - Gllug at gllug.org.uk
http://lists.gllug.org.uk/mailman/listinfo/gllug