[Gllug] Modern Fault finding techniques

James Hawtin oolon at ankh.org
Fri Nov 18 12:23:43 UTC 2011

Aaron Trevena wrote:
> That's a good open-ended discussion question in interviews and
> appropriate for both developers and sysadmins.
I didn't have a problem with the question; it seems a very real-world question.

> If it's not an app dev or app support role I wouldn't expect somebody
> to talk through troubleshooting an app aside from some basic ballpark
> stuff, but I would expect a sysadmin to be able to help isolate the
> problem so that the right team are (re-)assigned the trouble ticket.
And I did isolate it: the problem was a table in a database that was running
slowly because it was very big. However, I isolated it using commands I knew
would be available on any Unix system, as I could not make assumptions about
any monitoring tools being available.

> There are plenty of system level things that could be problematic that
> a sysadmin would be in a better position to spot: heavy IO, flaky
> network connections to other systems the application uses - not just
> databases - there could be nosql applications, memcached, mogilefs etc
> that are causing problems, some clustering apps support multiple
> fallback handling so the app won't see errors, but under the cover a
> web page request could be making multiple attempts to reach a resource
> on a different load balanced machine, there are also fun things like
> the number of apache processes rapidly increasing because part of the
> application is waiting for a slow database query or an overloaded
> resource elsewhere.
That was not my question. I know what might be the problem; what I want to
know is what "modern/better" fault-finding techniques people use.
> The thing is, system administration can and should be more than just
> making sure the operating system on a given machine is running
> smoothly without any applications running - and system administrators
> are part of a team, working with developers and support and hopefully
> providing tools or knowledge that enables them to resolve or prevent
> problems - not just putting their hands up and saying "oh that's a
> software/hardware/application/support problem - not my area.. I'm off
> on a fag break" ;)
What makes you think I don't? I am just trying to find out what other people
would do, so that I can improve myself.

Gllug mailing list  -  Gllug at gllug.org.uk
