[Gllug] RAID on RAID

Rich Walker rw at shadow.org.uk
Thu Nov 4 14:28:22 UTC 2004


Russell Howe <rhowe at wiss.co.uk> writes:

> On Wed, Nov 03, 2004 at 04:28:20PM +0000, Rich Walker wrote:
>> The idea that occurred to me was to allocate a ~5GB chunk of each disk
>> and then do 
>> hda1 + hde1 => md0, RAID1
>> hdc1 + hdg1 => md1, RAID1
>> md0 + md1 => md2, RAID1
>> 
>> and then mount md2 as /
>
> A few things spring to mind:
>
> 1) Why have 5G for /? If you have /{var,home,usr} as separate
> filesystems, / shouldn't grow much above 100 megabytes (about the
> biggest consumer would be a large /lib/modules if you have many kernels
> installed). 

I was going to keep /usr on /. Looking at the access patterns, that
seems to make more sense than separating it. Mind you, /usr/src would go
somewhere else.
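
(For reference, assembling that nested mirror would look something like
this - a rough, untested sketch, assuming mdadm rather than raidtools,
with the device names from my earlier mail, and ext3 only as a
placeholder for whatever filesystem you fancy for /:)

  # mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/hda1 /dev/hde1
  # mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/hdc1 /dev/hdg1
  # mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/md0 /dev/md1
  # mkfs.ext3 /dev/md2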

> If you're an /opt kind of person, then that would probably
> be wanting its own partition too. As an example, a Debian sid system I
> have looks like this:
>
> Filesystem    Type    Size  Used Avail Use% Mounted on
> /dev/hda2      xfs    239M  120M  119M  51% /
> /dev/hda1     ext2     23M   14M  9.4M  59% /boot
> /dev/hda5      xfs    1.9G   12M  1.9G   1% /tmp
> /dev/hda6      xfs    5.6G  3.0G  2.7G  54% /usr
> /dev/hdd2      xfs    4.9G  2.4G  2.6G  49% /usr/local
> /dev/hdd3      xfs    4.0G  3.9G   49M  99% /var
> /dev/hdd5      xfs    2.0G  2.0G   48M  98% /home
> /dev/hdc1      xfs     75G   75G  387M 100% /usr/local/media

Have a look at /proc/partitions and iostat. I'm thinking that, as long
as you trust everything that writes to /usr (hence /usr/local and
/usr/src elsewhere) - and I trust dpkg mostly - then you might as well
have everything useful on the "as reliable as possible" partition.
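
(Something like the following gives a feel for where the writes actually
go - iostat lives in the sysstat package, and the raw per-device
counters are in /proc/partitions on 2.4 kernels or /proc/diskstats on
2.6, iirc:)

  $ cat /proc/partitions
  $ iostat -d 5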

> The only reason /var is so big is because it holds the filesystem for my
> desktop (NFS-root box). /home is only so full because I have too much
> crap in ~rhowe. /tmp was more a case of "hm, 2G of space.. what can I do
> with it?". 

These days, I am quite happy to put /tmp on a tmpfs partition - along
with /var/tmp, /var/lib/amavis/tmp and maybe a few others. Does wonders
for reducing spurious disk access...
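
(In /etc/fstab that's just entries along these lines - the sizes are
made-up numbers; pick whatever suits the box:)

  tmpfs  /tmp                 tmpfs  defaults,size=256m  0  0
  tmpfs  /var/tmp             tmpfs  defaults,size=256m  0  0
  tmpfs  /var/lib/amavis/tmp  tmpfs  defaults,size=128m  0  0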

> Having hdc on its own cable would be nice, but IRQs are
> limited in that box, so I don't really want to add another IDE
> controller.

Is that actually a problem? I wouldn't have thought there was much
difference between two controllers sharing an IRQ and one controller on
its own - the extra interrupt-servicing time should be well down in the
noise compared to disk seek time...
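
(Easy enough to check, anyway - /proc/interrupts shows which devices are
sharing which lines:)

  $ cat /proc/interrupts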

>
> / is 50% kernel modules, too:
>
> $ du -hsc /lib/modules/*
> 13M     /lib/modules/2.6.1-xiao
> 16M     /lib/modules/2.6.3
> 12M     /lib/modules/2.6.5
> 13M     /lib/modules/2.6.5-xiao
> 11M     /lib/modules/2.6.9-final-xiao
> 63M     total

Yeah, that's surprisingly common. The other one to watch out for is
/etc/gconf - I found this to be >20MB on a box recently...
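
(Worth a quick check on any GNOME box:)

  $ du -sh /etc/gconf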

> 2) You need to be careful when layering kernel subsystems like this -
> especially with the recent kernel option of 4K stacks. There are parts
> of the kernel which are rather heavy consumers of stack, and you can hit
> the 4K limit relatively easily (you can even hit the 8K limit too, if
> you try). Running out of stack space is something to be avoided. 

TBH, putting 2.6 on a box that doesn't need it is something to be
avoided in my corner of the universe :->

> Things
> to watch out for are LVM, MD, XFS and certain device drivers (cpqfc, for
> example). All are fairly heavy consumers of stack space. Note that some
> distributions (notably Fedora) ship with a kernel with CONFIG_4KSTACKS
> set.
>
>> Now, clearly writes will be slow :-> But writes to / are rare - most writes
>> go to /home, /var, /tmp and some to /big.
>> 
>> Reads should alternate between md0 and md1.
>> 
>> If any one disk controller goes down, no problem.
>> If any three disks go down, no problem.
>
> If a controller or disk goes down, it's quite likely to take Linux with
> it... I've had lockups and kernel panics due to drives simply getting too
> hot, and returning IDE errors! Putting the drives in a fan-cooled
> removable bay solved that one though.

Yeah, I've seen that kind of thing. The real issue is: once the machine
starts to fail, for whatever reason, how bad is the recovery process
going to be, and how long will it take? It normally chooses the Worst
Possible Time, and waiting N hours for a restore from tape isn't an
option when, for example, a catastrophic power failure takes out the
server a few hours before you get on an aeroplane with a box that still
has to be installed from that server :->

Not that that's what prompted this...

cheers, Rich.

>
> -- 
> Russell Howe       | Why be just another cog in the machine,
> rhowe at siksai.co.uk | when you can be the spanner in the works?
> -- 
> Gllug mailing list  -  Gllug at gllug.org.uk
> http://lists.gllug.org.uk/mailman/listinfo/gllug

-- 
rich walker         |  Shadow Robot Company | rw at shadow.org.uk
technical director     251 Liverpool Road   |
need a Hand?           London  N1 1LX       | +UK 20 7700 2487
www.shadow.org.uk/products/newhand.shtml
-- 
Gllug mailing list  -  Gllug at gllug.org.uk
http://lists.gllug.org.uk/mailman/listinfo/gllug



