[Gllug] RAID on RAID
Rich Walker
rw at shadow.org.uk
Thu Nov 4 14:28:22 UTC 2004
Russell Howe <rhowe at wiss.co.uk> writes:
> On Wed, Nov 03, 2004 at 04:28:20PM +0000, Rich Walker wrote:
>> The idea that occurred to me was to allocate a ~5GB chunk of each disk
>> and then do
>> hda1 + hde1 => md0, RAID1
>> hdc1 + hdg1 => md1, RAID1
>> md0 + md1 => md2, RAID1
>>
>> and then mount md2 as /
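(For the record, the mdadm incantation for that would be something like the
following - an untested sketch, with the partitions as above:

  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/hda1 /dev/hde1
  mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/hdc1 /dev/hdg1
  # mirror the two mirrors to get the four-way-redundant root:
  mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/md0 /dev/md1

)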
>
> A few things spring to mind:
>
> 1) Why have 5G for /? If you have /{var,home,usr} as separate
> filesystems, / shouldn't grow much above 100 megabytes (about the
> biggest consumer would be a large /lib/modules if you have many kernels
> installed).
I was going to keep /usr on /. Looking at the access patterns, that
seems to make more sense than separating it. Mind you, /usr/src would go
somewhere else.
> If you're an /opt kind of person, then that would probably
> be wanting its own partition too. As an example, a Debian sid system I
> have looks like this:
>
> Filesystem  Type  Size  Used Avail Use% Mounted on
> /dev/hda2   xfs   239M  120M  119M  51% /
> /dev/hda1   ext2   23M   14M  9.4M  59% /boot
> /dev/hda5   xfs   1.9G   12M  1.9G   1% /tmp
> /dev/hda6   xfs   5.6G  3.0G  2.7G  54% /usr
> /dev/hdd2   xfs   4.9G  2.4G  2.6G  49% /usr/local
> /dev/hdd3   xfs   4.0G  3.9G   49M  99% /var
> /dev/hdd5   xfs   2.0G  2.0G   48M  98% /home
> /dev/hdc1   xfs    75G   75G  387M 100% /usr/local/media
Have a look at /proc/partitions and iostat. I'm thinking that, as long
as you trust everything that writes to /usr (hence /usr/local and
/usr/src elsewhere) - and I trust dpkg mostly - then you might as well
have everything useful on the "as reliable as possible" partition.
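(If you want numbers rather than a gut feeling, sysstat's iostat gives
per-device activity - something like:

  iostat -d -k 5    # per-device kB read/written, reported every 5 seconds

left running under a normal workload makes it fairly obvious where the
writes actually land.)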
> The only reason /var is so big is because it holds the filesystem for my
> desktop (NFS-root box). /home is only so full because I have too much
> crap in ~rhowe. /tmp was more a case of "hm, 2G of space.. what can I do
> with it?".
These days, I am quite happy to put /tmp on a tmpfs partition - along
with /var/tmp, /var/lib/amavis/tmp and maybe a few others. Does wonders
for reducing spurious disk access...
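Something like this in /etc/fstab does the trick - the size= values are
ceilings rather than preallocations, and the figures here are just a guess
at sensible ones:

  tmpfs  /tmp      tmpfs  defaults,size=256m  0  0
  tmpfs  /var/tmp  tmpfs  defaults,size=256m  0  0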
> Having hdc on its own cable would be nice, but IRQs are
> limited in that box, so I don't really want to add another IDE
> controller.
Is that actually a problem? I wouldn't have thought there was much
difference between two controllers sharing an IRQ and one controller on
its own - the extra interrupt-servicing time should be well down in the
noise compared to disk seek time...
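(Easy enough to see what's actually sharing, anyway:

  cat /proc/interrupts    # one line per IRQ; a shared one lists several devices

)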
>
> / is 50% kernel modules, too:
>
> $ du -hsc /lib/modules/*
> 13M     /lib/modules/2.6.1-xiao
> 16M     /lib/modules/2.6.3
> 12M     /lib/modules/2.6.5
> 13M     /lib/modules/2.6.5-xiao
> 11M     /lib/modules/2.6.9-final-xiao
> 63M     total
Yeah, that's surprisingly common. The other one to watch out for is
/etc/gconf - I found this to be >20MB on a box recently...
> 2) You need to be careful when layering kernel subsystems like this -
> especially with the recent kernel option of 4K stacks. There are parts
> of the kernel which are rather heavy consumers of stack, and you can hit
> the 4K limit relatively easily (you can even hit the 8K limit too, if
> you try). Running out of stack space is something to be avoided.
TBH, putting 2.6 on a box that doesn't need it is something to be
avoided in my corner of the universe :->
> Things
> to watch out for are LVM, MD, XFS and certain device drivers (cpqfc, for
> example). All are fairly heavy consumers of stack space. Note that some
> distributions (notably Fedora) ship with a kernel with CONFIG_4KSTACKS
> set.
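If you do end up stacking md on md under 2.6, it's worth checking what the
kernel was actually built with first. A couple of ways, depending on
whether the distro ships the config alongside the kernel or built with
CONFIG_IKCONFIG_PROC:

  grep CONFIG_4KSTACKS /boot/config-$(uname -r)
  zgrep CONFIG_4KSTACKS /proc/config.gz    # needs CONFIG_IKCONFIG_PROC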
>
>> Now, clearly writes will be slow :-> But writes to / are rare - most writes
>> go to /home, /var, /tmp and some to /big.
>>
>> Reads should alternate between md0 and md1.
>>
>> If any one disk controller goes down, no problem.
>> If any three disks go down, no problem.
>
> If a controller or disk goes down, it's quite likely to take Linux with
> it... I've had lockups and kernel panics due to drives simply getting too
> hot, and returning IDE errors! Putting the drives in a fan-cooled
> removable bay solved that one though.
Yeah, I've seen that kind of thing. The real issue is: once the machine
starts to fail, for whatever reason, how bad is the recovery process going
to be, and how long will it take? Failures normally pick the Worst
Possible Time, and waiting N hours for a restore from tape isn't an option
when, for example, a catastrophic power failure takes out the server a few
hours before you get on an aeroplane with a box that still has to be
installed from that server :->
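(Which is exactly why I'd want md to shout the moment a mirror degrades,
rather than finding out during the post-mortem - a sketch, flags from the
mdadm manpage:

  cat /proc/mdstat        # quick check: [UU] means both halves of a mirror are up
  mdadm --monitor --scan --mail=root --daemonise    # mail root on failure events

)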
Not that that's what prompted this...
cheers, Rich.
>
> --
> Russell Howe | Why be just another cog in the machine,
> rhowe at siksai.co.uk | when you can be the spanner in the works?
--
rich walker        | Shadow Robot Company | rw at shadow.org.uk
technical director   251 Liverpool Road   |
need a Hand?         London N1 1LX        | +UK 20 7700 2487
www.shadow.org.uk/products/newhand.shtml