[Nottingham] Large disks and storage (md raid alignments)
Martin
martin at ml1.co.uk
Fri Sep 11 14:34:13 UTC 2009
Martin wrote:
[---]
>>>
>>> From "man 8 mdadm":
>>>
>>> "Use the new version-1 format superblock. This has few restrictions. The
>>> different subversion store the superblock at different locations on the
>>> device, either at the end (for 1.0), at the start (for 1.1) or 4K from
>>> the start (for 1.2)."
>>>
>>>
>>> So...
>>>
>>> Why would you want the raid superblock to be 4k from the start?
>>>
>>> And for the sake of ssd/stripe alignment, how big is the superblock?
>>>
>>> Why not simply have the superblock at the end (v1.0)?
>
> ...And that lot is mostly answered in:
>
> http://wiki.tldp.org/LVM-on-RAID
[---]
>
> So... Looks like raid1 with a v1.0 superblock is the way to go!
>
> ... Or does a v1.2 superblock fit nicely after a partition table (on
> track 0) yet before the start of the first partition if placed on the
> old DOS required track 1 start?
Phew! What an awkward set of convolutions!!
It looks like the md raid code assumes 4kB alignment throughout, and that
the md superblock is itself 4kB in size.
Hence, I guess the 4kB offset from the start is a convenient "frig": it
allows an entire underlying device to be used as a raid array member AND
yet still leaves room for a 512Byte mbr/partition-table in LBA sector 0
(CHS sector 1). Assuming 512Byte sectors, the superblock is written over
LBA sectors 8 to 15. That is also nicely within the first 63 sectors
usually left "hidden" (mbr/partition-table inclusive) by the old MSDOS
partitioning 'standard', under which the first partition starts at
CHS 0/1/1 (cylinder 0, head 1, sector 1), which is LBA sector 63.
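As a quick check of those sector numbers, here is a minimal Python sketch. The 4kB superblock size is only the guess made above, not something taken from the md source, and the helper name sectors_covered is made up for illustration.

```python
SECTOR_SIZE = 512  # bytes per sector, as assumed in the text


def sectors_covered(byte_offset, byte_length, sector_size=SECTOR_SIZE):
    """Return (first, last) LBA sector numbers touched by a byte range."""
    first = byte_offset // sector_size
    last = (byte_offset + byte_length - 1) // sector_size
    return first, last


# v1.2 superblock: 4 KiB from the start; the 4 KiB length is an assumption.
first, last = sectors_covered(4096, 4096)
print(first, last)   # 8 15
print(last < 63)     # True: it sits inside the 63 "hidden" sectors
```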
Note that the LBA sectors count up from "0". Hence, with 512Byte sectors
you get the first 31.5kB of the disk drive "hidden" and the first
partition starts immediately after that. So that's one sector off from a
32kB alignment there...
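The CHS to LBA sum behind that can be sketched in Python as follows. The 255 heads per cylinder default is just an assumed, commonly reported geometry (it does not matter for cylinder 0), and the function name is made up for illustration.

```python
def chs_to_lba(c, h, s, heads_per_cylinder=255, sectors_per_track=63):
    """Convert a CHS address to an LBA sector number.

    Cylinders and heads count from 0; sectors count from 1.
    The default geometry is an assumption, not read from any real disk.
    """
    if s < 1:
        raise ValueError("CHS sector numbers start at 1")
    return (c * heads_per_cylinder + h) * sectors_per_track + (s - 1)


start_lba = chs_to_lba(0, 1, 1)        # first partition at CHS 0/1/1
hidden_bytes = start_lba * 512         # 512-byte sectors assumed
short_by = (32 * 1024 - hidden_bytes) // 512

print(start_lba)             # 63
print(hidden_bytes / 1024)   # 31.5
print(short_by)              # 1 (sector short of a 32 KiB boundary)
```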
This is all a hangover from the old CHS counting and of trying to align
the first partition onto the second head of the first cylinder.
Note:
The maximum number of Sides (read/write heads) that can be represented
with 1 byte is 256. The maximum number of Cylinders that can be
represented with 10 bits is 1024. The maximum number of Sectors that can
be represented with 6 bits is 63 because Sectors start counting with 1
(versus Cylinders and Sides which start counting with 0).
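Those limits follow directly from the field widths. A small Python sketch, assuming 512Byte sectors, also gives the largest disk that such CHS addressing can describe:

```python
HEAD_BITS = 8        # 1 byte for the side/head number
CYLINDER_BITS = 10
SECTOR_BITS = 6

max_heads = 2 ** HEAD_BITS            # 256, numbered 0..255
max_cylinders = 2 ** CYLINDER_BITS    # 1024, numbered 0..1023
max_sectors = 2 ** SECTOR_BITS - 1    # 63, numbered 1..63 (0 is unused)

total_sectors = max_cylinders * max_heads * max_sectors
total_bytes = total_sectors * 512     # 512-byte sectors assumed

print(max_heads, max_cylinders, max_sectors)   # 256 1024 63
print(total_sectors)                           # 16515072
print(total_bytes / 2 ** 30)                   # 7.875 (GiB)
```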
And you have a track of sectors read by one head, a number of heads in
one cylinder, and as many cylinders as there are tracks across the disk
surface. All in the days of old. That is all now remapped into
meaninglessness today!
BUT...!
Such a partition table only makes sense for an md raid1 (mirror) where
entire devices have been mirrored... And even then, that partition table
shouldn't be used by, or even be visible from, the md raid device... So
is that just done for minimum interference and to give the option of
rolling back to a non-raided device?
To ensure, for example, a 64kByte alignment, must a filesystem be
offset by 64kB - (4kB + 4kB) = 56kB from the start of what md provides
for that device, to allow for the 4kB offset plus the 4kB superblock?
(Assuming the start of the md device area is already aligned.)
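The same sum in general form, as a minimal Python sketch. It assumes, as the question above does, that the 4kB offset and a 4kB superblock are the only space used before the data starts. The helper name is made up for illustration.

```python
KIB = 1024


def padding_to_align(used_bytes, alignment):
    """Bytes to skip after `used_bytes` to reach the next multiple of `alignment`."""
    return (-used_bytes) % alignment


# Assumption from the text: 4 KiB offset + 4 KiB superblock already used.
used = 4 * KIB + 4 * KIB
print(padding_to_align(used, 64 * KIB) // KIB)   # 56
```

If the space used is already a multiple of the alignment, the function returns 0, so no extra offset is needed.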
Mmmm... Is that why my raid5 runs so very slowly for writes!!! (No start
offset given when formatted to align to the stripes :-( )
Which then comes onto the new GUID partitioning scheme:
http://en.wikipedia.org/wiki/File:GUID_Partition_Table_Scheme.svg
That has one sector of old mbr and partition table, and then a further
*thirty-three* sectors for the GUID structure: 34 sectors, or 17kB, in
total. That at most allows for a 1kByte alignment before sectors must be
skipped to maintain alignments. Just two fewer 512Byte sectors and you
would nicely get a 16kByte alignment...
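To see where the 1kByte and 16kByte figures come from, here is a small Python sketch that finds the largest power-of-two boundary a given sector count lands on. It assumes 512Byte sectors, and the function name is made up for illustration.

```python
def natural_alignment(sectors, sector_size=512):
    """Largest power-of-two byte boundary that an offset of `sectors` lands on.

    Returns 0 for an offset of 0, which is aligned to everything.
    """
    offset = sectors * sector_size
    return offset & -offset   # isolates the lowest set bit


print(natural_alignment(34))   # 1024  (1 mbr sector + 33 GUID sectors)
print(natural_alignment(32))   # 16384 (two sectors fewer)
print(natural_alignment(63))   # 512   (the old MSDOS first-partition start)
```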
And note that HDD manufacturers may well be moving up to 4kByte sectors
for the ever larger HDDs...
All good fun!
Hope that's of interest and help to others. And especially so for
getting the alignment correct for SSD erase block boundaries!...
Cheers,
Martin
--
----------------
Martin Lomas
martin at ml1.co.uk
----------------