[Nottingham] Large disks and storage (md raid alignments)

Martin martin at ml1.co.uk
Fri Sep 11 14:34:13 UTC 2009


Martin wrote:
[---]
>>>
>>>  From "man 8 mdadm":
>>>
>>> "Use the new version-1 format superblock. This has few restrictions. The 
>>> different subversion store the superblock at different locations on the 
>>> device, either at the end (for 1.0), at the start (for 1.1) or 4K from 
>>> the start (for 1.2)."
>>>
>>>
>>> So...
>>>
>>> Why would you want the raid superblock to be 4k from the start?
>>>
>>> And for the sake of ssd/stripe alignment, how big is the superblock?
>>>
>>> Why not simply have the superblock at the end (v1.0)?
> 
> ...And that lot is mostly answered in:
> 
> http://wiki.tldp.org/LVM-on-RAID
[---]
> 
> So... Looks like raid1 with a v1.0 superblock is the way to go!
> 
> ... Or does a v1.2 superblock fit nicely after a partition table (on 
> track 0) yet before the start of the first partition if placed on the 
> old DOS required track 1 start?

Phew! What an awkward set of convolutions!!

It looks like the md raid code assumes 4kByte alignment throughout, and 
that the md superblock is itself 4kByte in size.

Hence, I guess the 4kB offset from the start is a convenient "frig" 
that allows an entire underlying device to be used as a raid array member 
AND yet still allows a 512Byte mbr/partition-table to be written into 
LBA sector 0 (CHS 0-0-1). Assuming 512Byte sectors, the superblock is 
then written over LBA sectors 8 to 15. That is also nicely within the 
first 63 sectors (LBA 0 to 62, mbr/partition-table inclusive) usually 
left "hidden" by the old MSDOS partitioning 'standard', under which the 
first partition starts at CHS 0-1-1 (LBA sector 63).
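
As a quick sanity check of those sector numbers, here is a minimal Python 
sketch. It assumes 512Byte sectors and takes the 4kB superblock size 
guessed at above at face value; the function name is made up purely for 
illustration.

```python
SECTOR_BYTES = 512  # assumption: classic 512-byte sectors


def byte_range_to_lba(offset_bytes, length_bytes, sector_bytes=SECTOR_BYTES):
    """Return (first_lba, last_lba) of the sectors covered by a byte range."""
    first = offset_bytes // sector_bytes
    last = (offset_bytes + length_bytes - 1) // sector_bytes
    return first, last


# v1.2 superblock: 4kB from the start, assumed to be 4kB long.
superblock = byte_range_to_lba(4 * 1024, 4 * 1024)
print(superblock)  # (8, 15)

# The old MSDOS layout starts the first partition at LBA 63, so the
# superblock fits in the "hidden" area if it ends before that sector.
fits_in_hidden_area = superblock[1] < 63
print(fits_in_hidden_area)  # True
```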

Note that the LBA sectors count up from "0". Hence, with 512Byte sectors 
you get the first 31.5kB of the disk drive "hidden" and the first 
partition starts immediately after that. So that's one sector off from a 
32kB alignment there...
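
To put a number on how badly a start at LBA 63 is aligned, the sketch 
below finds the largest power-of-two boundary that a given start sector 
sits on. It assumes 512Byte sectors, and the helper name is my own.

```python
def largest_pow2_alignment(lba, sector_bytes=512):
    """Largest power-of-two byte boundary that the start of `lba` lies on."""
    offset = lba * sector_bytes
    if offset == 0:
        return None  # offset 0 is aligned to everything
    return offset & -offset  # isolates the lowest set bit


print(63 * 512 / 1024)             # 31.5 (kB hidden before the partition)
print(largest_pow2_alignment(63))  # 512: only sector-aligned
print(largest_pow2_alignment(64))  # 32768: one sector later gives 32kB
```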

This is all a hangover from the old CHS counting and of trying to align 
the first partition onto the second head of the first cylinder.

Note:

The maximum number of Sides (read/write heads) that can be represented 
with 1 byte is 256. The maximum number of Cylinders that can be 
represented with 10 bits is 1024. The maximum number of Sectors that can 
be represented with 6 bits is 63 because Sectors start counting with 1 
(versus Cylinders and Sides which start counting with 0).
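
Those limits, and the usual CHS to LBA conversion, can be sketched in a 
few lines of Python. The 255 heads and 63 sectors per track used as 
defaults are an assumption (the commonly reported translated geometry), 
not something a particular drive is guaranteed to use.

```python
HEAD_BITS, CYLINDER_BITS, SECTOR_BITS = 8, 10, 6

max_heads = 2 ** HEAD_BITS          # 256, numbered 0..255
max_cylinders = 2 ** CYLINDER_BITS  # 1024, numbered 0..1023
max_sectors = 2 ** SECTOR_BITS - 1  # 63, numbered 1..63 (0 is unused)


def chs_to_lba(cylinder, head, sector, heads_per_cylinder=255,
               sectors_per_track=63):
    """Convert a CHS address to an LBA sector number (sectors count from 1)."""
    return ((cylinder * heads_per_cylinder + head) * sectors_per_track
            + (sector - 1))


print(chs_to_lba(0, 0, 1))  # 0: the mbr/partition-table sector
print(chs_to_lba(0, 1, 1))  # 63: traditional start of the first partition
```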

So you have a track of sectors read by one head, a number of heads in 
one cylinder, and as many cylinders as there are tracks across the disk 
surface. That was all in the days of old. It is all now remapped into 
meaninglessness today!

BUT...!

Such a partition table only makes sense for an md raid1 (mirror) where 
entire devices have been mirrored... And even then, that partition 
table shouldn't be used, or even be visible, from the md raid device... 
So is that just done for minimum interference and to give the option of 
rolling back to a non-raided device?

To ensure, for example, a 64kByte alignment, must a filesystem be 
formatted with an offset of 64kB - (4kB + 4kB) = 56kB from the start of 
what md provides for that device? (Assuming the start of the md device 
area is already aligned.)
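
The arithmetic in that question generalises easily. The sketch below 
works out how much padding is needed to reach the next alignment 
boundary once some number of bytes has already been used up. Whether md 
really does consume 4kB + 4kB ahead of the data is my guess from the 
above and not something confirmed; the function name is only for 
illustration.

```python
KIB = 1024


def padding_to_alignment(used_bytes, alignment_bytes):
    """Bytes to skip after `used_bytes` to land on the next boundary."""
    return (-used_bytes) % alignment_bytes


used = 4 * KIB + 4 * KIB  # assumed: 4kB offset plus 4kB superblock
pad = padding_to_alignment(used, 64 * KIB)
print(pad // KIB)  # 56

# Already on a boundary means no padding at all.
print(padding_to_alignment(64 * KIB, 64 * KIB))  # 0
```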

Mmmm... Is that why my raid5 runs so very slowly for writes!!! (No start 
offset given when formatted to align to the stripes :-( )


Which then brings us onto the new GUID partitioning scheme:

http://en.wikipedia.org/wiki/File:GUID_Partition_Table_Scheme.svg

That has one sector of old (protective) mbr and partition table, and 
then a further *thirty-three* sectors for the GUID structure: 34 sectors 
(17kByte) in all with 512Byte sectors. A partition starting immediately 
after that is aligned to 1kByte at best, unless further sectors are 
skipped to regain a larger alignment. Just two fewer 512Byte sectors and 
you would nicely get a 16kByte alignment...
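
The GUID numbers can be checked the same way. This is a minimal sketch 
assuming 512Byte sectors and the layout in the diagram linked above (one 
mbr sector, one header sector, 32 sectors of partition entries); the 
names are my own.

```python
SECTOR_BYTES = 512  # assumption: 512-byte sectors


def pow2_alignment(offset_bytes):
    """Largest power-of-two boundary that a non-zero byte offset lies on."""
    return offset_bytes & -offset_bytes


mbr_sectors = 1
gpt_sectors = 1 + 32  # header sector plus 32 sectors of partition entries
first_usable_lba = mbr_sectors + gpt_sectors

print(first_usable_lba)                                       # 34
print(pow2_alignment(first_usable_lba * SECTOR_BYTES))        # 1024
print(pow2_alignment((first_usable_lba - 2) * SECTOR_BYTES))  # 16384
```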

And note that HDD manufacturers may well be moving up to 4kByte sectors 
for the ever larger HDDs...


All good fun!

Hope that's of interest and help to others. And especially so for 
getting the alignment correct for SSD erase block boundaries!...

Cheers,
Martin


-- 
----------------
Martin Lomas
martin at ml1.co.uk
----------------


