[Nottingham] Copy without copying? (file copy deduplication)

Mon Jun 10 13:28:02 UTC 2019

Martin, LVM snapshots can be mounted read-write, and they keep any modifications made to them. They can also be merged back into the origin if you want to keep the changes. For instance: make a snapshot, mount it, upgrade, check everything is fine, then merge the snapshot into the origin. It's another use of LVM snapshots, different to read-only backups.
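
To make that workflow concrete, here is a toy Python model of it. It is only a sketch: the class name WritableSnapshot and the block contents are made up for illustration, it is not how LVM is implemented, and it assumes nothing writes to the origin while the snapshot exists. On a real system the merge step is done with lvconvert --merge.

```python
class WritableSnapshot:
    """Toy model of a read-write snapshot layered over an origin block list."""

    def __init__(self, origin_blocks):
        self.origin = origin_blocks  # shared with the caller, not copied
        self.changed = {}            # block index -> data written via the snapshot

    def read(self, index):
        # Blocks written through the snapshot win; everything else
        # is still read straight from the origin.
        if index in self.changed:
            return self.changed[index]
        return self.origin[index]

    def write(self, index, data):
        # Writes land in the snapshot's own store; the origin is untouched.
        self.changed[index] = data

    def merge(self):
        # Keep the changes: copy every modified block back into the origin.
        for index, data in self.changed.items():
            self.origin[index] = data
        self.changed.clear()

    def discard(self):
        # Throw the changes away: the origin was never modified.
        self.changed.clear()


origin = ["kernel-1", "libc-1", "app-1"]
snap = WritableSnapshot(origin)

snap.write(2, "app-2")                        # "upgrade" inside the snapshot
before_merge = list(origin)                   # origin still at the old version
upgraded_view = [snap.read(i) for i in range(len(origin))]

snap.merge()                                  # happy with the result, keep it
```

If the upgrade had gone wrong, calling discard() instead of merge() would leave the origin exactly as it was.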

On June 10, 2019 12:37:55 PM UTC, Martin via Nottingham <nottingham at mailman.lug.org.uk> wrote:
>On 10/06/2019 12:06, J J via Nottingham wrote:
>> I must be misunderstanding something, I thought the snapshot retained
>> the original data as things continued to get written to the source.
>> (This seems to be what LVM does at any rate)
>
>On LVM, when you enable a snapshot device:
>
>The source device is used as normal, unaffected, with data read and new
>data written as normal.
>
>While the LVM snapshot device is enabled, any disk blocks written to the
>source device first have the old block data copied to the snapshot
>device. Hence you get the original LVM source device in use as normal
>and a copy of the old data before being overwritten, block by block,
>copied across to the LVM snapshot device. I guess that can be called a
>sort of CoW but used instead for 'saving' the old data.
>
>When reading from the LVM snapshot device, any blocks that are unchanged
>are read from the source device as normal. Any changed blocks are
>instead read from the copy saved on the snapshot device to make the
>source device appear to be unchanged.
>
>Hence, you have the read-only snapshot view and additionally, the
>snapshot device area can be much smaller than the source device. (Only
>the changed data is stored...)
>
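The mechanism Martin describes above can be sketched in a few lines of Python. This is a toy model only: the names Origin, Snapshot and preserve are invented for illustration, blocks are plain strings, and real LVM works on fixed-size chunks in the device mapper rather than on Python lists.

```python
class Origin:
    """Toy origin volume: a list of blocks plus any attached snapshots."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = []

    def write(self, index, data):
        # Before overwriting, let each snapshot save the old contents.
        for snap in self.snapshots:
            snap.preserve(index)
        self.blocks[index] = data


class Snapshot:
    """Toy snapshot: stores only the old data of blocks changed on the origin."""

    def __init__(self, origin):
        self.origin = origin
        self.saved = {}  # block index -> data as it was at snapshot time
        origin.snapshots.append(self)

    def preserve(self, index):
        # Only the first overwrite matters; later ones must not replace
        # the data captured at snapshot time.
        if index not in self.saved:
            self.saved[index] = self.origin.blocks[index]

    def read(self, index):
        # Changed blocks come from the saved copy, unchanged blocks
        # are read from the origin as normal.
        return self.saved.get(index, self.origin.blocks[index])


origin = Origin(["a", "b", "c", "d"])
snap = Snapshot(origin)

origin.write(1, "B")
origin.write(1, "BB")  # second overwrite of the same block

snapshot_view = [snap.read(i) for i in range(4)]
```

Note that the snapshot ends up holding a single saved block even though the origin was written twice, which is why the snapshot area can be much smaller than the source device.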
>
>> Which is great, but kinda the wrong way round for my use case.
>> I effectively want the changes written into the snapshot, leaving the
>> source untouched.
>> Thus I can have as many snapshots as I need (parallel tests etc) and
>> then chuck them.
>> Much more like the differencing you get with Hypervisors (and OverlayFS,
>> just at the block level to reduce space requirements).
>
>And that is exactly what you can do on btrfs with the inherent magic of
>CoW:
>
>Simplistically, a btrfs snapshot is nothing more than an atomic snapshot
>of the b-tree indexing. The file data stays in place.
>
>You then have, ready for the very next operation, two identical views of
>the same data and attributes. There is no distinction as to which is a
>snapshot of whichever. You have taken '_a_ snapshot' whereby you now
>have two identical views with access from two different points on the
>filesystem.
>
>Both views are read-writeable in whatever way you wish.
>
>Or you can restrict either or both to read-only, as you wish.
>
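A rough Python model of that idea, for anyone who finds code easier to follow than prose. It is a sketch under loud assumptions: BlockStore and Subvolume are invented names, the "index" is a flat dict rather than a b-tree, and old blocks are never freed. On a real system the snapshot itself is taken with btrfs subvolume snapshot.

```python
class BlockStore:
    """Toy shared pool of data blocks, addressed by an integer id."""

    def __init__(self):
        self.blocks = {}
        self.next_id = 0

    def allocate(self, data):
        # New data always goes into a fresh block (copy-on-write);
        # existing blocks are never overwritten in place.
        block_id = self.next_id
        self.blocks[block_id] = data
        self.next_id += 1
        return block_id


class Subvolume:
    """Toy subvolume: just an index from file name to block id."""

    def __init__(self, store, index=None):
        self.store = store
        self.index = dict(index or {})

    def write(self, name, data):
        self.index[name] = self.store.allocate(data)

    def read(self, name):
        return self.store.blocks[self.index[name]]

    def snapshot(self):
        # Only the index is copied; no file data is touched.
        return Subvolume(self.store, self.index)


store = BlockStore()
template = Subvolume(store)
template.write("big.img", "original contents")
template.write("notes.txt", "v1")

working = template.snapshot()
blocks_after_snapshot = len(store.blocks)  # no new blocks were allocated

working.write("notes.txt", "v2")           # only this new data gets written
```

Both template and working remain writable, the untouched big.img is shared between them, and any number of further snapshots can be taken from either one and thrown away afterwards.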
>
>> If XFS or something can do that, awesome, and I'll start looking at it
>> as soon as Kubuntu gets itself out of whatever APT fankle it's managed
>> to get into!
>
>No need for XFS (and in any case, I'm a little prejudiced against its
>block allocation ways of working and how that must fit your use case -
>whereas btrfs is more flexible for extremes of use cases).
>
>
>> Not sure how viable BTRFS would be as a choice, I thought the project
>> was slowly dying off (I know RedHat has dropped it) and any proposals
>> will need to be passed by Internal IT, so can't be too exotic.
>
>btrfs is very much alive and is used by and supported by the big cloud
>providers including Facebook...
>
>Red Hat only dropped support for some of their special Red Hat kernel
>versions for the sake of long term maintainability.
>
>btrfs is still developing apace with new features. However, its main
>features and on-disk structure can be considered long stable.
>
>I've been using btrfs for about a decade to good effect and without any
>adverse issues and spanning many TBytes of data.
>
>
>Hope that's of help,
>
>Cheers,
>Martin
>
>
>ps: Corrections welcomed!
>
>
>> On Mon, 10 Jun 2019 at 11:24, Martin via Nottingham
>> <nottingham at mailman.lug.org.uk> wrote:
>> 
>>     Jason,
>> 
>>     The obvious fix for that is to use btrfs whereby:
>> 
>>     You have your pristine folders in one btrfs subvolume (can even be
>>     mounted or set to be read-only);
>> 
>>     Then create a snapshot subvolume from that first subvolume;
>> 
>>     Your new snapshot subvolume can then be used or mounted read-write.
>> 
>> 
>>     Aside: Note that you don't even need to mount the subvolumes as separate
>>     mounts: they can even appear as 'normal' directories on a (single)
>>     higher mount point.
>> 
>> 
>>     The snapshot volume is created with zero copying of files and is very
>>     fast and low resource to do. You'll see no increase in disk usage!
>> 
>>     During use, the only file writing will be for whatever new data is
>>     actually written. The magic of the btrfs CoW means that only new blocks
>>     of data get written. Unchanged file fragments remain unchanged and
>>     uncopied...
>> 
>> 
>>     Hope that fits your needs?
>> 
>>     Cheers,
>>     Martin
>> 
>> 
>> 
>>     On 10/06/2019 11:04, J J via Nottingham wrote:
>>     > I need read/write access to the new folder.
>>     > Mucking around with OverlayFS I can effectively have a "Template" folder
>>     > and then a "Working" folder (plus a couple of others for reasons) and
>>     > that works well. Except for the fact that full file copies still happen
>>     > and some of these files are big.
>>     > So the situation is definitely better, but far from ideal.
>>     >
>>     > I am going to see if I can grok XFS enough to do a similar thing, but
>>     > do some block-level magics.
>>     > LVM snapshots might also do it, but not sure if they would allow for
>>     > multiple, concurrent "Working" folders and AIUI that would mean mucking
>>     > around with how the system is deployed, not something I have control over.
>>     >
>>     > The other wrinkle is that this needs to be done ad hoc.
>>     > But first things first...
>
>

--
vadim at mankevich.co.uk PGP key fingerprint
0xC046022A3A91455AF0C9BB2404BF882B1905C772
Retrieve from https://keybase.io/vmankevich

"When we take away the right to figure out if something bad is going on in our computers, the inevitable consequence is that bad things will happen in our computers." (Cory Doctorow)