[Nottingham] Copy without copying? (file copy deduplication)

J J jasonirwin73 at gmail.com
Tue Jun 11 10:54:46 UTC 2019


Thanks for that, it explained things rather nicely. I didn't know the
snapshot was writeable (I was probably reading old docs) but using a R/W
snapshot works exactly how I want.
I am going to presume it's a very similar trick with BTRFS/XFS/WhateverFS.
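
A toy model may help picture what a writable snapshot does. This is only a
sketch in Python, not LVM's code or API: the class and method names (Origin,
Snapshot, read, write) are hypothetical, and the "blocks" are plain list
entries. It assumes the behaviour described further down the thread: old
blocks are saved to the snapshot before the source overwrites them, and
writes made to a snapshot stay in that snapshot's own store.

```python
class Origin:
    """Hypothetical stand-in for the source volume: a list of blocks."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = []

    def snapshot(self):
        # Creating a snapshot copies no data; it only registers a new view.
        snap = Snapshot(self)
        self.snapshots.append(snap)
        return snap

    def read(self, index):
        return self.blocks[index]

    def write(self, index, data):
        # Copy-on-write: before overwriting, save the old block into every
        # snapshot that does not already hold its own copy of that block.
        for snap in self.snapshots:
            snap.store.setdefault(index, self.blocks[index])
        self.blocks[index] = data


class Snapshot:
    """Hypothetical writable snapshot: stores only blocks that differ."""

    def __init__(self, origin):
        self.origin = origin
        self.store = {}  # block index -> data held by this snapshot

    def read(self, index):
        # Blocks held by the snapshot win; everything else falls through
        # to the source.
        if index in self.store:
            return self.store[index]
        return self.origin.read(index)

    def write(self, index, data):
        # Writes land in the snapshot's own store; the source is untouched.
        self.store[index] = data


template = Origin(["a", "b", "c"])
work1 = template.snapshot()
work2 = template.snapshot()

work1.write(0, "X")      # change made inside one working copy
template.write(1, "Y")   # change made to the source afterwards

print(template.blocks)                      # ['a', 'Y', 'c']
print([work1.read(i) for i in range(3)])    # ['X', 'b', 'c']
print([work2.read(i) for i in range(3)])    # ['a', 'b', 'c']
print(len(work1.store), len(work2.store))   # 2 1
```

Each snapshot only ever stores the blocks that differ from the source, which
is why several concurrent "Working" copies can stay small and be thrown away
cheaply.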

Now just to find out how this VM has been deployed and see which solution
works best for IT.

<rant>Why are docs usually just lists of commands? Is a simple picture to
provide some kind of context just too much to ask for?</rant>

On Mon, 10 Jun 2019 at 14:50, VM via Nottingham <
nottingham at mailman.lug.org.uk> wrote:

> A good explanation of different uses for read-write snapshots is at
> https://www.clevernetsystems.com/lvm-snapshots-explained/
>
> On June 10, 2019 1:27:54 PM UTC, VM via Nottingham <
> nottingham at mailman.lug.org.uk> wrote:
> >Martin, LVM snapshots can be mounted read-write and keep modifications
> >made. They also can be merged with the origin if changes are desired.
> >For instance, make a snapshot, mount it, upgrade, check everything is
> >fine, merge snapshot with the origin. It's another use of LVM
> >snapshots, different to read-only backups.
> >
> >On June 10, 2019 12:37:55 PM UTC, Martin via Nottingham
> ><nottingham at mailman.lug.org.uk> wrote:
> >>On 10/06/2019 12:06, J J via Nottingham wrote:
> >>> I must be misunderstanding something, I thought the snapshot
> >>> retained the original data as things continued to get written to
> >>> the source. (This seems to be what LVM does at any rate)
> >>
> >>On LVM, when you enable a snapshot device:
> >>
> >>The source device is used as normal, unaffected, with data read and
> >>new data written as normal.
> >>
> >>While the LVM snapshot device is enabled, any disk blocks written to
> >>the source device first have the old block data copied to the
> >>snapshot device. Hence you get the original LVM source device in use
> >>as normal, plus a copy of the old data, saved block by block to the
> >>LVM snapshot device just before each block is overwritten. I guess
> >>that can be called a sort of CoW but used instead for 'saving' the
> >>old data.
> >>
> >>When reading from the LVM snapshot device, any blocks that are
> >>unchanged are read from the source device as normal. Any changed
> >>blocks are instead read from the copy saved on the snapshot device,
> >>so the snapshot shows the source device as it was when the snapshot
> >>was taken.
> >>
> >>Hence, you have the read-only snapshot view and additionally, the
> >>snapshot device area can be much smaller than the source device. (Only
> >>the changed data is stored...)
> >>
> >>
> >>> Which is great, but kinda the wrong way round for my use case.
> >>> I effectively want the changes written into the snapshot, leaving
> >>> the source untouched.
> >>> Thus I can have as many snapshots as I need (parallel tests etc) and
> >>> then chuck them.
> >>> Much more like the differencing you get with Hypervisors (and
> >>> OverlayFS, just at the block level to reduce space requirements).
> >>
> >>And that is exactly what you can do on btrfs with the inherent magic
> >>of CoW:
> >>
> >>Simplistically, a btrfs snapshot is just an atomic snapshot of the
> >>b-tree indexing. The file data stays in place.
> >>
> >>You then have, ready for the very next operation, two identical views
> >>of the same data and attributes. There is no distinction as to which
> >>is a snapshot of which. You have taken '_a_ snapshot' whereby you now
> >>have two identical views with access from two different points on the
> >>filesystem.
> >>
> >>Both views are read-writeable in whatever way you wish.
> >>
> >>Or you can restrict either or both to read-only, as you wish.
> >>
> >>
> >>> If XFS or something can do that, awesome, and I'll start looking at
> >>> it as soon as Kubuntu gets itself out of whatever APT fankle it's
> >>> managed to get into!
> >>
> >>No need for XFS (and in any case, I'm a little prejudiced against its
> >>block allocation ways of working and how that must fit your use case -
> >>whereas btrfs is more flexible for extremes of use cases).
> >>
> >>
> >>> Not sure how viable BTRFS would be as a choice, I thought the
> >>> project was slowly dying off (I know RedHat has dropped it) and any
> >>> proposals will need to be passed by Internal IT, so can't be too
> >>> exotic.
> >>
> >>btrfs is very much alive and is used by and supported by the big cloud
> >>providers including Facebook...
> >>
> >>Red Hat only dropped support for some of their special Red Hat kernel
> >>versions for the sake of long term maintainability.
> >>
> >>btrfs is still developing apace with new features. However, its main
> >>features and disk structure can be considered long stable.
> >>
> >>I've been using btrfs for about a decade to good effect, without any
> >>adverse issues, across many TBytes of data.
> >>
> >>
> >>Hope that's of help,
> >>
> >>Cheers,
> >>Martin
> >>
> >>
> >>ps: Corrections welcomed!
> >>
> >>
> >>> On Mon, 10 Jun 2019 at 11:24, Martin via Nottingham
> >>> <nottingham at mailman.lug.org.uk> wrote:
> >>>
> >>>     Jason,
> >>>
> >>>     The obvious fix for that is to use btrfs whereby:
> >>>
> >>>     You have your pristine folders in one btrfs subvolume (can even
> >>>     be mounted or set to be read-only);
> >>>
> >>>     Then create a snapshot subvolume from that first subvolume;
> >>>
> >>>     Your new snapshot subvolume can then be used or mounted
> >>>     read-write.
> >>>
> >>>
> >>>     Aside: Note that you don't even need to mount the subvolumes as
> >>>     separate mounts: they can even appear as 'normal' directories on
> >>>     a (single) higher mount point.
> >>>
> >>>
> >>>     The snapshot volume is created with zero copying of files and is
> >>>     very fast and low resource to do. You'll see no increase in disk
> >>>     usage!
> >>>
> >>>     During use, the only file writing will be for whatever new data
> >>>     is actually written. The magic of the btrfs CoW means that only
> >>>     new blocks of data get written. Unchanged file fragments remain
> >>>     unchanged and uncopied...
> >>>
> >>>
> >>>     Hope that fits your needs?
> >>>
> >>>     Cheers,
> >>>     Martin
> >>>
> >>>
> >>>
> >>>     On 10/06/2019 11:04, J J via Nottingham wrote:
> >>>     > I need read/write access to the new folder.
> >>>     > Mucking around with OverlayFS I can effectively have a
> >>>     > "Template" folder and then a "Working" folder (plus a couple
> >>>     > of others for reasons) and that works well. Except for the
> >>>     > fact that full file copies still happen and some of these
> >>>     > files are big.
> >>>     > So the situation is definitely better, but far from ideal.
> >>>     >
> >>>     > I am going to see if I can grok XFS enough to do a similar
> >>>     > thing, but with some block-level magic.
> >>>     > LVM snapshots might also do it, but not sure if they would
> >>>     > allow for multiple, concurrent "Working" folders and AIUI that
> >>>     > would mean mucking around with how the system is deployed, not
> >>>     > something I have control over.
> >>>     >
> >>>     > The other wrinkle is that this needs to be done ad hoc.
> >>>     > But first things first...
> >>
> >>
> >>--
> >>Nottingham mailing list
> >>Nottingham at mailman.lug.org.uk
> >>https://mailman.lug.org.uk/mailman/listinfo/nottingham
> >
> >--
> >vadim at mankevich.co.uk PGP key fingerprint
> >0xC046022A3A91455AF0C9BB2404BF882B1905C772
> >Retrieve from https://keybase.io/vmankevich
> >
> >"When we take away the right to figure out if something bad is going on
> >in our computers, the inevitable consequence is that bad things will
> >happen in our computers." (Cory Doctorow)