[Nottingham] Copy without copying? (file copy deduplication)

Martin martin at ml1.co.uk
Mon Jun 10 12:37:59 UTC 2019

On 10/06/2019 12:06, J J via Nottingham wrote:
> I must be misunderstanding something, I thought the snapshot retained
> the original data as things continued to get written to the source.
> (This seems to be what LVM does at any rate)

On LVM, when you enable a snapshot device:

The source device is used as normal, unaffected, with data read and new
data written as normal.

While the LVM snapshot device is enabled, any disk block about to be
overwritten on the source device first has its old data copied to the
snapshot device. Hence the LVM source device stays in use as normal,
while a copy of the old data is saved, block by block, to the LVM
snapshot device before it is overwritten. I guess that can be called a
sort of CoW, but used instead for 'saving' the old data.

When reading from the LVM snapshot device, any blocks that are unchanged
are read from the source device as normal. Any changed blocks are
instead read from the copy saved on the snapshot device to make the
source device appear to be unchanged.

Hence, you have a point-in-time snapshot view and additionally, the
snapshot device area can be much smaller than the source device. (Only
the changed data is stored...)
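
To make the block-by-block description above concrete, here is a toy
sketch in Python. It is only a model of the behaviour as described, not
how LVM is implemented; the class name ToyLvmSnapshot and the block
contents are made up for illustration.

```python
class ToyLvmSnapshot:
    """Toy model of an LVM-style snapshot: old blocks are saved on first overwrite."""

    def __init__(self, origin):
        self.origin = origin   # list of blocks, read and written as normal
        self.saved = {}        # block index -> old data (the snapshot area)

    def write_origin(self, index, data):
        # Before the first overwrite of a block, copy its old contents to the
        # snapshot area. Later overwrites of the same block need no further copy.
        if index not in self.saved:
            self.saved[index] = self.origin[index]
        self.origin[index] = data

    def read_snapshot(self, index):
        # Changed blocks come from the saved copy, unchanged ones from the origin.
        return self.saved.get(index, self.origin[index])


origin = ["a0", "b0", "c0", "d0"]
snap = ToyLvmSnapshot(origin)
snap.write_origin(1, "b1")
snap.write_origin(1, "b2")   # second overwrite of the same block

snapshot_view = [snap.read_snapshot(i) for i in range(len(origin))]
print(origin)            # ['a0', 'b2', 'c0', 'd0']
print(snapshot_view)     # ['a0', 'b0', 'c0', 'd0']
print(len(snap.saved))   # 1
```

Only the one changed block ends up in the snapshot area, which is why
that area can be much smaller than the source.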

> Which is great, but kinda the wrong way round for my use case.
> I effectively want the changes written into the snapshot, leaving the
> source untouched.
> Thus I can have as many snapshots as I need (parallel tests etc) and
> then chuck them.
> Much more like the differencing you get with Hypervisors (and OverlayFS,
> just at the block level to reduce space requirements).

And that is exactly what you can do on btrfs with the inherent magic of CoW:

Simplistically, a btrfs snapshot is just an atomic snapshot of the
b-tree indexing. The file data stays in place.

You then have, ready for the very next operation, two identical views of
the same data and attributes. There is no distinction as to which is a
snapshot of which. You have taken '_a_ snapshot' whereby you now
have two identical views, accessed from two different points in the
directory tree.

Both views are read-writeable in whatever way you wish.
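
As a rough illustration of "snapshot the index, leave the data in
place", here is a toy Python model. It is a sketch only: real btrfs
works on b-trees and extents, whereas this uses a plain dict per view
and whole-file "blocks". The class name ToyCowVolume and the file names
are invented for the example.

```python
class ToyCowVolume:
    """Toy model: a volume is an index (name -> block id) over a shared block store."""

    def __init__(self, blocks, index=None):
        self.blocks = blocks             # shared store: block id -> data
        self.index = dict(index or {})   # this view's own private index

    def write(self, name, data):
        # Copy-on-write: new data goes to a new block, old blocks stay in place.
        block_id = len(self.blocks)
        self.blocks[block_id] = data
        self.index[name] = block_id

    def read(self, name):
        return self.blocks[self.index[name]]

    def snapshot(self):
        # Only the index is copied; no file data is duplicated.
        return ToyCowVolume(self.blocks, self.index)


store = {}
template = ToyCowVolume(store)
template.write("big.img", "original data")

work_a = template.snapshot()
work_b = template.snapshot()
blocks_after_snapshots = len(store)   # still 1: snapshots copied no data

work_a.write("big.img", "changed by test A")
work_b.write("notes.txt", "added by test B")

print(template.read("big.img"))   # original data
print(work_a.read("big.img"))     # changed by test A
print(work_b.read("big.img"))     # original data
print(len(store))                 # 3
```

Each view can be written independently, and the store only grows by
whatever new data is actually written.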

Or you can restrict whichever or both to read-only in whatever way wished.
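
For the "as many snapshots as I need and then chuck them" use case, the
commands involved would look roughly like the following. This Python
sketch only builds and prints the command lines; it does not run them.
The paths under /srv are hypothetical, and actually running the
commands assumes the template is a btrfs subvolume and that you have
the needed privileges.

```python
import shlex


def snapshot_cmd(source, dest, read_only=False):
    # btrfs subvolume snapshot [-r] <source> <dest>
    cmd = ["btrfs", "subvolume", "snapshot"]
    if read_only:
        cmd.append("-r")   # create the snapshot read-only
    return cmd + [source, dest]


def delete_cmd(path):
    # btrfs subvolume delete <path>
    return ["btrfs", "subvolume", "delete", path]


template = "/srv/template"                            # hypothetical pristine subvolume
work_dirs = [f"/srv/work-{n}" for n in range(1, 4)]   # hypothetical working copies

plan = [snapshot_cmd(template, d) for d in work_dirs]                        # writable snapshots
plan.append(snapshot_cmd(template, "/srv/template-frozen", read_only=True))  # read-only snapshot
plan += [delete_cmd(d) for d in work_dirs]                                   # chuck them afterwards

for cmd in plan:
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

To execute the plan for real, each list could be passed to
subprocess.run(cmd, check=True).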

> If XFS or something can do that, awesome, and I'll start looking at it
> as soon as Kubuntu gets itself out of whatever APT fankle it's managed
> to get into!

No need for XFS (and in any case, I'm a little prejudiced against its
block allocation ways of working and how that must fit your use case -
whereas btrfs is more flexible for extremes of use cases).

> Not sure how viable BTRFS would be as a choice, I thought the project
> was slowly dying off (I know RedHat has dropped it) and any proposals
> will need to be passed by Internal IT, so can't be too exotic.

btrfs is very much alive and is used by and supported by the big cloud
providers including Facebook...

Red Hat only dropped support for some of their special Red Hat kernel
versions for the sake of long term maintainability.

btrfs is still developing apace with new features. However, its main
features and disk structure can be considered long stable.

I've been using btrfs for about a decade to good effect and without any
adverse issues and spanning many TBytes of data.

Hope that's of help,


ps: Corrections welcomed!

> On Mon, 10 Jun 2019 at 11:24, Martin via Nottingham
> <nottingham at mailman.lug.org.uk> wrote:
>     Jason,
>     The obvious fix for that is to use btrfs whereby:
>     You have your pristine folders in one btrfs subvolume (can even be
>     mounted or set to be read-only);
>     Then create a snapshot subvolume from that first subvolume;
>     Your new snapshot subvolume can then be used or mounted read-write.
>     Aside: Note that you don't even need to mount the subvolumes as separate
>     mounts: they can even appear as 'normal' directories on a (single)
>     higher mount point.
>     The snapshot volume is created with zero copying of files and is very
>     fast and low resource to do. You'll see no increase in disk usage!
>     During use, the only file writing will be only for whatever new data is
>     actually written. The magic of the btrfs CoW means that only new blocks
>     of data get written. Unchanged file fragments remain unchanged and
>     uncopied...
>     Hope that fits your needs?
>     Cheers,
>     Martin
>     On 10/06/2019 11:04, J J via Nottingham wrote:
>     > I need read/write access to the new folder.
>     > Mucking around with OverlayFS I can effectively have a "Template" folder
>     > and then a "Working" folder (plus a couple of others for reasons) and
>     > that works well. Except for the fact that full file copies still happen
>     > and some of these files are big.
>     > So the situation is definitely better, but far from ideal.
>     >
>     > I am going to see if I can grok XFS enough to do the similar thing, but
>     > do some block-level magics.
>     > LVM snapshots might also do it, but not sure if they would allow for
>     > multiple, concurrent "Working" folders and AIUI that would mean mucking
>     > around with how the system is deployed, not something I have control over.
>     >
>     > The other wrinkle is that this needs to be done ad hoc.
>     > But first things first...
