[Gllug] git "snapshots"

Nix nix at esperi.org.uk
Wed Apr 6 08:25:00 UTC 2011


On 5 Apr 2011, Iain Conochie uttered the following:

> On 05/04/11 17:33, gvim wrote:
>> This is probably rather basic but bear with me. I'm trying to get my head round git and hear constant references to the term
>> "snapshot" without any explanation of what such an entity contains other than unhelpful photographic analogies.
>
> A snapshot is a point-in-time. For example you have a directory with 4 files in it. When you stage and then commit these files you
> now have a snapshot of the whole directory tree as it was at the point-in-time you made the commit. Remember git tracks changes,
> using SHA1 hashes of each file / directory (blob) in your working directory, so when a blob changes the SHA1 will obviously change
> and hence git knows about it

Yep.
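
To make the "SHA1 of each file" bit concrete, here is a minimal sketch in
Python (my own illustration, not git's code; the function name is made
up). It assumes the usual blob format: the SHA1 is taken over a header
"blob <size>" plus a NUL byte, followed by the file's contents.

```python
import hashlib

def git_blob_sha1(content: bytes) -> str:
    """Return the SHA1 git would give a blob holding exactly these bytes."""
    # Header is the object type, a space, the size in bytes, then a NUL.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

before = git_blob_sha1(b"hello world\n")
after = git_blob_sha1(b"hello world!\n")

print(before)           # should match 'git hash-object' on the same bytes
print(after)
print(before != after)  # any change to the contents gives a new SHA1
```

The name and location of the file play no part in the blob's SHA1: those
live in the tree object that points at it.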

>> Apparently git is based on snapshots of the working directory but does that mean a complete copy of the whole directory is made
>> during a commit?
>
> No. Only the files that have changed. That is why you stage them first (git add). However your _first_ commit will add the whole
> directory (and subdirectories). Remember git commits the entire file as a blob, not just the changes.

Sort of. *Conceptually* every commit's tree describes *the whole tree*:
every file, changed or not.

Let's look at a random small commit in the Linux kernel tree.
4a2b9c3756077c05dd8666e458a751d2248b61b6 will do.

Now this commit changes *one line* in net/ipv4/route.c.  But what does
its tree object look like? Feeding the tree SHA1 that 'git cat-file -p'
prints for the commit into 'git ls-tree', we see

100644 blob 5d56a3fd0de6b9d4d8acc0a26495bd24c489d31f    .gitignore
100644 blob 1eba28acab64c83c3e6fd1c39cebfbc6ad6d29ac    .mailmap
100644 blob ca442d313d86dc67e0a2e5d584b465bd382cbf5c    COPYING
100644 blob 1d39a6d0a510c97558d38e3b8d4b836eff0f9039    CREDITS
040000 tree f87c877d6690b84cd20e8ab3c7e041fece8831b1    Documentation
100644 blob 2114113ceca2801770c57ac07c78fff2b0b8a477    Kbuild
100644 blob c13f48d65898487105f0193667648382c90d0eda    Kconfig
100644 blob a41c1e0a7d73dc205de105224832b6c029fb6f79    MAINTAINERS
100644 blob 504f788773e5b72f2ca6bb733d74196266f8a169    Makefile
100644 blob 1b81d2836873c278ce4fcf3e9868ee9c6341ce49    README
100644 blob 55a6074ccbb715d99b642fa510d3c993121f453d    REPORTING-BUGS
040000 tree aeaf7dddacd69a107e7d9511f3dd48e12ee7d379    arch
040000 tree dec8bd8b8edb44f57de1d4708e1f6f2189ce0a63    block
040000 tree daf2bbda235a6f4b60fe6aebc30b1867939a57c2    crypto
040000 tree 184ab0679a639657b21e9fffa0067c78055a048b    drivers
040000 tree 9597d91d70d0454bcbec5e04200fc2505fec1533    firmware
040000 tree ef586f2bacb4baa77d283e27635566654dd57728    fs
040000 tree 63ccef345d4f843e8547e5ae0b5d311b24553ce9    include
040000 tree 2b1de7cabbfd38b9b8fd99c9f53191a155320dff    init
040000 tree 4acc9cfff125f811602b77297b7064242ffda79a    ipc
040000 tree 8330b488bcd59f46f526393e3da3adb7791e82fd    kernel
040000 tree b117c18dc395277e9e2b9e9935520543e5fa5ea5    lib
040000 tree 5a7a318835e569b647986626bb5ef946988ab3d8    mm
040000 tree f6d0f543cffef3b0961172e418b44983efc1454f    net
040000 tree c151f03fec182e31c2d3bafdbcd837fbecb02837    samples
040000 tree 92c6ff52710424d47a1ef76baf199453b675dd23    scripts
040000 tree af94e88c8e822c7dc87728eb75e3126f1c8c2946    security
040000 tree 840f01ffb5b73bbd43515676b53543c6e252bf14    sound
040000 tree 0f19191da8ecd3ad8ad18e23ec7b2201b3b33223    tools
040000 tree 76742a1e6086b54dcac8818a8ad131e88d34a9db    usr
040000 tree 18a79abcc4340f76e5cc8968cf6fc023290453f4    virt

Note: *every directory* is there! It's just that if you look at the
tree of that commit's parent, you see that the SHA1 given for every
entry other than net/ hasn't changed:

100644 blob 5d56a3fd0de6b9d4d8acc0a26495bd24c489d31f    .gitignore
100644 blob 1eba28acab64c83c3e6fd1c39cebfbc6ad6d29ac    .mailmap
100644 blob ca442d313d86dc67e0a2e5d584b465bd382cbf5c    COPYING
100644 blob 1d39a6d0a510c97558d38e3b8d4b836eff0f9039    CREDITS
040000 tree f87c877d6690b84cd20e8ab3c7e041fece8831b1    Documentation
100644 blob 2114113ceca2801770c57ac07c78fff2b0b8a477    Kbuild
100644 blob c13f48d65898487105f0193667648382c90d0eda    Kconfig
100644 blob a41c1e0a7d73dc205de105224832b6c029fb6f79    MAINTAINERS
100644 blob 504f788773e5b72f2ca6bb733d74196266f8a169    Makefile
100644 blob 1b81d2836873c278ce4fcf3e9868ee9c6341ce49    README
100644 blob 55a6074ccbb715d99b642fa510d3c993121f453d    REPORTING-BUGS
040000 tree aeaf7dddacd69a107e7d9511f3dd48e12ee7d379    arch
040000 tree dec8bd8b8edb44f57de1d4708e1f6f2189ce0a63    block
040000 tree daf2bbda235a6f4b60fe6aebc30b1867939a57c2    crypto
040000 tree 184ab0679a639657b21e9fffa0067c78055a048b    drivers
040000 tree 9597d91d70d0454bcbec5e04200fc2505fec1533    firmware
040000 tree ef586f2bacb4baa77d283e27635566654dd57728    fs
040000 tree 63ccef345d4f843e8547e5ae0b5d311b24553ce9    include
040000 tree 2b1de7cabbfd38b9b8fd99c9f53191a155320dff    init
040000 tree 4acc9cfff125f811602b77297b7064242ffda79a    ipc
040000 tree 8330b488bcd59f46f526393e3da3adb7791e82fd    kernel
040000 tree b117c18dc395277e9e2b9e9935520543e5fa5ea5    lib
040000 tree 5a7a318835e569b647986626bb5ef946988ab3d8    mm
040000 tree 606e4dd9e86ba3ebb802f229e0c607463d92dd79    net
040000 tree c151f03fec182e31c2d3bafdbcd837fbecb02837    samples
040000 tree 92c6ff52710424d47a1ef76baf199453b675dd23    scripts
040000 tree af94e88c8e822c7dc87728eb75e3126f1c8c2946    security
040000 tree 840f01ffb5b73bbd43515676b53543c6e252bf14    sound
040000 tree 0f19191da8ecd3ad8ad18e23ec7b2201b3b33223    tools
040000 tree 76742a1e6086b54dcac8818a8ad131e88d34a9db    usr
040000 tree 18a79abcc4340f76e5cc8968cf6fc023290453f4    virt
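
If you don't fancy comparing forty-character hashes by eye, a rough
Python sketch like this will do it. The helper names are hypothetical,
and the two sample listings are just a few lines cut from the output
above, with the child commit first and its parent second.

```python
def parse_ls_tree(listing: str) -> dict:
    """Map each entry name to (mode, type, sha1) from 'git ls-tree' output."""
    entries = {}
    for line in listing.splitlines():
        if not line.strip():
            continue
        # Fields are: mode, object type, SHA1, then the name.
        mode, kind, sha, name = line.split(None, 3)
        entries[name] = (mode, kind, sha)
    return entries

def changed_entries(child: str, parent: str) -> list:
    """Names whose entry differs between the two listings."""
    new, old = parse_ls_tree(child), parse_ls_tree(parent)
    return sorted(n for n in new.keys() | old.keys() if new.get(n) != old.get(n))

child = """\
100644 blob 504f788773e5b72f2ca6bb733d74196266f8a169    Makefile
040000 tree 5a7a318835e569b647986626bb5ef946988ab3d8    mm
040000 tree f6d0f543cffef3b0961172e418b44983efc1454f    net
040000 tree c151f03fec182e31c2d3bafdbcd837fbecb02837    samples
"""
parent = """\
100644 blob 504f788773e5b72f2ca6bb733d74196266f8a169    Makefile
040000 tree 5a7a318835e569b647986626bb5ef946988ab3d8    mm
040000 tree 606e4dd9e86ba3ebb802f229e0c607463d92dd79    net
040000 tree c151f03fec182e31c2d3bafdbcd837fbecb02837    samples
"""

print(changed_entries(child, parent))  # ['net']
```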


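This has multiple advantages. It makes diffing huge trees really, really
fast (whole subtrees can be eliminated with a single SHA1 comparison);
it means that displaying trees is really fast too, as you don't need
to walk back across parent commits; and it's cheap to create, as you
just need to write new tree objects for the directories leading down to
the change and reuse the parent's SHA1s for everything else. Objects
with the same SHA1 are stored only once, so all those unchanged files
and directories cost no extra disk space.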
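
Here is a toy model of that in Python, to show the sharing and the
subtree pruning at work. It is only a sketch under loud assumptions: the
in-memory 'store', the helper names and the object encoding are all
invented for illustration and are much simpler than git's real formats.

```python
import hashlib

store = {}  # hypothetical object store: SHA1 -> bytes (blob) or dict (tree)

def put(obj) -> str:
    """Store a blob (bytes) or a tree (dict of name -> SHA1) under its SHA1."""
    if isinstance(obj, bytes):
        data = b"blob\0" + obj
    else:
        # Simplified encoding, NOT git's real tree format.
        data = b"tree\0" + "\n".join(f"{n} {s}" for n, s in sorted(obj.items())).encode()
    sha = hashlib.sha1(data).hexdigest()
    store[sha] = obj  # identical content lands on the same key: stored once
    return sha

def write_tree(node) -> str:
    """Store a nested dict (directory) or bytes (file) and return its SHA1."""
    if isinstance(node, bytes):
        return put(node)
    return put({name: write_tree(child) for name, child in node.items()})

def diff(old_sha, new_sha, path=""):
    """Yield changed paths, skipping any subtree whose SHA1 is unchanged."""
    if old_sha == new_sha:
        return  # one comparison rules out everything underneath
    old, new = store.get(old_sha), store.get(new_sha)
    if not (isinstance(old, dict) and isinstance(new, dict)):
        yield path  # a changed file, or something added or removed wholesale
        return
    for name in sorted(old.keys() | new.keys()):
        yield from diff(old.get(name), new.get(name), f"{path}/{name}" if path else name)

v1 = {"Makefile": b"all:\n", "mm": {"slab.c": b"...\n"},
      "net": {"ipv4": {"route.c": b"old line\n"}}}
v2 = {"Makefile": b"all:\n", "mm": {"slab.c": b"...\n"},
      "net": {"ipv4": {"route.c": b"new line\n"}}}

root1 = write_tree(v1)
count1 = len(store)
root2 = write_tree(v2)

print(len(store) - count1)       # 4: route.c, ipv4/, net/ and the root
print(list(diff(root1, root2)))  # ['net/ipv4/route.c']
```

The second snapshot names every file again, but only the changed blob and
the three trees above it are new objects. Makefile and mm/ are the same
objects as before.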

>> If so, the disk usage would be huge so I suspect this is not what a snapshot refers to. Too often I find git terminology is
>> explained with other git terminology so not helpful to new users.
>
> Come come now - when has _anything_ useful on UNIX been helpful to new users ;)

git isn't so much 'unhelpful' as 'different', because it was created by
someone who had studiously avoided other version control systems because
they sucked so much compared to what was in his head. :)

-- 
NULL && (void)



