[GLLUG] syncing files

Philip Hands phil at hands.com
Tue Nov 25 14:50:38 UTC 2014


T Menezes <tm.onthemove at gmail.com> writes:

> Hi All,
>
> I am in the middle of writing a script (in bash) to sync CVS vaults between
> two hosts. Ignoring it's a CVS vault, I am talking about running a cron job
> syncing files between two hosts - something that some/most of you probably
> do for a living.
>
> I am stuck with this fundamental question about who controls the sync
> operation and I was wondering if anyone could enlighten me.
> 
> This is a one way sync operation, with the master CVS vault being pushed to
> another host. This seems to be a typical application for rsync so that is
> what I am going with.
>
> The files to be synced are read from a file in the master host. The files
> themselves are also in the master host.
>
> The question is, do I run a push operation from the master host, or a pull
> from the mirrored host?
>
> When I use a push operation, I am concerned about leaving the mirrored CVS
> vault in an inconsistent state if there is a problem with the comms.

You should have rsync build a new tree adjacent to the live tree, using
hard links for the files that have not changed, and only if the rsync is
successful rename the directories, or redirect a symlink.

No idea how well a client is going to react if you do that in the middle
of them reading the repo though.

> The issue might be easily solved by running a pull operation from the
> mirrored host. However, I would have to hard-code the master paths in the
> script running from the mirrored host which I would like to avoid if at all
> possible (I have full control of the master host; I won't have full control
> of the mirrored host). Over and above that, I am not sure how I would read
> the file with the list of files to sync from the master/remote host. I
> don't think I can use '--files-from' with a path on the remote (master)
> host.

You can fetch a single file with rsync, so you could write a script
that first grabs the file list, and then syncs according to that,
but that sounds overly complicated to me.
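
For what it's worth, the two-step pull described above might look roughly
like the sketch below. The function name pull_listed and its argument
layout are made up for illustration; the list is assumed to hold one path
per line, relative to the source root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pull_listed LIST_SPEC SRC_SPEC DEST
#   LIST_SPEC - rsync spec of the file holding the list, e.g. master:/path/to/list
#   SRC_SPEC  - rsync spec of the tree the listed paths are relative to
#   DEST      - local directory that receives the files
pull_listed() {
    local list_spec=$1 src_spec=$2 dest=$3 list rc
    list=$(mktemp) || return 1

    # Step 1: grab the single file that holds the list of paths.
    # Step 2: transfer only the listed paths. --files-from implies
    # --relative, and -a does not imply -r here, so a listed directory
    # is not recursed into unless -r is added explicitly.
    rsync -a "$list_spec" "$list" &&
        rsync -a --files-from="$list" "$src_spec" "$dest"
    rc=$?

    rm -f "$list"
    return "$rc"
}
```

Something like `pull_listed master:/path/to/list master:/path/to/vault/ /srv/mirror/`
would then run from the mirror (those paths are placeholders). The rsync
man page also describes giving --files-from a leading ':' to read the list
from the remote end of the transfer, which may make the first step
unnecessary; worth checking against the rsync version installed.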

> So, what would you say is best practice for this case?

Switch to git?

(there are tools for interacting between git and CVS, so that might not
be _completely_ unhelpful ;-) )

> 1. Push from the master host, perhaps using temporary files on the mirrored
> host to flag the start and successful completion of the sync operation?
>
> 2. Pull from the mirrored host (with the caveats as per above)
>
> 3. None of the above. The solution is...

I think I'd probably use a script that has rsync fill a new timestamped
directory, using --link-dest=$olddir, and if the rsync succeeds then
replaces a symlink to point at the new dated dir, as that bit will then
be an atomic switch-over.

Then at the end discard any directories that are no longer needed.
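
A minimal sketch of that timestamped-directory approach, written for the
receiving end, might be as follows. The function names, the "current"
symlink name and the directory layout are all assumptions for the sake of
the example, and `mv -T` is a GNU coreutils option.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sync_snapshot SRC BASE
#   SRC  - rsync source, e.g. master:/path/to/vault/ (trailing slash matters)
#   BASE - directory holding the dated snapshots and the "current" symlink
sync_snapshot() {
    local src=$1 base=$2 stamp new old=""
    local -a opts=(-a)

    mkdir -p "$base" || return 1
    stamp=$(date -u +%Y%m%dT%H%M%SZ)
    new=$(mktemp -d "$base/$stamp.XXXXXX") || return 1

    # If there is a live snapshot, hard-link unchanged files against it.
    if [ -L "$base/current" ]; then
        old=$(cd "$base/current" && pwd -P) || old=""
    fi
    [ -n "$old" ] && opts+=(--link-dest="$old")

    if ! rsync "${opts[@]}" "$src" "$new/"; then
        rm -rf "$new"   # failed transfer: the live tree was never touched
        return 1
    fi

    # Create the new link under a temporary name, then rename it over the
    # old one, so readers see either the old or the new tree.
    ln -sfn "$(basename "$new")" "$base/current.tmp.$$" &&
        mv -T "$base/current.tmp.$$" "$base/current"
}

# Hypothetical helper: prune_snapshots BASE KEEP
# Removes all but the newest KEEP dated directories, never the live one.
prune_snapshots() {
    local base=$1 keep=$2 cur="" d n
    local -a snaps=()
    [ -L "$base/current" ] && cur=$(readlink "$base/current")
    # Globs expand in sorted order, so dated names come out oldest first.
    for d in "$base"/[0-9]*/; do
        [ -d "$d" ] && snaps+=("$(basename "$d")")
    done
    n=$(( ${#snaps[@]} - keep ))
    for d in "${snaps[@]:0:$(( n > 0 ? n : 0 ))}"; do
        [ "$d" = "$cur" ] || rm -rf "${base:?}/$d"
    done
}
```

From cron that could be as little as
`sync_snapshot master:/path/to/vault/ /srv/mirror && prune_snapshots /srv/mirror 3`
(placeholder paths again), with clients pointed at /srv/mirror/current.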

Which end initiates the transfer is largely independent of which end
drives the rsync -- see:

    http://wiki.hands.com/howto/passphraseless-ssh/

It sounds like the receiver is the one that knows when it wants to get a
new copy, and that makes the scripting of the directory creation/removal
slightly simpler.

On the other hand, the sender knows what's supposed to be sent, so
should be doing the rsync as a push from there.

Cheers, Phil.
-- 
|)|  Philip Hands  [+44 (0)20 8530 9560]  HANDS.COM Ltd.
|-|  http://www.hands.com/    http://ftp.uk.debian.org/
|(|  Hugo-Klemm-Strasse 34,   21075 Hamburg,    GERMANY