[GLLUG] Transferring high volumes of data.

Christopher Walker C.J.Walker at qmul.ac.uk
Wed Jun 11 09:11:31 UTC 2014


On 11/06/14 09:07, Martin A. Brooks wrote:
> On Wed, June 11, 2014 01:15, JLMS wrote:
>> I would appreciate any ideas, pointers, etc that may make it possible
>> to transfer such amounts of data in as efficient a manner as possible.
> Fedex.
Some particle physicists use phedex[1]. Any similarity in name is, I'm 
sure, entirely intended.

Seriously though, fasterdata.es.net is a very good place to start.

For fast links with high latency, you need to increase the maximum TCP 
window size - and use software that can make use of that in order to 
take full advantage of the link. Notably, scp cannot.
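
To put a rough number on the window size: the window has to cover the
bandwidth-delay product (link rate times round-trip time), otherwise a
single stream cannot fill the link. A minimal sketch, assuming a 1 Gbit/s
link and a 100 ms round-trip time (both figures are illustrative, not
measurements from this thread; the function names are made up for the
example):

```python
def bdp_bytes(link_bits_per_s, rtt_s):
    """Bandwidth-delay product: bytes that must be in flight to fill the link."""
    return link_bits_per_s * rtt_s / 8


def window_limited_bits_per_s(window_bytes, rtt_s):
    """Upper bound on one TCP stream's throughput for a given window size."""
    return window_bytes * 8 / rtt_s


if __name__ == "__main__":
    link = 1e9   # assumed link rate: 1 Gbit/s
    rtt = 0.1    # assumed round-trip time: 100 ms

    needed = bdp_bytes(link, rtt)
    print(f"window needed to fill the link: {needed / 1e6:.1f} MB")

    small = 64 * 1024  # a 64 KiB window, i.e. no window scaling
    capped = window_limited_bits_per_s(small, rtt)
    print(f"64 KiB window caps one stream at: {capped / 1e6:.2f} Mbit/s")
```

With those assumed figures the window needs to be about 12.5 MB, while a
64 KiB window caps a single stream at roughly 5 Mbit/s regardless of how
fast the link is.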

In addition, you need to eliminate bottlenecks - 100Mbit links in 
supposedly Gbit connections, firewalls holding things up, disk speed at 
source and destination, etc.

It is perfectly possible to do this - we do so routinely, but it has 
taken a nontrivial amount of my time. We use the recommended settings at 
fasterdata.es.net, and have done lots of testing with iperf.

We use globus gridftp to transfer data, and transfers are scheduled by 
the File Transfer Service (FTS) - which schedules multiple files to be 
transferred at the same time.

Globus Online may be an easier alternative for you. Aspera sell something 
in this space - which AIUI uses UDP to transfer data, rather than TCP, 
so is less sensitive to packet loss. I've no particular experience with 
either of these.

A saturated Gbit link can transfer roughly 10TB in 24h (about 10.8TB 
at the raw line rate, before protocol overhead). Achieving link 
saturation is difficult - and likely to annoy other users of the link.
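
The arithmetic behind that figure can be checked with a short sketch. The
function names and the efficiency parameter are assumptions for
illustration; real throughput will be lower than the raw line rate once
protocol overhead and contention are taken into account.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def bytes_per_day(link_bits_per_s, efficiency=1.0):
    """Bytes moved in 24h at the given line rate.

    efficiency is a hypothetical fudge factor (0..1) for overhead and
    contention; 1.0 means a perfectly saturated link.
    """
    return link_bits_per_s / 8 * SECONDS_PER_DAY * efficiency


def transfer_days(total_bytes, link_bits_per_s, efficiency=1.0):
    """Days needed to move total_bytes over the link."""
    return total_bytes / bytes_per_day(link_bits_per_s, efficiency)


if __name__ == "__main__":
    gbit = 1e9  # 1 Gbit/s
    print(f"per day at line rate: {bytes_per_day(gbit) / 1e12:.1f} TB")
    print(f"days for 10 TB:       {transfer_days(10e12, gbit):.2f}")
    # Same link at an assumed 50% effective utilisation.
    print(f"days for 10 TB @50%:  {transfer_days(10e12, gbit, 0.5):.2f}")
```

At the raw line rate this gives about 10.8 TB per day, so 10 TB takes a
little under a day; at an assumed 50% utilisation it takes just under two.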

Chris


[1] https://cmsweb.cern.ch/phedex/



