[GLLUG] Transferring high volumes of data.

James Courtier-Dutton james.dutton at gmail.com
Wed Jun 11 12:24:12 UTC 2014


On 11 June 2014 06:52, James Courtier-Dutton <james.dutton at gmail.com> wrote:
>
> On Jun 11, 2014 1:16 AM, "JLMS" <jjllmmss at googlemail.com> wrote:
>>
>> Hi,
>>
>> I am wondering what people out there are doing to transfer high volumes
>> of data (100 GB or more each time) between geographically distant sites.
>>
>> I started using rsync (over ssh, including using a version of ssh
>> optimized for performance during file transfers) and got very poor
>> performance (3-7 MB/s).
>>
>> I started to play with sending data in parallel (going as far as splitting
>> some files) and although I improved speed by a factor of 3 or 4, the
>> time the transfers take is still unsatisfactory.
>>
>> I started by opening many instances of rsync, and my bottleneck became the
>> number of sessions ssh can handle before it starts to drop connections. In
>> the end I had to settle for running around 10 rsync instances. This works
>> much better but would still be considered slow by the powers that be.
>>
>> I would appreciate any ideas, pointers, etc. that may make it possible to
>> transfer such amounts of data as efficiently as possible. I am
>> looking for expertise in the field rather than assisted googling (although
>> if you find something very, very interesting I would of course love to hear
>> about it) :-)
>>
>
> What is the latency of the link?
> What is the bandwidth of the link?
> You might need to use transfer protocols more suited to satellite links.
>
> James

Do you have answers to my questions?
I am asking because latency is a major factor limiting file transfer
speed over TCP: a sender can have at most one window of unacknowledged
data in flight per round trip, so a single stream cannot exceed roughly
the window size divided by the round-trip time.
You could have a 10 Gbit/s link, but if the latency is high and the
window is small you will still get rather slow file transfers.
To predict file transfer speeds you need both bandwidth and latency as
inputs to the calculation.
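
As a rough illustration of the kind of calculation I mean, here is a
minimal Python sketch. It assumes a single TCP stream with a fixed window
and ignores packet loss, slow start and protocol overhead, so treat the
result as an upper bound only. The function names and the example figures
(10 Gbit/s, 100 ms round trip, 64 KiB window) are made up for
illustration and are not measurements of your link.

```python
def bandwidth_delay_product(bandwidth_bits_per_s, rtt_seconds):
    """Bytes that must be in flight to keep the link full."""
    return bandwidth_bits_per_s * rtt_seconds / 8


def predicted_throughput(bandwidth_bits_per_s, rtt_seconds, window_bytes):
    """Upper bound on single-stream TCP throughput, in bytes per second.

    The stream is limited either by the link itself or by sending
    one window per round trip, whichever is smaller.
    """
    link_limit = bandwidth_bits_per_s / 8
    window_limit = window_bytes / rtt_seconds
    return min(link_limit, window_limit)


if __name__ == "__main__":
    # Hypothetical example values, not taken from any real link.
    bandwidth = 10e9        # 10 Gbit/s
    rtt = 0.100             # 100 ms round-trip time
    window = 64 * 1024      # 64 KiB TCP window

    bdp = bandwidth_delay_product(bandwidth, rtt)
    rate = predicted_throughput(bandwidth, rtt, window)
    print(f"Window needed to fill the link: {bdp / 1e6:.0f} MB")
    print(f"Best case with a 64 KiB window: {rate / 1e6:.2f} MB/s")
```

With those made-up numbers the window, not the link, is the limit: the
stream tops out well below 1 MB/s even though the link is 10 Gbit/s.
Larger windows or several parallel streams raise that ceiling, which is
why the real latency and bandwidth figures matter.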

Kind Regards

James



