[Nottingham] Downloading multiple images from a website

Robert Hart enxrah at nottingham.ac.uk
Wed Jun 1 10:54:18 BST 2005


On Wed, 2005-06-01 at 07:18 +0100, Michael Quaintance wrote:
> Roger Light said:
> >
> > wget --wait 10 --random-wait `for i in $(seq 0 999); do echo
> > "http://blah/${i}.jpg; done`
> >

> Correct me if I am wrong, but won't this actually spawn 1000 consecutive
> instances of wget? I might be better off using the 'for' to create a list
> file of each URL I want to download and ensuring wget actually performs
> the --random-wait algorithm. Actually, this seems better as I can then
> randomise the list file and get the images in a pseudo-random order.

The first version (not quoted here) did run wget 1000 times. This
version runs echo 1000 times to build the list of URLs and then passes
the whole lot to a single wget invocation on the command line.
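If you do want the list-file version Michael suggests, a minimal sketch
(urls.txt is just a made-up name, and the shuffle step assumes you have
GNU shuf installed) would be:

$ for i in $(seq 0 999); do echo "http://blah/${i}.jpg"; done | shuf > urls.txt
$ wget --wait 10 --random-wait -i urls.txt

The -i flag makes the single wget read its URLs from the file, so the
--wait/--random-wait delays still apply between every request.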

In most shells (e.g. bash), "echo" is a builtin, which means the shell
does the work itself rather than running the "echo" program you have
in /bin.

Also, running wget (I think; curl certainly does this) with a list of
URLs from the same site means it will reuse a single TCP/IP connection
and download multiple files in one go.

Compare:
$ time echo 

and 

$ time /bin/echo
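
You can also ask the shell directly which one it will run; assuming
bash, "type" (another builtin) will tell you:

$ type echo
echo is a shell builtin
$ type /bin/echo
/bin/echo is /bin/echo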

Note you could use

$ curl "http://blah/[0-999].jpg" -O --limit-rate 3k

which would do it all in one go. There is no delay between files;
instead the connection is throttled to about 3 kB/s. (The quotes stop
the shell from trying to expand the [0-999] range itself; curl's own
URL globbing handles it.)
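
As an alternative sketch that keeps the delays, if your bash is new
enough for brace expansion ranges ({0..999}, bash 3.0 or later, I
believe), you can skip the seq/echo trick and still end up with a
single wget:

$ wget --wait 10 --random-wait http://blah/{0..999}.jpg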

Rob


-- 
Robert Hart <enxrah at nottingham.ac.uk>
University of Nottingham

