[Gloucs] webpages to pdf

David Leadbeater dleadbeater at messagelabs.com
Mon Nov 15 08:46:06 UTC 2010


Hi,

On Sun, Nov 14, 2010 at 10:38:56PM +0000, Glyn Davies wrote:
> 'lo
> 
> Any recommendations for a web to pdf tool for the following problem?
> 
> I'd like to save links from the escape-to website to a single PDF.
> 
> Example links:
> http://www.escape-to.co.uk/view_property.php?id=19
> http://www.escape-to.co.uk/view_property.php?id=26
> 
> Basically, I'd like to create a PDF with each property on a single
> page so I can view the facilities, etc at each property with ease. I'd
> probably print it out for ease of viewing.
> 
> So, the obvious way about this is to have a counter for the property
> id, ignore ids that don't return a proper property and add each real
> property to a PDF file.
> 
> Any suggestions for a tool for doing this? I guess I could hack
> something with wget, but not sure whether it will be really easy to
> get images and formatting as they look on the website.

Don't know of a tool for doing this, but in this case I think you could
hack something as you say:

n=10  # example value: highest property id to try, adjust as needed
echo '<base href="http://www.escape-to.co.uk" />' > escape-to.html
for id in $(seq 1 "$n"); do
  # fetch the page quietly into a scratch file
  wget -q -O tmp "http://www.escape-to.co.uk/view_property.php?id=$id"
  # "id=&" being something that only appears on ids that don't exist
  grep -q "id=&" tmp || cat tmp >> escape-to.html
done
firefox escape-to.html

(Set n above to the highest id you want to try; I don't think we want
 the whole list scraping their site.)

Then you can use Firefox to print the resulting page as a PDF. This
technique probably won't work on all sites, as it depends on the HTML/CSS
not using absolute positioning or the like, but it seems to work on this
one ;).
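
If you want to try the filter step on a few saved pages before pointing
the loop at the site, it can be pulled out into a small function. This is
only a rough sketch: the name append_if_real and the file names are made
up for illustration, and it relies on the same assumption as the loop
above, namely that "id=&" only shows up on pages for ids that don't exist.

```shell
#!/bin/sh
# Hypothetical helper (name invented for this example): append a saved
# page to the combined HTML file unless it contains the marker that, on
# this site, is assumed to appear only for non-existent property ids.
append_if_real() {
  page=$1       # file holding one downloaded page
  combined=$2   # file collecting all the real property pages
  if grep -q 'id=&' "$page"; then
    return 1    # looks like a missing property, so skip it
  fi
  cat "$page" >> "$combined"
}
```

Inside the loop that would be used as something like
"append_if_real tmp escape-to.html" right after the wget line.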

David
