[Gloucs] webpages to pdf
David Leadbeater
dleadbeater at messagelabs.com
Mon Nov 15 08:46:06 UTC 2010
Hi,
On Sun, Nov 14, 2010 at 10:38:56PM +0000, Glyn Davies wrote:
> 'lo
>
> Any recommendations for a web to pdf tool for the following problem?
>
> I'd like to save links from the escape-to website to a single PDF.
>
> Example links:
> http://www.escape-to.co.uk/view_property.php?id=19
> http://www.escape-to.co.uk/view_property.php?id=26
>
> Basically, I'd like to create a PDF with each property on a single
> page so I can view the facilities, etc at each property with ease. I'd
> probably print it out for ease of viewing.
>
> So, the obvious way about this is to have a counter for the property
> id, ignore ids that don't return a proper property and add each real
> property to a PDF file.
>
> Any suggestions for a tool for doing this? I guess I could hack
> something with wget, but not sure whether it will be really easy to
> get images and formatting as they look on the website.
I don't know of a tool for doing this, but in this case I think you could
hack something together as you say:
n=10  # how many ids to try; set this to what you need
# <base> makes relative image/CSS links resolve against the live site
echo '<base href="http://www.escape-to.co.uk/" />' > escape-to.html
for id in $(seq 1 "$n"); do
  wget -q -O tmp "http://www.escape-to.co.uk/view_property.php?id=$id"
  # "id=&" being something that only appears on ids that don't exist
  grep -q "id=&" tmp || cat tmp >> escape-to.html
done
rm -f tmp
firefox escape-to.html
(Set n above to how many you want to go to; I don't think we want the
whole mailing list scraping their site.)
Then you can use Firefox to print the resulting page to a PDF. This
technique probably won't work on all sites, as it depends on the HTML/CSS
not using absolute positioning or similar layout tricks, but it seems to
work on this one ;).
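A possible tweak, untested against this site: to get each property starting on a fresh printed page, the grep/cat line in the loop could call a small helper such as the hypothetical append_property below, which writes a CSS page break between properties. The function name and the <!-- property --> marker comment are made up for this sketch, and it assumes Firefox honours page-break-before when printing.

```shell
# Hypothetical helper: append one downloaded property page to the combined
# file, unless it looks like a non-existent id.
# $1 = downloaded page, $2 = combined output file (must already exist)
append_property() {
  # "id=&" is the marker the loop above uses for ids that don't exist
  if grep -q "id=&" "$1"; then
    return 1
  fi
  # Only force a page break once at least one property is already in the
  # file, so the printout doesn't start with a blank page.
  # (page-break-before is the older spelling of break-before: page)
  if grep -q '<!-- property -->' "$2"; then
    echo '<div style="page-break-before: always"></div>' >> "$2"
  fi
  echo '<!-- property -->' >> "$2"
  cat "$1" >> "$2"
}
```

In the loop, the grep/cat line would then become: append_property tmp escape-to.html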
David