[Gloucs] webpages to pdf
Simon Cozens
simon at simon-cozens.org
Mon Nov 15 09:46:16 UTC 2010
On 14/11/2010 22:38, Glyn Davies wrote:
> Any suggestions for a tool for doing this? I guess I could hack
> something with wget, but not sure whether it will be really easy to
> get images and formatting as they look on the website.
One part of the solution is probably http://code.google.com/p/wkhtmltopdf/
The other part would be your scripting language of choice.
Totally untested:
#!/usr/bin/env perl
use strict; use warnings;
use PDF::API2;
use LWP::Simple;

my $collated = PDF::API2->new or die;
for my $id (1..1020) {
    my $url = "http://www.escape-to.co.uk/view_property.php?id=$id";
    # Skip ids that fail to download or that return the "no property" page
    my $html = get($url);
    next unless defined $html;
    next if $html =~ /noprop\.gif/;
    # Render the live page to a temporary PDF; skip it if the tool fails
    system("wkhtmltopdf", $url, "/tmp/escapeto.pdf");
    next if $?;
    # Add it to the collection
    my $pdf = PDF::API2->open("/tmp/escapeto.pdf");
    $collated->import_page($pdf, $_) for 1..$pdf->pages;
}
$collated->saveas("escapeto-all.pdf");