[Gloucs] webpages to pdf

Simon Cozens simon at simon-cozens.org
Mon Nov 15 09:46:16 UTC 2010


On 14/11/2010 22:38, Glyn Davies wrote:
> Any suggestions for a tool for doing this? I guess I could hack
> something with wget, but not sure whether it will be really easy to
> get images and formatting as they look on the website.

One part of the solution is probably http://code.google.com/p/wkhtmltopdf/
The other part would be your scripting language of choice.

Totally untested:

#!/usr/bin/env perl
use strict; use warnings;
use PDF::API2;
use LWP::Simple;
my $collated = PDF::API2->new or die;
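for my $id (1..1020) {
    my $url = "http://www.escape-to.co.uk/view_property.php?id=$id";
    # Skip failed fetches (get returns undef) and pages containing
    # noprop.gif, presumably the "no such property" placeholder
    my $html = get($url);
    next if !defined $html || $html =~ /noprop\.gif/;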
    system("wkhtmltopdf", $url, "/tmp/escapeto.pdf");
    next if $?;
    # Add it to the collection
    my $pdf = PDF::API2->open("/tmp/escapeto.pdf");
    $collated->import_page($pdf, $_) for 1..$pdf->pages;
}
$collated->saveas("escapeto-all.pdf");


