[Gloucs] webpages to pdf

Glyn Davies glynd at walmore.com
Mon Nov 15 21:40:37 UTC 2010


Thanks all.

Something for me to play with.
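
Before running the quoted script below, a few small checks seem worth having. This is only a rough sketch: the helper names (property_url, is_missing_listing, describe_status) are made up for illustration and are not part of the original script. It assumes the "noprop.gif" placeholder test and the URL pattern from the script hold, and it reads $? the way perlfunc documents it for system().

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Base URL copied from the quoted script; everything else here is hypothetical.
my $BASE = 'http://www.escape-to.co.uk/view_property.php';

# Build the listing URL for one property id.
sub property_url {
    my ($id) = @_;
    return "$BASE?id=$id";
}

# True when the download failed (undef) or the page is the
# "no property" placeholder, detected by its noprop.gif image.
sub is_missing_listing {
    my ($html) = @_;
    return 1 unless defined $html;
    return $html =~ /noprop\.gif/ ? 1 : 0;
}

# Turn the value system() leaves in $? into something readable.
sub describe_status {
    my ($status) = @_;
    return 'failed to start' if $status == -1;
    return 'killed by signal ' . ($status & 127) if $status & 127;
    return 'exit code ' . ($status >> 8);
}
```

In the loop, is_missing_listing(get($url)) could replace the regex test, and describe_status($?) could be logged whenever the conversion step fails, so a missing binary shows up as "failed to start" rather than a silent skip.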

On 15 November 2010 09:45, Simon Cozens <simon at simon-cozens.org> wrote:
> On 14/11/2010 22:38, Glyn Davies wrote:
>> Any suggestions for a tool for doing this? I guess I could hack
>> something with wget, but not sure whether it will be really easy to
>> get images and formatting as they look on the website.
>
> One part of the solution is probably http://code.google.com/p/wkhtmltopdf/
> The other part would be your scripting language of choice.
>
> Totally untested:
>
> #!/usr/bin/env perl
> use strict; use warnings;
> use PDF::API2;
> use LWP::Simple;
> my $collated = PDF::API2->new or die;
> for (1..1020) {
>    my $url = "http://www.escape-to.co.uk/view_property.php?id=$_";
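>    my $html = get($url);
>    # Skip ids that fail to download or show the "no property" placeholder
>    next if !defined $html || $html =~ /noprop\.gif/;
>    # The binary is named wkhtmltopdf; skip this id if the conversion fails
>    system("wkhtmltopdf", $url, "/tmp/escapeto.pdf");
>    next if $?;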
>    # Add it to the collection
>    my $pdf = PDF::API2->open("/tmp/escapeto.pdf");
>    $collated->import_page($pdf, $_) for 1..$pdf->pages;
> }
> $collated->saveas("escapeto-all.pdf");



-- 
Best Regards
Glyn Davies


