[Wolves] Ryanair web page ripper
Re-LoaD
reload at brum2600.net
Fri Sep 8 19:35:55 BST 2006
Wayne Morris wrote:
> Hi guys,
>
> This isn't necessarily a Linux question, but this seems as good a place as any
> to start.
>
> Cheap airlines like Ryanair don't tell you when the cheapest flights are;
> they expect you to pick a date, then step forwards or backwards day by day,
> checking prices one by one.
> What would be really useful is a bit of code that strips the price out of
> a page, then says 'next day', gets the new page, and builds a table of
> results for a couple of months.
>
> Any ideas how to get started with this?
>
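The loop described in the question (fetch one day's page, pull out the price, step to the next day, collect a table) could be sketched in PHP as follows. This is a minimal sketch under assumptions: the URL template with a `date=YYYY-MM-DD` parameter and the `class="price"` markup are hypothetical placeholders, since the real booking pages will differ, and `extract_price()` / `price_table()` are illustrative names. The page fetcher is passed in as a callable so the date stepping can be tried without touching the network.

```php
<?php
// Pull the first price out of a page. The class="price" markup is a
// hypothetical placeholder: look at the real page source and adjust it.
function extract_price(string $html): ?float
{
    if (preg_match('/class="price"[^>]*>[^0-9<]*([0-9]+(?:\.[0-9]{1,2})?)/', $html, $m)) {
        return (float) $m[1];
    }
    return null; // no price found on this page
}

// Step forward one day at a time from $start, fetch each day's page with
// $fetch (URL in, HTML out) and build a date => price table.
function price_table(string $start, int $days, callable $fetch): array
{
    // hypothetical URL template; the real site's parameters will differ
    $template = "http://www.somesite.com/search?date=%s";
    $table = array();
    $day = new DateTime($start);
    for ($i = 0; $i < $days; $i++) {
        $date = $day->format('Y-m-d');
        $html = $fetch(sprintf($template, $date));
        // a failed fetch (false) is recorded as "no price" rather than aborting
        $table[$date] = is_string($html) ? extract_price($html) : null;
        $day->modify('+1 day'); // the "next day" step
    }
    return $table;
}

// Real use, roughly two months (file_get_contents needs allow_url_fopen):
// $table = price_table('2006-10-01', 61, 'file_get_contents');
// $found = array_filter($table, 'is_float');
// asort($found); // cheapest days first
```

Injecting the fetcher keeps the scraping rules (`extract_price()`) separate from the date walking, so each can be adjusted to the real site on its own.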
For me it would be PHP, but use whatever you like :) I have added some
comments, but I'm not one for documenting stuff, as I am an expert in dirty
code, sorry, I meant "rapid deployment code", as my boss would say :D~
<?php
// set your URL
$src_file = "http://www.somesite.com/contents.htm";
// open your URL (http:// URLs need allow_url_fopen enabled)
$src = @fopen($src_file, "rb") or die("Can't open source file");
$src_script = '';
// read the whole page into one string, 8 KB at a time
while (!feof($src)) {
    $src_script .= fread($src, 8192);
}
fclose($src);
// parse the page for the data etc.
// take out newlines
$src_script = preg_replace("/\r?\n/", "", $src_script);
// collapse runs of extra whitespace into a single space
$src_script = preg_replace("/\s{2,}/", " ", $src_script);
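// I'm after links in this example, but you could go after the price data
// instead. Put each link on its own line and tag it with a marker comment.
// $src_url is the site's base address, prepended to each href (adjust to suit).
$src_url = "http://www.somesite.com";
$src_script = preg_replace("/<a href=\"/", "\n<!-- link --> <a href=\"$src_url/", $src_script);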
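// this next line replaces the site's link target with mine
$src_script = str_replace("\" target=\"_self\">", "\" target=\"_blank\">", $src_script);
// end each link's line right after its closing tag
$src_script = str_replace("</a>", "</a>\n", $src_script);
// mark links with no text (e.g. an image wrapped in a link) so they get skipped
$src_script = str_replace("></a>\n", "></a> <!-- not me -->\n", $src_script);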
// You can't just parse blindly; you need to look at the page source to get
// this working for your URL.
// I have added the \n (newline) back in so I can split the string on \n later.
$src_script = str_replace("> </a>\n", "></a> <!-- not me -->\n", $src_script);
// split() was removed in PHP 7; explode() splits on a literal string
$src_array = explode("\n", $src_script);
// look at the array
print_r($src_array);
$thiswk = 0;
$la = 0;
$JUSTLNKS = array();
foreach ($src_array as $line) {
    // look for a trigger in the page code around the price data
    if (strpos($line, "price") !== false) {
        $thiswk = 1;
    }
    if (strpos($line, "end price") !== false) {
        $thiswk = 0;
    }
    if ($thiswk == 1) {
        // keep the tagged links, but skip the ones marked "not me"
        if (strpos($line, "<!-- link -->") !== false
            && strpos($line, "<!-- not me -->") === false) {
            $JUSTLNKS[$la] = str_replace("<!-- link -->", "", $line);
            $la++;
        }
    }
}
// that should give you an array of (in my case) links :)
print_r($JUSTLNKS);
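The marker-comment trick depends on the exact spacing of the site's HTML. As an alternative (a swapped-in technique, not what the code above does), PHP's bundled DOM extension can parse the page and return each link's href and text directly. A minimal sketch, assuming the `dom` extension is enabled; `extract_links()` is an illustrative name:

```php
<?php
// Return every <a href="..."> on the page as array('href' => ..., 'text' => ...),
// using DOMDocument/DOMXPath instead of regexes and string markers.
function extract_links(string $html): array
{
    if (trim($html) === '') {
        return array(); // loadHTML() refuses empty input
    }
    $doc = new DOMDocument();
    // real-world HTML is rarely valid; collect parser warnings instead of printing them
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = array();
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//a[@href]') as $a) {
        $links[] = array(
            'href' => $a->getAttribute('href'),
            'text' => trim($a->textContent),
        );
    }
    return $links;
}
```

To mimic the "price" trigger, the XPath query could be narrowed to whatever element wraps the fares, e.g. a hypothetical `//div[@class='prices']//a[@href]`.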
Well, that's a start. Of course, you can then do what you like with the data
in the array; pushing it into MySQL would be good...
Hope that helps, have phun.....
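Pushing the results into MySQL could look like this minimal PDO sketch. The `prices` table, its columns, `save_prices()` and the connection details are all hypothetical. The statement is prepared once and executed per row, so scraped values are never pasted into the SQL string, and `REPLACE INTO` (a MySQL extension that SQLite also accepts) overwrites the row for a date that was already stored.

```php
<?php
// Store a date => price array in a hypothetical `prices` table.
// Returns the number of rows written.
function save_prices(PDO $db, array $prices): int
{
    $db->exec('CREATE TABLE IF NOT EXISTS prices (
        flight_date CHAR(10) NOT NULL PRIMARY KEY,
        price DECIMAL(10,2) NULL
    )');
    $stmt = $db->prepare('REPLACE INTO prices (flight_date, price) VALUES (?, ?)');
    $saved = 0;
    $db->beginTransaction();
    try {
        foreach ($prices as $date => $price) {
            $stmt->execute(array($date, $price));
            $saved++;
        }
        $db->commit();
    } catch (Exception $e) {
        $db->rollBack(); // leave the table as it was if any row fails
        throw $e;
    }
    return $saved;
}

// Example connection (hypothetical host, database and credentials):
// $db = new PDO('mysql:host=localhost;dbname=flights', 'user', 'secret',
//               array(PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION));
// save_prices($db, $prices);
```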
Re-