[Wolves] Ryanair web page ripper

Re-LoaD reload at brum2600.net
Fri Sep 8 19:35:55 BST 2006


Wayne Morris wrote:
> Hi guys,
> 
> This isn't necessarily a Linux question, but it seems as good a place as
> any to start.
> 
> Cheap air flight companies like Ryanair don't tell you when the cheapest
> flights are; they expect you to pick a date, then step forward or backwards
> day by day, checking prices one by one.
> What would be really useful would be a bit of code that would strip out
> the price from a page, then say 'next day', then get the new page, and
> then build a table of results for a couple of months.
> 
> Any ideas how to get started with this?
> 

For me it would be PHP, but use whatever :) I have added some comments,
but I'm not one for documenting stuff, as I am an expert in dirty code...
sorry, I meant rapid deployment code, as my boss would say :D~

// set your url
$src_file = "http://www.somesite.com/contents.htm";
// open your url (needs allow_url_fopen enabled in php.ini)
$src = @fopen($src_file, 'rb') or die("Can't open source file");
$src_script = '';
// suck your url into a string
while (!feof($src)) {
  $src_script .= fread($src, 8192);
}
fclose($src);

// parse your url for the data etc....

// take out new lines
$src_script = preg_replace("/\n/","",$src_script);

// take out extra white space
$src_script = preg_replace("/(\s{2,})/"," ",$src_script);


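// I'm after links in this example, but you could get the price data the
// same way. So suck out links:
// $src_url is the site's base URL, used to make relative links absolute
// (assumed here to be the directory part of $src_file)
$src_url = dirname($src_file);
$src_script = preg_replace("/<a href=\"/","\n<!-- link --> <a href=\"$src_url/",$src_script);
// this next line replaces the site's code with mine
$src_script = str_replace("\" target=\"_self\">","\" target=\"_blank\">",$src_script);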

$src_script = str_replace("</a>","</a>\n",$src_script);
$src_script = str_replace("></a>\n","></a> <!-- not me -->\n",$src_script);
// you can't just parse blindly; you need to look at the page source to get
// this working for your url
// I have added back in the \n (new line) so I can split the string on \n
// later.
$src_script = str_replace("> </a>\n","></a> <!-- not me -->\n",$src_script);

// explode() splits on a plain string; no regex needed here
$src_array = explode("\n",$src_script);

// look at the array
print_r ($src_array);

$thiswk=0;
$la=0;
$JUSTLNKS=array();
foreach ($src_array as $flow => $null) {
// look for a trigger in the page source around the price data
  if (preg_match("/price/",$src_array[$flow]) == TRUE ) {
  $thiswk=1;
  }
  if (preg_match("/end price/",$src_array[$flow]) == TRUE ) {
  $thiswk=0;
  }
  if ($thiswk == 1) {
    // plain substring checks are enough for these fixed markers
    if (strpos($src_array[$flow],"<!-- link -->") !== FALSE) {
      if (strpos($src_array[$flow],"<!-- not me -->") === FALSE) {
        $JUSTLNKS[$la] = str_replace("<!-- link -->","",$src_array[$flow]);
        $la++;
      }
    }
  }
}

// that should give you an array of, in my case, links :)

print_r ($JUSTLNKS);
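
If it is prices rather than links you are after, the same idea works with a
regex on the amount itself. A minimal sketch, assuming the page shows prices
like "19.99 GBP". The sample markup, that price format and the function name
extract_prices are all made up for illustration; look at the real page source
and adjust the pattern to match.

```php
<?php
// Hypothetical helper: pull every "NN.NN GBP" style amount out of a page.
// The price format is an assumption, not what any real site necessarily uses.
function extract_prices($html) {
    $prices = array();
    // one or more digits, a dot, two digits, optional spaces, then GBP
    if (preg_match_all('/(\d+\.\d{2})\s*GBP/', $html, $m)) {
        foreach ($m[1] as $amount) {
            $prices[] = (float) $amount;
        }
    }
    return $prices;
}

// made-up sample markup standing in for a fetched page
$sample = '<td class="price">19.99 GBP</td><td class="price">4.50 GBP</td>';
print_r(extract_prices($sample));
```

You would feed it $src_script from above instead of $sample.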

Well, that's a start. Of course, then you do what you like with the data in
the array; pushing it to MySQL would be good...

Hope that helps, have phun.....
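
For the original question, the missing piece is the 'next day' loop. A
minimal sketch, assuming you already have some function that fetches one
day's page and returns the price. Here that is a stand-in called fake_price
that returns invented numbers; build_price_table and cheapest_day are also
just names picked for this example.

```php
<?php
// Build a date => price table by stepping one day at a time.
// $get_price is any callable taking 'YYYY-MM-DD' and returning a price,
// or null when there is no flight that day.
function build_price_table($start, $days, $get_price) {
    $table = array();
    $day = new DateTime($start);
    for ($i = 0; $i < $days; $i++) {
        $date = $day->format('Y-m-d');
        $price = call_user_func($get_price, $date);
        if ($price !== null) {
            $table[$date] = $price;
        }
        $day->modify('+1 day');   // the 'next day' step
    }
    return $table;
}

// Return the date with the lowest price, or null for an empty table.
function cheapest_day($table) {
    if (count($table) == 0) {
        return null;
    }
    asort($table);   // sort by price, keeping the date keys
    reset($table);
    return key($table);
}

// Stand-in for the real fetch-and-parse step; the prices are invented.
function fake_price($date) {
    $invented = array(
        '2006-09-09' => 29.99,
        '2006-09-10' => 9.99,
        '2006-09-12' => 14.99,
    );
    return isset($invented[$date]) ? $invented[$date] : null;
}

$table = build_price_table('2006-09-08', 7, 'fake_price');
foreach ($table as $date => $price) {
    printf("%s  %6.2f\n", $date, $price);
}
echo "cheapest: " . cheapest_day($table) . "\n";
```

Swap fake_price for something that fetches the page for that date and runs
your parsing on it, and bump $days up to 60 or so for a couple of months.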

Re-





