[Gllug] OT: domain name appears on google by magic!

Robert McKay robert at mckay.com
Thu Apr 15 12:49:04 UTC 2004


On Thu, Apr 15, 2004 at 12:06:47PM +0100, Ben Fitzgerald wrote:
> Hi
> 
> I'm sure someone will be able to put this one to bed
> very quickly.
> 
> I registered a domain, put some private stuff on the
> web server it points to and left it.
> 
> Now when I go to google it shows up if I search for
> mydomain where the dns record is www.mydomain.com
> It's not a real word so it's the only result, BTW.
> 
> I'd expect to see this appear if I'd:
> 
> 1. Added the url to google's list of urls
> 2. Someone else who already has an indexed page
> has added a link to my page on their page (!).
> 3. The company with whom I registered my domain
> has a "relationship" with google and has made
> some of my info available to google.
> 
> The last one is what I'm guessing, but I'm interested
> if someone has other ideas.

There are many ways Google can obtain URLs. You might have outbound
links on your site to some site that logs referrers in a location
that Google can index. Someone who looked at your site may have used
a proxy server whose logs are indexable by Google. Someone may have
posted your URL to IRC, where it was snarfed by a URL bot that updates
a page that Google indexes. You might have an inline image on your
page hosted on another site (a web counter, perhaps) that logs
referrers to a page Google indexes. Your domain registrar might
publicly list new domain additions, like a "these people recently
bought domains with us" page that Google might index.
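
To make the referrer route concrete, here is a minimal Python sketch. It assumes the third-party host (say, a web counter embedded in your page) keeps an Apache "combined"-format access log and publishes the referrers it finds. The log lines, IP addresses and the `referrers` helper are invented for illustration. Any page built from such a list and left crawlable hands your URL to Google.

```python
import re

# Apache "combined" log format: the last two quoted fields are
# the Referer and the User-Agent.
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

def referrers(lines):
    """Return the distinct external referrer URLs, in first-seen order."""
    seen = []
    for line in lines:
        m = COMBINED.match(line.strip())
        if not m:
            continue  # skip lines that are not in combined format
        ref = m.group("referer")
        if ref not in ("", "-") and ref not in seen:
            seen.append(ref)
    return seen

# Hypothetical log lines from a third-party web counter embedded in your page
log = [
    '192.0.2.10 - - [15/Apr/2004:11:02:13 +0100] "GET /counter.gif HTTP/1.0" 200 43 '
    '"http://www.mydomain.com/" "Mozilla/4.0"',
    '192.0.2.11 - - [15/Apr/2004:11:05:40 +0100] "GET / HTTP/1.0" 200 512 "-" "Mozilla/4.0"',
]

for url in referrers(log):
    print(url)
```

Running it prints the single external referrer, http://www.mydomain.com/. The second request carries no Referer ("-") and is skipped.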

Many places run whois service websites that you (or someone else) may
have typed your URL into. These sites often use GET method form
queries, which means the query ends up in the URL of the results page
and is leaked, via the Referer header, to all of their banner ad
providers. I'm not sure whether Google's ad system leaks form data or
not (since its ads are text included in the page rather than an
image). Perhaps someone here knows?
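
A minimal Python sketch of the GET-form leak. The whois host `whois.example` and the form field name `domain` are hypothetical. The point is only that a GET submission puts the form data into the results page's URL, which a browser may then send as the Referer when it fetches a banner embedded in that page.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical whois site whose lookup form uses method="GET"
# and a field named "domain" (both are assumptions for illustration).
form_action = "http://whois.example/lookup"
results_url = form_action + "?" + urlencode({"domain": "mydomain.com"})

# With GET, the form fields become part of the URL of the results page.
# When that page embeds a banner served from another host, the browser's
# request for the banner can carry this URL in its Referer header.
banner_request_headers = {"Referer": results_url}

# What the ad server (and anyone who publishes its logs) gets to see:
leaked = parse_qs(urlsplit(banner_request_headers["Referer"]).query)
print(results_url)         # http://whois.example/lookup?domain=mydomain.com
print(leaked["domain"][0]) # mydomain.com
```

With a POST form the fields travel in the request body instead, so the results page URL (and hence the Referer) would not contain the domain.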

The IP address that your site is running on may have once had a
different website on it that google was trying to index and found your
new site instead.

Some organisations have access to the gTLD zone files (you can get
these from ICANN by signing their license and paying some money), and
maybe some of them automatically generate web pages containing
extracts of the data that Google then indexes. I wouldn't be surprised
if Google actually did this directly itself, and I certainly think
it's very unlikely that data extracted from the zones has never been
leaked to Google in this way.
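
As a rough illustration of how a "recently registered" list could be produced from zone data, here is a Python sketch that diffs two snapshots of a zone and reports the names that gained NS records. The zone excerpts are made up, and the parser handles only a simplified one-record-per-line syntax. That syntax is an assumption: real gTLD zone files are far larger and use more of the master-file format.

```python
def domains_in_zone(text, origin="com"):
    """Collect the owner names of NS records from simplified zone-file text.

    Assumes one record per line in the form  NAME [TTL] [CLASS] NS TARGET,
    with relative owner names under `origin` (passed in, not read from
    $ORIGIN). Real zone files have more syntax (e.g. blank owner names
    meaning "same as previous line") than this handles.
    """
    names = set()
    for line in text.splitlines():
        line = line.split(";", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("$"):
            continue  # skip blank lines and directives such as $ORIGIN
        fields = line.split()
        if "NS" in (f.upper() for f in fields[1:]):
            names.add(f"{fields[0].lower()}.{origin}")
    return names

# Hypothetical snapshots of the same zone taken a day apart
yesterday = """
$ORIGIN COM.
EXAMPLE NS NS1.EXAMPLE.NET.
"""
today = yesterday + "MYDOMAIN NS NS1.EXAMPLE.NET.\n"

new_domains = sorted(domains_in_zone(today) - domains_in_zone(yesterday))
print(new_domains)  # ['mydomain.com']
```

A set difference is enough here because only additions matter; a published page built from `new_domains` is exactly the kind of extract that a crawler could pick up.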

Finally, you typed your domain name into Google. Maybe it indexes itself? ;)

There are so many ways it could have obtained your domain name that
it's probably impossible ever to know which one was responsible.

-Robert.
-- 
Gllug mailing list  -  Gllug at gllug.org.uk
http://lists.gllug.org.uk/mailman/listinfo/gllug

