Tue, Apr 03, 2007 12:01 pm

Googlebot Alert

If you would like to know when the Google web crawler, Googlebot, visits your website, you can insert the PHP code below in the home page for your site. The code sends an email each time a request arrives with "googlebot" in its user-agent string.
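<?php
// Address that should receive the alert.
$email = "yourname@example.com";

// eregi() was removed in PHP 7; stripos() performs the same case-insensitive
// substring test. The user-agent header may be absent, so default to "".
$agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : "";
if (stripos($agent, "googlebot") !== false)
{
    mail($email, "Googlebot Alert",
            "Googlebot just requested the following page: " .
            $_SERVER['REQUEST_URI']);
}
?>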

You will, of course, need to replace yourname@example.com with your own email address.

On a Linux or Unix system, you can issue the following command to see how many of today's requests for pages on your site came from Googlebot.

grep "$(date +"%d/%b/%Y")" access.log | grep -i "googlebot" | wc -l

You will need to substitute the name and location of the log file that tracks access to your site for access.log.

The $(date +"%d/%b/%Y") tells grep to look for occurrences of the current date in the form dd/mmm/YYYY, e.g. 03/Apr/2007. In my Apache log files, entries appear similar to the one below.

66.249.66.147 - - [03/Apr/2007:09:10:42 -0400] "GET /robots.txt HTTP/1.1" 200 146

If the date is formatted in a different manner in your log file, you will need to adjust the format accordingly. You can obtain information on formatting the date with man date.
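The pipeline above can also be wrapped in a small shell function so that a date other than today can be checked. This is only a sketch: the function name count_bot_hits and its argument order are made up for illustration, and it assumes the Apache-style dd/Mon/YYYY date shown above.

```shell
# count_bot_hits LOGFILE [DATE] [PATTERN]   (hypothetical helper name)
# Prints the number of lines in LOGFILE that contain DATE and PATTERN.
# DATE defaults to today in dd/Mon/YYYY form; PATTERN defaults to "googlebot"
# and is matched case-insensitively.
count_bot_hits() (
    logfile=$1
    # LC_ALL=C keeps %b in English (Apr, May, ...) regardless of locale.
    day=${2:-$(LC_ALL=C date +"%d/%b/%Y")}
    pattern=${3:-googlebot}
    # -F treats the date as a literal string; -c counts matching lines,
    # which replaces the separate "wc -l" step.
    grep -F -- "$day" "$logfile" | grep -ci -- "$pattern"
)
```

For example, count_bot_hits access.log gives today's count, and count_bot_hits access.log "03/Apr/2007" gives the count for that day.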

If your log file does not translate IP addresses to an FQDN, e.g. if it records 66.249.66.147 instead of crawl-66-249-66-147.googlebot.com, which is the case for my log file, then you will need to search for the IP address range used by Googlebot instead. Googlebot's and Mediapartners-google's IP indicates that 66.249.71.x appears to be assigned to Googlebot, though reverse name lookups only work up to 66.249.71.208. The entry from my log shown above came from 66.249.66.147, so the command below searches for addresses in the 66.249.66.x range; substitute 66.249.71 or whatever prefix appears in your own log. The dots are escaped and the pattern is anchored with ^ so that it matches only that literal prefix at the start of a line.

grep "$(date +"%d/%b/%Y")" access.log | grep '^66\.249\.66\.' | wc -l

The Googlebot's and Mediapartners-google's IP article mentions that Google uses a separate bot to check pages that carry Google AdSense ads. So, if you have AdSense ads on your site, both the main Googlebot and the Mediapartners-Google bot will probably visit it. The author of that article states that he has seen the following IP addresses used by the Mediapartners-Google bot.

66.249.65.40    crawl-66-249-65-40.googlebot.com
66.249.66.65    crawl-66-249-66-65.googlebot.com
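
To see which individual addresses within a range account for the requests, for instance to tell addresses like the two above apart from the rest, the requests can be tallied per client IP. The following is a rough sketch rather than a definitive tool: the function name hits_by_ip is invented for illustration, and it assumes the client address is the first field of each line and that the timestamp appears as [dd/Mon/YYYY:..., as in the log entry shown earlier.

```shell
# hits_by_ip LOGFILE PREFIX [DATE]   (hypothetical helper name)
# Prints "count address" for each client address that starts with PREFIX and
# made a request on DATE (default: today), busiest address first.
hits_by_ip() (
    logfile=$1
    prefix=$2
    # LC_ALL=C keeps %b in English regardless of locale.
    day=${3:-$(LC_ALL=C date +"%d/%b/%Y")}
    awk -v prefix="$prefix" -v day="$day" '
        # index() matches literally, so the dots in PREFIX are not wildcards.
        # $1 is the client address; the timestamp is located by "[" DATE ":".
        index($1, prefix) == 1 && index($0, "[" day ":") > 0 { count[$1]++ }
        END { for (ip in count) print count[ip], ip }
    ' "$logfile" | sort -rn
)
```

For example, hits_by_ip access.log "66.249.66." lists today's requests from that range, one line per address.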

References:

  1. Googlebot Alert
    By Philipp Lenssen
    June 23, 2004
    Google Blogoscoped
  2. Googlebot's and Mediapartners-google's IP
    By Tim Johansson
    gurka.se
  3. FQDN
    Wikipedia, the free encyclopedia
  4. Internet bot
    Wikipedia, the free encyclopedia
