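If you would like to know when Google's web crawler, Googlebot, visits your website, you can insert the PHP code below in the home page for your site.
<?php
// Address that should receive the alert.
$email = "yourname@example.com";
// The User-Agent header may be missing, so default to an empty string.
$agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
// stripos() is a case-insensitive search; eregi() was removed in PHP 7.
if (stripos($agent, "googlebot") !== false)
{
    mail($email, "Googlebot Alert",
        "Google just indexed the following page: " .
        $_SERVER['REQUEST_URI']);
}
?>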
You will, of course, need to replace yourname@example.com with
your own email address.
On a Linux or Unix system, you can issue the following command to see how many of today's requests for pages on your site came from Googlebot.
grep "$(date +"%d/%b/%Y")" access.log | grep -i "googlebot" | wc -l
You will need to substitute the name and location of the log file that tracks
access to your site for access.log.
The $(date +"%d/%b/%Y") tells grep to look for occurrences of the
current date in the form dd/mmm/YYYY, e.g. 03/Apr/2007.
In my Apache log files, entries appear similar to the one below.
66.249.66.147 - - [03/Apr/2007:09:10:42 -0400] "GET /robots.txt HTTP/1.1" 200 146
If the date is formatted in a different manner in your log file,
you will need to adjust the format accordingly. You can obtain information
on formatting the date with man date.
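The command above can be wrapped in a small shell function so that the log file and the date are easy to change. This is only a sketch: the function name count_googlebot_hits and its arguments are made up for illustration, and it assumes the date appears in the log as [dd/Mon/YYYY: as in the sample entry above.

```shell
# count_googlebot_hits LOGFILE [DATE]
# Prints how many lines of LOGFILE contain "[DATE:" and the string
# "googlebot" in any letter case. DATE defaults to today as dd/Mon/YYYY.
# (Hypothetical helper; variables are not local, to stay POSIX sh friendly.)
count_googlebot_hits() {
    logfile=$1
    # LC_ALL=C keeps the month abbreviation in English, e.g. Apr.
    day=${2:-$(LC_ALL=C date +"%d/%b/%Y")}
    # -F treats the date as a fixed string; -c counts the matching lines.
    grep -F "[$day:" "$logfile" | grep -ic "googlebot"
}
```

For example, count_googlebot_hits access.log counts today's entries, while count_googlebot_hits access.log 03/Apr/2007 counts the entries for that particular day.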
If you don't have IP addresses translated to a
FQDN, e.g. if your log
file records 66.249.66.147 instead of
crawl-66-249-66-147.googlebot.com, which is the case for my
log file, then you will need to look for the IP address range that is used
by Googlebot.
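The Googlebot's and Mediapartners-google's IP article indicates that
66.249.71.x appears to be assigned to Googlebot, though
reverse name lookups only work up to 66.249.71.208. You
can use the following command to count today's requests from the address range
66.249.71.1 to 66.249.71.255. If your log shows Googlebot visiting from a
different range, e.g. 66.249.66.x as in the sample entry above, substitute that
prefix. The dots are escaped so that grep matches them literally, and ^ anchors
the match to the start of the line, where the client address appears in my log.
grep "$(date +"%d/%b/%Y")" access.log | grep '^66\.249\.71\.' | wc -l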
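If you need to check more than one address range, the search can be sketched as a function that takes the prefix as an argument. The name count_hits_from_prefix is hypothetical, and the sketch assumes that the client address is the first field on each log line, as in the sample entry shown earlier.

```shell
# count_hits_from_prefix LOGFILE PREFIX [DATE]
# Prints how many lines of LOGFILE have a first field that starts with
# PREFIX (e.g. 66.249.71.) and also contain "[DATE:".
# DATE defaults to today as dd/Mon/YYYY. (Hypothetical helper.)
count_hits_from_prefix() {
    logfile=$1
    prefix=$2
    day=${3:-$(LC_ALL=C date +"%d/%b/%Y")}
    # index() does plain substring matching, so the dots in the prefix
    # are taken literally; index(...) == 1 means "starts with".
    awk -v p="$prefix" -v d="[$day:" \
        'index($1, p) == 1 && index($0, d) > 0 { n++ } END { print n + 0 }' \
        "$logfile"
}
```

Include the trailing dot in the prefix, e.g. count_hits_from_prefix access.log 66.249.71. so that an address such as 66.249.710.x or 66.249.7.x is not counted by mistake.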
The Googlebot's and Mediapartners-google's IP article mentions that Google uses a separate bot to check pages that carry Google AdSense ads. So, if you have Google AdSense ads on your site, both the main Googlebot and the Mediapartners-Google bot will probably visit your site. The author of that article states that he has seen the following IP addresses used for the Mediapartners-Google bot.
| 66.249.65.40 | crawl-66-249-65-40.googlebot.com |
| 66.249.66.65 | crawl-66-249-66-65.googlebot.com |
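The hostnames listed here, like the one mentioned earlier, all follow the pattern crawl-A-B-C-D.googlebot.com for the address A.B.C.D. Assuming that pattern holds for the addresses you are interested in (it is only inferred from these few examples), a sketch like the one below builds the hostname to search for when your log records names rather than addresses. The function name googlebot_hostname is made up for illustration.

```shell
# googlebot_hostname IP
# Prints the hostname that the crawl-A-B-C-D.googlebot.com pattern
# would give for the dotted address IP. It does no DNS lookup, so it
# does not prove that the address really belongs to Google.
googlebot_hostname() {
    # tr replaces every dot in the address with a hyphen.
    printf 'crawl-%s.googlebot.com\n' "$(printf '%s' "$1" | tr '.' '-')"
}
```

The result can then be passed to grep -F, e.g. grep -F "$(googlebot_hostname 66.249.65.40)" access.log | wc -l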
References:
- Googlebot Alert
  By Philipp Lenssen
  June 23, 2004
  Google Blogoscoped
- Googlebot's and Mediapartners-google's IP
  By Tim Johansson
  gurka.se
- FQDN
  Wikipedia, the free encyclopedia
- Internet bot
  Wikipedia, the free encyclopedia
