If you would like to know when Google's web crawler, Googlebot, visits your website, you can insert the PHP code below in your site's home page. It reports only visits to pages that contain the code.
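<?php
$email = "yourname@example.com";
// Case-insensitive search for "googlebot" in the visitor's User-Agent header.
if (isset($_SERVER['HTTP_USER_AGENT']) && stripos($_SERVER['HTTP_USER_AGENT'], 'googlebot') !== false) {
    mail($email, "Googlebot Alert", "Googlebot just visited the following page: " . $_SERVER['REQUEST_URI']);
}
?>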
You will, of course, need to replace yourname@example.com with your own email address.
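The PHP test above is just a case-insensitive search for "googlebot" in the User-Agent header. As a minimal sketch, the shell function below (is_googlebot is a hypothetical name) applies the same test, which makes it easy to check which user-agent strings would trigger the alert. Keep in mind that the User-Agent header is supplied by the visitor, so anyone can claim to be Googlebot. Also, the AdSense crawler identifies itself as Mediapartners-Google. That string does not contain "googlebot", so it would not trigger the alert.

```shell
# is_googlebot: hypothetical helper that mirrors the PHP test above.
# Returns 0 (success) if the user-agent string passed as $1 contains
# "googlebot" in any mix of upper and lower case, 1 otherwise.
is_googlebot() {
  printf '%s\n' "$1" | grep -qi 'googlebot'
}

# Example: the first string matches, the second does not.
if is_googlebot 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'; then
  echo "would send alert"
fi
if ! is_googlebot 'Mediapartners-Google'; then
  echo "no alert for Mediapartners-Google"
fi
```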
On a Linux or Unix system, you can use the following command to count how many requests to your site today have come from Googlebot.
grep "$(date +"%d/%b/%Y")" access.log | grep -i "googlebot" | wc -l
You will need to substitute the name and location of the log file that records access to your site for access.log.
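The same count can be wrapped in a small function that takes the log file as an argument, so the command doesn't need editing each time. This is a sketch under a few assumptions: count_googlebot_today is a hypothetical name, and the log writes dates as dd/mmm/YYYY like the Apache entry shown below. Apache logs normally use English month abbreviations, but date +%b follows your locale, so LC_ALL=C is set to keep the two in agreement. grep -c counts matching lines directly, replacing wc -l.

```shell
# count_googlebot_today: hypothetical helper. Prints how many lines in the
# log file given as $1 (default: access.log) carry today's date in
# dd/mmm/YYYY form and mention "googlebot" in any letter case.
count_googlebot_today() {
  local log=${1:-access.log}
  local today
  # LC_ALL=C keeps %b as an English month abbreviation (Jan, Feb, ...).
  today=$(LC_ALL=C date +%d/%b/%Y)
  # -F treats the date as a fixed string; -c prints the number of matches.
  grep -F "$today" "$log" | grep -ci 'googlebot'
}
```

For example: count_googlebot_today /var/log/apache2/access.log (that path is only an illustration; log locations vary between systems).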
The command substitution $(date +"%d/%b/%Y") inserts the current date in the form dd/mmm/YYYY, e.g. 03/Apr/2007, as the pattern grep searches for. In my Apache log files, entries appear similar to the one below.
66.249.66.147 - - [03/Apr/2007:09:10:42 -0400] "GET /robots.txt HTTP/1.1" 200 146
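In this common log format, the fields are separated by spaces. They are the client address, two placeholder dashes, the bracketed timestamp, the quoted request, the status code and the response size. As a sketch that assumes exactly this layout (log_fields is a hypothetical name), awk can pull out the fields this article relies on.

```shell
# log_fields: hypothetical helper that prints the client address, the
# timestamp (without the opening bracket), the requested path and the
# status code from each common-log-format line read on standard input.
log_fields() {
  awk '{ sub(/^\[/, "", $4); print $1, $4, $7, $9 }'
}

echo '66.249.66.147 - - [03/Apr/2007:09:10:42 -0400] "GET /robots.txt HTTP/1.1" 200 146' | log_fields
# prints: 66.249.66.147 03/Apr/2007:09:10:42 /robots.txt 200
```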
If the date is formatted differently in your log file, you will need to adjust the format accordingly. You can obtain information on formatting the date with man date.
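For instance, you might want to count a past day rather than today. GNU date, the version found on most Linux systems, accepts a -d option that describes the date to format. BSD and macOS date use different options. Below is a sketch with a hypothetical helper, log_date, that prints a given day in the dd/mmm/YYYY form used above.

```shell
# log_date: hypothetical helper. Formats the date described by $1
# (anything GNU date -d accepts, e.g. "yesterday" or "2007-04-03")
# as dd/mmm/YYYY with English month names. Requires GNU date.
log_date() {
  LC_ALL=C date -d "$1" +%d/%b/%Y
}

log_date 2007-04-03   # prints 03/Apr/2007
```

It can then replace the inline date in the earlier command, e.g. grep "$(log_date yesterday)" access.log | grep -i "googlebot" | wc -l. If your log uses ISO 8601 dates such as 2007-04-03, change the format string to +%Y-%m-%d.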
Your log file may not translate IP addresses to a fully qualified domain name (FQDN). For example, it may record 66.249.66.147 instead of crawl-66-249-66-147.googlebot.com, which is the case for my log file. If it also doesn't record the User-Agent string, grep -i "googlebot" will find nothing. In that case, you will need to look for the IP address range used by Googlebot.
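The article "Googlebot's and Mediapartners-google's IP" indicates that 66.249.71.x appears to be assigned to Googlebot, though reverse name lookups only work up to 66.249.71.208. My own log, however, shows Googlebot at 66.249.66.147. The command below therefore counts today's requests from 66.249.66.0 to 66.249.66.255. To search 66.249.71.1 to 66.249.71.255 instead, change the third octet in the pattern from 66 to 71.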
grep "$(date +"%d/%b/%Y")" access.log | grep -E '^66\.249\.66\.' | wc -l
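You may want to see which addresses in a range made today's requests. A variant can anchor the pattern at the start of the line, where Apache writes the client address, and escape the dots, which grep otherwise treats as "any character". The sketch below lists each address with its request count, busiest first. Its name, googlebot_ips_today, is hypothetical, and it assumes the range is given as its first three octets.

```shell
# googlebot_ips_today: hypothetical helper. $1 is the log file, $2 the
# first three octets of the range, e.g. 66.249.66. It prints today's
# request count per client address in that range, busiest first.
googlebot_ips_today() {
  local log=$1 prefix
  # Escape the dots so they match literally in the regular expression.
  prefix=$(printf '%s' "$2" | sed 's/\./\\./g')
  grep -F "$(LC_ALL=C date +%d/%b/%Y)" "$log" |
    grep -E "^${prefix}\.[0-9]+ " |
    awk '{ print $1 }' | sort | uniq -c | sort -rn
}
```

For example, googlebot_ips_today access.log 66.249.66 covers the range used by the command above, and googlebot_ips_today access.log 66.249.71 covers the range named in the article.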
The article "Googlebot's and Mediapartners-google's IP" mentions that Google uses a separate bot, Mediapartners-Google, to check pages that carry Google AdSense ads. So if you have AdSense ads on your site, both Googlebot and the Mediapartners-Google bot will probably visit it. The author of that article reports seeing the following IP addresses used by the Mediapartners-Google bot.
IP address | Host name
66.249.65.40 | crawl-66-249-65-40.googlebot.com
66.249.66.65 | crawl-66-249-66-65.googlebot.com
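The two addresses above can be counted together with one extended regular expression. As before, the pattern is anchored at the start of the line and the dots are escaped. Their host names are included too, in case your log records FQDNs. This sketch uses a hypothetical function name, count_mediapartners_ips_today. It only covers the two addresses that article's author reported, and Google's crawler addresses change over time, so treat the list as an example rather than a complete set.

```shell
# count_mediapartners_ips_today: hypothetical helper. Counts today's lines
# in the log file $1 whose client field is one of the two addresses (or
# their googlebot.com host names) listed above.
count_mediapartners_ips_today() {
  grep -F "$(LC_ALL=C date +%d/%b/%Y)" "$1" |
    grep -cE '^(66\.249\.65\.40|66\.249\.66\.65|crawl-66-249-65-40\.googlebot\.com|crawl-66-249-66-65\.googlebot\.com) '
}
```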
References:
- "Googlebot Alert", by Philipp Lenssen, June 23, 2004, Google Blogoscoped
- "Googlebot's and Mediapartners-google's IP", by Tim Johansson, gurka.se
- "FQDN", Wikipedia, the free encyclopedia
- "Internet bot", Wikipedia, the free encyclopedia