/network/web/tools/search/htdig-setup.html
even though years ago I replaced the .html file with
/network/web/tools/search/htdig-setup.php. I created the HTML version on
July 25, 2005 and I believe I replaced it with the PHP version on October
22, 2006.
Yet today, I saw a "File does not exist" entry in the error log for the htdig-setup.html file from the IP address 183.60.213.31, along with an attempt to access it two days ago from 183.60.215.32. The Apache CustomLog file records the user agent string, which identifies the browser or web crawler that requested a page or other file from the web server. Looking there, I could see that both IP addresses had used the following user agent string when attempting to access the file:
Mozilla/5.0 (compatible; EasouSpider; +http://www.easou.com/search/spider.html)
Seeing "spider" or "bot" in the user agent string usually means that the entity requesting a page on the website is a web crawler, like Google's Googlebot or Microsoft's bingbot. In this case, EasouSpider appears to be the crawler of a Chinese search engine indexing the site. The Asia Pacific Network Information Centre (APNIC), which is the regional Internet registry for the Asia Pacific region, reports that the IP address block 183.0.0.0 - 183.63.255.255 is allocated to the following entity:
netname: | CHINANET-GD |
descr: | CHINANET Guangdong province network |
descr: | Data Communication Division |
descr: | China Telecom |
country: | CN |
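As a quick sanity check that both Easou addresses really fall inside the block APNIC reports, the range can be converted to CIDR form and tested with Python's standard ipaddress module. This is only a sketch; the helper name in_block is my own.

```python
import ipaddress

# The block boundaries as reported by APNIC
first = ipaddress.ip_address("183.0.0.0")
last = ipaddress.ip_address("183.63.255.255")

# summarize_address_range yields the CIDR network(s) covering the range exactly
networks = list(ipaddress.summarize_address_range(first, last))


def in_block(ip):
    """Return True if `ip` lies inside the APNIC-reported block."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


for ip in ("183.60.213.31", "183.60.215.32"):
    print(ip, in_block(ip))
```

The range collapses to the single network 183.0.0.0/10, and both addresses from the logs are inside it.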
According to a June 12, 2010 article by Sherman So at the Asia Times website, "Easou search skills trump Baidu", Easou helps mobile-phone users in China search the Internet. The article states that Easou even eclipses China's largest Internet search company, Baidu, as the primary search service for the country's mobile users, with double Baidu's volume of mobile traffic, according to an internal report issued by China's leading mobile-phone operator, China Mobile.
The two attempts at accessing the file this month by Easou were the only attempts to access the old, no longer existing .html file this year, but in checking last year's logs, I found a number of attempts to access it by other web crawlers from the following 43 unique IP addresses in 2013:
IP Address | Bot |
---|---|
123.151.139.211 | Sosospider |
157.55.32.114 | bingbot |
157.55.32.28 | bingbot |
157.55.32.58 | bingbot |
157.55.35.86 | bingbot |
157.56.229.184 | bingbot |
157.56.92.144 | bingbot |
157.56.93.186 | bingbot |
173.199.114.147 | AhrefsBot |
173.199.114.211 | AhrefsBot |
173.199.115.59 | AhrefsBot |
173.199.116.179 | AhrefsBot |
173.199.117.251 | AhrefsBot |
173.199.119.139 | AhrefsBot |
180.76.5.162 | Baiduspider |
180.76.5.166 | Baiduspider |
180.76.5.178 | Baiduspider |
180.76.5.23 | Baiduspider |
180.76.5.27 | Baiduspider |
180.76.5.8 | Baiduspider |
180.76.6.225 | Baiduspider |
180.76.6.36 | Baiduspider |
199.127.227.203 | MJ12bot |
199.21.99.88 | YandexBot |
202.46.59.196 | zh-CN |
202.46.63.202 | zh-CN |
204.124.181.85 | MJ12bot |
208.167.230.59 | AhrefsBot |
23.20.57.138 | |
46.105.99.120 | MJ12bot |
5.10.83.28 | AhrefsBot |
5.10.83.90 | AhrefsBot |
5.9.7.208 | MJ12bot |
65.55.24.217 | bingbot |
65.55.55.230 | bingbot |
66.249.72.234 | |
66.249.75.234 | Googlebot |
66.249.76.174 | Googlebot |
66.249.76.234 | Googlebot |
85.17.29.107 | MJ12bot |
85.178.84.125 | MJ12bot |
88.190.44.26 | MJ12bot |
91.232.96.23 | |
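A list like the one above can be pulled out of the access log with a short script. The sketch below assumes the CustomLog uses Apache's "combined" format, which may not match the actual LogFormat on the server, and it guesses the bot name by looking for a token ending in "bot" or "spider". The function names bot_name and ips_by_bot are hypothetical.

```python
import re
from collections import defaultdict

# Apache "combined" LogFormat (an assumption about the CustomLog format):
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# A token ending in "bot" or "spider", e.g. bingbot or EasouSpider
BOT_TOKEN = re.compile(r'[A-Za-z0-9_-]*(?:bot|spider)\b', re.IGNORECASE)


def bot_name(user_agent):
    """Return the first bot-like token in the user agent, or '' if none."""
    match = BOT_TOKEN.search(user_agent)
    return match.group(0) if match else ""


def ips_by_bot(log_lines, path):
    """Map bot name -> sorted unique IPs that requested exactly `path`."""
    grouped = defaultdict(set)
    for line in log_lines:
        match = COMBINED.match(line)
        if not match:
            continue  # line is not in combined format
        parts = match.group("request").split()
        if len(parts) >= 2 and parts[1] == path:
            grouped[bot_name(match.group("agent"))].add(match.group("ip"))
    return {bot: sorted(ips) for bot, ips in grouped.items()}
```

Calling ips_by_bot(open("access_log"), "/network/web/tools/search/htdig-setup.html") would group the requesting addresses by crawler, where access_log stands in for wherever the log actually lives. Requests whose user agent contains no such token end up under an empty name and still need a manual look, as with the blank and zh-CN rows in the table.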
The IP addresses 202.46.59.196 and 202.46.63.202 have the following user agent string:
Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9.1.2) Firefox/3.5.2
An nslookup on the IP addresses yields a name of ptr.cnsat.com.cn for both. No website is accessible at that name or at either IP address, so I can't be certain that they correspond to a web crawler.
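The same reverse lookup can be scripted when there are many addresses to check. The sketch below uses only the Python standard library; the function names are my own, and the result of the lookup depends on whatever the DNS returns at the time it is run.

```python
import ipaddress
import socket


def ptr_query_name(ip):
    """Return the in-addr.arpa (or ip6.arpa) name that a reverse lookup queries."""
    return ipaddress.ip_address(ip).reverse_pointer


def reverse_lookup(ip):
    """Return the PTR hostname for `ip`, or None if the resolver has no answer."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        return hostname
    except (socket.herror, socket.gaierror):
        return None


print(ptr_query_name("202.46.59.196"))  # 196.59.46.202.in-addr.arpa
```

Calling reverse_lookup("202.46.59.196") performs the actual DNS query, so it needs network access and may no longer return the name I saw.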
Since I saw so many attempts by web crawlers to access the older version of the page that no longer exists, I created a redirect for the old page to point to the new page, i.e., an .htaccess file in the directory where the old file was located containing the following line:
Redirect 301 /network/web/tools/search/htdig-setup.html /network/web/tools/search/htdig-setup.php
I then added a <Directory> section to the part of the Apache
httpd.conf file pertaining to the website, so that the Redirect directive
in the .htaccess file is honored in that directory (Redirect requires the
FileInfo override), and then restarted the web server with apachectl restart.
<Directory /home/jdoe/public_html/network/web/tools/search>
AllowOverride FileInfo
</Directory>
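To confirm that the redirect is actually being served after the restart, a request for the old URL can be sent without following the redirect. This is a minimal sketch using Python's http.client; the function name redirect_target is hypothetical, and www.example.com in the comment is a placeholder for the site's real hostname.

```python
import http.client


def redirect_target(host, path, port=80, timeout=10):
    """Send a HEAD request without following redirects; return (status, Location)."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


# Example call (placeholder hostname, substitute the real one):
# redirect_target("www.example.com", "/network/web/tools/search/htdig-setup.html")
```

If the .htaccess file is being read, the status should be 301 and the Location value should end with /network/web/tools/search/htdig-setup.php. A 404 would suggest that the override is not in effect for that directory.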
So I'm going to have to remember to create redirects for any files that I move or rename on the site, if they've been on the site for more than a short period of time.
Created: Saturday March 22, 2014