Xenu Link Sleuth

When checking the site's error log today, I noticed a "File does not exist" entry that I thought might point to an incorrect link in one of my files; however, I could not find anything matching it in my files. So I looked in the Apache CustomLog file to see whether there was a referrer and which files on the site had been accessed from the IP address in the error log entry. I found only two entries in today's CustomLog file for that IP address. The user agent string in the entries was "Xenu Link Sleuth/1.3.8". Looking up Xenu's Link Sleuth™, I found that it is a free tool for Microsoft Windows systems that checks websites for broken hyperlinks. I presume that someone may have been checking another site that had an incorrect hyperlink pointing to a page on my site.
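
The CustomLog search can also be scripted. The sketch below assumes the CustomLog uses Apache's standard "combined" LogFormat (client address, identity, user, time, request line, status, size, referrer, user agent). A log written with a different LogFormat would need a different pattern. The names hits_from_ip and Hit are hypothetical.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Assumed format: Apache "combined" LogFormat
#   %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


class Hit(NamedTuple):
    time: str
    request: str
    status: int
    referrer: str
    agent: str


def hits_from_ip(lines: Iterable[str], ip: str) -> Iterator[Hit]:
    """Yield the parsed log entries whose client address equals ip.

    Lines that do not match the combined format are skipped silently.
    """
    for line in lines:
        m = COMBINED.match(line)
        if m and m.group("host") == ip:
            yield Hit(
                time=m.group("time"),
                request=m.group("request"),
                status=int(m.group("status")),
                referrer=m.group("referrer"),
                agent=m.group("agent"),
            )
```

Passing an open log file and an address to hits_from_ip yields one Hit per matching request. The referrer and agent fields then show where a request came from and which tool made it.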

Thinking that the tool might be useful to me in analyzing my website, I downloaded it from the developer's website. The software comes in a 426 KB (437,129 bytes) zip file. Within it is one file, setup.exe. The checksums for the current file are as follows:

MD5:    e6e85aad6a9ef6a5d3e0425c78b18727
SHA1:   7db3282bee6b622d50853f9bd569a6ec6126e113
SHA256: 319108b72709e32c250bb49bd02739f26d37b8a9d5a9bb3bbfa488d7354e197f
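
To compare your own download against the published checksums, a short script can compute all three digests in a single pass over the file. This is a minimal sketch using Python's standard hashlib module. The expected values are the ones listed above, and the function names file_digests and matches_expected are hypothetical.

```python
import hashlib
from typing import Dict

# Published values, copied from the list above.
EXPECTED = {
    "MD5": "e6e85aad6a9ef6a5d3e0425c78b18727",
    "SHA1": "7db3282bee6b622d50853f9bd569a6ec6126e113",
    "SHA256": "319108b72709e32c250bb49bd02739f26d37b8a9d5a9bb3bbfa488d7354e197f",
}


def file_digests(path: str, chunk_size: int = 65536) -> Dict[str, str]:
    """Return MD5, SHA-1 and SHA-256 hex digests of a file, read in chunks."""
    hashes = {"MD5": hashlib.md5(), "SHA1": hashlib.sha1(), "SHA256": hashlib.sha256()}
    with open(path, "rb") as f:
        # Feed each chunk to all three hash objects so the file is read once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in hashes.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashes.items()}


def matches_expected(path: str) -> bool:
    """True only if all three digests equal the published values."""
    return file_digests(path) == EXPECTED
```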

I checked the downloaded zip file with VirusTotal, a free Google-owned service that scans uploaded files for malware using multiple antivirus programs. When I uploaded the file, I saw:

This file was last analysed by VirusTotal on 2014-03-22 03:07:45 UTC, it was first analysed by VirusTotal on 2010-09-04 05:25:29 UTC. Detection ratio: 0/51

I used the existing report, which is available at XENU.ZIP, since the file had been uploaded by someone else less than a week ago. The report shows that none of the 51 antivirus programs used by VirusTotal reported any issues with the software.

The developer's site lists "Microsoft Windows 95/98/ME/NT/2000/XP/Vista/7" as the system requirements; I had no problems running it on Windows 8, either.

To install the software, extract setup.exe from the zip file and run it; you can install the software in whichever directory you choose. The installation requires only 743.0 KB of disk space.

To check a website, choose File and then Check URL from the initial window.

Xenu main window

In the Xenu's starting point window, enter the address of the website in the "What address do you want to check?" field. Leave "Check external links" checked if you want to verify that links from pages on the site to other sites are valid. Over time, links that were once valid may become invalid as webpages, or even entire websites, disappear. If you only want to check internal links on the site, i.e., links from one page on the site to another page on the site, uncheck that checkbox.
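
The internal/external distinction can be illustrated with a short sketch. It uses a simplified rule: a link counts as internal when it points at the same host name as the page it appears on. That rule is an assumption for illustration, not a description of how Xenu itself decides. Under this rule, www.example.com and example.com count as different sites. The function name classify_links is hypothetical.

```python
from typing import List, Tuple
from urllib.parse import urljoin, urlsplit


def classify_links(page_url: str, hrefs: List[str]) -> Tuple[List[str], List[str]]:
    """Split the hrefs found on page_url into (internal, external) absolute URLs."""
    site_host = urlsplit(page_url).hostname
    internal: List[str] = []
    external: List[str] = []
    for href in hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links against the page
        parts = urlsplit(absolute)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript: and similar non-web links
        if parts.hostname == site_host:
            internal.append(absolute)
        else:
            external.append(absolute)
    return internal, external
```

Unchecking "Check external links" roughly corresponds to following only the first list.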

Xenu's starting point

Click OK to start checking the website. If you need to stop the check, click File, then Pause, and select Stop Immediately. To resume the check, click File and select Continue.

When the check finishes, a window appears stating "Link sleuth finished" and asking "Do you want a report?". Click "Yes" to view the report. A "Remote Orphan Check: Ftp Parameters" window then gives you the opportunity to check a site that provides a File Transfer Protocol (FTP) service. If you are not providing an FTP service from the site, click Cancel in that window.

The report for the website is an HTML file that can be viewed in the browser of your choice. It opens in a browser when you choose to view the report, and it can be found later in the temp directory of the account under which the program was run, e.g., C:\Users\useracct\AppData\Local\Temp, where useracct is the name of that account. On earlier versions of Microsoft Windows, such as Windows XP, the file will be in C:\Documents and Settings\useracct\Local Settings\Temp. In other words, the file is in the directory that the TEMP environment variable points to for the account. You can display that directory by issuing the command echo %temp% at a command prompt.
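
The report's file name isn't given here, so the sketch below simply lists the most recently modified .htm and .html files in the temp directory, newest first. It relies on Python's tempfile.gettempdir(), which on Windows normally resolves to the same directory that echo %temp% shows. The extension filter and the function name recent_reports are assumptions for illustration.

```python
import os
import tempfile
from typing import List, Optional


def recent_reports(directory: Optional[str] = None, limit: int = 5) -> List[str]:
    """Return up to `limit` .htm/.html files in `directory`, newest first.

    `directory` defaults to tempfile.gettempdir(); on Windows that normally
    resolves to the folder shown by `echo %temp%`.
    """
    directory = directory or tempfile.gettempdir()
    candidates = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file() and entry.name.lower().endswith((".htm", ".html")):
                candidates.append((entry.stat().st_mtime, entry.path))
    candidates.sort(reverse=True)  # newest modification time first
    return [path for _mtime, path in candidates[:limit]]
```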

The report includes a "Site Map of valid HTML pages with a title" section and a "Correct internal URLs, by MIME type" section. The latter tells you the number of URLs for various file types, such as the following:

MIME type
text/html
text/css
image/jpeg
text/xml
image/png
image/gif
application/pdf
text/plain
audio/mpeg
application/zip
application/x-rpm
application/x-gzip
application/octet-stream
application/vnd.ms-excel
audio/x-wav
application/x-bzip2
text/rtf
application/msword
application/x-sh
application/x-javascript
image/bmp
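
A rough version of that per-type tally can be produced from a list of URLs. There is an important difference, though. A link checker can use the Content-Type that the server actually returns, whereas this sketch only guesses the type from each URL's file extension, using Python's built-in mimetypes table. URLs without a recognizable extension, such as a bare directory path, are counted as "unknown". The function name count_by_mime_type is hypothetical.

```python
from collections import Counter
from mimetypes import MimeTypes
from typing import Iterable
from urllib.parse import urlsplit

# A MimeTypes instance built with no file names uses only Python's built-in
# extension table, so results don't depend on the machine's mime.types files.
_GUESSER = MimeTypes()


def count_by_mime_type(urls: Iterable[str]) -> Counter:
    """Tally URLs by the MIME type guessed from the extension of their path."""
    counts: Counter = Counter()
    for url in urls:
        path = urlsplit(url).path  # drop query strings and fragments
        mime, _encoding = _GUESSER.guess_type(path)
        counts[mime or "unknown"] += 1
    return counts
```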

If the report shows many URLs with timeouts, you can lower the number of parallel threads to reduce the number of simultaneous requests to the website. The default value is 30. To change it, click the More options button in the Xenu's starting point window. Then, in the Options window, drag the slider pointer upwards to reduce the number of parallel threads. Doing so may substantially increase the time needed to check a site, though: a check that took minutes may take hours.
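
The trade-off between thread count and total run time can be seen in a small sketch of a bounded-concurrency checker. A thread pool caps how many requests are in flight at once, and each request has its own timeout. The names check_all and http_status, the use of HEAD requests, and the value -1 for failures are illustrative choices, not Xenu's implementation. Some servers also answer HEAD requests differently from GET.

```python
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url, or -1 on timeout/connection failure."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:  # 4xx/5xx responses still carry a code
        return err.code
    except OSError:  # URLError, socket timeouts and refused connections
        return -1


def check_all(urls: Iterable[str],
              check: Callable[[str], int] = http_status,
              max_threads: int = 30) -> Dict[str, int]:
    """Check every URL, with at most max_threads checks running at once."""
    url_list = list(urls)  # materialize so the URLs can be zipped with results
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        statuses = list(pool.map(check, url_list))
    return dict(zip(url_list, statuses))
```

For example, calling check_all(urls, max_threads=5) keeps at most five requests open against the server at any moment. The cost is a longer total run time.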

Xenu options

The developer's FAQ has this to say about timeouts:

Why timeouts?

This is difficult to answer. The cause might be network overload; it might help to set a lower amount of threads, or to fine-tune the DoS detection of your firewall. Check your firewall logs to see whether it detected a "SYN flood" DoS attack by you. SYN is the first data packet that is sent to a host when starting a connection. Theoretically, Xenu might send up to 100 SYN packets that are not immediately answered, so a firewall (that counts "unanswered" SYN packets) might think something "evil" is going on. My firewall box once claimed to have detected a SYN flood when I opened many newspaper articles in background browser windows.

The FAQ's "Can I configure the timeout?" entry also offers an alternative:

Some users have complained that if one URL hits a timeout or a failed connection, all URLs from that host also do. Starting with version 1.2h, this behaviour can be disabled by unchecking "fail all URLs with same failed host" in the advanced options dialog. (The default behaviour is "checked")
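
The behaviour that this option controls can be sketched as a set of failed hosts. Once a request to a host times out or fails to connect, later URLs on that host are marked as failed without being requested. For simplicity, the sketch checks URLs one at a time and uses -1 to mean a timeout or connection failure; both are assumptions. The function name check_sequentially is hypothetical.

```python
from typing import Callable, Dict, List, Set
from urllib.parse import urlsplit


def check_sequentially(urls: List[str], check: Callable[[str], int],
                       fail_same_host: bool = True) -> Dict[str, int]:
    """Check URLs in order; a result of -1 means timeout/connection failure.

    With fail_same_host=True, once any URL on a host returns -1, later URLs
    on that host are marked -1 without being requested. With False, every
    URL is tried.
    """
    failed_hosts: Set[str] = set()
    results: Dict[str, int] = {}
    for url in urls:
        host = urlsplit(url).hostname or ""
        if fail_same_host and host in failed_hosts:
            results[url] = -1  # not requested: this host already failed
            continue
        status = check(url)
        if status == -1:
            failed_hosts.add(host)
        results[url] = status
    return results
```

With the option unchecked (fail_same_host=False), a single slow page no longer condemns the rest of its host. The price is that every URL on an unreachable host has to time out individually.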

If pages on the site are password protected with HTTP basic authentication and you haven't supplied the credentials, the standard login window will open. You can enter the user name and password there if you wish, or simply click "Cancel" if you don't know them or don't want to provide them. The FAQ provides further information on dealing with password-protected pages.
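
For reference, HTTP basic authentication sends the user name and password in an Authorization header, as the base64 encoding of "user:password" (RFC 7617). The sketch below builds that header value. It illustrates the standard mechanism rather than anything specific to Xenu, and encoding the credentials as UTF-8 is an assumption. The function name basic_auth_header is hypothetical. Because base64 is an encoding, not encryption, basic authentication should only be used over HTTPS.

```python
import base64


def basic_auth_header(user: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header (RFC 7617).

    The credentials are only base64-encoded, not encrypted.
    """
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token
```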

There are also guides to Xenu's Link Sleuth written by others:

  1. Checking Links Using Xenu™
    UCSF School of Medicine
    Updated: September 27, 2007
  2. Checking Links with Xenu
    Integral World: Exploring theories of everything

Xenu's Link Sleuth - More Than Just A Broken Links Finder also provides useful information on using the tool to find problems with a site and glean information to improve the site's design.

 

Created: Friday March 28, 2014