Wed, May 16, 2007 9:57 pm

htDig Invalid Comptype

I ran ht://Dig to index the site today with the command /usr/bin/rundig -c /etc/htdig_support.conf >>/var/log/htdig 2>&1. Indexing took a considerable amount of time, and once it completed, none of my htdig searches of the site returned any results. When I checked the output file for the rundig command, /var/log/htdig, I saw the errors below:

# cat /var/log/htdig
FATAL ERROR:Compressor::get_vals invalid comptype
FATAL ERROR at file:WordBitCompress.cc line:827 !!!
/usr/bin/rundig: line 36: 23767 Segmentation fault      $BINDIR/htdig -i $opts $stats $alt
/usr/bin/rundig: line 81: 24766 Segmentation fault      /usr/bin/htfuzzy $opts metaphone
/usr/bin/rundig: line 82: 24767 Segmentation fault      /usr/bin/htfuzzy $opts soundex
A Google search turned up references to others encountering the same error message, but nothing that I felt would give me an appropriate fix for my system. Some of the references seemed to indicate that the problem occurs when htdig is indexing an enormous number of files. There are only a few hundred files for it to index on my site, though, so I didn't think the number of files should be the cause of the problem. However, htdig had been indexing each page in my Blosxom blog more than once, because of my use of the Find plugin for Blosxom.

I included a search feature on each page of the blog that uses Fletcher Penney's find plugin to allow a search of the blog for information. Underneath the search box there is an "Advanced Search" link that provides more advanced search capabilities. Clicking it redisplays the same blog page, but with the advanced search options visible. As a result, ht://Dig returned the same page multiple times whenever I used it to search the entire site (the Find plugin searches only the blog, while I have htdig search the entire site).

I thought I might reduce the extraneous results for htdig queries, reduce the time to index the site when running rundig, and possibly eliminate the "FATAL ERROR:Compressor::get_vals invalid comptype" error message by having htdig exclude the "Advanced Search" links when indexing the site. Since that link always includes "advanced_search=1" in its URL, I edited the htdig configuration file for the website, which is /etc/htdig_support.conf in this case, and added "advanced_search=1" to the exclude_urls list. So I now have the following line in that conf file (the "/cgi-bin/ .cgi" was there by default):

exclude_urls:           /cgi-bin/ .cgi advanced_search=1
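
As a rough illustration of why adding advanced_search=1 filters out those links: the ht://Dig documentation describes exclude_urls as a space-separated list of patterns, and a URL that contains any of them is rejected. The sketch below only mimics that substring test in shell. It is not ht://Dig's code, and the is_excluded helper and the example.com URLs are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper, not part of ht://Dig: mimics a substring test
# against the patterns from the exclude_urls line above.
exclude_patterns="/cgi-bin/ .cgi advanced_search=1"

# is_excluded URL: succeeds if URL contains any of the patterns.
is_excluded() {
    url=$1
    for pattern in $exclude_patterns; do
        case $url in
            *"$pattern"*) return 0 ;;   # pattern found somewhere in the URL
        esac
    done
    return 1
}

# Made-up URLs, for illustration only.
for url in \
    "http://www.example.com/blog/index.html" \
    "http://www.example.com/blog/index.html?advanced_search=1" \
    "http://www.example.com/cgi-bin/htsearch"
do
    if is_excluded "$url"; then
        echo "skip  $url"
    else
        echo "index $url"
    fi
done
```

Of the three sample URLs, only the first is reported as one to index; the "Advanced Search" variant of the same page is skipped.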

I also added some file extensions to the list of filetypes htdig should exclude from its indexing process. I added ".mp3 .img .iso .dat .dll .scr" to the bad_extensions section, so I now have the following in that list:


bad_extensions:         .wav .gz .z .sit .au .zip .tar .hqx .exe .com .gif \
        .jpg .jpeg .aiff .class .map .ram .tgz .bin .rpm .mpg .mov .avi .css \
        .cab .png .rar .mp3 .img .iso .dat .dll .scr

There is no need for htdig to index binary files. If they aren't excluded, they only lengthen the time htdig takes to index the site and greatly increase the chances that htdig will fail while indexing it. If you use htdig and store other types of music or movie files on a site, you should add their extensions to the bad_extensions list as well.
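
To decide which extensions belong in bad_extensions, it can help to see which file types actually exist under the web server's document root. The function below is a small sketch using standard find, sed, tr, sort, and uniq. The name list_extensions is made up here, and it assumes file names don't contain newlines.

```shell
#!/bin/sh
# list_extensions DIR: count the files under DIR by lowercased extension,
# most common first. Hypothetical helper, not part of ht://Dig.
list_extensions() {
    find "$1" -type f -name '*.*' |
        sed -e 's/.*\.//' |            # keep only the text after the last dot
        tr '[:upper:]' '[:lower:]' |   # treat .MP3 and .mp3 as the same type
        sort | uniq -c | sort -rn
}
```

Running it as list_extensions /var/www/html (the path is an assumption; substitute your own document root) prints one line per extension with a count, which makes binary types missing from bad_extensions easy to spot.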

When I reran rundig with the command /usr/bin/rundig -c /etc/htdig_support.conf >/var/log/htdig 2>&1, it did not fail this time, and when I performed htdig searches of the site, I no longer got duplicate results caused by the Blosxom Find plugin's "Advanced Search" links.
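
Since rundig's output goes to a log file rather than the screen, a failed run is easy to miss until searches come back empty. The following is a minimal sketch of a check that could be run after rundig. The function name check_htdig_log is made up, and the two strings it looks for are simply the ones that appeared in the log above; other failures may produce different messages.

```shell
#!/bin/sh
# check_htdig_log FILE: print any lines that look like the failures seen
# above and return non-zero if at least one is found.
# Hypothetical helper, not part of ht://Dig.
check_htdig_log() {
    if grep -E 'FATAL ERROR|Segmentation fault' "$1"; then
        return 1
    fi
    return 0
}
```

For example, check_htdig_log /var/log/htdig || echo "rundig reported errors" >&2 could follow the rundig command. Note that this only reflects the latest run when the log is overwritten with > as in the command above; with >> the log is appended to, so errors from earlier runs would still match.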

References:

  1. RE: [htdig] Segfault indexing a site with 3.2.0b2
    May 23, 2000
    ht://Dig 3.x list archive

  2. Error in zlib Compressor for WordDB
    July 30, 2002
    web.htdig.devel

  3. FindPlugin
    Author: Fletcher T. Penney
