With typical webmaster geekiness, I regularly explore my web server logs to try to understand how people use this site, and to detect anything bizarre such as referrer spam and other Internet weirdness.
For the referrer spam, I'm reluctant to post the URLs as this would favor their position in search engines, but here is the tutorial from Mark Pilgrim for Apache, which is the method I use to block certain robots from doing their dirty work here. The way I manage it is very simple: I have defined a template in MT that points to a .htaccess file at the site's root, so I can update it easily with a browser.
Earlier in the year, I had banned AvantBrowser because it stubbornly pretended that its home page referred people to my site. This was an error in an old version of this browser, which has been corrected as far as I can see. Very recently, I became suspicious when regular hits started to appear, apparently referred from... "http://padawan.info". Because of an unresolved bug in the DNS service I use for this domain, there is no such thing as http://padawan.info: the DNS rejects all my attempts to define a record for this short version. So I started to suspect a dirty robot, only to find this in my logs (referrer highlighted):
<IP removed> - - [23/May/2004:14:19:19 +0200] "GET /culture/fahrenheit_911_gets_cannes_palme_dor.html HTTP/1.1" 200 20528 "http://padawan.info" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1; aggregator:NewsMonster; http://www.newsmonster.org/) Gecko/20021130"
I explored further and concluded that not all NewsMonster installations out there do such a stupid thing as sending a false referrer, since the culprit always seems to come from the same IP, which belongs to the following range:
Syntrio SYNTRIO-66-180-224-0 (NET-66-180-224-0-1)
126.96.36.199 - 188.8.131.52
The innocent culprit can contact me if s/he recognizes this range or the name "Metasearch Consulting Network".
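For anyone who wants to run the same kind of check on their own logs, here is a minimal sketch in Python. It assumes the logs are in Apache's "combined" format, like the line quoted above. The function name `suspicious_hits`, the placeholder address 192.0.2.10 (a documentation-only IP standing in for the one I removed) and the second sample line are all made up for illustration. This is not the tool I actually used.

```python
import re
from collections import Counter

# Apache "combined" log format (assumed):
# host ident user [time] "request" status bytes "referrer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def suspicious_hits(lines, referrer):
    """Count hits per (host, user agent) whose referrer equals `referrer`."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        # Lines that do not fit the assumed format are silently skipped.
        if match and match.group("referrer") == referrer:
            counts[(match.group("host"), match.group("agent"))] += 1
    return counts


# Sample data: the first line mirrors the log entry quoted above, with a
# placeholder IP; the second line is entirely hypothetical.
sample = [
    '192.0.2.10 - - [23/May/2004:14:19:19 +0200] '
    '"GET /culture/fahrenheit_911_gets_cannes_palme_dor.html HTTP/1.1" '
    '200 20528 "http://padawan.info" '
    '"Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1; '
    'aggregator:NewsMonster; http://www.newsmonster.org/) Gecko/20021130"',
    '192.0.2.20 - - [23/May/2004:14:20:01 +0200] '
    '"GET / HTTP/1.1" 200 1234 "-" "ExampleBrowser/1.0"',
]

for (host, agent), hits in suspicious_hits(sample, "http://padawan.info").items():
    print(host, hits, agent)
```

Grouping by both host and user agent is what makes it possible to tell one misbehaving installation apart from a bug shared by every copy of the same software.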
Meanwhile, I have sent an email to the developer to raise this simple issue: sending false referrer information is a sure way to be lumped in with dirty referrer spammers, and that surely is an embarrassment you do not want for your legitimate product, is it?
P.S. That wasn't long either. Here is Kevin Burton's reply:
That was fixed LONG ago... it's not my fault that people run old code ;)
Innocent culprit at 66.180.236.something, please update your copy of NewsMonster!
But the moral of the story remains intact: it should not have been in the code in the first place!