Blog spam and search engines, continued

Yahoo! Search's most prominent (and unofficial) blogger, Jeremy Zawodny, writes about comment spam. And he's not shy about the key role that search engines have in fighting against this plague:

Then a partial solution is fairly clear. I've heard and seen others discuss it over the past few months. The search engines need to be smarter about reading and indexing content.

When folks like Tim build software that classifies pages, the software needs to be able to recognize the difference between links produced by the blog owner(s) and those contributed by readers and spambots.

Once you can identify the difference between those two types of links, you simply stop using the second type of link when calculating rank. Sure, you can still count them for the purpose of providing link counts--just don't factor them into the ranking.

How's that for removing the incentive?

There are already several proposals on how to do that. My favorite is a simple pair of comments wrapped around any content that you don't want the search engines to index. It is the simplest I've seen so far, and it gives me control over what gets indexed within a page, not just links. Others involve a fairly comprehensive qualification of link relationships, which paves the way to lots of very interesting applications but has about zero chance of being effective until it's built into web editing tools with a very simple GUI, so that Joe Average starts using it. There might already be more solutions around than blog spammers, and I'm sure even Google and Yahoo! have their own ideas. The most important thing is that they agree on the same one, call it the industry standard, and tell world + dog to happily use it on their web sites.
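To make the wrapper idea concrete, here is a minimal sketch of how an indexer might honor such a pair of comments. The marker names (`<!-- noindex -->` and `<!-- /noindex -->`), the function names, and the sample page are all assumptions made up for illustration; none of the proposals mentioned above is being quoted here, and no search engine is known to work this way.

```python
import re
from html.parser import HTMLParser

# Hypothetical marker pair; no specific syntax is assumed to be standard.
NOINDEX_REGION = re.compile(
    r"<!--\s*noindex\s*-->.*?<!--\s*/noindex\s*-->",
    re.DOTALL | re.IGNORECASE,
)


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    """Return every link on the page, wrapped or not (usable for link counts)."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


def rankable_links(html):
    """Return only the links outside the wrapped regions (usable for ranking)."""
    return extract_links(NOINDEX_REGION.sub("", html))


# Made-up page: one link from the blog owner, one from a commenter.
PAGE = """
<p>Post body with <a href="http://example.org/author-link">a link</a>.</p>
<!-- noindex -->
<div class="comments">
  <p>Nice post! <a href="http://spam.example/pills">cheap pills</a></p>
</div>
<!-- /noindex -->
"""

all_links = extract_links(PAGE)
ranked = rankable_links(PAGE)
print(len(all_links), "links counted,", len(ranked), "used for ranking")
```

The split between `extract_links` and `rankable_links` mirrors the suggestion quoted earlier: the wrapped links can still be counted, they just never reach the ranking step.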

Brad Choate objects that the fix is already out there: just use a redirection service for links. However, this has three main drawbacks: 1) it kills the referrer information, 2) it wastes resources on handling a simple link, and 3) it prevents link-mining services such as Technorati from mapping connections between sites. Ironically, it should be pointed out that Movable Type does provide a redirection of sorts, except that it doesn't work in comment bodies, which makes it fairly useless. But the killer reason why this is not the horse I'd bet on is that nothing prevents search engine bots from eventually being smart enough to follow the redirections until they reach the destination, and to handle the link exactly as if those redirections did not exist. The redirection that Brad mentions is actually a hack based on the current inability of those engines to follow some redirection mechanisms.
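How little it would take for a bot to see through such a hack can be sketched in a few lines. This is only an illustration under stated assumptions: the redirector host (`redirect.example`), its `url` query parameter, and the function name are hypothetical, and the sketch only covers redirectors that carry the destination in the query string. A redirection done through an HTTP 3xx response would need a network fetch instead.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical redirector hosts, mapped to the query parameter that
# carries the real destination. A real crawler would maintain its own list.
REDIRECTORS = {"redirect.example": "url"}


def resolve_redirect(url, max_hops=5):
    """Unwrap known redirector links until a direct destination is reached.

    max_hops bounds the loop so that a redirector pointing at itself
    cannot keep the crawler busy forever.
    """
    for _ in range(max_hops):
        parts = urlparse(url)
        param = REDIRECTORS.get(parts.netloc.lower())
        if param is None:
            return url  # not a known redirector: this is the destination
        targets = parse_qs(parts.query).get(param)
        if not targets:
            return url  # redirector link without a target: leave it alone
        url = targets[0]  # parse_qs has already percent-decoded the value
    return url


wrapped = "http://redirect.example/go?url=http%3A%2F%2Fspam.example%2Fpills"
print(resolve_redirect(wrapped))
```

Once the destination is recovered, the link can be scored as if it had been a direct hyperlink, which is exactly why the redirection trick only works as long as the engines do not bother.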

6 Comments

Sure, search engines could start recognizing links through Google as being actual links to sites, but hopefully they will realize the intent and NOT consider them in the first place. Or at least heavily discount them since they are not on the same level as a direct hyperlink.

I am actually against the indexing of the content of comments on my site as well. I specifically exclude Google from indexing it to begin with. If people want to find my site, they will do so based on the content I write, not that of others. Content that I may not agree with. Content that may be wrong. Content that may even be fraudulent. Take a look at one of my individual archive pages in the Google cache and you'll see what I mean.

Hi Brad,

now you're talking ;-). Looking at both the Google cache and the source code of one of your posts, my guess is that you do not serve the same content to the Google bots. Same for Yahoo!?

Clever.

Nice way to control what search engines see or not :).

If you're interested, you can read more about how I did that here.

Well, I see your preview isn't a true preview! The links disappeared upon posting. Here you go:

http://bradchoate.com/weblog/2004/07/02/restricting-google-on-my-terms

I'm not sure what's wrong with the preview, it's the stock install preview template! My configuration forbids HTML in the comments and should auto-link URLs. I've edited your comment to remove the HTML and the link appears fine.

I am losing my head over all this spamming.

[Yeah, and you conveniently leave a commercial URL for yet another stupid cybersquatting/dummy listing site while commenting, how nice. Edited and banned.]
