Search engine and RSS aggregator patterns

Looking at my server logs (using Summary), I found the following search engine patterns interesting:

[Image: search engine patterns]

I'm amazed at how differently they crawl this site. Yahoo! is by far the most resource-intensive (the least efficient?), with the top score in visits, hits and bandwidth consumed. Recalculated per visit over the past 12 months (from 10/01/05 to 09/30/06), this gives:

  • Yahoo!: 1.28 hits/visit, 8.2 KB/visit, 1446 visits/day, 1851 hits/day, 11.9 MB/day
  • Google: 10.2 hits/visit, 100 KB/visit, 99.3 visits/day, 1009 hits/day, 9.97 MB/day
  • MSN Search: 7.93 hits/visit, 155 KB/visit, 56.4 visits/day, 447.4 hits/day, 8.76 MB/day
  • Ask Jeeves: 26.9 hits/visit, 209 KB/visit, 10.5 visits/day, 280.9 hits/day, 2.19 MB/day

Quite different behaviors! The way Summary decides where one visit ends and the next begins may work against Yahoo!, so I think hits and bandwidth are better metrics for comparison.
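The per-visit figures above are simply the daily totals divided by daily visits. Here is a small sketch of that recalculation, fed with the daily numbers from the list. The `DAILY` table and `per_visit` helper are illustrative names of my own, and the sketch assumes 1 MB = 1000 KB, which is the convention that reproduces the listed KB/visit values.

```python
# Daily averages from the Summary report, 10/01/05 to 09/30/06.
# Illustrative table: engine -> (visits/day, hits/day, MB/day)
DAILY = {
    "Yahoo!": (1446, 1851, 11.9),
    "Google": (99.3, 1009, 9.97),
    "MSN Search": (56.4, 447.4, 8.76),
    "Ask Jeeves": (10.5, 280.9, 2.19),
}


def per_visit(visits: float, hits: float, mb: float) -> tuple[float, float]:
    """Illustrative helper: return (hits/visit, KB/visit).

    Assumes 1 MB = 1000 KB.
    """
    return hits / visits, mb * 1000 / visits


for engine, (visits, hits, mb) in DAILY.items():
    hpv, kbpv = per_visit(visits, hits, mb)
    print(f"{engine}: {hpv:.2f} hits/visit, {kbpv:.1f} KB/visit")
```

Since the daily inputs are themselves rounded, the results land close to the listed ratios rather than exactly on them. For Google, the division works out to about 10.2 hits/visit.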

During the same period, I've seen the following patterns from RSS aggregators:

  • Bloglines: 44,098 visits / 84,889 hits / 14.5 MB
  • NewsGator: 45,402 visits / 84,785 hits / 51 MB
  • Yahoo! RSS Syndication System: 7,003 visits / 7,837 hits / 95.4 MB

So Yahoo! RSS consumes nearly twice as much bandwidth as NewsGator with about 11 times fewer hits! Weird, and here again Yahoo! takes the prize for the biggest payload.
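Dividing bandwidth by hits makes the payload difference concrete. This is a rough sketch using the yearly totals from the list above. `TOTALS` and `kb_per_hit` are illustrative names, and 1 MB = 1000 KB is assumed.

```python
# Yearly totals per aggregator, 10/01/05 to 09/30/06.
# Illustrative table: name -> (visits, hits, MB)
TOTALS = {
    "Bloglines": (44_098, 84_889, 14.5),
    "NewsGator": (45_402, 84_785, 51.0),
    "Yahoo! RSS Syndication System": (7_003, 7_837, 95.4),
}


def kb_per_hit(hits: int, mb: float) -> float:
    """Illustrative helper: average payload per request in KB (1 MB = 1000 KB)."""
    return mb * 1000 / hits


for name, (_visits, hits, mb) in TOTALS.items():
    print(f"{name}: {kb_per_hit(hits, mb):.2f} KB/hit")

yahoo = TOTALS["Yahoo! RSS Syndication System"]
newsgator = TOTALS["NewsGator"]
print(f"Yahoo! RSS / NewsGator bandwidth: {yahoo[2] / newsgator[2]:.2f}x")
print(f"NewsGator / Yahoo! RSS hits: {newsgator[1] / yahoo[1]:.1f}x")
```

With these totals, Yahoo! RSS averages about 12 KB per hit against about 0.6 KB for NewsGator. That is roughly 20 times more data per request.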

1 Comment

Given your patterns, neither Yahoo! nor Google is efficient. It seems Yahoo! comes often but only requests the HTTP headers (it doesn't download the full file). But why does it come that often… unless you screwed up the HTTP cache headers ;). If you haven't screwed them up on your server, Yahoo! should respect them and not come back until they have expired.

Google doesn't seem to make HEAD requests. It comes less often but downloads the full file.
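The comment hinges on how HTTP cache headers let a client avoid downloading the body. Below is a minimal server-side sketch that assumes the server sends `Last-Modified` and `Cache-Control: max-age`. The `respond` function and its one-hour default are hypothetical and say nothing about how this site is actually configured. A HEAD request gets headers only, and a request whose `If-Modified-Since` is not older than the page's `Last-Modified` gets a bodyless `304 Not Modified`.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond(method: str, request_headers: dict[str, str],
            last_modified: datetime, body: bytes,
            max_age: int = 3600) -> tuple[int, dict[str, str], bytes]:
    """Hypothetical handler returning (status, headers, body).

    last_modified must be a timezone-aware UTC datetime.
    - Cache-Control: max-age tells clients how long their copy stays fresh.
    - If-Modified-Since not older than Last-Modified -> 304, no body.
    - HEAD gets the same headers as GET, but never a body.
    """
    last_modified = last_modified.replace(microsecond=0)  # HTTP dates use whole seconds
    headers = {
        "Last-Modified": format_datetime(last_modified, usegmt=True),
        "Cache-Control": f"max-age={max_age}",
    }
    ims = request_headers.get("If-Modified-Since")
    if ims:
        try:
            since = parsedate_to_datetime(ims)
        except (TypeError, ValueError):
            since = None  # ignore a malformed date and serve normally
        if since is not None and since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)  # "-0000" parses as naive
        if since is not None and last_modified <= since:
            return 304, headers, b""
    headers["Content-Length"] = str(len(body))
    return 200, headers, (b"" if method == "HEAD" else body)
```

In this sketch, a client that honors `max-age` waits out the hour before asking again. When it does ask, it can send back the `Last-Modified` value as `If-Modified-Since` and receive only headers if nothing changed.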

This is subject to real testing and analysis; these are just my uninformed interpretations of your data.
Technorati still doesn't respect robots.txt; I'm sick of it.
