Search engine and RSS aggregator patterns

Looking at my server logs (using Summary), I found the following search engine patterns interesting:

[Chart: search engine patterns]

I'm amazed at how differently they crawl this site. Yahoo! is by far the most resource-intensive (the least efficient?), with the top score in visits, hits, and bandwidth consumed. Recrunched per visit over the past 12 months (from 10/01/05 to 09/30/06), the numbers are:

  • Yahoo!: 1.28 hits/visit, 8.2 KB/visit, 1446 visits/day, 1851 hits/day, 11.9 MB/day
  • Google: 10.2 hits/visit, 100 KB/visit, 99.3 visits/day, 1009 hits/day, 9.97 MB/day
  • MSN Search: 7.93 hits/visit, 155 KB/visit, 56.4 visits/day, 447.4 hits/day, 8.76 MB/day
  • Ask Jeeves: 26.9 hits/visit, 209 KB/visit, 10.5 visits/day, 280.9 hits/day, 2.19 MB/day
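
The per-visit figures are simply the daily averages divided by the number of visits per day. Here is a minimal Python sketch of that recrunching, using the daily numbers from the list above. The dictionary layout and the `per_visit` helper are only for illustration (they are not part of Summary), and I'm assuming 1 MB = 1000 KB, which seems to match the KB/visit figures best.

```python
# Daily averages per crawler: (visits/day, hits/day, MB/day), from the list above.
DAILY = {
    "Yahoo!": (1446, 1851, 11.9),
    "Google": (99.3, 1009, 9.97),
    "MSN Search": (56.4, 447.4, 8.76),
    "Ask Jeeves": (10.5, 280.9, 2.19),
}

KB_PER_MB = 1000  # assumption: decimal units, not 1024


def per_visit(visits, hits, megabytes):
    """Return (hits per visit, KB per visit) computed from daily averages."""
    return hits / visits, megabytes * KB_PER_MB / visits


for name, (visits, hits, megabytes) in DAILY.items():
    hits_pv, kb_pv = per_visit(visits, hits, megabytes)
    print(f"{name}: {hits_pv:.2f} hits/visit, {kb_pv:.1f} KB/visit")
```

Since the visits/day values are themselves rounded, small differences in the last digit are to be expected.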

Quite different behaviors! The way Summary distinguishes two visits may skew the per-visit figures against Yahoo!, so I think hits and bandwidth are better metrics for comparison.

During the same period, I've seen the following patterns from RSS aggregators:

  • Bloglines: 44,098 visits / 84,889 hits / 14.5MB
  • NewsGator: 45,402 visits / 84,785 hits / 51MB
  • Yahoo! RSS Syndication System: 7,003 visits / 7,837 hits / 95.4MB

So Yahoo! RSS consumes almost twice as much bandwidth as NewsGator with 11 times fewer hits! Weird, and here again they earn the biggest payload.
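
To make the comparison concrete, here is a rough Python sketch that works out the two ratios and the average payload per hit from the totals above. The `kb_per_hit` helper and the variable names are only for illustration, and 1 MB = 1000 KB is again an assumption.

```python
# Totals over the 12 months: (visits, hits, MB), from the list above.
AGGREGATORS = {
    "Bloglines": (44_098, 84_889, 14.5),
    "NewsGator": (45_402, 84_785, 51.0),
    "Yahoo! RSS Syndication System": (7_003, 7_837, 95.4),
}


def kb_per_hit(hits, megabytes, kb_per_mb=1000):
    """Average payload per hit in KB (kb_per_mb=1000 is an assumption)."""
    return megabytes * kb_per_mb / hits


_, yahoo_hits, yahoo_mb = AGGREGATORS["Yahoo! RSS Syndication System"]
_, ng_hits, ng_mb = AGGREGATORS["NewsGator"]

bandwidth_ratio = yahoo_mb / ng_mb  # Yahoo! RSS bandwidth relative to NewsGator
hit_ratio = ng_hits / yahoo_hits    # NewsGator hits relative to Yahoo! RSS

print(f"bandwidth ratio: {bandwidth_ratio:.2f}, hit ratio: {hit_ratio:.1f}")
for name, (_, hits, megabytes) in AGGREGATORS.items():
    print(f"{name}: {kb_per_hit(hits, megabytes):.2f} KB/hit")
```

A very small average payload per hit would be consistent with many responses carrying no body (304 Not Modified, for instance), but the totals alone can't confirm that.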

1 Comment

karl said:

Neither Yahoo! nor Google looks efficient, given your patterns. It seems Yahoo! comes often but only requests the HTTP headers (it does not download the full file), though why does it come that often… unless you messed up the HTTP cache headers ;). If you have not messed them up on your server, then Yahoo! should respect them and not come back until they have expired.

Google doesn't seem to do HEAD requests. It comes less often but downloads the full file.

Take this with caution, pending real testing and analysis. This is just my uninformed interpretation of your data.
Technorati still doesn't respect robots.txt, I'm sick of it.

About this Entry

This page contains a single entry by François Nonnenmacher published on October 2, 2006 4:52 PM.
