May 2005 Archives
In 1995, Business Week ran an excellent cover story on intranets, just around the time the buzzword was emerging into general parlance. Several CIOs were quoted as saying they had so-and-so many thousand Web pages behind their firewalls. They were crowing about it. But my take was that this content didn't get created top-down by the organization. Instead, these pages sprang up overnight like a crop of magic mushrooms on a rich mother lode of corporate horseshit.
Christopher Locke, Internet Apocalypso, The Cluetrain Manifesto
Replace "web pages" with "blogs", and that's exactly the feeling I have right now while planting blogs around the organization. Except that this time, it will be the CEOs who crow about "their" blogs; CIOs are still out of the picture (and CKOs have disappeared, as far as I can tell).
That's what I love about the internet. After ten years in it, it's still changing and refreshing.
Investigating AdSense for Feeds, Tim Bray has found a rather annoying piece of legalese in their Terms and Conditions:
For a real dose of Fear, Uncertainty, and Doubt, check out the "Terms and Conditions" language. Uncharacteristically for Google, it's a bad piece of legal drafting; you have to agree that "You will ensure that each feed user complies with, and each feed user's display of AFF Ads is in compliance with, all of the terms and conditions of the amended Agreement applicable to the Site in the same manner that such terms and conditions apply to You and the Site."
Huh? What's a "feed user"? I'm pretty sure it doesn't mean a person-or-program-reading-the-feed, but I'm also pretty sure that I don't understand what it is they mean. And until I do, I'm damn well not going to sign up for "You agree you will be responsible and liable for any and all use of the AFF Ads by any feed user and will indemnify Google for any lawsuit or proceeding (a) relating to or arising from any feed user's use of AFF; (b) relating to or arising from Your failure to ensure any feed user's compliance with the terms of the amended Agreement; and/or (c) brought by a feed user against Google that arises from or is related to Your provision of AFF Ads to such feed user."
If you think this is a tempest in a blog pot, be aware that Google has already tried to enforce in front of a court the idea that only their AdSense customers are legally responsible for the use they make of the technology. In France, at least, they have failed and the judges have consistently condemned Google for infringement on trade marks, arguing that although Google denies all responsibility, it has an active role in suggesting names to their AdSense customers to refine the pertinence of their ads. I can see legal cases coming if some big brands discover that Google serves ads for their competitors on feeds that mention their names.
P.S. Jason Shellen has responded to Tim and promised an update.
I've been measuring how far corporate blogging has moved forward in the past six months, and it's amazing. Back in October 2004, David Sifry had spotted about 5,000 corporate bloggers in Technorati. Today, with the announcement that 330,000 IBMers (yes, 330,000 and not 130,000 as incorrectly reported here) are encouraged to blog, I reckon that about half a million employees are encouraged to blog in the IT industry alone! If just one out of ten starts a blog, David's number is multiplied by ten.
Besides the usual suspects (Sun, IBM, Microsoft, Macromedia), do you have examples of medium to large companies which encourage their employees to blog (especially outside the IT industry)?
The baby is out: it's my pleasure to present the new www.capgemini.com. "It looks nice and it validates, what else could anyone ask?" someone told me today ("Holidays?", I replied).
I cannot tell you how happy I am. If you'll excuse the use of a few superlatives, maybe due to some champagne abuse, this is the biggest web site redesign I've orchestrated and the result is, IMHO, the best site ever in Capgemini's 10-year online history.
It's been a long story and I will tell you some of it once I get a chance to breathe a little bit again.
Meanwhile, I will let these fine people who have helped us create this site give you some insight on this fascinating journey: Messieurs Jeffrey Zeldman, Douglas Bowman and Brian Alvey. [P.S. and with his authorization, a special thanks to Adam Greenfield for his excellent job on information architecture.]
Thank you for your interest in Google Web Accelerator. We have currently reached our maximum capacity of users and are actively working to increase the number of users we can support.
Yeah, right. They cannot admit they have screwed up with this one, can they?
And completely off topic, but it made me laugh, a google search for gwa currently returns this:

Meanwhile, there is a lot of hot air at Joe's, with an argument between the "righteous superior designers" who want full control over the UI and the "standards ayatollahs" who want any unsafe action to look like a fire alarm. As I commented there, it's a caricature but that's how the debate looks to me at the moment. Ignoring each other's points isn't going to help.
I think this is my biggest teaser ever. See you there on Tuesday! :-)
I don't know what Drunkenbatman is drinking, but it's definitely good booze. Deconstructing Maui X-Stream is a truly amazing investigation job. You don't see that every day on a blog, but it's not like you see that often in the IT mainstream media either!
I wish the guy the best in getting out of legal trouble, since he lives in this lawyer-heaven where you can sue your neighbors because they have a greener lawn.
If the Google Web Accelerator breaks your web application, here are a few ways to protect it from this little sucker:
From the GWA Webmaster FAQ:
Can I specify which links Google Web Accelerator will prefetch on my pages?
Yes, you can. For each link you'd like us to prefetch, simply add the following snippet of code somewhere in your page's HTML source code:
<link rel="prefetch" href="http://url/to/get/">
The href value should be the actual URL you want prefetched. Google will prefetch this page, and when your users click on this link, that page will load more quickly.
You can learn more about the <link> tag on the Mozilla website.
Also worth knowing: the GWA will not prefetch secure pages, so any URL under https is safe.
If you want to block the GWA at the Apache level, see this tip which can be summarized as putting this in a .htaccess file or your Apache configuration:
If you want to redirect GWA users to an explanation page (here gwa-forbidden.html) use:
RewriteEngine on
RewriteBase /
RewriteCond %{REMOTE_ADDR} ^(72\.14\.192\.|72\.14\.194\.)
RewriteCond %{REQUEST_URI} !^/gwa-forbidden\.html$
RewriteRule ^.*$ /gwa-forbidden.html [L]
If you want to send a 403 FORBIDDEN error use:
RewriteEngine on
RewriteBase /
RewriteCond %{REMOTE_ADDR} ^(72\.14\.192\.|72\.14\.194\.)
RewriteRule ^.*$ - [F]
Though it would be better to send a 412 PRECONDITION FAILED rather than a 403, and mod_security would be a good tool to use for this, with one of the following two sets of rules (blocking by IP or blocking by HTTP header):
SecFilterSelective "REMOTE_ADDR" "^72\.14\.192\." "deny,log,status:412"
SecFilterSelective "REMOTE_ADDR" "^72\.14\.194\." "deny,log,status:412"
or
SecFilterSelective "HTTP_X_MOZ" "prefetch" "deny,log,status:412"
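For those who would rather do the IP test inside the application, here is a minimal Python sketch of the same check. The two /24 networks are simply the prefixes quoted in the rules above (Google may change them at any time), and the function name is_gwa_address is made up for this illustration. Comparing parsed addresses avoids the loose matches that an unescaped "." in a regular expression allows.

```python
import ipaddress

# The two prefixes quoted in the rules above, written as /24 networks.
# Assumption: these ranges are only as current as the rules themselves.
GWA_NETWORKS = [
    ipaddress.ip_network("72.14.192.0/24"),
    ipaddress.ip_network("72.14.194.0/24"),
]


def is_gwa_address(remote_addr: str) -> bool:
    """Return True if remote_addr falls inside one of the listed networks."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Not a parseable IP address at all.
        return False
    return any(addr in net for net in GWA_NETWORKS)
```

A request handler would call is_gwa_address() with the client address and answer with a 403 or 412 when it returns True.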
Another way to filter proxy requests at the Apache level, without relying on IP ranges (which Google can modify pretty easily) is to detect the "X-moz: prefetch" header (tip from jpack's comment, which also provides a way to log proxied requests to a separate file):
RewriteEngine On
SetEnvIfNoCase X-Forwarded-For .+ proxy=yes
SetEnvIfNoCase X-moz prefetch no_access=yes
# block pre-fetch requests with X-moz headers
RewriteCond %{ENV:no_access} yes
RewriteRule .* - [F,L]
# write out all proxy requests to another log
CustomLog logs/ursite.com-access_log combined env=!proxy
CustomLog logs/ursite.com-proxy_log combined env=proxy
In PHP, one could do a test like this: if (isset($_SERVER['HTTP_X_MOZ']) && strtoupper($_SERVER['HTTP_X_MOZ']) == 'PREFETCH') ...
For Ruby on Rails applications, see How to show Google's Web Accelerator the door in Rails.
For ColdFusion, see: Use CF to block problems with Google Accelorator.
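For Python applications, the same header test can be sketched as a small WSGI middleware. This is only an illustration under assumptions: the names block_prefetch, hello_app and status_for are hypothetical, and it answers with a 403 where a 412 would arguably be the better status.

```python
from wsgiref.util import setup_testing_defaults


def block_prefetch(app):
    """Wrap a WSGI app and refuse requests carrying an 'X-moz: prefetch' header."""
    def middleware(environ, start_response):
        # WSGI exposes the X-moz request header as HTTP_X_MOZ.
        if environ.get("HTTP_X_MOZ", "").strip().lower() == "prefetch":
            body = b"Prefetching is not allowed here.\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        return app(environ, start_response)
    return middleware


def hello_app(environ, start_response):
    """Stand-in for the real application (hypothetical)."""
    body = b"hello\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def status_for(extra_environ):
    """Run one fake request through the wrapped app and return its status line."""
    environ = {}
    setup_testing_defaults(environ)
    environ.update(extra_environ)
    seen = []

    def start_response(status, headers, exc_info=None):
        seen.append(status)

    list(block_prefetch(hello_app)(environ, start_response))
    return seen[0]
```

Calling status_for({"HTTP_X_MOZ": "prefetch"}) exercises the blocked path, while status_for({}) goes through to the wrapped application.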
For some context and perspective about the issues brought by the GWA, and mainly the purists' take that the issue comes from broken web applications that rely on GET when they should be using POST, see:
- Google Web Accelerator: Hey, not so fast - an alert for web app designers at 37Signals
- Loud Thinking: Google's Accelerator in need of a recall and follow-up: Google: Recall, Developers: Improve
- Rael Dornfest: Google Web Accelerator considered overzealous
- Leslie Michael Orchard: Idempotency: It's not just for APIs (don't miss this PDF asserting that Prefetching With GET is Not Good)
- Think Before You Worship
- Simon Willison about Fighting RFCs with RFCs (although he has changed his stance, purists should note that should not is not equivalent to must not)
- The GWA Google Group
- Nik Cubrilovic: Poor Web Applications and Pre-fetch Security Issues
- Joe Gregorio: I'm sorry, I can't kiss it and make it better
- Bill de hÓra: Only the paranoid survive
My own take on this is that although it is indeed a recommendation that one should not implement any destructive or otherwise data-modifying action over an HTTP GET request, the reality is that there are tons of web applications out there that implement such actions using regular links (e.g. Google's Blogger or even its own API!). And the very first reason that comes to mind for doing it is that it's not possible to design a POST request that looks like a regular link without resorting to JavaScript. I particularly subscribe to Jarkko's comment here:
The spec says that developers shouldn't use GET, it doesn't say they are violating the specs if they do. Actually it's specifically said that there can be valid reasons to disobey these recommendations.
I sincerely admit that we as web app developers have a lot to learn from this episode but I still think you're distorting the discussion by bashing 37signals for this. It would be understandable if web application development would start from ground zero today. But it isn't. There's a whole sea of existing applications in the web that will be bitten by this and it's just plain nonsense going around screaming that it's your own fault.
As soon as people start using GWA and wreaking havoc in this imperfect world, they'll just be mad at Google and stop using the Accelerator. That's hardly what Google wants and as it's impossible for them to fix all the broken web apps in the world, there's realistically only one option left for them.
For another (bad) metaphor, this is about the same as leaving all the safety equipment away from a car because "if everyone obeys the traffic rules and laws, there will be no accidents".
But besides the reality check, my other problem with GWA is that it's not a good net citizen -- in fetching objects that most probably will not be displayed by visitors, it's wasting bandwidth and server resources. To me, GWA is the equivalent of an SUV on the net: it gives some sense of comfort to its users at the expense of others' resources.
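To make the GET-versus-POST recommendation discussed above concrete, here is a minimal Python sketch of a handler that only lets a destructive action through over POST. Everything in it is hypothetical: the /notes/ID/delete URL scheme, the in-memory store and the handle function are invented for the illustration. A prefetcher that blindly follows a "delete" link with GET then gets a 405 instead of destroying data.

```python
def handle(method: str, path: str, store: dict) -> int:
    """Return an HTTP status code for a toy request.

    Paths look like /notes/1 (read) or /notes/1/delete (destructive).
    """
    parts = path.strip("/").split("/")
    note_id = int(parts[1])
    wants_delete = len(parts) == 3 and parts[2] == "delete"

    if not wants_delete:
        # Reading is safe, so GET is fine here.
        return 200 if note_id in store else 404

    if method != "POST":
        # Destructive action requested over a safe method: refuse it.
        return 405

    store.pop(note_id, None)
    return 200
```

With store = {1: "buy milk"}, handle("GET", "/notes/1/delete", store) leaves the note in place, and only handle("POST", "/notes/1/delete", store) removes it.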
And speaking of Google's Web Accelerator, besides privacy issues, there are some rather problematic issues with web applications. The guys at 37Signals happened to catch them very quickly because they have their noses glued to their Backpack server logs right now, but Google should correct that behavior soon because you can't expect every site to clean up after issues in this tool. And correcting bugs in a desktop application isn't as fast as fixing a web application: you have to get all those users to upgrade.
Oh, and that reminds me of the fears that RSS aggregators would "kill the web" by being bandwidth hogs. What would you say of an application that just sucks every link it can find on a web page just to save you a few seconds here and there? Is Google going to sell bandwidth anytime soon?
Well, not exactly, if you pardon the pun and my pathetic attempts to find funny titles in English. Anyway...
Tor is an anonymous internet communication system supported and promoted by the Electronic Frontier Foundation. I see it as a grassroots network of anonymous proxies. Here is the bit of their rationale that I find a bit radical:
Tor's technology aims to provide Internet users with protection against "traffic analysis," a form of network surveillance that threatens personal anonymity and privacy, confidential business activities and relationships, and state security.
Traffic analysis is used every day by companies, governments, and individuals that want to keep track of where people and organizations go and what they do on the Internet. Instead of looking at the content of your communications, traffic analysis tracks where your data goes and when, as well as how much is sent.
Don't get me wrong, I'm all for defending privacy, online or offline. The trouble is that because some people are going too far in analyzing every step of someone's online journey, all of a sudden "traffic analysis" becomes a dirty activity. To take two of their own examples (the pharmaceutical company and IBM), what's wrong with knowing that competitors look at our sites and what's wrong with knowing what our visitors do (or don't do) with our sites? I can extend the list to many more metrics, such as the visitors' location (I'm running an international site with visitors coming from all over the world, it's important for me to know that), the type of browser client and computer/OS they use, the very common notion of unique visitors, etc.
It's, to me, an issue of transparency. And transparency goes both ways. I have to be clear about what I'm gathering and what I'm doing with that data(1), but thinking that it is not normal for a company to try to know a little bit about the people using its content and services is placing the cursor at the other extreme. If you find that I'm giving something of value for free over the internet, can you please tell me a little bit about yourself? It's my nature and my sense of balance, I don't like extremes. And to be fair, the other extreme is seriously questionable. Think about the Google Web Accelerator and its power to gather human stats (actually, one should think about the amazing observation power of Google over the whole internet and their transparency, or lack thereof, about what information they gather and what they do with it). Or, as I've read somewhere (can't find the link though), some online advertisement lobby trying to find a way to effectively prevent people from removing cookies (that one's not going to fly with browser makers I'm afraid, especially not with Mozilla).
What tears me apart is that I perfectly understand that there are legitimate needs for anonymity which I reckon are mainly linked to human rights issues (the mere need to freely browse the internet and get informed without the fear of being caught by the political police comes to mind easily). So people develop defenses against what they perceive as an invasion of privacy, and others develop in reaction new and more effective ways of gathering as much data as they can.
It's an arms race.
As usual with human nature, it takes only a few nasty players to get everybody punished. A quite difficult subject, isn't it?
(1) Fortunately, in Europe we have some fairly strong protection of personal data. I have to assure everyone that they can access their personal records and get them modified and/or removed as they wish. I also have to be clear about what personal information I'm gathering and what I'm doing with it. We also live in an "opt-in" culture, not an "opt-out" one. Unfortunately it's not a culture that's shared everywhere else. But it's also worth knowing that the European directives on e-commerce and data privacy also make anonymous posting on the web unlawful, and practically force all technical actors of the internet to keep identifiable tracks of who publishes content on the web.
There was a bug in the dirify function in Movable Type which didn't know how to convert URLs from content encoded in Unicode. It hit me when I moved this blog from ISO-8859-1 to UTF-8 last year. At that time I hacked the MT code to fix this bug in such a way that an upgrade with a bug fix would work without further ado. So I was surprised, after upgrading to MT3.16 which was supposed to fix this bug, to see it coming back!
A quick look at the dirify function in MT 3.16:
sub dirify {
($MT::VERSION && MT->instance->{cfg}->PublishCharset eq 'utf-8')
? utf8_dirify(@_) : iso_dirify(@_);
}
So obviously the test is case-sensitive. My config file had this line:
PublishCharset UTF-8
Bummer! Changing it to lowercase fixed the problem:
PublishCharset utf-8
I've filed a bug report at Six Apart. IMO all charset tests should be case insensitive.
Update: it's fixed in MT 3.17.
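For what it's worth, here is what a case-insensitive charset test could look like, as a minimal Python sketch rather than the actual Movable Type fix (which I have not looked at). The function name is_utf8 is made up; codecs.lookup is the standard library call that normalizes charset names and their common aliases.

```python
import codecs


def is_utf8(charset: str) -> bool:
    """Return True if charset names UTF-8, whatever its case or alias."""
    try:
        # codecs.lookup() normalizes "UTF-8", "utf-8" and "utf8" to "utf-8".
        return codecs.lookup(charset.strip()).name == "utf-8"
    except LookupError:
        # Unknown charset name.
        return False
```

With this kind of test, both "PublishCharset UTF-8" and "PublishCharset utf-8" would have selected the Unicode branch.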
Last night I hit "Mark All as Read" in my aggregator. Nine hours later, I fire it up and bang! 173 unread items for 159 feeds. It's frustrating and obviously not scalable as I diversify my news sources (one might say that I am not scalable).
The current generation of aggregators, which rely on manual subscriptions to individual feeds, may be suitable for a reasonably stable and modest set of sources, but it's clearly broken for me. And I don't consider myself a news junkie!
Hopefully, a new generation of aggregators is coming, based on collective tagging, ranking and sharing of feeds, plus a bit of collaborative filtering and machine learning for automated recommendations in the more advanced ones. Blogmarks ("Enlarge your bookmarks") and Rojo ("RSS with mojo!") are two examples.
Future generations of aggregators will crawl as many feeds as they can discover and swallow, and suggest feeds and/or links based on your topics of interest -- whether declared or inferred by watching what you and others consider worth reading, tagging, ranking, linking to. Manual subscription to monolithic feeds and TrackBacks will soon look so passé!
