Debugging charset encoding mismatch with Apache

While setting up a new weblog using UTF-8 as the default charset, I spent literally hours trying to figure out why my first name kept showing up as FranÃ§ois instead of François. Not that I'm not used to it already, but I have this foolish hope that computers should eventually make our lives easier.
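For the curious, this kind of garbling can be reproduced in a few lines. The sketch below uses Python purely for illustration (it is not part of the weblog setup) and assumes the page bytes are UTF-8 while the browser decodes them as ISO-8859-1:

```python
# The name as it should appear.
name = "François"

# In UTF-8, "ç" is stored as two bytes: 0xC3 0xA7.
utf8_bytes = name.encode("utf-8")

# A browser that believes the page is ISO-8859-1 decodes every byte
# as its own character, so the two bytes become two separate glyphs.
garbled = utf8_bytes.decode("iso-8859-1")

print(garbled)  # FranÃ§ois
```

The bytes on the wire are fine in this scenario; only the label telling the browser how to read them is wrong.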

It turned out that despite a correct definition of the charset encoding in all pages (<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />), some pages (output from CGI scripts) would be recognized as carrying the proper encoding while others (HTML, PHP) were always reported as having an ISO-8859-1 charset.

Thanks to the excellent Web Developer toolbar for Firefox, I found out that certain pages had a charset imposed on them via a Content-Type HTTP header (see the headers in Tools > Web Developer > Information > View Response Headers, very handy). After more digging, I found that the pages that behaved properly already provided their own Content-Type header, which turned my suspicion to the brand new Apache 2 installation on my server.
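If you don't have the toolbar at hand, the same check can be scripted. Here is a rough Python sketch using only the standard library; the helper names `charset_from_content_type` and `fetch_header_charset` are made up for this example, and the URL you pass in is whatever page you want to inspect:

```python
import urllib.request
from email.message import Message


def charset_from_content_type(header_value):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() lowercases the value and returns None
    # when the header carries no charset parameter.
    return msg.get_content_charset()


def fetch_header_charset(url):
    """Fetch a page and return the charset announced in its HTTP headers."""
    with urllib.request.urlopen(url) as response:
        return response.headers.get_content_charset()


print(charset_from_content_type("text/html; charset=ISO-8859-1"))
print(charset_from_content_type("text/html"))
```

A page that reports a charset here even though your script or HTML never set one is a sign that the server is adding it.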

Bingo! Apache 2 now ships with a default AddDefaultCharset directive that forces the charset to ISO-8859-1 when one is not provided in the headers by an external module (such as a script). Since the HTTP headers take precedence over the META tag in the HTML code, this basically voids your efforts to provide this information within HTML pages.
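The precedence rule can be sketched in a few lines of Python. This is a deliberate simplification of what browsers do (it ignores byte order marks and encoding sniffing, among other things), and `effective_charset` is a hypothetical helper written for this post, not a library function:

```python
import re
from email.message import Message

# Naive pattern for a charset declared inside a <meta> tag.
META_CHARSET = re.compile(
    rb'<meta[^>]+charset=["\']?([A-Za-z0-9_-]+)', re.IGNORECASE
)


def effective_charset(content_type_header, html_bytes):
    """Return the charset a browser would use under the simplified rule:
    HTTP header first, META tag only as a fallback."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    header_charset = msg.get_content_charset()
    if header_charset:
        # The header wins; the META tag is never consulted.
        return header_charset
    match = META_CHARSET.search(html_bytes)
    return match.group(1).decode("ascii").lower() if match else None


page = b'<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />'

# Header forced by the server: the META tag is ignored.
print(effective_charset("text/html; charset=ISO-8859-1", page))
# No charset in the header: the META tag is honored.
print(effective_charset("text/html", page))
```

With the forced header the page is read as ISO-8859-1 whatever the HTML says, which is exactly the behavior I was seeing.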

This has been flagged, with merit, as the Apache bug 23421 (see also Apache bug 14513).

If you experience this odd behavior, find the AddDefaultCharset directive in your httpd.conf and change it to this:

AddDefaultCharset Off

This will prevent Apache 2 from overriding the charset encodings that you provide through META tags. Apache 1.3.x ships without this directive, which means it's off by default. You should have Apache force the charset only in very specific cases, but that should never be the default behavior IMHO.

40 Comments

Hi,

I encountered a similar problem when using Apache 2.0. Using a tool from Microsoft (wfetch.exe), I can see that the response header includes charset=ISO-8859-1. However, after modifying httpd.conf to 'AddDefaultCharset Off', the problem still exists. The response header no longer has the charset encoding, just "Context-type: html/text", but IE still uses the ISO encoding. The web page has a meta tag specifying charset=big5 (for traditional Chinese).

Any hints for my problem?

Thanks & Regards,
Michael

I guess you meant Content-type: text/html

Have you cleared IE's cache? (sorry for the trivial proposal but I don't have a PC so I don't know about Win IE cache behavior)

Thanks for your reply. Yes, the problem was the cache. After I touched the page on the server and revisited it, everything was okay.

Regards,
Michael

add this:
AddDefaultCharset utf-8

François THANK YOU - this has been plaguing me for days and eventually I too noticed the UTF-8 mismatch in my headers on some scripts thanks to the excellent web developer tools with Firefox.

A quick google to your page and I am solved - thank you SO much for taking the time to post this information.

Steve

You're welcome Steve :-)

Cheers - I've spent ages trying to work out why some UTF content wasn't being displayed. It seems that if an HTTP charset is defined, it will override any META tags. Most annoying. But, finally, finding your page has put an end to all my misery. Thanks

Hm. Is there a way to do this if I don't have access to the server configs? My on-campus server has the same problem.

David, you can try placing the Apache rule into a file named .htaccess at the root of your HTML document folder. If the server authorizes it, it will work. If not, then I'm afraid your only hope is to convince the server admin to make that change in the main Apache configuration file.

This was exactly the problem I had on my server and now it's gone. Thanks a LOT for posting this!

Elias

I too have the same kind of issue with Websphere application server 5.0.2.6. It is using apache 1.3.26. The Thai characters are displayed as some junk characters. But when the encoding in the browser is changed to Thai(Windows) they are displayed fine. We need to do this every time we access this page. I tried the AddDefaultCharset Off. I restarted my IBM HTTPServer service. I also cleared my browsers cache. This does not help me in fixing this. Am I missing anything still

Uday, do you send the right charset along with the page (it can come from the HTML in the page, or Apache, or WebSphere)? Check with Firefox and the web developer toolbar (Live HTTP headers) to see the headers sent to the browser.

Hi Francois, a very fast and useful link from google and my problem with charset is solved ;) Earlier, I used to reconfigure httpd.conf like this:

AddCharset iso8859-2
AddDefaultCharset iso8859-2

(or something like that), but when I needed to change the charset for one site, I could f*** myself... Thx a lot :)

Thank so much for posting this - it was driving me crazy!

Wow.. thanks so much! It did solve the problem that I had been experiencing for some time now. :D

THANKS!!! It took me hours to find your posting :-) ...

Thank you very much! I was able to solve this by adding a .htaccess file in the root directory with the line "AddDefaultCharset utf-8" (it didnt work setting it to off).

Cheers to you!

I'm amazed that my little post still continues to help people with that stupid Apache configuration. That was the purpose after all ;-).

@Germán W.: that probably means that you're not setting the charset at all anywhere in your pages and scripts. If all your pages are in UTF-8 then that's fine, what you're doing is what Apache 2 does by default, only with a more suitable charset than Latin1!

AAAH!! THANK YOU.

I thought I was losing my mind.

Thank you, this page is a life saver... actually also nerve saver :P I am glad there are people out there like you that will put together this type of information together and let us google it.. precious..

I found this to work at my provider:

CharsetDisable On
AddDefaultCharset utf-8

hey - thanks for the info. I have been having a rough time messing around with character encodings on my blog (accented characters showing up as jumbled codes, etc) and finding this post basically solved the problem. apparently, according to my admin, more recent patches of Apache 1.3 have this setting on by default too. thanks for the tip!

thanx for this info, helped me out

very useful!!

We run an application that allows the user to upload their own files. It is up to them, or their tools, what encoding they choose. We therefore set

AddDefaultCharset off

And add a setting to the header of pages that require a specific encoding (utf-8)

François, thanks a lot for this info - I've spent hours now on trying to resolve this painful behavior. awesome!!!

sam

Thank you so much for your posting. Amazingly clarifying!!!

If you are using PHP5 with Apache, then the problem can also reside in php.ini.

I could not understand why apache 1.3.34 kept using the iso-8859-1 charset since I had changed AddDefaultCharset to Off.

After about two hours of struggle I stumbled upon this in php.ini:

default_mimetype = "text/html"
default_charset = "iso-8859-1"

To disable those set them to empty like this:

default_mimetype = ""
default_charset = ""

And finally my problem went away :-)

Hey Francois,

thank you so much for this great hint. I had the same problem with German umlauts when I transferred hundreds of pages in our intranet from Apache 1.3 to 2.0.

Your above mentioned solution worked like charm.

You saved me hours of hours of work.

THANK YOU AGAIN
JoTo

Thanks a lot. Made my day :-)

Thomas

Thanks a lot François, I stumbled upon this problem while internationalizing a gwt/j2ee application. You prevented me from having more grey hair ;)

Solved my problem. Many thanks for the info.

Thanks for the post and also to Kim (http://padawan.info/2004/07/debugging-chars.html#comment-11858), After playing around with Apache, I resolved my charset problem by adding in the vhost :

AddDefaultCharset ISO-8859-1
php_value default_charset "ISO-8859-1"

I was reluctant at first to post my thanks on an article from 2004, but it seems like I'm not the only one who is still finding this page useful!

Thank you so much -- not only for creating this page but for *explaining* why it works: the server settings take priority over the encoding specified by the page, unless you tell it otherwise.

I couldn't figure that out from the official Apache docs or my first few searches.

Your advice helped me display pages created in iWeb (a Macintosh HTML editor) correctly on our Apache server.

Thank you kindly.

Hey Francois,

I have a big problem that I have been fighting with for weeks, even at the weekend.
We use WebSphere plus Apache. I wish to open a PDF in the browser. I set the content type to "Application/pdf", plus the disposition and content length (but not the charset). When the PDF is opened, it shows unreadable codes. Do you know why??

I hope so much that you can help me ....

Thanks in advance
Celia

Before looking at the web server (I don't think it's where your problem is), I would first verify that the encoding issue isn't within the PDF itself. I've seen that before, and it's due to the way (or the software with which) the PDF is generated in the first place.

It is due to the RequestDispatcher. I use it to forward/include my request to another Servlet, and after that I cannot set the contentType. Now it works. The browser shows my PDF file. I use the web developer toolbar to view the header, and the content type is still text/html, even after I deleted the row "DefaultType text/html" in httpd.conf (we don't use .htaccess files). Any idea why? Following is the header:

Date: Mon, 15 Sep 2008 10:12:07 GMT
Server: IBM_HTTP_Server/6.1.0.17 Apache/2.0.47 (Unix)
Content-Length: 8859
Keep-Alive: timeout=10, max=96
Connection: Keep-Alive
Content-Type: text/html
Content-Language: en-US
200 OK

No idea besides maybe the fact that serving a PDF file with a content-type of text/html might not be a good idea ;-).

You just saved my life :D
