Converting a Movable Type blog from ISO-8859-1 to UTF-8

Let me tell you that moving an MT blog from one server to another is not necessarily a piece of cake, especially if you have the odd idea of switching the charset from ISO-8859-1 to UTF-8 (Unicode). It took me an awful lot of time and trial and error, and I'm documenting the process in the hope that it will save time for someone else.

The key thing to keep in mind is that switching from one charset to another, with existing content, is not a matter of changing one setting here or there. Modifying AddDefaultCharset in Apache or PublishCharset in MT does not mean you are set to go. The charset must be consistent all the way through, from the content to the receiving end. This means that the content itself has to be in UTF-8 (possibly converted from another charset), stored in the database as UTF-8, manipulated by your blog software as UTF-8 and served by the web server as UTF-8. It's this consistency that can be hard to achieve, and a source of trouble when it is missing.
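
As a rough illustration of checking one link in that chain, the sketch below extracts the charset a web server announces in its Content-Type header. The function name header_charset and the example URL are hypothetical, and the database and the files themselves still need their own checks.

```shell
#!/bin/sh
# Hypothetical helper: reads HTTP response headers on stdin and prints
# the charset announced in the Content-Type header, in lowercase.
# Prints nothing if the header carries no charset parameter.
header_charset() {
    tr -d '\r' |
        sed -n 's/^[Cc]ontent-[Tt]ype:.*charset=\([^;[:space:]]*\).*/\1/p' |
        tr 'A-Z' 'a-z'
}

# Usage (the URL is a placeholder):
#   curl -sI https://example.org/ | header_charset
```

If the value printed is not the charset your pages are really written in, browsers will misread every accented character.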

First, I transferred all the static files from the source server to the destination one (this is where you may want to upgrade your copy of MT). Then I transferred the database using mysqldump, connecting from the new server directly to the source (you may not be able to do so, in which case you would have to transfer the resulting file):

mysqldump --default-character-set=latin1 -C -u username -p -h host --opt --skip-add-locks --skip-extended-insert database > mtdump.sql

My source database is using ISO-8859-1. The trap here is to forget to override the charset, because old versions of the mysql client (like the one on my source server) have ISO-8859-1 as the default while new ones (starting with 4.1 I think) use UTF-8.
Another trap may be to use --opt alone, which is supposed to be the best option (it's on by default on recent mysql versions). However, you may face two problems:
- you don't have permission to LOCK tables, therefore the --skip-add-locks option
- if you cannot change the max_allowed_packet variable on your destination server, you may get the following error during the import: ERROR 1153 (08S01): Got a packet bigger than 'max_allowed_packet' bytes, therefore the --skip-extended-insert option which produces a bigger file but with smaller chunks of INSERTs.
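
To see whether the charset trap above has bitten, one option is to test whether the dump is already valid UTF-8 before converting it. This is only a sketch: the function name is_valid_utf8 is hypothetical, and it relies on iconv exiting with an error when it meets a byte sequence that is not valid UTF-8. Note that a pure ASCII file passes the test too.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the file decodes cleanly as UTF-8.
# iconv exits with a non-zero status on an invalid byte sequence
# (and also if the file cannot be read).
is_valid_utf8() {
    iconv -f utf-8 -t utf-8 "$1" > /dev/null 2>&1
}

# Usage:
#   if is_valid_utf8 mtdump.sql; then
#       echo "already UTF-8 (or plain ASCII): do not convert it again"
#   else
#       echo "not UTF-8: presumably still in the source charset"
#   fi
```

Converting a file that is already UTF-8 a second time is what produces the familiar double-encoded garbage, so the check is cheap insurance.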

Then I converted the SQL dump from ISO-8859-1 to UTF-8:

iconv -f iso-8859-15 -t utf-8 mtdump.sql > mtutf8.sql

[Note: I use ISO-8859-15 here, because of the euro (€) sign. See this tutorial on charsets for more information on the ISO-8859/latin1 family.]
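
The difference between the two charsets can be seen on a single byte. The sketch below feeds byte 0xA4 to iconv under both names and prints the resulting UTF-8 bytes in hexadecimal; the variable names are only illustrative.

```shell
#!/bin/sh
# Byte 0xA4 (octal 244) is the generic currency sign in ISO-8859-1
# and the euro sign in ISO-8859-15.
as_latin1=$(printf '\244' | iconv -f iso-8859-1 -t utf-8 | od -An -tx1 | tr -d ' \n')
as_latin9=$(printf '\244' | iconv -f iso-8859-15 -t utf-8 | od -An -tx1 | tr -d ' \n')

echo "ISO-8859-1:  $as_latin1"   # c2a4, the UTF-8 encoding of U+00A4
echo "ISO-8859-15: $as_latin9"   # e282ac, the UTF-8 encoding of U+20AC
```

Picking the wrong source charset does not make iconv fail; it silently produces a different character, so it is worth checking a few known entries after the conversion.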

Then I used mysql, setting the proper input charset, to feed the destination database:

mysql -u username -p --default-character-set=utf8 database < mtutf8.sql
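
Depending on the mysqldump version that produced it, the dump may carry its own charset declarations (a SET NAMES statement, DEFAULT CHARSET clauses in CREATE TABLE). If present, they still say latin1 after the iconv step and would contradict the --default-character-set=utf8 given to mysql. Here is a hedged sketch to run before the import, assuming the declarations appear in the two forms named in the comments; the function name and the output file name are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads a converted dump on stdin and writes it to
# stdout with latin1 charset declarations switched to utf8.
# Assumes declarations of the form "SET NAMES latin1" and
# "DEFAULT CHARSET=latin1"; anything else is left untouched.
fix_charset_decls() {
    LC_ALL=C sed -e 's/SET NAMES latin1/SET NAMES utf8/' \
                 -e 's/DEFAULT CHARSET=latin1/DEFAULT CHARSET=utf8/'
}

# Usage:
#   grep -n 'latin1' mtutf8.sql | head        # inspect first
#   fix_charset_decls < mtutf8.sql > mtutf8-fixed.sql
```

Inspect the grep output before trusting the rewrite: sed works on plain text, so it would also change an entry whose body happens to contain one of these exact strings.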

Once this is done, you'll need to configure your blogs with the proper settings at the destination (notably the path names), rebuild the templates that are linked to files and that use accented characters and, of course, rebuild the whole site.

Now the really unfunny traps...

First, supposing that you have a funny first name like François and the highly stupid idea of writing it as it should be written (with its bells and whistles, accents, cedillas, etc.) in an MT login name, you may have to resort to some trickery to log in to your new MT installation. Never use accents in a login name, especially with a product developed in the U.S.! (BTW, TypeKey is quite broken in this regard too, always trying to scramble my first name.)

Along the same lines, if like me your writing goes beyond ASCII, and if you are using the dirify attribute to create category folders and file names, you will find that the dirify function is broken (apparently since MT 3.1). It works with ISO-8859-1 but not with UTF-8, and it will turn all your accents into 'a'. This is quite problematic since UTF-8 is supposed to be the default charset since MT 3.0! To overcome this, you can either use the dirify for Unicode plugin, writing dirify_unicode="1" in the same way one would use dirify="1", or (this is what I did) grab from this plugin the entire my %HighASCII = (...) hash table that sits in sub convert_high_unicode and use it to replace the one in lib/MT/. I prefer the latter way, since I hope that Six Apart will eventually fix the bug in dirify.
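
To make the expected behavior concrete, here is a rough shell sketch of what a dirify-like filter should do with UTF-8 input. It is not MT's dirify nor the plugin's code: the function name dirify_sketch is hypothetical, the mapping table covers only a handful of French characters, and the underscore separator is an assumption.

```shell
#!/bin/sh
# Rough sketch of a dirify-like filter for UTF-8 input (NOT MT's code).
# Save this script as UTF-8 so the accented literals match the input bytes.
# LC_ALL=C makes sed and tr work byte by byte, so each accented letter is
# matched as its full multi-byte sequence rather than one byte at a time.
dirify_sketch() {
    printf '%s' "$1" |
        LC_ALL=C sed \
            -e 's/à/a/g' -e 's/â/a/g' -e 's/ä/a/g' \
            -e 's/é/e/g' -e 's/è/e/g' -e 's/ê/e/g' -e 's/ë/e/g' \
            -e 's/î/i/g' -e 's/ï/i/g' -e 's/ô/o/g' \
            -e 's/ù/u/g' -e 's/û/u/g' -e 's/ü/u/g' \
            -e 's/ç/c/g' -e 's/É/e/g' -e 's/Ç/c/g' |
        LC_ALL=C tr 'A-Z' 'a-z' |
        LC_ALL=C sed -e 's/[^a-z0-9 _-]//g' -e 's/ /_/g'
}

dirify_sketch 'François'      # prints: francois
echo
dirify_sketch 'Été à Paris'   # prints: ete_a_paris
echo
```

The point of the sketch is the order of operations: the accented letters must be mapped as whole UTF-8 sequences before anything strips or rewrites individual high bytes.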

That's all for tonight, I hope I didn't forget anything big. If you see this post, you've reached the new server. If you note anything strange, please let me know!

P.S.: 1. I first considered using TypeMover, which makes the attractive promise of handling both the transfer from one MT installation to another and the charset conversion. But my attempt ended with one major problem: it converts all the accents into HTML entities, which is a big no-no for me (would you like to edit content where half of the words are scrambled with ugly &blah; blocks?)

2. This tutorial assumes that you are staying on the same major version of MT when you move your blog. If this is not the case, then you'll have to get your content, convert it to UTF-8, then follow the MT upgrade steps. The order is not that important.

2 TrackBacks

Déménagement from on December 29, 2004 9:45 PM

If you can see this post, you have reached the new server that hosts this blog, at TextDrive. It was no small feat to move my MT installation, upgrade it and, above all, convert... Read More



You can plan these things (migrations) as long as you like and there will _always_ be something else to do during the process.

I'm glad the switch went well for you. Was the upgraded mysql version one of the motivations for changing host?

Matt, you can't imagine how many traps I've fallen into during the move, but I hope the worst is behind me now.

I had a number of reasons, notably the fact that my previous host hasn't upgraded its platform in years, lagging several versions behind on everything LAMP, and was lacking certain Perl extensions that are required by lots of MT 3.x plugins. TextDrive gives me much more control over my site and they're even more responsive since they've got people working in at least three time zones (US, Europe and Australia).

Conclusion for the non-advanced user: use TypePad?

> Conclusion for the non-advanced user: use TypePad?

Mmh, you don't have the choice of charset on TypePad, the charset is what the provider decides (UTF-8 as far as I can see on my TypePad account, which is good).

No, better advice would be: don't change the charset of an existing blog if you don't need to ;-).

[I guess the next question will be: are you a masochist? ;-)]


The dirify-unicode plug-in doesn't work too well with MT 3.2.

"Ăă Îî Ţţ Şş Ââ Iñtërnâţiônàlizætiønş" will output "aeae-aa-aa-initeirnaitiioinailizatiansi". Something's wrong here...

The issue probably has nothing to do with the MT version and is just a bug in the plug-in's table. I'll try to update it with the latest version of my conversion table.

PS: I created the conversion table for the version 0.2 of the plug-in too.

For those of you using postgres, the commands are:

$ pg_dump -f my_mt.dmp mt
$ iconv -f ISO8859-1 -t UTF-8 my_mt.dmp > utf.dmp
$ dropdb mt
$ createdb -T template0 mt
$ psql mt < utf.dmp
