If you work with a language that doesn’t have any accented or special characters, like English for example, you will never have to worry about this at all, but for the rest of us character encodings can be a serious headache sometimes. Mainly in web design and programming, that is; otherwise you can live a wonderful life without ever coming across this problem whatsoever.
Let’s start with what on earth character encoding actually is. There is a beautiful article about character encodings, character sets, code pages, and character maps on Wikipedia that one should really check out. I didn’t understand half of it, but you can immediately see that a lot of people wasted a lot of time developing different encodings to make the web designer’s life more complicated. Here is another useful resource on the topic: htmlpurifier.org/docs/enduser-utf8.html. To give the most comprehensible example, Morse code and Braille describe exactly the same characters in totally different ways. Various computer systems, programming languages, and older or newer standards do the same thing. You declare the charset in your HTML header so the browser knows what to display. Up to this point it’s all nice, but here comes the twist.
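Just to make that concrete, here is a tiny Python sketch of my own (Python isn’t required for any of this, it’s just handy for poking at bytes) showing that the very same character turns into different bytes depending on the encoding, and how reading those bytes back with the wrong encoding produces the garbage you see in the browser:

```python
# The Hungarian letter "ő" as raw bytes under two different encodings.
text = "ő"

print(text.encode("utf-8"))       # b'\xc5\x91'  -> two bytes in UTF-8
print(text.encode("iso-8859-2"))  # b'\xf5'      -> one byte in ISO-8859-2

# Decoding the UTF-8 bytes with the wrong charset (here Windows-1250,
# the Windows cousin of ISO-8859-2) is exactly what the browser does
# when the declared charset doesn't match the actual bytes:
print(b"\xc5\x91".decode("cp1250"))   # prints 'Ĺ‘' instead of 'ő'
```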
What if you have a web site all in UTF-8 and your client sends you a document with text to be put on the site that is in a different encoding, like ISO-8859-2? (Or some other combination; it’s the same issue.) If you copy and paste the text, you will get back some really funky characters in the browser that you never knew existed. So you will have to somehow convert the text to be able to use it without retyping it. I want to go through some of the free tools you can find online for converting document encodings. Here is the text I was working with.
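If you happen to have Python around, you don’t even need an online tool for the conversion itself: it’s just a decode/encode round trip. A minimal sketch, assuming the source file really is ISO-8859-2; the file names are placeholders of my own:

```python
# Re-encode a text file from ISO-8859-2 to UTF-8.
# "input_latin2.txt" and "output_utf8.txt" are hypothetical names.

with open("input_latin2.txt", "r", encoding="iso-8859-2") as src:
    text = src.read()          # bytes are decoded to Unicode here

with open("output_utf8.txt", "w", encoding="utf-8") as dst:
    dst.write(text)            # ...and re-encoded as UTF-8 here
```

Still, the online tools are nice when you just want to paste a paragraph and be done with it, so here is how they fared.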
www.motobit.com offers a tool that seems to work perfectly. You can either paste your text there or upload the source file, choose the source and destination encodings, choose the output (on screen or to a file), press “convert source data”, and you’re done. Entering the same text into kanjidict.stc.cx/recode.php and choosing the same encoding pair, I ended up with the text looking like this.
Not nearly as smooth as the previous one, although the entities would show up correctly in the end if you pasted them into the HTML document. The Coder’s Toolbox also produced normal-looking text, even though there wasn’t an option for ISO-8859-2. i-tools.org/charset/ looks good, but I only got an empty text file back after converting.
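For what it’s worth, those “entities” are just HTML numeric character references, and you can produce them yourself if that is actually the output you want. A quick Python illustration of my own (the sample phrase is just a common Hungarian test string, not the client’s text):

```python
# Turn non-ASCII characters into HTML numeric character references.
text = "árvíztűrő tükörfúrógép"   # common Hungarian test phrase

entities = text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")
print(entities)   # &#225;rv&#237;zt&#369;r&#337; t&#252;k&#246;rf&#250;r&#243;g&#233;p
```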
So, in conclusion, motobit.com is definitely my favorite, but the others would surely do just fine as well, probably with different character sets. I hope this will be useful for somebody.