In Time: 7 Clues to Solve Character Encoding Issues

Every so often, as Editors and Developers weave content and design together in the Tridion CMS, I encounter an unexpected character encoding issue. The published page shows a quirky A or a funny U, drawing attention to syntax and appearance rather than to the content itself.

The simple explanation is a discrepancy between the character encoding settings of the various systems the published content passes through on the way to its final destination. Here are the checkpoints I follow when I play detective, and what I look for to solve the mystery:

1. Publication Target (Tridion): What setting has been selected for the Publication Target Default Code Page value? I check this value in the Publication Target properties of the Tridion CMS Admin Panel. By default, this is set to “System Default”, which takes the code page from the Windows operating system of the publisher machine. I usually change this to Unicode (UTF-8)[1].

2. Browser: What character set is the browser using? In Internet Explorer, I check View ⇒ Encoding and make sure the Unicode (UTF-8) menu item is selected. In Firefox, check Options ⇒ Content ⇒ Fonts & Colors ⇒ Advanced ⇒ Default Character Encoding. In Google Chrome, check Options ⇒ Under the Hood ⇒ Web Content ⇒ Customize fonts ⇒ Encoding.

3. Java Virtual Machine: What JVM does the Tridion Deployer run in (for instance the one used by Tomcat), and what encodings are set there? As of JDK 1.4, it is possible to find out what a particular JVM supports via java.nio.charset.Charset.availableCharsets()[2].
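A minimal sketch of that check, using only the standard java.nio.charset.Charset class. The class name CharsetReport and the helper supports are illustrative names, not part of Tridion or the JDK:

```java
import java.nio.charset.Charset;
import java.util.SortedMap;

public class CharsetReport {
    /** Returns true if this JVM knows the named charset (for example "UTF-8"). */
    static boolean supports(String name) {
        return Charset.isSupported(name);
    }

    public static void main(String[] args) {
        // The charset the JVM falls back on when code does not name one explicitly.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // Every charset this JVM supports, keyed by canonical name.
        SortedMap<String, Charset> all = Charset.availableCharsets();
        for (String name : all.keySet()) {
            System.out.println(name);
        }
    }
}
```

Running this inside the same JVM as the Deployer (or the web application) is what matters; a different JVM on the same machine may report a different default.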

4. Application JVM:

· Is the IDE, for instance Eclipse, forcing a specific encoding?

· Does every operation that would otherwise rely on the platform default for character I/O specify the correct encoding, for example when reading a file? Reader r = new InputStreamReader(new FileInputStream("myfile"), "UTF-8");

· Tridion Deployer: if running on a file system, consider running the deployer with the JVM option -Dfile.encoding=UTF8
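To see why the explicit charset on the reader matters, here is a small sketch that writes a word containing an e-acute as UTF-8 and reads it back twice, once as UTF-8 and once as ISO-8859-1. The class and method names are made up for the illustration, and the temp file stands in for whatever file the application really reads:

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ExplicitEncodingRead {
    /** Reads the first line of a file, decoding its bytes with the named charset. */
    static String firstLine(File file, String charsetName) throws IOException {
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), charsetName))) {
            return r.readLine();
        }
    }

    /** Writes "caf" + e-acute as UTF-8 to a temp file, then reads it back with readCharset. */
    static String roundTrip(String readCharset) {
        try {
            File tmp = File.createTempFile("encoding-demo", ".txt");
            tmp.deleteOnExit();
            try (FileOutputStream out = new FileOutputStream(tmp)) {
                out.write("caf\u00e9".getBytes(StandardCharsets.UTF_8));
            }
            return firstLine(tmp, readCharset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Same bytes on disk, two different strings in memory.
        System.out.println(roundTrip("UTF-8").length());      // 4 characters
        System.out.println(roundTrip("ISO-8859-1").length()); // 5 characters: the e-acute became two
    }
}
```

The second read is the "quirky A" in miniature: the two UTF-8 bytes of the e-acute are each turned into a separate Latin-1 character.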

5. Web Servers: Further along the trail, and despite all of the above, most web servers are happily unaware of any encoding or treat the communication channel as ISO-8859-1. So the next checkpoint, really two in one, is the web server itself, such as IIS, Tomcat or Sun Java System Application Server. Did you know that the same web server can treat GET and POST requests differently? Beware that these settings are server dependent: Sun’s JSAS treats both GET and POST the same based on one configuration, Tomcat may not, and IIS expects the individual settings to be specified[3].

· IIS/.NET web.config: <globalization fileEncoding="UTF-8" requestEncoding="UTF-8" responseEncoding="UTF-8"/>

· Tomcat server.xml: set URIEncoding="UTF-8" on the Connector element

· Sun Java System Application Server sun-web.xml: include

<parameter-encoding default-charset="UTF-8"/>
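What a server that treats the channel as ISO-8859-1 does to a UTF-8 request parameter can be reproduced outside any server with java.net.URLDecoder. This is only a sketch of the effect, not of how any particular server is implemented, and the class and method names are invented for the example:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class RequestDecodingDemo {
    /** Percent-decodes a raw parameter value using the named charset. */
    static String decode(String rawValue, String charsetName) {
        try {
            return URLDecoder.decode(rawValue, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    /** Repairs a value that was UTF-8 on the wire but was decoded as ISO-8859-1. */
    static String repairLatin1(String misdecoded) {
        byte[] originalBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // What a UTF-8 browser sends for "caf" + e-acute.
        String raw = "caf%C3%A9";

        String good = decode(raw, "UTF-8");
        String bad = decode(raw, "ISO-8859-1");

        System.out.println(good.equals("caf\u00e9"));              // true
        System.out.println(bad.equals(good));                      // false
        System.out.println(repairLatin1(bad).equals(good));        // true
    }
}
```

The repair step works only because ISO-8859-1 maps every byte to a character, so no information is lost; fixing the server setting is still the cleaner cure.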

6. Page level: encoding directives in the page itself can override the HTTP header settings in:

· HTML

· .NET

<% @ Page ResponseEncoding="utf-8" %>

· Java/JSP

<%@page pageEncoding="UTF-8"%>

<%@page contentType="text/html;charset=UTF-8"%>

request.setCharacterEncoding("UTF-8");

· XML

<?xml version="1.0" encoding="UTF-8"?>
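For the XML case, a short sketch with the JDK's built-in parser shows the declaration doing its job: the parser reads the encoding named in the declaration and decodes the bytes accordingly. The class and helper names are illustrative, and the example assumes the bytes really are in the declared encoding:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class XmlDeclarationDemo {
    /** Builds a tiny document whose bytes use the same charset its declaration names. */
    static byte[] encode(Charset charset) {
        String xml = "<?xml version=\"1.0\" encoding=\"" + charset.name() + "\"?>"
                + "<word>caf\u00e9</word>";
        return xml.getBytes(charset);
    }

    /** Parses raw bytes and returns the text of the root element. */
    static String rootText(byte[] xmlBytes) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Different bytes on the wire, same text after parsing.
        String fromUtf8 = rootText(encode(StandardCharsets.UTF_8));
        String fromLatin1 = rootText(encode(StandardCharsets.ISO_8859_1));
        System.out.println(fromUtf8.equals(fromLatin1)); // true
    }
}
```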

7. Create your own abstraction layer to interact with the CM, which can also override server settings. If step 5 has given you visions of long, dark nights bravely searching your server’s documentation for that minuscule setting, there is light at the end of the tunnel. Put your magnifying glass away: it is possible to establish a server-independent encoding layer.

Consider setting a context parameter in WEB-INF/web.xml and propagating it throughout your code: read it before any other parameters and pass it along in extensions of the request object for both GET and POST methods.
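One possible shape for such a layer is sketched below in plain Java so it runs without a servlet container. Everything here is an assumption made for illustration: the class name EncodingLayer, the idea of a context parameter called app.encoding, and the choice to keep only the last value of a repeated parameter. In a real web application the configured value would come from the servlet context's init parameters, and the raw strings from the request's query string (GET) or form-encoded body (POST).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class EncodingLayer {
    private final String encoding;

    /** configuredEncoding stands in for a hypothetical "app.encoding" context parameter. */
    public EncodingLayer(String configuredEncoding) {
        // Fall back to UTF-8 when the parameter is missing or empty.
        this.encoding = (configuredEncoding == null || configuredEncoding.isEmpty())
                ? "UTF-8" : configuredEncoding;
    }

    /** Decodes a raw query string or form-encoded body the same way for GET and POST. */
    public Map<String, String> parameters(String raw) {
        Map<String, String> params = new LinkedHashMap<>();
        if (raw == null || raw.isEmpty()) {
            return params;
        }
        for (String pair : raw.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            // A repeated name overwrites the earlier value in this simplified sketch.
            params.put(decode(name), decode(value));
        }
        return params;
    }

    private String decode(String s) {
        try {
            return URLDecoder.decode(s, encoding);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("Unsupported encoding: " + encoding, e);
        }
    }

    public static void main(String[] args) {
        EncodingLayer layer = new EncodingLayer("UTF-8");
        Map<String, String> get = layer.parameters("q=caf%C3%A9&lang=nl");   // as from a GET query string
        Map<String, String> post = layer.parameters("q=caf%C3%A9&lang=nl");  // as from a POST body
        System.out.println(get.equals(post)); // true: one decoding rule for both methods
    }
}
```

Because the decoding happens in application code with one configured value, the result no longer depends on how the server underneath happens to treat GET versus POST.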

Here’s hoping data fidelity serves you right, and happy encoding.

[1] http://sdllivecontent.sdl.com/LiveContent/content/en-US/SDL_Tridion_2011/concept_879633C70905448885956711778D2C0E

[2] http://mindprod.com/jgloss/encoding.html
[3] http://java.sun.com/developer/technicalArticles/Intl/HTTPCharset/

13 comments:

Nivlong, January 29, 2012 at 6:58 PM
Great post, Elena! Very applicable since character encoding issues can cross all three consulting roles and can affect authors and users.
Unknown, January 30, 2012 at 6:15 AM
Well done Elena. Very good article.
Nivlong, July 30, 2013 at 12:56 PM
As more customers use Experience Manager and/or Tridion's Content Delivery Web Service, we can include any "OData" Web servers as places to also check under #5 Web Servers. :-)
simply sue, June 10, 2014 at 9:46 PM
Hi Elena, I have done all the settings you have mentioned in this article. Yet after setting up Tridion UGC 2011 SP1 My data encoding doesn't work
Unknown, July 11, 2014 at 7:41 AM
To add to #4, setting the correct file encoding for a Deployer on a Windows Server is done by setting -Dfile.encoding via a jvmarg in the registry. It is the key "HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Tridion\Content Delivery\General" and this image shows an example of setting it to UTF-8: http://bkjh.home.xs4all.nl/images/ContentDeliveryJvmArg.png
Oni, May 6, 2015 at 11:58 AM
This work for me:
In this file weblogic-application.xml:

webapp.encoding.default
ISO-8859-1

Saludos from Chile!
Raj, February 11, 2016 at 1:35 AM
ultimate guide.

Sunday, January 29, 2012
