Every so often, as Tridion CMS content and design are woven together by
Editors and Developers, I encounter an unexpected character encoding
issue. The published page has a quirky A
or funny U, drawing attention to syntax and appearance rather than to the
pertinent content.
The simple explanation is that the issue stems from discrepancies between
the character encoding settings of the various systems the published content
passes through on its journey to its final destination. Here are the
checkpoints I follow when I play detective, and what I look for at each one
to solve the mystery:
1. Publication Target (Tridion): What setting has been selected for the
Publication Target Default Code Page value? I check this value in the
Publication Target properties of the Tridion CMS Admin Panel. By default,
this is set to “System Default”, which picks up the code page dictated by the
Windows operating system of the publisher machine. I usually change this to
Unicode (UTF-8)[1].
2. Browser: What is the browser using for its character set? In Internet
Explorer, I check View ⇒ Encoding and look to see that the Unicode (UTF-8)
menu item is checked. In Firefox, check Options ⇒ Content ⇒ Fonts & Colors ⇒
Advanced ⇒ Default Character Encoding. In Google Chrome, check Options ⇒
Under the Hood ⇒ Web Content ⇒ Customize fonts ⇒ Encoding.
3. Java Virtual Machine: Which JVM does the Tridion Deployer run in (for
instance, the one used by Tomcat), and what encodings are set there? As of
JDK 1.4, you can find out which charsets a particular JVM supports via
java.nio.charset.Charset.availableCharsets()[2].
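As a quick check for this checkpoint, a small program along these lines could be run with the same JVM that hosts the Deployer. It is only a minimal sketch: the class name CharsetReport and its supports helper are made up for illustration, and Charset.defaultCharset() assumes Java 5 or later.

```java
import java.nio.charset.Charset;
import java.util.SortedMap;

// Minimal sketch: report which charsets the current JVM supports
// and which one it falls back to when none is given explicitly.
class CharsetReport {

    // True if this JVM can encode and decode the named charset.
    static boolean supports(String name) {
        return Charset.isSupported(name);
    }

    public static void main(String[] args) {
        // Charset used whenever code does not name one explicitly.
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
        System.out.println("UTF-8 supported: " + supports("UTF-8"));

        // Every charset this JVM knows about, keyed by canonical name.
        SortedMap<String, Charset> available = Charset.availableCharsets();
        for (String name : available.keySet()) {
            System.out.println(name);
        }
    }
}
```

If the default charset printed here is not the one you expect, that is a hint that something further down the chain relies on the platform default.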
4. Application JVM:
· Is the IDE forcing a specific encoding, for instance if I’m using Eclipse?
· Is every operation that depends on the standard locale for character I/O
carrying along the correct encoding, for example when reading a file?
Reader r = new InputStreamReader(new FileInputStream("myfile"), "UTF-8");
· Tridion Deployer: if running on a file system, consider running the
deployer with the -Dfile.encoding=UTF8 command-line option.
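To see why naming the encoding explicitly matters, here is a minimal sketch that writes a scratch file as UTF-8 and reads it back twice, once with the matching charset and once with a mismatched one. The class name, the roundTrip helper and the temp file are illustrative assumptions, and the code assumes Java 8 or later. The accented character is written as a \u escape so that the source file's own encoding (the IDE question above) cannot interfere.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Minimal sketch: file I/O that names its charset instead of
// relying on the platform default.
class ExplicitEncodingIO {

    // Writes text to a scratch file in writeCharset, then reads the
    // first line back in readCharset and returns it.
    static String roundTrip(String text, String writeCharset, String readCharset) {
        try {
            File file = File.createTempFile("encoding-demo", ".txt");
            file.deleteOnExit();
            try (Writer w = new OutputStreamWriter(new FileOutputStream(file), writeCharset)) {
                w.write(text);
            }
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(new FileInputStream(file), readCharset))) {
                return r.readLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "cafe" with an acute accent on the e

        // Same charset on both sides: the text survives.
        System.out.println(original.equals(roundTrip(original, "UTF-8", "UTF-8")));

        // Written as UTF-8 but read as ISO-8859-1: the two bytes of the
        // accented letter come back as two separate characters.
        System.out.println(roundTrip(original, "UTF-8", "ISO-8859-1").length());
    }
}
```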
5. Web Servers: Further down the trail, and despite all of the above, most
web servers are happily unaware of any encodings or treat the communication
channel as ISO-8859-1, so the next checkpoint (really two in one) is at the
level of web servers such as IIS, Tomcat or Sun Java System Application
Server. Did you know that, depending on the web server, even GET and POST
requests can be treated differently by the same server? Beware: these
settings are server dependent. While Sun’s JSAS will treat both GET and POST
the same based on one configuration, Tomcat may not, and IIS expects the
individual settings to be specified[3].
· IIS/.NET web.config: <globalization fileEncoding="UTF-8"
requestEncoding="UTF-8" responseEncoding="UTF-8"/>
· Tomcat server.xml: set URIEncoding="UTF-8" on the <Connector> element
· Sun Java System Application Server sun-web.xml: include
<parameter-encoding default-charset="UTF-8"/>
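The effect of a server that assumes ISO-8859-1 can be imitated in plain Java. The following is a sketch under stated assumptions, not a reproduction of any particular server's behavior: it percent-encodes a value the way a UTF-8 browser would, then decodes it with a charset of your choosing. The class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Minimal sketch: what happens to a query parameter when the client
// and the server disagree about the charset.
class QueryDecodingDemo {

    // Encodes value as the client would, then decodes it as the server would.
    static String encodeThenDecode(String value, String clientCharset, String serverCharset) {
        try {
            String onTheWire = URLEncoder.encode(value, clientCharset);
            return URLDecoder.decode(onTheWire, serverCharset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset", e);
        }
    }

    public static void main(String[] args) {
        String value = "caf\u00e9"; // "cafe" with an acute accent on the e

        // Both sides use UTF-8: the value arrives intact.
        System.out.println(value.equals(encodeThenDecode(value, "UTF-8", "UTF-8")));

        // Client sends UTF-8, server assumes ISO-8859-1: the accented
        // letter arrives as two unrelated characters.
        System.out.println(value.equals(encodeThenDecode(value, "UTF-8", "ISO-8859-1")));
    }
}
```

The first line printed is true and the second is false, which is the same mismatch the settings above are meant to prevent.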
6. Page level: encoding directives in the page can override the HTTP header
settings in:
· HTML
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
· .NET
<%@ Page ResponseEncoding="utf-8" %>
· Java/JSP
<%@page pageEncoding="UTF-8"%>
<%@page contentType="text/html;charset=UTF-8"%>
request.setCharacterEncoding("UTF-8");
· XML
<?xml version="1.0" encoding="UTF-8"?>
7. Create your own abstraction layer to interact with the CM, also for
overriding server settings. If step 5 has given you visions of long, dark
nights bravely searching your server’s documentation for that minuscule
setting, there is light at the end of the tunnel. Put your magnifying glass
away. It is possible to establish a server-independent encoding layer.
Consider setting a context parameter in WEB-INF/web.xml and propagating it
throughout your code by reading it before any other parameters and passing it
along in extensions of the request object for both GET and POST methods.
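One way such a layer might look is sketched below. To keep it runnable on its own, the servlet API is left out: in a real web application the configured charset would probably be read with ServletContext.getInitParameter and the parsing would sit inside an HttpServletRequestWrapper. The class name EncodingAwareParams, the fallback to UTF-8 and the last-value-wins handling of repeated parameters are all assumptions made for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of a server-independent decoding layer: the charset is
// configured once and applied to GET query strings and
// form-encoded POST bodies alike.
class EncodingAwareParams {

    private final String charsetName;

    // configuredCharset stands in for the web.xml context parameter;
    // falling back to UTF-8 when it is missing is an assumption.
    EncodingAwareParams(String configuredCharset) {
        this.charsetName = (configuredCharset == null || configuredCharset.isEmpty())
                ? "UTF-8" : configuredCharset;
    }

    // Parses "name=value&name2=value2" using the configured charset.
    // Repeated names keep only the last value in this sketch.
    Map<String, String> parse(String raw) {
        Map<String, String> params = new LinkedHashMap<>();
        if (raw == null || raw.isEmpty()) {
            return params;
        }
        for (String pair : raw.split("&")) {
            if (pair.isEmpty()) {
                continue; // skip stray separators such as "a=1&&b=2"
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(decode(name), decode(value));
        }
        return params;
    }

    private String decode(String s) {
        try {
            return URLDecoder.decode(s, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("Unsupported charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        EncodingAwareParams layer = new EncodingAwareParams("UTF-8");
        // The same call serves a GET query string and a POST body.
        Map<String, String> params = layer.parse("city=Z%C3%BCrich&q=caf%C3%A9");
        System.out.println(params.size());
    }
}
```

Because every parameter passes through the same decode call, the server's own default no longer decides how GET and POST data are interpreted.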
Here’s hoping data fidelity serves you right,
and happy encoding.
[1] http://sdllivecontent.sdl.com/LiveContent/content/en-US/SDL_Tridion_2011/concept_879633C70905448885956711778D2C0E
[2] http://mindprod.com/jgloss/encoding.html
[3] http://java.sun.com/developer/technicalArticles/Intl/HTTPCharset/