Perl
Vyacheslav Golovanov, 2012-01-09 18:25:05

Template::Toolkit and utf-8 in templates

The following problem is keeping me up at night.

perl, v5.10.1 built for MSWin32-x86-multi-thread (ActiveState)
Template::Toolkit 2.22

The scripts use utf8 throughout: all scripts and all templates are saved in UTF-8 with a BOM.
Template Toolkit is initialized like this:
$tt = Template->new({
    INCLUDE_PATH => $$cfg{tpl_path},
    ENCODING     => 'utf8',
}) || die Template->error(), "\n";   # "$tt::ERROR" would be $ERROR in package "tt", i.e. empty

Problem:
If a template contains any non-ASCII character, for example a Russian letter, the output is garbled, like "Глупый Ð²Ð¾Ð¿Ñ€Ð¾Ñ " (while the Russian letters written in the template itself display correctly).

If the template contains only ASCII, everything works fine, including Russian strings that are taken from the database and inserted into the template.
In other words, TT refuses to work properly with templates that themselves contain Russian letters, yet with ASCII-only templates it builds pages without problems, even when variable values contain Russian letters.
Passing the binmode => ':utf8' option to process() does not help.

How can this be fixed?

3 answer(s)
Vyacheslav Golovanov, 2012-01-09
@SLY_G

Thank you all for your help.
The problem was solved with a single line:
$dbh->{mysql_enable_utf8} = 1;
The default is 0, so I suspect TT was converting the data from MySQL to UTF-8 on its own, but in the wrong place, and as a result some data ended up converted to UTF-8 twice.
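According to the DBD::mysql documentation, mysql_enable_utf8 takes full effect only when it is passed in the attribute hash of connect(); if it is switched on after connecting, as above, the docs say you also need to issue SET NAMES utf8 so the server treats incoming data as UTF-8. A sketch of the connect-time variant, reusing the placeholder DSN and credentials from the question (RaiseError is an addition, not part of the original script):

```perl
use DBI;

# Enable UTF-8 handling when the connection is opened, so values read back
# are character strings rather than raw UTF-8 bytes.
my $dbh = DBI->connect(
    "DBI:mysql:database=mybase;host=localhost;port=3306",
    "login", "pass",
    { RaiseError => 1, mysql_enable_utf8 => 1 },
);
```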

Vyacheslav Golovanov, 2012-01-09
@SLY_G

It seems the problem is not in TT after all.
Here is a test script:
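use utf8;
use open OUT => ':utf8';
use DBI;
my $dbh = DBI->connect("DBI:mysql:database=mybase;host=localhost;port=3306", "login", "pass");
#$dbh->do('SET CHARACTER SET utf8');
# 1) A literal from the UTF-8 source file, written without any encoding layer
open my $tst1, '>', 'utftest1.txt' or die $!;
binmode $tst1;                  # :raw, overrides the 'use open' layer
print $tst1 "русский";          # the word "Russian" in Cyrillic
close $tst1;
# 2) The same literal, round-tripped through MySQL
my $test = $dbh->selectrow_array("SELECT 'русский'");
open my $tst2, '>', 'utftest2.txt' or die $!;
binmode $tst2;
print $tst2 $test;
close $tst2;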
The file utftest1.txt contains 14 bytes, and any text editor shows the word "русский" ("Russian").
The file utftest2.txt contains 28 bytes of something unrecognizable (double encoding?):
0000000000: C3 91 C2 80 C3 91 C2 83 │ C3 91 C2 81 C3 91 C2 81
0000000010: C3 90 C2 BA C3 90 C2 B8 │ C3 90 C2 B9
[…] scripts! And the problems begin when Russian characters are inserted into a script or template.
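The 28 bytes are exactly what double encoding produces. A minimal sketch using only the core Encode module (not part of the original script): encoding "русский" to UTF-8 once gives the 14 bytes of utftest1.txt, and encoding those bytes a second time, treating each byte as a Latin-1 character, reproduces the hex dump above byte for byte.

```perl
use strict;
use warnings;
use utf8;                            # the literal below is UTF-8 in this source file
use Encode qw(encode_utf8);

# Format a byte string as space-separated hex, like the dump above.
sub hex_bytes { join ' ', map { sprintf('%02X', ord($_)) } split //, shift }

my $word  = "русский";               # 7 characters
my $once  = encode_utf8($word);      # UTF-8 bytes, as in utftest1.txt
my $twice = encode_utf8($once);      # each byte re-encoded as if it were Latin-1

printf "%d bytes: %s\n", length($once),  hex_bytes($once);
printf "%d bytes: %s\n", length($twice), hex_bytes($twice);
```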

Vyacheslav Golovanov, 2012-01-09
@SLY_G

I managed to reproduce the error in a small script.
The script file:
use utf8;
use strict;
use vars qw($dbh $tt);
use DBI;
use Template;
$dbh = DBI->connect("DBI:mysql:database=mybase;host=localhost;port=3306", "login", "pass");
$tt = Template->new({
    INCLUDE_PATH     => '.',
    DEFAULT_ENCODING => 'utf8',
    ENCODING         => 'utf8',
}) || die Template->error(), "\n";   # "$tt::ERROR" would always be empty
my $testvar = $dbh->selectrow_array("SELECT 'Текст'");   # 'Текст' is Russian for "Text"
$tt->process('template.htm', { testvar => $testvar }) || die $tt->error(), "\n";
The template is in a separate file, template.htm. If it contains only ASCII text, it works fine: the output is "Text and some ascii text."
If you add Russian text to the template, for example:
[% testvar %] Russian and some ascii text.
then the output is garbled:
Ð¢ÐµÐºÑ Ñ‚ Russian and some ascii text.
Both files, the script and the template, are saved in UTF-8 with a BOM.
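A plausible mechanism, consistent with the mysql_enable_utf8 fix but not confirmed in the thread: without that flag, DBD::mysql returns raw UTF-8 bytes, while a template that contains Cyrillic ends up as a Perl character string. When Perl concatenates the two, it upgrades the byte string by treating each byte as a Latin-1 character, so the database value is effectively encoded a second time on output. The sketch below simulates both sides with the core Encode module; there is no real database or TT here, and the variable names and template text are hypothetical stand-ins.

```perl
use strict;
use warnings;
use utf8;                                  # this source file is UTF-8
use Encode qw(encode_utf8 decode_utf8);

binmode STDOUT, ':encoding(UTF-8)';

# Simulated DB value: raw UTF-8 bytes, as returned with mysql_enable_utf8 off.
my $from_db = encode_utf8("Текст");

# Simulated template text containing Cyrillic: a decoded character string.
my $template_text = " русский and some ascii text.";

# Mixing them: Perl upgrades $from_db byte by byte as Latin-1,
# so each UTF-8 byte becomes a separate character (mojibake).
my $broken = $from_db . $template_text;

# Decoding the bytes first keeps everything as characters.
my $fixed = decode_utf8($from_db) . $template_text;

print "$broken\n";
print "$fixed\n";
```

Decoding at the boundary with decode_utf8 is the manual equivalent of what mysql_enable_utf8 does for values read from the database; enabling the flag avoids having to decode every value by hand.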
