Nginx
koltykov, 2014-12-06 09:49:44

UTF8 BOM or How does NGINX determine file encoding?

Yesterday I ran into a very interesting nginx problem that we puzzled over for 3 hours.
There is a site where everything has been working in UTF-8 for quite some time.
The main site is in Russian: domain.com
And there is a small English version on the subdomain en.domain.com
All files declare UTF-8 in their meta tags, and the files themselves are, naturally, saved in UTF-8.
The English version on en.domain.com needed to display some Russian text, and it suddenly turned out that the text came out as garbled characters (mojibake). We looked at the server response from nginx, and it declared the windows-1251 encoding. We tried everything: forced UTF-8 in the nginx config, set the header in PHP, and so on. No matter what, the response still came back as win1251.
The most interesting part is that domain.com/en returned the correct encoding. The configs for domain.com and en.domain.com are the same.
We narrowed the problem down to the ob_start function. As soon as we comment it out, everything is fine. Uncomment it, and the garbled characters are back.
We googled for a long time and tried different ob options (clean, flush), disabling gzip, and so on. Nothing helped.
Then we had the idea of saving the included file where ob_start is called with a BOM. After that everything displayed correctly.
So does nginx decide whether a file is UTF-8 based on the BOM? And why doesn't the forced encoding in the nginx config work?
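
To confirm whether a given included file really starts with a UTF-8 BOM, its first three bytes can be inspected directly. A minimal Python sketch; the helper name `has_utf8_bom` and the example path in the comment are hypothetical and not taken from the setup described above:

```python
import codecs

def has_utf8_bom(path):
    """Return True if the file at `path` starts with the UTF-8 BOM (bytes EF BB BF)."""
    with open(path, "rb") as f:
        # Read exactly the first three bytes and compare them with the BOM.
        return f.read(3) == codecs.BOM_UTF8

# Hypothetical usage (the path is an assumption):
# has_utf8_bom("include/header.php")
```

Running this over the included file before and after re-saving it would show whether the editor actually added or removed the BOM.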

3 answer(s)
Anton Ivanov, 2014-12-06
@Fly3110

nginx doesn't transmit an encoding, it transmits data. The headers may indicate which encoding to use, but how that data is displayed is decided by the receiving side, for example the browser.
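
This point can be illustrated with a small Python sketch: the same bytes are decoded according to whatever charset the Content-Type header declares, so a wrong header turns correct UTF-8 data into garbled text. The helper names and the sample string are illustrative assumptions, and the sketch only imitates what a browser does:

```python
from email.message import Message

def charset_from_content_type(value, default="utf-8"):
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = value
    # Returns the lowercased charset, or `default` when none is declared.
    return msg.get_content_charset(default)

def decode_body(body, content_type):
    """Decode raw bytes using the charset declared in the header."""
    charset = charset_from_content_type(content_type)
    # errors="replace" avoids an exception on bytes undefined in that charset.
    return body.decode(charset, errors="replace")

# The bytes on the wire are identical in both cases; only the header differs.
body = "Привет".encode("utf-8")
as_utf8 = decode_body(body, "text/html; charset=utf-8")
as_cp1251 = decode_body(body, "text/html; charset=windows-1251")
```

Here `as_utf8` is the original text, while `as_cp1251` is the same data misread as windows-1251.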

koltykov, 2014-12-06
@koltykov

Fly3110
Here are the headers from en.domain.com:
And here they are when the same script is called without the subdomain:
Why do we see windows-1251 explicitly specified in the first case? Who sets it: nginx or PHP5-FPM?
And what do the BOM and ob_start() have to do with it?

Power, 2014-12-06
@Power

Check which headers are returned and which encoding the body is in by making a request:
1) to nginx directly from the server where nginx is installed;
2) directly to the backend from the server where nginx is installed.
Something like this:
Then you can determine who is to blame: the backend, nginx, or someone else (a proxy between nginx and your browser).
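
The comparison described above could be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the URLs in the usage comment are placeholders, and it assumes the backend answers plain HTTP (a bare PHP-FPM socket speaks FastCGI and would need a different client):

```python
import urllib.request
from email.message import Message

def fetch_content_type(url, host=None):
    """Return the Content-Type header of a GET response from `url`.

    `host` optionally overrides the Host header, so a request sent to
    127.0.0.1 can still select the en.domain.com virtual host.
    """
    req = urllib.request.Request(url)
    if host:
        req.add_header("Host", host)
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.headers.get("Content-Type")

def charset_of(content_type):
    """Return the declared charset (lowercased), or None if there is none."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()

def compare_charsets(content_types):
    """Map each label to its declared charset and report whether all agree."""
    charsets = {label: charset_of(ct) for label, ct in content_types.items()}
    return charsets, len(set(charsets.values())) == 1

# Hypothetical usage, run on the server itself (addresses are assumptions):
# compare_charsets({
#     "nginx": fetch_content_type("http://127.0.0.1/", host="en.domain.com"),
#     "backend": fetch_content_type("http://127.0.0.1:8080/", host="en.domain.com"),
# })
```

If the two charsets differ, the layer whose response first declares windows-1251 is the one to investigate.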
