Why is the ODBC driver (SQL Server 2012) returning data in the wrong encoding?

K

K. A.2019-05-17 16:20:57

PHP

K. A., 2019-05-17 16:20:57

Faced such a problem - there is a remote SQL Server 2012 database, in which there are functions that return data.
The application is deployed in a virtual machine (ubuntu 18.04), the connection is configured via the ODBC driver version 17 (because it was not possible to "negotiate" with the built-in driver):

[email protected]:~$ odbcinst -j
unixODBC 2.3.7
DRIVERS............: /etc/odbcinst.ini
SYSTEM DATA SOURCES: /etc/odbc.ini
FILE DATA SOURCES..: /etc/ODBCDataSources
USER DATA SOURCES..: /home/user/.odbc.ini
SQLULEN Size.......: 8
SQLLEN Size........: 8
SQLSETPOSIROW Size.: 8

I send a query in the sqlcmd console I get the expected data type (Cyrillic string): But when I send the same query via DB::select() I get the result: Same thing via tinker. The encoding is defined as UTF-8, in the database the data is in UTF-8 (as they assure me, at least there is no access to the database itself, with selections and table views the same thing - only a set of functions and storages) If you bring it "forcibly" to UTF- 8 (mb_convert_encoding) the result is better: But the tail remains, which cannot be brought to normal in any way, and it is clear that there is garbage in fact and not the data that is in the database. There is a suspicion that somewhere the string is trimmed by the number of bits or something like that, because all broken strings are no more than 57-59 characters long, and those that are shorter are displayed normally.
SELECT fld FROM foo.fnBar(123, 'username');

Explicitly specifying the encoding when connecting does not help the situation. If you pass the AutoTranslate = no parameter when connecting, then question marks are returned instead of Cyrillic, but the number of characters shows that the string is full and not truncated, there are no left hex codes.:

If you do not convert to UTF-8:

the question is how this is done overcome?

Reply

Answer the question

In order to leave comments, you need to log in

2 answer(s)

K

K. A., 2019-05-20
@f_u_s_s

In general, the problem turned out to be in the wrong encoding on the SQL Server side, as it turned out, he still gave cp1251. But the "tails" themselves were still given by the driver, so I had to register when connecting
And run all the answers from the base through my function (I'll leave it here, maybe it will come in handy for someone)

function fixCharset($data, $from = 'cp1251', $to = 'utf8'){
    if(is_scalar($data)){
        return mb_convert_encoding($data, $to, $from);
    }
    else {
        if ($to_object = is_object($data))
            $data = (array)$data;

        array_walk_recursive($data, function (&$item, $key) use ($from, $to) {
            $item = mb_convert_encoding($item, $to, $from);
        });

        return $to_object ? (object)$data : $data;
    }
}

Because the application is read-only, that's enough. To speed up, all transcoded results are cached in redis, but this is most likely a savings on matches.

P

profesor08, 2019-05-17
@profesor08

Use the sqlsrv driver to work with the database.
Specify encoding. Well, you can use mb_ detect _encoding, mb_convert_encoding