Discussion:
[freetds] Unicode problem
Vano Beridze
2013-09-03 10:20:28 UTC
Permalink
Hello,

I've got windows 7 machine that runs SQL Server Express 2008 R2.
I've some table with nvarchar field that contains unicode data.

Another linux (CentOS 6.4) machine runs Apache 2.2 with php 5.3.3.
freetds version is 0.91

This is the /etc/freetds.conf file
[global]
# TDS protocol version
tds version = 8.0
port = 1433
client charset = UTF-8
# Whether to write a TDSDUMP file for diagnostic purposes
# (setting this to /tmp is insecure on a multi-user system)
dump file = /tmp/freetds.log
debug flags = 0x4fff

# Command and connection timeouts
timeout = 180
connect timeout = 180

# If you get out-of-memory errors, it may mean that your client
# is trying to allocate a huge buffer for a TEXT field.
# Try setting 'text size' to a more reasonable limit
text size = 20971520

I've created simple php script that connects to sql server and reads
data from that table (actually there is a view to hide unnecessary fields)

Here is the php script
<?php
echo "<html><head>";
echo '<meta http-equiv="Content-type" content="text/html; charset=utf-8"
/>';
echo "</head>";
echo "<body>";
echo "<h1>Hello</h1>";
$connect = odbc_connect("server", "someuser", "somepass");

if (!$connect) {
echo '<h3 color="red">Could not connect to the database!!!</h3>';
die();
}


# query the users table for all fields
$query = "SELECT odf_address FROM dbo.odf";

# perform the query
$result = odbc_exec($connect, $query);

# fetch the data from the database
while(odbc_fetch_row($result)) {
#$field1 = odbc_result($result, 1);
$field2 = odbc_result($result, 1);
#echo $field1;
#echo "<br>";
echo $field2;
echo "<br>";
}

# close the connection
odbc_close($connect);
echo "</body></html>";
?>

The script works fine. it connects and gets the data. The only problem I
have is that data is shown as question marks, so it's not converted to
UTF-8. I've read all the information I could find on the internet but
can not solve the problem.

Could you please suggest what's the problem?
--
Vano Beridze
Software Developer
Silk Road Group
Vano Beridze
2013-09-03 10:56:54 UTC
Permalink
Hello,

I've got windows 7 machine that runs SQL Server Express 2008 R2.
I've some table with nvarchar field that contains unicode data.

Another linux (CentOS 6.4) machine runs Apache 2.2 with php 5.3.3.
freetds version is 0.91

This is the /etc/freetds.conf file
[global]
# TDS protocol version
tds version = 8.0
port = 1433
client charset = UTF-8
# Whether to write a TDSDUMP file for diagnostic purposes
# (setting this to /tmp is insecure on a multi-user system)
dump file = /tmp/freetds.log
debug flags = 0x4fff

# Command and connection timeouts
timeout = 180
connect timeout = 180

# If you get out-of-memory errors, it may mean that your client
# is trying to allocate a huge buffer for a TEXT field.
# Try setting 'text size' to a more reasonable limit
text size = 20971520

I've created simple php script that connects to sql server and reads
data from that table (actually there is a view to hide unnecessary fields)

Here is the php script
<?php
echo "<html><head>";
echo '<meta http-equiv="Content-type" content="text/html; charset=utf-8"
/>';
echo "</head>";
echo "<body>";
echo "<h1>Hello</h1>";
$connect = odbc_connect("server", "someuser", "somepass");

if (!$connect) {
echo '<h3 color="red">Could not connect to the database!!!</h3>';
die();
}


# query the users table for all fields
$query = "SELECT odf_address FROM dbo.odf";

# perform the query
$result = odbc_exec($connect, $query);

# fetch the data from the database
while(odbc_fetch_row($result)) {
#$field1 = odbc_result($result, 1);
$field2 = odbc_result($result, 1);
#echo $field1;
#echo "<br>";
echo $field2;
echo "<br>";
}

# close the connection
odbc_close($connect);
echo "</body></html>";
?>

The script works fine. it connects and gets the data. The only problem I
have is that data is shown as question marks, so it's not converted to
UTF-8. I've read all the information I could find on the internet but
can not solve the problem.

Could you please suggest what's the problem?
--
Vano Beridze
Software Developer
Silk Road Group
Frediano Ziglio
2013-09-03 11:45:04 UTC
Permalink
How did you configure odbc.ini ?

Try to add ClientCharset (see
http://www.freetds.org/userguide/odbcconnattr.htm)

Frediano


2013/9/3 Vano Beridze <vano.beridze at silkroad.ge>
Post by Vano Beridze
Hello,
I've got windows 7 machine that runs SQL Server Express 2008 R2.
I've some table with nvarchar field that contains unicode data.
Another linux (CentOS 6.4) machine runs Apache 2.2 with php 5.3.3.
freetds version is 0.91
This is the /etc/freetds.conf file
[global]
# TDS protocol version
tds version = 8.0
port = 1433
client charset = UTF-8
# Whether to write a TDSDUMP file for diagnostic purposes
# (setting this to /tmp is insecure on a multi-user system)
dump file = /tmp/freetds.log
debug flags = 0x4fff
# Command and connection timeouts
timeout = 180
connect timeout = 180
# If you get out-of-memory errors, it may mean that your client
# is trying to allocate a huge buffer for a TEXT field.
# Try setting 'text size' to a more reasonable limit
text size = 20971520
I've created simple php script that connects to sql server and reads
data from that table (actually there is a view to hide unnecessary fields)
Here is the php script
<?php
echo "<html><head>";
echo '<meta http-equiv="Content-type" content="text/html; charset=utf-8"
/>';
echo "</head>";
echo "<body>";
echo "<h1>Hello</h1>";
$connect = odbc_connect("server", "someuser", "somepass");
if (!$connect) {
echo '<h3 color="red">Could not connect to the database!!!</h3>';
die();
}
# query the users table for all fields
$query = "SELECT odf_address FROM dbo.odf";
# perform the query
$result = odbc_exec($connect, $query);
# fetch the data from the database
while(odbc_fetch_row($result)) {
#$field1 = odbc_result($result, 1);
$field2 = odbc_result($result, 1);
#echo $field1;
#echo "<br>";
echo $field2;
echo "<br>";
}
# close the connection
odbc_close($connect);
echo "</body></html>";
?>
The script works fine. it connects and gets the data. The only problem I
have is that data is shown as question marks, so it's not converted to
UTF-8. I've read all the information I could find on the internet but
can not solve the problem.
Could you please suggest what's the problem?
--
Vano Beridze
Software Developer
Silk Road Group
_______________________________________________
FreeTDS mailing list
FreeTDS at lists.ibiblio.org
http://lists.ibiblio.org/mailman/listinfo/freetds
Vano Beridze
2013-09-03 13:09:07 UTC
Permalink
odbc.ini

[server]
Driver = FreeTDS
Description = SQL 2008
Trace = No
Server = 192.168.1.50
Port = 1433
TDS Version = 8.0
Database = mydb

ClientCharset A name recognized by the iconv library linked to FreeTDS.
Corresponds to client charset in freetds.conf. ISO 8859-1 Character
set (encoding) used by the client.


I've defined UTF-8 in freetds.conf
Is it necessary to indiciate it during odbc connection?

Vano Beridze
Software Developer
Silk Road Group
Post by Frediano Ziglio
How did you configure odbc.ini ?
Try to add ClientCharset (see
http://www.freetds.org/userguide/odbcconnattr.htm)
Frediano
2013/9/3 Vano Beridze <vano.beridze at silkroad.ge>
Post by Vano Beridze
Hello,
I've got windows 7 machine that runs SQL Server Express 2008 R2.
I've some table with nvarchar field that contains unicode data.
Another linux (CentOS 6.4) machine runs Apache 2.2 with php 5.3.3.
freetds version is 0.91
This is the /etc/freetds.conf file
[global]
# TDS protocol version
tds version = 8.0
port = 1433
client charset = UTF-8
# Whether to write a TDSDUMP file for diagnostic purposes
# (setting this to /tmp is insecure on a multi-user system)
dump file = /tmp/freetds.log
debug flags = 0x4fff
# Command and connection timeouts
timeout = 180
connect timeout = 180
# If you get out-of-memory errors, it may mean that your client
# is trying to allocate a huge buffer for a TEXT field.
# Try setting 'text size' to a more reasonable limit
text size = 20971520
I've created simple php script that connects to sql server and reads
data from that table (actually there is a view to hide unnecessary fields)
Here is the php script
<?php
echo "<html><head>";
echo '<meta http-equiv="Content-type" content="text/html; charset=utf-8"
/>';
echo "</head>";
echo "<body>";
echo "<h1>Hello</h1>";
$connect = odbc_connect("server", "someuser", "somepass");
if (!$connect) {
echo '<h3 color="red">Could not connect to the database!!!</h3>';
die();
}
# query the users table for all fields
$query = "SELECT odf_address FROM dbo.odf";
# perform the query
$result = odbc_exec($connect, $query);
# fetch the data from the database
while(odbc_fetch_row($result)) {
#$field1 = odbc_result($result, 1);
$field2 = odbc_result($result, 1);
#echo $field1;
#echo "<br>";
echo $field2;
echo "<br>";
}
# close the connection
odbc_close($connect);
echo "</body></html>";
?>
The script works fine. it connects and gets the data. The only problem I
have is that data is shown as question marks, so it's not converted to
UTF-8. I've read all the information I could find on the internet but
can not solve the problem.
Could you please suggest what's the problem?
--
Vano Beridze
Software Developer
Silk Road Group
_______________________________________________
FreeTDS mailing list
FreeTDS at lists.ibiblio.org
http://lists.ibiblio.org/mailman/listinfo/freetds
_______________________________________________
FreeTDS mailing list
FreeTDS at lists.ibiblio.org
http://lists.ibiblio.org/mailman/listinfo/freetds
Vano Beridze
2013-09-03 13:13:50 UTC
Permalink
I've added ClientCharset to odbc.ini and restarted httpd.
Still the same result.

Vano Beridze
Software Developer
Silk Road Group
Post by Frediano Ziglio
odbc.ini
[server]
Driver = FreeTDS
Description = SQL 2008
Trace = No
Server = 192.168.1.50
Port = 1433
TDS Version = 8.0
Database = mydb
ClientCharset A name recognized by the iconv library linked to FreeTDS.
Corresponds to client charset in freetds.conf. ISO 8859-1 Character
set (encoding) used by the client.
I've defined UTF-8 in freetds.conf
Is it necessary to indiciate it during odbc connection?
Vano Beridze
Software Developer
Silk Road Group
Post by Frediano Ziglio
How did you configure odbc.ini ?
Try to add ClientCharset (see
http://www.freetds.org/userguide/odbcconnattr.htm)
Frediano
2013/9/3 Vano Beridze <vano.beridze at silkroad.ge>
Post by Vano Beridze
Hello,
I've got windows 7 machine that runs SQL Server Express 2008 R2.
I've some table with nvarchar field that contains unicode data.
Another linux (CentOS 6.4) machine runs Apache 2.2 with php 5.3.3.
freetds version is 0.91
This is the /etc/freetds.conf file
[global]
# TDS protocol version
tds version = 8.0
port = 1433
client charset = UTF-8
# Whether to write a TDSDUMP file for diagnostic purposes
# (setting this to /tmp is insecure on a multi-user system)
dump file = /tmp/freetds.log
debug flags = 0x4fff
# Command and connection timeouts
timeout = 180
connect timeout = 180
# If you get out-of-memory errors, it may mean that your client
# is trying to allocate a huge buffer for a TEXT field.
# Try setting 'text size' to a more reasonable limit
text size = 20971520
I've created simple php script that connects to sql server and reads
data from that table (actually there is a view to hide unnecessary fields)
Here is the php script
<?php
echo "<html><head>";
echo '<meta http-equiv="Content-type" content="text/html; charset=utf-8"
/>';
echo "</head>";
echo "<body>";
echo "<h1>Hello</h1>";
$connect = odbc_connect("server", "someuser", "somepass");
if (!$connect) {
echo '<h3 color="red">Could not connect to the database!!!</h3>';
die();
}
# query the users table for all fields
$query = "SELECT odf_address FROM dbo.odf";
# perform the query
$result = odbc_exec($connect, $query);
# fetch the data from the database
while(odbc_fetch_row($result)) {
#$field1 = odbc_result($result, 1);
$field2 = odbc_result($result, 1);
#echo $field1;
#echo "<br>";
echo $field2;
echo "<br>";
}
# close the connection
odbc_close($connect);
echo "</body></html>";
?>
The script works fine. it connects and gets the data. The only problem I
have is that data is shown as question marks, so it's not converted to
UTF-8. I've read all the information I could find on the internet but
can not solve the problem.
Could you please suggest what's the problem?
--
Vano Beridze
Software Developer
Silk Road Group
_______________________________________________
FreeTDS mailing list
FreeTDS at lists.ibiblio.org
http://lists.ibiblio.org/mailman/listinfo/freetds
_______________________________________________
FreeTDS mailing list
FreeTDS at lists.ibiblio.org
http://lists.ibiblio.org/mailman/listinfo/freetds
_______________________________________________
FreeTDS mailing list
FreeTDS at lists.ibiblio.org
http://lists.ibiblio.org/mailman/listinfo/freetds
James K. Lowden
2013-09-05 02:51:49 UTC
Permalink
On Tue, 03 Sep 2013 14:20:28 +0400
Post by Vano Beridze
The script works fine. it connects and gets the data. The only
problem I have is that data is shown as question marks, so it's not
converted to UTF-8. I've read all the information I could find on the
internet but can not solve the problem.
I would set TDSDUMP and examine the log. You're right that
FreeTDS would substitute question marks for any non-convertible
character. Perhaps the character set name you specified in
freetds.conf isn't exactly right, perhaps because you're linked to the
system iconv library.

The status of iconv is listed at the top of the log. If it says it set
the client charset to UTF-8, that's what it will use.

A different problem that sometimes crops up: sometimes the "Unicode"
data on the server isn't UCS2. In that event, the server will send the
data, but the iconv library will be unable to convert the characters
correctly.

--jkl
Vano Beridze
2013-09-09 18:48:48 UTC
Permalink
This is the first lines of /tmp/freetds.log

log.c:196:Starting log file for FreeTDS 0.91
on 2013-07-05 23:20:33 with debug flags 0x4fff.
iconv.c:330:tds_iconv_open(0x7fd771f79e10, UTF-8)
iconv.c:187:local name for ISO-8859-1 is ISO-8859-1
iconv.c:187:local name for UTF-8 is UTF-8
iconv.c:187:local name for UCS-2LE is UCS-2LE
iconv.c:187:local name for UCS-2BE is UCS-2BE
iconv.c:349:setting up conversions for client charset "UTF-8"
iconv.c:351:preparing iconv for "UTF-8" <-> "UCS-2LE" conversion
iconv.c:391:preparing iconv for "ISO-8859-1" <-> "UCS-2LE" conversion

As for UCS2 data, table column type is nvarchar(30) and as I know if the
column is nvarchar or nchar it is stored as UCS2

Vano Beridze
Software Developer
Silk Road Group
Post by James K. Lowden
On Tue, 03 Sep 2013 14:20:28 +0400
Post by Vano Beridze
The script works fine. it connects and gets the data. The only
problem I have is that data is shown as question marks, so it's not
converted to UTF-8. I've read all the information I could find on the
internet but can not solve the problem.
I would set TDSDUMP and examine the log. You're right that
FreeTDS would substitute question marks for any non-convertible
character. Perhaps the character set name you specified in
freetds.conf isn't exactly right, perhaps because you're linked to the
system iconv library.
The status of iconv is listed at the top of the log. If it says it set
the client charset to UTF-8, that's what it will use.
A different problem that sometimes crops up: sometimes the "Unicode"
data on the server isn't UCS2. In that event, the server will send the
data, but the iconv library will be unable to convert the characters
correctly.
--jkl
_______________________________________________
FreeTDS mailing list
FreeTDS at lists.ibiblio.org
http://lists.ibiblio.org/mailman/listinfo/freetds
James K. Lowden
2013-09-10 03:58:59 UTC
Permalink
On Mon, 09 Sep 2013 22:48:48 +0400
Post by Vano Beridze
This is the first lines of /tmp/freetds.log
log.c:196:Starting log file for FreeTDS 0.91
on 2013-07-05 23:20:33 with debug flags 0x4fff.
iconv.c:330:tds_iconv_open(0x7fd771f79e10, UTF-8)
iconv.c:187:local name for ISO-8859-1 is ISO-8859-1
iconv.c:187:local name for UTF-8 is UTF-8
iconv.c:187:local name for UCS-2LE is UCS-2LE
iconv.c:187:local name for UCS-2BE is UCS-2BE
iconv.c:349:setting up conversions for client charset "UTF-8"
iconv.c:351:preparing iconv for "UTF-8" <-> "UCS-2LE" conversion
iconv.c:391:preparing iconv for "ISO-8859-1" <-> "UCS-2LE" conversion
OK, you're using an iconv library, and UTF-8 for a client character
set.

If you select just one row with tsql, is it displayed correctly? If
not, does tsql print an error about not being able to convert?

If the FreeTDS can't convert the data received from the server to the
client's encoding, it raises an error. tsql will print that message if
it occurs. There are 3 possibilities:

1. tsql displays the characters correctly in the terminal. Hooray!
Go look into PHP.

2. tsql does not display the characters correctly, and produces a
message saying it encountered problems during conversion. That's an
iconv problem. Check your documentation and make sure iconv(1) works
on known data.

You can also execute a query like

select cast('hello' as nvarchar(5)) as hello

If tsql displays that correctly, you might have corrupt "Unicode"
data in the nvarchar column you're working with.

3. tsql does not display the characters correctly, but produces no
error message. If you can't read the characters and don't see an error,
then FreeTDS encountered no difficulty converting, but your system
can't actually display the data. Perhaps because of fonts or
something.
Post by Vano Beridze
As for UCS2 data, table column type is nvarchar(30) and as I know if
the column is nvarchar or nchar it is stored as UCS2
Figure out what the bit representation is for one row, and satisfy
yourself that it's little-endian 2-byte Unicode. You can use "select
cast( foo as varbinary )" to get a hex representation, or look at the
data packet in a TDSDUMP log.

HTH.

--jkl
Post by Vano Beridze
Post by James K. Lowden
On Tue, 03 Sep 2013 14:20:28 +0400
Post by Vano Beridze
The script works fine. it connects and gets the data. The only
problem I have is that data is shown as question marks, so it's not
converted to UTF-8. I've read all the information I could find on
the internet but can not solve the problem.
I would set TDSDUMP and examine the log. You're right that
FreeTDS would substitute question marks for any non-convertible
character. Perhaps the character set name you specified in
freetds.conf isn't exactly right, perhaps because you're linked to
the system iconv library.
The status of iconv is listed at the top of the log. If it says it
set the client charset to UTF-8, that's what it will use.
A different problem that sometimes crops up: sometimes the "Unicode"
data on the server isn't UCS2. In that event, the server will send
the data, but the iconv library will be unable to convert the
characters correctly.
--jkl
Vano Beridze
2013-09-14 09:57:53 UTC
Permalink
It was the first case.
tsql displayed characters correctly.

then I looked into php and it seems the problem was with odbc library.
I could not get it working and then switched to mssql library and
finally got it working.

Thank you for your time and attention.

Vano Beridze
Software Developer
Silk Road Group
Post by James K. Lowden
On Mon, 09 Sep 2013 22:48:48 +0400
Post by Vano Beridze
This is the first lines of /tmp/freetds.log
log.c:196:Starting log file for FreeTDS 0.91
on 2013-07-05 23:20:33 with debug flags 0x4fff.
iconv.c:330:tds_iconv_open(0x7fd771f79e10, UTF-8)
iconv.c:187:local name for ISO-8859-1 is ISO-8859-1
iconv.c:187:local name for UTF-8 is UTF-8
iconv.c:187:local name for UCS-2LE is UCS-2LE
iconv.c:187:local name for UCS-2BE is UCS-2BE
iconv.c:349:setting up conversions for client charset "UTF-8"
iconv.c:351:preparing iconv for "UTF-8" <-> "UCS-2LE" conversion
iconv.c:391:preparing iconv for "ISO-8859-1" <-> "UCS-2LE" conversion
OK, you're using an iconv library, and UTF-8 for a client character
set.
If you select just one row with tsql, is it displayed correctly? If
not, does tsql print an error about not being able to convert?
If the FreeTDS can't convert the data received from the server to the
client's encoding, it raises an error. tsql will print that message if
1. tsql displays the characters correctly in the terminal. Hooray!
Go look into PHP.
2. tsql does not display the characters correctly, and produces a
message saying it encountered problems during conversion. That's an
iconv problem. Check your documentation and make sure iconv(1) works
on known data.
You can also execute a query like
select cast('hello' as nvarchar(5)) as hello
If tsql displays that correctly, you might have corrupt "Unicode"
data in the nvarchar column you're working with.
3. tsql does not display the characters correctly, but produces no
error message. If you can't read the characters and don't see an error,
then FreeTDS encountered no difficulty converting, but your system
can't actually display the data. Perhaps because of fonts or
something.
Post by Vano Beridze
As for UCS2 data, table column type is nvarchar(30) and as I know if
the column is nvarchar or nchar it is stored as UCS2
Figure out what the bit representation is for one row, and satisfy
yourself that it's little-endian 2-byte Unicode. You can use "select
cast( foo as varbinary )" to get a hex representation, or look at the
data packet in a TDSDUMP log.
HTH.
--jkl
Post by Vano Beridze
Post by James K. Lowden
On Tue, 03 Sep 2013 14:20:28 +0400
Post by Vano Beridze
The script works fine. it connects and gets the data. The only
problem I have is that data is shown as question marks, so it's not
converted to UTF-8. I've read all the information I could find on
the internet but can not solve the problem.
I would set TDSDUMP and examine the log. You're right that
FreeTDS would substitute question marks for any non-convertible
character. Perhaps the character set name you specified in
freetds.conf isn't exactly right, perhaps because you're linked to
the system iconv library.
The status of iconv is listed at the top of the log. If it says it
set the client charset to UTF-8, that's what it will use.
A different problem that sometimes crops up: sometimes the "Unicode"
data on the server isn't UCS2. In that event, the server will send
the data, but the iconv library will be unable to convert the
characters correctly.
--jkl
_______________________________________________
FreeTDS mailing list
FreeTDS at lists.ibiblio.org
http://lists.ibiblio.org/mailman/listinfo/freetds
Loading...