Discussion:
[freetds] FreeTDS + SQL Server 2012 + UTF-16LE
John Anderson
2014-08-29 16:50:20 UTC
Hello! I'm wondering if you have any tips on how to work with characters
above 0xFFFF. When we try to insert or update a row containing one, the
statement is truncated at that character, which produces invalid syntax.

For example, this character: \U0001f44d

We've tracked this down to FreeTDS converting the query to UCS-2 rather
than UTF-16LE, which is what SQL Server 2012 uses.
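
To illustrate why UCS-2 cannot carry this character, here is a minimal Python 3 sketch of the standard UTF-16 surrogate-pair calculation. The helper name utf16_code_units is made up for this illustration and is not part of FreeTDS or pymssql. UCS-2 has only single 16-bit code units, so a code point that needs two units has no UCS-2 form.

```python
def utf16_code_units(ch):
    """Return the UTF-16 code units (as ints) for one character.

    Hypothetical helper for illustration only; assumes Python 3,
    where a str holds full Unicode code points.
    """
    cp = ord(ch)
    if cp <= 0xFFFF:
        # Fits in one 16-bit unit, so UCS-2 and UTF-16 agree here.
        return [cp]
    # Above 0xFFFF: split the 20-bit offset into a high/low surrogate pair.
    offset = cp - 0x10000
    return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]


units = utf16_code_units("\U0001f44d")
print([hex(u) for u in units])  # ['0xd83d', '0xdc4d']
```

Written little-endian, those two units are the four bytes 3d d8 4d dc.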

For example if we say:

INSERT INTO table(name) VALUES(N' Hello \U0001f44d')

We get the error:

Unclosed quotation mark after the character string 'Hello '

because it stops processing the rest of the query.

Is there a freetds.conf or environment setting we can use to tell FreeTDS to
use UTF-16LE instead of UCS-2 so that we can accept these characters?

As a temporary workaround we've started sending the raw UTF-16LE bytes to
FreeTDS, but we would prefer not to have to do this.
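
The thread does not show how the raw bytes are being sent. As a different technique, and only a sketch, one could keep the query text pure ASCII and let the server assemble the string from its UTF-16 code units with the T-SQL NCHAR() function. The helper name to_nchar_expr is hypothetical, the sketch assumes Python 3, and it has not been tested against FreeTDS.

```python
import struct


def to_nchar_expr(text):
    """Build an ASCII-only T-SQL expression that evaluates to `text`.

    Hypothetical helper for illustration: each UTF-16 code unit becomes
    one NCHAR(n) call, so a character above 0xFFFF becomes two calls
    (its high and low surrogate) concatenated with +.
    """
    if not text:
        return "N''"
    data = text.encode("utf-16-le")
    # Unpack the bytes as little-endian unsigned 16-bit code units.
    units = struct.unpack("<%dH" % (len(data) // 2), data)
    return " + ".join("NCHAR(%d)" % u for u in units)


# The thumbs-up character from the example above.
print(to_nchar_expr("\U0001f44d"))  # NCHAR(55357) + NCHAR(56397)
```

The resulting expression would replace the N'...' literal in the INSERT statement. Because nothing above 0x7F reaches the client library, its UCS-2 conversion should have nothing to truncate, though this is an assumption rather than something confirmed in this thread.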

Thanks,
John
Frediano Ziglio
2014-08-29 16:56:19 UTC
Post by John Anderson
INSERT INTO table(name) VALUES(N' Hello \U0001f44d')
How are you using this string?
Frediano
John Anderson
2014-08-29 17:07:16 UTC
Post by Frediano Ziglio
Post by John Anderson
INSERT INTO table(name) VALUES(N' Hello \U0001f44d')
How are you using this string?
Sorry, this is the Python representation of the string, which is a "Thumbs
Up". Its UTF-8 hex representation is 0xf09f918d, and what needs to be
stored in the database is the UTF-16LE hex 0x3dd84ddc.

This is the character:

http://www.fileformat.info/info/unicode/char/1F44D/index.htm

But any character above 0xFFFF will cause the issue.

So the actual query ends up looking like N'Hello <display of a thumbs up>'
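
The two hex values quoted above can be checked with a few lines of Python 3 (bytes.hex() needs 3.5 or later). This is just a verification sketch using the standard codecs, and the variable names are arbitrary.

```python
ch = "\U0001f44d"  # THUMBS UP SIGN, code point U+1F44D

utf8_hex = ch.encode("utf-8").hex()
utf16le_hex = ch.encode("utf-16-le").hex()

print(utf8_hex)     # f09f918d
print(utf16le_hex)  # 3dd84ddc

# Four bytes in UTF-16LE means two 16-bit code units (a surrogate pair),
# which is one more than a single UCS-2 unit can hold.
print(len(ch.encode("utf-16-le")) // 2)  # 2
```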
_______________________________________________
FreeTDS mailing list
FreeTDS at lists.ibiblio.org
http://lists.ibiblio.org/mailman/listinfo/freetds
Marc Abramowitz
2014-08-30 21:16:37 UTC
Interesting. John, I wonder if you can write a little C program that illustrates the bug? I think that the FreeTDS authors are most comfortable in C, so this will probably illustrate the problem best for them and show beyond a shadow of a doubt that there's nothing happening in pymssql.

-Marc
http://marc-abramowitz.com
Sent from my iPhone 4S