Wednesday, July 16, 2008

Non-English Chars in query string

Non-English characters passed through a query string cannot be rebuilt in their original form unless some sort of encoding/decoding is applied to them.
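As a rough illustration of that encoding/decoding step, here is a minimal sketch using Ruby's standard `uri` library. It percent-encodes the UTF-8 bytes of each non-ASCII character and then decodes them back. It assumes a recent Ruby, and `encode_param`/`decode_param` are hypothetical helper names, not part of Rails.

```ruby
# encoding: utf-8
require 'uri'

# Hypothetical helper: percent-encode a value for use in a query string.
# Each non-ASCII character becomes the %XX escapes of its UTF-8 bytes.
def encode_param(value)
  URI.encode_www_form_component(value)
end

# Hypothetical helper: decode a query string value back into a UTF-8 string.
def decode_param(raw)
  URI.decode_www_form_component(raw)
end

encoded = encode_param('Bålåji')
puts encoded                 # prints "B%C3%A5l%C3%A5ji"
puts decode_param(encoded) == 'Bålåji'   # prints "true"
```

Because both sides agree on UTF-8, the round trip gives back exactly the original characters.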

In my project I had to receive Danish characters sent through the query string and store them in the database. The problem while saving the data was that the Danish characters were either substituted by special characters (like ? or %) or ignored entirely.
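Symptoms like these can be reproduced in plain Ruby. The sketch below is only an approximation and does not claim to follow the exact code path inside Rails or the database driver. It assumes the '?' substitution comes from converting into an encoding that has no 'å' (US-ASCII is used here purely as an example), and that the garbling comes from reading UTF-8 bytes as if they were ISO-8859-1.

```ruby
# encoding: utf-8
name = 'Bålåji'

# Converting to an encoding without 'å' and asking Ruby to replace
# undefined characters substitutes '?' for each one.
ascii = name.encode('US-ASCII', undef: :replace)
puts ascii     # prints "B?l?ji"

# Reading the UTF-8 bytes as if they were ISO-8859-1 turns each 'å'
# (bytes C3 A5) into the two characters 'Ã¥'.
garbled = name.dup.force_encoding('ISO-8859-1').encode('UTF-8')
puts garbled   # prints "BÃ¥lÃ¥ji"
```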

After a bit of R&D I found that the query string values can be rebuilt with the proper character-encoding conversions. I used Ruby's Iconv library to achieve this.

An inside look

Usually we set SQL Server to the appropriate language collation. For Danish support, for example, you would need the Danish_Norwegian_CI_AS collation enabled.

Rails, however, is not equipped by default to distinguish between language settings; it normally works in UTF-8.

Scenario: We need to check the uniqueness of the user name column in the users table, which can also contain Danish characters. (This is just an example; this validation can also be done at the model level itself.)

My check for an existing record using the find method failed.


Problem: Rails converted the string (Bålåji) to UTF-8, forming a new string (substituting å with some special characters). As a result my find method returned a blank value, as if the record did not exist in the database and we could proceed with creating a new user.
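One way to see why such a lookup can miss is to compare the raw bytes of the same name in the two encodings. This is a sketch under an assumption the post does not state: that the stored value uses single-byte ISO-8859-1 bytes while the incoming value is UTF-8. It needs Ruby 2.0 or later for `String#b`.

```ruby
# encoding: utf-8
utf8_name   = 'Bålåji'                       # the UTF-8 value Rails works with
latin1_name = utf8_name.encode('ISO-8859-1') # assumed storage encoding

puts utf8_name.bytes.map { |b| format('%02X', b) }.join(' ')
# prints "42 C3 A5 6C C3 A5 6A 69"
puts latin1_name.bytes.map { |b| format('%02X', b) }.join(' ')
# prints "42 E5 6C E5 6A 69"

# Compared byte for byte, the two values differ, even though
# both represent the same name.
puts utf8_name.b == latin1_name.b   # prints "false"
```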

Solution: To avoid this issue, I deliberately converted the input string to ISO-8859-1 so as to preserve its original encoding.

This can be done with:

require 'iconv'   # Iconv ships with Ruby 1.8 and 1.9
Iconv.conv('ISO-8859-1//IGNORE', 'UTF-8', STRING)

This converts a string from UTF-8 to ISO-8859-1. The //IGNORE suffix asks iconv to drop any character that has no ISO-8859-1 equivalent.
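Iconv was deprecated in Ruby 1.9.3 and removed from the standard library in Ruby 2.0, where it survives only as a separate gem. On newer Rubies a roughly equivalent conversion can use `String#encode` in place of Iconv. The sketch below makes two assumptions. First, `to_latin1` is a hypothetical helper name. Second, the options `undef: :replace, replace: ''` are used to imitate the dropping behaviour of //IGNORE.

```ruby
# encoding: utf-8

# Hypothetical helper: convert a UTF-8 string to ISO-8859-1, silently
# dropping characters ISO-8859-1 cannot represent (similar in spirit to
# Iconv's //IGNORE suffix).
def to_latin1(str)
  str.encode('ISO-8859-1', 'UTF-8',
             invalid: :replace, undef: :replace, replace: '')
end

latin1 = to_latin1('Bålåji €5')   # '€' is not in ISO-8859-1, so it is dropped
puts latin1.encoding              # prints "ISO-8859-1"
puts latin1.bytesize              # prints "8"
```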

Now User.find_by_name('Bålåji') returns the correct match.

These conversions can also be used in cases such as:

> Importing values from a TXT or CSV file.

> Comparing values in a standalone Ruby script.
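For the import case, a recent Ruby can also transcode while reading instead of converting each value afterwards. The sketch below assumes a simple comma-separated file saved in ISO-8859-1 with no quoted fields. Real CSV data should go through a proper CSV parser. The file contents and the `read_latin1_rows` helper are hypothetical. Note that this sketch compares in UTF-8, which is the reverse direction of the Iconv call above.

```ruby
# encoding: utf-8
require 'tempfile'

# Hypothetical helper: read a simple comma-separated file stored in ISO-8859-1,
# transcoding each line to UTF-8 as it is read ('external:internal' encoding).
def read_latin1_rows(path)
  File.readlines(path, chomp: true, encoding: 'ISO-8859-1:UTF-8')
      .map { |line| line.split(',') }
end

rows = Tempfile.create(['users', '.csv']) do |file|
  file.binmode
  file.write("id,name\n1,Bålåji\n".encode('ISO-8859-1'))  # sample Latin-1 data
  file.flush
  read_latin1_rows(file.path)
end

name = rows[1][1]
puts name.encoding      # prints "UTF-8"
puts name == 'Bålåji'   # prints "true"
```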