Guessing a String's Encoding Under Ruby 1.9

Posted by Paul McMahon 02/09/2011 at 18h06

For our Japanese invoicing solution, 請求書.jp, we record the initial HTTP referrer for each user who signs up to our service. Search engines have standardized on the q parameter to represent a search query, such as 請求書テンプレート, so we use this parameter to guess which search query led a user to sign up. Unfortunately, search engines have not standardized on an encoding, and the query parameter can be encoded in any one of UTF-8, EUC-JP, or Shift_JIS. To work around this, we use the following Ruby 1.9 code:

This code allows us to try each of the three expected encodings, and then encode the result into UTF-8 for display within our admin interface.
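
A minimal sketch of this approach is shown here. It assumes the query has already been URL-decoded into a raw byte string. The constant `CANDIDATE_ENCODINGS` and the method name `guess_and_convert_to_utf8` are hypothetical names chosen for illustration. It relies only on `String#force_encoding`, `String#valid_encoding?`, and `String#encode`, all of which are part of Ruby 1.9.

```ruby
# Candidate encodings, tried in order (hypothetical constant name).
# UTF-8 goes first because it is the strictest of the three.
CANDIDATE_ENCODINGS = %w[UTF-8 EUC-JP Shift_JIS].freeze

# Hypothetical helper: returns a UTF-8 copy of +raw+ using the first
# candidate encoding under which the bytes are valid, or nil if none fit.
def guess_and_convert_to_utf8(raw)
  CANDIDATE_ENCODINGS.each do |name|
    # Relabel a copy of the bytes without changing them.
    candidate = raw.dup.force_encoding(name)
    next unless candidate.valid_encoding?

    begin
      # Transcode to UTF-8 for display.
      return candidate.encode("UTF-8")
    rescue EncodingError
      # Valid bytes, but no UTF-8 mapping: try the next candidate.
      next
    end
  end
  nil
end

# Usage: simulate a Shift_JIS query arriving as raw bytes.
raw_query = "\u8ACB\u6C42\u66F8".encode("Shift_JIS").force_encoding("BINARY")
puts guess_and_convert_to_utf8(raw_query)
```

The order of the candidates matters, because some byte sequences are valid in more than one of these encodings and the first match wins. The result is therefore a best guess rather than a guaranteed detection.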