MySQL Query With Regex Unicode
Solution 1:
MySQL does not have \u escapes. Try to include the raw Unicode character in the query string, and pass it to MySQL in a utf8 connection. How you might do that depends on what language and connector you are using to talk to MySQL. Best would be to pass the pattern string in a parameter from your language's native Unicode string type if you have one; for example in Python-MySQLdb I can just do:
chars = u'[أإاآ]'
pattern = u'%sر%sء' % (chars, chars)
cursor = connection.cursor()
cursor.execute('SELECT * FROM work WHERE title REGEXP %s', [pattern])
(Note: no pipe characters are needed inside a regex character class.)
If you really can't get Unicode down your connection at all, MySQL does have a non-standard binary string escape which you could use to get the characters in through another encoding:
WHERE title REGEXP CAST(0x5bd8a3d8a5d8a7d8a25dd8b15bd8a3d8a5d8a7d8a25dd8a1 AS CHAR CHARACTER SET utf8) -- hex-encoded UTF-8 string
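To see where that hex literal comes from, here is a minimal Python sketch that rebuilds it from the pattern. The variable names are illustrative only, and the characters are written as \u escapes (which Python does support) so the code points are unambiguous.

```python
# Illustrative names; the code points are the four Alef variants, Reh and Hamza.
ALEFS = "\u0623\u0625\u0627\u0622"  # Alef with Hamza above, Hamza below, plain Alef, Alef with Madda
REH = "\u0631"
HAMZA = "\u0621"

# Same pattern as in the Python example: [alefs] Reh [alefs] Hamza
char_class = "[" + ALEFS + "]"
pattern = char_class + REH + char_class + HAMZA

# Encode the pattern as UTF-8, then hex-encode the bytes for a MySQL 0x... literal.
hex_literal = "0x" + pattern.encode("utf-8").hex()
print(hex_literal)
# prints 0x5bd8a3d8a5d8a7d8a25dd8b15bd8a3d8a5d8a7d8a25dd8a1
```

The leading 5b and the 5d bytes are the ASCII brackets of the character class; everything else is a two-byte UTF-8 sequence.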
Generally you want to avoid REGEXP, because it means any index on the title column will be ineffective and a full table scan will be forced.
One alternative would be a WHERE title IN (...) clause listing all 16 possible strings (4 Alef variants in each of 2 positions) that would match the expression.
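As a rough sketch of that alternative, the 16 strings can be generated rather than typed by hand. This assumes the title is exactly the four-character word, since IN compares whole values while an unanchored regex matches substrings. The helper name alef_variants and the commented-out cursor are hypothetical.

```python
from itertools import product

ALEFS = "\u0623\u0625\u0627\u0622"  # the four Alef variants
REH = "\u0631"
HAMZA = "\u0621"

def alef_variants():
    """Return every Alef + Reh + Alef + Hamza combination (4 * 4 = 16 strings)."""
    return [a + REH + b + HAMZA for a, b in product(ALEFS, repeat=2)]

titles = alef_variants()

# One %s placeholder per candidate string, so the values are still passed as parameters.
placeholders = ", ".join(["%s"] * len(titles))
sql = "SELECT * FROM work WHERE title IN (" + placeholders + ")"

# cursor.execute(sql, titles)  # cursor: a hypothetical open DB-API cursor
```

Unlike REGEXP, an IN list of constants can use an index on title.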
(The most performant approach would be to use a database collation which already treats all four characters as equal. I'm not aware of a collation that matches that sloppily though.)
Solution 2:
The UTF-8 encodings of those 4 variants of Alef are D8A3, D8A5, D8A7 and D8A2. So,
WHERE HEX(title) REGEXP '^(..)*D8(A3|A5|A7|A2)'
will check for the presence of any of them. The ^(..)* matches any number of pairs of characters (hex digits, in this case) at the beginning of title, which keeps the match aligned on byte boundaries; the rest of the pattern then looks for any of those 2-byte UTF-8 codes.
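The alignment prefix matters because a hex string can contain "D8A3" straddling two bytes. The following Python sketch imitates HEX() with a hypothetical helper, mysql_hex, and uses Python's re module only to illustrate the pattern logic; it is not MySQL's regex engine.

```python
import re

def mysql_hex(text):
    """Mimic HEX() on a utf8 string: uppercase hex of the UTF-8 bytes."""
    return text.encode("utf-8").hex().upper()

ALIGNED = re.compile(r"^(..)*D8(A3|A5|A7|A2)")  # pattern from the answer
UNALIGNED = re.compile(r"D8(A3|A5|A7|A2)")      # same pattern without the prefix

# A title containing a plain Alef after two ASCII letters: hex is 6162D8A7.
with_alef = mysql_hex("ab\u0627")

# No Alef here: U+034A encodes as CD 8A and "0" as 30, giving CD8A30.
# The digits D8A3 appear, but they start in the middle of the first byte.
no_alef = mysql_hex("\u034a0")

print(bool(ALIGNED.search(with_alef)))   # True
print(bool(UNALIGNED.search(no_alef)))   # True, a false positive
print(bool(ALIGNED.search(no_alef)))     # False, as it should be
```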
This might be what you are striving for:
$SQL = "SELECT * FROM work
WHERE HEX(title)
REGEXP '^(..)*D8(A2|A3|A5|A7)D8B1D8(A2|A3|A5|A7)D8A1'";
^(..)* skips over an even number of hex characters (to keep the match byte-aligned).
D8(A2|A3|A5|A7) is the UTF-8 encoding of the 4 Alefs.
D8B1 is Reh.
D8A1 is Hamza.
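If the hex pattern has to be produced for other words, it can be assembled from the characters themselves instead of looking up each code by hand. This is only a sketch: hex_utf8, hex_class and build_hex_pattern are hypothetical helpers, and the alternation is written out in full rather than factored as D8(...), which matches the same strings.

```python
import re

def hex_utf8(ch):
    """Uppercase hex of a character's UTF-8 bytes, as HEX() would show it."""
    return ch.encode("utf-8").hex().upper()

def hex_class(chars):
    """Alternation over several characters, e.g. (D8A3|D8A5)."""
    return "(" + "|".join(hex_utf8(c) for c in chars) + ")"

def build_hex_pattern(parts):
    """Each part is a string: one character is literal, several form a class."""
    body = "".join(hex_class(p) if len(p) > 1 else hex_utf8(p) for p in parts)
    return "^(..)*" + body  # prefix keeps the match byte-aligned

ALEFS = "\u0623\u0625\u0627\u0622"
pattern = build_hex_pattern([ALEFS, "\u0631", ALEFS, "\u0621"])
print(pattern)

# Quick check with Python's re module (an illustration, not MySQL's engine).
word = "\u0627\u0631\u0622\u0621"    # Alef, Reh, Alef with Madda, Hamza
other = "\u0627\u0631\u0628\u0621"   # Beh in third position, should not match
print(bool(re.search(pattern, word.encode("utf-8").hex().upper())))
print(bool(re.search(pattern, other.encode("utf-8").hex().upper())))
```

The resulting string would then be passed as the REGEXP argument against HEX(title), ideally as a query parameter.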