Issues encoding Unicode strings in the Recognise function
The logic in your Recognise function has flaws. I suggest it try the modes in this order:
Numeric --> AlphaNumeric --> Kanji --> UTF-8
The purpose is to reduce the encoded size of the matrix.
However, your logic always returns UTF-8 for Unicode input (except Kanji).
For example, "123支花朵" is Chinese. Logically, if the caller does not specify an encoding name, the program should choose the encoding name "GB2312" or "GB18030" with ECI value 29 and 8-bit byte mode instead of UTF-8.
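
To make the suggestion concrete, here is a rough sketch of the order I have in mind, written in Python rather than the project's own language. It is not the library's code: `recognise`, `is_kanji`, and the candidate list are hypothetical names, and trying ISO-8859-1 before GB2312 is my own assumption. ECI 29 for GB2312 is the value mentioned above; ECI 3 and 26 are the ECI assignments for ISO-8859-1 and UTF-8 as I understand them.

```python
# Hypothetical sketch of the suggested recognition order (not the library's code).
NUMERIC = set("0123456789")
ALPHANUMERIC = set("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:")

# (Python codec name, ECI value), tried in order. The order is an assumption;
# UTF-8 comes last because it is usually the largest for CJK text.
EIGHT_BIT_CANDIDATES = [("iso-8859-1", 3), ("gb2312", 29), ("utf-8", 26)]


def is_kanji(text):
    """True if every character is a double-byte Shift JIS code in the QR Kanji ranges."""
    for ch in text:
        try:
            raw = ch.encode("shift_jis")
        except UnicodeEncodeError:
            return False
        if len(raw) != 2:
            return False  # single-byte characters such as digits are not Kanji mode
        value = (raw[0] << 8) | raw[1]
        if not (0x8140 <= value <= 0x9FFC or 0xE040 <= value <= 0xEBBF):
            return False
    return True


def recognise(text):
    """Return (mode, codec, eci), trying the most compact mode first."""
    if all(ch in NUMERIC for ch in text):
        return ("Numeric", None, None)
    if all(ch in ALPHANUMERIC for ch in text):
        return ("AlphaNumeric", None, None)
    if is_kanji(text):
        return ("Kanji", "shift_jis", None)
    # 8-bit byte mode: pick the first candidate encoding that can represent the text.
    for codec, eci in EIGHT_BIT_CANDIDATES:
        try:
            text.encode(codec)
        except UnicodeEncodeError:
            continue
        return ("EightBitByte", codec, eci)
    raise ValueError("text cannot be encoded with any candidate encoding")
```

With this order, "123支花朵" is recognised as 8-bit byte mode with GB2312, which needs 9 bytes for the string, where UTF-8 needs 12.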