[tex-live] TL expl3 update broke a mwe for me
Ingo Krabbe
ikrabbe.ask at gmail.com
Sun Jan 3 08:34:17 CET 2016
> ! Undefined control sequence.
> <argument> ...or: for example, they allow "MASSE" and "MaÃ
> e" to match.
> l.4195 \__unicode_map_inline:n { CaseFolding.txt }
This looks like an encoding error. It would help if you fed the strange output to od or xxd, for example, so we can see the actual bytes.
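If od or xxd is not at hand, a rough Python equivalent is to print the encoded bytes of the suspicious string in hex. The helper name hex_bytes is hypothetical, and the sketch assumes the text is held as a Python string that gets encoded as UTF-8:

```python
def hex_bytes(text: str, encoding: str = "utf-8") -> str:
    """Return the encoded bytes of text as space-separated hex pairs, similar to xxd/od."""
    return " ".join(f"{b:02X}" for b in text.encode(encoding))

# A correctly encoded "Maße" contains the two-byte sequence C3 9F for "ß".
print(hex_bytes("Maße"))              # 4D 61 C3 9F 65
# The garbled form ("Ã" followed by the control character U+009F) is twice as long.
print(hex_bytes("Ma\u00c3\u009fe"))   # 4D 61 C3 83 C2 9F 65
```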
Your non-ASCII sequence seems to be C3 83 C2 9F, which looks like a double UTF-8 encoding or something similar. I would bet that the encoding of your mail, of your system, or of the CaseFolding.txt file is wrong.
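The suspected double encoding is easy to reproduce. The sketch below assumes that somewhere the UTF-8 bytes of "ß" were misread as Latin-1 (ISO-8859-1) and then encoded as UTF-8 a second time; with Windows-1252 instead, byte 9F would have become "Ÿ" (C5 B8), so Latin-1 fits the observed C2 9F better:

```python
# Correct UTF-8 bytes for "ß" (U+00DF).
once = "ß".encode("utf-8")            # b'\xc3\x9f'
# Assumption: these bytes were misread as Latin-1, giving two characters,
# "Ã" (U+00C3) and the invisible control character U+009F ...
misread = once.decode("latin-1")
# ... which were then encoded as UTF-8 a second time.
twice = misread.encode("utf-8")
print(twice.hex(" ").upper())         # C3 83 C2 9F
```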
With your bytes above written in binary form, you have:
11000011 10000011
and
11000010 10011111
which quickly decode into Unicode code points through the (guessed) UTF-8 encoding rules:
01. x in [000000.00000000.0bbbbbbb] → 0bbbbbbb
10. x in [000000.00000bbb.bbbbbbbb] → 110bbbbb, 10bbbbbb
11. x in [000000.bbbbbbbb.bbbbbbbb] → 1110bbbb, 10bbbbbb, 10bbbbbb
100. x in [0bbbbb.bbbbbbbb.bbbbbbbb] → 11110bbb, 10bbbbbb, 10bbbbbb, 10bbbbbb
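The four encoding rules can be sketched as a small encoder for a single code point, cross-checked against Python's built-in UTF-8 codec. This is a hypothetical illustration, not a complete implementation (it does not reject surrogates); note that the four-byte form starts with 11110bbb:

```python
def encode_utf8(cp: int) -> bytes:
    """Encode one code point with the four UTF-8 rules (no surrogate check)."""
    if cp < 0x80:        # rule 01: 7 bits  -> 0bbbbbbb
        return bytes([cp])
    if cp < 0x800:       # rule 10: 11 bits -> 110bbbbb 10bbbbbb
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # rule 11: 16 bits -> 1110bbbb 10bbbbbb 10bbbbbb
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    if cp < 0x110000:    # rule 100: 21 bits -> 11110bbb 10bbbbbb 10bbbbbb 10bbbbbb
        return bytes([0xF0 | (cp >> 18),
                      0x80 | ((cp >> 12) & 0x3F),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    raise ValueError(f"code point out of range: {cp:#x}")

# One character per rule, compared with the built-in encoder.
for ch in ("A", "ß", "€", "😀"):
    assert encode_utf8(ord(ch)) == ch.encode("utf-8")

print(encode_utf8(0xDF).hex(" "))  # c3 9f
```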
Here we only need the 2nd (10) rule:
decode_utf8(11000011 10000011) = 000 1100 0011
decode_utf8(11000010 10011111) = 000 1001 1111
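Both decodings can be checked with a small helper for the two-byte rule. The name decode_utf8_2 is hypothetical, and the sketch skips the overlong-form check a strict decoder would also perform:

```python
def decode_utf8_2(b1: int, b2: int) -> int:
    """Decode a two-byte UTF-8 sequence (rule 10): 110xxxxx 10yyyyyy -> xxxxxyyyyyy."""
    if b1 >> 5 != 0b110 or b2 >> 6 != 0b10:
        raise ValueError("not a two-byte UTF-8 sequence")
    return ((b1 & 0b00011111) << 6) | (b2 & 0b00111111)

print(hex(decode_utf8_2(0b11000011, 0b10000011)))  # 0xc3
print(hex(decode_utf8_2(0b11000010, 0b10011111)))  # 0x9f
```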
The resulting bytes C3 9F again form a UTF-8 sequence (guessed again):
decode_utf8(11000011 10011111) = 1101 1111 = DF
Unicode U+00DF = ß (LATIN SMALL LETTER SHARP S)
So the message should read: they allow "MASSE" and "Maße" to match.
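The whole chain can be undone in a few lines: decode the bytes once, reinterpret the resulting code points (all below 0x100, which is the assumption that makes the Latin-1 step valid) as bytes, and decode again. The last line uses Python's str.casefold, which applies full Unicode case folding (ß folds to "ss"), to show why that sentence in CaseFolding.txt mentions the two spellings:

```python
raw = bytes.fromhex("C3 83 C2 9F")        # bytes seen in the error message
step1 = raw.decode("utf-8")               # first decode -> code points C3, 9F
print([f"{ord(c):02X}" for c in step1])   # ['C3', '9F']
step2 = step1.encode("latin-1")           # code points below 0x100 -> bytes C3 9F
print(step2.decode("utf-8"))              # ß
# Full case folding maps both spellings to "masse".
print("MASSE".casefold() == "Maße".casefold())  # True
```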
First shot: what is your system encoding? Most systems now use UTF-8. Check your locale by simply typing locale. This is the output on my system:
# locale
LANG=en_US.UTF-8
LC_CTYPE=de_DE.UTF-8
LC_NUMERIC=de_DE.UTF-8
LC_TIME=de_DE.UTF-8
LC_COLLATE=de_DE.UTF-8
LC_MONETARY=de_DE.UTF-8
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
Try your example with a UTF-8 system encoding.
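The locale output can also be checked mechanically. The hypothetical helper below takes the text printed by locale (pasted in, for example) and lists the variables that are set to something other than a UTF-8 locale; treating any value ending in "UTF-8" or "utf8" as UTF-8 is a simplifying assumption:

```python
def non_utf8_locale_vars(locale_output: str) -> list[str]:
    """Return names of locale variables whose value is set but not a UTF-8 locale."""
    bad = []
    for line in locale_output.splitlines():
        name, sep, value = line.partition("=")
        if not sep:
            continue  # skip lines that are not NAME=value, e.g. a shell prompt
        value = value.strip().strip('"')
        # Simplifying assumption: "UTF-8" and "utf8" spellings both count as UTF-8.
        if value and not value.upper().replace("-", "").endswith("UTF8"):
            bad.append(name.strip())
    return bad

sample = """# locale
LANG=en_US.UTF-8
LC_CTYPE=de_DE.ISO-8859-1
LC_MESSAGES="en_US.UTF-8"
LC_ALL="""
print(non_utf8_locale_vars(sample))  # ['LC_CTYPE']
```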
regards
ingo