Don Knuth's report on the TeX tuneup of 2021 was published in TUGboat. More references and info.
This page is a summary of the bugs in TeX fixed in the 2021 tune-up, with code examples that reproduce the bugs in the unfixed version (and no longer do in the fixed version). The list follows the order in errorlog.tex, plus two entries at the end that are not in that file.
Thanks to Phelype Oleinik for putting this together. Knuth is not responsible for this page. Please send corrections to webmaster@tug.org.
Summary: While writing paragraph tracing info, TeX would be writing to log_only, and if an error occurred at that point, the user wouldn't see the error message on the terminal and TeX would appear to be stuck (replying to the unseen error prompt would work normally).
Reproducing: Run TeX on a file (not from the command line; TeX needs an open .log file for this) that contains the line:
\tracingparagraphs=1 Press\hss return.\end
Original report. (By the way, if Xiaosa or anyone who knows him is reading this, please contact Don Knuth; he has checks for you.)
Summary: A certain combination of replies to TeX's prompt would make TeX try to interact while in \batchmode, thus trying to write to a closed IO stream, resulting in a segfault.
Reproducing: Create a file, say, invalid.in with:
\catcode`\^=7 \catcode`\^^?=15 \s^^?E 1 q v
then run tex -ini <invalid.in. Or run tex -ini and type each line followed by RET.
Summary: If you caused an error, then replied to TeX's prompt by inserting yet another error, and then asked TeX to open the editor with E, TeX would segfault trying to get the file name from the inserted text.
Reproducing: Create a file with an error, say:
\ERROR
then run TeX on that file, and at the error prompt reply:
I\AGAIN !
and this time reply to the prompt:
E
and TeX will segfault.
Summary: Before \jobname was set (more precisely, before the .log file was started) you could change the values of \year, \month, \day and \time, and the changed values would be written in the header line of the .log. \month was particularly bad because TeX prints the month name by taking three characters at a position computed from \month, so you could read arbitrary parts of TeX's memory or, with extreme enough values, cause a segfault.
Reproducing: For example, run (not from a file):
tex '\month=-100000 \end'
If you choose smaller values for \month, or change the other values, the .log will contain a bogus date.
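The index arithmetic behind this can be sketched in Python. This is only an illustrative model, assuming the month abbreviation is read from a 36-character string at the 1-based positions 3*month-2 through 3*month; the function names are hypothetical and not part of TeX.

```python
# Illustrative model only: the function names are hypothetical, not TeX's.
MONTHS = "JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC"


def month_indices(month):
    """Return the 1-based string positions read for a given month value."""
    return list(range(3 * month - 2, 3 * month + 1))


def month_name_checked(month):
    """Bounds-checked lookup: None when any position falls outside the string."""
    idx = month_indices(month)
    if idx[0] < 1 or idx[-1] > len(MONTHS):
        return None
    return "".join(MONTHS[k - 1] for k in idx)


if __name__ == "__main__":
    for m in (1, 12, 13, -100000):
        # Valid months map inside the string; anything else points outside it.
        print(m, month_indices(m), month_name_checked(m))
```

Without a bounds check, the positions computed for a value such as -100000 lie far outside the string, which is why an unchecked read touches unrelated memory.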
Summary: TeX would allow an implicit left brace in place of an explicit one in the #{ argument.
Reproducing:
\catcode`\{=1 \catcode`\}=2 \catcode`\#=6 \let\bgroup={ \def\foo#1#\bgroup(#1)} \show\foo \end
That shows a command delimited by \bgroup, which inserts another \bgroup at the end of the replacement text.
Summary: While scanning a parameter text that already had 9 parameters, TeX would let whatever token followed a further # become part of the parameter text.
Reproducing:
\catcode`\{=1 \catcode`\}=2 \catcode`\#=6 \catcode`\%=14
\nonstopmode % ignore errors while defining
\def\foo#1#2#3#4#5#6#7#8#9#}##{\show#9}
\errorstopmode
\show\foo
\foo........} }#
\end
That shows a command whose ninth parameter is delimited by }#, and then shows that an unbalanced } was grabbed as argument; use \tracingmacros=1 to see the } being grabbed.
Summary: After the File ended within \read error message, TeX could be holding garbage tokens in buffer, which could appear in the error context.
Reproducing: Create a file unbal.tex with a single line containing:
{
then run this:
\catcode`{=1 \catcode`}=2 \catcode`#=6
\openin1 unbal
\def\A#1#2#3#4#5#6#7#8#9{\read1to \x}
\def\B#1#2#3#4#5#6#7#8#9{\A#1#2#3#4#5#6#7#8#9 \relax}
\def\C#1#2#3#4#5#6#7#8#9{\B#1#2#3#4#5#6#7#8#9 \relax}
\def\D#1#2#3#4#5#6#7#8#9{\C#1#2#3#4#5#6#7#8#9 \relax}
\def\E#1#2#3#4#5#6#7#8#9{\D#1#2#3#4#5#6#7#8#9 \relax}
\E123456789
and the error would start with:
Runaway definition?
->{
! File ended within \read.
<read 1> {^^M#5#6#7#8#9{\D
The <read 1> context shown after the { is junk.
Summary: In a TFM file, a non-existent character is marked by its width index being zero, and TeX assumes that all other metrics of such a character are zero as well, but nothing enforced this.
When reading a character from a font, TeX would look only at its width index and take for granted that everything else was zero. So if a TFM was made in which a character's width index was zero but, for example, its italic correction index was not, that index would not be zeroed and the wrong italic correction would be used.
To reproduce this you need a bad TFM file with a character's width zero and other indexes non-zero (which I failed to produce).
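As a rough aid for spotting such an entry, the Python sketch below unpacks a single 4-byte char_info word and flags the inconsistent case. It assumes the standard TFM layout (width index in the first byte, height and depth indices packed into the second, italic index and tag packed into the third, remainder in the fourth). The function names are hypothetical; this is not TeX's own check, and it does not produce a bad TFM file.

```python
# Hypothetical helper functions for inspecting one TFM char_info word.

def unpack_char_info(word):
    """Split a 4-byte char_info word into its fields (standard TFM layout assumed)."""
    if len(word) != 4:
        raise ValueError("a char_info word is exactly 4 bytes")
    return {
        "width_index": word[0],
        "height_index": word[1] >> 4,    # high 4 bits of byte 2
        "depth_index": word[1] & 0x0F,   # low 4 bits of byte 2
        "italic_index": word[2] >> 2,    # high 6 bits of byte 3
        "tag": word[2] & 0x03,           # low 2 bits of byte 3
        "remainder": word[3],
    }


def is_inconsistent(word):
    """True when the width index is zero but another dimension index is not."""
    info = unpack_char_info(word)
    others = ("height_index", "depth_index", "italic_index")
    return info["width_index"] == 0 and any(info[k] != 0 for k in others)


if __name__ == "__main__":
    print(is_inconsistent(bytes([0, 0, 0, 0])))      # nonexistent and consistent
    print(is_inconsistent(bytes([0, 0, 4 << 2, 0])))  # width index 0, italic index 4
```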
Summary: A fraction (like {1\over2}) was written to produce an Inner math atom, but the fact that fractions are almost always enclosed in braces to delimit their scope makes the fraction an Ord atom. The only two cases where a fraction would remain an Inner atom were:
This was largely a change to The TeXbook, correcting the statements that said fractions were Inner math atoms; in addition, TeX itself was changed so that it no longer assigns t:=inner_noad.
Summary: The user's choice of \newlinechar would remain when TeX was printing the final statistics, so characters that matched the value of \newlinechar would cause a newline to be printed instead.
Reproducing: For example, run:
tex '\newlinechar=32 \end'
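The effect on the closing message can be pictured with a simplified Python model of character-by-character printing. The function name is hypothetical and the model ignores everything else TeX's print routines do. A value of -1 stands for "no newline character", which is TeX's convention for \newlinechar.

```python
# Simplified, hypothetical model of printing with a new-line character.

def tex_print(text, new_line_char):
    """Return text as it would appear when new_line_char forces line breaks."""
    out = []
    for ch in text:
        if ord(ch) == new_line_char:
            out.append("\n")  # the matching character is replaced by a line break
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    message = "No pages of output."
    print(tex_print(message, 32))   # every space starts a new line
    print(tex_print(message, -1))   # no newline character: printed unchanged
```

With \newlinechar=32 every space in the final message starts a new line, which matches the broken output described above.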
Summary: TeX would omit a (\tabskip) glue indication when showing an underfull alignment.
Reproducing: Run this:
\catcode`\{=1 \catcode`\}=2 \catcode`\&=4 \catcode`\#=6 \showboxdepth=1 \tracingonline=1 \tabskip=0pt plus10pt \halign to200pt{\hfil\cr \hbox to50pt{}&\hbox to60pt{}\cr} \end
and you will see the last line of the box typeout says
.\glue 0.0 plus 10.0
whereas it should say
.\glue(\tabskip) 0.0 plus 10.0
Summary: The code tries to store a value that's not in the declared range of the receiving variable. In particular, in the hyphenate function, when <Look for the word |hc[1..hn]|...> (page B392, module 930) is called, hn can already be 63, but then this module increments it to 64 for a while (to fit the cur_lang byte), which puts hn out of range for a small_number.
Reproducing: This isn't discoverable by running everyday TeX. DRF found this with his modified TeX with memory address checking. The code to trigger the bug is:
\lefthyphenmin=0 \righthyphenmin=0
\hyphenation{-a-b-c-d-e-f-g-h-i-j-k-l-m-n-o-p-q-r-s-t-u-v-w-x-y-z%
-a-b-c-d-e-f-g-h-i-j-k-l-m-n-o-p-q-r-s-t-u-v-w-x-y-z%
-a-b-c-d-e-f-g-h-i-j-k-l-m-n-o-p-q-r-s-t-u-v-w-x-y-z}
\showhyphens{abcdefghijklmnopqrstuvwxyzabcdefghijklm%
nopqrstuvwxyzabcdefghijklmnopqrstuvwxyz}
\end
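Why this goes unnoticed without range checking can be illustrated with a small Python sketch of a subrange type. It assumes small_number is the range 0..63, as the summary implies. The class is hypothetical and only mimics what a Pascal compiler with range checks enabled, or a memory-checking build like DRF's, would report.

```python
# Hypothetical stand-in for a Pascal subrange variable with range checking.

class SmallNumber:
    """Integer restricted to 0..63; assignments outside the range raise an error."""

    LOW, HIGH = 0, 63

    def __init__(self, value=0):
        self.value = self._check(value)

    @classmethod
    def _check(cls, value):
        if not cls.LOW <= value <= cls.HIGH:
            raise OverflowError(f"{value} is outside {cls.LOW}..{cls.HIGH}")
        return value

    def incr(self):
        """Increment in place, enforcing the declared range."""
        self.value = self._check(self.value + 1)


if __name__ == "__main__":
    hn = SmallNumber(63)  # the longest word length already fills the range
    try:
        hn.incr()         # the temporary step to 64 described in the summary
    except OverflowError as exc:
        print("range check failed:", exc)
```

A build without range checks simply stores 64 in the underlying byte, so the program keeps working and the violation stays invisible.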
For any discussion about these issues, or further reports to be listed here, please use the contact information on the main TeX bugs page here.