[texhax] makeindex breaks up index group on a capitalized entry

geolsoft at mail.ru
Mon Aug 23 15:10:26 CEST 2004


On Mon, Aug 23, 2004 at 12:57:23PM +0200, Stepan Kasal wrote:
> I have extracted the makeindex sources from the huge
> tetex-src-2.0.2.tar.gz, so if you want to experiment with the source,
> you can get it---it's just 61 KB, and makeindex is very independent
> of other tetex components.

and later:

> You can try this:
> 
> #define TOLOWER(C) ( (unsigned char)tolower((unsigned char)(C)) )
> 
> and thus eliminate the usage of isupper().
> 
> Or you can hardcode your own version of TOLOWER or first_letter,
> suitable for KOI8.
> This way, you should get a working makeindex program.


I did that and experimented a bit.  I had actually already
written a small test program:


#include <ctype.h>
#include <stdio.h>
#include <locale.h>


int main(void)
{
    setlocale(LC_ALL, "");
    /* cast: a KOI8 byte can be negative as a plain char */
    putchar(tolower((unsigned char)'M'));
    return 0;
}


where the parameter to tolower() was actually an uppercase
Cyrillic `M', and it did downcase it (with
LC_ALL=ru_RU.KOI8-R).  But when I added:


#define TOLOWER(C) ( (unsigned char)tolower((unsigned char)(C)) )


to the makeindex source, as you recommended, it still did
not do any good.  So I checked the source you provided for
makeindex, and this is what I came up with:

- sortid.c sets the current locale only temporarily (for
  LC_COLLATE) while doing the sorting, and restores the old
  locale afterwards;

- genind.c does not set the locale at all when generating
  the output.

Thus, tolower() presumably still runs in the default "C"
locale during index generation and leaves the Cyrillic
capitals alone, hence the breaking of the group.
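
To illustrate the point, here is a minimal sketch.  The
helper lower_in_locale() is hypothetical (it is not part of
makeindex), and the byte values 0xED and 0xCD are my reading
of the KOI8-R table for the uppercase and lowercase Cyrillic
em:

```c
#include <ctype.h>
#include <locale.h>

/* Assumed KOI8-R byte values, for illustration only. */
#define KOI8_CAP_EM   0xED   /* uppercase Cyrillic em */
#define KOI8_SMALL_EM 0xCD   /* lowercase Cyrillic em */

/* Hypothetical helper: lowercase one byte under the named LC_CTYPE
   locale.  Returns -1 if that locale cannot be selected.  Note that
   it leaves LC_CTYPE switched; it is a demonstration, not a fix. */
int lower_in_locale(const char *locale_name, unsigned char c)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;
    return tolower(c);
}
```

In the "C" locale, lower_in_locale("C", KOI8_CAP_EM) hands
the byte back unchanged, because only A-Z count as uppercase
there.  With "ru_RU.KOI8-R" installed, the same call with
that locale name should return KOI8_SMALL_EM.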

Below I include the patch, which seems to fix the problem
for me.  Note that it sets the locale for LC_CTYPE and then
restores it for each of the index entries.  I am not sure
whether this could be hoisted to a higher level for
efficiency without breaking something else.  Could you
please look at it, or recommend somebody else to whom I
should submit it?
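
One caveat on the save/restore step: as far as I understand
the C standard, setlocale(category, "") returns a string for
the locale just selected, not the one that was in effect
before, and the previous value has to be queried with a NULL
second argument and copied before the next setlocale() call.
A minimal sketch of that pattern follows; tolower_env_locale()
is a hypothetical helper, not a function in makeindex:

```c
#include <ctype.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: lowercase c using the LC_CTYPE locale taken
   from the environment, then put the previous LC_CTYPE back. */
int tolower_env_locale(int c)
{
    const char *cur = setlocale(LC_CTYPE, NULL);  /* query, no change */
    char *saved = NULL;
    int result;

    /* Copy the name: a later setlocale() call may overwrite it. */
    if (cur != NULL) {
        saved = malloc(strlen(cur) + 1);
        if (saved != NULL)
            strcpy(saved, cur);
    }

    setlocale(LC_CTYPE, "");              /* locale from the environment */
    result = tolower((unsigned char)c);

    if (saved != NULL) {
        setlocale(LC_CTYPE, saved);       /* restore the previous locale */
        free(saved);
    }
    return result;
}
```

Whether doing this once per index entry is too slow, I have
not measured.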


Many thanks,
Oleg Katsitadze


P.S.  I also posted this message to tex-eplain at tug.org,
tex-k at tug.org, and texhax at tug.org, in case somebody has any
comments to make.


--- makeindexk/genind.c	2002-10-02 15:19:22.000000000 +0300
+++ makeindexk.new/genind.c	2004-08-23 15:59:36.000000000 +0300
@@ -28,6 +28,10 @@
 #include    "mkind.h"
 #include    "genind.h"
 
+#ifdef HAVE_LOCALE_H
+#include <locale.h>
+#endif
+
 static FIELD_PTR curr = NULL;
 static FIELD_PTR prev = NULL;
 static FIELD_PTR begin = NULL;
@@ -219,6 +223,10 @@
 {
     int    let;
     FIELD_PTR ptr;
+#ifdef HAVE_SETLOCALE
+    char *prev_locale;
+    prev_locale = setlocale(LC_CTYPE, "");
+#endif
 
     if (in_range) {
 	ptr = curr;
@@ -246,6 +254,10 @@
 	make_item(NIL);
     } else
 	make_item(delim_t);
+
+#ifdef HAVE_SETLOCALE
+    setlocale(LC_CTYPE, prev_locale);
+#endif
 }
 
 


