[tex-live] [LONG] Improving TeX package classification and the associated documentation

Florent Rougon f.rougon at free.fr
Mon Jul 2 17:46:12 CEST 2007


Hi,

"George N. White III" <gnwiii at gmail.com> wrote:

>> > Furthermore I would prefer *not* to have new stuff on CTAN.
>>
>> I'm not sure what the problem with "new stuff" is, but...
>
> You have to be careful about generating more work for current
> maintainers and more confusion for google users getting hits on files
> they don't understand, e.g., when searching for "hyperref manual".

What do you mean? Surely, I don't intend to hijack hyperref... I would
like the metadata to either be part of "definitive packages" on CTAN, or
of the catalogue.

The downside when it is in the catalogue is that it isn't accessible to
packages not yet in CTAN: packages that you would install in /usr/local
for instance. OTOH, if there is a known place in the TDS to put the
metadata of each package, then such third-party packages can be
trivially found by the tool I'm thinking about if installed properly.

There would be for instance

  /usr/share/texmf/metadata/g/geometry.xml
  /usr/share/texmf/metadata/h/hyperref.xml
  etc.

and each of these files would contain relative paths from the base of
the TEXMF tree (here, /usr/share/texmf/) for each documentation file
belonging to the package.
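
To make the layout concrete, here is a minimal Python sketch of the path
arithmetic. Nothing of the sort exists yet: the function names and the
example documentation path are made up for illustration, and I am
assuming the subdirectory is simply the first letter of the package
name, as in the two examples above.

```python
from pathlib import PurePosixPath


def metadata_path(texmf_root: str, package: str) -> PurePosixPath:
    """Return <root>/metadata/<first letter>/<package>.xml (assumed layout)."""
    if not package:
        raise ValueError("package name must not be empty")
    return (PurePosixPath(texmf_root) / "metadata"
            / package[0].lower() / f"{package}.xml")


def resolve_doc(texmf_root: str, relative_doc: str) -> PurePosixPath:
    """Turn a path stored relative to the TEXMF base into an absolute one."""
    return PurePosixPath(texmf_root) / relative_doc


print(metadata_path("/usr/share/texmf", "geometry"))
# prints /usr/share/texmf/metadata/g/geometry.xml

# Hypothetical relative path, as it might be stored inside geometry.xml.
print(resolve_doc("/usr/share/texmf", "doc/latex/geometry/geometry.pdf"))
```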

This is a slight modification of the proposal I made to Norbert in
<873b06dffx.fsf at florent.maison>, where we don't put all the metadata in
one big file generated by the TL install scripts, but instead the
metadata is put in the relevant TEXMF trees and in separate files, to
ease the installation of individual packages (of course, I could rather
easily generate such files from the big file I mentioned in the previous
mail...).

This way, you can also have:

  /usr/local/share/texmf/metadata/m/mypackage.xml

containing paths relative to /usr/local/share/texmf, and that would
allow mypackage to automatically register itself with the
cataloguing/documentation system I'm proposing here.
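
As a rough sketch of how the tool could discover such files in several
TEXMF trees, assuming a purely hypothetical XML format (a <package>
element with a "name" attribute and <doc path="..."/> children; the real
format is still to be defined):

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def collect_docs(texmf_roots):
    """Map each package name to the absolute paths of its documentation.

    Every metadata/<letter>/<package>.xml file found under a root is read,
    and the relative paths it lists are resolved against that same root.
    """
    docs = {}
    for root in map(Path, texmf_roots):
        for meta in sorted(root.glob("metadata/*/*.xml")):
            package = ET.parse(meta).getroot()
            # Fall back to the file name if the (assumed) attribute is absent.
            name = package.get("name", meta.stem)
            for doc in package.findall("doc"):
                rel = doc.get("path")
                if rel:
                    docs.setdefault(name, []).append(root / rel)
    return docs


def demo():
    """Build a throwaway tree containing one locally installed package."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "texmf"
        meta = root / "metadata" / "m" / "mypackage.xml"
        meta.parent.mkdir(parents=True)
        meta.write_text(
            '<package name="mypackage">'
            '<doc path="doc/latex/mypackage/mypackage.pdf"/>'
            "</package>",
            encoding="utf-8",
        )
        return root, collect_docs([root])


root, found = demo()
print(sorted(found))
```

Dropping mypackage.xml into the right directory is then all the
"registration" a third-party package needs.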

> You have to work with packages as they come, the best you can do is:
>
> 1) encourage authors to include metadata (in a useful form) -- I suspect most
> would be happy to fill in some template if they knew it would be used.

Yup.

> 2) failing author-provided metadata, avoid creating multiple meta-data
> repositories (duplication of effort, confusion when they diverge,
> etc.)

Depending on what the CTAN maintainers think, it may be acceptable to
have only one authoritative repository containing the "override files",
and update it whenever there is a warranted change in the metadata
provided by the package maintainer, as I described in
<873b06dffx.fsf at florent.maison>.

>> If (1), then we (TeX Live) are on our own and have to build the doc
>> ourselves. I won't develop this case for now because this becomes a bit
>> messy and I have the impression that a better solution would be to enforce
>> that each CTAN upload has the full documentation built. But if there are
>> good reasons against this, I can devise solutions.
>
> Again, you have to work with what you get -- no standards are enforced.

I believe I addressed this case in <873b06dffx.fsf at florent.maison>
(which can be amended by the per-TEXMF-tree structure mentioned at the
beginning of this mail).

> If there is a template and some information showing how and why the
> data will be useful some authors will provide it, but certainly you have to
> deal with cases where it is not provided or is somehow broken.

Yup, either we accept the risk of bad metadata, or we use a mechanism
such as "override files".

>> As said previously, the database need not be complete to be useful...
>
> It will be more useful if it can be trusted to include important widely used
> packages and offers fallbacks.

For important packages: sure, the metadata can be added easily. For the
fallbacks, frankly, I don't think we need more than a poor man's search
on package names or something like that. But that can be refined.
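
By "poor man's search" I mean something about as simple as the
following sketch: a case-insensitive substring match on package names,
with prefix matches listed first. The function name and the ranking rule
are only one arbitrary choice among many.

```python
def search_packages(query, package_names):
    """Return package names containing `query`, prefix matches first."""
    needle = query.casefold()
    if not needle:
        return []
    hits = [name for name in package_names if needle in name.casefold()]
    # False sorts before True, so names starting with the query come first;
    # ties are broken alphabetically.
    return sorted(
        hits,
        key=lambda name: (not name.casefold().startswith(needle),
                          name.casefold()),
    )


print(search_packages("ref", ["hyperref", "geometry", "refcount"]))
# prints ['refcount', 'hyperref']
```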

Anyway, my opinion is that it's better to have a not-yet-tagged package
than a package badly tagged by an automatic process.

> Better to have a one-time mechanism where the tag information is included
> once and the author doesn't have to do anything more unless there is a
> major change in the package status.   I'd suggest a special line in the
> README file for the author's tags (while still allowing ctan2tl, etc.
> to add more tags).

Ugh, that's awful. :-P

The catalogue is already based on XML; I think that if we add structured
data such as the metadata mentioned here, we should continue this way.

> Messy, but unavoidable.  I think CTAN gets whatever an author wants
> to provide, so it is up to each distribution to create missing doc files
> in the preferred format for the distro.

Addressed in <873b06dffx.fsf at florent.maison> with the per-TEXMF-tree
amendment from this mail.

> Some heuristics can be applied to generate default tags using the information
> that is readily available (one tag can indicate that only default tags were
> applied).

I don't much like the idea of default tags. debtags uses a
"special::not-yet-tagged" tag for not yet tagged packages, and I think
this is fine.
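
In code, that approach amounts to something like the sketch below. The
tag database is modelled as a plain dict and the "field::pdf" tag is
invented for the example; only "special::not-yet-tagged" comes from
debtags.

```python
NOT_YET_TAGGED = "special::not-yet-tagged"


def tags_for(package, tag_db):
    """Return the package's tags, or the explicit marker if it has none."""
    tags = tag_db.get(package)
    return sorted(tags) if tags else [NOT_YET_TAGGED]


def untagged(packages, tag_db):
    """List the packages that still need a human to tag them."""
    return [p for p in packages if tags_for(p, tag_db) == [NOT_YET_TAGGED]]


# Hypothetical database: one tagged package, one known but untagged.
tag_db = {"hyperref": {"field::pdf"}, "mypackage": set()}

print(tags_for("hyperref", tag_db))
print(untagged(["hyperref", "mypackage", "geometry"], tag_db))
```

No tag is ever guessed here: a package is either tagged by a human or
clearly marked as waiting for one.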

> There needs to be some thought for integrating local packages.   With tetex,
> some administrator just copied files to the local texmf tree.  It is
> getting to the point where (with lots of linux workstations, many
> controlled by users with only vague understanding of texmf trees) we
> need a way to create local packages that can be installed/removed
> using the TL tools.

With the proposal at the beginning of this mail, you can easily register
your locally-installed packages.

Regards,

-- 
Florent

