This is DebianDoc-SGML, an SGML-based documentation formatting package used
for the Debian manuals.
To install it on a non-Debian system, edit the Makefile and then run
`make' followed by `make install'.
The changelog is in the debian subdirectory.
Ardo van Rangelrooij <ardo@debian.org>
Ian Jackson <ijackson@gnu.ai.mit.edu>
-----------------------------------------------------------------------------
Message to the future maintainer(s): (Osamu Aoki)
I have refactored and extended the DebianDoc-SGML package since 2005, adding
some UTF-8 support, DebianDoc-SGML pretty-print support, XHTML support,
DocBook-XML output support, Wiki support, etc. I have to say this has been a
steep learning experience for me, as I had no formal SGML education before.
To help future maintainers get started quickly, I summarize helper information
here at the end of the README file, which is only seen in the source tree.
* package structure
(Please refer to the user documentation for an explanation based on the
installed file locations.)
This package consists of the following files:
|-- COPYING                         (GPL2)
|-- Makefile
|-- README                          (This file)
|-- debian                          (Debian package meta-data)
|   |-- README.Debian               (User documentation)
|   |-- TODO
|   |-- changelog
|   |-- compat
|   |-- control
|   |-- copyright
|   |-- debiandoc-sgml.install
|   |-- debiandoc-sgml.postinst
|   |-- debiandoc-sgml.postrm
|   |-- debiandoc-sgml.prerm
|   |-- debiandoc-sgml.sgmlcatalogs
|   `-- rules
|-- sgml                            (DTD definition)
|   |-- dtd
|   |   |-- catalog
|   |   |-- debiandoc.dcl
|   |   `-- debiandoc.dtd
|   `-- entities
|       |-- catalog
|       |-- debiandoc-lat1
|       `-- debiandoc-lat2
`-- tools
    |-- bin                         (source for executables)
    |   |-- fixlatex
    |   |-- mkconversions
    |   |-- saspconvert
    |   `-- template
    |-- lib
    |   |-- Format                  (output formatting engine)
    |   |   |-- Alias.pm            (alias (.pm) definition of format)
    |   |   |-- Driver.pm
    |   |   |-- Format.pm
    |   |   |-- HTML.pm             (format driver for HTML)
    |   |   |-- LaTeX.pm            (format driver for LaTeX)
    |   |   |-- Texinfo.pm
    |   |   |-- Text.pm
    |   |   |-- TextOV.pm
    |   |   `-- XML.pm
    |   |-- Locale                  (locale and format specific data)
    |   |   |-- Alias.pm            (alias definition of locale values)
    |   |   |-- SGML                (locale independent data for SGML)
    |   |   |-- XML                 (locale independent data for XML)
    |   |   |-- convert-encoding    (conversion script for the locale data)
    |   |   |-- ca_ES.ISO8859-1     (data for the ca_ES.ISO8859-1 locale)
    |   |   |   |-- HTML            (locale specific data for HTML)
    |   |   |   |-- LaTeX           (locale specific data for LaTeX)
    |   |   |   |-- Texinfo
    |   |   |   |-- Text
    |   |   |   `-- TextOV
    |   |   .........               (directory for all locales)
    |   `-- Map                     (Mapping for non ASCII characters)
    |       |-- Alias.pm
    |       |-- HTML.pm
    |       |-- LaTeX.pm
    |       |-- Texinfo.pm
    |       |-- Text.pm
    |       |-- TextOV.pm
    |       `-- XML.pm
    `-- man                         (manual page)
        `-- debiandoc-sgml.1
* How to add a new locale
1. Create a directory named after the locale by copying en_US.ISO8859-1.
2. Translate the phrases and make any other needed changes.
3. Create data for alternative encodings, such as UTF-8, using the
   convert-encoding script.
4. Adjust the UTF-8 data for Unicode:
     utf-8 for HTML
     utf8 for LaTeX
5. Add the new locales to Locale/Alias.pm.
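Steps 1 to 3 can be sketched in a scratch directory as follows. This is only
an illustration under stated assumptions: the locale name xx_YY, the mock data
file and its contents are made up, and plain iconv stands in for the
convert-encoding script, whose arguments are not described in this README.

```shell
#!/bin/sh
set -eu

# Scratch tree standing in for tools/lib/Locale (hypothetical layout).
work=$(mktemp -d)
mkdir -p "$work/Locale/en_US.ISO8859-1"
printf "'abstract' => 'Abstract',\n" > "$work/Locale/en_US.ISO8859-1/LaTeX"

# Step 1: create the new locale directory by copying en_US.ISO8859-1.
cp -R "$work/Locale/en_US.ISO8859-1" "$work/Locale/xx_YY.ISO8859-1"

# Step 2: translate the phrases (\351 is e-acute in ISO-8859-1).
printf "'abstract' => 'R\351sum\351',\n" > "$work/Locale/xx_YY.ISO8859-1/LaTeX"

# Step 3: derive the UTF-8 variant. iconv is an assumption used here
# in place of the package's own convert-encoding script.
mkdir "$work/Locale/xx_YY.UTF-8"
for f in "$work/Locale/xx_YY.ISO8859-1"/*; do
    iconv -f ISO-8859-1 -t UTF-8 "$f" \
        > "$work/Locale/xx_YY.UTF-8/$(basename "$f")"
done

# Steps 4 and 5 (encoding names, Locale/Alias.pm) remain manual edits.
ls "$work/Locale"
```

The scratch directory in $work is left in place so the result can be
inspected; remove it when done.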
* main conversion scripts debiandoc2*
To keep all debiandoc2* commands consistent, I have merged all of them into
one template file, 'tools/bin/template', and added support for a few new
formats using the existing scripts as my guide.
All the debiandoc2* commands are generated by the script
'tools/bin/mkconversions', which parses this unified script source
'tools/bin/template'.
(This infrastructure of shell/sed combination was there when I
started so please do not ask me why I did not use CPP for this.)
For debugging purposes, I provide 'make diff', which creates a 'diff -u' of
all the debiandoc2* commands against the currently installed version. This
functionality was added to help developers understand the implications of
changes made to the 'tools/bin/template' file and to avoid unintended changes
to the existing scripts when adding features.
Basically, these generated scripts use an SGML parser to produce output text
files such as plain text, HTML, LaTeX source, etc. For PostScript and PDF
output, the LaTeX source is processed further to produce the desired results.
Since the Chinese Big5 encoding is not compatible with TeX (and thus not with
LaTeX either), the internal fixlatex script is run on the generated LaTeX
source before handing it to LaTeX. This is because the 2nd byte of the 16-bit
Big5 encoding uses the ASCII range, which makes some 16-bit characters collide
with meta characters such as \ { } used in the LaTeX context. (The same
problem should happen with the Japanese Shift-JIS encoding, but we do not
support this encoding now, so it causes no problem.)
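The collision can be seen at the byte level. A minimal sketch, assuming the
sample character U+529F, whose Big5 encoding is the two bytes 0xA5 0x5C; the
scratch file is arbitrary and nothing here is taken from fixlatex itself.

```shell
#!/bin/sh
# One Big5 character (U+529F): lead byte 0xA5, trail byte 0x5C.
# The bytes are written in octal because POSIX printf has no \x escape.
sample=$(mktemp)
printf '\245\134' > "$sample"

# Show the raw bytes: the second one is 5c, the code of ASCII backslash.
od -An -tx1 "$sample"

# Count the bytes that a byte-oriented reader would take for \ { or }.
# LC_ALL=C makes tr operate on single bytes.
meta_count=$(( $(LC_ALL=C tr -cd '\\{}' < "$sample" | wc -c) ))
echo "LaTeX meta bytes found: $meta_count"   # prints 1

rm -f "$sample"
```

The file holds a single character and no backslash was typed, yet a
byte-oriented reader sees one.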
The new -X option enables the use of user-provided locale-dependent data.
Running "make test" executes a test build sequence using the locale-dependent
data from the package source. The -X option is most useful when fixing a
locale-dependent problem or testing new locale data.
The -s option with an updated fixlatex script can be used to add Japanese
Shift-JIS encoding support. However, the -X option is the better choice for
debugging in most cases.
To adjust language-specific data such as the LaTeX starting code:
* study Format/LaTeX.pm,
* play with the -X option as described in README.Debian and the manpage to
find the right alternative to the /usr/share/perl5/DebianDoc_SGML/Locale/*
data,
* adjust the tools/lib/Locale/Alias.pm and
tools/lib/Locale/xx_YY.encoding/LaTeX.pm files in the source code.
* The meaning of %locale
This has the following contents for LaTeX. The Format/LaTeX.pm file uses the
values defined here.
%locale = (
    'babel'                 => '',
    'inputenc'              => '',
    'abstract'              => '',
    'copyright notice'      => '',
    'before begin document' => '',
    'after begin document'  => '',
    'before end document'   => '',
    'pdfhyperref'           => ''
);
* The first 2 are used to define the language scheme based on the babel macro.
For CJK, these can be left undefined.
* The next 2 are the words used for "abstract" and "copyright notice" in the
pertinent language.
* The next 3 are recent additions which provide very flexible ways to create
proper LaTeX source. CJK uses these. (They can be omitted for European
languages.)
* The last one defines how hyperrefs for PDF are generated with the hyperref
package. (This may need to be defined differently for UTF-8, but I am not
sure.) "hypertex" is the default value if none is given. For UTF-8 locales,
I use "unicode" as the value at the moment.
* For the LaTeX language-dependent parameter, I use the babel name taken from
the "*.sty" files in /usr/share/texmf-texlive/tex/generic/babel if available.
Exceptions:
* vietnam
* lithuanian
* Read "The Not So Short Introduction to LaTeX 2ε" by Tobias Oetiker
to get a basic idea of LaTeX.
* Read "The CJK package for LaTeX 2ε — Multilingual support beyond babel"
by Werner Lemberg to get a basic idea of CJK. It looks like the current CJK
environment (2007/08) is not good enough for UTF-8.
* Read the CTAN archive for unicode. (I need to as well.)
* http://tug.ctan.org/cgi-bin/ctanPackageInformation.py?id=unicode
* http://tug.ctan.org/tex-archive/macros/latex/contrib/unicode/
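As a rough illustration of where the LaTeX keys described above could end up
in a generated document, the sketch below prints a document skeleton. It is
not the code in Format/LaTeX.pm: the values french and latin1, the exact
\usepackage lines, their order, and the shell variable names are all
assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical values; a real locale file sets them in the Perl %locale hash.
babel='french'
inputenc='latin1'
before_begin_document='% extra preamble lines, e.g. CJK setup'
after_begin_document='% code placed right after \begin{document}'
before_end_document='% code placed right before \end{document}'

# Print a LaTeX skeleton showing one possible position for each value.
# In an unquoted here-document, \\ yields a single backslash.
render_skeleton() {
cat <<EOF
\\usepackage[$inputenc]{inputenc}
\\usepackage[$babel]{babel}
$before_begin_document
\\begin{document}
$after_begin_document
% ... document body ...
$before_end_document
\\end{document}
EOF
}

render_skeleton
```

For CJK locales the first two values would stay empty and the three
insertion-point values would carry the CJK setup instead.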
A similar thing can be done for HTML with %locale. The Format/HTML.pm file
uses the value for "charset" in it when generating HTML.
Package requirements:
For the required packages (especially for LaTeX processing to produce the PS
and PDF formats), see the cjk-latex-* packages.
Please note that a Ghostscript interpreter such as gs-gpl or gs-esp should
(not must) be installed too for PDF thumbnail generation.
Conversion functions back to normalized SGML and XML formats are
available. The generated XML requires some manual action.
Osamu Aoki <osamu@debian.org> Sat, 04 Aug 2007 21:46:45 +0900