[vox-tech] DocBook

Micah Cowan vox-tech@lists.lugod.org
Thu, 17 Jan 2002 15:04:50 -0800


On Thu, Jan 17, 2002 at 01:31:04PM -0800, speck@blkmtn.org wrote:
> On Thu, 17 January 2002, Micah Cowan wrote:
> 
> > As to Office - I haven't used a Word Processor in a couple years.  Not
> > nearly enough power to 'em.  Sure, the interface is convenient, but
> > it's *power*, not convenience, that I crave - and for that, the choice
> > is clear: content-oriented (vs. format-oriented) typesetting tools
> > like TeX or FOP.  Nobody's ever made a Word document can touch my
> > DocBook documents.  And I'm not even that good.
> > 
> > Micah
> 
> Hey, maybe you could do a presentation on DocBook.  Why people
> should use it and tools that would be useful, etc.

That could be good.  But it'd probably need to include some overview
on SGML, DSSSL, XML, XSLT, XSL-FO, Norman Walsh's stylesheets,
OpenJade, Xalan, FOP, PassiveTeX, and maybe a little SVG.  Actually,
if I talked on just the variety of options you have for processing
tools, and left the DocBook format itself for another talk, it might
be possible to give a broad overview of those things.

If people are interested in learning more about DocBook, allow me to
make the following introduction to it.

I love DocBook.  I also love TeX (but am somewhat less comfortable
with it).  I think that XML/XSLT and TeX cover roughly the same niche
in terms of document publishing, and either one can be used as tools
of approximately equal power (though of course XML/XSLT covers a much
broader range than just document publishing).  Since many more
UNIX/GNU/Linux users are already familiar with TeX (especially
through the LaTeX package), TeX is probably the better documentation
tool for many people.  My personal choices have led me to prefer
DocBook.

The biggest differences, in my mind, between DocBook and LaTeX are:

1. Both are considered to be "content-oriented", versus word
   processing or desktop publishing which are considered
   "format-oriented".  In actuality, though, LaTeX is a
   *pseudo*-"content-oriented" high-level wrapper around what is
   really a "format-oriented" typesetting language (TeX), that happens
   to be powerful and flexible enough to allow you to be
   "sort-of content-oriented" when you want to be, due to the fact that you
   can wrap low-level functionality in high-level packages.  However,
   at any point in writing a LaTeX document, you can immediately drop
   down to "low-level" again, writing in pure TeX commands.  But even
   when you're dealing with LaTeX-only, there are many elements to it
   which are decidedly more format-oriented than content-oriented -
   for example, there is nothing content-oriented at all about using
   \hfill or \vbox (or whatever they are).

   In contrast, DocBook really *is* purely content-oriented.  DocBook
   has no mechanism for describing formatting at all - only
   content-oriented markup describing what a thing *is*, not what it
   looks like.  This is not necessarily a good or bad thing - it is
   simply a distinguishing point.  All formatting is described by a
   totally separate document - the stylesheet driver, with which you
   specify exactly what you want the DocBook elements to look like.
   There is no point in a DocBook document at which you can "drop down"
   into low-level format-oriented stuff like you can in LaTeX -
   instead, you use the appropriate element, and edit your stylesheet
   to format it how you like it.  This has the advantage of forcing
   your document to be highly structured, but it has the disadvantage
   of limiting your formatting power to whatever DocBook (or whichever
   DTD or Schema you use for publishing) can express.  If you
   want to get a particular formatting element into your document but
   DocBook doesn't supply an appropriate element of which you can take
   advantage in your stylesheet, then you are out of luck.  If you
   needed that element sorely, then the best thing you can do is make
   your own DocBook-like DTD, and invent the element you're lacking.
   There's nothing wrong with doing this - and so you're never really
   constricted in what you can do format-wise with XML/XSLT; however,
   once you go outside the realm of "pure" DocBook, you can't expect
   other people to take your XML or SGML as-is and do stuff with it
   unless you also pass around your DTD and your stylesheets.

2. I consider XML a more universal language of expression than LaTeX
   or TeX.  Not that TeX isn't available on pretty much every platform
   that matters, but XML is in much more widespread use outside the
   realm of UNIX.  Also, XML is "newer" than TeX - although it's
   really a redesign of SGML, which is about as old as TeX (don't know
   which is older).  DocBook is available in both XML and SGML, and
   has itself been around for some time (but not as long as TeX), so
   this is a somewhat flimsy argument :)   ...but mainly, XML (in
   general) has massive attention right now, in comparison with TeX,
   and so various XML-related technologies are constantly being
   improved.
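
To make the content/format split from point 1 concrete, here is a
minimal sketch in Python.  It is not how the real DocBook toolchain
works (that uses DSSSL or XSLT stylesheets); the two "stylesheet"
dictionaries, the render() function, and the sample document are all
hypothetical, made up for illustration.  The point is only that the
same content-only markup can be formatted two different ways without
touching the document itself.

```python
import xml.etree.ElementTree as ET

# A tiny DocBook-style fragment: the markup says what each thing *is*,
# and says nothing about how it should look.
DOC = """\
<article>
  <title>Backups</title>
  <para>Run <command>rsync</command> and check <filename>/var/log/backup.log</filename>.</para>
</article>
"""

# Two hypothetical "stylesheets".  Each maps an element name to the
# text emitted before and after that element's content.
PLAIN = {
    "title": ("== ", " ==\n"),
    "para": ("", "\n"),
    "command": ("`", "`"),
    "filename": ("'", "'"),
}
HTML = {
    "article": ("<div>", "</div>"),
    "title": ("<h1>", "</h1>"),
    "para": ("<p>", "</p>"),
    "command": ("<kbd>", "</kbd>"),
    "filename": ("<code>", "</code>"),
}


def keep(text):
    """Drop whitespace-only strings (the indentation between elements)."""
    return text if text and text.strip() else ""


def render(elem, style):
    """Walk the tree, wrapping each element as the stylesheet dictates.

    Elements missing from the stylesheet are passed through unwrapped.
    No output escaping is done; this is a sketch, not a real formatter.
    """
    before, after = style.get(elem.tag, ("", ""))
    body = keep(elem.text)
    for child in elem:
        body += render(child, style) + keep(child.tail)
    return before + body + after


if __name__ == "__main__":
    tree = ET.fromstring(DOC)
    print(render(tree, PLAIN))
    print(render(tree, HTML))
```

Changing how a <command> looks means editing one stylesheet entry, not
the document.  The flip side is the limitation described above: if the
document format has no element for the thing you want to format, no
stylesheet entry can help you.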

Anyway, having addressed those differences (mostly the first one,
which is likely to make many die-hard LaTeX users feel unduly
restricted, since there is a shift as to where the power of expression
lies), I would just reiterate that I consider them both extremely
powerful tools for expression, both having about an equal number of
advantages and disadvantages in relation to each other.  My use of
DocBook comes from about 2 years of thinking hard about what I wanted
in a documentation tools suite, which mostly ended up with my
comparing TeX packages and DocBook.  This might be a good time for me
to point out that my nearest runner-up to DocBook wasn't LaTeX, but a
more obscure package called ConTeXt.  Very nice.

It's also worth knowing, for those of you who have been with TeX long
enough that you'd never separate from it, out of a sense of loyalty
(akin to the loyalty I feel toward Emacs) - that using DocBook doesn't
mean not using TeX.  Once you've converted your DocBook document into
a tree of formatting objects (an XSL-FO document) - which is necessary
in order to make a typeset document (though you could turn it into
something else, like text or html instead of FO) - you still need to
use an FO processor to generate the final form (e.g., PostScript or
PDF).  I've opted for Apache's FOP - but there is a TeX package called
PassiveTeX which also processes XSL-FO documents.
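
For a rough idea of what "a tree of formatting objects" means, here is
a minimal Python sketch that turns a two-element DocBook-like article
into a skeletal XSL-FO document.  This only stands in for what the
real stylesheets do through XSLT; the docbook_to_fo() function, the
page size, and the font settings are assumptions chosen for
illustration.  The fo: element and attribute names follow the XSL 1.0
Recommendation.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)  # serialize with the usual fo: prefix


def fo(tag):
    """Return the namespace-qualified name for an fo: element."""
    return f"{{{FO_NS}}}{tag}"


def docbook_to_fo(docbook_xml):
    """Map a flat <article> of <title>/<para> children to XSL-FO.

    Hypothetical and deliberately tiny: inline markup is flattened to
    plain text, and any other element is ignored.
    """
    article = ET.fromstring(docbook_xml)

    root = ET.Element(fo("root"))

    # Page geometry lives in the FO tree, not in the DocBook source.
    masters = ET.SubElement(root, fo("layout-master-set"))
    page = ET.SubElement(
        masters,
        fo("simple-page-master"),
        {"master-name": "page", "page-height": "11in", "page-width": "8.5in"},
    )
    ET.SubElement(page, fo("region-body"))

    seq = ET.SubElement(root, fo("page-sequence"), {"master-reference": "page"})
    flow = ET.SubElement(seq, fo("flow"), {"flow-name": "xsl-region-body"})

    for elem in article:
        if elem.tag == "title":
            attrs = {"font-size": "18pt", "font-weight": "bold"}
        elif elem.tag == "para":
            attrs = {"space-before": "6pt"}
        else:
            continue
        block = ET.SubElement(flow, fo("block"), attrs)
        block.text = "".join(elem.itertext())

    return root


if __name__ == "__main__":
    sample = "<article><title>Backups</title><para>Run rsync nightly.</para></article>"
    print(ET.tostring(docbook_to_fo(sample), encoding="unicode"))
```

The printed FO document is the kind of input that an FO processor such
as FOP or PassiveTeX then typesets into PostScript or PDF.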

I should point out that while DocBook is a mature and excellent tool
for publishing documents from a high-level, theoretical standpoint,
the processing tools for DocBook are not yet fully mature.  Apache's
Xalan and FOP are relatively new to the scene, and so don't have the
years of testing that TeX has - however, I have never experienced any
problems in working with them at all, and they produce extremely
high-quality output.  The weak link in the chain would be Norman
Walsh's stylesheets, which are used most commonly in processing
DocBook documents.  There are many problems with these, and they are
in active development.  The SGML stylesheets, in DSSSL, are more
mature, but still buggy - and the XSLT stylesheets are even buggier.
This is not to knock Mr. Walsh - he has done an outstanding job, and
in particular has done a great job in organizing the development team
on SourceForge that is actively improving and debugging the
stylesheets; it's just that creating such flexible and generally
usable stylesheets as these is no small task.  However, writing a
custom, less generally usable, specific-to-what-I-want stylesheet set
should be doable, so I intend to move over to that eventually.  But
prospective DocBook newbies should be forewarned of this.

In conclusion:  If you are already a TeX- or LaTeX-guru, you probably
have little reason to switch to DocBook (except that you need to write
in DocBook format to write LDP HOWTOs, as Pete discovered).  I picked
it because it suits my particular needs, and YMMV, as always.

Micah