This website
The technology behind this website serves to demonstrate le-tex's approach to media-neutral publishing. Many of the tools used in day-to-day production are also used to edit and publish these pages.
Data format: DocBook XML
The text content is managed in a single XML file that conforms to the DocBook document type book, which is also used for some of the books typeset by le-tex.
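By way of illustration, a heavily abridged sketch of such a file (using DocBook 4 conventions; the actual DocBook version and the chapter structure of the site's file are assumptions here):

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
        "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
    <book lang="de">
      <title>le-tex website</title>
      <!-- one chapter per page of the site (assumption) -->
      <chapter id="website">
        <title>This website</title>
        <para>The technology behind this website …</para>
      </chapter>
    </book>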
Version control: svn
This XML file, the other content data (mainly the master files for the images), the layout information, and the configuration for the HTML generation are kept under version control in an svn repository.
le-tex uses Subversion (svn) version control for numerous software development projects. In some production lines, the content and configuration (e.g. project-specific LaTeX macros or makefiles) are also located in svn repositories.
For this and other projects, svn version control allows:
- Simultaneous (concurrent) editing, with automatic merging of changes and/or conflict recognition,
- Tracking of changes (who changed what, when, how, and why), and
- Recovery of older versions.
Editing
The XML file is edited using a visual XML editor or an XML-compatible text editor; a spellcheck can optionally be run from within the editor. The tools used are GNU Emacs/XEmacs with aspell or, more recently, the oXygen XML Editor.
Larger newly written passages generally undergo linguistic copy editing.
Translation
This website is in German and English. Changes to content are made in the German file. The translation workflow is as follows.
An external translator receives the complete file and translates it with the help of a computer-aided translation (CAT) tool, SDL Trados.
This tool is used to manage a Translation Memory that stores pairs of translated phrases for this website (and for other le-tex publications). The tool can then automatically translate text passages that are already known – depending on the level of match – so that only the changed or new passages need to be translated from scratch.
HTML generation
The individual XHTML pages are generated from the DocBook data using an XSLT 2.0 stylesheet.
When you click on the link, all browsers except Firefox will tell you that the page cannot be displayed. This is probably because they try to interpret the .xsl file as if it were meant to generate HTML output for them to render. If you use your browser's view-source function on the purportedly faulty page, you will see the XSLT code.
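To give a rough idea of what such a stylesheet contains – a minimal sketch, not the stylesheet actually used here – an XSLT 2.0 rule that renders a DocBook para as an XHTML paragraph could look like this:

    <xsl:stylesheet version="2.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns="http://www.w3.org/1999/xhtml">
      <!-- Each DocBook para becomes an XHTML p; templates for
           inline markup, headings etc. are omitted. -->
      <xsl:template match="para">
        <p><xsl:apply-templates/></p>
      </xsl:template>
    </xsl:stylesheet>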
A CSS2 stylesheet creates the actual layout of the HTML pages generated in this way.
The raw versions of the images are scaled to the target size using ImageMagick.
A makefile automates the generation processes and, in combination with a validating XML parser and other scripts, ensures that the data conforms to the relevant web standards and that all links work (the XSLT stylesheet ensures that links to foreign-language pages are only generated where a translated version is available).
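The language check mentioned in the parenthesis could, for instance, be implemented along the following lines. This is only a sketch: the parameters $lang and $translated, and the idea of recording translated pages as a simple list, are assumptions for illustration, not the site's actual code.

    <xsl:stylesheet version="2.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        xmlns="http://www.w3.org/1999/xhtml">
      <!-- Output language and the pages known to exist in that
           language; both parameters are hypothetical. -->
      <xsl:param name="lang" select="'en'"/>
      <xsl:param name="translated" as="xs:string*"
          select="('index.html', 'imprint.html')"/>
      <!-- Render a DocBook ulink as an XHTML link only if the
           target exists in the current language; otherwise emit
           just the link text. -->
      <xsl:template match="ulink">
        <xsl:choose>
          <xsl:when test="$lang = 'de' or @url = $translated">
            <a href="{@url}"><xsl:apply-templates/></a>
          </xsl:when>
          <xsl:otherwise>
            <xsl:apply-templates/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:template>
    </xsl:stylesheet>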
The only commands that a copy editor needs to bear in mind are svn update, make, and svn commit, as these enable him or her to update the changed source data locally, generate a preview of the site, and resubmit the changed source files to the svn repository after editing. Additionally, the webmaster must (and has permission to) execute a make install command in order to activate the pages, i.e. upload them to the web server.
Why no CMS?
This site consists of static content that is updated every so often (sometimes several times a week, then not again for another month). As explained above, editing, translation, and HTML generation are performed using sophisticated tools.
le-tex uses this command-line-controlled XEmacs/aspell/make toolchain not because it simply knows of no alternative, but because for the task in hand this toolchain is more powerful, more streamlined, and therefore more cost-effective than the standard content management systems, which require a lot more infrastructure (database, caching). The database models of these systems are often inadequate for semi-structured content, and the systems themselves typically require some integration effort and compromises in order to interact with standard tools such as a Translation Memory.
The alternative at the other end of the spectrum, static HTML pages, would have meant forfeiting any agility when it comes to restructuring or layout changes. If you had to tackle 40 HTML files full of redundant code in order to make a global change, for instance, it's likely you'd leave the site unchanged for years. That was exactly the fate of the old pages.
The only aspect that could be improved in the future (and an advantage that good CMSs have over the approach described here) is a workflow component that goes beyond the purely technical process control of a makefile. Website management, copy editing, translation, and activation are currently still initiated informally. These processes are eventually to be placed within a clear framework with the help of a business process management system.