Python Documentation

The documentation for Python is often considered some of the best documentation available for a free programming language, though there are some complaints that go beyond issues of completeness. The most significant problem is that there is no single documentation system that includes both the standard distribution and third-party extensions. Some other popular languages in widespread use do offer pervasive facilities for API documentation.

The Python Doc-SIG is chartered both to improve the documentation available for the standard Python distribution and to support the creation and distribution of additional documentation that may be of use to the community. This includes both API reference documentation and documentation that may be more expository or tutorial in nature.

Python's Current Situation

The current situation supports both reference material and more prosaic material as stand-alone documents. Many useful output formats are supported, and others are not too difficult to add. Some level of content-based hypertext support is available, which makes hypertext-capable output formats more useful. The "strongest" current format, measured by the strength of linking, is HTML.

Unfortunately, the current source format for documentation has a steep learning curve for new authors, and the necessary tools to process the documents are not common outside academic environments (and are losing ground even there). While the highly stylized application of LaTeX makes the learning curve less steep than for typical LaTeX applications, there is quite a bit of tedium in learning the specialized marks and effects of the many characters given special treatment by the underlying TeX engine. TeX-based markup is sufficiently tedious to process that creating additional formatters based on the source documents is difficult and fraught with peril.

Past Efforts

Jim Fulton's "Structured Text" proposal provides some measure of text structuring within a single docstring, including nicely esoteric typographical conventions for bold, italic, and monospace fonts. Limited support for hyperlinking is provided, based on ideas from SETEXT (Structure-Enhanced TEXT).

Daniel Larsson's gendoc, pythondoc -- current status of pythondoc? These tools understand the "Structured Text" markup.

How widely accepted are any of these tools/formats? How to tell without someone who has time to review what's available?

Work in Progress: Fred's XML Conversion Effort

The Doc/tools/sgmlconv/ directory of the Python source tree includes a number of scripts that can be used to convert the existing LaTeX documents to an SGML or XML version with an estimated 90% of the work done for a fairly simple conversion. Some small structural improvements are implemented, but most of the document structure is unchanged. Additional re-writing of the tree can be implemented without too much pain since each source file (*.tex) is loaded as a DOM DocumentFragment. (Loading the entire Library Reference this way may be difficult on some machines, however!)
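The kind of tree rewriting described above can be sketched with the standard library's xml.dom.minidom. This is an illustration only, not the sgmlconv code: the helper names (load_fragment, rename_elements) and the element names (section, code, literal) are assumptions made for the example, and the input is assumed to be already-converted, well-formed XML rather than a *.tex file.

```python
from xml.dom import minidom


def load_fragment(doc, xml_text):
    """Parse xml_text and return its top-level nodes as a DocumentFragment.

    minidom has no fragment parser, so the text is wrapped in a throwaway
    root element (hypothetical name: "wrapper") before parsing.
    """
    wrapper = minidom.parseString("<wrapper>" + xml_text + "</wrapper>")
    fragment = doc.createDocumentFragment()
    for node in list(wrapper.documentElement.childNodes):
        # importNode copies the node so that it is owned by doc.
        fragment.appendChild(doc.importNode(node, True))
    wrapper.unlink()
    return fragment


def rename_elements(node, old, new):
    """Rewrite the tree in place, renaming every <old> element to <new>."""
    for child in list(node.childNodes):
        if child.nodeType != child.ELEMENT_NODE:
            continue
        rename_elements(child, old, new)
        if child.tagName == old:
            replacement = child.ownerDocument.createElement(new)
            for name, value in child.attributes.items():
                replacement.setAttribute(name, value)
            # appendChild detaches each node from its old parent first.
            while child.firstChild:
                replacement.appendChild(child.firstChild)
            node.replaceChild(replacement, child)


# Example: one small "source file" loaded as a fragment, then rewritten.
doc = minidom.getDOMImplementation().createDocument(None, "document", None)
frag = load_fragment(
    doc, "<section><code>spam</code> and <code>eggs</code></section>")
rename_elements(frag, "code", "literal")
doc.documentElement.appendChild(frag)
print(doc.documentElement.toxml())
```

Working on one fragment per source file keeps each rewrite small; the memory concern noted above arises only when every fragment of a large manual is held in a single tree at once.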

The result of the existing conversion can readily be used to produce documents that match the current quality of formatting, either by mapping it back to LaTeX for typesetting or by using a hypertext generation engine that can create HTML and possibly Texinfo (yes, there is a consistent stream of requests for GNU info support). This allows high quality typesetting for print and richer hyperlinking for browsing. The documents, however, would largely remain independent. It may be possible to generate better inter-document hyperlinks with more work.

The primary drawback to a simple conversion is that a subsequent conversion to a new SGML or XML document type may require another transformation of the source tree to support an incompatible structure. This is undesirable since it involves additional re-education for authors using the SGML/XML markup. One of the goals of moving away from LaTeX is to make it easier for people to become contributors; changing formats underfoot reduces the likelihood that new contributors will appear.

Work in Progress: David's Docstring Proposal

XXX Need a link to an online version of the revised text. (Forthcoming.)

David Ascher has offered a proposal for marking docstrings with enough information to provide good information tagging. This can be used to facilitate hyperlink generation between modules within a collection (whether it be a package or not), and, with some additional support, among disparate collections, such as a third-party package and the standard library. This information, combined with information extracted from the parse tree of a module, can be used to produce effective hypertextual documentation.

Work in Progress: Fred's Docstring & Parse-tree Analysis Tool

This code is still a mess, but can build a Python data structure from a module with docstrings. The docstring format is a bit different from that proposed by David Ascher and is "lighter" in the sense that there is less explicit markup for some things; it may be slightly less flexible.
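To make the idea of "a Python data structure from a module with docstrings" concrete, here is a minimal sketch. It is not the tool described above: it uses the ast module from the present-day standard library, and the function name collect_docstrings, the dictionary layout, and the sample module are all assumptions made for illustration. It covers only the loading phase; no docstring markup is interpreted and no links are analyzed.

```python
import ast


def collect_docstrings(source, module_name="<module>"):
    """Return a nested dict describing a module's documented objects."""

    def describe(node, name, kind):
        entry = {
            "name": name,
            "kind": kind,
            "doc": ast.get_docstring(node),  # None when there is no docstring
            "members": [],
        }
        for child in node.body:
            if isinstance(child, ast.ClassDef):
                entry["members"].append(describe(child, child.name, "class"))
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                entry["members"].append(
                    describe(child, child.name, "function"))
        return entry

    return describe(ast.parse(source), module_name, "module")


# A hypothetical module, given as source text so the sketch is self-contained.
SAMPLE = '''
"""Toy module."""

class Spam:
    """A canned meat product."""

    def fry(self):
        """Fry the spam."""

def undocumented():
    pass
'''

info = collect_docstrings(SAMPLE, "toy")
for member in info["members"]:
    print(member["kind"], member["name"], repr(member["doc"]))
```

Because the structure is built from the parse tree rather than by importing the module, no module code is executed during loading, and the result is plain data that a later analysis phase or a formatting back-end could consume.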

Only the preliminary loading has been implemented at this time. A fair amount of the analysis hasn't been written yet (it needs to be a second phase after module loading), and the handling of multiple modules together (required for proper link analysis) has not been tested.

No formatting back-ends have been written. There needs to be some sort of database backend as well; a simple database representation would be very useful for interactive environments like PythonWin and IDLE.

Synthesis

Python documentation producers need to select not one but two ways to create documentation:

  • One specification for documentation embedded in source code, for use in generating API documentation.

  • One specification for out-of-line documentation, such as tutorials and instructional material. This must be sufficient to support module reference material derived from the existing documentation.

This Developers Day session should concentrate on identifying and resolving issues with the proposals current at the time of the session with an eye toward defining the document formats which will be used in the evolving Python infrastructure.