Structured Text Formatting Rules

For the gendoc processor we (Daniel and Robin) have decided on the following format rules for Python doc strings so that they may be marked up in a reasonable manner. People developing other automated tools may want to consider adopting this standard as well. Feel free to discuss this on the doc-sig list.

By and large we have decided to use the Structured Text approach offered by Jim Fulton. We have added to and modified it slightly for the needs of gendoc. In the text below I have borrowed heavily from Jim's description of Structured Text.

A structured string consists of a sequence of paragraphs separated by one or more blank lines. Each paragraph has a level which is defined as the minimum indentation of the paragraph. A paragraph is a sub-paragraph of another paragraph if the other paragraph is the last preceding paragraph that has a lower level. In doc strings the level 0 indentation (left margin) is set to the starting column of the first non-blank line after the first line. This means the first tab(s) or 8 characters, for example, are stripped off the doc string in the process. Folks using python-mode won't have to worry; indentation will be correct.
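
The margin and level rules above can be sketched in Python as follows. This is only an illustration of the rules as stated, not the gendoc code itself; the function names ('strip_margin', 'split_paragraphs', 'level', 'parents') are made up for this sketch, and tabs are assumed to expand to 8 columns.

```python
import re

def strip_margin(doc):
    """Shift the doc string so that level 0 is the column of the first
    non-blank line after the first line (assumes 8-column tabs)."""
    lines = doc.expandtabs(8).split("\n")
    margin = 0
    for ln in lines[1:]:
        if ln.strip():
            margin = len(ln) - len(ln.lstrip(" "))
            break
    out = [lines[0].strip()]
    for ln in lines[1:]:
        # Lines indented less than the margin are simply left-justified.
        out.append(ln[margin:] if not ln[:margin].strip() else ln.lstrip(" "))
    return "\n".join(out)

def split_paragraphs(text):
    """Paragraphs are separated by one or more blank lines."""
    parts = re.split(r"\n(?:[ \t]*\n)+", text)
    return [p.rstrip() for p in parts if p.strip()]

def level(paragraph):
    """The level is the minimum indentation of the paragraph's lines."""
    lines = [ln for ln in paragraph.split("\n") if ln.strip()]
    return min(len(ln) - len(ln.lstrip(" ")) for ln in lines)

def parents(paragraphs):
    """For each paragraph, the index of the last preceding paragraph
    with a lower level, or None for a top-level paragraph."""
    levels = [level(p) for p in paragraphs]
    result = []
    for i, lv in enumerate(levels):
        parent = None
        for j in range(i - 1, -1, -1):
            if levels[j] < lv:
                parent = j
                break
        result.append(parent)
    return result
```

Given a doc string whose body is indented 8 columns, with one paragraph indented a further 2, 'parents' reports that paragraph as a sub-paragraph of the one just before it.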

Special symbology is used to indicate special constructs:

  • A paragraph that begins with a '-', '*', or 'o' is treated as an unordered list (bullet) element. (Note: In list paragraphs it is not necessary to separate each list item with a blank line.)

  • A paragraph that begins with a sequence of digits followed by a white-space character is treated as an ordered list element.

  • A paragraph that begins with one or more groups, where each group is a sequence of digits or a sequence of letters followed by a period (for example '1.' or '2.a.'), is treated as an ordered list element.

  • A paragraph with a first line that contains some text, followed by some white-space and '--' is treated as a descriptive list element. The leading text is treated as the element title. (Note: This loose definition can mess up where people use '--' outside this list context. We may need to restrict this definition somehow.)

  • Sub-paragraphs of a paragraph that ends in the word 'example' or the word 'examples' are treated as example code and are output as is. (Trailing punctuation after the word 'example' is ignored.)

  • Text enclosed in single quotes (with white-space to the left of the first quote and whitespace or punctuation to the right of the second quote) is treated as example code.

  • Text surrounded by '*' characters (with white-space to the left of the first '*' and whitespace or punctuation to the right of the second '*') is emphasized (rendered italic).

  • Text surrounded by '**' characters (with white-space to the left of the first '**' and whitespace or punctuation to the right of the second '**') is marked as strong (rendered bold).
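
The constructs listed above map fairly directly onto regular expressions. The sketch below is one possible reading of the rules, not the patterns gendoc actually uses. It assumes that a list marker must be followed by white-space, that the start of a paragraph counts as white-space on the left, and that output is HTML-style tags; the names 'classify', 'introduces_example' and 'inline' are invented for the illustration.

```python
import re

BULLET = re.compile(r"^\s*[-*o]\s+")
ORDERED = re.compile(r"^\s*\d+\s+")
ORDERED_DOTTED = re.compile(r"^\s*(?:(?:\d+|[A-Za-z]+)\.)+\s+")
DESCRIPTIVE = re.compile(r"^\s*\S.*?\s+--(?:\s|$)")
EXAMPLE_INTRO = re.compile(r"\bexamples?\W*$", re.IGNORECASE)

def classify(paragraph):
    """Return the kind of paragraph according to its leading symbols."""
    if BULLET.match(paragraph):
        return "bullet"
    if ORDERED.match(paragraph) or ORDERED_DOTTED.match(paragraph):
        return "ordered"
    # Only the first line is examined for the ' --' title separator.
    if DESCRIPTIVE.match(paragraph.split("\n")[0]):
        return "descriptive"
    return "plain"

def introduces_example(paragraph):
    """True if the paragraph ends in 'example' or 'examples',
    ignoring trailing punctuation."""
    return EXAMPLE_INTRO.search(paragraph) is not None

# White-space (or start of text) on the left, white-space,
# punctuation, or end of text on the right.
LEFT = r"(?<!\S)"
RIGHT = r"(?=[\s.,;:!?)]|$)"

def inline(text):
    """Mark up strong, emphasized and example-code spans (HTML assumed)."""
    # '**' is handled before '*' so strong text is not read as emphasis.
    text = re.sub(LEFT + r"\*\*([^*\n]+)\*\*" + RIGHT, r"<strong>\1</strong>", text)
    text = re.sub(LEFT + r"\*([^*\n]+)\*" + RIGHT, r"<em>\1</em>", text)
    text = re.sub(LEFT + r"'([^'\n]+)'" + RIGHT, r"<code>\1</code>", text)
    return text
```

Requiring white-space after the bullet marker keeps a paragraph that opens with '*emphasized*' text from being mistaken for a list element. The loose '--' rule is implemented as written, so it shares the weakness noted in the descriptive list item.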

An addition was made to support hypertext references. Hypertext references are marked with double quotes in the body of the doc string. At the end of the doc string there must be a matching line that starts with two dots and a space ('.. '), followed by the same quoted text and then the mapping (URL). This is patterned after the footnote notion in setext but is easier on the eyes. For example, "Pythonland" will be marked as a hyper-reference to Python.org when the doc string ends with a matching reference line. If no matching trailing reference is found then nothing is done.
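
A minimal sketch of how such references might be resolved is given below. It is an assumption-laden illustration rather than gendoc's implementation: the name 'resolve_links' is hypothetical, HTML anchors are assumed as the output form, and for simplicity reference lines are collected wherever they occur instead of only at the end of the doc string.

```python
import re

# A reference line: two dots, a space, the quoted text, then the URL.
REF_LINE = re.compile(r'^\s*\.\. "([^"]+)"\s+(\S+)\s*$')

def resolve_links(doc):
    """Replace quoted text that has a matching '.. "text" URL' line
    with a hyperlink; quoted text without a match is left alone."""
    refs = {}
    body = []
    for line in doc.split("\n"):
        m = REF_LINE.match(line)
        if m:
            refs[m.group(1)] = m.group(2)
        else:
            body.append(line)

    def link(match):
        label = match.group(1)
        url = refs.get(label)
        if url is None:
            return match.group(0)  # no trailing reference: nothing is done
        return '<a href="%s">%s</a>' % (url, label)

    return re.sub(r'"([^"\n]+)"', link, "\n".join(body))
```

With a doc string that mentions "Pythonland" and ends with the line '.. "Pythonland" http://www.python.org/' (the URL here is only illustrative), the quoted word becomes a link, while any other quoted text is passed through unchanged.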