[Python Home] [PEP Index] [PEP Source] |
PEP: | 350 |
---|---|
Title: | Codetags |
Version: | 2143 |
Last-Modified: | 2005-09-26 16:38:14 -0700 (Mon, 26 Sep 2005) |
Author: | Micah Elliott <mde at tracos.org> |
Status: | Draft |
Type: | Informational |
Content-Type: | text/x-rst |
Created: | 27-Jun-2005 |
Post-History: | 10-Aug-2005, 26-Sep-2005 |
This informational PEP aims to provide guidelines for consistent use of codetags, which would enable the construction of standard utilities to take advantage of the codetag information, as well as making Python code more uniform across projects. Codetags also represent a very lightweight programming micro-paradigm and become useful for project management, documentation, change tracking, and project health monitoring. This is submitted as a PEP because its ideas are thought to be Pythonic, although the concepts are not unique to Python programming. Herein are the definition of codetags, the philosophy behind them, a motivation for standardized conventions, some examples, a specification, a toolset description, and possible objections to the Codetag project/paradigm.
This PEP is also living as a wiki [1] for people to add comments.
Programmers widely use ad-hoc code comment markup conventions to serve as reminders of sections of code that need closer inspection or review. Examples of markup include FIXME, TODO, XXX, BUG, but there many more in wide use in existing software. Such markup will henceforth be referred to as codetags. These codetags may show up in application code, unit tests, scripts, general documentation, or wherever suitable.
Codetags have been under discussion and in use (hundreds of codetags in the Python 2.4 sources) in many places (e.g., c2 [3]) for many years. See References for further historic and current information.
If you subscribe to most of these values, then codetags will likely be useful for you.
Codetag design was influenced by the following goals:
Various productivity tools can be built around codetags.
See Tools.
Encourages consistency.
Historically, a subset of these codetags has been used informally in the majority of source code in existence, whether in Python or in other languages. Tags have been used in an inconsistent manner with different spellings, semantics, format, and placement. For example, some programmers might include datestamps and/or user identifiers, limit to a single line or not, spell the codetag differently than others, etc.
Encourages adherence to SPOT/DRY principle.
E.g., generating a roadmap dynamically from codetags instead of keeping TODOs in sync with separate roadmap document.
Easy to remember.
All codetags must be concise, intuitive, and semantically non-overlapping with others. The format must also be simple.
Use not required/imposed.
If you don't use codetags already, there's no obligation to start, and no risk of affecting code (but see Objections). A small subset can be adopted and the Tools will still be useful (a few codetags have probably already been adopted on an ad-hoc basis anyway). Also it is very easy to identify and remove (and possibly record) a codetag that is no longer deemed useful.
Gives a global view of code.
Tools can be used to generate documentation and reports.
A logical location for capturing CRCs/Stories/Requirements.
The XP community often does not electronically capture Stories, but codetags seem like a good place to locate them.
Extremely lightweight process.
Creating tickets in a tracking system for every thought degrades development velocity. Even if a ticketing system is employed, codetags are useful for simply containing links to those tickets.
This shows a simple codetag as commonly found in sources everywhere (with the addition of a trailing <>):
# FIXME: Seems like this loop should be finite. <> while True: ...
The following contrived example demonstrates a typical use of codetags. It uses some of the available fields to specify the assignees (a pair of programmers with initials MDE and CLE), the Date of expected completion (Week 14), and the Priority of the item (2):
# FIXME: Seems like this loop should be finite. <MDE,CLE d:14w p:2> while True: ...
This codetag shows a bug with fields describing author, discovery (origination) date, due date, and priority:
# BUG: Crashes if run on Sundays. # <MDE 2005-09-04 d:14w p:2> if day == 'Sunday': ...
Here is a demonstration of how not to use codetags. This has many problems: 1) Codetags cannot share a line with code; 2) Missing colon after mnemonic; 3) A codetag referring to codetags is usually useless, and worse, it is not completable; 4) No need to have a bunch of fields for a trivial codetag; 5) Fields with unknown values (t:XXX) should not be used:
i = i + 1 # TODO Add some more codetags. # <JRNewbie 2005-04-03 d:2005-09-03 t:XXX d:14w p:0 s:inprogress>
This describes the format: syntax, mnemonic names, fields, and semantics, and also the separate DONE File.
Each codetag should be inside a comment, and can be any number of lines. It should not share a line with code. It should match the indentation of surrounding code. The end of the codetag is marked by a pair of angle brackets <> containing optional fields, which must not be split onto multiple lines. It is preferred to have a codetag in # comments instead of string comments. There can be multiple fields per codetag, all of which are optional.
In short, a codetag consists of a mnemonic, a colon, commentary text, an opening angle bracket, an optional list of fields, and a closing angle bracket. E.g.,
# MNEMONIC: Some (maybe multi-line) commentary. <field field ...>
The codetags of interest are listed below, using the following format:
| recommended mnemonic (& synonym list) | canonical name: semantics
File-level codetags might be better suited as properties in the revision control system, but might still be appropriately specified in a codetag.
Some of these are temporary (e.g., FIXME) while others are persistent (e.g., REQ). A mnemonic was chosen over a synonym using three criteria: descriptiveness, length (shorter is better), commonly used.
Choosing between FIXME and XXX is difficult. XXX seems to be more common, but much less descriptive. Furthermore, XXX is a useful placeholder in a piece of code having a value that is unknown. Thus FIXME is the preferred spelling. Sun says [4] that XXX and FIXME are slightly different, giving XXX higher severity. However, with decades of chaos on this topic, and too many millions of developers who won't be influenced by Sun, it is easy to rightly call them synonyms.
DONE is always a completed TODO item, but this should probably be indicated through the revision control system and/or a completion recording mechanism (see DONE File).
It may be a useful metric to count NOTE tags: a high count may indicate a design (or other) problem. But of course the majority of codetags indicate areas of code needing some attention.
An FAQ is probably more appropriately documented in a wiki where users can more easily view and contribute.
All fields are optional. The proposed standard fields are described in this section. Note that upper case field characters are intended to be replaced.
The Originator/Assignee and Origination Date/Week fields are the most common and don't usually require a prefix.
This lengthy list of fields is liable to scare people (the intended minimalists) away from adopting codetags, but keep in mind that these only exist to support programmers who either 1) like to keep BUG or RFE codetags in a complete form, or 2) are using codetags as their complete and only tracking system. In other words, many of these fields will be used very rarely. They are gathered largely from industry-wide conventions, and example sources include GCC Bugzilla [5] and Python's SourceForge [6] tracking systems.
The following fields are also available but expected to be less common.
Development cycle Release (r). Useful for grouping codetags into completion target groups.
To summarize, the non-prefixed fields are initials and origination date, and the prefixed fields are: assignee (a), due (d), priority (p), tracker (t), category (c), status (s), iteration (i), and release (r).
It should be possible for groups to define or add their own fields, and these should have upper case prefixes to distinguish them from the standard set. Examples of custom fields are Operating System (O), Severity (S), Affected Version (A), Customer (C), etc.
Some codetags have an ability to be completed (e.g., FIXME, TODO, BUG). It is often important to retain completed items by recording them with a completion date stamp. Such completed items are best stored in a single location, global to a project (or maybe a package). The proposed format is most easily described by an example, say ~/src/fooproj/DONE:
# TODO: Recurse into subdirs only on blue # moons. <MDE 2003-09-26> [2005-09-26 Oops, I underestimated this one a bit. Should have used Warsaw's First Law!] # FIXME: ... ...
You can see that the codetag is copied verbatim from the original source file. The date stamp is then entered on the following line with an optional post-mortem commentary. The entry is terminated by a blank line (\n\n).
It may sound burdensome to have to delete codetag lines every time one gets completed. But in practice it is quite easy to setup a Vim or Emacs mapping to auto-record a codetag deletion in this format (sans the commentary).
Currently, programmers (and sometimes analysts) typically use grep to generate a list of items corresponding to a single codetag. However, various hypothetical productivity tools could take advantage of a consistent codetag format. Some example tools follow.
There are some tools already in existence that take advantage of a smaller set of pseudo-codetags (see References). There is also an example codetags implementation under way, known as the Codetag Project [7].
Objection: | Extreme Programming argues that such codetags should not ever exist in code since the code is the documentation. |
---|---|
Defense: | Maybe you should put the codetags in the unit test files instead. Besides, it's tough to generate documentation from uncommented source code. |
Objection: | Too much existing code has not followed proposed guidelines. |
---|---|
Defense: | [Simple] utilities (ctlint) could convert existing codes. |
Objection: | Causes duplication with tracking system. |
---|---|
Defense: | Not really, unless fields are abused. If an item exists in the tracker, a simple ticket number in the codetag tracker field is sufficient. Maybe a duplicated title would be acceptable. Furthermore, it's too burdensome to have a ticket filed for every item that pops into a developer's mind on-the-go. Additionally, the tracking system could possibly be obviated for simple or small projects that can reasonably fit the relevant data into a codetag. |
Objection: | Codetags are ugly and clutter code. |
---|---|
Defense: | That is a good point. But I'd still rather have such info in a single place (the source code) than various other documents, likely getting duplicated or forgotten about. The completed codetags can be sent off to the DONE File, or to the bit bucket. |
Objection: | Codetags (and all comments) get out of date. |
---|---|
Defense: | Not so much if other sources (externally visible documentation) depend on their being accurate. |
Objection: | Codetags tend to only rarely have estimated completion dates of any sort. OK, the fields are optional, but you want to suggest fields that actually will be widely used. |
---|---|
Defense: | If an item is inestimable don't bother with specifying a date field. Using tools to display items with order and/or color by due date and/or priority, it is easier to make estimates. Having your roadmap be a dynamic reflection of your codetags makes you much more likely to keep the codetags accurate. |
Objection: | Named variables for the field parameters in the <> should be used instead of cryptic one-character prefixes. I.e., <MDE p:3> should rather be <author=MDE, priority=3>. |
---|---|
Defense: | It is just too much typing/verbosity to spell out fields. I argue that p:3 i:2 is as readable as priority=3, iteration=2 and is much more likely to by typed and remembered (see bullet C in Philosophy). In this case practicality beats purity. There are not many fields to keep track of so one letter prefixes are suitable. |
Objection: | Synonyms should be deprecated since it is better to have a single way to spell something. |
---|---|
Defense: | Many programmers prefer short mnemonic names, especially in comments. This is why short mnemonics were chosen as the primary names. However, others feel that an explicit spelling is less confusing and less prone to error. There will always be two camps on this subject. Thus synonyms (and complete, full spellings) should remain supported. |
Objection: | It is cruel to use [for mnemonics] opaque acronyms and abbreviations which drop vowels; it's hard to figure these things out. On that basis I hate: MLSTN RFCTR RFE FEETCH, NYI, FR, FTRQ, FTR WKRD RVDBY |
---|---|
Defense: | Mnemonics are preferred since they are pretty easy to remember and take up less space. If programmers didn't like dropping vowels we would be able to fit very little code on a line. The space is important for those who write comments that often fit on a single line. But when using a canon everywhere it is much less likely to get something to fit on a line. |
Objection: | It takes too long to type the fields. |
---|---|
Defense: | Then don't use (most or any of) them, especially if you're the only programmer. Terminating a codetag with <> is a small chore, and in doing so you enable the use of the proposed tools. Editor auto-completion of codetags is also useful: You can program your editor to stamp a template (e.g. # FIXME . <MDE {date}>) with just a keystroke or two. |
Objection: | WorkWeek is an obscure and uncommon time unit. |
---|---|
Defense: | That's true but it is a highly suitable unit of granularity for estimation/targeting purposes, and it is very compact. The ISO 8601 [2] is widely understood but allows you to only specify either a specific day (restrictive) or month (broad). |
Objection: | I aesthetically dislike for the comment to be terminated with <> in the empty field case. |
---|---|
Defense: | It is necessary to have a terminator since codetags may be followed by non-codetag comments. Or codetags could be limited to a single line, but that's prohibitive. I can't think of any single-character terminator that is appropriate and significantly better than <>. Maybe @ could be a terminator, but then most codetags will have an unnecessary @. |
Objection: | I can't use codetags when writing HTML, or less specifically, XML. Maybe @fields@ would be a better than <fields> as the delimiters. |
---|---|
Defense: | Maybe you're right, but <> looks nicer whenever applicable. XML/SGML could use @ while more common programming languages stick to <>. |
Some other tools have approached defining/exploiting codetags. See http://tracos.org/codetag/wiki/Links.
[1] | http://tracos.org/codetag/wiki/Pep |
[2] | (1, 2) http://en.wikipedia.org/wiki/ISO_8601 |
[3] | http://c2.com/cgi/wiki?FixmeComment |
[4] | http://java.sun.com/docs/codeconv/html/CodeConventions.doc9.html#395 |
[5] | http://gcc.gnu.org/bugzilla/ |
[6] | http://sourceforge.net/tracker/?group_id=5470 |
[7] | http://tracos.org/codetag |