Internationalization-SIG ("i18n")

This SIG provides a forum for discussing issues relating to the internationalization of Python.

At the time of writing (March 2000), internationalization (henceforth spelled "i18n" to save typing) features are being added to Python. This SIG is the primary forum for discussing those features. Topics covered include, but are not limited to:

  • Unicode support and building a library of codecs
  • Support for locale information, date, number and time formatting
  • Frameworks for translating and localizing GUI and Web applications

Deliverables

The immediate deliverables relate to Unicode and encoding support; the other topics above are too general and application-specific to set concrete deliverables for at this stage.

  • Test the new Unicode features and supplied single-byte codecs as soon as the Unicode patch moves into public CVS (Q2 2000)
  • Assist with documentation of the new features (Q2/Q3 2000)
  • Build some prototype implementations of double-byte codecs. On the basis of these, determine what changes and additions, if any, are needed to the Unicode API at the C level to support professional i18n work, and help get these into the core in time for 1.6. (Q2 2000)
  • Thereafter, begin implementing a full set of codecs supporting most of the world's languages. (Q3-4 2000 and ongoing)
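
As an illustration of the kind of codec testing listed above, here is a minimal sketch of a round-trip check for single-byte codecs. It assumes a current Python 3 interpreter rather than the 1.6-era API under discussion, and the codec names and the helper round_trip_failures are chosen purely for illustration.

```python
# Round-trip check for a few single-byte codecs.
# Assumption: modern Python 3; the codec list is illustrative only.
SINGLE_BYTE_CODECS = ["latin-1", "cp1252", "iso8859-15", "koi8-r"]


def round_trip_failures(codec_name):
    """Return the byte values that decode but do not re-encode to themselves."""
    failures = []
    for value in range(256):
        raw = bytes([value])
        try:
            text = raw.decode(codec_name)
        except UnicodeDecodeError:
            continue  # this byte is undefined in the code page; skip it
        if text.encode(codec_name) != raw:
            failures.append(value)
    return failures


results = {name: round_trip_failures(name) for name in SINGLE_BYTE_CODECS}
for name, failures in results.items():
    print(name, "OK" if not failures else failures)
```

An empty failure list for a codec means every byte value it defines survives decoding to Unicode and encoding back unchanged.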

Background

In mid-1999, Python consortium members made some strong requests for i18n features to be added to Python. Following discussions on the python-dev list, CNRI contracted Marc-Andre Lemburg to add Unicode strings and the associated changes to the Python core (based on a running Unicode string implementation from Fredrik Lundh), and Fredrik Lundh to develop a Unicode-based regex engine. These tasks are nearing completion and will shortly be released into public CVS.

The specification on which these are based can be found at http://starship.python.net/crew/lemburg/unicode-proposal.txt. (Now also in the Python 2.0 distribution as Misc/unicode.txt.)

The proposal defines a 'codec' interface, but implementing codecs for multi-byte languages is out of Marc-Andre's scope and will be left to members of the SIG.
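
To give a feel for what a codec does, here is a minimal sketch that looks up a multi-byte codec and round-trips a short Japanese string through it. It uses the codecs module as it exists in current Python 3, which may differ in detail from the interface in the proposal; the helper round_trip and the choice of Shift_JIS are assumptions made for illustration.

```python
import codecs


def round_trip(text, encoding):
    """Encode text with the named codec, decode it again, and return both."""
    info = codecs.lookup(encoding)                   # registry lookup by name
    encoded, chars_consumed = info.encode(text)      # str -> (bytes, count)
    decoded, bytes_consumed = info.decode(encoded)   # bytes -> (str, count)
    return encoded, decoded


# Three kanji, each of which takes two bytes in Shift_JIS.
encoded, decoded = round_trip("日本語", "shift_jis")
print(len(encoded), decoded)
```

The encoder and decoder each return a pair: the converted data and the amount of input consumed.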

There was an i18n forum at the Eighth International Python Conference, at which the above deliverables were agreed, and the SIG was formed as a result of this.

Key People and Organisations Involved

If you can help out in a specific way and feel you should be listed here, please contact Andy Robinson with a brief bio. This list is a first draft and omits many key people on the SIG, but you'll have to tell me about yourself to get listed!

Marc-André Lemburg
Implementing the core Unicode features
Fredrik Lundh, Secret Labs AB
Implementing the Unicode regex engine
Andy Robinson, ReportLab Inc.
SIG Moderator; a Japanese speaker who has worked on Japanese conversion in Python for the past year at a mutual fund company. ReportLab (http://www.reportlab.com/) are developing multilingual PDF reporting tools and need to implement full support for Asian writing systems later this year.
Brian Takashi Hooper, Digital Garage
Digital Garage are a consulting and internet services firm based in Tokyo with over 70 employees. They have implemented their own codecs for Japanese in the past and have a pressing need for Japanese support in Python.
Hewlett Packard
HP are members of the Python consortium and pushed initially for the Unicode work for Python to assist with their global eSpeak program.
Christian Wittern
Christian is a sinologist based in Taiwan and can assist with Chinese-related issues.

Resources and Links

Thanks to Peter Funk for kicking off this list:

More contributions to the list are welcome!