
Python Enhanced String Processing SIG Status

Jeff Ollie is doing much of the implementation and has snapshots of his code on his web pages. Andrew Kuchling, from Magnet, is hosting a web page that is a summary point for discussions and tools developed as a part of this SIG. Definitely some things to bookmark.

To keep the rest of the Python world up to date with activity in the string-sig, I'll post monthly summaries to this page. Regular Expressions - The Next Generation is taking shape. The efforts of the SIG members have been excellent so far, and I look forward to using the new re module. Special thanks to Jeff Ollie for doing most of the implementation. The new features will first appear in Python 1.5 as a development snapshot.

First of all, rest assured that the regex module will not change out from under you: we have decided to freeze the current regex interface as is. Below is a summary of what we have in mind.

NOTABLE DEVELOPMENTS

regex module

The current regex interface will remain unchanged for backward compatibility. However, some long-standing bugs in the engine are to be fixed and some performance optimizations are to be added. After 1.5, no significant enhancements will be made to this module.

regsub module

Currently implemented in Python; it remains unchanged. Its functionality is to be folded into the new re module.

string module

Guido added a replace() function to perform simple string substitutions.
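
A minimal sketch of the kind of simple substitution described here. It assumes a current Python interpreter, where the same operation is spelled as the string method `str.replace`; the module-level function of the time took the string as its first argument. The sample text is made up for illustration.

```python
# Illustrative text; not taken from the SIG discussion.
text = "the regex module and the regsub module"

# Replace every occurrence of a literal substring (no pattern matching involved).
all_replaced = text.replace("module", "package")

# An optional count limits how many occurrences are replaced, left to right.
first_only = text.replace("module", "package", 1)

print(all_replaced)
print(first_only)
```

Because the search text is a literal, no regular expression engine is involved, which is the point of having this in the string module.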

re module

The new prototype module, first implemented as a Python module translating the new syntax to the old underneath, and later to be implemented in C. The new syntax standard will be Perl-like (as much as possible). Successful matches will return a new match object from which strings, groups, and indices can be extracted (à la pregex). Unsuccessful matches return None. Global syntax flags are gone in favor of a single syntax. With these changes, re can be made thread-safe. Symbolic grouping capability will be ON by default in patterns. Optional compile arguments are available to support Perl's /i, /m, /s, and /x options (see below for an example). The translation table compile argument is gone in favor of the re.IGNORECASE flag; for other translations use string.translate(). A new replace function is used instead of regsub for both sub and gsub (consistent with the string module).
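
The match-object behaviour described above can be sketched as follows. This is written against the `re` module as it ships in current Python, which may differ in detail from the prototype discussed here: the released module spells named groups `(?P<name>...)` and calls the substitution function `re.sub`. The pattern and sample strings are hypothetical.

```python
import re

# Hypothetical pattern and text, chosen only for illustration.
pattern = re.compile(r"(?P<word>[a-z]+)-(?P<num>\d+)", re.IGNORECASE)

# A successful search returns a match object ...
m = pattern.search("release Python-15 snapshot")
whole = m.group(0)          # the entire matched text
word = m.group("word")      # a group looked up by its symbolic name
num = m.group(2)            # the same groups are also numbered
span = m.span()             # (start, end) indices of the match

# ... and an unsuccessful one returns None, so it can be tested directly.
miss = pattern.search("no digits here")

# One substitution function covers both old regsub behaviours:
# count=1 replaces only the first match (like regsub.sub),
# the default replaces every match (like regsub.gsub).
first = re.sub(r"\d+", "N", "a1 b22 c333", count=1)
every = re.sub(r"\d+", "N", "a1 b22 c333")
```

Since all state lives in the compiled pattern and the returned match object rather than in module-level globals, separate threads can use the same pattern without interfering with each other.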

raw strings

A new string literal form in which backslashes are not treated as escapes; the characters are placed in the string exactly as written. It is signified by an r prefix: for example, r"aw\text" stores exactly 7 bytes. This is very useful in patterns and replacement strings, but it is made generally available throughout the language.
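
A short sketch of the difference, assuming a current Python interpreter. The 7-character example is the one from the text; the pattern and sample sentence are made up.

```python
import re

raw = r"aw\text"     # backslash kept literally: a, w, \, t, e, x, t
cooked = "aw\text"   # "\t" is interpreted as a single tab character

raw_len = len(raw)
cooked_len = len(cooked)

# In a pattern, r"\b" reaches the regex engine as a word-boundary assertion,
# while "\b" in an ordinary string is a backspace character.
with_raw = re.search(r"\bcamel\b", "a camel walks")
without_raw = re.search("\bcamel\b", "a camel walks")

print(raw_len, cooked_len)
print(with_raw is not None, without_raw is not None)
```

Without the r prefix, every backslash meant for the regex engine would have to be doubled.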

Example compile using new features:

import re
rocker = re.compile(
  r"""\b           # only match an animal at a word boundary
      (?<animal>   # symbolic group with name 'animal'
       camel|snake # cool animals
      )            # close the group
      \b           # and another word boundary
  """,  re.IGNORECASE | re.EXTENDED) # same as i and x options in perl
Note that the EXTENDED option is what enables ignoring whitespace and comments inside the pattern.
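
For readers trying the example in a current interpreter: the released re module spells the symbolic group `(?P<animal>...)` and the whitespace-and-comments flag `re.VERBOSE` (alias `re.X`), rather than the `(?<animal>` and `re.EXTENDED` forms proposed above. A sketch of the same pattern under those spellings, with made-up sample text:

```python
import re

rocker = re.compile(
    r"""\b            # only match an animal at a word boundary
        (?P<animal>   # symbolic group named 'animal'
         camel|snake  # cool animals
        )             # close the group
        \b            # and another word boundary
    """, re.IGNORECASE | re.VERBOSE)

# IGNORECASE lets "Snake" match; the group is retrieved by name.
m = rocker.search("My other pet is a Snake.")
animal = m.group("animal") if m else None

# "snakes" has no word boundary after "snake", so nothing matches.
no_match = rocker.search("snakes")

print(animal)
print(no_match)
```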

That's a quick summary. For more details see the refs below and/or join the SIG.

Python 1.5 should contain a Python re module supporting the above interface. The snapshot won't be blazing fast, as performance tuning will come later, but it should be quite usable. And the new interface will definitely answer many criticisms of Python's regex facilities.