Python Enhanced String Processing SIG Status

Jeff Ollie is doing much of the implementation and has snapshots of his code on his web pages. Andrew Kuchling, from Magnet, is hosting a web page that serves as a summary point for the discussions and tools developed as part of this SIG. Both are well worth bookmarking.

To keep the rest of the Python world up to date with activity in the string-sig, I'll post monthly summaries to this page.

Regular Expressions - The Next Generation is taking shape. The efforts of the SIG members have been excellent so far, and I look forward to using the new re module. Special thanks to Jeff Ollie for doing most of the implementation. The new features will first appear in Python 1.5 as a development snapshot.

First of all, rest assured that the regex module will not change out from under you: we have decided to freeze the current regex interface as is. Below is a summary of what we have in mind.

NOTABLE DEVELOPMENTS

regex module
The current regex interface will remain unchanged for backward compatibility. However, some long-standing bugs in the engine are to be fixed and some performance optimizations added. After 1.5, no significant enhancements will be made to this module.

regsub module
Currently implemented in Python; it is unchanged. Its functionality is to be folded into the new re module.

string module
Guido added a replace() function to perform simple string substitutions.

re module
The new prototype module, first implemented as a Python module translating the new syntax to the old underneath, later implemented in C. The new syntax standard will be perl-like (as much as possible). Successful matches will return a new match object from which strings, groups, and indices can be extracted (a la pregex). Unsuccessful matches return None. Global syntax flags are gone in favor of a single syntax. With these changes re can be made thread-safe. Symbolic grouping capability will be ON by default in patterns. Optional compile arguments are available to support perl's /i, /m, /s, and /x options (see below for an example). The translation table compile argument is gone in favor of the re.IGNORECASE flag; otherwise use string.translate(). A new replace function is used instead of regsub for both sub and gsub (consistent with the string module).

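As a rough sketch of how the match-object and replace behavior described above looks in use, the snippet below runs against the re module as it ships in present-day Python. That is an assumption, since the design was still in flux when this summary was written: the shipped module spells the replacement function re.sub rather than replace, and string substitution is a method on the string itself. The pattern, sample text, and variable names are hypothetical.

```python
import re

# Hypothetical sample pattern and text, chosen only for illustration.
pattern = re.compile(r"(\w+)@(\w+)", re.IGNORECASE)

m = pattern.search("mail to Guido@CNRI today")
if m is not None:
    whole = m.group(0)   # the entire matched string
    user = m.group(1)    # first parenthesized group
    span = m.span(1)     # (start, end) indices of that group

# An unsuccessful match returns None.
miss = pattern.search("no address here")

# One function covers both the old regsub.sub and regsub.gsub:
# count=1 replaces only the first match, the default replaces all.
first_only = re.sub(r"spam", "eggs", "spam spam spam", count=1)
everywhere = re.sub(r"spam", "eggs", "spam spam spam")

# The plain-string counterpart for simple substitutions.
simple = "spam spam spam".replace("spam", "eggs")
```

Testing the result against None before touching the match object is the idiom this design encourages, since a failed match carries no groups or indices to inspect.
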
raw strings
A new string type which performs no escape processing; characters are placed into the string literally. Signified by an r prefix, e.g. r"aw\text" will store exactly 7 bytes. Very useful in patterns and replacement strings, but made generally available throughout the language.

Example compile using new features:

    import re
    rocker = re.compile(
        r"""\b          # only match an animal at a word boundary
            (?<animal>  # symbolic group with name 'animal'
            camel|snake # cool animals
            )           # close the group
            \b          # and another word boundary
        """, re.IGNORECASE | re.EXTENDED)  # same as i and x options in perl

Note that ignoring whitespace and comments is enabled by the EXTENDED option.

That's a quick summary. For more details see the refs below and/or join the SIG. Python 1.5 should contain a Python re module supporting the above interface. The snapshot won't be blazing fast, as performance tuning will come later, but it should be quite usable. And the new interface will definitely answer many criticisms of Python's regex facilities.
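
For readers who want to try the raw-string rule and the example compile above, here is a minimal sketch against the re module in present-day Python. That choice is an assumption: the shipped module spells the named group (?P<animal>...) and the flag re.VERBOSE, not the (?<animal>...) and re.EXTENDED forms proposed in this summary. The sample sentence is invented.

```python
import re

# A raw string keeps the backslash literally: a, w, \, t, e, x, t.
raw = r"aw\text"
cooked = "aw\text"   # without the r prefix, \t becomes a single tab

rocker = re.compile(
    r"""\b            # only match an animal at a word boundary
        (?P<animal>   # symbolic group named 'animal'
          camel|snake # cool animals
        )             # close the group
        \b            # and another word boundary
    """,
    re.IGNORECASE | re.VERBOSE)  # like the i and x options in perl

m = rocker.search("The Snake ate my homework")  # hypothetical input
animal = m.group("animal") if m else None
```

Writing the pattern as a raw string is what lets \b reach the regex engine as a word-boundary escape; in an ordinary string literal it would be read as a backspace character first.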