As usual, Python's standard library received a number of enhancements and bug fixes. Here's a partial list of the most notable changes, sorted alphabetically by module name. Consult the Misc/NEWS file in the source tree for a more complete list of changes, or look through the CVS logs for all the details.
+= assignment operator to add another array's contents, and the *= assignment operator to repeat an array. (Contributed by Jason Orendorff.)
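A minimal sketch of the two in-place operators on arrays follows. The variable names and the 'i' type code are chosen only for illustration, and the single-argument print calls are written so the snippet also runs on later Python versions.

```python
from array import array

a = array('i', [1, 2, 3])
b = array('i', [4, 5])

a += b   # append the contents of another array of the same type code
b *= 3   # repeat the array's contents in place

print(a.tolist())
print(b.tolist())
```

Both operators modify the existing array object rather than building a new one, which mirrors how they behave on lists.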
The old version of the module has been renamed to bsddb185 and is no longer built automatically; you'll have to edit Modules/Setup to enable it. Note that the new bsddb package is intended to be compatible with the old module, so be sure to file bugs if you discover any incompatibilities. When upgrading to Python 2.3, if the new interpreter is compiled with a new version of the underlying BerkeleyDB library, you will almost certainly have to convert your database files to the new version. You can do this fairly easily with the new scripts db2pickle.py and pickle2db.py, which you will find in the distribution's Tools/scripts directory. If you've already been using the PyBSDDB package and importing it as bsddb3, you will have to change your import statements to import it as bsddb.
ext = Extension("samp", sources=["sampmodule.c"], depends=["sample.h"])
Modifying sample.h would then cause the module to be recompiled. (Contributed by Jeremy Hylton.)
>>> getopt.getopt(['-f', 'filename', 'output', '-v'], 'f:v')
([('-f', 'filename')], ['output', '-v'])
>>> getopt.gnu_getopt(['-f', 'filename', 'output', '-v'], 'f:v')
([('-f', 'filename'), ('-v', '')], ['output'])
(Contributed by Peter Åstrand.)
>>> import grp
>>> g = grp.getgrnam('amk')
>>> g.gr_name, g.gr_gid
('amk', 500)
heap[k] <= heap[2*k+1] and heap[k] <= heap[2*k+2]. This makes it quick to remove the smallest item, and inserting a new item while maintaining the heap property is O(lg n). (See http://www.nist.gov/dads/HTML/priorityque.html for more information about the priority queue data structure.)
The heapq module provides heappush() and heappop() functions for adding and removing items while maintaining the heap property on top of some other mutable Python sequence type. Here's an example that uses a Python list:
>>> import heapq
>>> heap = []
>>> for item in [3, 7, 5, 11, 1]:
...     heapq.heappush(heap, item)
...
>>> heap
[1, 3, 5, 11, 7]
>>> heapq.heappop(heap)
1
>>> heapq.heappop(heap)
3
>>> heap
[5, 7, 11]
(Contributed by Kevin O'Connor.)
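To make the heap property concrete, here is a small sketch that checks the two inequalities after every push and pop. The helper is_heap() is hypothetical and written only for this illustration; it is not part of the heapq module.

```python
import heapq

def is_heap(seq):
    """Check heap[k] <= heap[2*k+1] and heap[k] <= heap[2*k+2] for every k."""
    n = len(seq)
    for k in range(n):
        for child in (2 * k + 1, 2 * k + 2):
            if child < n and seq[k] > seq[child]:
                return False
    return True

heap = []
for item in [3, 7, 5, 11, 1]:
    heapq.heappush(heap, item)
    assert is_heap(heap)          # the property holds after each insertion

drained = []
while heap:
    drained.append(heapq.heappop(heap))
    assert is_heap(heap)          # and after each removal

print(drained)
```

Because heappop() always removes the smallest remaining item, draining the heap yields the items in ascending order.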
reload() operations.
IDLE's core code has been incorporated into the standard library as the idlelib package.
itertools.ifilter(predicate, iterator) returns all elements in the iterator for which the function predicate() returns True, and itertools.repeat(obj, N) returns obj N times. There are a number of other functions in the module; see the package's reference documentation for details. (Contributed by Raymond Hettinger.)
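The two functions just described can be tried out as follows. This is only a sketch: the is_even() predicate is made up for the example, and the fallback to the built-in filter() is an assumption added so the snippet also runs on Python 3, where itertools.ifilter no longer exists.

```python
import itertools

try:
    ifilter = itertools.ifilter   # available in Python 2.3 through 2.7
except AttributeError:
    ifilter = filter              # Python 3: the built-in filter() is lazy

def is_even(n):
    return n % 2 == 0

# Keep only the elements for which the predicate returns true.
evens = list(ifilter(is_even, iter([1, 2, 3, 4, 5, 6])))

# Produce the same object a fixed number of times.
repeated = list(itertools.repeat('ab', 3))

print(evens)
print(repeated)
```

Both calls return iterators, so the results are wrapped in list() here only to make them printable.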
e and 10. (Contributed by Raymond Hettinger.)
During testing, it was found that some applications will break if time stamps are floats. For compatibility, when using the tuple interface of the stat_result, time stamps will be represented as integers. When using named fields (a feature first introduced in Python 2.2), time stamps are still represented as integers, unless os.stat_float_times() is invoked to enable float return values:
>>> os.stat("/tmp").st_mtime
1034791200
>>> os.stat_float_times(True)
>>> os.stat("/tmp").st_mtime
1034791200.6335014
In Python 2.4, the default will change to always returning floats.
Application developers should enable this feature only if all their libraries work properly when confronted with floating point time stamps, or if they use the tuple API. If used, the feature should be activated at the application level instead of trying to enable it on a per-use basis.
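The difference between the two interfaces can be observed with a short sketch. It assumes an interpreter on which named fields already return floats (as on later Python versions), and it uses a temporary file rather than a fixed path; neither detail comes from the text above.

```python
import os
import stat
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    result = os.stat(path)
    tuple_mtime = result[stat.ST_MTIME]   # tuple interface: integer seconds
    named_mtime = result.st_mtime         # named field: may carry a fraction
    print(type(tuple_mtime).__name__)
    print(type(named_mtime).__name__)
finally:
    os.remove(path)
```

Code that indexes the stat result like a tuple keeps seeing integers, which is why the tuple API is the safe choice for libraries that cannot handle fractional time stamps.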
len(population). For example:
>>> days = ['Mo', 'Tu', 'We', 'Th', 'Fr', 'St', 'Sn']
>>> random.sample(days, 3)      # Choose 3 elements
['St', 'Sn', 'Th']
>>> random.sample(days, 7)      # Choose 7 elements
['Tu', 'Th', 'Mo', 'We', 'St', 'Fr', 'Sn']
>>> random.sample(days, 7)      # Choose 7 again
['We', 'Mo', 'Sn', 'Fr', 'Tu', 'St', 'Th']
>>> random.sample(days, 8)      # Can't choose eight
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "random.py", line 414, in sample
      raise ValueError, "sample larger than population"
ValueError: sample larger than population
>>> random.sample(xrange(1,10000,2), 10)   # Choose ten odd nos. under 10000
[3407, 3805, 1505, 7023, 2401, 2267, 9733, 3151, 8083, 9195]
The random module now uses a new algorithm, the Mersenne Twister, implemented in C. It's faster and more extensively studied than the previous algorithm.
(All changes contributed by Raymond Hettinger.)
(Sticking with Python 2.2 or 2.1 will not make your applications any safer because there are known bugs in the rexec module in those versions. To repeat: if you're using rexec, stop using it immediately.)
The original timeout implementation was by Tim O'Malley. Michael Gilfix integrated it into the Python socket module and shepherded it through a lengthy review. After the code was checked in, Guido van Rossum rewrote parts of it. (This is a good example of a collaborative development process in action.)
sys.api_version. The current exception can be cleared by calling the new sys.exc_clear() function.
>>> import textwrap
>>> paragraph = "Not a whit, we defy augury: ... more text ..."
>>> textwrap.wrap(paragraph, 60)
["Not a whit, we defy augury: there's a special providence in",
 "the fall of a sparrow. If it be now, 'tis not to come; if it",
 ...]
>>> print textwrap.fill(paragraph, 35)
Not a whit, we defy augury: there's
a special providence in the fall of
a sparrow. If it be now, 'tis not
to come; if it be not to come, it
will be now; if it be not now, yet
it will come: the readiness is all.
>>>
The module also contains a TextWrapper class that actually implements the text wrapping strategy. Both the TextWrapper class and the wrap() and fill() functions support a number of additional keyword arguments for fine-tuning the formatting; consult the module's documentation for details. (Contributed by Greg Ward.)
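As an illustration of the keyword arguments, the following sketch builds a TextWrapper that formats a paragraph as a bulleted item. The particular values for width, initial_indent and subsequent_indent are chosen only for the example.

```python
import textwrap

text = ("Not a whit, we defy augury: there's a special providence in "
        "the fall of a sparrow.")

# The first line gets a bullet; continuation lines are indented to match.
wrapper = textwrap.TextWrapper(width=35,
                               initial_indent='* ',
                               subsequent_indent='  ')

lines = wrapper.wrap(text)
for line in lines:
    print(line)
```

Creating a TextWrapper instance once and reusing it is convenient when many paragraphs need the same formatting; wrapper.fill(text) returns the same lines joined into a single string.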
try:
    import threading as _threading
except ImportError:
    import dummy_threading as _threading
In this example, _threading is used as the module name to make it clear that the module being used is not necessarily the actual threading module. Code can call functions and use classes in _threading whether or not threads are supported, avoiding an if statement and making the code slightly clearer. This module will not magically make multithreaded code run without threads; code that waits for another thread to return or to do something will simply hang forever.
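Here is a sketch of code written against the _threading name. The counter and the bump() function are invented for the example. No thread waits on another thread's progress, which is why this pattern should be safe under the dummy module too.

```python
try:
    import threading as _threading
except ImportError:
    import dummy_threading as _threading

counter_lock = _threading.Lock()
counter = [0]

def bump():
    # Each worker only takes the lock briefly; nothing blocks waiting
    # for another thread to make progress.
    counter_lock.acquire()
    try:
        counter[0] += 1
    finally:
        counter_lock.release()

workers = [_threading.Thread(target=bump) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(counter[0])
```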
import timeit

timer1 = timeit.Timer('unicode("abc")')
timer2 = timeit.Timer('"abc" + u""')

# Run three trials
print timer1.repeat(repeat=3, number=100000)
print timer2.repeat(repeat=3, number=100000)

# On my laptop this outputs:
# [0.36831796169281006, 0.37441694736480713, 0.35304892063140869]
# [0.17574405670166016, 0.18193507194519043, 0.17565798759460449]
When using _tkinter through the Tkinter module (as most Tkinter applications will), this feature is always activated. It should not cause compatibility problems, since Tkinter would always convert string results to Python types where possible.
If any incompatibilities are found, the old behavior can be restored by setting the wantobjects variable in the Tkinter module to false before creating the first tkapp object.
import Tkinter
Tkinter.wantobjects = 0
Any breakage caused by this change should be reported as a bug.
Adding the mix-in as a superclass provides the full dictionary interface whenever the class defines __getitem__, __setitem__, __delitem__, and keys. For example:
>>> import UserDict
>>> class SeqDict(UserDict.DictMixin):
...     """Dictionary lookalike implemented with lists."""
...     def __init__(self):
...         self.keylist = []
...         self.valuelist = []
...     def __getitem__(self, key):
...         try:
...             i = self.keylist.index(key)
...         except ValueError:
...             raise KeyError
...         return self.valuelist[i]
...     def __setitem__(self, key, value):
...         try:
...             i = self.keylist.index(key)
...             self.valuelist[i] = value
...         except ValueError:
...             self.keylist.append(key)
...             self.valuelist.append(value)
...     def __delitem__(self, key):
...         try:
...             i = self.keylist.index(key)
...         except ValueError:
...             raise KeyError
...         self.keylist.pop(i)
...         self.valuelist.pop(i)
...     def keys(self):
...         return list(self.keylist)
...
>>> s = SeqDict()
>>> dir(s)      # See that other dictionary methods are implemented
['__cmp__', '__contains__', '__delitem__', '__doc__', '__getitem__',
 '__init__', '__iter__', '__len__', '__module__', '__repr__',
 '__setitem__', 'clear', 'get', 'has_key', 'items', 'iteritems',
 'iterkeys', 'itervalues', 'keylist', 'keys', 'pop', 'popitem',
 'setdefault', 'update', 'valuelist', 'values']
(Contributed by Raymond Hettinger.)
None. Nil values are always supported on unmarshalling an XML-RPC response. To generate requests containing None, you must supply a true value for the allow_none parameter when creating a Marshaller instance.
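A short sketch of the allow_none behaviour follows. The fallback import is an assumption added so the snippet also runs on Python 3, where the module is named xmlrpc.client; the variable names are illustrative.

```python
try:
    import xmlrpclib                      # module name in Python 2
except ImportError:
    import xmlrpc.client as xmlrpclib     # the same module in Python 3

# By default, marshalling None is refused.
strict = xmlrpclib.Marshaller()
try:
    strict.dumps((None,))
    rejected = False
except TypeError:
    rejected = True

# With a true allow_none value, None is written as a nil element.
lenient = xmlrpclib.Marshaller(allow_none=True)
xml = lenient.dumps((None,))

# Unmarshalling accepts nil values without any special setting.
params, method_name = xmlrpclib.loads(xml)

print(rejected)
print(params)
```

Keeping the default strict is a reasonable choice when talking to servers that may not understand the nil extension.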
>>> u"www.Alliancefrançaise.nu".encode("idna")
'www.xn--alliancefranaise-npb.nu'
The socket module has also been extended to transparently convert Unicode hostnames to the ACE version before passing them to the C library. Modules that deal with hostnames (such as httplib and ftplib) also support Unicode host names; httplib also sends HTTP "Host" headers using the ACE version of the domain name. urllib supports Unicode URLs with non-ASCII host names as long as the path part of the URL is ASCII only.
To implement this change, the stringprep module, the mkstringprep tool and the punycode encoding have been added.
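The "idna" codec can also be used in the other direction. The following sketch round-trips a host name; the non-ASCII character is written as an escape so the source stays ASCII, and the variable names are illustrative.

```python
# U+00E7 is LATIN SMALL LETTER C WITH CEDILLA.
hostname = u"www.Alliancefran\u00e7aise.nu"

# Convert to the ASCII-compatible encoding (ACE) form.
ace = hostname.encode("idna")

# Convert the ACE form back to Unicode.
round_trip = ace.decode("idna")

print(ace)
```

Note that the round trip does not restore the original capitalization, because host names are case-folded as part of the conversion.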
Date and time types suitable for expressing timestamps were added as the datetime module. The types don't support different calendars or many fancy features, and just stick to the basics of representing time.
The three primary types are: date, representing a day, month, and year; time, consisting of hour, minute, and second; and datetime, which contains all the attributes of both date and time. There's also a timedelta class representing differences between two points in time, and time zone logic is implemented by classes inheriting from the abstract tzinfo class.
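As a sketch of what a tzinfo subclass can look like, here is a time zone with a fixed offset from UTC. The FixedOffset class is hypothetical, written for this example, and is not provided by the datetime module.

```python
import datetime

class FixedOffset(datetime.tzinfo):
    """A time zone with a constant offset from UTC and no DST."""

    def __init__(self, minutes, name):
        self._offset = datetime.timedelta(minutes=minutes)
        self._name = name

    def utcoffset(self, dt):
        return self._offset

    def tzname(self, dt):
        return self._name

    def dst(self, dt):
        return datetime.timedelta(0)

est = FixedOffset(-300, 'EST')   # five hours behind UTC
dt = datetime.datetime(2002, 12, 30, 21, 27, 3, tzinfo=est)

print(dt.isoformat())
print(dt.tzname())
```

Real time zones need daylight saving rules as well, which is why the abstract class leaves these methods to be implemented by subclasses.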
You can create instances of date and time by either supplying keyword arguments to the appropriate constructor, e.g. datetime.date(year=1972, month=10, day=15), or by using one of a number of class methods. For example, the date.today() class method returns the current local date.
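Both ways of creating instances are shown in the sketch below. The specific values are arbitrary, and combine() and fromordinal() are simply two of the available class methods.

```python
import datetime

# Keyword arguments to the constructors.
d = datetime.date(year=1972, month=10, day=15)
t = datetime.time(hour=21, minute=27, second=3)

# Class methods that build instances in other ways.
dt = datetime.datetime.combine(d, t)
today = datetime.date.today()
first_day = datetime.date.fromordinal(1)

print(d.isoformat())
print(dt.isoformat())
print(first_day.isoformat())
```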
Once created, instances of the date/time classes are all immutable. There are a number of methods for producing formatted strings from objects:
>>> import datetime
>>> now = datetime.datetime.now()
>>> now.isoformat()
'2002-12-30T21:27:03.994956'
>>> now.ctime()  # Only available on date, datetime
'Mon Dec 30 21:27:03 2002'
>>> now.strftime('%Y %d %b')
'2002 30 Dec'
The replace() method allows modifying one or more fields of a date or datetime instance, returning a new instance:
>>> d = datetime.datetime.now()
>>> d
datetime.datetime(2002, 12, 30, 22, 15, 38, 827738)
>>> d.replace(year=2001, hour = 12)
datetime.datetime(2001, 12, 30, 12, 15, 38, 827738)
>>>
Instances can be compared, hashed, and converted to strings (the result is the same as that of isoformat()). date and datetime instances can be subtracted from each other, and added to timedelta instances. The largest missing feature is that there's no standard library support for parsing strings and getting back a date or datetime.
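The operations mentioned above can be sketched with two date instances. The dates and variable names are arbitrary.

```python
import datetime

start = datetime.date(2002, 12, 30)
end = datetime.date(2003, 1, 6)

delta = end - start                            # subtraction gives a timedelta
later = start + datetime.timedelta(days=10)    # adding a timedelta gives a date

events = {start: 'first', end: 'second'}       # instances are hashable

print(delta.days)
print(later.isoformat())
print(str(start))
print(start < end)
```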
For more information, refer to the module's reference documentation. (Contributed by Tim Peters.)
The getopt module provides simple parsing of command-line arguments. The new optparse module (originally named Optik) provides more elaborate command-line parsing that follows the Unix conventions, automatically creates the output for --help, and can perform different actions for different options.
You start by creating an instance of OptionParser and telling it what your program's options are.
import sys
from optparse import OptionParser

op = OptionParser()
op.add_option('-i', '--input',
              action='store', type='string', dest='input',
              help='set input filename')
op.add_option('-l', '--length',
              action='store', type='int', dest='length',
              help='set maximum length of output')
Parsing a command line is then done by calling the parse_args() method.
options, args = op.parse_args(sys.argv[1:])
print options
print args
This returns an object containing all of the option values, and a list of strings containing the remaining arguments.
Invoking the script with the various arguments now works as you'd expect it to. Note that the length argument is automatically converted to an integer.
$ ./python opt.py -i data arg1
<Values at 0x400cad4c: {'input': 'data', 'length': None}>
['arg1']
$ ./python opt.py --input=data --length=4
<Values at 0x400cad2c: {'input': 'data', 'length': 4}>
[]
$
The help message is automatically generated for you:
$ ./python opt.py --help
usage: opt.py [options]

options:
  -h, --help            show this help message and exit
  -iINPUT, --input=INPUT
                        set input filename
  -lLENGTH, --length=LENGTH
                        set maximum length of output
$
See the module's documentation for more details.
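For experimenting without a command line, parse_args() can be given an explicit list of arguments. This sketch reuses the two options from the example above; the argument lists are made up for the illustration.

```python
from optparse import OptionParser

op = OptionParser()
op.add_option('-i', '--input',
              action='store', type='string', dest='input',
              help='set input filename')
op.add_option('-l', '--length',
              action='store', type='int', dest='length',
              help='set maximum length of output')

# Parse an explicit argument list instead of sys.argv[1:].
options, args = op.parse_args(['--input=data', '--length=4', 'arg1'])
print(options.input)
print(options.length)
print(args)

# Options that are not supplied default to None.
options2, args2 = op.parse_args(['-i', 'data'])
print(options2.length)
```

Option values are read as attributes of the returned object, named after each option's dest.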
Optik was written by Greg Ward, with suggestions from the readers of the Getopt SIG.