Unfortunately the implementation of dictionaries makes this unsafe if
it is modified during the loop: if the modification happens to cause a
reorganization of the dictionary, the state kept for the __cursor__()
method (I presume this would be an index into the internal hash table)
would stop making sense.
I also did a little timing test -- creating the keys() list of a
dictionary with 100,000 items takes a negligeable amount of time
(less than 3%) compared to accessing all items.
The memory used up for the keys() list is also small compared to that
used for the hash table: assuming a hash table which is 2/3 full,
4-byte object pointers, a hash table for 100,000 elements uses 1.2
Mbyte, while the list is 0.4 Mbyte. The actual data will be a fair
multiple of this -- assuming the keys are 5-char strings, each string
object uses 24 bytes (*), or another 2.4 Mbyte (for the keys alone --
some applications might share most values so I won't count them).
This would make the space used up by the keys() list about 11% of the
space used by the dictionary. Not negligeable, but this is using
fairly minimal assumptions -- if you use the dictionary to store
100,000 different values that are 10-byte strings it is already
reduced to 5%.
(*) The object header for a string object looks like this on a typical
32-bit machine:
#bytes description
4 type pointer
4 reference count
4 string length (N)
N+1 string data (rounded up to multiple of 4)
For a 5 byte string this amounts to 20 bytes; add to this 4 bytes of
overhead for malloc() and we have 24 bytes.
Conclusion: don't worry about the cost of using keys() to iterate over
a dictionary unless you are on a memory-starved machine.
--Guido van Rossum, CWI, Amsterdam <Guido.van.Rossum@cwi.nl>
<URL:http://www.cwi.nl/cwi/people/Guido.van.Rossum.html>