I take back my first (posted) thought on this topic...
Well, I thought about it some more, and realized that you *could*
probably get away with "lazy uniquification," and not get burnt by the
temp creation. The idea would be to only unique-ify when someone
tried to do a compare, but then you would cache the results of your
efforts :-). When I started looking at it this way, I suddenly
realized what was going on, and I had what seemed like a good question
to ask:
Why do you want Python to do this for you?
The two reasons I came up with are to a) increase speed of string
compare operations; and b) decrease storage of duplicate strings.
With that answer I realized a very fundamental question (which I like
to ask whenever an optimization is suggested):
What is the provable benefit that you will derive from such a
optimization?
This gets back to the basic question of how to optimize: Only optimize
the sections where the time is being spent :-). Since you can almost
never guess where this place is, use a profiler :-o
Hence instead of having Tim scan his code to see that this activity
would not speed up his code, I think the burden of proof falls on the
proponent to show at least some applications where this change would
make a significant positive difference (gut feel doesn't count,
profile data does). The more I thought about it, the more I came to
expect that on most machine architectures, a string compare operation
really flies. Hence I'm a bit doubtful that you'll really save that
much by optimizing away the string compares. If you have an
application where this is *really* all you are doing, then I'd bet
that lex/flex was really the sort of tool you were looking for, and
even the unique-ification stuff won't help you.
Then again, maybe you're after the space savings, hmmmmmmm.... I guess
I'm not on a DOS box today, and I'm less sympathetic ;-) If this if
what you're after, you'd also have to show a sample app that really
filled up memory with strings (for a good reason!). I'd then note
that what you're really after is a space time trade-off
(unique-ification takes time, but will save memory), and you'd have a
harder time selling us a slower python so that that it could shoe-horn
itself into some minimal amount of RAM.
Jim
Jim Roskind
408-982-4469
jar@infoseek.com