>We all know this is unsafe:
>     for m in list:
>         if m matches some criterion:
>             list.remove(m)
>since the list is mutating out from under the internal pointers that the for
>loop is using.
>I've seen Guido recommend:
>     l = list[:]
>     for m in l:
>         if m matches some criterion:
>             list.remove(m)
>which separates the list being modified from the list being iterated over.
>Unfortunately, I have a big list that is expensive to duplicate (~ 100K
>elements) and I run over it several times looking for elements that match
>different criteria and culling them. I came up with the following way out
>of my dilemma.
>Instead of creating the list with my data directly as elements of the list,
>I create elements that are themselves lists, with the first component the
>real data and the second a marker number:
>     list = []
>     for l in foo.readlines():
>         list.append([l, 0])
>Then I can iterate over them and simply set the marker when the real data
>satisfies the criteria:
>     for elt in list:
>         if elt[1] == 1: continue
>         if elt[0] matches some criterion:
>             # do something with elt[0]
>             elt[1] = 1
>I got a tremendous performance increase in my little application --
>analyzing my Web logs.
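For comparison, a runnable sketch of the marker scheme quoted above. The helper `mark_matches`, its `criterion` and `action` parameters, and the sample log lines are hypothetical stand-ins for "some criterion" and "do something". Each pass only flips a flag and never resizes the list, which is why iterating over the list directly is safe here.

```python
def mark_matches(records, criterion, action):
    """Visit every unmarked [data, flag] pair; handle and mark the matches.

    Hypothetical helper: the list is never resized, only flags change,
    so iterating over it directly is safe.
    """
    for elt in records:
        if elt[1] == 1:          # already culled by an earlier pass
            continue
        if criterion(elt[0]):
            action(elt[0])       # "do something with elt[0]"
            elt[1] = 1           # assignment; `elt[1] == 1` would only compare

# Hypothetical sample data standing in for foo.readlines()
lines = ["GET /index.html", "GET /favicon.ico", "POST /form", "GET /robots.txt"]
records = [[line, 0] for line in lines]   # [real data, marker]

posts = []
mark_matches(records, lambda s: s.startswith("POST"), posts.append)
icons = []
mark_matches(records, lambda s: s.endswith(".ico"), icons.append)

# Optional final step (not part of the quoted scheme): collect survivors once
remaining = [data for data, flag in records if flag == 0]
print(posts)
print(icons)
print(remaining)
```

The final comprehension builds one new list after all passes instead of copying the big list before each pass; whether that single copy is acceptable depends on the memory budget.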
Another way of doing this is to iterate backwards over your big list,
without making a copy:
    list = [ <YourVeryHugeList> ]
    i = len(list)
    while i != 0:
        i = i - 1
        if list[i] matches some criterion:
            do something with list[i]
It works fine for the small tests I tried (on a list of numbers, with modulo
as a test and 'del' as the doSomething part). Is it safe, anyone?
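A runnable sketch of the backward loop, using the same kind of test described above: a list of integers, a modulo criterion, and `del` as the action. The function name `cull_backwards` and the sample data are hypothetical. The reasoning behind it, assuming the loop body deletes nothing but `items[i]`: `del items[i]` shifts only the elements after position `i`, and those have already been visited, so every index still to come refers to the same element it did before the deletion.

```python
def cull_backwards(items, criterion):
    """Delete matching items in place by walking indices from the end.

    Hypothetical helper; only items[i] is ever deleted inside the loop.
    """
    i = len(items)
    while i != 0:
        i = i - 1
        if criterion(items[i]):
            # Shifts only items at indices > i, which were already visited
            del items[i]

numbers = list(range(20))
cull_backwards(numbers, lambda n: n % 3 == 0)   # drop multiples of 3
print(numbers)   # [1, 2, 4, 5, 7, 8, 10, 11, 13, 14, 16, 17, 19]
```

The same traversal can also be written as `for i in range(len(items) - 1, -1, -1):`; the `while` form is kept here to mirror the loop above.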
Magnus Lindberg
#include <disclaimer.h>