l1 = [1, 2, 3, 4, 5]
l2 = [1, 3, 5, 7, 9]
union = l1
intersection = []
for e in l2:
if e not in l1: union.append(e)
if e in l1: intersection.append(e)
This works fine for short sequences, but fails miserably for long ones. (My
lists are each several hundred elements long and I wind up often with a
resulting union of well over 1000 elements. The elements are themselves
Python classes, so there is a __cmp__ function to contend with that may well
be what's killing me.)
I sped things up somewhat (read: I was willing to wait for the process to
complete :-) by sorting the lists and using a binary insert:
def binsert(list, e):
if len(list) == 0: list.append(e)
if type(list[0]) != type(e):
raise TypeError
l = 0
h = len(list) - 1
while l <= h:
m = l + (h-l)/2
mid = list[m]
if mid < e:
l = m + 1
elif mid > e:
h = m - 1
else:
return
list[l:l] = [e]
Before I go and reimplement someone else's wheel or wander off in fruitless
search of the unobtainable, can somebody answer:
1. Is it possible to speed this process up by recoding it in C or will
accessing the list and its components be the bottleneck?
2. Has somebody already implemented reasonably fast set
intersection/union for Python lists?