4.4.2 Examples

This example compares two strings, considering blanks to be ``junk:''

>>> s = SequenceMatcher(lambda x: x == " ",
...                     "private Thread currentThread;",
...                     "private volatile Thread currentThread;")

ratio() returns a float in [0, 1], measuring the similarity of the sequences. As a rule of thumb, a ratio() value over 0.6 means the sequences are close matches:

>>> print round(s.ratio(), 3)
0.866

If you're only interested in where the sequences match, get_matching_blocks() is handy:

>>> for block in s.get_matching_blocks():
...     print "a[%d] and b[%d] match for %d elements" % block
a[0] and b[0] match for 8 elements
a[8] and b[17] match for 6 elements
a[14] and b[23] match for 15 elements
a[29] and b[38] match for 0 elements

Note that the last tuple returned by get_matching_blocks() is always a dummy, (len(a), len(b), 0), and this is the only case in which the last tuple element (number of elements matched) is 0.

If you want to know how to change the first sequence into the second, use get_opcodes():

>>> for opcode in s.get_opcodes():
...     print "%6s a[%d:%d] b[%d:%d]" % opcode
 equal a[0:8] b[0:8]
insert a[8:8] b[8:17]
 equal a[8:14] b[17:23]
 equal a[14:29] b[23:38]

See Tools/scripts/ndiff.py from the Python source distribution for a fancy human-friendly file differencer, which uses SequenceMatcher both to view files as sequences of lines, and lines as sequences of characters.

See also the function get_close_matches() in this module, which shows how simple code building on SequenceMatcher can be used to do useful work.

See About this document... for information on suggesting changes.