This example compares two strings, considering blanks to be "junk":
>>> s = SequenceMatcher(lambda x: x == " ",
...                     "private Thread currentThread;",
...                     "private volatile Thread currentThread;")
ratio() returns a float in [0, 1], measuring the similarity of the sequences. As a rule of thumb, a ratio() value over 0.6 means the sequences are close matches:
>>> print(round(s.ratio(), 3))
0.866
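The value reported by ratio() can be reproduced by hand. The sketch below assumes the formula documented for ratio(), 2.0*M/T, where M is the number of matched elements and T is the total number of elements in both sequences. The variable names (matched, total, manual_ratio) are illustrative only.

```python
from difflib import SequenceMatcher

a = "private Thread currentThread;"
b = "private volatile Thread currentThread;"
s = SequenceMatcher(lambda x: x == " ", a, b)

# M: number of matched elements, summed over all matching blocks
# (the trailing dummy block contributes 0).
matched = sum(size for _, _, size in s.get_matching_blocks())

# T: total number of elements in both sequences.
total = len(a) + len(b)

manual_ratio = 2.0 * matched / total
print(round(manual_ratio, 3))  # 0.866, the same value ratio() reports
```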
If you're only interested in where the sequences match, get_matching_blocks() is handy:
>>> for block in s.get_matching_blocks():
...     print("a[%d] and b[%d] match for %d elements" % block)
a[0] and b[0] match for 8 elements
a[8] and b[17] match for 6 elements
a[14] and b[23] match for 15 elements
a[29] and b[38] match for 0 elements
Note that the last tuple returned by get_matching_blocks() is always a dummy, (len(a), len(b), 0), and this is the only case in which the last tuple element (number of elements matched) is 0.
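Because the dummy is the only block of size 0, it is easy to filter out when only real matches are wanted. The following is a minimal sketch; the names sentinel and real_blocks are illustrative. Some later Python releases merge adjacent matching blocks, so the number of blocks printed may differ from the session above, but the dummy is still last.

```python
from difflib import SequenceMatcher

a = "private Thread currentThread;"
b = "private volatile Thread currentThread;"
s = SequenceMatcher(lambda x: x == " ", a, b)

blocks = s.get_matching_blocks()

# The final triple is always the dummy (len(a), len(b), 0).
sentinel = blocks[-1]

# Keep only the real matches and show the text each one covers.
real_blocks = [blk for blk in blocks if blk[2] > 0]
for i, j, n in real_blocks:
    print("a[%d:%d] and b[%d:%d] both hold %r" % (i, i + n, j, j + n, a[i:i + n]))
```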
If you want to know how to change the first sequence into the second, use get_opcodes():
>>> for opcode in s.get_opcodes():
...     print("%6s a[%d:%d] b[%d:%d]" % opcode)
 equal a[0:8] b[0:8]
insert a[8:8] b[8:17]
 equal a[8:14] b[17:23]
 equal a[14:29] b[23:38]
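Each opcode is a 5-tuple (tag, i1, i2, j1, j2), where tag is one of 'replace', 'delete', 'insert' or 'equal'. As a rough illustration of how the opcodes describe the edit, the sketch below replays them to rebuild the second string from the first. The helper apply_opcodes is hypothetical and not part of difflib.

```python
from difflib import SequenceMatcher


def apply_opcodes(a, b, opcodes):
    """Rebuild b from a by replaying opcodes (hypothetical helper)."""
    pieces = []
    for tag, i1, i2, j1, j2 in opcodes:
        if tag == "equal":
            # Unchanged run: copy it from the first sequence.
            pieces.append(a[i1:i2])
        elif tag in ("replace", "insert"):
            # New material comes from the second sequence.
            pieces.append(b[j1:j2])
        # 'delete': a[i1:i2] is dropped, so nothing is appended.
    return "".join(pieces)


a = "private Thread currentThread;"
b = "private volatile Thread currentThread;"
s = SequenceMatcher(lambda x: x == " ", a, b)

rebuilt = apply_opcodes(a, b, s.get_opcodes())
print(rebuilt == b)  # True
```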
See Tools/scripts/ndiff.py from the Python source distribution for a fancy human-friendly file differencer, which uses SequenceMatcher both to view files as sequences of lines, and lines as sequences of characters.
See also the function get_close_matches() in this module, which shows how simple code building on SequenceMatcher can be used to do useful work.
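For a quick sense of what get_close_matches() does, here is a small sketch. The word list is illustrative, and the call relies on the function's default settings (at most 3 results, similarity cutoff 0.6).

```python
from difflib import get_close_matches

words = ["ape", "apple", "peach", "puppy"]

# Results are ordered by similarity score, most similar first.
matches = get_close_matches("appel", words)
print(matches)  # ['apple', 'ape']
```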