regex
This module provides regular expression matching operations similar to those found in Emacs. It is always available.
By default the patterns are Emacs-style regular expressions (with one exception). There is a way to change the syntax to match that of several well-known Unix utilities. The exception is that Emacs' `\s' pattern is not supported, since the original implementation references the Emacs syntax tables.
This module is 8-bit clean: both patterns and strings may contain null bytes and characters whose high bit is set.
Please note: There is a little-known fact about Python string
literals which means that you don't usually have to worry about
doubling backslashes, even though they are used to escape special
characters in string literals as well as in regular expressions. This
is because Python doesn't remove backslashes from string literals if
they are followed by an unrecognized escape character.
However, if you want to include a literal backslash in a
regular expression represented as a string literal, you have to
quadruple it: the string literal reduces four backslashes to two, and
the regular expression reads those two as one literal backslash.
E.g. to extract LaTeX `\section{...}' headers from a document, you can
use this pattern: '\\\\section{\(.*\)}'. Another exception:
the escape sequence `\b' is significant in string literals
(where it means the ASCII backspace character) as well as in Emacs regular
expressions (where it stands for a word boundary), so in order to
search for a word boundary, you should use the pattern '\\b'.
Similarly, a backslash followed by a digit 0-7 should be doubled to
avoid interpretation as an octal escape.
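
The three cases above can be checked without the regex module itself, simply by looking at what each string literal evaluates to. The following is a minimal sketch: the variable names are illustrative only, and the `\(' and `\)' of the LaTeX pattern are written with doubled backslashes, which yields the same string while avoiding the warning that current interpreters emit for unrecognized escapes.

```python
BACKSLASH = chr(92)  # a single backslash character, spelled without escapes

# Four backslashes in the literal become two in the string; the regular
# expression then reads those two as one literal backslash.
section_pattern = '\\\\section{\\(.*\\)}'
assert section_pattern.startswith(BACKSLASH * 2 + 'section{')
assert len(section_pattern) == 17

# A lone '\b' is consumed by the string literal and becomes backspace (0x08),
# so the regular expression never sees a backslash at all.
backspace = '\b'
assert backspace == chr(8)

# Doubling it passes backslash + 'b' through as the word-boundary operator.
word_boundary = '\\b'
assert word_boundary == BACKSLASH + 'b'

# A backslash followed by a digit 0-7 is an octal escape in a literal ...
octal_one = '\1'
assert octal_one == chr(1)

# ... so a construct such as \1 must be written with a doubled backslash.
escaped_digit = '\\1'
assert escaped_digit == BACKSLASH + '1'

print(section_pattern)
```

Printing a pattern before handing it to the matcher, as in the last line, is a quick way to see exactly which backslashes survive the string literal.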