Subject: [String-SIG] \ escapes in re.sub
From: "Andrew M. Kuchling"
To: string-sig@python.org
Date: Sun, 22 Mar 1998 22:48:27 -0500 (EST)

The patch below changes how escapes are handled in the replacement
string of re.sub() and re.subn(). An unknown escape such as \s is
currently treated as a plain 's'. That differs from how unknown escapes
are treated in Python string literals, where the backslash is kept.
This patch changes the behaviour so that the backslash is preserved:
\s in a replacement string produces the two characters '\s'.

This was reported as a bug a little while back. I'm not sure it really
is one. Dropping the backslash is inconsistent with Python string
literals, but it's consistent with re patterns, where, for example,
r'\j' matches just 'j'. It can be argued that the behaviour should be
left as it is. So it's a stylistic question: which consistency is
better? Which inconsistency is worse?

Some of the tests in test_re.py will break as a result of this change.
If the change is accepted, I'll fix them shortly in a separate patch.

Once this is settled, I'll issue a new PCRE release that implements
\g<1> in replacement strings, as a synonym for \1 that isn't vulnerable
to ambiguities like the one in \10\1 (is \10 group 10, or group 1
followed by a literal '0'?). No patch will be made for that, because
it's a new feature and not a bug.

A.M. Kuchling			http://starship.skyport.net/crew/amk/
There was not a lot that could be done to make Morpork a worse place. A
direct hit by a meteorite, for example, would count as gentrification.
    -- Terry Pratchett, _Pyramids_

*** pcremodule.c	1998/03/11 04:49:44	1.9
--- Modules/pcremodule.c	1998/03/19 03:55:57
***************
*** 73,77 ****
  #define BEGINNING_OF_BUFFER 7
  #define END_OF_BUFFER 8
! 
  
  static PcreObject *
--- 73,77 ----
  #define BEGINNING_OF_BUFFER 7
  #define END_OF_BUFFER 8
! #define STRING 9
  
  static PcreObject *
***************
*** 284,287 ****
--- 284,291 ----
  		return Py_BuildValue("c", (char)8);
  		break;
+ 	case('\\'):
+ 		*indexptr=index;
+ 		return Py_BuildValue("c", '\\');
+ 		break;
  
  	case('x'):
***************
*** 450,455 ****
  
  	default:
  		*indexptr = index;
! 		return Py_BuildValue("c", c);
  		break;
  	}
--- 454,462 ----
  
  	default:
+ 		/* It's some unknown escape like \s, so return a string containing
+ 		   \s */
+ 		*typeptr = STRING;
  		*indexptr = index;
! 		return Py_BuildValue("s#", pattern+index-2, 2);
  		break;
  	}
***************
*** 543,546 ****
--- 550,559 ----
  	    }
  	    break;
+ 	case(STRING):
+ 	    {
+ 		PyList_Append(results, value);
+ 		total_len += PyString_Size(value);
+ 		break;
+ 	    }
  	default:
  	    Py_DECREF(results);

------------------------------------------------------
String-SIG maillist  -  String-SIG@python.org
http://www.python.org/mailman/listinfo/string-sig
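To make the two candidate behaviours concrete, here is a small stand-alone C sketch of replacement-template expansion. It is not the pcremodule.c code. `expand_template`, `Buf` and `buf_append` are hypothetical names, and the escape set is deliberately simplified: octal, hex and named-group escapes are omitted, and treating an unmatched (NULL) group as an empty string is an assumption. The sketch keeps unknown escapes such as \s verbatim, as the patch proposes, and accepts \g<N> next to \N to show how the closing '>' removes the \10 ambiguity.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable output buffer; not part of pcremodule.c. */
typedef struct {
    char *data;
    size_t len, cap;
} Buf;

/* Append n bytes of s, keeping the buffer NUL-terminated.
   Returns 0 on success, -1 if realloc fails. */
static int buf_append(Buf *b, const char *s, size_t n)
{
    if (b->len + n + 1 > b->cap) {
        size_t cap = b->cap ? b->cap : 16;
        while (cap < b->len + n + 1)
            cap *= 2;
        char *p = realloc(b->data, cap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = cap;
    }
    memcpy(b->data + b->len, s, n);
    b->len += n;
    b->data[b->len] = '\0';
    return 0;
}

/* Expand template tmpl using groups[0..ngroups-1] (groups[0] = whole match).
     \\          -> one backslash
     \n, \t      -> newline, tab
     \1 .. \99   -> group reference, taking up to two digits greedily
     \g<N>       -> group N; the '>' ends the number, so \g<1>0 is group 1 then '0'
     anything else, e.g. \s -> kept verbatim as two characters (the patch's rule)
   A NULL group (one that did not participate) expands to "" (an assumption).
   Returns a malloc'd string, or NULL on a bad group reference or out of memory. */
char *expand_template(const char *tmpl, const char *const *groups, int ngroups)
{
    Buf out = {NULL, 0, 0};
    assert(tmpl != NULL && groups != NULL);
    if (buf_append(&out, "", 0) != 0)      /* so "" yields "" rather than NULL */
        return NULL;

    for (const char *p = tmpl; *p != '\0'; p++) {
        int rc = 0;
        int group = -1;

        if (*p != '\\' || p[1] == '\0') {
            rc = buf_append(&out, p, 1);    /* ordinary char or trailing backslash */
        } else {
            char c = *++p;                  /* p now points at the escaped char */
            if (c == '\\') {
                rc = buf_append(&out, "\\", 1);
            } else if (c == 'n') {
                rc = buf_append(&out, "\n", 1);
            } else if (c == 't') {
                rc = buf_append(&out, "\t", 1);
            } else if (c >= '1' && c <= '9') {
                group = c - '0';
                if (isdigit((unsigned char)p[1]))
                    group = group * 10 + (*++p - '0');
            } else if (c == 'g' && p[1] == '<' && isdigit((unsigned char)p[2])) {
                const char *q = p + 2;
                group = 0;
                while (isdigit((unsigned char)*q) && group < 1000)
                    group = group * 10 + (*q++ - '0');
                if (*q != '>') {            /* unterminated or too long */
                    free(out.data);
                    return NULL;
                }
                p = q;                      /* the loop's p++ steps past '>' */
            } else {
                rc = buf_append(&out, p - 1, 2);   /* unknown escape: keep "\c" */
            }
        }

        if (rc == 0 && group >= 0) {
            if (group >= ngroups) {
                free(out.data);
                return NULL;
            }
            const char *g = groups[group] ? groups[group] : "";
            rc = buf_append(&out, g, strlen(g));
        }
        if (rc != 0) {
            free(out.data);
            return NULL;
        }
    }
    return out.data;
}
```

With groups {"hello", "h", "ello"}, the template \g<1>0 expands to h0, while \10 asks for a nonexistent group 10 and the call returns NULL. The behaviour the patch replaces would correspond to appending only the character after the backslash in the final else branch, i.e. buf_append(&out, p, 1).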