Python Regular Expressions to implement string unescaping

I am trying to implement string unescaping with Python regex and backreferences, and it doesn't seem to want to work very well. I'm sure it's something I'm doing wrong, but I can't figure out what...

>>> import re
>>> mystring = r"This is \n a test \r"
>>> p = re.compile( "\\\\(\\S)" )
>>> p.sub( "\\1", mystring )
'This is n a test r'
>>> p.sub( "\\\\\\1", mystring )
'This is \\n a test \\r'
>>> p.sub( "\\\\1", mystring )
'This is \\1 a test \\1'

I'd like to replace \\[char] with \[char], but backreferences in Python don't appear to follow the same rules they do in every other implementation I've ever used. Could someone shed some light?

2019-12-02 03:02:57
Answers: 5

You are being fooled by Python's representation of the result string. The Python expression:

'This is \\n a test \\r'

stands for the string

This is \n a test \r

which is, I think, what you wanted. Try adding 'print' in front of each of your p.sub() calls to print the actual string returned rather than Python's representation of it.

>>> mystring = r"This is \n a test \r"
>>> mystring
'This is \\n a test \\r'
>>> print mystring
This is \n a test \r
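
The same check in Python 3 syntax, as a minimal sketch (the variable name `result` is only for illustration): repr() is what the interactive prompt displays, while print() writes the characters the string actually contains.

```python
import re

mystring = r"This is \n a test \r"   # backslash followed by 'n', not a newline
p = re.compile(r"\\(\S)")

# Template: one literal backslash followed by whatever group 1 captured.
result = p.sub(r"\\\1", mystring)

print(repr(result))   # 'This is \\n a test \\r'  (prompt-style display)
print(result)         # This is \n a test \r      (the actual characters)
```

Here the substitution returns a string equal to the input, so the doubled backslashes exist only in the displayed representation.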
2019-12-03 04:49:48

The idea is that I'll read in an escaped string and unescape it (a feature notably lacking from Python, which you shouldn't need to resort to regular expressions for in the first place). But I'm not being fooled by the backslashes...

Another illustrative example:

>>> mystring = r"This is \n ridiculous"
>>> print mystring
This is \n ridiculous
>>> p = re.compile( r"\\(\S)" )
>>> print p.sub( 'bloody', mystring )
This is bloody ridiculous
>>> print p.sub( r'\1', mystring )
This is n ridiculous
>>> print p.sub( r'\\1', mystring )
This is \1 ridiculous
>>> print p.sub( r'\\\1', mystring )
This is \n ridiculous

What I'd like it to print is

This is 
ridiculous
2019-12-03 04:47:17

Well, I think you might have missed the r or miscounted the backslashes...

"\\n" == r"\n"

>>> import re
>>> mystring = r"This is \\n a test \\r"
>>> p = re.compile( r"[\\][\\](.)" )
>>> print p.sub( r"\\\1", mystring )
This is \n a test \r
>>>

Which, if I understood correctly, is what was asked for.

I suspect the more common request is this:

>>> d = {'n':'\n', 'r':'\r', 'f':'\f'}
>>> p = re.compile(r"[\\]([nrfv])")
>>> print p.sub(lambda mo: d[mo.group(1)], mystring)
This is \
 a test \
>>>

The interested student should also read Ken Thompson's "Reflections on Trusting Trust", in which our hero uses a similar example to explain the perils of trusting compilers you haven't bootstrapped from machine code yourself.

2019-12-03 04:46:59

Isn't that what Anders' second example does?

In 2.5 there's also a string-escape encoding you can apply:

>>> mystring = r"This is \n a test \r"
>>> mystring.decode('string-escape')
'This is \n a test \r'
>>> print mystring.decode('string-escape')
This is 
 a test 
>>> 
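
In Python 3, str has no decode() method and the string-escape codec is gone. A rough equivalent, offered as a sketch rather than a drop-in replacement, is to round-trip through bytes with the unicode_escape codec. The helper name `unescape` is hypothetical.

```python
def unescape(text):
    """Interpret backslash escapes in text, roughly as string-escape did."""
    # backslashreplace turns characters outside Latin-1 into \uXXXX escapes,
    # which unicode_escape then decodes back into the same characters.
    return text.encode("latin-1", "backslashreplace").decode("unicode_escape")

mystring = r"This is \n a test \r"
print(repr(unescape(mystring)))   # 'This is \n a test \r'
```

The Latin-1 step is there so that non-ASCII characters already present in the input pass through unchanged instead of being mangled.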
2019-12-03 04:17:05

Mark; his second example requires every escaped character to be thrown into an array first, which raises a KeyError if the escape sequence happens not to be in the array. It will die on anything but the three characters provided (give \v a try), and enumerating every possible escape sequence every time you want to unescape a string (or keeping a global array) is a really bad solution. Analogous to PHP, that's using preg_replace_callback() with a lambda instead of preg_replace(), which is utterly unnecessary in this situation.
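
For what it's worth, the KeyError can be avoided by looking characters up with dict.get() and falling back to the matched text. This is only a sketch in Python 3 syntax: the names `ESCAPES`, `PATTERN` and `unescape_known` are hypothetical, and the table still has to list every escape it should translate.

```python
import re

# Lookup table for the escapes this sketch translates; extend as needed.
ESCAPES = {"n": "\n", "r": "\r", "f": "\f", "v": "\v", "t": "\t", "\\": "\\"}
PATTERN = re.compile(r"\\(.)", re.DOTALL)

def unescape_known(text):
    """Translate known escapes; leave unknown ones exactly as written."""
    # mo.group(0) is the whole match (backslash plus character), used as the
    # fallback so an unlisted escape no longer raises KeyError.
    return PATTERN.sub(lambda mo: ESCAPES.get(mo.group(1), mo.group(0)), text)

print(repr(unescape_known(r"This is \n a test \q")))   # 'This is \n a test \\q'
```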

I'm sorry if I'm coming off as a prick about it, I'm just utterly frustrated with Python. This is supported by every other regular expression engine I've ever used, and I can't understand why this wouldn't work.

Thanks for responding; the string.decode('string-escape') function is precisely what I was looking for initially. If someone has a general solution to the regex backreference problem, feel free to post it and I'll accept that as an answer as well.
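
On the backreference question, one possible explanation, sketched in Python 3 syntax: re.sub interprets escapes that are written in the replacement template itself, but text inserted through a backreference such as \1 is copied verbatim and never re-read for escapes. If that reading is right, no template alone can turn a captured n into a newline, and a callable replacement (as in the lambda example earlier in the thread) is the route re.sub offers. The variable names below are illustrative only.

```python
import re

text = r"This is \n ridiculous"   # backslash followed by 'n', not a newline
p = re.compile(r"\\(\S)")

# An escape written in the template itself is interpreted by re.sub:
from_template = p.sub(r"\n", text)
assert from_template == "This is \n ridiculous"   # contains a real newline

# Text inserted through a backreference is copied verbatim:
from_backref = p.sub(r"\\\1", text)
assert from_backref == text                       # still backslash + 'n'
```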

2019-12-03 04:16:29