Question: Escaping a regex string in Python

I want to use input from a user as a regex pattern for a search over some text. It works, but how can I handle cases where the user enters characters that have a special meaning in regex?

For example, the user wants to search for Word (s): the regex engine will take the (s) as a group. I want it to be treated as the literal string "(s)". I can run replace on the user input and replace ( with \( and ) with \), but then I would need to do a replace for every possible regex symbol.

Is there a better way?


Answer 0

Use the re.escape() function for this:

4.2.3 re Module Contents

escape(string)

Return string with all non-alphanumerics backslashed; this is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it.

A simplistic example: search for any occurrence of the provided string, optionally followed by 's', and return the match object.

import re

def simplistic_plural(word, text):
    word_or_plural = re.escape(word) + 's?'
    # re.search() finds a match anywhere; re.match() only matches at the start.
    return re.search(word_or_plural, text)
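Applied to the case from the question, a minimal sketch might look like the following. The helper name find_literal and the sample text are made up for illustration; only re.escape() and re.search() come from the standard library.

```python
import re

def find_literal(user_input, text):
    """Search text for user_input taken literally (hypothetical helper)."""
    # re.escape() backslashes the metacharacters, so "(s)" is no longer a group.
    pattern = re.escape(user_input)
    return re.search(pattern, text)

match = find_literal("Word (s)", "Enter the Word (s) here")
print(match.group(0) if match else None)

# Without escaping, "(s)" is a capturing group that matches just "s",
# so the raw pattern means "Word s" and finds nothing in this text.
print(re.search("Word (s)", "Enter the Word (s) here"))
```

The first call prints Word (s); the second prints None, which is the failure mode the question describes.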

Answer 1

You can use re.escape():

re.escape(string) Return string with all non-alphanumerics backslashed; this is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it.

>>> import re
>>> re.escape('^a.*$')
'\\^a\\.\\*\\$'

If you are using a Python version < 3.7, this will also escape non-alphanumerics that are not part of regular expression syntax.

On versions >= 3.3 but < 3.7, the one exception is the underscore (_), which is not escaped.
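To make the version notes concrete, here is a small sketch that assumes Python 3.7 or later; on older versions the second and third results would contain extra backslashes. The sample strings are arbitrary.

```python
import re

# Characters with a special meaning in patterns are escaped on every version.
print(re.escape('^a.*$'))           # \^a\.\*\$

# On Python 3.7+ (assumed here), '@' and '_' have no special meaning
# and are left alone.
print(re.escape('user@example_1'))  # user@example_1

# '-' (character classes) and space (re.VERBOSE) are still escaped on 3.7+.
print(re.escape('a-b c'))           # a\-b\ c

# Whatever the version, the escaped pattern matches the original string exactly.
for s in ['^a.*$', 'user@example_1', 'a-b c', 'Word (s)']:
    assert re.fullmatch(re.escape(s), s) is not None
```

The loop at the end is probably the more useful check in practice, because it does not depend on which characters a given version chooses to escape.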


Answer 2

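Unfortunately, re.escape() is not suited for the replacement string (the output below is from an older Python version, in which re.escape() also escaped the underscore):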

>>> re.sub('a', re.escape('_'), 'aa')
'\\_\\_'

A solution is to put the replacement in a lambda:

>>> re.sub('a', lambda _: '_', 'aa')
'__'

because the return value of the lambda is treated by re.sub() as a literal string.
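The same effect can be reproduced on current Python versions with a character that re.escape() still escapes. The sketch below uses '.' for that, and also shows the alternative of escaping only the backslashes in the replacement. The replacement text is a made-up example.

```python
import re

# re.escape() is meant for patterns. In a replacement string the added
# backslash is kept, so the output gains an unwanted "\" before each dot.
print(re.sub('a', re.escape('.'), 'aa'))  # \.\.

# A callable's return value is inserted as-is, with no escape processing.
replacement = r'C:\new\1'  # hypothetical user-supplied replacement text
print(re.sub('a', lambda _: replacement, 'a'))  # C:\new\1

# Alternative: escape only the backslashes in the replacement string.
print(re.sub('a', replacement.replace('\\', r'\\'), 'a'))  # C:\new\1
```

The callable form is arguably the safer default when the replacement comes from user input, since nothing in it is interpreted as a group reference or escape sequence.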


Answer 3

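Please give this a try:

\Q and \E as quoting delimiters (supported by Perl, PCRE, and Java regex; Python's re module does not support them, and re.escape() fills that role)

Put an "or" condition in the pattern to match either a full word or a regex.

Reference: How to match a whole word that includes special characters in regex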
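For the whole-word case in Python, one possible sketch combines re.escape() with lookarounds. The helper name find_whole_word is hypothetical, and the approach is an assumption about what the referenced question is after, not something taken from it.

```python
import re

def find_whole_word(word, text):
    """Find word as a whole word, taken literally (hypothetical helper)."""
    # (?<!\w) and (?!\w) require that no word character touches the match.
    # Unlike \b, they also work when word starts or ends with punctuation.
    pattern = r'(?<!\w)' + re.escape(word) + r'(?!\w)'
    return re.search(pattern, text)

print(find_whole_word('c++', 'I like c++ a lot'))   # match on 'c++'
print(find_whole_word('c++', 'compiled as c++11'))  # None: '1' follows
```

Lookarounds are used here instead of \b because \b needs a word character on one side of the boundary, so it fails for search terms such as "c++" or "(s)" that begin or end with punctuation.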

