如何在正则表达式中使用变量?

问题:如何在正则表达式中使用变量?

我想在a variable内部使用regex,该怎么办Python

TEXTO = sys.argv[1]

if re.search(r"\b(?=\w)TEXTO\b(?!\w)", subject, re.IGNORECASE):
    # Successful match
else:
    # Match attempt failed

I’d like to use a variable inside a regex, how can I do this in Python?

TEXTO = sys.argv[1]

if re.search(r"\b(?=\w)TEXTO\b(?!\w)", subject, re.IGNORECASE):
    # Successful match
else:
    # Match attempt failed

回答 0

从python 3.6开始,您还可以使用文字字符串插值(“ f-strings”)。在您的特定情况下,解决方案是:

if re.search(rf"\b(?=\w){TEXTO}\b(?!\w)", subject, re.IGNORECASE):
    ...do something

编辑:

既然评论中存在一些有关如何处理特殊字符的问题,我想扩展一下我的答案:

原始字符串(’r’):

在正则表达式中处理特殊字符时,您必须了解的主要概念之一是区分字符串文字和正则表达式本身。这是很好的解释在这里

简而言之:

假设您要匹配字符串\b之后,而不是查找单词边界。你必须写:TEXTO\boundary

TEXTO = "Var"
subject = r"Var\boundary"

if re.search(rf"\b(?=\w){TEXTO}\\boundary(?!\w)", subject, re.IGNORECASE):
    print("match")

这仅起作用,因为我们使用的是原始字符串(正则表达式以’r’开头),否则我们必须在正则表达式中写入“ \\\\ boundary”(四个反斜杠)。另外,如果没有’\ r’,\ b’将不再转换为单词边界,而是转换为退格键!

重新转义

基本上在任何特殊字符的前面放置一个空格。因此,如果您希望TEXTO中有特殊字符,则需要编写:

if re.search(rf"\b(?=\w){re.escape(TEXTO)}\b(?!\w)", subject, re.IGNORECASE):
    print("match")

注:对于任何版本> = 3.7蟒:!"%',/:;<=>@,和`都没有逃脱。仅对正则表达式中具有含义的特殊字符进行转义。_因为Python 3.3没有逃脱。(送。这里

大括号:

如果要在使用f字符串的正则表达式中使用量词,则必须使用双花括号。假设您要匹配TEXTO,然后再精确匹配2位数字:

if re.search(rf"\b(?=\w){re.escape(TEXTO)}\d{{2}}\b(?!\w)", subject, re.IGNORECASE):
    print("match")

From python 3.6 on you can also use Literal String Interpolation, “f-strings”. In your particular case the solution would be:

if re.search(rf"\b(?=\w){TEXTO}\b(?!\w)", subject, re.IGNORECASE):
    ...do something

EDIT:

Since there have been some questions in the comment on how to deal with special characters I’d like to extend my answer:

raw strings (‘r’):

One of the main concepts you have to understand when dealing with special characters in regular expressions is to distinguish between string literals and the regular expression itself. It is very well explained here:

In short:

Let’s say instead of finding a word boundary \b after TEXTO you want to match the string \boundary. The you have to write:

TEXTO = "Var"
subject = r"Var\boundary"

if re.search(rf"\b(?=\w){TEXTO}\\boundary(?!\w)", subject, re.IGNORECASE):
    print("match")

This only works because we are using a raw-string (the regex is preceded by ‘r’), otherwise we must write “\\\\boundary” in the regex (four backslashes). Additionally, without ‘\r’, \b’ would not converted to a word boundary anymore but to a backspace!

re.escape:

Basically puts a backspace in front of any special character. Hence, if you expect a special character in TEXTO, you need to write:

if re.search(rf"\b(?=\w){re.escape(TEXTO)}\b(?!\w)", subject, re.IGNORECASE):
    print("match")

NOTE: For any version >= python 3.7: !, ", %, ', ,, /, :, ;, <, =, >, @, and ` are not escaped. Only special characters with meaning in a regex are still escaped. _ is not escaped since Python 3.3.(s. here)

Curly braces:

If you want to use quantifiers within the regular expression using f-strings, you have to use double curly braces. Let’s say you want to match TEXTO followed by exactly 2 digits:

if re.search(rf"\b(?=\w){re.escape(TEXTO)}\d{{2}}\b(?!\w)", subject, re.IGNORECASE):
    print("match")

回答 1

您必须将正则表达式构建为字符串:

TEXTO = sys.argv[1]
my_regex = r"\b(?=\w)" + re.escape(TEXTO) + r"\b(?!\w)"

if re.search(my_regex, subject, re.IGNORECASE):
    etc.

请注意使用,re.escape这样如果您的文本中包含特殊字符,则不会这样解释它们。

You have to build the regex as a string:

TEXTO = sys.argv[1]
my_regex = r"\b(?=\w)" + re.escape(TEXTO) + r"\b(?!\w)"

if re.search(my_regex, subject, re.IGNORECASE):
    etc.

Note the use of re.escape so that if your text has special characters, they won’t be interpreted as such.


回答 2

if re.search(r"\b(?<=\w)%s\b(?!\w)" % TEXTO, subject, re.IGNORECASE):

这会将TEXTO中的内容作为字符串插入到正则表达式中。

if re.search(r"\b(?<=\w)%s\b(?!\w)" % TEXTO, subject, re.IGNORECASE):

This will insert what is in TEXTO into the regex as a string.


回答 3

rx = r'\b(?<=\w){0}\b(?!\w)'.format(TEXTO)
rx = r'\b(?<=\w){0}\b(?!\w)'.format(TEXTO)

回答 4

我发现通过将多个较小的模式串在一起来构建正则表达式模式非常方便。

import re

string = "begin:id1:tag:middl:id2:tag:id3:end"
re_str1 = r'(?<=(\S{5})):'
re_str2 = r'(id\d+):(?=tag:)'
re_pattern = re.compile(re_str1 + re_str2)
match = re_pattern.findall(string)
print(match)

输出:

[('begin', 'id1'), ('middl', 'id2')]

I find it very convenient to build a regular expression pattern by stringing together multiple smaller patterns.

import re

string = "begin:id1:tag:middl:id2:tag:id3:end"
re_str1 = r'(?<=(\S{5})):'
re_str2 = r'(id\d+):(?=tag:)'
re_pattern = re.compile(re_str1 + re_str2)
match = re_pattern.findall(string)
print(match)

Output:

[('begin', 'id1'), ('middl', 'id2')]

回答 5

我同意以上所有条件,除非:

sys.argv[1] 就像 Chicken\d{2}-\d{2}An\s*important\s*anchor

sys.argv[1] = "Chicken\d{2}-\d{2}An\s*important\s*anchor"

您不想使用re.escape,因为在这种情况下,您希望它的行为类似于正则表达式

TEXTO = sys.argv[1]

if re.search(r"\b(?<=\w)" + TEXTO + "\b(?!\w)", subject, re.IGNORECASE):
    # Successful match
else:
    # Match attempt failed

I agree with all the above unless:

sys.argv[1] was something like Chicken\d{2}-\d{2}An\s*important\s*anchor

sys.argv[1] = "Chicken\d{2}-\d{2}An\s*important\s*anchor"

you would not want to use re.escape, because in that case you would like it to behave like a regex

TEXTO = sys.argv[1]

if re.search(r"\b(?<=\w)" + TEXTO + "\b(?!\w)", subject, re.IGNORECASE):
    # Successful match
else:
    # Match attempt failed

回答 6

我需要搜索彼此相似的用户名,Ned Batchelder所说的话非常有用。但是,当我使用re.compile创建我的搜索项时,发现输出更清晰:

pattern = re.compile(r"("+username+".*):(.*?):(.*?):(.*?):(.*)"
matches = re.findall(pattern, lines)

可以使用以下命令打印输出:

print(matches[1]) # prints one whole matching line (in this case, the first line)
print(matches[1][3]) # prints the fourth character group (established with the parentheses in the regex statement) of the first line.

I needed to search for usernames that are similar to each other, and what Ned Batchelder said was incredibly helpful. However, I found I had cleaner output when I used re.compile to create my re search term:

pattern = re.compile(r"("+username+".*):(.*?):(.*?):(.*?):(.*)"
matches = re.findall(pattern, lines)

Output can be printed using the following:

print(matches[1]) # prints one whole matching line (in this case, the first line)
print(matches[1][3]) # prints the fourth character group (established with the parentheses in the regex statement) of the first line.

回答 7

您可以使用formatgrammer suger 尝试另一种用法:

re_genre = r'{}'.format(your_variable)
regex_pattern = re.compile(re_genre)  

you can try another usage using format grammer suger:

re_genre = r'{}'.format(your_variable)
regex_pattern = re.compile(re_genre)  

回答 8

您也可以为此使用format关键字。Format方法将{}占位符替换为您作为参数传递给format方法的变量。

if re.search(r"\b(?=\w)**{}**\b(?!\w)".**format(TEXTO)**, subject, re.IGNORECASE):
    # Successful match**strong text**
else:
    # Match attempt failed

You can use format keyword as well for this.Format method will replace {} placeholder to the variable which you passed to the format method as an argument.

if re.search(r"\b(?=\w)**{}**\b(?!\w)".**format(TEXTO)**, subject, re.IGNORECASE):
    # Successful match**strong text**
else:
    # Match attempt failed

回答 9

更多例子

我有带有流文件的configus.yml

"pattern":
  - _(\d{14})_
"datetime_string":
  - "%m%d%Y%H%M%f"

在我使用的python代码中

data_time_real_file=re.findall(r""+flows[flow]["pattern"][0]+"", latest_file)

more example

I have configus.yml with flows files

"pattern":
  - _(\d{14})_
"datetime_string":
  - "%m%d%Y%H%M%f"

in python code I use

data_time_real_file=re.findall(r""+flows[flow]["pattern"][0]+"", latest_file)