re.search和re.match有什么区别?

问题:re.search和re.match有什么区别?

search()Python 模块中的match()函数和有什么区别?re

我已经阅读了文档当前文档),但是我似乎从未记得它。我一直在查找并重新学习它。我希望有人会用示例清楚地回答它,以便(也许)它会贴在我的头上。或者至少我将有一个更好的地方来回答我的问题,并且重新学习它所花的时间会更少。

What is the difference between the search() and match() functions in the Python re module?

I’ve read the documentation (current documentation), but I never seem to remember it. I keep having to look it up and re-learn it. I’m hoping that someone will answer it clearly with examples so that (perhaps) it will stick in my head. Or at least I’ll have a better place to return with my question and it will take less time to re-learn it.


回答 0

re.match锚定在字符串的开头。这与换行无关,因此它与^在模式中使用的方式不同。

重新匹配文档所述

如果字符串开头的零个或多个字符 与正则表达式模式匹配,则返回相应的MatchObject实例。None如果字符串与模式不匹配,则返回;否则返回false。请注意,这与零长度匹配不同。

注意:如果要在字符串中的任何位置找到匹配项,请search() 改用。

re.search搜索整个字符串,如文档所述

扫描字符串以查找正则表达式模式产生匹配项的位置,然后返回相应的MatchObject实例。None如果字符串中没有位置与模式匹配,则返回;否则返回false。请注意,这与在字符串中的某个点找到零长度匹配不同。

因此,如果您需要匹配字符串的开头,或者匹配整个字符串,请使用match。它更快。否则使用search

该文档中有一个专门针对matchvs.的部分search,还涵盖了多行字符串:

基于正则表达式的Python提供两种不同的基本操作:match检查是否有比赛 才刚刚开始的字符串,同时search检查是否有匹配 的任何地方的字符串(这是Perl并默认情况下)。

请注意, 即使使用以:开头的正则表达式match也可能仅在字符串的开头,或者在模式下也紧接换行符之后才匹配 。仅当模式在字符串开头 无论模式如何)或在可选 参数给定的起始位置匹配(无论换行符是否在其前面)时,“ ”操作才会成功。search'^''^'MULTILINEmatchpos

现在,足够的讨论。现在来看一些示例代码:

# example code:
string_with_newlines = """something
someotherthing"""

import re

print re.match('some', string_with_newlines) # matches
print re.match('someother', 
               string_with_newlines) # won't match
print re.match('^someother', string_with_newlines, 
               re.MULTILINE) # also won't match
print re.search('someother', 
                string_with_newlines) # finds something
print re.search('^someother', string_with_newlines, 
                re.MULTILINE) # also finds something

m = re.compile('thing$', re.MULTILINE)

print m.match(string_with_newlines) # no match
print m.match(string_with_newlines, pos=4) # matches
print m.search(string_with_newlines, 
               re.MULTILINE) # also matches

re.match is anchored at the beginning of the string. That has nothing to do with newlines, so it is not the same as using ^ in the pattern.

As the re.match documentation says:

If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding MatchObject instance. Return None if the string does not match the pattern; note that this is different from a zero-length match.

Note: If you want to locate a match anywhere in string, use search() instead.

re.search searches the entire string, as the documentation says:

Scan through string looking for a location where the regular expression pattern produces a match, and return a corresponding MatchObject instance. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.

So if you need to match at the beginning of the string, or to match the entire string use match. It is faster. Otherwise use search.

The documentation has a specific section for match vs. search that also covers multiline strings:

Python offers two different primitive operations based on regular expressions: match checks for a match only at the beginning of the string, while search checks for a match anywhere in the string (this is what Perl does by default).

Note that match may differ from search even when using a regular expression beginning with '^': '^' matches only at the start of the string, or in MULTILINE mode also immediately following a newline. The “match” operation succeeds only if the pattern matches at the start of the string regardless of mode, or at the starting position given by the optional pos argument regardless of whether a newline precedes it.

Now, enough talk. Time to see some example code:

# example code:
string_with_newlines = """something
someotherthing"""

import re

print re.match('some', string_with_newlines) # matches
print re.match('someother', 
               string_with_newlines) # won't match
print re.match('^someother', string_with_newlines, 
               re.MULTILINE) # also won't match
print re.search('someother', 
                string_with_newlines) # finds something
print re.search('^someother', string_with_newlines, 
                re.MULTILINE) # also finds something

m = re.compile('thing$', re.MULTILINE)

print m.match(string_with_newlines) # no match
print m.match(string_with_newlines, pos=4) # matches
print m.search(string_with_newlines, 
               re.MULTILINE) # also matches

回答 1

search ⇒在字符串中的任何地方找到东西,然后返回一个匹配对象。

match⇒ 在字符串的开头找到一些内容,然后返回匹配对象。

search ⇒ find something anywhere in the string and return a match object.

match ⇒ find something at the beginning of the string and return a match object.


回答 2

re.search 搜索 ES该模式在整个字符串,而re.match没有搜索不到的格局; 如果没有,则除了在字符串开头匹配外,别无选择。

re.search searches for the pattern throughout the string, whereas re.match does not search the pattern; if it does not, it has no other choice than to match it at start of the string.


回答 3

match比搜索快得多,因此如果不使用regex.search(“ word”),则可以执行regex.match((。*?)word(。*?)),如果您使用数百万个样品。

@ivan_bilan在上面接受的答案下的评论让我开始思考是否有这样的hack是否真的在加速任何事情,所以让我们找出您将真正获得多少性能。

我准备了以下测试套件:

import random
import re
import string
import time

LENGTH = 10
LIST_SIZE = 1000000

def generate_word():
    word = [random.choice(string.ascii_lowercase) for _ in range(LENGTH)]
    word = ''.join(word)
    return word

wordlist = [generate_word() for _ in range(LIST_SIZE)]

start = time.time()
[re.search('python', word) for word in wordlist]
print('search:', time.time() - start)

start = time.time()
[re.match('(.*?)python(.*?)', word) for word in wordlist]
print('match:', time.time() - start)

我进行了10次测量(1M,2M,…,10M个单词),得出以下图表:

产生的直线令人惊讶地(实际上并不那么令人惊讶)直线。鉴于这种特定的模式组合,该search功能(略)更快。该测试的实质是避免过度优化代码。

match is much faster than search, so instead of doing regex.search(“word”) you can do regex.match((.*?)word(.*?)) and gain tons of performance if you are working with millions of samples.

This comment from @ivan_bilan under the accepted answer above got me thinking if such hack is actually speeding anything up, so let’s find out how many tons of performance you will really gain.

I prepared the following test suite:

import random
import re
import string
import time

LENGTH = 10
LIST_SIZE = 1000000

def generate_word():
    word = [random.choice(string.ascii_lowercase) for _ in range(LENGTH)]
    word = ''.join(word)
    return word

wordlist = [generate_word() for _ in range(LIST_SIZE)]

start = time.time()
[re.search('python', word) for word in wordlist]
print('search:', time.time() - start)

start = time.time()
[re.match('(.*?)python(.*?)', word) for word in wordlist]
print('match:', time.time() - start)

I made 10 measurements (1M, 2M, …, 10M words) which gave me the following plot:

The resulting lines are surprisingly (actually not that surprisingly) straight. And the search function is (slightly) faster given this specific pattern combination. The moral of this test: Avoid overoptimizing your code.


回答 4

您可以参考以下示例以了解re.matchand.search 的工作原理

a = "123abc"
t = re.match("[a-z]+",a)
t = re.search("[a-z]+",a)

re.match会回来none,但是re.search会回来abc

You can refer the below example to understand the working of re.match and re.search

a = "123abc"
t = re.match("[a-z]+",a)
t = re.search("[a-z]+",a)

re.match will return none, but re.search will return abc.


回答 5

区别在于,re.match()误导习惯于Perlgrepsed正则表达式匹配的任何人,并且re.search()不会。:-)

正如约翰·D·库克(John D. Cook)所说,要更加清醒一些,re.match()“表现得好像每个模式都以^开头。” 换句话说,re.match('pattern')等于re.search('^pattern')。因此,它锚定了图案的左侧。但这也没有固定模式的右侧:仍然需要终止$

坦白地说,我认为re.match()应该弃用。我很想知道应该保留它的原因。

The difference is, re.match() misleads anyone accustomed to Perl, grep, or sed regular expression matching, and re.search() does not. :-)

More soberly, As John D. Cook remarks, re.match() “behaves as if every pattern has ^ prepended.” In other words, re.match('pattern') equals re.search('^pattern'). So it anchors a pattern’s left side. But it also doesn’t anchor a pattern’s right side: that still requires a terminating $.

Frankly given the above, I think re.match() should be deprecated. I would be interested to know reasons it should be retained.


回答 6

re.match尝试匹配字符串开头的模式。re.search尝试在整个字符串中匹配模式直到找到匹配项。

re.match attempts to match a pattern at the beginning of the string. re.search attempts to match the pattern throughout the string until it finds a match.


回答 7

矮得多:

  • search 扫描整个字符串。

  • match 仅扫描字符串的开头。

以下Ex表示:

>>> a = "123abc"
>>> re.match("[a-z]+",a)
None
>>> re.search("[a-z]+",a)
abc

Much shorter:

  • search scans through the whole string.

  • match scans only the beginning of the string.

Following Ex says it:

>>> a = "123abc"
>>> re.match("[a-z]+",a)
None
>>> re.search("[a-z]+",a)
abc