Tag archive: regex-group

Python Regex instantly replace groups

Question: Python Regex instantly replace groups

Is there any way to directly replace all groups using regex syntax?

The normal way:

re.match(r"(?:aaa)(_bbb)", string1).group(1)

But I want to achieve something like this:

re.match(r"(\d.*?)\s(\d.*?)", "(CALL_GROUP_1) (CALL_GROUP_2)")

I want to build the new string instantaneously from the groups the Regex just captured.


Answer 0

Have a look at re.sub:

result = re.sub(r"(\d.*?)\s(\d.*?)", r"\1 \2", string1)

This is Python’s regex substitution (replace) function. The replacement string can be filled with so-called backreferences (backslash, group number) which are replaced with what was matched by the groups. Groups are counted the same as by the group(...) function, i.e. starting from 1, from left to right, by opening parentheses.
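As a minimal sketch of how the backreferences behave, the snippet below swaps two numbers. The input string and the tightened pattern are assumptions made for illustration. The pattern in the answer ends with a lazy `.*?`, which can match the empty string, so its second group would capture only a single digit; `\d+` is used here to keep the result predictable.

```python
import re

# Hypothetical input: two numbers separated by one whitespace character.
string1 = "12 34"

# \1 and \2 in the replacement stand for the text captured by
# the first and second groups, counted by opening parenthesis.
swapped = re.sub(r"(\d+)\s(\d+)", r"\2 \1", string1)
print(swapped)  # 34 12
```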


Answer 1

The accepted answer is perfect. I would add that group reference is probably better achieved by using this syntax:

r"\g<1> \g<2>"

for the replacement string. This way, you work around syntax limitations where a group may be followed by a digit. Again, this is all present in the doc, nothing new, just sometimes difficult to spot at first sight.
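A small sketch of the case this syntax is meant for, assuming a made-up input where a literal digit has to follow the group reference directly:

```python
import re

# Hypothetical input: we want to append a literal "0" right after the digit.
text = "item7"

# \g<1>0 is unambiguous: group 1, followed by the character "0".
# Written as r"\10", the same replacement would be read as a
# reference to group 10, which this pattern does not define.
result = re.sub(r"(\d)", r"\g<1>0", text)
print(result)  # item70
```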


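Named regular expression group "(?P<group_name>regexp)": what does "P" stand for?

Question: Named regular expression group "(?P<group_name>regexp)": what does "P" stand for?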

In Python, the (?P<group_name>…) syntax allows one to refer to the matched string through its name:

>>> import re
>>> match = re.search('(?P<name>.*) (?P<phone>.*)', 'John 123456')
>>> match.group('name')
'John'

What does “P” stand for? I could not find any hint in the official documentation.

I would love to get ideas about how to help my students remember this syntax. Knowing what “P” does stand for (or might stand for) would be useful.


Answer 0

Since we’re all guessing, I might as well give mine: I’ve always thought it stood for Python. That may sound pretty stupid — what, P for Python?! — but in my defense, I vaguely remembered this thread [emphasis mine]:

Subject: Claiming (?P…) regex syntax extensions

From: Guido van Rossum (gui…@CNRI.Reston.Va.US)

Date: Dec 10, 1997 3:36:19 pm

I have an unusual request for the Perl developers (those that develop the Perl language). I hope this (perl5-porters) is the right list. I am cc’ing the Python string-sig because it is the origin of most of the work I’m discussing here.

You are probably aware of Python. I am Python’s creator; I am planning to release a next “major” version, Python 1.5, by the end of this year. I hope that Python and Perl can co-exist in years to come; cross-pollination can be good for both languages. (I believe Larry had a good look at Python when he added objects to Perl 5; O’Reilly publishes books about both languages.)

As you may know, Python 1.5 adds a new regular expression module that more closely matches Perl’s syntax. We’ve tried to be as close to the Perl syntax as possible within Python’s syntax. However, the regex syntax has some Python-specific extensions, which all begin with (?P . Currently there are two of them:

(?P<foo>...) Similar to regular grouping parentheses, but the text matched by the group is accessible after the match has been performed, via the symbolic group name “foo”.

(?P=foo) Matches the same string as that matched by the group named “foo”. Equivalent to \1, \2, etc. except that the group is referred to by name, not number.

I hope that this Python-specific extension won’t conflict with any future Perl extensions to the Perl regex syntax. If you have plans to use (?P, please let us know as soon as possible so we can resolve the conflict. Otherwise, it would be nice if the (?P syntax could be permanently reserved for Python-specific syntax extensions. (Is there some kind of registry of extensions?)

to which Larry Wall replied:

[…] There’s no registry as of now–yours is the first request from outside perl5-porters, so it’s a pretty low-bandwidth activity. (Sorry it was even lower last week–I was off in New York at Internet World.)

Anyway, as far as I’m concerned, you may certainly have ‘P’ with my blessing. (Obviously Perl doesn’t need the ‘P’ at this point. :-) […]

So I don’t know what the original choice of P was motivated by — pattern? placeholder? penguins? — but you can understand why I’ve always associated it with Python. Which considering that (1) I don’t like regular expressions and avoid them wherever possible, and (2) this thread happened fifteen years ago, is kind of odd.
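For readers who have not used the two extensions described in the quoted email, here is a minimal sketch of both. The group name `word` and the sample sentence are assumptions chosen for illustration.

```python
import re

# (?P<word>...) names the group; (?P=word) then matches the same
# text again, like \1 but referenced by name.
pattern = re.compile(r"(?P<word>\w+) (?P=word)")

match = pattern.search("it was was late")
repeated = match.group("word")
whole = match.group(0)
print(repeated)  # was
print(whole)     # was was
```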


Answer 1

Pattern! The group names a (sub)pattern for later use in the regex. See the documentation here for details about how such groups are used.


Answer 2

Python Extension. From the Python Docos:

The solution chosen by the Perl developers was to use (?…) as the extension syntax. ? immediately after a parenthesis was a syntax error because the ? would have nothing to repeat, so this didn’t introduce any compatibility problems. The characters immediately after the ? indicate what extension is being used, so (?=foo) is one thing (a positive lookahead assertion) and (?:foo) is something else (a non-capturing group containing the subexpression foo).

Python supports several of Perl’s extensions and adds an extension syntax to Perl’s extension syntax. If the first character after the question mark is a P, you know that it’s an extension that’s specific to Python.

https://docs.python.org/3/howto/regex.html
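The three kinds of extension mentioned in the excerpt can be compared side by side. This is only a sketch; the patterns and the group name `first` are assumptions made for illustration.

```python
import re

# (?:foo) groups without capturing, so only "bar" becomes group 1.
non_capturing = re.match(r"(?:foo)(bar)", "foobar")
print(non_capturing.groups())  # ('bar',)

# (?=bar) is a positive lookahead: it checks that "bar" follows
# but does not include it in the match.
lookahead = re.match(r"foo(?=bar)", "foobar")
print(lookahead.group(0))  # foo

# (?P<first>...) is the Python-specific named-group extension.
named = re.match(r"(?P<first>foo)", "foobar")
print(named.groupdict())  # {'first': 'foo'}
```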


python re.sub group: number after \number

Question: python re.sub group: number after \number

How can I replace foobar with foo123bar?

This doesn’t work:

>>> re.sub(r'(foo)', r'\1123', 'foobar')
'J3bar'

This works:

>>> re.sub(r'(foo)', r'\1hi', 'foobar')
'foohibar'

I think it’s a common issue when having something like \number. Can anyone give me a hint on how to handle this?


Answer 0

The answer is:

re.sub(r'(foo)', r'\g<1>123', 'foobar')

Relevant excerpt from the docs:

In addition to character escapes and backreferences as described above, \g<name> will use the substring matched by the group named name, as defined by the (?P<name>...) syntax. \g<number> uses the corresponding group number; \g<2> is therefore equivalent to \2, but isn’t ambiguous in a replacement such as \g<2>0. \20 would be interpreted as a reference to group 20, not a reference to group 2 followed by the literal character ‘0’. The backreference \g<0> substitutes in the entire substring matched by the RE.
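The other two forms mentioned in the excerpt, a reference by group name and \g<0> for the whole match, can be sketched like this. The sample string and group names are assumptions, borrowed from the named-group question for convenience.

```python
import re

text = "John 123456"  # hypothetical sample input
pattern = r"(?P<name>\w+) (?P<phone>\d+)"

# \g<name> refers to a group by the name given in (?P<name>...).
by_name = re.sub(pattern, r"\g<phone> \g<name>", text)
print(by_name)  # 123456 John

# \g<0> stands for the entire matched substring.
wrapped = re.sub(pattern, r"[\g<0>]", text)
print(wrapped)  # [John 123456]
```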