Splitting a string in Python based on a regular expression

Question: Splitting a string in Python based on a regular expression

What is the best way (in Python) to split a string like "HELLO there HOW are YOU" in front of each uppercase word?

I’d like to end up with a list like this: results = ['HELLO there', 'HOW are', 'YOU']


EDIT:

I have tried:

p = re.compile("\b[A-Z]{2,}\b")
print p.split(page_text)

It doesn’t seem to work, though.


Answer 0

I suggest

import re
l = re.compile(r"(?<!^)\s+(?=[A-Z])(?!.\s)").split(s)  # raw string avoids the invalid-escape warning for \s
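To see what each part of this pattern does, here is a minimal sketch that runs it on the question's string and on two made-up inputs; `split_on_caps` and `SPLIT_BEFORE_CAPS` are hypothetical names, not part of the answer. Note that the pattern splits in front of any word that starts with an uppercase letter (not only all-caps words), and that the `(?!.\s)` part skips one-letter words such as "A".

```python
import re

# The pattern from this answer, written as a raw string:
#   (?<!^)     do not match at the very start of the string
#   \s+        one or more whitespace characters (removed by split)
#   (?=[A-Z])  the next character is an uppercase letter...
#   (?!.\s)    ...but not a one-letter word followed by whitespace
SPLIT_BEFORE_CAPS = re.compile(r"(?<!^)\s+(?=[A-Z])(?!.\s)")


def split_on_caps(s):
    """Hypothetical helper: split s in front of each capitalized word."""
    return SPLIT_BEFORE_CAPS.split(s)


print(split_on_caps("HELLO there HOW are YOU"))
# ['HELLO there', 'HOW are', 'YOU']
print(split_on_caps("HELLO there Hello you"))
# ['HELLO there', 'Hello you']
print(split_on_caps("HELLO there A cat"))
# ['HELLO there A cat']
```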


Answer 1

You could use a lookahead:

re.split(r'[ ](?=[A-Z]+\b)', input)

This splits at every space that is followed by a run of uppercase letters ending at a word boundary, that is, by a word written entirely in capitals.

Note that the square brackets are only there for readability and could just as well be omitted.
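A minimal check of the first pattern, assuming Python 3. It uses the name `text` instead of `input`, since a variable called `input` shadows Python's built-in `input()` function; the second sample string is made up to show that a capitalized word like "Hello" does not trigger a split.

```python
import re

text = "HELLO there HOW are YOU"

# Split at a space only if an all-uppercase word follows: [A-Z]+ must run
# right up to a word boundary (\b), so "Hello" does not qualify.
print(re.split(r'[ ](?=[A-Z]+\b)', text))
# ['HELLO there', 'HOW are', 'YOU']

# No split before "Hello": \b cannot match between "H" and "e".
print(re.split(r'[ ](?=[A-Z]+\b)', "HELLO there Hello YOU"))
# ['HELLO there Hello', 'YOU']

# The square brackets are optional; a bare space behaves the same.
print(re.split(r' (?=[A-Z]+\b)', text) == re.split(r'[ ](?=[A-Z]+\b)', text))
# True
```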

If it is enough for the first letter of a word to be uppercase (that is, if you also want to split in front of a word like Hello), it gets even easier:

re.split(r'[ ](?=[A-Z])', input)

This splits at every space that is followed by any uppercase letter.
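Here is the relaxed pattern run on made-up strings that mix all-caps and capitalized words. The second call shows a side effect worth knowing: `[ ]` consumes exactly one space, so with a double space the extra space stays on the previous piece. The third call swaps in `\s+` (an illustrative variation, not part of the answer), which consumes the whole run.

```python
import re

text = "HELLO there Hello YOU"

# Split at every space followed by any uppercase letter, so a capitalized
# word such as "Hello" now starts a new piece as well.
print(re.split(r'[ ](?=[A-Z])', text))
# ['HELLO there', 'Hello', 'YOU']

# Only one space is consumed, so the other one of a double space is kept.
print(re.split(r'[ ](?=[A-Z])', "HELLO there  HOW"))
# ['HELLO there ', 'HOW']

# \s+ consumes the whole run of whitespace before the uppercase letter.
print(re.split(r'\s+(?=[A-Z])', "HELLO there  HOW"))
# ['HELLO there', 'HOW']
```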


Answer 2

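Your question contains the string literal "\b[A-Z]{2,}\b", but because it has no r prefix (it is not a raw string), Python turns each \b into a backspace character instead of passing \b to the regex engine as a word boundary.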

Try: r"\b[A-Z]{2,}\b".
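A quick way to see the difference, assuming Python 3 (the question's `print p.split(...)` is Python 2 syntax). The last part goes a step beyond this answer: even with the raw string, `split()` removes the uppercase words, because they are what the pattern matches. Wrapping them in a capturing group keeps them in the result; the list comprehension that glues each word back to the text after it is an illustrative sketch, not part of the original answer.

```python
import re

page_text = "HELLO there HOW are YOU"

# In a normal string literal "\b" is the backspace character, not "\" + "b".
print("\b" == "\x08")   # True
print(r"\b" == "\\b")   # True

# Non-raw pattern: it looks for literal backspaces, finds none,
# so split() returns the whole string in a one-element list.
print(re.split("\b[A-Z]{2,}\b", page_text))
# ['HELLO there HOW are YOU']

# Raw pattern: \b is now a word boundary, but split() drops what it matched.
print(re.split(r"\b[A-Z]{2,}\b", page_text))
# ['', ' there ', ' are ', '']

# A capturing group keeps the matched words in the result...
parts = re.split(r"\b([A-Z]{2,})\b", page_text)
print(parts)
# ['', 'HELLO', ' there ', 'HOW', ' are ', 'YOU', '']

# ...so each uppercase word can be joined to the text that follows it.
# parts[1::2] are the words, parts[2::2] the text after each one; any text
# before the first uppercase word (parts[0]) is dropped in this sketch.
results = [(word + rest).strip() for word, rest in zip(parts[1::2], parts[2::2])]
print(results)
# ['HELLO there', 'HOW are', 'YOU']
```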