为什么在split()结果中返回空字符串?

问题:为什么在split()结果中返回空字符串?

什么是点'/segment/segment/'.split('/')回来['', 'segment', 'segment', '']

注意空元素。如果您要分割的分隔符恰好位于字符串的第一位置,并且位于字符串的末尾,那么它又能为您带来什么额外的价值呢?

What is the point of '/segment/segment/'.split('/') returning ['', 'segment', 'segment', '']?

Notice the empty elements. If you’re splitting on a delimiter that happens to be at position one and at the very end of a string, what extra value does it give you to have the empty string returned from each end?


回答 0

str.splitstr.join,所以

"/".join(['', 'segment', 'segment', ''])

让您返回原始字符串。

如果没有空字符串,则第一个和最后符串'/'将丢失join()

str.split complements str.join, so

"/".join(['', 'segment', 'segment', ''])

gets you back the original string.

If the empty strings were not there, the first and last '/' would be missing after the join()


回答 1

更一般而言,要删除split()结果中返回的空字符串,您可能需要查看该filter函数。

例:

filter(None, '/segment/segment/'.split('/'))

退货

['segment', 'segment']

More generally, to remove empty strings returned in split() results, you may want to look at the filter function.

Example:

f = filter(None, '/segment/segment/'.split('/'))
s_all = list(f)

returns

['segment', 'segment']

回答 2

这里有两点要考虑:

  • 期望结果'/segment/segment/'.split('/')['segment', 'segment']于是合理的,但这会丢失信息。如果split()按照您想要的方式工作,如果我告诉您a.split('/') == ['segment', 'segment'],您将无法告诉我是什么a
  • 结果应该是什么'a//b'.split()['a', 'b']?或['a', '', 'b']?即,是否应split()合并相邻的定界符?如果需要,那么将很难解析由字符分隔的数据,并且某些字段可以为空。我可以肯定,有很多人确实想要上述情况的结果中的空值!

最后,归结为两点:

一致性:如果我有n定界符,则在中a,我会在n+1返回值split()

应该可以做复杂的事情,并且可以轻松地做简单的事情:如果由于想要忽略空字符串split(),可以始终这样做:

def mysplit(s, delim=None):
    return [x for x in s.split(delim) if x]

但是如果不想忽略空值,则应该可以。

该语言必须选择一种定义split()-有太多不同的用例无法满足所有人的默认要求。我认为Python的选择是不错的选择,也是最合乎逻辑的选择。(顺便说一句,我不喜欢C的原因之一strtok()是因为它合并了相邻的定界符,因此很难对其进行认真的解析/标记化处理。)

有一个exceptions:a.split()没有参数会挤压连续的空格,但是有人可以认为在这种情况下这样做是正确的。如果您不想要这种行为,则可以始终这样做a.split(' ')

There are two main points to consider here:

  • Expecting the result of '/segment/segment/'.split('/') to be equal to ['segment', 'segment'] is reasonable, but then this loses information. If split() worked the way you wanted, if I tell you that a.split('/') == ['segment', 'segment'], you can’t tell me what a was.
  • What should be the result of 'a//b'.split() be? ['a', 'b']?, or ['a', '', 'b']? I.e., should split() merge adjacent delimiters? If it should, then it will be very hard to parse data that’s delimited by a character, and some of the fields can be empty. I am fairly sure there are many people who do want the empty values in the result for the above case!

In the end, it boils down to two things:

Consistency: if I have n delimiters, in a, I get n+1 values back after the split().

It should be possible to do complex things, and easy to do simple things: if you want to ignore empty strings as a result of the split(), you can always do:

def mysplit(s, delim=None):
    return [x for x in s.split(delim) if x]

but if one doesn’t want to ignore the empty values, one should be able to.

The language has to pick one definition of split()—there are too many different use cases to satisfy everyone’s requirement as a default. I think that Python’s choice is a good one, and is the most logical. (As an aside, one of the reasons I don’t like C’s strtok() is because it merges adjacent delimiters, making it extremely hard to do serious parsing/tokenization with it.)

There is one exception: a.split() without an argument squeezes consecutive white-space, but one can argue that this is the right thing to do in that case. If you don’t want the behavior, you can always to a.split(' ').


回答 3

x.split(y)始终返回列表1 + x.count(y)项是一种珍贵的规律性-为@ gnibbler本已指出,这让splitjoin对方的确切逆(因为它们显然应该是),这也正是各种分隔符连记录的语义(映射例如csv文件行[[引用网络的净额]],/etc/groupUnix中的行等等),它允许(如@Roman的回答所述)轻松检查(例如)绝对路径与相对路径(在文件路径和URL中),等等。

另一种看待它的方式是,您不应该肆意地将信息扔出窗外,以免获得任何好处。x.split(y)等于会得到什么x.strip(y).split(y)?没事,当然-它很容易使用第二种形式时,这就是你的意思,但如果第一种形式是任意视为指第二个,你有很多工作要做,当你希望第一个(如上一段所指出的,这绝非罕见。

但是实际上,根据数学规律进行思考是您可以自学的设计可传递API的最简单,最通用的方法。举一个不同的例子,对于任何有效的xy x == x[:y] + x[y:]-这立即表明为什么应该排除切片的一个极端非常重要。您可以表述不变的断言越简单,就越有可能产生的语义就是您在现实生活中需要的语义-这是神秘的事实,即数学在处理宇宙中非常有用。

尝试为split前导和尾随定界符是特殊情况的方言制定不变式…反例:像这样的字符串方法isspace并不是最大程度的简单- x.isspace()等效于x and all(c in string.whitespace for c in x)-愚蠢的前导x and是您经常发现自己编码的原因not x or x.isspace(),返回到应该is...字符串方法中设计的简单性(因此,空字符串“就是”您想要的任何东西)与街上人马的感觉相反,也许[[空集,如零&c,始终使大多数人感到困惑;-)]],但完全符合明显完善的数学常识!-)。

Having x.split(y) always return a list of 1 + x.count(y) items is a precious regularity — as @gnibbler’s already pointed out it makes split and join exact inverses of each other (as they obviously should be), it also precisely maps the semantics of all kinds of delimiter-joined records (such as csv file lines [[net of quoting issues]], lines from /etc/group in Unix, and so on), it allows (as @Roman’s answer mentioned) easy checks for (e.g.) absolute vs relative paths (in file paths and URLs), and so forth.

Another way to look at it is that you shouldn’t wantonly toss information out of the window for no gain. What would be gained in making x.split(y) equivalent to x.strip(y).split(y)? Nothing, of course — it’s easy to use the second form when that’s what you mean, but if the first form was arbitrarily deemed to mean the second one, you’d have lot of work to do when you do want the first one (which is far from rare, as the previous paragraph points out).

But really, thinking in terms of mathematical regularity is the simplest and most general way you can teach yourself to design passable APIs. To take a different example, it’s very important that for any valid x and y x == x[:y] + x[y:] — which immediately indicates why one extreme of a slicing should be excluded. The simpler the invariant assertion you can formulate, the likelier it is that the resulting semantics are what you need in real life uses — part of the mystical fact that maths is very useful in dealing with the universe.

Try formulating the invariant for a split dialect in which leading and trailing delimiters are special-cased… counter-example: string methods such as isspace are not maximally simple — x.isspace() is equivalent to x and all(c in string.whitespace for c in x) — that silly leading x and is why you so often find yourself coding not x or x.isspace(), to get back to the simplicity which should have been designed into the is... string methods (whereby an empty string “is” anything you want — contrary to man-in-the-street horse-sense, maybe [[empty sets, like zero &c, have always confused most people;-)]], but fully conforming to obvious well-refined mathematical common-sense!-).


回答 4

我不确定您要寻找哪种答案?您得到三个匹配,因为您有三个定界符。如果您不想要那个空的,只需使用:

'/segment/segment/'.strip('/').split('/')

I’m not sure what kind of answer you’re looking for? You get three matches because you have three delimiters. If you don’t want that empty one, just use:

'/segment/segment/'.strip('/').split('/')

回答 5

好吧,它让您知道那里有一个定界符。因此,看到4个结果会让您知道您有3个定界符。这使您能够使用此信息执行任何所需的操作,而不是让Python删除空元素,然后让您手动检查是否需要定界符(如果需要)。

一个简单的例子:假设您要检查绝对文件名和相对文件名。这样,您就可以使用拆分完成所有操作,而不必检查文件名的第一个字符是什么。

Well, it lets you know there was a delimiter there. So, seeing 4 results lets you know you had 3 delimiters. This gives you the power to do whatever you want with this information, rather than having Python drop the empty elements, and then making you manually check for starting or ending delimiters if you need to know it.

Simple example: Say you want to check for absolute vs. relative filenames. This way you can do it all with the split, without also having to check what the first character of your filename is.


回答 6

考虑以下最小示例:

>>> '/'.split('/')
['', '']

split必须给您定界符前后的内容'/',但没有其他字符。因此,它必须为您提供一个空字符串,从技术上讲,它在之前和之后'/',因为'' + '/' + '' == '/'

Consider this minimal example:

>>> '/'.split('/')
['', '']

split must give you what’s before and after the delimiter '/', but there are no other characters. So it has to give you the empty string, which technically precedes and follows the '/', because '' + '/' + '' == '/'.