Why are empty strings returned in split() results?

Question 1

What is the point of '/segment/segment/'.split('/') returning ['', 'segment', 'segment', '']?

Notice the empty elements. If you’re splitting on a delimiter that happens to be at position one and at the very end of a string, what extra value does it give you to have the empty string returned from each end?

Answer 1

str.split complements str.join, so

"/".join(['', 'segment', 'segment', ''])

gets you back the original string.

If the empty strings were not there, the first and last '/' would be missing after the join().
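A quick check of that round trip, as a minimal sketch (the variable names are only illustrative):

```python
original = '/segment/segment/'

# split() keeps the empty strings that sit before the first '/'
# and after the last '/'.
parts = original.split('/')
print(parts)    # ['', 'segment', 'segment', '']

# Joining the untouched parts restores the original string.
rebuilt = '/'.join(parts)
print(rebuilt)  # /segment/segment/

# Joining after dropping the empty strings loses both outer slashes.
lossy = '/'.join(p for p in parts if p)
print(lossy)    # segment/segment
```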

Answer 2

More generally, to remove empty strings returned in split() results, you may want to look at the filter function.

Example:

f = filter(None, '/segment/segment/'.split('/'))
s_all = list(f)

returns

['segment', 'segment']

Answer 3

There are two main points to consider here:

Expecting '/segment/segment/'.split('/') to equal ['segment', 'segment'] is reasonable, but it loses information. If split() worked the way you wanted and I told you that a.split('/') == ['segment', 'segment'], you couldn’t tell me what a was.
What should the result of 'a//b'.split('/') be: ['a', 'b'] or ['a', '', 'b']? That is, should split() merge adjacent delimiters? If it did, it would be very hard to parse character-delimited data in which some of the fields can be empty. I am fairly sure there are many people who do want the empty values in the result for that case!
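Both points can be seen in a short sketch. The sample strings and the record 'alice,,42' are made up for illustration:

```python
# Point 1: three different inputs collapse to the same list once the
# empty strings are dropped, so the original string cannot be recovered.
candidates = ['segment/segment', '/segment/segment', '/segment/segment/']
for a in candidates:
    kept = a.split('/')
    dropped = [x for x in kept if x]
    print(repr(a), kept, dropped)

# Point 2: a delimited record whose middle field is empty.
record = 'alice,,42'  # hypothetical record: name, (empty) email, age
fields = record.split(',')
print(fields)         # ['alice', '', '42']

# Merging adjacent delimiters would shift '42' into the second column.
merged = [f for f in fields if f]
print(merged)         # ['alice', '42']
```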

In the end, it boils down to two things:

Consistency: if a contains n delimiters, I get n+1 values back after the split().

It should be possible to do complex things, and easy to do simple things: if you want to ignore empty strings as a result of the split(), you can always do:

def mysplit(s, delim=None):
    return [x for x in s.split(delim) if x]

but if one doesn’t want to ignore the empty values, one should be able to.

The language has to pick one definition of split(); there are too many different use cases to satisfy everyone’s requirements as a default. I think that Python’s choice is a good one, and is the most logical. (As an aside, one of the reasons I don’t like C’s strtok() is that it merges adjacent delimiters, making it extremely hard to do serious parsing/tokenization with it.)

There is one exception: a.split() without an argument squeezes consecutive whitespace, but one can argue that this is the right thing to do in that case. If you don’t want that behavior, you can always do a.split(' ').
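The difference between the two calls, sketched on a made-up string with leading, trailing, and doubled spaces:

```python
text = ' a  b '

# No argument: runs of whitespace count as one separator, and leading
# and trailing whitespace produces no empty strings.
whitespace_split = text.split()
print(whitespace_split)  # ['a', 'b']

# Explicit ' ': every single space is a delimiter, so empty strings
# appear at both ends and between the two adjacent spaces.
space_split = text.split(' ')
print(space_split)       # ['', 'a', '', 'b', '']
```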

Answer 4

Having x.split(y) always return a list of 1 + x.count(y) items is a precious regularity. As @gnibbler’s already pointed out, it makes split and join exact inverses of each other (as they obviously should be). It also precisely maps the semantics of all kinds of delimiter-joined records (such as CSV file lines, net of quoting issues, lines from /etc/group in Unix, and so on). And it allows (as @Roman’s answer mentioned) easy checks for, e.g., absolute vs. relative paths (in file paths and URLs), and so forth.
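A minimal sketch that checks both regularities on a few sample inputs (the samples are arbitrary, chosen to include edge cases such as an empty string and a multi-character delimiter):

```python
samples = [
    ('/segment/segment/', '/'),
    ('a//b', '/'),
    ('', ','),              # empty input still yields one (empty) item
    ('no delimiter', ','),
    ('aaa', 'aa'),          # multi-character delimiter
]

for x, y in samples:
    parts = x.split(y)
    # Count invariant: one more item than there are delimiters.
    assert len(parts) == 1 + x.count(y)
    # Inverse invariant: join undoes split.
    assert y.join(parts) == x
    print(repr(x), repr(y), parts)
```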

Another way to look at it is that you shouldn’t wantonly toss information out of the window for no gain. What would be gained by making x.split(y) equivalent to x.strip(y).split(y)? Nothing, of course: it’s easy to use the second form when that’s what you mean, but if the first form were arbitrarily deemed to mean the second one, you’d have a lot of work to do when you do want the first one (which is far from rare, as the previous paragraph points out).

But really, thinking in terms of mathematical regularity is the simplest and most general way you can teach yourself to design passable APIs. To take a different example, it’s very important that for any valid x and y, x == x[:y] + x[y:], which immediately indicates why one extreme of a slice should be excluded. The simpler the invariant assertion you can formulate, the likelier it is that the resulting semantics are what you need in real-life use; that is part of the mystical fact that maths is very useful in dealing with the universe.
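The slicing invariant is easy to check by brute force. This sketch uses an arbitrary sample string and a range of indices that deliberately includes negative and out-of-range values:

```python
x = 'segment'

# Because x[:y] excludes index y and x[y:] includes it, the two halves
# never overlap and never leave a gap, whatever integer y is.
for y in range(-10, 11):
    assert x == x[:y] + x[y:]

print(x[:3], x[3:])  # seg ment
```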

Try formulating the invariant for a split dialect in which leading and trailing delimiters are special-cased...

A counter-example: string methods such as isspace are not maximally simple. x.isspace() is essentially equivalent to x and all(c in string.whitespace for c in x), and that silly leading "x and" is why you so often find yourself coding not x or x.isspace() to get back to the simplicity which should have been designed into the is... string methods (whereby an empty string “is” anything you want; contrary to man-in-the-street horse sense, maybe, since empty sets, like zero, have always confused most people ;-), but fully conforming to obvious, well-refined mathematical common sense!).
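The empty-string special case can be seen directly. In this sketch, all_whitespace is a hypothetical helper written only for comparison, and it covers just the ASCII characters in string.whitespace, whereas str.isspace also recognizes other Unicode whitespace:

```python
import string


def all_whitespace(x):
    """Hypothetical helper: the 'maximally simple' definition,
    which is vacuously true for an empty string."""
    return all(c in string.whitespace for c in x)


print(''.isspace())          # False: the built-in rejects ''
print(all_whitespace(''))    # True: all() over nothing is True
print(' \t\n'.isspace())     # True
print(all_whitespace(' \t\n'))  # True

# The idiom from the text, which treats '' as blank as well.
def is_blank(x):
    return not x or x.isspace()

print(is_blank(''), is_blank('  '), is_blank(' a '))  # True True False
```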

Answer 5

I’m not sure what kind of answer you’re looking for. You get four items because you have three delimiters. If you don’t want the empty ones, just use:

'/segment/segment/'.strip('/').split('/')
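Note that strip() only removes delimiters at the two ends. A small sketch on a made-up string with a doubled slash in the middle shows how this differs from filtering out every empty string:

```python
s = '/a//b/'

# strip('/') trims the outer slashes only; the empty string between
# the two adjacent inner slashes is still reported.
stripped = s.strip('/').split('/')
print(stripped)  # ['a', '', 'b']

# Filtering removes every empty string, inner ones included.
filtered = [p for p in s.split('/') if p]
print(filtered)  # ['a', 'b']
```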

Answer 6

Well, it lets you know there was a delimiter there. So, seeing 4 results lets you know you had 3 delimiters. This gives you the power to do whatever you want with this information, rather than having Python drop the empty elements, and then making you manually check for starting or ending delimiters if you need to know it.

Simple example: Say you want to check for absolute vs. relative filenames. This way you can do it all with the split, without also having to check what the first character of your filename is.
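One way this check might look, as a rough sketch. The helper name is_absolute is hypothetical, and it assumes POSIX-style paths separated by '/':

```python
def is_absolute(path):
    """Hypothetical helper: a POSIX-style path is absolute when
    split('/') yields an empty first component."""
    parts = path.split('/')
    # An empty path splits to [''], so also require at least one
    # delimiter before treating the empty first item as a leading '/'.
    return len(parts) > 1 and parts[0] == ''


print(is_absolute('/usr/bin'))  # True
print(is_absolute('usr/bin'))   # False
print(is_absolute('/'))         # True
print(is_absolute(''))          # False
```

For real code, os.path.isabs or pathlib is the more robust choice; the sketch only illustrates what the leading empty string tells you.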

Answer 7

Consider this minimal example:

>>> '/'.split('/')
['', '']

split must give you what’s before and after the delimiter '/', but there are no other characters. So it has to give you the empty string, which technically precedes and follows the '/', because '' + '/' + '' == '/'.
