Question: Why is it string.join(list) instead of list.join(string)?
This has always confused me. It seems like this would be nicer:
my_list = ["Hello", "world"]
print(my_list.join("-"))
# Produce: "Hello-world"
Than this:
my_list = ["Hello", "world"]
print("-".join(my_list))
# Produce: "Hello-world"
Is there a specific reason it is like this?
Answer 0
It’s because any iterable can be joined (e.g., list, tuple, dict, set), but the result and the “joiner” must be strings.
For example:
'_'.join(['welcome', 'to', 'stack', 'overflow'])
'_'.join(('welcome', 'to', 'stack', 'overflow'))
'welcome_to_stack_overflow'
Using anything other than strings as the items will raise the following error:
TypeError: sequence item 0: expected str instance, int found
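As a minimal sketch of the point above, assuming Python 3 (the variable names are illustrative only and not part of the original answer): any iterable of strings can be joined, a dict contributes its keys, and a non-string item raises TypeError.

```python
words_list = ["welcome", "to", "stack", "overflow"]
words_tuple = tuple(words_list)
words_dict = {"welcome": 1, "to": 2}          # iterating a dict yields its keys
words_gen = (w.upper() for w in words_list)   # a generator works as well

from_list = "_".join(words_list)
from_tuple = "_".join(words_tuple)
from_dict = "_".join(words_dict)
from_gen = "_".join(words_gen)

# Items that are not strings are rejected rather than converted implicitly.
try:
    "_".join([1, 2, 3])
except TypeError as exc:
    error_message = str(exc)

# One common workaround: convert each item explicitly before joining.
from_ints = "_".join(str(n) for n in [1, 2, 3])
```

The explicit str() conversion in the last line is a design choice, not the only option; it simply keeps the decision about how to stringify each item with the caller.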
Answer 1
This was discussed in the “String methods… finally” thread in the Python-Dev archive, and was accepted by Guido. The thread began in June 1999, and str.join was included in Python 1.6, which was released in September 2000 (and supported Unicode). Python 2.0 (which supported str methods, including join) was released in October 2000.
- There were four options proposed in this thread:
  - str.join(seq)
  - seq.join(str)
  - seq.reduce(str)
  - join as a built-in function
- Guido wanted to support not only lists and tuples, but all sequences/iterables.
- seq.reduce(str) is difficult for newcomers.
- seq.join(str) introduces an unexpected dependency from sequences to str/unicode.
- join() as a built-in function would support only specific data types, so using the built-in namespace is not good. If join() supported many data types, creating an optimized implementation would be difficult; if it were implemented using the __add__ method, it would be O(n²).
- The separator string (sep) should not be omitted. Explicit is better than implicit.
There are no other reasons offered in this thread.
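The O(n²) remark in the list above can be illustrated with a rough sketch. The helper naive_join below is hypothetical (it is not from the thread or the standard library); it joins by repeated use of +, so each step can copy the whole accumulated result, which is where the quadratic worst case comes from. str.join can instead size the result once and copy each piece once.

```python
from functools import reduce


def naive_join(sep, items):
    """Join strings by repeated concatenation (hypothetical helper)."""
    items = list(items)
    if not items:
        return ""
    # Each `+` builds a new string from the accumulated result,
    # so the total copying grows roughly quadratically with the input.
    return reduce(lambda acc, item: acc + sep + item, items)


parts = ["a", "b", "c"]
naive_result = naive_join("-", parts)
builtin_result = "-".join(parts)
```

Both calls produce the same string; only the amount of copying differs.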
Here are some additional thoughts (my own, and my friend’s):
- Unicode support was coming, but it was not final. At that time, UTF-8 was the most likely candidate to replace UCS-2/4. To calculate the total buffer length of UTF-8 strings, the code needs to know the character encoding rules.
- At that time, Python had already settled on a common sequence interface under which a user could create a sequence-like (iterable) class. But Python didn’t support extending built-in types until 2.2, so at that time it was difficult to provide a basic iterable class (which is mentioned in another comment).
Guido’s decision is recorded in a historical mail, deciding on str.join(seq):
Funny, but it does seem right! Barry, go for it…
–Guido van Rossum
Answer 2
Because the join() method is in the string class, instead of the list class?
I agree it looks funny.
See http://www.faqs.org/docs/diveintopython/odbchelper_join.html:
Historical note. When I first learned Python, I expected join to be a method of a list, which would take the delimiter as an argument. Lots of people feel the same way, and there’s a story behind the join method. Prior to Python 1.6, strings didn’t have all these useful methods. There was a separate string module which contained all the string functions; each function took a string as its first argument. The functions were deemed important enough to put onto the strings themselves, which made sense for functions like lower, upper, and split. But many hard-core Python programmers objected to the new join method, arguing that it should be a method of the list instead, or that it shouldn’t move at all but simply stay a part of the old string module (which still has lots of useful stuff in it). I use the new join method exclusively, but you will see code written either way, and if it really bothers you, you can use the old string.join function instead.
— Mark Pilgrim, Dive into Python
Answer 3
I agree that it’s counterintuitive at first, but there’s a good reason. Join can’t be a method of a list because:
- it must work for different iterables too (tuples, generators, etc.)
- it must have different behavior between different types of strings.
There are actually two join methods (Python 3.0):
>>> b"".join
<built-in method join of bytes object at 0x00A46800>
>>> "".join
<built-in method join of str object at 0x00A28D40>
If join were a method of a list, then it would have to inspect its arguments to decide which one of them to call. And you can’t join bytes and str together, so the way they have it now makes sense.
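A short sketch of the two join methods, assuming Python 3 (the variable names are illustrative): the separator's type decides which join runs, and mixing bytes with str is rejected in either direction.

```python
text_result = ", ".join(["a", "b"])        # str separator, str items
bytes_result = b", ".join([b"a", b"b"])    # bytes separator, bytes items

# A str separator refuses bytes items.
try:
    ", ".join([b"a", b"b"])
except TypeError:
    str_sep_rejects_bytes = True
else:
    str_sep_rejects_bytes = False

# A bytes separator refuses str items.
try:
    b", ".join(["a", "b"])
except TypeError:
    bytes_sep_rejects_str = True
else:
    bytes_sep_rejects_str = False
```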
Answer 4
Why is it string.join(list) instead of list.join(string)?
This is because join is a “string” method! It creates a string from any iterable. If we stuck the method on lists, what about when we have iterables that aren’t lists?
What if you have a tuple of strings? If this were a list method, you would have to convert every such iterable of strings to a list before you could join the elements into a single string! For example:
some_strings = ('foo', 'bar', 'baz')
Let’s roll our own list join method:
class OurList(list):
    def join(self, s):
        return s.join(self)
And to use it, note that we have to first create a list from each iterable to join the strings in that iterable, wasting both memory and processing power:
>>> l = OurList(some_strings) # step 1, create our list
>>> l.join(', ') # step 2, use our list join method!
'foo, bar, baz'
So we see that we have to add an extra step to use our list method, instead of just using the built-in string method:
>>> ' | '.join(some_strings) # a single step!
'foo | bar | baz'
Performance Caveat for Generators
The algorithm Python uses to create the final string with str.join actually has to pass over the iterable twice, so if you provide it a generator expression, it has to materialize it into a list first before it can create the final string.
Thus, while passing around generators is usually better than list comprehensions, str.join is an exception:
>>> import timeit
>>> min(timeit.repeat(lambda: ''.join(str(i) for i in range(10) if i)))
3.839168446022086
>>> min(timeit.repeat(lambda: ''.join([str(i) for i in range(10) if i])))
3.339879313018173
Nevertheless, the str.join operation is still semantically a “string” operation, so it still makes more sense to have it on the str object than on miscellaneous iterables.
Answer 5
Think of it as the natural orthogonal operation to split.
I understand why it is applicable to anything iterable and so can’t easily be implemented just on list.
For readability, I’d like to see it in the language, but I don’t think that is actually feasible. If iterability were an interface, then it could be added to the interface, but it is just a convention, and so there’s no central way to add it to the set of things which are iterable.
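Both points in this answer can be sketched briefly, assuming Python 3. The first part shows split and join as a round trip with an explicit separator; the second uses a hypothetical Countdown class (not from the original answer) that is iterable only by convention, through __iter__, yet still works with str.join.

```python
# Round trip: splitting on a separator and joining with the same
# separator rebuilds the original string.
line = "a,b,,c"
fields = line.split(",")
rebuilt = ",".join(fields)


class Countdown:
    """Hypothetical iterable: no shared base class, just __iter__."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return (str(n) for n in range(self.start, 0, -1))


countdown_text = "-".join(Countdown(3))
```

Because str.join only relies on iteration, Countdown needs no join method of its own.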
Answer 6
Primarily because the result of a someString.join() is a string.
The sequence (list or tuple or whatever) doesn’t appear in the result, just a string. Because the result is a string, it makes sense as a method of a string.
Answer 7
The "-" in "-".join(my_list) declares that you are converting to a string by joining the elements of a list. It is result-oriented (just for easy memorization and understanding).
I made an exhaustive cheat sheet of methods_of_string for your reference.
string_methonds_44 = {
'convert': ['join','split', 'rsplit','splitlines', 'partition', 'rpartition'],
'edit': ['replace', 'lstrip', 'rstrip', 'strip'],
'search': ['endswith', 'startswith', 'count', 'index', 'find','rindex', 'rfind',],
'condition': ['isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isnumeric','isidentifier',
'islower','istitle', 'isupper','isprintable', 'isspace', ],
'text': ['lower', 'upper', 'capitalize', 'title', 'swapcase',
'center', 'ljust', 'rjust', 'zfill', 'expandtabs','casefold'],
'encode': ['translate', 'maketrans', 'encode'],
'format': ['format', 'format_map']}
Answer 8
Neither is nice.
string.join(xs, delimit) means that the string module is aware of the existence of a list, which it has no business knowing about, since the string module only works with strings.
list.join(delimit) is a bit nicer because we’re so used to strings being a fundamental type (and linguistically speaking, they are). However, this means that join needs to be dispatched dynamically, because in the arbitrary context of a.split("\n") the Python compiler might not know what a is, and will need to look it up (analogously to a vtable lookup), which is expensive if you do it many times.
If the Python runtime compiler knows that list is a built-in type, it can skip the dynamic lookup and encode the intent into the bytecode directly, whereas otherwise it needs to dynamically resolve “join” of “a”, which may be several layers of inheritance up on each call (since between calls, the meaning of join may have changed, because Python is a dynamic language).
Sadly, this is the ultimate flaw of abstraction. No matter what abstraction you choose, it will only make sense in the context of the problem you’re trying to solve, so as you start gluing abstractions together, they become inconsistent with the underlying ideologies unless you wrap them in a view that is consistent with your own. Knowing this, Python’s approach is more flexible since it’s cheaper; it’s up to you to pay more to make it look “nicer”, either by making your own wrapper or your own preprocessor.
Answer 9
The variables my_list and "-" are both objects. Specifically, they’re instances of the classes list and str, respectively. The join function belongs to the class str. Therefore, the syntax "-".join(my_list) is used because the object "-" is taking my_list as an input.
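As a small sketch of that relationship, assuming Python 3 (the variable names other than my_list are illustrative): calling join on the "-" object is the same as calling the function on the str class and passing "-" as the first argument, while list has no join attribute at all.

```python
my_list = ["Hello", "world"]

bound_call = "-".join(my_list)          # method looked up on the "-" object
unbound_call = str.join("-", my_list)   # same function, accessed via the class

list_has_join = hasattr(list, "join")
```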