Tag Archives: Python

Learning Python from Ruby; Differences and Similarities

Question: Learning Python from Ruby; Differences and Similarities

I know Ruby very well. I believe that I may need to learn Python presently. For those who know both, what concepts are similar between the two, and what are different?

I’m looking for a list similar to a primer I wrote for Learning Lua for JavaScripters: simple things like whitespace significance and looping constructs; the name of nil in Python, and what values are considered “truthy”; is it idiomatic to use the equivalent of map and each, or are mumble somethingaboutlistcomprehensions mumble the norm?

If I get a good variety of answers I’m happy to aggregate them into a community wiki. Or else you all can fight and crib from each other to try to create the one true comprehensive list.

Edit: To be clear, my goal is “proper” and idiomatic Python. If there is a Python equivalent of inject, but nobody uses it because there is a better/different way to achieve the common functionality of iterating a list and accumulating a result along the way, I want to know how you do things. Perhaps I’ll update this question with a list of common goals, how you achieve them in Ruby, and ask what the equivalent is in Python.


Answer 0

Here are some key differences to me:

  1. Ruby has blocks; Python does not.

  2. Python has functions; Ruby does not. In Python, you can take any function or method and pass it to another function. In Ruby, everything is a method, and methods can’t be directly passed. Instead, you have to wrap them in Proc’s to pass them.

  3. Ruby and Python both support closures, but in different ways. In Python, you can define a function inside another function. The inner function has read access to variables from the outer function, but not write access. In Ruby, you define closures using blocks. The closures have full read and write access to variables from the outer scope.

  4. Python has list comprehensions, which are pretty expressive. For example, if you have a list of numbers, you can write

    [x*x for x in values if x > 15]
    

    to get a new list of the squares of all values greater than 15. In Ruby, you’d have to write the following:

    values.select {|v| v > 15}.map {|v| v * v}
    

    The Ruby code doesn’t feel as compact. It’s also not as efficient since it first converts the values array into a shorter intermediate array containing the values greater than 15. Then, it takes the intermediate array and generates a final array containing the squares of the intermediates. The intermediate array is then thrown out. So, Ruby ends up with 3 arrays in memory during the computation; Python only needs the input list and the resulting list.

    Python also supplies similar generator expressions as well as dict and set comprehensions.

  5. Python supports tuples; Ruby doesn’t. In Ruby, you have to use arrays to simulate tuples.

  6. Ruby supports switch/case statements; Python does not.

  7. Ruby supports the standard expr ? val1 : val2 ternary operator; Python does not.

  8. Ruby supports only single inheritance. If you need to mimic multiple inheritance, you can define modules and use mix-ins to pull the module methods into classes. Python supports multiple inheritance rather than module mix-ins.

  9. Python supports only single-line lambda functions. Ruby blocks, which are kind of/sort of lambda functions, can be arbitrarily big. Because of this, Ruby code is typically written in a more functional style than Python code. For example, to loop over a list in Ruby, you typically do

    collection.each do |value|
      ...
    end
    

    The block works very much like a function being passed to collection.each. If you were to do the same thing in Python, you’d have to define a named inner function and then pass that to the collection each method (if list supported this method):

    def some_operation(value):
      ...
    
    collection.each(some_operation)
    

    That doesn’t flow very nicely. So, typically the following non-functional approach would be used in Python:

    for value in collection:
      ...
    
  10. Using resources in a safe way is quite different between the two languages. Here, the problem is that you want to allocate some resource (open a file, obtain a database cursor, etc), perform some arbitrary operation on it, and then close it in a safe manner even if an exception occurs.

    In Ruby, because blocks are so easy to use (see #9), you would typically code this pattern as a method that takes a block for the arbitrary operation to perform on the resource.

    In Python, passing in a function for the arbitrary action is a little clunkier since you have to write a named, inner function (see #9). Instead, Python uses a with statement for safe resource handling; a short sketch follows this list. See How do I correctly clean up a Python object? for more details.
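
For reference, a minimal sketch of the with pattern mentioned in point 10; the file name and the resource name are hypothetical stand-ins:

from contextlib import contextmanager

# Built-in context manager: the file is closed automatically,
# even if the body raises an exception.
with open("data.txt", "w") as f:
    f.write("hello\n")

# Custom resources can provide the same guarantee via contextlib.
@contextmanager
def acquired_resource():
    resource = "connection"   # stand-in for a real acquisition
    try:
        yield resource
    finally:
        pass                  # release/cleanup would go here

with acquired_resource() as r:
    print(r)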


Answer 1

I’ve just spent a couple of months learning Python after 6 years of Ruby. There really was no great comparison out there for the two languages, so I decided to man up and write one myself. Now, it is mainly concerned with functional programming, but since you mention Ruby’s inject method, I’m guessing we’re on the same wavelength.

I hope this helps: The ‘ugliness’ of Python

A couple of points that will get you moving in the right direction:

  • All the functional programming goodness you use in Ruby is in Python, and it’s even easier. For example, you can map over functions exactly as you’d expect:

    def f(x):
        return x + 1
    
    map(f, [1, 2, 3]) # => [2, 3, 4]
    
  • Python doesn’t have a method that acts like each. Since you only use each for side effects, the equivalent in Python is the for loop:

    for n in [1, 2, 3]:
        print n
    
  • List comprehensions are great when a) you have to deal with functions and object collections together and b) when you need to iterate using multiple indexes. For example, to find all the palindromes in a string (assuming you have a function p() that returns true for palindromes), all you need is a single list comprehension:

    s = 'string-with-palindromes-like-abbalabba'
    l = len(s)
    [s[x:y] for x in range(l) for y in range(x,l+1) if p(s[x:y])]
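
Since the answer above mentions Ruby’s inject: the direct Python counterpart is functools.reduce, but a builtin such as sum, or a plain loop, is usually considered more idiomatic. A small sketch with arbitrary sample data:

from functools import reduce

values = [1, 2, 3, 4]

# Closest equivalent of values.inject(0) { |acc, v| acc + v }
total = reduce(lambda acc, v: acc + v, values, 0)

# More idiomatic for the common cases
total = sum(values)

# General accumulation is usually just a plain loop
acc = 0
for v in values:
    acc += v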
    

Answer 2

My suggestion: Don’t try to learn the differences. Learn how to approach the problem in Python. Just like there’s a Ruby approach to each problem (one that works very well given the limitations and strengths of the language), there’s a Python approach to the problem. They are both different. To get the best out of each language, you really should learn the language itself, and not just the “translation” from one to the other.

Now, with that said, knowing the differences will help you adapt faster and make one-off modifications to a Python program. And that’s fine for a start to get writing. But try to learn from other projects the why behind the architecture and design decisions rather than the how behind the semantics of the language…


Answer 3

I know little Ruby, but here are a few bullet points about the things you mentioned:

  • nil, the value indicating lack of a value, would be None (note that you check for it like x is None or x is not None, not with == – or by coercion to boolean, see next point).
  • None, zero-esque numbers (0, 0.0, 0j (complex number)) and empty collections ([], {}, set(), the empty string "", etc.) are considered falsy, everything else is considered truthy.
  • For side effects, (for-)loop explicitly. For generating a new bunch of stuff without side-effects, use list comprehensions (or their relatives – generator expressions for lazy one-time iterators, dict/set comprehensions for the said collections).

Concerning looping: you have for, which operates on an iterable (no counting!), and while, which does what you would expect. The former is far more powerful, thanks to the extensive support for iterators. Nearly everything that can be an iterator instead of a list is an iterator (at least in Python 3 – in Python 2 you have both, and the default is a list, sadly). There are numerous tools for working with iterators – zip iterates any number of iterables in parallel, enumerate gives you (index, item) (on any iterable, not just on lists), and you can even slice arbitrary (possibly large or infinite) iterables! I found that these make many looping tasks much simpler. Needless to say, they integrate just fine with list comprehensions, generator expressions, etc.
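
A short, self-contained illustration of the iterator tools mentioned above (the sample data is arbitrary):

from itertools import islice, count

names = ["a", "b", "c"]
prices = [1.0, 2.0, 3.0]

# zip iterates several iterables in parallel
for name, price in zip(names, prices):
    print(name, price)

# enumerate yields (index, item) for any iterable
for i, name in enumerate(names):
    print(i, name)

# islice "slices" an arbitrary (even infinite) iterable
first_five = list(islice(count(1), 5))   # [1, 2, 3, 4, 5]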


Answer 4

In Ruby, instance variables and methods are completely unrelated, except when you explicitly relate them with attr_accessor or something like that.

In Python, methods are just a special class of attribute: one that is executable.

So for example:

>>> class foo:
...     x = 5
...     def y(): pass
... 
>>> f = foo()
>>> type(f.x)
<type 'int'>
>>> type(f.y)
<type 'instancemethod'>

That difference has a lot of implications, for example that referring to f.y gives you the method object rather than calling it. Also, as you can see, f.x is public by default, whereas in Ruby, instance variables are private by default.
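
To make the “referring vs. calling” point concrete, a tiny self-contained sketch (using its own class, since the one above deliberately omits self):

class Greeter:
    def hello(self):
        return "hi"

g = Greeter()
method = g.hello   # just a reference to the bound-method object; nothing runs
print(method())    # the parentheses are what actually perform the call
print(g.hello())   # same thing, inline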


Why do you have to call .items() when iterating over a dictionary in Python?

Question: Why do you have to call .items() when iterating over a dictionary in Python?

Why do you have to call items() to iterate over key, value pairs in a dictionary? ie.

dic = {'one': '1', 'two': '2'}
for k, v in dic.items():
    print(k, v)

Why isn’t that the default behavior of iterating over a dictionary

for k, v in dic:
    print(k, v)

Answer 0

For every python container C, the expectation is that

for item in C:
    assert item in C

will pass just fine — wouldn’t you find it astonishing if one sense of in (the loop clause) had a completely different meaning from the other (the presence check)? I sure would! It naturally works that way for lists, sets, tuples, …

So, when C is a dictionary, if in were to yield key/value tuples in a for loop, then, by the principle of least astonishment, in would also have to take such a tuple as its left-hand operand in the containment check.

How useful would that be? Pretty useless indeed, basically making if (key, value) in C a synonym for if C.get(key) == value — which is a check I believe I may have performed, or wanted to perform, 100 times more rarely than what if k in C actually means, checking the presence of the key only and completely ignoring the value.

On the other hand, wanting to loop just on keys is quite common, e.g.:

for k in thedict:
    thedict[k] += 1

having the value as well would not help particularly:

for k, v in thedict.items():
    thedict[k] = v + 1

actually somewhat less clear and less concise. (Note that items was the original spelling of the “proper” methods to use to get key/value pairs: unfortunately that was back in the days when such accessors returned whole lists, so to support “just iterating” an alternative spelling had to be introduced, and iteritems it was — in Python 3, where backwards compatibility constraints with previous Python versions were much weakened, it became items again).
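
A small sketch of the behaviour described above (the dictionary contents are arbitrary):

d = {"one": 1, "two": 2}

# Iterating a dict yields its keys, and `in` checks key membership,
# so the `for item in C: assert item in C` invariant holds.
for k in d:
    assert k in d

# Key/value pairs are an explicit request via .items()
for k, v in d.items():
    print(k, v)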


Answer 1

My guess: Using the full tuple would be more intuitive for looping, but perhaps less so for testing for membership using in.

if key in counts:
    counts[key] += 1
else:
    counts[key] = 1

That code wouldn’t really work if you had to specify both key and value for in. I am having a hard time imagining a use case where you’d check whether both the key AND the value are in the dictionary. It is far more natural to only test the keys.

# When would you ever write a condition like this?
if (key, value) in dict:

Now it’s not necessary that the in operator and for ... in operate over the same items. Implementation-wise they are different operations (__contains__ vs. __iter__). But that little inconsistency would be somewhat confusing and, well, inconsistent.


What is the current choice for doing RPC in Python? [closed]

Question: What is the current choice for doing RPC in Python? [closed]

Actually, I’ve done some work with Pyro and RPyC, but there are more RPC implementations than these two. Can we make a list of them?

Native Python-based protocols:

RPC frameworks with a lot of underlying protocols:

JSON-RPC based frameworks:

SOAP:

XML-RPC based frameworks:

Others:


Answer 0

XML-RPC is part of the Python standard library:
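
A minimal sketch of the standard-library modules involved (Python 3 module names; host, port, and the add function are arbitrary choices for the example). In Python 2 the modules were called SimpleXMLRPCServer and xmlrpclib.

# server.py
from xmlrpc.server import SimpleXMLRPCServer

def add(a, b):
    return a + b

server = SimpleXMLRPCServer(("localhost", 8000))
server.register_function(add, "add")
server.serve_forever()

# client.py
import xmlrpc.client

proxy = xmlrpc.client.ServerProxy("http://localhost:8000/")
print(proxy.add(2, 3))   # -> 5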


Answer 1

Apache Thrift is a cross-language RPC option developed at Facebook. It works over sockets, and function signatures are defined in text files in a language-independent way.


Answer 2

Since I asked this question, I’ve started using python-symmetric-jsonrpc. It is quite good, can be used between Python and non-Python software, and follows the JSON-RPC standard. But it lacks examples.


Answer 3

You could try Ladon. It serves up multiple web server protocols at once so you can offer more flexibility at the client side.

http://pypi.python.org/pypi/ladon


Answer 4

There are some attempts at making SOAP work with python, but I haven’t tested it much so I can’t say if it is good or not.

SOAPy is one example.


Answer 5

We are developing Versile Python (VPy), an implementation for python 2.6+ and 3.x of a new ORB/RPC framework. Functional AGPL dev releases for review and testing are available. VPy has native python capabilities similar to PyRo and RPyC via a general native objects layer (code example). The product is designed for platform-independent remote object interaction for implementations of Versile Platform.

Full disclosure: I work for the company developing VPy.


Answer 6

Maybe ZSI, which implements SOAP. I used the stub generator and it worked properly. The only problem I encountered was doing SOAP through HTTPS.


Answer 7

You missed out omniORB. This is a pretty full CORBA implementation, so you can also use it to talk to other languages that have CORBA support.


Python functionality similar to sprintf

Question: Python functionality similar to sprintf

I would like to create a string buffer to do lots of processing, format and finally write the buffer in a text file using a C-style sprintf functionality in Python. Because of conditional statements, I can’t write them directly to the file.

e.g pseudo code:

sprintf(buf,"A = %d\n , B= %s\n",A,B)
/* some processing */
sprint(buf,"C=%d\n",c)
....
...
fprintf(file,buf)

So in the output file we have this kind of o/p:

A= foo B= bar
C= ded
etc...

Edit, to clarify my question:
buf is a big buffer that contains all of these strings, which have been formatted using sprintf. Going by your examples, buf would only contain the current value, not the older ones. E.g., first I wrote A = something, B = something to buf, and later C = something was appended to the same buf; but in your Python answers buf contains only the last value, which is not what I want – I want buf to hold all the printfs I have done since the beginning, like in C.


Answer 0

Python has a % operator for this.

>>> a = 5
>>> b = "hello"
>>> buf = "A = %d\n , B = %s\n" % (a, b)
>>> print buf
A = 5
 , B = hello

>>> c = 10
>>> buf = "C = %d\n" % c
>>> print buf
C = 10

See this reference for all supported format specifiers.

You could as well use format:

>>> print "This is the {}th tome of {}".format(5, "knowledge")
This is the 5th tome of knowledge

Answer 1

If I understand your question correctly, format() is what you are looking for, along with its mini-language.

Silly example for python 2.7 and up:

>>> print "{} ...\r\n {}!".format("Hello", "world")
Hello ...
 world!

For earlier python versions: (tested with 2.6.2)

>>> print "{0} ...\r\n {1}!".format("Hello", "world")
Hello ...
 world!

Answer 2

I’m not completely certain that I understand your goal, but you can use a StringIO instance as a buffer:

>>> import StringIO 
>>> buf = StringIO.StringIO()
>>> buf.write("A = %d, B = %s\n" % (3, "bar"))
>>> buf.write("C=%d\n" % 5)
>>> print(buf.getvalue())
A = 3, B = bar
C=5

Unlike sprintf, you just pass a string to buf.write, formatting it with the % operator or the format method of strings.

You could of course define a function to get the sprintf interface you’re hoping for:

def sprintf(buf, fmt, *args):
    buf.write(fmt % args)

which would be used like this:

>>> buf = StringIO.StringIO()
>>> sprintf(buf, "A = %d, B = %s\n", 3, "foo")
>>> sprintf(buf, "C = %d\n", 5)
>>> print(buf.getvalue())
A = 3, B = foo
C = 5

Answer 3

Use the formatting operator %:

buf = "A = %d\n , B= %s\n" % (a, b)
print >>f, buf

Answer 4

You can use string formatting:

>>> a=42
>>> b="bar"
>>> "The number is %d and the word is %s" % (a,b)
'The number is 42 and the word is bar'

But this style is discouraged in Python 3; you should prefer “str.format()”:

>>> a=42
>>> b="bar"
>>> "The number is {0} and the word is {1}".format(a,b)
'The number is 42 and the word is bar'

Answer 5

To insert into a very long string it is nice to use names for the different arguments, instead of hoping they are in the right positions. This also makes it easier to replace multiple recurrences.

>>> 'Coordinates: {latitude}, {longitude}'.format(latitude='37.24N', longitude='-115.81W')
'Coordinates: 37.24N, -115.81W'

Taken from Format examples, where all the other Format-related answers are also shown.


Answer 6

This is probably the closest translation from your C code to Python code.

A = 1
B = "hello"
buf = "A = %d\n , B= %s\n" % (A, B)

c = 2
buf += "C=%d\n" % c

f = open('output.txt', 'w')
print >> f, buf
f.close()

The % operator in Python does almost exactly the same thing as C’s sprintf. You can also print the string to a file directly. If there are lots of these string formatted stringlets involved, it might be wise to use a StringIO object to speed up processing time.

So instead of doing +=, do this:

import cStringIO
buf = cStringIO.StringIO()

...

print >> buf, "A = %d\n , B= %s\n" % (A, B)

...

print >> buf, "C=%d\n" % c

...

print >> f, buf.getvalue()

Answer 7

If you want something like the python3 print function but to a string:

import io

def sprint(*args, **kwargs):
    sio = io.StringIO()
    print(*args, **kwargs, file=sio)
    return sio.getvalue()
>>> x = sprint('abc', 10, ['one', 'two'], {'a': 1, 'b': 2}, {1, 2, 3})
>>> x
"abc 10 ['one', 'two'] {'a': 1, 'b': 2} {1, 2, 3}\n"

or without the '\n' at the end:

def sprint(*args, end='', **kwargs):
    sio = io.StringIO()
    print(*args, **kwargs, end=end, file=sio)
    return sio.getvalue()
>>> x = sprint('abc', 10, ['one', 'two'], {'a': 1, 'b': 2}, {1, 2, 3})
>>> x
"abc 10 ['one', 'two'] {'a': 1, 'b': 2} {1, 2, 3}"

Answer 8

Something like…

greetings = 'Hello {name}'.format(name = 'John')

Hello John

Answer 9

Take a look at “Literal String Interpolation” https://www.python.org/dev/peps/pep-0498/

I found it through the http://www.malemburg.com/
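
A quick illustration of the literal string interpolation (f-string) syntax that PEP 498 added in Python 3.6:

name = "John"
value = 4.5
buf = f"A = {value}\n, B = {name}\n"   # expressions are evaluated in place
print(f"{name} scored {value:.2f}")    # format specs work as with str.format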


Answer 10

Two approaches are to write to a string buffer or to write lines to a list and join them later. I think the StringIO approach is more pythonic, but didn’t work before Python 2.6.

from io import StringIO

with StringIO() as s:
   print("Hello", file=s)
   print("Goodbye", file=s)
   # And later...
   with open('myfile', 'w') as f:
       f.write(s.getvalue())

You can also use these without a ContextManager (s = StringIO()). Currently, I’m using a context manager class with a print function. This fragment might be useful to be able to insert debugging or odd paging requirements:

class Report:
    ... usual init/enter/exit
    def print(self, *args, **kwargs):
        with StringIO() as s:
            print(*args, **kwargs, file=s)
            out = s.getvalue()
        ... stuff with out

with Report() as r:
   r.print(f"This is {datetime.date.today()}!", 'Yikes!', end=':')

Pandas: convert a dataframe to an array of tuples

Question: Pandas: convert a dataframe to an array of tuples

I have manipulated some data using pandas and now I want to carry out a batch save back to the database. This requires me to convert the dataframe into an array of tuples, with each tuple corresponding to a “row” of the dataframe.

My DataFrame looks something like:

In [182]: data_set
Out[182]: 
  index data_date   data_1  data_2
0  14303 2012-02-17  24.75   25.03 
1  12009 2012-02-16  25.00   25.07 
2  11830 2012-02-15  24.99   25.15 
3  6274  2012-02-14  24.68   25.05 
4  2302  2012-02-13  24.62   24.77 
5  14085 2012-02-10  24.38   24.61 

I want to convert it to an array of tuples like:

[(datetime.date(2012,2,17),24.75,25.03),
(datetime.date(2012,2,16),25.00,25.07),
...etc. ]

Any suggestion on how I can efficiently do this?


Answer 0

How about:

subset = data_set[['data_date', 'data_1', 'data_2']]
tuples = [tuple(x) for x in subset.to_numpy()]

for pandas < 0.24 use

tuples = [tuple(x) for x in subset.values]

Answer 1

list(data_set.itertuples(index=False))

As of pandas 0.17.1, the above will return a list of namedtuples.

If you want a list of ordinary tuples, pass name=None as an argument:

list(data_set.itertuples(index=False, name=None))

Answer 2

A generic way:

[tuple(x) for x in data_set.to_records(index=False)]

Answer 3

Motivation
Many data sets are large enough that we need to concern ourselves with speed/efficiency. So I offer this solution in that spirit. It happens to also be succinct.

For the sake of comparison, let’s drop the index column

df = data_set.drop('index', 1)

Solution
I’ll propose the use of zip and map

list(zip(*map(df.get, df)))

[('2012-02-17', 24.75, 25.03),
 ('2012-02-16', 25.0, 25.07),
 ('2012-02-15', 24.99, 25.15),
 ('2012-02-14', 24.68, 25.05),
 ('2012-02-13', 24.62, 24.77),
 ('2012-02-10', 24.38, 24.61)]

It happens to also be flexible if we wanted to deal with a specific subset of columns. We’ll assume the columns we’ve already displayed are the subset we want.

list(zip(*map(df.get, ['data_date', 'data_1', 'data_2'])))

[('2012-02-17', 24.75, 25.03),
 ('2012-02-16', 25.0, 25.07),
 ('2012-02-15', 24.99, 25.15),
 ('2012-02-14', 24.68, 25.05),
 ('2012-02-13', 24.62, 24.77),
 ('2012-02-10', 24.38, 24.61)]

What is Quicker?

Turns out records is quickest, followed by the asymptotically converging zipmap and iter_tuples.

I’ll use the simple_benchmark library that I got from this post

from simple_benchmark import BenchmarkBuilder
b = BenchmarkBuilder()

import pandas as pd
import numpy as np

def tuple_comp(df): return [tuple(x) for x in df.to_numpy()]
def iter_namedtuples(df): return list(df.itertuples(index=False))
def iter_tuples(df): return list(df.itertuples(index=False, name=None))
def records(df): return df.to_records(index=False).tolist()
def zipmap(df): return list(zip(*map(df.get, df)))

funcs = [tuple_comp, iter_namedtuples, iter_tuples, records, zipmap]
for func in funcs:
    b.add_function()(func)

def creator(n):
    return pd.DataFrame({"A": np.random.randint(n, size=n), "B": np.random.randint(n, size=n)})

@b.add_arguments('Rows in DataFrame')
def argument_provider():
    for n in (10 ** (np.arange(4, 11) / 2)).astype(int):
        yield n, creator(n)

r = b.run()

Check the results

r.to_pandas_dataframe().pipe(lambda d: d.div(d.min(1), 0))

        tuple_comp  iter_namedtuples  iter_tuples   records    zipmap
100       2.905662          6.626308     3.450741  1.469471  1.000000
316       4.612692          4.814433     2.375874  1.096352  1.000000
1000      6.513121          4.106426     1.958293  1.000000  1.316303
3162      8.446138          4.082161     1.808339  1.000000  1.533605
10000     8.424483          3.621461     1.651831  1.000000  1.558592
31622     7.813803          3.386592     1.586483  1.000000  1.515478
100000    7.050572          3.162426     1.499977  1.000000  1.480131

r.plot()


Answer 4

Here’s a vectorized approach (assuming the dataframe data_set is referred to as df) that returns a list of tuples as shown:

>>> df.set_index(['data_date'])[['data_1', 'data_2']].to_records().tolist()

produces:

[(datetime.datetime(2012, 2, 17, 0, 0), 24.75, 25.03),
 (datetime.datetime(2012, 2, 16, 0, 0), 25.0, 25.07),
 (datetime.datetime(2012, 2, 15, 0, 0), 24.99, 25.15),
 (datetime.datetime(2012, 2, 14, 0, 0), 24.68, 25.05),
 (datetime.datetime(2012, 2, 13, 0, 0), 24.62, 24.77),
 (datetime.datetime(2012, 2, 10, 0, 0), 24.38, 24.61)]

The idea of setting the datetime column as the index axis is to aid in converting the Timestamp values to their corresponding datetime.datetime equivalents by making use of the convert_datetime64 argument in DF.to_records, which does so for a DateTimeIndex dataframe.

This returns a recarray, which can then be turned into a list using .tolist.


More generalized solution depending on the use case would be:

df.to_records().tolist()                              # Supply index=False to exclude index

Answer 5

The most efficient and easy way:

list(data_set.to_records())

You can filter the columns you need before this call.


Answer 6

This answer doesn’t add anything that isn’t already discussed, but here are some speed results. I think this should resolve questions that came up in the comments. All of these look like they are O(n), based on these three values.

TL;DR: tuples = list(df.itertuples(index=False, name=None)) and tuples = list(zip(*[df[c].values.tolist() for c in df])) are tied for the fastest.

I did a quick speed test on results for three suggestions here:

  1. The zip answer from @pirsquared: tuples = list(zip(*[df[c].values.tolist() for c in df]))
  2. The accepted answer from @wes-mckinney: tuples = [tuple(x) for x in df.values]
  3. The itertuples answer from @ksindi with the name=None suggestion from @Axel: tuples = list(df.itertuples(index=False, name=None))
from numpy import random
import pandas as pd


def create_random_df(n):
    return pd.DataFrame({"A": random.randint(n, size=n), "B": random.randint(n, size=n)})

Small size:

df = create_random_df(10000)
%timeit tuples = list(zip(*[df[c].values.tolist() for c in df]))
%timeit tuples = [tuple(x) for x in df.values]
%timeit tuples = list(df.itertuples(index=False, name=None))

Gives:

1.66 ms ± 200 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
15.5 ms ± 1.52 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
1.74 ms ± 75.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Larger:

df = create_random_df(1000000)
%timeit tuples = list(zip(*[df[c].values.tolist() for c in df]))
%timeit tuples = [tuple(x) for x in df.values]
%timeit tuples = list(df.itertuples(index=False, name=None))

Gives:

202 ms ± 5.91 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
1.52 s ± 98.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
209 ms ± 11.8 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

As much patience as I have:

df = create_random_df(10000000)
%timeit tuples = list(zip(*[df[c].values.tolist() for c in df]))
%timeit tuples = [tuple(x) for x in df.values]
%timeit tuples = list(df.itertuples(index=False, name=None))

Gives:

1.78 s ± 118 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
15.4 s ± 222 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.68 s ± 96.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

The zip version and the itertuples version are within each other’s confidence intervals. I suspect that they are doing the same thing under the hood.

These speed tests are probably irrelevant though. Pushing the limits of my computer’s memory doesn’t take a huge amount of time, and you really shouldn’t be doing this on a large data set. Working with those tuples after doing this will end up being really inefficient. It’s unlikely to be a major bottleneck in your code, so just stick with the version you think is most readable.


Answer 7

#try this one:

tuples = list(zip(data_set["data_date"], data_set["data_1"],data_set["data_2"]))
print (tuples)

Answer 8

Changing the data frames list into a list of tuples.

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})
print(df)
OUTPUT
   col1  col2
0     1     4
1     2     5
2     3     6

records = df.to_records(index=False)
result = list(records)
print(result)
OUTPUT
[(1, 4), (2, 5), (3, 6)]

Answer 9

More pythonic way:

df = data_set[['data_date', 'data_1', 'data_2']]
map(tuple,df.values)

How to save an S3 object to a file using boto3

Question: How to save an S3 object to a file using boto3

I’m trying to do a “hello world” with the new boto3 client for AWS.

The use-case I have is fairly simple: get object from S3 and save it to the file.

In boto 2.X I would do it like this:

import boto
key = boto.connect_s3().get_bucket('foo').get_key('foo')
key.get_contents_to_filename('/tmp/foo')

In boto 3, I can’t find a clean way to do the same thing, so I’m manually iterating over the “Streaming” object:

import boto3
key = boto3.resource('s3').Object('fooo', 'docker/my-image.tar.gz').get()
with open('/tmp/my-image.tar.gz', 'w') as f:
    chunk = key['Body'].read(1024*8)
    while chunk:
        f.write(chunk)
        chunk = key['Body'].read(1024*8)

or

import boto3
key = boto3.resource('s3').Object('fooo', 'docker/my-image.tar.gz').get()
with open('/tmp/my-image.tar.gz', 'w') as f:
    for chunk in iter(lambda: key['Body'].read(4096), b''):
        f.write(chunk)

And it works fine. I was wondering is there any “native” boto3 function that will do the same task?


Answer 0

There is a customization that went into Boto3 recently which helps with this (among other things). It is currently exposed on the low-level S3 client, and can be used like this:

s3_client = boto3.client('s3')
open('hello.txt', 'w').write('Hello, world!')

# Upload the file to S3
s3_client.upload_file('hello.txt', 'MyBucket', 'hello-remote.txt')

# Download the file from S3
s3_client.download_file('MyBucket', 'hello-remote.txt', 'hello2.txt')
print(open('hello2.txt').read())

These functions will automatically handle reading/writing files as well as doing multipart uploads in parallel for large files.

Note that s3_client.download_file won’t create a directory. It can be created as pathlib.Path('/path/to/file.txt').parent.mkdir(parents=True, exist_ok=True).
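
For example, combining the two (bucket name, key, and local path are hypothetical):

import pathlib
import boto3

s3_client = boto3.client('s3')
target = pathlib.Path('/tmp/downloads/hello-remote.txt')
target.parent.mkdir(parents=True, exist_ok=True)   # make sure the directory exists
s3_client.download_file('MyBucket', 'hello-remote.txt', str(target))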


Answer 1

boto3 now has a nicer interface than the client:

resource = boto3.resource('s3')
my_bucket = resource.Bucket('MyBucket')
my_bucket.download_file(key, local_filename)

This by itself isn’t tremendously better than the client in the accepted answer (although the docs say that it does a better job retrying uploads and downloads on failure) but considering that resources are generally more ergonomic (for example, the s3 bucket and object resources are nicer than the client methods) this does allow you to stay at the resource layer without having to drop down.

Resources generally can be created in the same way as clients, and they take all or most of the same arguments and just forward them to their internal clients.


Answer 2

For those of you who would like to simulate the set_contents_from_string like boto2 methods, you can try

import boto3
from cStringIO import StringIO

s3c = boto3.client('s3')
contents = 'My string to save to S3 object'
target_bucket = 'hello-world.by.vor'
target_file = 'data/hello.txt'
fake_handle = StringIO(contents)

# notice if you do fake_handle.read() it reads like a file handle
s3c.put_object(Bucket=target_bucket, Key=target_file, Body=fake_handle.read())

For Python3:

In python3 both StringIO and cStringIO are gone. Use the StringIO import like:

from io import StringIO

To support both versions:

try:
   from StringIO import StringIO
except ImportError:
   from io import StringIO

Answer 3

# Preface: File is json with contents: {'name': 'Android', 'status': 'ERROR'}

import json
import boto3
import io

s3 = boto3.resource('s3')

obj = s3.Object('my-bucket', 'key-to-file.json')
data = io.BytesIO()
obj.download_fileobj(data)

# data now holds the object's bytes; convert them to a dict:
new_dict = json.loads(data.getvalue().decode("utf-8"))

print(new_dict['status'])
# Should print "ERROR"

Answer 4

When you want to read a file with a different configuration than the default one, feel free to use either mpu.aws.s3_download(s3path, destination) directly or the copy-pasted code:

def s3_download(source, destination,
                exists_strategy='raise',
                profile_name=None):
    """
    Copy a file from an S3 source to a local destination.

    Parameters
    ----------
    source : str
        Path starting with s3://, e.g. 's3://bucket-name/key/foo.bar'
    destination : str
    exists_strategy : {'raise', 'replace', 'abort'}
        What is done when the destination already exists?
    profile_name : str, optional
        AWS profile

    Raises
    ------
    botocore.exceptions.NoCredentialsError
        Botocore is not able to find your credentials. Either specify
        profile_name or add the environment variables AWS_ACCESS_KEY_ID,
        AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN.
        See https://boto3.readthedocs.io/en/latest/guide/configuration.html
    """
    exists_strategies = ['raise', 'replace', 'abort']
    if exists_strategy not in exists_strategies:
        raise ValueError('exists_strategy \'{}\' is not in {}'
                         .format(exists_strategy, exists_strategies))
    session = boto3.Session(profile_name=profile_name)
    s3 = session.resource('s3')
    bucket_name, key = _s3_path_split(source)
    if os.path.isfile(destination):
        if exists_strategy == 'raise':
            raise RuntimeError('File \'{}\' already exists.'
                               .format(destination))
        elif exists_strategy == 'abort':
            return
    s3.Bucket(bucket_name).download_file(key, destination)

from collections import namedtuple

S3Path = namedtuple("S3Path", ["bucket_name", "key"])


def _s3_path_split(s3_path):
    """
    Split an S3 path into bucket and key.

    Parameters
    ----------
    s3_path : str

    Returns
    -------
    splitted : (str, str)
        (bucket, key)

    Examples
    --------
    >>> _s3_path_split('s3://my-bucket/foo/bar.jpg')
    S3Path(bucket_name='my-bucket', key='foo/bar.jpg')
    """
    if not s3_path.startswith("s3://"):
        raise ValueError(
            "s3_path is expected to start with 's3://', " "but was {}"
            .format(s3_path)
        )
    bucket_key = s3_path[len("s3://"):]
    bucket_name, key = bucket_key.split("/", 1)
    return S3Path(bucket_name, key)

Answer 5

Note: I’m assuming you have configured authentication separately. The code below downloads a single object from an S3 bucket.

import boto3

#initiate s3 client 
s3 = boto3.resource('s3')

#Download object to the file    
s3.Bucket('mybucket').download_file('hello.txt', '/tmp/hello.txt')

Assert that a function/method was not called using Mock

Question: Assert that a function/method was not called using Mock

I’m using the Mock library to test my application, but I want to assert that some function was not called. Mock docs talk about methods like mock.assert_called_with and mock.assert_called_once_with, but I didn’t find anything like mock.assert_not_called or anything else related to verifying that a mock was NOT called.

I could go with something like the following, though it seems neither cool nor pythonic:

def test_something:
    # some actions
    with patch('something') as my_var:
        try:
            # args are not important. func should never be called in this test
            my_var.assert_called_with(some, args)
        except AssertionError:
            pass  # this error being raised means it's ok
    # other stuff

Any ideas how to accomplish this?


Answer 0

This should work for your case;

assert not my_var.called, 'method should not have been called'

Sample;

>>> mock=Mock()
>>> mock.a()
<Mock name='mock.a()' id='4349129872'>
>>> assert not mock.b.called, 'b was called and should not have been'
>>> assert not mock.a.called, 'a was called and should not have been'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError: a was called and should not have been

Answer 1

尽管是一个老问题,我想补充一下当前的mock库(unittest.mock的反向端口)支持assert_not_called方法。

只需升级您的 mock 即可:

pip install mock --upgrade

Though an old question, I would like to add that currently mock library (backport of unittest.mock) supports assert_not_called method.

Just upgrade yours;

pip install mock --upgrade
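
A quick sketch of how the backported method reads once the upgrade is in place (the exact failure message may differ between versions):

from mock import Mock  # the backport; on Python 3 use unittest.mock instead

m = Mock()
m.assert_not_called()   # passes: the mock has not been called yet

m()
m.assert_not_called()   # now raises AssertionError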


回答 2

您可以检查 called 属性,但如果断言失败,您接下来想知道的就是那次意外调用的相关信息,所以不妨一开始就安排把这些信息显示出来。使用 unittest 时,您可以改为检查 call_args_list 的内容:

self.assertItemsEqual(my_var.call_args_list, [])

当失败时,它会给出如下消息:

AssertionError: Element counts were not equal:
First has 0, Second has 1:  call('first argument', 4)

You can check the called attribute, but if your assertion fails, the next thing you’ll want to know is something about the unexpected call, so you may as well arrange for that information to be displayed from the start. Using unittest, you can check the contents of call_args_list instead:

self.assertItemsEqual(my_var.call_args_list, [])

When it fails, it gives a message like this:

AssertionError: Element counts were not equal:
First has 0, Second has 1:  call('first argument', 4)

回答 3

当您在继承自 unittest.TestCase 的类中编写测试时,可以直接使用以下方法:

  • assertTrue
  • assertFalse
  • assertEqual

以及类似的方法(其余的可以在 Python 文档中找到)。

在您的示例中,我们可以简单地断言 mock_method.called 属性为 False,这意味着该方法没有被调用。

import unittest
from unittest import mock

import my_module

class A(unittest.TestCase):
    def setUp(self):
        self.message = "Method should not be called. Called {times} times!"

    @mock.patch("my_module.method_to_mock")
    def test(self, mock_method):
        my_module.method_to_mock()  # 注意:这里调用了被 mock 的方法,因此下面的断言会失败并打印自定义消息

        self.assertFalse(mock_method.called,
                         self.message.format(times=mock_method.call_count))

When you test using class inherits unittest.TestCase you can simply use methods like:

  • assertTrue
  • assertFalse
  • assertEqual

and similar (in python documentation you find the rest).

In your example we can simply assert if mock_method.called property is False, which means that method was not called.

import unittest
from unittest import mock

import my_module

class A(unittest.TestCase):
    def setUp(self):
        self.message = "Method should not be called. Called {times} times!"

    @mock.patch("my_module.method_to_mock")
    def test(self, mock_method):
        my_module.method_to_mock()  # note: the method is called here, so the assertion below fails and prints the message

        self.assertFalse(mock_method.called,
                         self.message.format(times=mock_method.call_count))

回答 4

在 python >= 3.5 中,您可以使用 mock_object.assert_not_called()。

With python >= 3.5 you can use mock_object.assert_not_called().
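
For example, a minimal sketch (the patch target 'mymodule.something' is hypothetical):

from unittest.mock import patch

def test_something():
    with patch('mymodule.something') as my_var:
        # ... exercise the code path that must NOT trigger the call ...
        my_var.assert_not_called()  # raises AssertionError if it was called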


回答 5

从其他答案来看,除了 @rob-kennedy 之外,没有人谈论过 call_args_list。

它是一个强大的工具,利用它您可以实现与 MagicMock.assert_called_with() 完全相反的检查。

call_args_listcall对象列表。每个call对象代表对模拟可调用对象的调用。

>>> from unittest.mock import MagicMock
>>> m = MagicMock()
>>> m.call_args_list
[]
>>> m(42)
<MagicMock name='mock()' id='139675158423872'>
>>> m.call_args_list
[call(42)]
>>> m(42, 30)
<MagicMock name='mock()' id='139675158423872'>
>>> m.call_args_list
[call(42), call(42, 30)]

使用 call 对象很容易,因为您可以把它与长度为 2 的元组进行比较:第一个分量是包含该次调用所有位置参数的元组,第二个分量是关键字参数的字典。

>>> ((42,),) in m.call_args_list
True
>>> m(42, foo='bar')
<MagicMock name='mock()' id='139675158423872'>
>>> ((42,), {'foo': 'bar'}) in m.call_args_list
True
>>> m(foo='bar')
<MagicMock name='mock()' id='139675158423872'>
>>> ((), {'foo': 'bar'}) in m.call_args_list
True

因此,解决OP特定问题的一种方法是

def test_something():
    with patch('something') as my_var:
        assert ((some, args),) not in my_var.call_args_list

请注意,通过这种方式,您就不只是像用 MagicMock.called 那样检查被 mock 的可调用对象是否被调用过,而是可以检查它是否曾以某组特定的参数被调用。

这很有用。假设您要测试一个函数,它接受一个列表,并且仅当列表中的值满足特定条件时,才对这些值调用另一个函数 compute()。

现在您可以 mock 掉 compute,并测试它是否在某些值上被调用了,而在其他值上没有被调用。

Judging from other answers, no one except @rob-kennedy talked about the call_args_list.

It’s a powerful tool with which you can implement the exact opposite of MagicMock.assert_called_with().

call_args_list is a list of call objects. Each call object represents a call made on a mocked callable.

>>> from unittest.mock import MagicMock
>>> m = MagicMock()
>>> m.call_args_list
[]
>>> m(42)
<MagicMock name='mock()' id='139675158423872'>
>>> m.call_args_list
[call(42)]
>>> m(42, 30)
<MagicMock name='mock()' id='139675158423872'>
>>> m.call_args_list
[call(42), call(42, 30)]

Consuming a call object is easy, since you can compare it with a tuple of length 2 where the first component is a tuple containing all the positional arguments of the related call, while the second component is a dictionary of the keyword arguments.

>>> ((42,),) in m.call_args_list
True
>>> m(42, foo='bar')
<MagicMock name='mock()' id='139675158423872'>
>>> ((42,), {'foo': 'bar'}) in m.call_args_list
True
>>> m(foo='bar')
<MagicMock name='mock()' id='139675158423872'>
>>> ((), {'foo': 'bar'}) in m.call_args_list
True

So, a way to address the specific problem of the OP is

def test_something():
    with patch('something') as my_var:
        assert ((some, args),) not in my_var.call_args_list

Note that this way, instead of just checking if a mocked callable has been called, via MagicMock.called, you can now check if it has been called with a specific set of arguments.

That’s useful. Say you want to test a function that takes a list and call another function, compute(), for each of the value of the list only if they satisfy a specific condition.

You can now mock compute, and test if it has been called on some value but not on others.
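
To make that last idea concrete, here is a self-contained sketch; compute, process and the test are hypothetical names, not taken from the question:

from unittest.mock import patch

def compute(value):
    # hypothetical helper that the function under test calls
    return value * 2

def process(values):
    # hypothetical function under test: calls compute() only for even values
    return [compute(v) for v in values if v % 2 == 0]

def test_compute_called_only_for_even_values():
    # patch compute in this module so process() picks up the mock
    with patch(__name__ + '.compute') as mock_compute:
        process([1, 2, 3, 4])
        assert ((2,),) in mock_compute.call_args_list      # called with 2
        assert ((1,),) not in mock_compute.call_args_list  # never called with 1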


在Python中将列表成对迭代(当前,下一个)

问题:在Python中将列表成对迭代(当前,下一个)

有时我需要在Python中迭代一个列表,以查看“当前”元素和“下一个”元素。到目前为止,我已经使用以下代码完成了此操作:

for current, next in zip(the_list, the_list[1:]):
    # Do something

这行得通,符合我的期望,但是有没有一种更惯用或有效的方式来执行相同的操作?

I sometimes need to iterate a list in Python looking at the “current” element and the “next” element. I have, till now, done so with code like:

for current, next in zip(the_list, the_list[1:]):
    # Do something

This works and does what I expect, but is there’s a more idiomatic or efficient way to do the same thing?


回答 0

这是itertools模块文档中的一个相关示例:

import itertools
def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = itertools.tee(iterable)
    next(b, None)
    return zip(a, b)   

对于Python 2,您需要itertools.izip代替zip

import itertools
def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = itertools.tee(iterable)
    next(b, None)
    return itertools.izip(a, b)

工作原理:

首先,通过 tee() 调用创建两个并行的迭代器 a 和 b,它们都指向原始可迭代对象的第一个元素。接着通过 next(b, None) 调用把第二个迭代器 b 向前移动一步。此时 a 指向 s0,b 指向 s1。a 和 b 都可以独立地遍历原始迭代器;izip 函数接收这两个迭代器,以相同的步调同时推进它们,并把返回的元素配成对。

一个警告:tee() 函数产生的两个迭代器可以彼此独立地前进,但这是有代价的。如果其中一个迭代器比另一个前进得更多,tee() 就需要把已消耗的元素保留在内存中,直到第二个迭代器也消耗掉它们为止(它无法“倒回”原始迭代器)。在这里这无关紧要,因为一个迭代器只比另一个领先一步,但一般来说,这种用法很容易占用大量内存。

而且由于tee()可以接受n参数,因此它也可以用于两个以上的并行迭代器:

def threes(iterator):
    "s -> (s0,s1,s2), (s1,s2,s3), (s2, s3,4), ..."
    a, b, c = itertools.tee(iterator, 3)
    next(b, None)
    next(c, None)
    next(c, None)
    return zip(a, b, c)

Here’s a relevant example from the itertools module docs:

import itertools
def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = itertools.tee(iterable)
    next(b, None)
    return zip(a, b)   

For Python 2, you need itertools.izip instead of zip:

import itertools
def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = itertools.tee(iterable)
    next(b, None)
    return itertools.izip(a, b)

How this works:

First, two parallel iterators, a and b are created (the tee() call), both pointing to the first element of the original iterable. The second iterator, b is moved 1 step forward (the next(b, None)) call). At this point a points to s0 and b points to s1. Both a and b can traverse the original iterator independently – the izip function takes the two iterators and makes pairs of the returned elements, advancing both iterators at the same pace.

One caveat: the tee() function produces two iterators that can advance independently of each other, but it comes at a cost. If one of the iterators advances further than the other, then tee() needs to keep the consumed elements in memory until the second iterator consumes them too (it cannot ‘rewind’ the original iterator). Here it doesn’t matter because one iterator is only 1 step ahead of the other, but in general it’s easy to use a lot of memory this way.

And since tee() can take an n parameter, this can also be used for more than two parallel iterators:

def threes(iterator):
    "s -> (s0,s1,s2), (s1,s2,s3), (s2, s3,4), ..."
    a, b, c = itertools.tee(iterator, 3)
    next(b, None)
    next(c, None)
    next(c, None)
    return zip(a, b, c)
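
For reference, a quick usage sketch of the pairwise() recipe above (the recipe is repeated here so the snippet runs on its own):

import itertools

def pairwise(iterable):
    a, b = itertools.tee(iterable)
    next(b, None)
    return zip(a, b)

for current, nxt in pairwise([1, 2, 3, 4]):
    print(current, nxt)   # prints 1 2, then 2 3, then 3 4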

回答 1

自己动手写一个!

def pairwise(iterable):
    it = iter(iterable)
    a = next(it, None)

    for b in it:
        yield (a, b)
        a = b

Roll your own!

def pairwise(iterable):
    it = iter(iterable)
    a = next(it, None)

    for b in it:
        yield (a, b)
        a = b

回答 2

由于 the_list[1:] 实际上会创建整个列表的一个副本(只是不含第一个元素),而 zip() 在被调用时会立即创建一个元组列表,因此总共会产生该列表的三份副本。如果您的列表非常大,您可能更愿意使用

from itertools import izip, islice
for current_item, next_item in izip(the_list, islice(the_list, 1, None)):
    print(current_item, next_item)

根本不会复制列表。

Since the_list[1:] actually creates a copy of the whole list (excluding its first element), and zip() creates a list of tuples immediately when called, in total three copies of your list are created. If your list is very large, you might prefer

from itertools import izip, islice
for current_item, next_item in izip(the_list, islice(the_list, 1, None)):
    print(current_item, next_item)

which does not copy the list at all.
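
On Python 3, zip() is already lazy, so the same trick only needs islice; a minimal sketch:

from itertools import islice

the_list = [1, 2, 3, 4]
for current_item, next_item in zip(the_list, islice(the_list, 1, None)):
    print(current_item, next_item)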


回答 3

我只是将其列出来,我很惊讶没有人想到enumerate()。

for (index, thing) in enumerate(the_list):
    if index < len(the_list) - 1:  # 减 1,避免在最后一个元素处越界
        current, next_ = thing, the_list[index + 1]
        #do something

I’m just putting this out, I’m very surprised no one has thought of enumerate().

for (index, thing) in enumerate(the_list):
    if index < len(the_list) - 1:  # minus 1, otherwise the last element raises IndexError
        current, next_ = thing, the_list[index + 1]
        #do something

回答 4

通过索引进行迭代可以做同样的事情:

#!/usr/bin/python
the_list = [1, 2, 3, 4]
for i in xrange(len(the_list) - 1):
    current_item, next_item = the_list[i], the_list[i + 1]
    print(current_item, next_item)

输出:

(1, 2)
(2, 3)
(3, 4)

Iterating by index can do the same thing:

#!/usr/bin/python
the_list = [1, 2, 3, 4]
for i in xrange(len(the_list) - 1):
    current_item, next_item = the_list[i], the_list[i + 1]
    print(current_item, next_item)

Output:

(1, 2)
(2, 3)
(3, 4)

回答 5

截至 2020 年 5 月 16 日,这已经只需一个简单的导入:

from more_itertools import pairwise
for current, nxt in pairwise(your_iterable):
  print(f'Current = {current}, next = {nxt}')

more-itertools 的文档。其底层代码与其他答案中的相同,但我更喜欢在有现成库时直接导入。

如果尚未安装,则: pip install more-itertools

例如,如果您有一个斐波那契(Fibonacci)数列,则可以这样计算相邻两项的比率:

from more_itertools import pairwise
fib = [1, 1, 2, 3, 5, 8, 13]
for current, nxt in pairwise(fib):
    ratio = current / nxt
    print(f'Current = {current}, next = {nxt}, ratio = {ratio}')

As of 16th May 2020, this is now a simple import:

from more_itertools import pairwise
for current, nxt in pairwise(your_iterable):
  print(f'Current = {current}, next = {nxt}')

Docs for more-itertools. Under the hood this code is the same as that in the other answers, but I much prefer imports when available.

If you don’t already have it installed then: pip install more-itertools

Example

For instance, if you had the Fibonacci sequence, you could calculate the ratios of subsequent pairs as:

from more_itertools import pairwise
fib = [1, 1, 2, 3, 5, 8, 13]
for current, nxt in pairwise(fib):
    ratio = current / nxt
    print(f'Current = {current}, next = {nxt}, ratio = {ratio}')

回答 6

使用列表推导式从列表中生成相邻元素对

the_list = [1, 2, 3, 4]
pairs = [[the_list[i], the_list[i + 1]] for i in range(len(the_list) - 1)]
for [current_item, next_item] in pairs:
    print(current_item, next_item)

输出:

(1, 2)
(2, 3)
(3, 4)

Pairs from a list using a list comprehension

the_list = [1, 2, 3, 4]
pairs = [[the_list[i], the_list[i + 1]] for i in range(len(the_list) - 1)]
for [current_item, next_item] in pairs:
    print(current_item, next_item)

Output:

(1, 2)
(2, 3)
(3, 4)

回答 7

我真的很惊讶,没有人提到这个更短、更简单、最重要的是更通用的解决方案:

Python 3:

from itertools import islice

def n_wise(iterable, n):
    return zip(*(islice(iterable, i, None) for i in range(n)))

Python 2:

from itertools import izip, islice

def n_wise(iterable, n):
    return izip(*(islice(iterable, i, None) for i in xrange(n)))

传入 n=2 即可进行成对迭代,但它也可以处理更大的 n:

>>> for a, b in n_wise('Hello!', 2):
>>>     print(a, b)
H e
e l
l l
l o
o !

>>> for a, b, c, d in n_wise('Hello World!', 4):
>>>     print(a, b, c, d)
H e l l
e l l o
l l o
l o   W
o   W o
  W o r
W o r l
o r l d
r l d !

I am really surprised nobody has mentioned the shorter, simpler and most importantly general solution:

Python 3:

from itertools import islice

def n_wise(iterable, n):
    return zip(*(islice(iterable, i, None) for i in range(n)))

Python 2:

from itertools import izip, islice

def n_wise(iterable, n):
    return izip(*(islice(iterable, i, None) for i in xrange(n)))

It works for pairwise iteration by passing n=2, but can handle any higher number:

>>> for a, b in n_wise('Hello!', 2):
>>>     print(a, b)
H e
e l
l l
l o
o !

>>> for a, b, c, d in n_wise('Hello World!', 4):
>>>     print(a, b, c, d)
H e l l
e l l o
l l o
l o   W
o   W o
  W o r
W o r l
o r l d
r l d !

回答 8

基本解决方案:

def neighbors(seq):  # 参数改名为 seq,避免遮蔽内置的 list
  i = 0
  while i + 1 < len(seq):
    yield (seq[i], seq[i + 1])
    i += 1

for (x, y) in neighbors(the_list):
  print(x, y)

A basic solution:

def neighbors(seq):  # parameter renamed so it doesn't shadow the built-in list
  i = 0
  while i + 1 < len(seq):
    yield (seq[i], seq[i + 1])
    i += 1

for (x, y) in neighbors(the_list):
  print(x, y)

回答 9

code = '0016364ee0942aa7cc04a8189ef3'
# Getting the current and next item
print  [code[idx]+code[idx+1] for idx in range(len(code)-1)]
# Getting the pair
print  [code[idx*2]+code[idx*2+1] for idx in range(len(code)/2)]
code = '0016364ee0942aa7cc04a8189ef3'
# Getting the current and next item
print  [code[idx]+code[idx+1] for idx in range(len(code)-1)]
# Getting the pair
print  [code[idx*2]+code[idx*2+1] for idx in range(len(code)/2)]

Python将多个变量分配给相同的值?列出行为

问题:Python将多个变量分配给相同的值?列出行为

我试图像下面这样用多重赋值来初始化变量,但这个行为让我很困惑。我原本期望各个列表可以分别重新赋值,也就是说 b[0] 和 c[0] 像之前一样等于 0。

a=b=c=[0,3,5]
a[0]=1
print(a)
print(b)
print(c)

结果是:[1, 3, 5] [1, 3, 5] [1, 3, 5]

这是正确的吗?做多重赋值应该用什么?下面这个又有什么不同?

d=e=f=3
e=4
print('f:',f)
print('e:',e)

结果:('f:', 3) ('e:', 4)

I tried to use multiple assignment as show below to initialize variables, but I got confused by the behavior, I expect to reassign the values list separately, I mean b[0] and c[0] equal 0 as before.

a=b=c=[0,3,5]
a[0]=1
print(a)
print(b)
print(c)

Result is: [1, 3, 5] [1, 3, 5] [1, 3, 5]

Is that correct? what should I use for multiple assignment? what is different from this?

d=e=f=3
e=4
print('f:',f)
print('e:',e)

result: ('f:', 3) ('e:', 4)


回答 0

如果您是从 C/Java 等语言家族转到 Python 的,那么不要再把 a 看作一个“变量”,而把它看作一个“名称”,可能会对您有帮助。

a、b 和 c 并不是恰好取值相等的不同变量;它们是同一个值的不同名称。变量有类型、身份、地址之类的东西。

名称则没有这些。值当然有,而且同一个值可以有很多名称。

如果您给 Notorious B.I.G. 一个热狗*,那么 Biggie Smalls 和 Chris Wallace 也就有了热狗。如果您把 a 的第一个元素改成 1,那么 b 和 c 的第一个元素也就是 1。

如果您想知道两个名称是否在命名同一对象,请使用is运算符:

>>> a=b=c=[0,3,5]
>>> a is b
True

然后您问:

有什么不同呢?

d=e=f=3
e=4
print('f:',f)
print('e:',e)

在这里,您把名称 e 重新绑定到了值 4。这不会以任何方式影响名称 d 和 f。

在前一个版本中,您赋值的对象是 a[0],而不是 a。所以,从 a[0] 的角度看,您是在重新绑定 a[0];但从 a 的角度看,您是在就地修改它。

您可以使用 id 函数(它返回一个代表对象身份的唯一编号)来准确地查看哪个对象是哪个对象,即使在 is 帮不上忙的情况下也可以:

>>> a=b=c=[0,3,5]
>>> id(a)
4473392520
>>> id(b)
4473392520
>>> id(a[0])
4297261120
>>> id(b[0])
4297261120

>>> a[0] = 1
>>> id(a)
4473392520
>>> id(b)
4473392520
>>> id(a[0])
4297261216
>>> id(b[0])
4297261216

请注意,a[0] 的 id 已经从 4297261120 变成了 4297261216:它现在是另一个值的名称了。而且 b[0] 现在也是这个新值的名称。这是因为 a 和 b 仍然在命名同一个对象。


在幕后,a[0]=1 实际上是在列表对象上调用一个方法。(它等效于 a.__setitem__(0, 1)。)因此,它其实根本没有重新绑定任何东西,就像调用 my_object.set_something(1) 一样。当然,该对象很可能会为了实现这个方法而重新绑定某个实例属性,但那不是重点;重点是您没有在给任何东西赋值,您只是在改变这个对象。a[0]=1 也是一样。


user570826询问:

如果我们写 a = b = c = 10 会怎样?

这与 a = b = c = [1, 2, 3] 的情况完全相同:您有三个指向同一个值的名称。

但是在这种情况下,这个值是一个 int,而 int 是不可变的。无论哪种情况,您都可以把 a 重新绑定到一个不同的值(例如 a = "Now I'm a string!"),但这不会影响原来的那个值,b 和 c 仍然是它的名称。区别在于,对于列表,您可以通过例如 a.append(4) 把值 [1, 2, 3] 变成 [1, 2, 3, 4];由于这确实改变了 b 和 c 所命名的那个值,b 现在也会是 [1, 2, 3, 4]。而没有任何办法能把值 10 改成别的东西。10 永远是 10,就像吸血鬼克劳迪娅(Claudia)永远是 5 岁一样(至少直到她被 Kirsten Dunst 取代为止)。


*警告:不要给 Notorious B.I.G. 热狗。黑帮说唱僵尸绝不能在午夜之后喂食。

If you’re coming to Python from a language in the C/Java/etc. family, it may help you to stop thinking about a as a “variable”, and start thinking of it as a “name”.

a, b, and c aren’t different variables with equal values; they’re different names for the same identical value. Variables have types, identities, addresses, and all kinds of stuff like that.

Names don’t have any of that. Values do, of course, and you can have lots of names for the same value.

If you give Notorious B.I.G. a hot dog,* Biggie Smalls and Chris Wallace have a hot dog. If you change the first element of a to 1, the first elements of b and c are 1.

If you want to know if two names are naming the same object, use the is operator:

>>> a=b=c=[0,3,5]
>>> a is b
True

You then ask:

what is different from this?

d=e=f=3
e=4
print('f:',f)
print('e:',e)

Here, you’re rebinding the name e to the value 4. That doesn’t affect the names d and f in any way.

In your previous version, you were assigning to a[0], not to a. So, from the point of view of a[0], you’re rebinding a[0], but from the point of view of a, you’re changing it in-place.

You can use the id function, which gives you some unique number representing the identity of an object, to see exactly which object is which even when is can’t help:

>>> a=b=c=[0,3,5]
>>> id(a)
4473392520
>>> id(b)
4473392520
>>> id(a[0])
4297261120
>>> id(b[0])
4297261120

>>> a[0] = 1
>>> id(a)
4473392520
>>> id(b)
4473392520
>>> id(a[0])
4297261216
>>> id(b[0])
4297261216

Notice that a[0] has changed from 4297261120 to 4297261216—it’s now a name for a different value. And b[0] is also now a name for that same new value. That’s because a and b are still naming the same object.


Under the covers, a[0]=1 is actually calling a method on the list object. (It’s equivalent to a.__setitem__(0, 1).) So, it’s not really rebinding anything at all. It’s like calling my_object.set_something(1). Sure, likely the object is rebinding an instance attribute in order to implement this method, but that’s not what’s important; what’s important is that you’re not assigning anything, you’re just mutating the object. And it’s the same with a[0]=1.


user570826 asked:

What if we have, a = b = c = 10

That’s exactly the same situation as a = b = c = [1, 2, 3]: you have three names for the same value.

But in this case, the value is an int, and ints are immutable. In either case, you can rebind a to a different value (e.g., a = "Now I'm a string!"), but that won’t affect the original value, which b and c will still be names for. The difference is that with a list, you can change the value [1, 2, 3] into [1, 2, 3, 4] by doing, e.g., a.append(4); since that’s actually changing the value that b and c are names for, b will now be [1, 2, 3, 4]. There’s no way to change the value 10 into anything else. 10 is 10 forever, just like Claudia the vampire is 5 forever (at least until she’s replaced by Kirsten Dunst).


* Warning: Do not give Notorious B.I.G. a hot dog. Gangsta rap zombies should never be fed after midnight.
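
A short sketch of the append-versus-rebinding distinction described above:

a = b = c = [1, 2, 3]
a.append(4)                # mutates the one shared list
print(b)                   # [1, 2, 3, 4]

a = "Now I'm a string!"    # rebinds only the name a
print(b)                   # [1, 2, 3, 4] -- b and c still name the list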


回答 1

咳咳

>>> a,b,c = (1,2,3)
>>> a
1
>>> b
2
>>> c
3
>>> a,b,c = ({'test':'a'},{'test':'b'},{'test':'c'})
>>> a
{'test': 'a'}
>>> b
{'test': 'b'}
>>> c
{'test': 'c'}
>>> 

Cough cough

>>> a,b,c = (1,2,3)
>>> a
1
>>> b
2
>>> c
3
>>> a,b,c = ({'test':'a'},{'test':'b'},{'test':'c'})
>>> a
{'test': 'a'}
>>> b
{'test': 'b'}
>>> c
{'test': 'c'}
>>> 

回答 2

是的,这是预期的行为。a、b 和 c 都被设置成了同一个列表的标签。如果您需要三个不同的列表,就需要分别给它们赋值。您可以重复写出那个字面量列表,也可以用众多复制列表的方法之一:

b = a[:] # this does a shallow copy, which is good enough for this case
import copy
c = copy.deepcopy(a) # this does a deep copy, which matters if the list contains mutable objects

Python 中的赋值语句不会复制对象,它们只是把名称绑定到对象上,而一个对象可以拥有任意多个标签。在您的第一处修改中,更改 a[0] 时,您是在更新 a、b、c 共同引用的那一个列表中的一个元素。在第二处修改中,更改 e 时,您是把 e 切换成了另一个对象(4 而不是 3)的标签。

Yes, that’s the expected behavior. a, b and c are all set as labels for the same list. If you want three different lists, you need to assign them individually. You can either repeat the explicit list, or use one of the numerous ways to copy a list:

b = a[:] # this does a shallow copy, which is good enough for this case
import copy
c = copy.deepcopy(a) # this does a deep copy, which matters if the list contains mutable objects

Assignment statements in Python do not copy objects – they bind the name to an object, and an object can have as many labels as you set. In your first edit, changing a[0], you’re updating one element of the single list that a, b, and c all refer to. In your second, changing e, you’re switching e to be a label for a different object (4 instead of 3).
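
A quick check that the copies really are independent (using the same a, b, c naming as above):

import copy

a = [0, 3, 5]
b = a[:]               # shallow copy
c = copy.deepcopy(a)   # deep copy

b[0] = 1
c[0] = 2
print(a, b, c)         # [0, 3, 5] [1, 3, 5] [2, 3, 5]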


回答 3

在 Python 中,一切都是对象,包括“简单”的变量类型(int、float 等)也是对象。

当您更改一个变量的值时,实际上是在更改它的指针;当您比较两个变量时,比较的也是它们的指针。(说明一下,这里的指针指的是变量在计算机物理内存中存储的地址。)

因此,当您就地更改一个值时,就是在更改它在内存中的内容,这会影响所有指向该地址的变量。

例如,当您这样做时:

a = b =  5 

这意味着a和b指向内存中包含值5的相同地址,但是当您这样做时:

a = 6

它不会影响b,因为a现在指向另一个包含6的存储位置,而b仍然指向包含5的存储地址。

但是,当您这样做时:

a = b = [1,2,3]

a和b再次指向相同的位置,但是不同之处在于,如果更改列表值之一:

a[0] = 2

它会更改a所指向的内存的值,但是a仍然指向与b相同的地址,结果b也将更改。

In python, everything is an object, also “simple” variables types (int, float, etc..).

When you changes a variable value, you actually changes it’s pointer, and if you compares between two variables it’s compares their pointers. (To be clear, pointer is the address in physical computer memory where a variable is stored).

As a result, when you changes an inner variable value, you changes it’s value in the memory and it’s affects all the variables that point to this address.

For your example, when you do:

a = b =  5 

This means that a and b points to the same address in memory that contains the value 5, but when you do:

a = 6

It’s not affect b because a is now points to another memory location that contains 6 and b still points to the memory address that contains 5.

But, when you do:

a = b = [1,2,3]

a and b, again, points to the same location but the difference is that if you change the one of the list values:

a[0] = 2

It’s changes the value of the memory that a is points on, but a is still points to the same address as b, and as a result, b changes as well.


回答 4

您可以使用 id(name) 来检查两个名称是否代表同一个对象:

>>> a = b = c = [0, 3, 5]
>>> print(id(a), id(b), id(c))
46268488 46268488 46268488

列表是可变的;这意味着您可以在不创建新对象的情况下就地更改值。但是,这取决于您如何更改值:

>>> a[0] = 1
>>> print(id(a), id(b), id(c))
46268488 46268488 46268488
>>> print(a, b, c)
[1, 3, 5] [1, 3, 5] [1, 3, 5]

如果您将新列表分配给a,则其ID将更改,因此不会影响bc的值:

>>> a = [1, 8, 5]
>>> print(id(a), id(b), id(c))
139423880 46268488 46268488
>>> print(a, b, c)
[1, 8, 5] [1, 3, 5] [1, 3, 5]

整数是不可变的,因此您不能在不创建新对象的情况下更改值:

>>> x = y = z = 1
>>> print(id(x), id(y), id(z))
507081216 507081216 507081216
>>> x = 2
>>> print(id(x), id(y), id(z))
507081248 507081216 507081216
>>> print(x, y, z)
2 1 1

You can use id(name) to check if two names represent the same object:

>>> a = b = c = [0, 3, 5]
>>> print(id(a), id(b), id(c))
46268488 46268488 46268488

Lists are mutable; it means you can change the value in place without creating a new object. However, it depends on how you change the value:

>>> a[0] = 1
>>> print(id(a), id(b), id(c))
46268488 46268488 46268488
>>> print(a, b, c)
[1, 3, 5] [1, 3, 5] [1, 3, 5]

If you assign a new list to a, then its id will change, so it won’t affect b and c‘s values:

>>> a = [1, 8, 5]
>>> print(id(a), id(b), id(c))
139423880 46268488 46268488
>>> print(a, b, c)
[1, 8, 5] [1, 3, 5] [1, 3, 5]

Integers are immutable, so you cannot change the value without creating a new object:

>>> x = y = z = 1
>>> print(id(x), id(y), id(z))
507081216 507081216 507081216
>>> x = 2
>>> print(id(x), id(y), id(z))
507081248 507081216 507081216
>>> print(x, y, z)
2 1 1

回答 5

在您的第一个示例 a = b = c = [1, 2, 3] 中,您实际上是在说:

 'a' is the same as 'b', is the same as 'c' and they are all [1, 2, 3]

如果您想让 a 等于 1、b 等于 2、c 等于 3,请尝试以下操作:

a, b, c = [1, 2, 3]

print(a)
--> 1
print(b)
--> 2
print(c)
--> 3

希望这可以帮助!

in your first example a = b = c = [1, 2, 3] you are really saying:

 'a' is the same as 'b', is the same as 'c' and they are all [1, 2, 3]

If you want to set ‘a’ equal to 1, ‘b’ equal to ‘2’ and ‘c’ equal to 3, try this:

a, b, c = [1, 2, 3]

print(a)
--> 1
print(b)
--> 2
print(c)
--> 3

Hope this helps!


回答 6

简而言之,在第一种情况下,您是在给同一个 list 赋予多个名称。内存中只创建了这个列表的一份,所有名称都指向这个位置。因此,通过任何一个名称修改列表,实际上都会修改内存中的那个列表。

在第二种情况下,将在内存中创建相同值的多个副本。因此,每个副本彼此独立。

Simply put, in the first case, you are assigning multiple names to a list. Only one copy of list is created in memory and all names refer to that location. So changing the list using any of the names will actually modify the list in memory.

In the second case, multiple copies of same value are created in memory. So each copy is independent of one another.


回答 7

您需要的是:

a, b, c = [0,3,5] # Unpack the list, now a, b, and c are ints
a = 1             # `a` did equal 0, not [0,3,5]
print(a)
print(b)
print(c)

What you need is this:

a, b, c = [0,3,5] # Unpack the list, now a, b, and c are ints
a = 1             # `a` did equal 0, not [0,3,5]
print(a)
print(b)
print(c)

回答 8

执行我需要的代码可能是这样的:

# test

aux=[[0 for n in range(3)] for i in range(4)]
print('aux:',aux)

# initialization

a,b,c,d=[[0 for n in range(3)] for i in range(4)]

# changing values

a[0]=1
d[2]=5
print('a:',a)
print('b:',b)
print('c:',c)
print('d:',d)

结果:

('aux:', [[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]])
('a:', [1, 0, 0])
('b:', [0, 0, 0])
('c:', [0, 0, 0])
('d:', [0, 0, 5])

The code that does what I need could be this:

# test

aux=[[0 for n in range(3)] for i in range(4)]
print('aux:',aux)

# initialization

a,b,c,d=[[0 for n in range(3)] for i in range(4)]

# changing values

a[0]=1
d[2]=5
print('a:',a)
print('b:',b)
print('c:',c)
print('d:',d)

Result:

('aux:', [[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]])
('a:', [1, 0, 0])
('b:', [0, 0, 0])
('c:', [0, 0, 0])
('d:', [0, 0, 5])

Matplotlib透明线图

问题:Matplotlib透明线图

我在matplotlib中绘制了两个相似的轨迹,我想以部分透明的方式绘制每条线,以使红色(绘制的第二个)不会遮盖蓝色。

编辑:这是带有透明线的图像。

I am plotting two similar trajectories in matplotlib and I’d like to plot each of the lines with partial transparency so that the red (plotted second) doesn’t obscure the blue.

EDIT: Here’s the image with transparent lines.


回答 0

干净利落:

plt.plot(x, y, 'r-', alpha=0.7)

(我知道我没有添加任何新内容,但是简单的答案应该可见)。

Plain and simple:

plt.plot(x, y, 'r-', alpha=0.7)

(I know I add nothing new, but the straightforward answer should be visible).
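
A self-contained sketch of two overlapping lines drawn with partial transparency (the data here is made up for illustration):

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 200)
plt.plot(x, np.sin(x), 'b-', alpha=0.7)        # first trajectory
plt.plot(x, np.sin(x + 0.1), 'r-', alpha=0.7)  # second one no longer hides the first
plt.show()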


回答 1

绘制完所有线条后,可以如下设置所有线条的透明度:

for l in fig_field.gca().lines:
    l.set_alpha(.7)

编辑:请查看评论中 Joe 的回答。

After I plotted all the lines, I was able to set the transparency of all of them as follows:

for l in fig_field.gca().lines:
    l.set_alpha(.7)

EDIT: please see Joe’s answer in the comments.


回答 2

这实际上取决于您用哪个函数来绘制线条,但请先看看您正在使用的那个函数是否接受 alpha 参数,并把它设置为 0.5 之类的值。如果那样不行,就尝试获取线条对象并直接设置它们的 alpha 值。

It really depends on what functions you’re using to plot the lines, but try to see if the one you’re using takes an alpha value and set it to something like 0.5. If that doesn’t work, try getting the line objects and setting their alpha values directly.