Python 实用宝典

Question 1

I know Ruby very well. I believe that I may need to learn Python presently. For those who know both, what concepts are similar between the two, and what are different?

I’m looking for a list similar to a primer I wrote for Learning Lua for JavaScripters: simple things like whitespace significance and looping constructs; the name of nil in Python, and what values are considered “truthy”; is it idiomatic to use the equivalent of map and each, or are mumble somethingaboutlistcomprehensions mumble the norm?

If I get a good variety of answers I’m happy to aggregate them into a community wiki. Or else you all can fight and crib from each other to try to create the one true comprehensive list.

Edit: To be clear, my goal is “proper” and idiomatic Python. If there is a Python equivalent of inject, but nobody uses it because there is a better/different way to achieve the common functionality of iterating a list and accumulating a result along the way, I want to know how you do things. Perhaps I’ll update this question with a list of common goals, how you achieve them in Ruby, and ask what the equivalent is in Python.

Question 2

Here are some key differences to me:

Ruby has blocks; Python does not.
Python has functions; Ruby does not. In Python, you can take any function or method and pass it to another function. In Ruby, everything is a method, and methods can’t be directly passed. Instead, you have to wrap them in Proc’s to pass them.
Ruby and Python both support closures, but in different ways. In Python, you can define a function inside another function. The inner function has read access to variables from the outer function, but not write access. In Ruby, you define closures using blocks. The closures have full read and write access to variables from the outer scope.
Python has list comprehensions, which are pretty expressive. For example, if you have a list of numbers, you can write
```
[x*x for x in values if x > 15]
```
to get a new list of the squares of all values greater than 15. In Ruby, you’d have to write the following:
```
values.select {|v| v > 15}.map {|v| v * v}
```
The Ruby code doesn’t feel as compact. It’s also not as efficient since it first converts the values array into a shorter intermediate array containing the values greater than 15. Then, it takes the intermediate array and generates a final array containing the squares of the intermediates. The intermediate array is then thrown out. So, Ruby ends up with 3 arrays in memory during the computation; Python only needs the input list and the resulting list.

Python also supplies similar map comprehensions.
Python supports tuples; Ruby doesn’t. In Ruby, you have to use arrays to simulate tuples.
Ruby supports switch/case statements; Python does not.
~~Ruby supports the standard expr ? val1 : val2 ternary operator; Python does not.~~
Ruby supports only single inheritance. If you need to mimic multiple inheritance, you can define modules and use mix-ins to pull the module methods into classes. Python supports multiple inheritance rather than module mix-ins.
Python supports only single-line lambda functions. Ruby blocks, which are kind of/sort of lambda functions, can be arbitrarily big. Because of this, Ruby code is typically written in a more functional style than Python code. For example, to loop over a list in Ruby, you typically do
```
collection.each do |value|
  ...
end
```
The block works very much like a function being passed to collection.each. If you were to do the same thing in Python, you’d have to define a named inner function and then pass that to the collection each method (if list supported this method):
```
def some_operation(value):
  ...

collection.each(some_operation)
```
That doesn’t flow very nicely. So, typically the following non-functional approach would be used in Python:
```
for value in collection:
  ...
```
Using resources in a safe way is quite different between the two languages. Here, the problem is that you want to allocate some resource (open a file, obtain a database cursor, etc), perform some arbitrary operation on it, and then close it in a safe manner even if an exception occurs.

In Ruby, because blocks are so easy to use (see #9), you would typically code this pattern as a method that takes a block for the arbitrary operation to perform on the resource.

In Python, passing in a function for the arbitrary action is a little clunkier since you have to write a named, inner function (see #9). Instead, Python uses a with statement for safe resource handling. See How do I correctly clean up a Python object? for more details.

Question 3

I’ve just spent a couple of months learning Python after 6 years of Ruby. There really was no great comparison out there for the two languages, so I decided to man up and write one myself. Now, it is mainly concerned with functional programming, but since you mention Ruby’s inject method, I’m guessing we’re on the same wavelength.

I hope this helps: The ‘ugliness’ of Python

A couple of points that will get you moving in the right direction:

All the functional programming goodness you use in Ruby is in Python, and it’s even easier. For example, you can map over functions exactly as you’d expect:
```
def f(x):
    return x + 1

map(f, [1, 2, 3]) # => [2, 3, 4]
```
Python doesn’t have a method that acts like each. Since you only use each for side effects, the equivalent in Python is the for loop:
```
for n in [1, 2, 3]:
    print n
```
List comprehensions are great when a) you have to deal with functions and object collections together and b) when you need to iterate using multiple indexes. For example, to find all the palindromes in a string (assuming you have a function p() that returns true for palindromes), all you need is a single list comprehension:
```
s = 'string-with-palindromes-like-abbalabba'
l = len(s)
[s[x:y] for x in range(l) for y in range(x,l+1) if p(s[x:y])]
```

Question 4

My suggestion: Don’t try to learn the differences. Learn how to approach the problem in Python. Just like there’s a Ruby approach to each problem (that works very well givin the limitations and strengths of the language), there’s a Python approach to the problem. they are both different. To get the best out of each language, you really should learn the language itself, and not just the “translation” from one to the other.

Now, with that said, the difference will help you adapt faster and make 1 off modifications to a Python program. And that’s fine for a start to get writing. But try to learn from other projects the why behind the architecture and design decisions rather than the how behind the semantics of the language…

Question 5

I know little Ruby, but here are a few bullet points about the things you mentioned:

nil, the value indicating lack of a value, would be None (note that you check for it like x is None or x is not None, not with == – or by coercion to boolean, see next point).
None, zero-esque numbers (0, 0.0, 0j (complex number)) and empty collections ([], {}, set(), the empty string "", etc.) are considered falsy, everything else is considered truthy.
For side effects, (for-)loop explicitly. For generating a new bunch of stuff without side-effects, use list comprehensions (or their relatives – generator expressions for lazy one-time iterators, dict/set comprehensions for the said collections).

Concerning looping: You have for, which operates on an iterable(! no counting), and while, which does what you would expect. The fromer is far more powerful, thanks to the extensive support for iterators. Not only nearly everything that can be an iterator instead of a list is an iterator (at least in Python 3 – in Python 2, you have both and the default is a list, sadly). The are numerous tools for working with iterators – zip iterates any number of iterables in parallel, enumerate gives you (index, item) (on any iterable, not just on lists), even slicing abritary (possibly large or infinite) iterables! I found that these make many many looping tasks much simpler. Needless to say, they integrate just fine with list comprehensions, generator expressions, etc.

Question 6

In Ruby, instance variables and methods are completely unrelated, except when you explicitly relate them with attr_accessor or something like that.

In Python, methods are just a special class of attribute: one that is executable.

So for example:

>>> class foo:
...     x = 5
...     def y(): pass
... 
>>> f = foo()
>>> type(f.x)
<type 'int'>
>>> type(f.y)
<type 'instancemethod'>

That difference has a lot of implications, like for example that referring to f.x refers to the method object, rather than calling it. Also, as you can see, f.x is public by default, whereas in Ruby, instance variables are private by default.

Question 7

Why do you have to call items() to iterate over key, value pairs in a dictionary? ie.

dic = {'one': '1', 'two': '2'}
for k, v in dic.items():
    print(k, v)

Why isn’t that the default behavior of iterating over a dictionary

for k, v in dic:
    print(k, v)

Question 8

For every python container C, the expectation is that

for item in C:
    assert item in C

will pass just fine — wouldn’t you find it astonishing if one sense of in (the loop clause) had a completely different meaning from the other (the presence check)? I sure would! It naturally works that way for lists, sets, tuples, …

So, when C is a dictionary, if in were to yield key/value tuples in a for loop, then, by the principle of least astonishment, in would also have to take such a tuple as its left-hand operand in the containment check.

How useful would that be? Pretty useless indeed, basically making if (key, value) in C a synonym for if C.get(key) == value — which is a check I believe I may have performed, or wanted to perform, 100 times more rarely than what if k in C actually means, checking the presence of the key only and completely ignoring the value.

On the other hand, wanting to loop just on keys is quite common, e.g.:

for k in thedict:
    thedict[k] += 1

having the value as well would not help particularly:

for k, v in thedict.items():
    thedict[k] = v + 1

actually somewhat less clear and less concise. (Note that items was the original spelling of the “proper” methods to use to get key/value pairs: unfortunately that was back in the days when such accessors returned whole lists, so to support “just iterating” an alternative spelling had to be introduced, and iteritems it was — in Python 3, where backwards compatibility constraints with previous Python versions were much weakened, it became items again).

Question 9

My guess: Using the full tuple would be more intuitive for looping, but perhaps less so for testing for membership using in.

if key in counts:
    counts[key] += 1
else:
    counts[key] = 1

That code wouldn’t really work if you had to specify both key and value for in. I am having a hard time imagining use case where you’d check if both the key AND value are in the dictionary. It is far more natural to only test the keys.

# When would you ever write a condition like this?
if (key, value) in dict:

Now it’s not necessary that the in operator and for ... in operate over the same items. Implementation-wise they are different operations (__contains__ vs. __iter__). But that little inconsistency would be somewhat confusing and, well, inconsistent.

Question 10

Actually, I’ve done some work with Pyro and RPyC, but there is more RPC implementation than these two. Can we make a list of them?

Native Python-based protocols:

RPC frameworks with a lot of underlying protocols:

Spyne (see lightning talk)

JSON-RPC based frameworks:

SOAP:

XML-RPC based frameworks:

XMLRPC, using the xmlrpclib and SimpleXMLRPCServer modules in the standard library.

Others:

Question 11

XML-RPC is part of the Python standard library:

Python 2: xmlrpclib and SimpleXMLRPCServer
Python 3: xmlrpc (both client and server)

Question 12

Apache Thrift is a cross-language RPC option developed at Facebook. Works over sockets, function signatures are defined in text files in a language-independent way.

Question 13

Since I’ve asked this question, I’ve started using python-symmetric-jsonrpc. It is quite good, can be used between python and non-python software and follow the JSON-RPC standard. But it lacks some examples.

Question 14

You could try Ladon. It serves up multiple web server protocols at once so you can offer more flexibility at the client side.

http://pypi.python.org/pypi/ladon

Question 15

There are some attempts at making SOAP work with python, but I haven’t tested it much so I can’t say if it is good or not.

SOAPy is one example.

Question 16

We are developing Versile Python (VPy), an implementation for python 2.6+ and 3.x of a new ORB/RPC framework. Functional AGPL dev releases for review and testing are available. VPy has native python capabilities similar to PyRo and RPyC via a general native objects layer (code example). The product is designed for platform-independent remote object interaction for implementations of Versile Platform.

Full disclosure: I work for the company developing VPy.

Question 17

maybe ZSI which implements SOAP. I used the stub generator and It worked properly. The only problem I encountered is about doing SOAP throught HTTPS.

Question 18

You missed out omniORB. This is a pretty full CORBA implementation, so you can also use it to talk to other languages that have CORBA support.

Question 19

I would like to create a string buffer to do lots of processing, format and finally write the buffer in a text file using a C-style sprintf functionality in Python. Because of conditional statements, I can’t write them directly to the file.

e.g pseudo code:

sprintf(buf,"A = %d\n , B= %s\n",A,B)
/* some processing */
sprint(buf,"C=%d\n",c)
....
...
fprintf(file,buf)

So in the output file we have this kind of o/p:

A= foo B= bar
C= ded
etc...

Edit, to clarify my question:
buf is a big buffer contains all these strings which have formatted using sprintf. Going by your examples, buf will only contain current values, not older ones. e.g first in buf I wrote A= something ,B= something later C= something was appended in the same buf, but in your Python answers buf contains only last value, which is not I want – I want buf to have all the printfs I have done since the beginning, like in C.

Question 20

Python has a % operator for this.

>>> a = 5
>>> b = "hello"
>>> buf = "A = %d\n , B = %s\n" % (a, b)
>>> print buf
A = 5
 , B = hello

>>> c = 10
>>> buf = "C = %d\n" % c
>>> print buf
C = 10

See this reference for all supported format specifiers.

You could as well use format:

>>> print "This is the {}th tome of {}".format(5, "knowledge")
This is the 5th tome of knowledge

Question 21

If I understand your question correctly, format() is what you are looking for, along with its mini-language.

Silly example for python 2.7 and up:

>>> print "{} ...\r\n {}!".format("Hello", "world")
Hello ...
 world!

For earlier python versions: (tested with 2.6.2)

>>> print "{0} ...\r\n {1}!".format("Hello", "world")
Hello ...
 world!

Question 22

I’m not completely certain that I understand your goal, but you can use a StringIO instance as a buffer:

>>> import StringIO 
>>> buf = StringIO.StringIO()
>>> buf.write("A = %d, B = %s\n" % (3, "bar"))
>>> buf.write("C=%d\n" % 5)
>>> print(buf.getvalue())
A = 3, B = bar
C=5

Unlike sprintf, you just pass a string to buf.write, formatting it with the % operator or the format method of strings.

You could of course define a function to get the sprintf interface you’re hoping for:

def sprintf(buf, fmt, *args):
    buf.write(fmt % args)

which would be used like this:

>>> buf = StringIO.StringIO()
>>> sprintf(buf, "A = %d, B = %s\n", 3, "foo")
>>> sprintf(buf, "C = %d\n", 5)
>>> print(buf.getvalue())
A = 3, B = foo
C = 5

Question 23

Use the formatting operator %:

buf = "A = %d\n , B= %s\n" % (a, b)
print >>f, buf

Question 24

You can use string formatting:

>>> a=42
>>> b="bar"
>>> "The number is %d and the word is %s" % (a,b)
'The number is 42 and the word is bar'

But this is removed in Python 3, you should use “str.format()”:

>>> a=42
>>> b="bar"
>>> "The number is {0} and the word is {1}".format(a,b)
'The number is 42 and the word is bar'

Question 25

To insert into a very long string it is nice to use names for the different arguments, instead of hoping they are in the right positions. This also makes it easier to replace multiple recurrences.

>>> 'Coordinates: {latitude}, {longitude}'.format(latitude='37.24N', longitude='-115.81W')
'Coordinates: 37.24N, -115.81W'

Taken from Format examples, where all the other Format-related answers are also shown.

Question 26

This is probably the closest translation from your C code to Python code.

A = 1
B = "hello"
buf = "A = %d\n , B= %s\n" % (A, B)

c = 2
buf += "C=%d\n" % c

f = open('output.txt', 'w')
print >> f, c
f.close()

The % operator in Python does almost exactly the same thing as C’s sprintf. You can also print the string to a file directly. If there are lots of these string formatted stringlets involved, it might be wise to use a StringIO object to speed up processing time.

So instead of doing +=, do this:

import cStringIO
buf = cStringIO.StringIO()

...

print >> buf, "A = %d\n , B= %s\n" % (A, B)

...

print >> buf, "C=%d\n" % c

...

print >> f, buf.getvalue()

Question 27

If you want something like the python3 print function but to a string:

def sprint(*args, **kwargs):
    sio = io.StringIO()
    print(*args, **kwargs, file=sio)
    return sio.getvalue()

>>> x = sprint('abc', 10, ['one', 'two'], {'a': 1, 'b': 2}, {1, 2, 3})
>>> x
"abc 10 ['one', 'two'] {'a': 1, 'b': 2} {1, 2, 3}\n"

or without the '\n' at the end:

def sprint(*args, end='', **kwargs):
    sio = io.StringIO()
    print(*args, **kwargs, end=end, file=sio)
    return sio.getvalue()

>>> x = sprint('abc', 10, ['one', 'two'], {'a': 1, 'b': 2}, {1, 2, 3})
>>> x
"abc 10 ['one', 'two'] {'a': 1, 'b': 2} {1, 2, 3}"

Question 28

Something like…

greetings = 'Hello {name}'.format(name = 'John')

Hello John

Question 29

Take a look at “Literal String Interpolation” https://www.python.org/dev/peps/pep-0498/

I found it through the http://www.malemburg.com/

Question 30

Two approaches are to write to a string buffer or to write lines to a list and join them later. I think the StringIO approach is more pythonic, but didn’t work before Python 2.6.

from io import StringIO

with StringIO() as s:
   print("Hello", file=s)
   print("Goodbye", file=s)
   # And later...
   with open('myfile', 'w') as f:
       f.write(s.getvalue())

You can also use these without a ContextMananger (s = StringIO()). Currently, I’m using a context manager class with a print function. This fragment might be useful to be able to insert debugging or odd paging requirements:

class Report:
    ... usual init/enter/exit
    def print(self, *args, **kwargs):
        with StringIO() as s:
            print(*args, **kwargs, file=s)
            out = s.getvalue()
        ... stuff with out

with Report() as r:
   r.print(f"This is {datetime.date.today()}!", 'Yikes!', end=':')

Question 31

I have manipulated some data using pandas and now I want to carry out a batch save back to the database. This requires me to convert the dataframe into an array of tuples, with each tuple corresponding to a “row” of the dataframe.

My DataFrame looks something like:

In [182]: data_set
Out[182]: 
  index data_date   data_1  data_2
0  14303 2012-02-17  24.75   25.03 
1  12009 2012-02-16  25.00   25.07 
2  11830 2012-02-15  24.99   25.15 
3  6274  2012-02-14  24.68   25.05 
4  2302  2012-02-13  24.62   24.77 
5  14085 2012-02-10  24.38   24.61

I want to convert it to an array of tuples like:

[(datetime.date(2012,2,17),24.75,25.03),
(datetime.date(2012,2,16),25.00,25.07),
...etc. ]

Any suggestion on how I can efficiently do this?

Question 32

How about:

subset = data_set[['data_date', 'data_1', 'data_2']]
tuples = [tuple(x) for x in subset.to_numpy()]

for pandas < 0.24 use

tuples = [tuple(x) for x in subset.values]

Question 33

list(data_set.itertuples(index=False))

As of 17.1, the above will return a list of namedtuples.

If you want a list of ordinary tuples, pass name=None as an argument:

list(data_set.itertuples(index=False, name=None))

Question 34

A generic way:

[tuple(x) for x in data_set.to_records(index=False)]

Question 35

Motivation
Many data sets are large enough that we need to concern ourselves with speed/efficiency. So I offer this solution in that spirit. It happens to also be succinct.

For the sake of comparison, let’s drop the index column

df = data_set.drop('index', 1)

Solution
I’ll propose the use of zip and map

list(zip(*map(df.get, df)))

[('2012-02-17', 24.75, 25.03),
 ('2012-02-16', 25.0, 25.07),
 ('2012-02-15', 24.99, 25.15),
 ('2012-02-14', 24.68, 25.05),
 ('2012-02-13', 24.62, 24.77),
 ('2012-02-10', 24.38, 24.61)]

It happens to also be flexible if we wanted to deal with a specific subset of columns. We’ll assume the columns we’ve already displayed are the subset we want.

list(zip(*map(df.get, ['data_date', 'data_1', 'data_2'])))

[('2012-02-17', 24.75, 25.03),
 ('2012-02-16', 25.0, 25.07),
 ('2012-02-15', 24.99, 25.15),
 ('2012-02-14', 24.68, 25.05),
 ('2012-02-13', 24.62, 24.77),
 ('2012-02-10', 24.38, 24.61)]

What is Quicker?

Turn’s out records is quickest followed by asymptotically converging zipmap and iter_tuples

I’ll use a library simple_benchmarks that I got from this post

from simple_benchmark import BenchmarkBuilder
b = BenchmarkBuilder()

import pandas as pd
import numpy as np

def tuple_comp(df): return [tuple(x) for x in df.to_numpy()]
def iter_namedtuples(df): return list(df.itertuples(index=False))
def iter_tuples(df): return list(df.itertuples(index=False, name=None))
def records(df): return df.to_records(index=False).tolist()
def zipmap(df): return list(zip(*map(df.get, df)))

funcs = [tuple_comp, iter_namedtuples, iter_tuples, records, zipmap]
for func in funcs:
    b.add_function()(func)

def creator(n):
    return pd.DataFrame({"A": random.randint(n, size=n), "B": random.randint(n, size=n)})

@b.add_arguments('Rows in DataFrame')
def argument_provider():
    for n in (10 ** (np.arange(4, 11) / 2)).astype(int):
        yield n, creator(n)

r = b.run()

Check the results

r.to_pandas_dataframe().pipe(lambda d: d.div(d.min(1), 0))

        tuple_comp  iter_namedtuples  iter_tuples   records    zipmap
100       2.905662          6.626308     3.450741  1.469471  1.000000
316       4.612692          4.814433     2.375874  1.096352  1.000000
1000      6.513121          4.106426     1.958293  1.000000  1.316303
3162      8.446138          4.082161     1.808339  1.000000  1.533605
10000     8.424483          3.621461     1.651831  1.000000  1.558592
31622     7.813803          3.386592     1.586483  1.000000  1.515478
100000    7.050572          3.162426     1.499977  1.000000  1.480131

r.plot()

Question 36

Here’s a vectorized approach (assuming the dataframe, data_set to be defined as df instead) that returns a list of tuples as shown:

>>> df.set_index(['data_date'])[['data_1', 'data_2']].to_records().tolist()

produces:

[(datetime.datetime(2012, 2, 17, 0, 0), 24.75, 25.03),
 (datetime.datetime(2012, 2, 16, 0, 0), 25.0, 25.07),
 (datetime.datetime(2012, 2, 15, 0, 0), 24.99, 25.15),
 (datetime.datetime(2012, 2, 14, 0, 0), 24.68, 25.05),
 (datetime.datetime(2012, 2, 13, 0, 0), 24.62, 24.77),
 (datetime.datetime(2012, 2, 10, 0, 0), 24.38, 24.61)]

The idea of setting datetime column as the index axis is to aid in the conversion of the Timestamp value to it’s corresponding datetime.datetime format equivalent by making use of the convert_datetime64 argument in DF.to_records which does so for a DateTimeIndex dataframe.

This returns a recarray which could be then made to return a list using .tolist

More generalized solution depending on the use case would be:

df.to_records().tolist()                              # Supply index=False to exclude index

Question 37

The most efficient and easy way:

list(data_set.to_records())

You can filter the columns you need before this call.

Question 38

This answer doesn’t add any answers that aren’t already discussed, but here are some speed results. I think this should resolve questions that came up in the comments. All of these look like they are O(n), based on these three values.

TL;DR: tuples = list(df.itertuples(index=False, name=None)) and tuples = list(zip(*[df[c].values.tolist() for c in df])) are tied for the fastest.

I did a quick speed test on results for three suggestions here:

The zip answer from @pirsquared: tuples = list(zip(*[df[c].values.tolist() for c in df]))
The accepted answer from @wes-mckinney: tuples = [tuple(x) for x in df.values]
The itertuples answer from @ksindi with the name=None suggestion from @Axel: tuples = list(df.itertuples(index=False, name=None))

from numpy import random
import pandas as pd


def create_random_df(n):
    return pd.DataFrame({"A": random.randint(n, size=n), "B": random.randint(n, size=n)})

Small size:

df = create_random_df(10000)
%timeit tuples = list(zip(*[df[c].values.tolist() for c in df]))
%timeit tuples = [tuple(x) for x in df.values]
%timeit tuples = list(df.itertuples(index=False, name=None))

Gives:

1.66 ms ± 200 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
15.5 ms ± 1.52 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
1.74 ms ± 75.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Larger:

df = create_random_df(1000000)
%timeit tuples = list(zip(*[df[c].values.tolist() for c in df]))
%timeit tuples = [tuple(x) for x in df.values]
%timeit tuples = list(df.itertuples(index=False, name=None))

Gives:

202 ms ± 5.91 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
1.52 s ± 98.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
209 ms ± 11.8 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

As much patience as I have:

df = create_random_df(10000000)
%timeit tuples = list(zip(*[df[c].values.tolist() for c in df]))
%timeit tuples = [tuple(x) for x in df.values]
%timeit tuples = list(df.itertuples(index=False, name=None))

Gives:

1.78 s ± 118 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
15.4 s ± 222 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.68 s ± 96.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

The zip version and the itertuples version are within the confidence intervals each other. I suspect that they are doing the same thing under the hood.

These speed tests are probably irrelevant though. Pushing the limits of my computer’s memory doesn’t take a huge amount of time, and you really shouldn’t be doing this on a large data set. Working with those tuples after doing this will end up being really inefficient. It’s unlikely to be a major bottleneck in your code, so just stick with the version you think is most readable.

Question 39

#try this one:

tuples = list(zip(data_set["data_date"], data_set["data_1"],data_set["data_2"]))
print (tuples)

Question 40

Changing the data frames list into a list of tuples.

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})
print(df)
OUTPUT
   col1  col2
0     1     4
1     2     5
2     3     6

records = df.to_records(index=False)
result = list(records)
print(result)
OUTPUT
[(1, 4), (2, 5), (3, 6)]

Question 41

More pythonic way:

df = data_set[['data_date', 'data_1', 'data_2']]
map(tuple,df.values)

Question 42

I’m trying to do a “hello world” with new boto3 client for AWS.

The use-case I have is fairly simple: get object from S3 and save it to the file.

In boto 2.X I would do it like this:

import boto
key = boto.connect_s3().get_bucket('foo').get_key('foo')
key.get_contents_to_filename('/tmp/foo')

In boto 3 . I can’t find a clean way to do the same thing, so I’m manually iterating over the “Streaming” object:

import boto3
key = boto3.resource('s3').Object('fooo', 'docker/my-image.tar.gz').get()
with open('/tmp/my-image.tar.gz', 'w') as f:
    chunk = key['Body'].read(1024*8)
    while chunk:
        f.write(chunk)
        chunk = key['Body'].read(1024*8)

or

import boto3
key = boto3.resource('s3').Object('fooo', 'docker/my-image.tar.gz').get()
with open('/tmp/my-image.tar.gz', 'w') as f:
    for chunk in iter(lambda: key['Body'].read(4096), b''):
        f.write(chunk)

And it works fine. I was wondering is there any “native” boto3 function that will do the same task?

Question 43

There is a customization that went into Boto3 recently which helps with this (among other things). It is currently exposed on the low-level S3 client, and can be used like this:

s3_client = boto3.client('s3')
open('hello.txt').write('Hello, world!')

# Upload the file to S3
s3_client.upload_file('hello.txt', 'MyBucket', 'hello-remote.txt')

# Download the file from S3
s3_client.download_file('MyBucket', 'hello-remote.txt', 'hello2.txt')
print(open('hello2.txt').read())

These functions will automatically handle reading/writing files as well as doing multipart uploads in parallel for large files.

Note that s3_client.download_file won’t create a directory. It can be created as pathlib.Path('/path/to/file.txt').parent.mkdir(parents=True, exist_ok=True).

Question 44

boto3 now has a nicer interface than the client:

resource = boto3.resource('s3')
my_bucket = resource.Bucket('MyBucket')
my_bucket.download_file(key, local_filename)

This by itself isn’t tremendously better than the client in the accepted answer (although the docs say that it does a better job retrying uploads and downloads on failure) but considering that resources are generally more ergonomic (for example, the s3 bucket and object resources are nicer than the client methods) this does allow you to stay at the resource layer without having to drop down.

Resources generally can be created in the same way as clients, and they take all or most of the same arguments and just forward them to their internal clients.

Question 45

For those of you who would like to simulate the set_contents_from_string like boto2 methods, you can try

import boto3
from cStringIO import StringIO

s3c = boto3.client('s3')
contents = 'My string to save to S3 object'
target_bucket = 'hello-world.by.vor'
target_file = 'data/hello.txt'
fake_handle = StringIO(contents)

# notice if you do fake_handle.read() it reads like a file handle
s3c.put_object(Bucket=target_bucket, Key=target_file, Body=fake_handle.read())

For Python3:

In python3 both StringIO and cStringIO are gone. Use the StringIO import like:

from io import StringIO

To support both version:

try:
   from StringIO import StringIO
except ImportError:
   from io import StringIO

Question 46

# Preface: File is json with contents: {'name': 'Android', 'status': 'ERROR'}

import boto3
import io

s3 = boto3.resource('s3')

obj = s3.Object('my-bucket', 'key-to-file.json')
data = io.BytesIO()
obj.download_fileobj(data)

# object is now a bytes string, Converting it to a dict:
new_dict = json.loads(data.getvalue().decode("utf-8"))

print(new_dict['status']) 
# Should print "Error"

Question 47

When you want to read a file with a different configuration than the default one, feel free to use either mpu.aws.s3_download(s3path, destination) directly or the copy-pasted code:

def s3_download(source, destination,
                exists_strategy='raise',
                profile_name=None):
    """
    Copy a file from an S3 source to a local destination.

    Parameters
    ----------
    source : str
        Path starting with s3://, e.g. 's3://bucket-name/key/foo.bar'
    destination : str
    exists_strategy : {'raise', 'replace', 'abort'}
        What is done when the destination already exists?
    profile_name : str, optional
        AWS profile

    Raises
    ------
    botocore.exceptions.NoCredentialsError
        Botocore is not able to find your credentials. Either specify
        profile_name or add the environment variables AWS_ACCESS_KEY_ID,
        AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN.
        See https://boto3.readthedocs.io/en/latest/guide/configuration.html
    """
    exists_strategies = ['raise', 'replace', 'abort']
    if exists_strategy not in exists_strategies:
        raise ValueError('exists_strategy \'{}\' is not in {}'
                         .format(exists_strategy, exists_strategies))
    session = boto3.Session(profile_name=profile_name)
    s3 = session.resource('s3')
    bucket_name, key = _s3_path_split(source)
    if os.path.isfile(destination):
        if exists_strategy is 'raise':
            raise RuntimeError('File \'{}\' already exists.'
                               .format(destination))
        elif exists_strategy is 'abort':
            return
    s3.Bucket(bucket_name).download_file(key, destination)

from collections import namedtuple

S3Path = namedtuple("S3Path", ["bucket_name", "key"])


def _s3_path_split(s3_path):
    """
    Split an S3 path into bucket and key.

    Parameters
    ----------
    s3_path : str

    Returns
    -------
    splitted : (str, str)
        (bucket, key)

    Examples
    --------
    >>> _s3_path_split('s3://my-bucket/foo/bar.jpg')
    S3Path(bucket_name='my-bucket', key='foo/bar.jpg')
    """
    if not s3_path.startswith("s3://"):
        raise ValueError(
            "s3_path is expected to start with 's3://', " "but was {}"
            .format(s3_path)
        )
    bucket_key = s3_path[len("s3://"):]
    bucket_name, key = bucket_key.split("/", 1)
    return S3Path(bucket_name, key)

Question 48

Note: I’m assuming you have configured authentication separately. Below code is to download the single object from the S3 bucket.

import boto3

#initiate s3 client 
s3 = boto3.resource('s3')

#Download object to the file    
s3.Bucket('mybucket').download_file('hello.txt', '/tmp/hello.txt')

Question 49

I’m using the Mock library to test my application, but I want to assert that some function was not called. Mock docs talk about methods like mock.assert_called_with and mock.assert_called_once_with, but I didn’t find anything like mock.assert_not_called or something related to verify mock was NOT called.

I could go with something like the following, though it doesn’t seem cool nor pythonic:

def test_something:
    # some actions
    with patch('something') as my_var:
        try:
            # args are not important. func should never be called in this test
            my_var.assert_called_with(some, args)
        except AssertionError:
            pass  # this error being raised means it's ok
    # other stuff

Any ideas how to accomplish this?

Question 50

This should work for your case;

assert not my_var.called, 'method should not have been called'

Sample;

>>> mock=Mock()
>>> mock.a()
<Mock name='mock.a()' id='4349129872'>
>>> assert not mock.b.called, 'b was called and should not have been'
>>> assert not mock.a.called, 'a was called and should not have been'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError: a was called and should not have been

Question 51

Though an old question, I would like to add that currently mock library (backport of unittest.mock) supports assert_not_called method.

Just upgrade yours;

pip install mock --upgrade

Question 52

You can check the called attribute, but if your assertion fails, the next thing you’ll want to know is something about the unexpected call, so you may as well arrange for that information to be displayed from the start. Using unittest, you can check the contents of call_args_list instead:

self.assertItemsEqual(my_var.call_args_list, [])

When it fails, it gives a message like this:

AssertionError: Element counts were not equal:
First has 0, Second has 1:  call('first argument', 4)

Question 53

When you test using class inherits unittest.TestCase you can simply use methods like:

assertTrue
assertFalse
assertEqual

and similar (in python documentation you find the rest).

In your example we can simply assert if mock_method.called property is False, which means that method was not called.

import unittest
from unittest import mock

import my_module

class A(unittest.TestCase):
    def setUp(self):
        self.message = "Method should not be called. Called {times} times!"

    @mock.patch("my_module.method_to_mock")
    def test(self, mock_method):
        my_module.method_to_mock()

        self.assertFalse(mock_method.called,
                         self.message.format(times=mock_method.call_count))

Question 54

With python >= 3.5 you can use mock_object.assert_not_called().

Question 55

Judging from other answers, no one except @rob-kennedy talked about the call_args_list.

It’s a powerful tool for that you can implement the exact contrary of MagicMock.assert_called_with()

call_args_list is a list of call objects. Each call object represents a call made on a mocked callable.

>>> from unittest.mock import MagicMock
>>> m = MagicMock()
>>> m.call_args_list
[]
>>> m(42)
<MagicMock name='mock()' id='139675158423872'>
>>> m.call_args_list
[call(42)]
>>> m(42, 30)
<MagicMock name='mock()' id='139675158423872'>
>>> m.call_args_list
[call(42), call(42, 30)]

Consuming a call object is easy, since you can compare it with a tuple of length 2 where the first component is a tuple containing all the positional arguments of the related call, while the second component is a dictionary of the keyword arguments.

>>> ((42,),) in m.call_args_list
True
>>> m(42, foo='bar')
<MagicMock name='mock()' id='139675158423872'>
>>> ((42,), {'foo': 'bar'}) in m.call_args_list
True
>>> m(foo='bar')
<MagicMock name='mock()' id='139675158423872'>
>>> ((), {'foo': 'bar'}) in m.call_args_list
True

So, a way to address the specific problem of the OP is

def test_something():
    with patch('something') as my_var:
        assert ((some, args),) not in my_var.call_args_list

Note that this way, instead of just checking if a mocked callable has been called, via MagicMock.called, you can now check if it has been called with a specific set of arguments.

That’s useful. Say you want to test a function that takes a list and call another function, compute(), for each of the value of the list only if they satisfy a specific condition.

You can now mock compute, and test if it has been called on some value but not on others.

Question 56

I sometimes need to iterate a list in Python looking at the “current” element and the “next” element. I have, till now, done so with code like:

for current, next in zip(the_list, the_list[1:]):
    # Do something

This works and does what I expect, but is there’s a more idiomatic or efficient way to do the same thing?

Question 57

Here’s a relevant example from the itertools module docs:

import itertools
def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = itertools.tee(iterable)
    next(b, None)
    return zip(a, b)

For Python 2, you need itertools.izip instead of zip:

import itertools
def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = itertools.tee(iterable)
    next(b, None)
    return itertools.izip(a, b)

How this works:

First, two parallel iterators, a and b are created (the tee() call), both pointing to the first element of the original iterable. The second iterator, b is moved 1 step forward (the next(b, None)) call). At this point a points to s0 and b points to s1. Both a and b can traverse the original iterator independently – the izip function takes the two iterators and makes pairs of the returned elements, advancing both iterators at the same pace.

One caveat: the tee() function produces two iterators that can advance independently of each other, but it comes at a cost. If one of the iterators advances further than the other, then tee() needs to keep the consumed elements in memory until the second iterator comsumes them too (it cannot ‘rewind’ the original iterator). Here it doesn’t matter because one iterator is only 1 step ahead of the other, but in general it’s easy to use a lot of memory this way.

And since tee() can take an n parameter, this can also be used for more than two parallel iterators:

def threes(iterator):
    "s -> (s0,s1,s2), (s1,s2,s3), (s2, s3,4), ..."
    a, b, c = itertools.tee(iterator, 3)
    next(b, None)
    next(c, None)
    next(c, None)
    return zip(a, b, c)

Question 58

Roll your own!

def pairwise(iterable):
    it = iter(iterable)
    a = next(it, None)

    for b in it:
        yield (a, b)
        a = b

Question 59

Since the_list[1:] actually creates a copy of the whole list (excluding its first element), and zip() creates a list of tuples immediately when called, in total three copies of your list are created. If your list is very large, you might prefer

from itertools import izip, islice
for current_item, next_item in izip(the_list, islice(the_list, 1, None)):
    print(current_item, next_item)

which does not copy the list at all.

Question 60

I’m just putting this out, I’m very surprised no one has thought of enumerate().

for (index, thing) in enumerate(the_list):
    if index < len(the_list):
        current, next_ = thing, the_list[index + 1]
        #do something

Question 61

Iterating by index can do the same thing:

#!/usr/bin/python
the_list = [1, 2, 3, 4]
for i in xrange(len(the_list) - 1):
    current_item, next_item = the_list[i], the_list[i + 1]
    print(current_item, next_item)

Output:

(1, 2)
(2, 3)
(3, 4)

Question 62

This is now a simple Import As of 16th May 2020

from more_itertools import pairwise
for current, next in pairwise(your_iterable):
  print(f'Current = {current}, next = {nxt}')

Docs for more-itertools Under the hood this code is the same as that in the other answers, but I much prefer imports when available.

If you don’t already have it installed then: pip install more-itertools

Example

For instance if you had the fibbonnacci sequence, you could calculate the ratios of subsequent pairs as:

from more_itertools import pairwise
fib= [1,1,2,3,5,8,13]
for current, nxt in pairwise(fib):
    ratio=current/nxt
    print(f'Curent = {current}, next = {nxt}, ratio = {ratio} ')

Question 63

Pairs from a list using a list comprehension

the_list = [1, 2, 3, 4]
pairs = [[the_list[i], the_list[i + 1]] for i in range(len(the_list) - 1)]
for [current_item, next_item] in pairs:
    print(current_item, next_item)

Output:

(1, 2)
(2, 3)
(3, 4)

Question 64

I am really surprised nobody has mentioned the shorter, simpler and most importantly general solution:

Python 3:

from itertools import islice

def n_wise(iterable, n):
    return zip(*(islice(iterable, i, None) for i in range(n)))

Python 2:

from itertools import izip, islice

def n_wise(iterable, n):
    return izip(*(islice(iterable, i, None) for i in xrange(n)))

It works for pairwise iteration by passing n=2, but can handle any higher number:

>>> for a, b in n_wise('Hello!', 2):
>>>     print(a, b)
H e
e l
l l
l o
o !

>>> for a, b, c, d in n_wise('Hello World!', 4):
>>>     print(a, b, c, d)
H e l l
e l l o
l l o
l o   W
o   W o
  W o r
W o r l
o r l d
r l d !

问题：从Ruby学习Python；异同

回答 0

回答 1

回答 2

回答 3

回答 4

问题：在Python中遍历字典时，为什么必须调用.items（）？

回答 0

回答 1

问题：在Python中执行RPC的当前选择是什么？[关闭]

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

问题：类似于sprintf的Python功能

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

回答 9

回答 10

问题：熊猫将数据框转换为元组数组

回答 0

回答 1

回答 2

回答 3

什么是更快？

What is Quicker?

回答 4

回答 5

回答 6

回答 7

回答 8

回答 9

问题：如何使用boto3将S3对象保存到文件

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

问题：断言未使用Mock调用函数/方法

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

问题：在Python中将列表成对迭代（当前，下一个）

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

回答 9

问题：Python将多个变量分配给相同的值？列出行为

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

问题：Matplotlib透明线图

回答 0