I know Ruby very well. I believe that I may need to learn Python presently. For those who know both, what concepts are similar between the two, and what are different?
I’m looking for a list similar to a primer I wrote for Learning Lua for JavaScripters: simple things like whitespace significance and looping constructs; the name of nil in Python, and what values are considered “truthy”; is it idiomatic to use the equivalent of map and each, or are mumblesomethingaboutlistcomprehensionsmumble the norm?
If I get a good variety of answers I’m happy to aggregate them into a community wiki. Or else you all can fight and crib from each other to try to create the one true comprehensive list.
Edit: To be clear, my goal is “proper” and idiomatic Python. If there is a Python equivalent of inject, but nobody uses it because there is a better/different way to achieve the common functionality of iterating a list and accumulating a result along the way, I want to know how you do things. Perhaps I’ll update this question with a list of common goals, how you achieve them in Ruby, and ask what the equivalent is in Python.
Python has functions; Ruby does not. In Python, you can take any function or method and pass it to another function. In Ruby, everything is a method, and methods can’t be directly passed. Instead, you have to wrap them in Procs to pass them.
Ruby and Python both support closures, but in different ways. In Python, you can define a function inside another function. The inner function has read access to variables from the outer function, but not write access (unless you declare the variable nonlocal, in Python 3). In Ruby, you define closures using blocks. The closures have full read and write access to variables from the outer scope.
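For example, a minimal sketch of a Python closure that writes to an enclosing variable (the names here are illustrative):

def make_counter():
    count = 0
    def increment():
        nonlocal count  # required for write access to the enclosing variable (Python 3)
        count += 1
        return count
    return increment

counter = make_counter()
counter()  # 1
counter()  # 2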
Python has list comprehensions, which are pretty expressive. For example, if you have a list of numbers, you can write
[x*x for x in values if x > 15]
to get a new list of the squares of all values greater than 15. In Ruby, you’d have to write the following:
values.select {|v| v > 15}.map {|v| v * v}
The Ruby code doesn’t feel as compact. It’s also not as efficient since it first converts the values array into a shorter intermediate array containing the values greater than 15. Then, it takes the intermediate array and generates a final array containing the squares of the intermediates. The intermediate array is then thrown out. So, Ruby ends up with 3 arrays in memory during the computation; Python only needs the input list and the resulting list.
Python also supplies similar dict and set comprehensions, as well as generator expressions.
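For example, a small sketch of those related forms, reusing the values list from above:

# dict comprehension: maps each qualifying value to its square
{x: x * x for x in values if x > 15}
# generator expression: same elements as the list comprehension, but produced lazily
(x * x for x in values if x > 15)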
Python supports tuples; Ruby doesn’t. In Ruby, you have to use arrays to simulate tuples.
Ruby supports switch/case statements; Python does not (Python 3.10 later added a similar match statement, but there is no classic switch).
Ruby supports the standard expr ? val1 : val2 ternary operator; Python does not (its equivalent is the conditional expression val1 if expr else val2).
Ruby supports only single inheritance. If you need to mimic multiple inheritance, you can define modules and use mix-ins to pull the module methods into classes. Python supports multiple inheritance rather than module mix-ins.
Python lambdas are limited to a single expression. Ruby blocks, which are kind of/sort of lambda functions, can be arbitrarily big. Because of this, Ruby code is typically written in a more functional style than Python code. For example, to loop over a list in Ruby, you typically do
collection.each do |value|
...
end
The block works very much like a function being passed to collection.each. If you were to do the same thing in Python, you’d have to define a named inner function and then pass that to the collection each method (if list supported this method):
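# A purely hypothetical sketch -- Python lists have no 'each' method, so imagine one existed:
def process_value(value):
    ...  # the body that would have been the Ruby block

collection.each(process_value)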
That doesn’t flow very nicely. So, typically the following non-functional approach would be used in Python:
for value in collection:
...
Using resources in a safe way is quite different between the two languages. Here, the problem is that you want to allocate some resource (open a file, obtain a database cursor, etc), perform some arbitrary operation on it, and then close it in a safe manner even if an exception occurs.
In Ruby, because blocks are so easy to use (see #9), you would typically code this pattern as a method that takes a block for the arbitrary operation to perform on the resource.
In Python, passing in a function for the arbitrary action is a little clunkier since you have to write a named, inner function (see #9). Instead, Python uses a with statement for safe resource handling. See How do I correctly clean up a Python object? for more details.
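A minimal sketch of the with pattern, using a file as the example resource (the filename is made up):

with open('data.txt') as f:
    for line in f:
        print(line.rstrip())  # stand-in for the arbitrary operation on the resource
# the file is guaranteed to be closed here, even if an exception was raised inside the block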
I’ve just spent a couple of months learning Python after 6 years of Ruby. There really was no great comparison out there for the two languages, so I decided to man up and write one myself. Now, it is mainly concerned with functional programming, but since you mention Ruby’s inject method, I’m guessing we’re on the same wavelength.
A couple of points that will get you moving in the right direction:
All the functional programming goodness you use in Ruby is in Python, and it’s even easier. For example, you can map over functions exactly as you’d expect:
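# A minimal illustration (not from the original post): mapping a named function over a list
def add_10(x):
    return x + 10

list(map(add_10, [1, 2, 3]))  # [11, 12, 13]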
Python doesn’t have a method that acts like each. Since you only use each for side effects, the equivalent in Python is the for loop:
for n in [1, 2, 3]:
print n
List comprehensions are great when a) you have to deal with functions and object collections together and b) when you need to iterate using multiple indexes. For example, to find all the palindromes in a string (assuming you have a function p() that returns true for palindromes), all you need is a single list comprehension:
s = 'string-with-palindromes-like-abbalabba'
l = len(s)
[s[x:y] for x in range(l) for y in range(x,l+1) if p(s[x:y])]
My suggestion: Don’t try to learn the differences. Learn how to approach the problem in Python. Just like there’s a Ruby approach to each problem (that works very well given the limitations and strengths of the language), there’s a Python approach to the problem. They are both different. To get the best out of each language, you really should learn the language itself, and not just the “translation” from one to the other.
Now, with that said, the differences will help you adapt faster and make one-off modifications to a Python program. And that’s fine as a start to get writing. But try to learn from other projects the why behind the architecture and design decisions rather than the how behind the semantics of the language…
Answer 3
I know little Ruby, but here are a few bullet points about the things you mentioned:
nil, the value indicating lack of a value, would be None (note that you check for it like x is None or x is not None, not with == – or by coercion to boolean, see next point).
None, zero-esque numbers (0, 0.0, 0j (complex number)) and empty collections ([], {}, set(), the empty string "", etc.) are considered falsy, everything else is considered truthy.
For side effects, (for-)loop explicitly. For generating a new bunch of stuff without side-effects, use list comprehensions (or their relatives – generator expressions for lazy one-time iterators, dict/set comprehensions for the said collections).
Concerning looping: you have for, which operates on an iterable (no counting), and while, which does what you would expect. The former is far more powerful, thanks to the extensive support for iterators. Nearly everything that can be an iterator instead of a list is an iterator (at least in Python 3 – in Python 2, you often have both and the default is a list, sadly). There are numerous tools for working with iterators – zip iterates any number of iterables in parallel, enumerate gives you (index, item) pairs (on any iterable, not just on lists), and you can even slice arbitrary (possibly large or infinite) iterables. I found that these make many looping tasks much simpler. Needless to say, they integrate just fine with list comprehensions, generator expressions, etc.
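A few small illustrations of those tools (the lists are made up):

from itertools import islice

names = ['ann', 'bob', 'cid']
ages = [30, 25, 41]

for name, age in zip(names, ages):        # iterate two iterables in parallel
    print(name, age)

for index, name in enumerate(names):      # (index, item) pairs for any iterable
    print(index, name)

first_two = list(islice(names, 2))        # slice an arbitrary (even infinite) iterable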
In Ruby, instance variables and methods are completely unrelated, except when you explicitly relate them with attr_accessor or something like that.
In Python, methods are just a special class of attribute: one that is executable.
So for example:
>>> class foo:
... x = 5
... def y(): pass
...
>>> f = foo()
>>> type(f.x)
<type 'int'>
>>> type(f.y)
<type 'instancemethod'>
That difference has a lot of implications, like for example that referring to f.y refers to the method object, rather than calling it. Also, as you can see, f.x is public by default, whereas in Ruby, instance variables are private by default.
For every python container C, the expectation is that
for item in C:
assert item in C
will pass just fine — wouldn’t you find it astonishing if one sense of in (the loop clause) had a completely different meaning from the other (the presence check)? I sure would! It naturally works that way for lists, sets, tuples, …
So, when C is a dictionary, if in were to yield key/value tuples in a for loop, then, by the principle of least astonishment, in would also have to take such a tuple as its left-hand operand in the containment check.
How useful would that be? Pretty useless indeed, basically making if (key, value) in C a synonym for if C.get(key) == value — which is a check I believe I may have performed, or wanted to perform, 100 times more rarely than what if k in C actually means, checking the presence of the key only and completely ignoring the value.
On the other hand, wanting to loop just on keys is quite common, e.g.:
for k in thedict:
thedict[k] += 1
having the value as well would not help particularly:
for k, v in thedict.items():
thedict[k] = v + 1
actually somewhat less clear and less concise. (Note that items was the original spelling of the “proper” methods to use to get key/value pairs: unfortunately that was back in the days when such accessors returned whole lists, so to support “just iterating” an alternative spelling had to be introduced, and iteritems it was — in Python 3, where backwards compatibility constraints with previous Python versions were much weakened, it became items again).
Answer 1
My guess: Using the full tuple would be more intuitive for looping, but perhaps less so for testing for membership using in.
if key in counts:
counts[key] += 1
else:
counts[key] = 1
That code wouldn’t really work if you had to specify both key and value for in. I am having a hard time imagining a use case where you’d check whether both the key AND value are in the dictionary. It is far more natural to test only the keys.
# When would you ever write a condition like this?
if (key, value) in dict:
Now it’s not necessary that the in operator and for ... in operate over the same items. Implementation-wise they are different operations (__contains__ vs. __iter__). But that little inconsistency would be somewhat confusing and, well, inconsistent.
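To make that concrete, here is a purely illustrative sketch of a container that implements the two protocols separately:

class KeysOnly:
    def __init__(self, data):
        self.data = data

    def __iter__(self):             # drives "for x in obj"
        return iter(self.data)

    def __contains__(self, key):    # drives "key in obj"
        return key in self.data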
Apache Thrift is a cross-language RPC option developed at Facebook. Works over sockets, function signatures are defined in text files in a language-independent way.
Since I’ve asked this question, I’ve started using python-symmetric-jsonrpc. It is quite good, can be used between Python and non-Python software, and follows the JSON-RPC standard. But it lacks some examples.
We are developing Versile Python (VPy), an implementation for python 2.6+ and 3.x of a new ORB/RPC framework. Functional AGPL dev releases for review and testing are available. VPy has native python capabilities similar to PyRo and RPyC via a general native objects layer (code example). The product is designed for platform-independent remote object interaction for implementations of Versile Platform.
Full disclosure: I work for the company developing VPy.
I would like to create a string buffer to do lots of processing and formatting, and finally write the buffer to a text file using C-style sprintf functionality in Python. Because of conditional statements, I can’t write them directly to the file.
Edit, to clarify my question: buf is a big buffer containing all these strings, which have been formatted using sprintf.
Going by your examples, buf will only contain current values, not older ones.
E.g. first I wrote A = something, B = something to buf, and later C = something was appended to the same buf; but in your Python answers buf contains only the last value, which is not what I want – I want buf to contain all the sprintfs I have done since the beginning, like in C.
Answer 0
Python has a % operator for this.
>>> a = 5
>>> b = "hello"
>>> buf = "A = %d\n , B = %s\n" % (a, b)
>>> print buf
A = 5
, B = hello
>>> c = 10
>>> buf = "C = %d\n" % c
>>> print buf
C = 10
See this reference for all supported format specifiers.
>>> import StringIO
>>> buf = StringIO.StringIO()
>>> buf.write("A = %d, B = %s\n" % (3, "bar"))
>>> buf.write("C=%d\n" % 5)
>>> print(buf.getvalue())
A = 3, B = bar
C=5
To insert into a very long string it is nice to use names for the different arguments, instead of hoping they are in the right positions. This also makes it easier to replace multiple recurrences.
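For instance, a small sketch with named placeholders (the names and values are made up):

params = {'a': 5, 'b': 'hello'}
buf = "A = %(a)d\n, B = %(b)s\n" % params            # %(name)d / %(name)s look values up by key
buf = "A = {a}\n, B = {b}\n".format(a=5, b='hello')  # str.format offers the same with {} placeholders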
This is probably the closest translation from your C code to Python code.
A = 1
B = "hello"
buf = "A = %d\n , B= %s\n" % (A, B)
c = 2
buf += "C=%d\n" % c
f = open('output.txt', 'w')
print >> f, buf
f.close()
The % operator in Python does almost exactly the same thing as C’s sprintf. You can also print the string to a file directly. If there are lots of these formatted string fragments involved, it might be wise to use a StringIO object to speed up processing time.
So instead of doing +=, do this:
import cStringIO
buf = cStringIO.StringIO()
...
print >> buf, "A = %d\n , B= %s\n" % (A, B)
...
print >> buf, "C=%d\n" % c
...
print >> f, buf.getvalue()
Two approaches are to write to a string buffer or to write lines to a list and join them later. I think the StringIO approach is more pythonic, but it didn’t work before Python 2.6.
from io import StringIO
with StringIO() as s:
print("Hello", file=s)
print("Goodbye", file=s)
# And later...
with open('myfile', 'w') as f:
f.write(s.getvalue())
You can also use these without a ContextManager (s = StringIO()). Currently, I’m using a context manager class with a print function. This fragment might be useful for inserting debugging output or handling odd paging requirements:
class Report:
... usual init/enter/exit
def print(self, *args, **kwargs):
with StringIO() as s:
print(*args, **kwargs, file=s)
out = s.getvalue()
... stuff with out
with Report() as r:
r.print(f"This is {datetime.date.today()}!", 'Yikes!', end=':')
I’m trying to do a “hello world” with new boto3 client for AWS.
The use-case I have is fairly simple: get object from S3 and save it to the file.
In boto 2.X I would do it like this:
import boto
key = boto.connect_s3().get_bucket('foo').get_key('foo')
key.get_contents_to_filename('/tmp/foo')
In boto3, I can’t find a clean way to do the same thing, so I’m manually iterating over the “Streaming” object:
import boto3
key = boto3.resource('s3').Object('fooo', 'docker/my-image.tar.gz').get()
with open('/tmp/my-image.tar.gz', 'w') as f:
chunk = key['Body'].read(1024*8)
while chunk:
f.write(chunk)
chunk = key['Body'].read(1024*8)
or
import boto3
key = boto3.resource('s3').Object('fooo', 'docker/my-image.tar.gz').get()
with open('/tmp/my-image.tar.gz', 'w') as f:
for chunk in iter(lambda: key['Body'].read(4096), b''):
f.write(chunk)
And it works fine. I was wondering is there any “native” boto3 function that will do the same task?
There is a customization that went into Boto3 recently which helps with this (among other things). It is currently exposed on the low-level S3 client, and can be used like this:
s3_client = boto3.client('s3')
open('hello.txt', 'w').write('Hello, world!')
# Upload the file to S3
s3_client.upload_file('hello.txt', 'MyBucket', 'hello-remote.txt')
# Download the file from S3
s3_client.download_file('MyBucket', 'hello-remote.txt', 'hello2.txt')
print(open('hello2.txt').read())
These functions will automatically handle reading/writing files as well as doing multipart uploads in parallel for large files.
Note that s3_client.download_file won’t create a directory. It can be created with pathlib.Path('/path/to/file.txt').parent.mkdir(parents=True, exist_ok=True).
This by itself isn’t tremendously better than the client in the accepted answer (although the docs say that it does a better job retrying uploads and downloads on failure) but considering that resources are generally more ergonomic (for example, the s3 bucket and object resources are nicer than the client methods) this does allow you to stay at the resource layer without having to drop down.
Resources generally can be created in the same way as clients, and they take all or most of the same arguments and just forward them to their internal clients.
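For example, a rough sketch of staying at the resource layer (bucket and key names are placeholders):

import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('MyBucket')
# upload_file/download_file are available on the Bucket and Object resources as well
bucket.upload_file('hello.txt', 'hello-remote.txt')
bucket.download_file('hello-remote.txt', 'hello2.txt')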
Answer 2
For those of you who would like to simulate the set_contents_from_string like boto2 methods, you can try
import boto3
from cStringIO import StringIO
s3c = boto3.client('s3')
contents = 'My string to save to S3 object'
target_bucket = 'hello-world.by.vor'
target_file = 'data/hello.txt'
fake_handle = StringIO(contents)
# notice if you do fake_handle.read() it reads like a file handle
s3c.put_object(Bucket=target_bucket, Key=target_file, Body=fake_handle.read())
try:
from StringIO import StringIO
except ImportError:
from io import StringIO
Answer 3
# Preface: File is json with contents: {'name': 'Android', 'status': 'ERROR'}
import boto3
import io
import json
s3 = boto3.resource('s3')
obj = s3.Object('my-bucket', 'key-to-file.json')
data = io.BytesIO()
obj.download_fileobj(data)
# object is now a bytes string, Converting it to a dict:
new_dict = json.loads(data.getvalue().decode("utf-8"))
print(new_dict['status'])
# Should print "Error"
When you want to read a file with a different configuration than the default one, feel free to use either mpu.aws.s3_download(s3path, destination) directly or the copy-pasted code:
import os
import boto3

def s3_download(source, destination,
exists_strategy='raise',
profile_name=None):
"""
Copy a file from an S3 source to a local destination.
Parameters
----------
source : str
Path starting with s3://, e.g. 's3://bucket-name/key/foo.bar'
destination : str
exists_strategy : {'raise', 'replace', 'abort'}
What is done when the destination already exists?
profile_name : str, optional
AWS profile
Raises
------
botocore.exceptions.NoCredentialsError
Botocore is not able to find your credentials. Either specify
profile_name or add the environment variables AWS_ACCESS_KEY_ID,
AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN.
See https://boto3.readthedocs.io/en/latest/guide/configuration.html
"""
exists_strategies = ['raise', 'replace', 'abort']
if exists_strategy not in exists_strategies:
raise ValueError('exists_strategy \'{}\' is not in {}'
.format(exists_strategy, exists_strategies))
session = boto3.Session(profile_name=profile_name)
s3 = session.resource('s3')
bucket_name, key = _s3_path_split(source)
    if os.path.isfile(destination):
        if exists_strategy == 'raise':
            raise RuntimeError('File \'{}\' already exists.'
                               .format(destination))
        elif exists_strategy == 'abort':
            return
s3.Bucket(bucket_name).download_file(key, destination)
from collections import namedtuple
S3Path = namedtuple("S3Path", ["bucket_name", "key"])
def _s3_path_split(s3_path):
"""
Split an S3 path into bucket and key.
Parameters
----------
s3_path : str
Returns
-------
splitted : (str, str)
(bucket, key)
Examples
--------
>>> _s3_path_split('s3://my-bucket/foo/bar.jpg')
S3Path(bucket_name='my-bucket', key='foo/bar.jpg')
"""
if not s3_path.startswith("s3://"):
raise ValueError(
"s3_path is expected to start with 's3://', " "but was {}"
.format(s3_path)
)
bucket_key = s3_path[len("s3://"):]
bucket_name, key = bucket_key.split("/", 1)
return S3Path(bucket_name, key)
Answer 5
Note: I assume you have already configured authentication separately. The code below downloads a single object from an S3 bucket.
import boto3

# initiate the S3 resource
s3 = boto3.resource('s3')
# download the object to a local file
s3.Bucket('mybucket').download_file('hello.txt', '/tmp/hello.txt')
I have manipulated some data using pandas and now I want to carry out a batch save back to the database. This requires me to convert the dataframe into an array of tuples, with each tuple corresponding to a “row” of the dataframe.
from simple_benchmark import BenchmarkBuilder
b = BenchmarkBuilder()

import pandas as pd
import numpy as np

def tuple_comp(df): return [tuple(x) for x in df.to_numpy()]
def iter_namedtuples(df): return list(df.itertuples(index=False))
def iter_tuples(df): return list(df.itertuples(index=False, name=None))
def records(df): return df.to_records(index=False).tolist()
def zipmap(df): return list(zip(*map(df.get, df)))

funcs = [tuple_comp, iter_namedtuples, iter_tuples, records, zipmap]
for func in funcs:
    b.add_function()(func)

def creator(n):
    return pd.DataFrame({"A": np.random.randint(n, size=n), "B": np.random.randint(n, size=n)})

@b.add_arguments('Rows in DataFrame')
def argument_provider():
    for n in (10 ** (np.arange(4, 11) / 2)).astype(int):
        yield n, creator(n)

r = b.run()
Check the results:
r.to_pandas_dataframe().pipe(lambda d: d.div(d.min(1),0))
        tuple_comp  iter_namedtuples  iter_tuples   records    zipmap
100       2.905662          6.626308     3.450741  1.469471  1.000000
316       4.612692          4.814433     2.375874  1.096352  1.000000
1000      6.513121          4.106426     1.958293  1.000000  1.316303
3162      8.446138          4.082161     1.808339  1.000000  1.533605
10000     8.424483          3.621461     1.651831  1.000000  1.558592
31622     7.813803          3.386592     1.586483  1.000000  1.515478
100000    7.050572          3.162426     1.499977  1.000000  1.480131
Motivation
Many data sets are large enough that we need to concern ourselves with speed/efficiency. So I offer this solution in that spirit. It happens to also be succinct.
For the sake of comparison, let’s drop the index column
It happens to also be flexible if we wanted to deal with a specific subset of columns. We’ll assume the columns we’ve already displayed are the subset we want.
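As a rough sketch of what that looks like (df and the column names are placeholders):

cols = ['A', 'B']                         # the subset of columns we care about
tuples = list(zip(*map(df.get, cols)))    # the same zip/map idea, restricted to those columns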
The idea of setting the datetime column as the index axis is to aid the conversion of the Timestamp value to its corresponding datetime.datetime equivalent by making use of the convert_datetime64 argument in DF.to_records, which does so for a DateTimeIndex dataframe.
This returns a recarray which could be then made to return a list using .tolist
More generalized solution depending on the use case would be:
df.to_records().tolist() # Supply index=False to exclude index
This answer doesn’t add anything that isn’t already discussed, but here are some speed results. I think this should resolve questions that came up in the comments. All of these look like they are O(n), based on these three values.
TL;DR: tuples = list(df.itertuples(index=False, name=None)) and tuples = list(zip(*[df[c].values.tolist() for c in df])) are tied for the fastest.
I did a quick speed test on results for three suggestions here:
The zip answer from @pirsquared: tuples = list(zip(*[df[c].values.tolist() for c in df]))
The accepted answer from @wes-mckinney: tuples = [tuple(x) for x in df.values]
The itertuples answer from @ksindi with the name=None suggestion from @Axel: tuples = list(df.itertuples(index=False, name=None))
from numpy import random
import pandas as pd
def create_random_df(n):
return pd.DataFrame({"A": random.randint(n, size=n), "B": random.randint(n, size=n)})
Small size:
df = create_random_df(10000)
%timeit tuples = list(zip(*[df[c].values.tolist() for c in df]))
%timeit tuples = [tuple(x) for x in df.values]
%timeit tuples = list(df.itertuples(index=False, name=None))
Gives:
1.66 ms ± 200 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
15.5 ms ± 1.52 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
1.74 ms ± 75.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Larger:
df = create_random_df(1000000)
%timeit tuples = list(zip(*[df[c].values.tolist() for c in df]))
%timeit tuples = [tuple(x) for x in df.values]
%timeit tuples = list(df.itertuples(index=False, name=None))
Gives:
202 ms ± 5.91 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
1.52 s ± 98.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
209 ms ± 11.8 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
As much patience as I have:
df = create_random_df(10000000)
%timeit tuples = list(zip(*[df[c].values.tolist() for c in df]))
%timeit tuples = [tuple(x) for x in df.values]
%timeit tuples = list(df.itertuples(index=False, name=None))
Gives:
1.78 s ± 118 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
15.4 s ± 222 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.68 s ± 96.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
The zip version and the itertuples version are within each other’s confidence intervals. I suspect that they are doing the same thing under the hood.
These speed tests are probably irrelevant though. Pushing the limits of my computer’s memory doesn’t take a huge amount of time, and you really shouldn’t be doing this on a large data set. Working with those tuples after doing this will end up being really inefficient. It’s unlikely to be a major bottleneck in your code, so just stick with the version you think is most readable.
Answer 7
# try this one:
tuples = list(zip(data_set["data_date"], data_set["data_1"], data_set["data_2"]))
print(tuples)
I’m using the Mock library to test my application, but I want to assert that some function was not called. Mock docs talk about methods like mock.assert_called_with and mock.assert_called_once_with, but I didn’t find anything like mock.assert_not_called or something related to verify mock was NOT called.
I could go with something like the following, though it doesn’t seem cool nor pythonic:
def test_something:
# some actions
with patch('something') as my_var:
try:
# args are not important. func should never be called in this test
my_var.assert_called_with(some, args)
except AssertionError:
pass # this error being raised means it's ok
# other stuff
Any ideas how to accomplish this?
Answer 0
This should work for your case:
assert not my_var.called, 'method should not have been called'
Sample:
>>> mock=Mock()
>>> mock.a()
<Mock name='mock.a()' id='4349129872'>
>>> assert not mock.b.called, 'b was called and should not have been'
>>> assert not mock.a.called, 'a was called and should not have been'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AssertionError: a was called and should not have been
You can check the called attribute, but if your assertion fails, the next thing you’ll want to know is something about the unexpected call, so you may as well arrange for that information to be displayed from the start. Using unittest, you can check the contents of call_args_list instead (assertItemsEqual was renamed assertCountEqual in Python 3):
self.assertItemsEqual(my_var.call_args_list, [])
When it fails, it gives a message like this:
AssertionError: Element counts were not equal:
First has 0, Second has 1: call('first argument', 4)
In your example we can simply assert that the mock_method.called property is False, which means that the method was not called.
import unittest
from unittest import mock
import my_module
class A(unittest.TestCase):
def setUp(self):
self.message = "Method should not be called. Called {times} times!"
@mock.patch("my_module.method_to_mock")
def test(self, mock_method):
my_module.method_to_mock()
self.assertFalse(mock_method.called,
self.message.format(times=mock_method.call_count))
Consuming a call object is easy, since you can compare it with a tuple of length 2 where the first component is a tuple containing all the positional arguments of the related call, while the second component is a dictionary of the keyword arguments.
>>> from unittest.mock import MagicMock
>>> m = MagicMock()
>>> m(42)
<MagicMock name='mock()' id='139675158423872'>
>>> ((42,),) in m.call_args_list
True
>>> m(42, foo='bar')
<MagicMock name='mock()' id='139675158423872'>
>>> ((42,), {'foo': 'bar'}) in m.call_args_list
True
>>> m(foo='bar')
<MagicMock name='mock()' id='139675158423872'>
>>> ((), {'foo': 'bar'}) in m.call_args_list
True
So, a way to address the specific problem of the OP is
def test_something():
with patch('something') as my_var:
assert ((some, args),) not in my_var.call_args_list
Note that this way, instead of just checking if a mocked callable has been called, via MagicMock.called, you can now check if it has been called with a specific set of arguments.
That’s useful. Say you want to test a function that takes a list and calls another function, compute(), for each value of the list, but only if it satisfies a specific condition.
You can now mock compute, and test if it has been called on some value but not on others.
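For instance, a sketch along those lines (mymodule, process and compute are hypothetical names):

from unittest.mock import patch
import mymodule  # hypothetical module defining process() and compute()

def test_compute_called_only_for_even_values():
    with patch('mymodule.compute') as mock_compute:
        mymodule.process([1, 2, 3])  # suppose process() calls compute() only on even values
        assert ((2,),) in mock_compute.call_args_list      # compute(2) happened
        assert ((1,),) not in mock_compute.call_args_list  # compute(1) never happened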
Here’s a relevant example from the itertools module docs:
import itertools
def pairwise(iterable):
"s -> (s0,s1), (s1,s2), (s2, s3), ..."
a, b = itertools.tee(iterable)
next(b, None)
return zip(a, b)
For Python 2, you need itertools.izip instead of zip:
import itertools
def pairwise(iterable):
"s -> (s0,s1), (s1,s2), (s2, s3), ..."
a, b = itertools.tee(iterable)
next(b, None)
return itertools.izip(a, b)
How this works:
First, two parallel iterators, a and b, are created (the tee() call), both pointing to the first element of the original iterable. The second iterator, b, is moved one step forward (the next(b, None) call). At this point a points to s0 and b points to s1. Both a and b can traverse the original iterator independently – the izip function takes the two iterators and makes pairs of the returned elements, advancing both iterators at the same pace.
One caveat: the tee() function produces two iterators that can advance independently of each other, but it comes at a cost. If one of the iterators advances further than the other, then tee() needs to keep the consumed elements in memory until the second iterator consumes them too (it cannot ‘rewind’ the original iterator). Here it doesn’t matter because one iterator is only one step ahead of the other, but in general it’s easy to use a lot of memory this way.
And since tee() can take an n parameter, this can also be used for more than two parallel iterators:
def threes(iterator):
"s -> (s0,s1,s2), (s1,s2,s3), (s2, s3,4), ..."
a, b, c = itertools.tee(iterator, 3)
next(b, None)
next(c, None)
next(c, None)
return zip(a, b, c)
Answer 1
Roll your own!
def pairwise(iterable):
    it = iter(iterable)
    a = next(it, None)
    for b in it:
        yield (a, b)
        a = b
Since the_list[1:] actually creates a copy of the whole list (excluding its first element), and zip() creates a list of tuples immediately when called, in total three copies of your list are created. If your list is very large, you might prefer
from itertools import izip, islice
for current_item, next_item in izip(the_list, islice(the_list, 1, None)):
print(current_item, next_item)
from more_itertools import pairwise

for current, nxt in pairwise(your_iterable):
    print(f'Current = {current}, next = {nxt}')
Docs for more-itertools
Under the hood this code is the same as that in the other answers, but I much prefer imports when available.
If you don’t already have it installed then:
pip install more-itertools
Example
For instance, if you had the Fibonacci sequence, you could calculate the ratios of subsequent pairs as:
from more_itertools import pairwise

fib = [1, 1, 2, 3, 5, 8, 13]
for current, nxt in pairwise(fib):
    ratio = current / nxt
    print(f'Current = {current}, next = {nxt}, ratio = {ratio}')
Answer 6
Pairing items from a list using a list comprehension:
the_list = [1, 2, 3, 4]
pairs = [[the_list[i], the_list[i + 1]] for i in range(len(the_list) - 1)]
for [current_item, next_item] in pairs:
print(current_item, next_item)
Output:
(1, 2)
(2, 3)
(3, 4)
Answer 7
I am really surprised nobody has mentioned the shorter, simpler and most importantly general solution:
Python 3:
from itertools import islice
def n_wise(iterable, n):
return zip(*(islice(iterable, i, None) for i in range(n)))
Python 2:
from itertools import izip, islice
def n_wise(iterable, n):
return izip(*(islice(iterable, i, None) for i in xrange(n)))
It works for pairwise iteration by passing n=2, but can handle any higher number:
>>> for a, b in n_wise('Hello!', 2):
>>> print(a, b)
H e
e l
l l
l o
o !
>>> for a, b, c, d in n_wise('Hello World!', 4):
>>> print(a, b, c, d)
H e l l
e l l o
l l o
l o W
o W o
W o r
W o r l
o r l d
r l d !
Answer 8
A basic solution:
def neighbors( list ):
i = 0
while i + 1 < len( list ):
yield ( list[ i ], list[ i + 1 ] )
i += 1
for ( x, y ) in neighbors( list ):
print( x, y )
Answer 9
code = '0016364ee0942aa7cc04a8189ef3'
# Getting the current and next item
print [code[idx]+code[idx+1] for idx in range(len(code)-1)]
# Getting the pair
print [code[idx*2]+code[idx*2+1] for idx in range(len(code)/2)]
I tried to use multiple assignment as shown below to initialize variables, but I got confused by the behavior: I expected to be able to reassign the lists separately, i.e. that b[0] and c[0] would still equal 0 as before.
a=b=c=[0,3,5]
a[0]=1
print(a)
print(b)
print(c)
Result is:
[1, 3, 5]
[1, 3, 5]
[1, 3, 5]
Is that correct? What should I use for multiple assignment?
And what is different from this?
If you’re coming to Python from a language in the C/Java/etc. family, it may help you to stop thinking about a as a “variable”, and start thinking of it as a “name”.
a, b, and c aren’t different variables with equal values; they’re different names for the same identical value. Variables have types, identities, addresses, and all kinds of stuff like that.
Names don’t have any of that. Values do, of course, and you can have lots of names for the same value.
If you give Notorious B.I.G. a hot dog,* Biggie Smalls and Chris Wallace have a hot dog. If you change the first element of a to 1, the first elements of b and c are 1.
If you want to know if two names are naming the same object, use the is operator:
>>> a=b=c=[0,3,5]
>>> a is b
True
You then ask:
what is different from this?
d=e=f=3
e=4
print('f:',f)
print('e:',e)
Here, you’re rebinding the name e to the value 4. That doesn’t affect the names d and f in any way.
In your previous version, you were assigning to a[0], not to a. So, from the point of view of a[0], you’re rebinding a[0], but from the point of view of a, you’re changing it in-place.
You can use the id function, which gives you some unique number representing the identity of an object, to see exactly which object is which even when is can’t help:
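(A session along these lines; the id values are reconstructed to match the discussion below and will differ on your machine.)

>>> a = b = c = [0, 3, 5]
>>> id(a[0]), id(b[0])
(4297261120, 4297261120)
>>> a[0] = 1
>>> id(a[0]), id(b[0])
(4297261216, 4297261216)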
Notice that a[0] has changed from 4297261120 to 4297261216—it’s now a name for a different value. And b[0] is also now a name for that same new value. That’s because a and b are still naming the same object.
Under the covers, a[0]=1 is actually calling a method on the list object. (It’s equivalent to a.__setitem__(0, 1).) So, it’s not really rebinding anything at all. It’s like calling my_object.set_something(1). Sure, likely the object is rebinding an instance attribute in order to implement this method, but that’s not what’s important; what’s important is that you’re not assigning anything, you’re just mutating the object. And it’s the same with a[0]=1.
user570826 asked:
What if we have, a = b = c = 10
That’s exactly the same situation as a = b = c = [1, 2, 3]: you have three names for the same value.
But in this case, the value is an int, and ints are immutable. In either case, you can rebind a to a different value (e.g., a = "Now I'm a string!"), but that won’t affect the original value, which b and c will still be names for. The difference is that with a list, you can change the value [1, 2, 3] into [1, 2, 3, 4] by doing, e.g., a.append(4); since that’s actually changing the value that b and c are names for, b will now be [1, 2, 3, 4]. There’s no way to change the value 10 into anything else. 10 is 10 forever, just like Claudia the vampire is 5 forever (at least until she’s replaced by Kirsten Dunst).
* Warning: Do not give Notorious B.I.G. a hot dog. Gangsta rap zombies should never be fed after midnight.
Answer 1
*cough*
>>> a,b,c = (1,2,3)
>>> a
1
>>> b
2
>>> c
3
>>> a,b,c = ({'test':'a'},{'test':'b'},{'test':'c'})
>>> a
{'test': 'a'}
>>> b
{'test': 'b'}
>>> c
{'test': 'c'}
>>>
Yes, that’s the expected behavior. a, b and c are all set as labels for the same list. If you want three different lists, you need to assign them individually. You can either repeat the explicit list, or use one of the numerous ways to copy a list:
b = a[:] # this does a shallow copy, which is good enough for this case
import copy
c = copy.deepcopy(a) # this does a deep copy, which matters if the list contains mutable objects
Assignment statements in Python do not copy objects – they bind the name to an object, and an object can have as many labels as you set. In your first edit, changing a[0], you’re updating one element of the single list that a, b, and c all refer to. In your second, changing e, you’re switching e to be a label for a different object (4 instead of 3).
In Python, everything is an object, even “simple” variable types (int, float, etc.).
When you change a variable’s value, you actually change its pointer, and when you compare two variables you compare their pointers.
(To be clear, a pointer is the address in physical computer memory where a variable is stored.)
As a result, when you change an inner value, you change it in memory, and that affects all the variables that point to this address.
For your example, when you do:
a = b = 5
This means that a and b point to the same address in memory, which contains the value 5, but when you do:
a = 6
it doesn’t affect b, because a now points to another memory location that contains 6, while b still points to the memory address that contains 5.
But, when you do:
a = b = [1,2,3]
a and b again point to the same location, but the difference is that if you change one of the list values:
a[0] = 2
it changes the value in the memory that a points to, but a still points to the same address as b, and as a result, b changes as well.
Answer 4
You can use id(name) to check whether two names represent the same object:
>>> a = b = c = [0, 3, 5]
>>> print(id(a), id(b), id(c))
46268488 46268488 46268488
>>> a = [1, 8, 5]
>>> print(id(a), id(b), id(c))
139423880 46268488 46268488
>>> print(a, b, c)
[1, 8, 5] [1, 3, 5] [1, 3, 5]
Integers are immutable, so you cannot change the value without creating a new object:
>>> x = y = z = 1
>>> print(id(x), id(y), id(z))
507081216 507081216 507081216
>>> x = 2
>>> print(id(x), id(y), id(z))
507081248 507081216 507081216
>>> print(x, y, z)
2 1 1
Simply put, in the first case, you are assigning multiple names to a list. Only one copy of list is created in memory and all names refer to that location. So changing the list using any of the names will actually modify the list in memory.
In the second case, multiple copies of same value are created in memory. So each copy is independent of one another.
Answer 7
What you need is:
a, b, c = [0,3,5] # Unpack the list, now a, b, and c are ints
a = 1 # `a` did equal 0, not [0,3,5]
print(a)
print(b)
print(c)
Answer 8
The code that does what I need might look like this:
# test
aux=[[0 for n in range(3)] for i in range(4)]
print('aux:',aux)
# initialization
a,b,c,d=[[0 for n in range(3)] for i in range(4)]
# changing values
a[0]=1
d[2]=5
print('a:',a)
print('b:',b)
print('c:',c)
print('d:',d)
I am plotting two similar trajectories in matplotlib and I’d like to plot each of the lines with partial transparency so that the red (plotted second) doesn’t obscure the blue.
It really depends on what functions you’re using to plot the lines, but check whether the one you’re using takes an alpha value and set it to something like 0.5. If that doesn’t work, try getting the line objects and setting their alpha values directly.
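For example, a minimal sketch with pyplot (the data is made up):

import matplotlib.pyplot as plt

x = list(range(10))
plt.plot(x, [v * 1.0 for v in x], color='blue', alpha=0.5)  # first trajectory, half transparent
plt.plot(x, [v * 1.1 for v in x], color='red', alpha=0.5)   # second trajectory drawn on top
# alternatively, grab existing line objects and set their alpha directly:
for line in plt.gca().get_lines():
    line.set_alpha(0.5)
plt.show()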