分类目录归档:知识问答

如何将函数应用于Pandas数据框的两列

问题:如何将函数应用于Pandas数据框的两列

假设我有一个df包含的列'ID', 'col_1', 'col_2'。我定义一个函数:

f = lambda x, y : my_function_expression

现在,我要应用fdf的两列'col_1', 'col_2',以逐元素的计算新列'col_3',有点像:

df['col_3'] = df[['col_1','col_2']].apply(f)  
# Pandas gives : TypeError: ('<lambda>() takes exactly 2 arguments (1 given)'

怎么做 ?

** 如下添加详细样本 ***

import pandas as pd

df = pd.DataFrame({'ID':['1','2','3'], 'col_1': [0,2,3], 'col_2':[1,4,5]})
mylist = ['a','b','c','d','e','f']

def get_sublist(sta,end):
    return mylist[sta:end+1]

#df['col_3'] = df[['col_1','col_2']].apply(get_sublist,axis=1)
# expect above to output df as below 

  ID  col_1  col_2            col_3
0  1      0      1       ['a', 'b']
1  2      2      4  ['c', 'd', 'e']
2  3      3      5  ['d', 'e', 'f']

Suppose I have a df which has columns of 'ID', 'col_1', 'col_2'. And I define a function :

f = lambda x, y : my_function_expression.

Now I want to apply the f to df‘s two columns 'col_1', 'col_2' to element-wise calculate a new column 'col_3' , somewhat like :

df['col_3'] = df[['col_1','col_2']].apply(f)  
# Pandas gives : TypeError: ('<lambda>() takes exactly 2 arguments (1 given)'

How to do ?

** Add detail sample as below ***

import pandas as pd

df = pd.DataFrame({'ID':['1','2','3'], 'col_1': [0,2,3], 'col_2':[1,4,5]})
mylist = ['a','b','c','d','e','f']

def get_sublist(sta,end):
    return mylist[sta:end+1]

#df['col_3'] = df[['col_1','col_2']].apply(get_sublist,axis=1)
# expect above to output df as below 

  ID  col_1  col_2            col_3
0  1      0      1       ['a', 'b']
1  2      2      4  ['c', 'd', 'e']
2  3      3      5  ['d', 'e', 'f']

回答 0

这是apply在数据框上使用的示例,我正在用进行调用axis = 1

请注意,区别在于,与其尝试将两个值传递给函数f,不如重写函数以接受pandas Series对象,然后对Series进行索引以获取所需的值。

In [49]: df
Out[49]: 
          0         1
0  1.000000  0.000000
1 -0.494375  0.570994
2  1.000000  0.000000
3  1.876360 -0.229738
4  1.000000  0.000000

In [50]: def f(x):    
   ....:  return x[0] + x[1]  
   ....:  

In [51]: df.apply(f, axis=1) #passes a Series object, row-wise
Out[51]: 
0    1.000000
1    0.076619
2    1.000000
3    1.646622
4    1.000000

根据您的用例,有时创建一个pandas group对象然后apply在组中使用会很有帮助。

Here’s an example using apply on the dataframe, which I am calling with axis = 1.

Note the difference is that instead of trying to pass two values to the function f, rewrite the function to accept a pandas Series object, and then index the Series to get the values needed.

In [49]: df
Out[49]: 
          0         1
0  1.000000  0.000000
1 -0.494375  0.570994
2  1.000000  0.000000
3  1.876360 -0.229738
4  1.000000  0.000000

In [50]: def f(x):    
   ....:  return x[0] + x[1]  
   ....:  

In [51]: df.apply(f, axis=1) #passes a Series object, row-wise
Out[51]: 
0    1.000000
1    0.076619
2    1.000000
3    1.646622
4    1.000000

Depending on your use case, it is sometimes helpful to create a pandas group object, and then use apply on the group.


回答 1

在Pandas中,有一种简单的方法可以做到这一点:

df['col_3'] = df.apply(lambda x: f(x.col_1, x.col_2), axis=1)

这允许 f成为具有多个输入值的用户定义函数,并使用(安全)列名而不是(不安全)数字索引来访问列。

数据示例(基于原始问题):

import pandas as pd

df = pd.DataFrame({'ID':['1', '2', '3'], 'col_1': [0, 2, 3], 'col_2':[1, 4, 5]})
mylist = ['a', 'b', 'c', 'd', 'e', 'f']

def get_sublist(sta,end):
    return mylist[sta:end+1]

df['col_3'] = df.apply(lambda x: get_sublist(x.col_1, x.col_2), axis=1)

输出print(df)

  ID  col_1  col_2      col_3
0  1      0      1     [a, b]
1  2      2      4  [c, d, e]
2  3      3      5  [d, e, f]

如果您的列名包含空格或与现有的dataframe属性共享名称,则可以使用方括号进行索引:

df['col_3'] = df.apply(lambda x: f(x['col 1'], x['col 2']), axis=1)

There is a clean, one-line way of doing this in Pandas:

df['col_3'] = df.apply(lambda x: f(x.col_1, x.col_2), axis=1)

This allows f to be a user-defined function with multiple input values, and uses (safe) column names rather than (unsafe) numeric indices to access the columns.

Example with data (based on original question):

import pandas as pd

df = pd.DataFrame({'ID':['1', '2', '3'], 'col_1': [0, 2, 3], 'col_2':[1, 4, 5]})
mylist = ['a', 'b', 'c', 'd', 'e', 'f']

def get_sublist(sta,end):
    return mylist[sta:end+1]

df['col_3'] = df.apply(lambda x: get_sublist(x.col_1, x.col_2), axis=1)

Output of print(df):

  ID  col_1  col_2      col_3
0  1      0      1     [a, b]
1  2      2      4  [c, d, e]
2  3      3      5  [d, e, f]

If your column names contain spaces or share a name with an existing dataframe attribute, you can index with square brackets:

df['col_3'] = df.apply(lambda x: f(x['col 1'], x['col 2']), axis=1)

回答 2

一个简单的解决方案是:

df['col_3'] = df[['col_1','col_2']].apply(lambda x: f(*x), axis=1)

A simple solution is:

df['col_3'] = df[['col_1','col_2']].apply(lambda x: f(*x), axis=1)

回答 3

一个有趣的问题!我的回答如下:

import pandas as pd

def sublst(row):
    return lst[row['J1']:row['J2']]

df = pd.DataFrame({'ID':['1','2','3'], 'J1': [0,2,3], 'J2':[1,4,5]})
print df
lst = ['a','b','c','d','e','f']

df['J3'] = df.apply(sublst,axis=1)
print df

输出:

  ID  J1  J2
0  1   0   1
1  2   2   4
2  3   3   5
  ID  J1  J2      J3
0  1   0   1     [a]
1  2   2   4  [c, d]
2  3   3   5  [d, e]

我将列名称更改为ID,J1,J2,J3以确保ID <J1 <J2 <J3,因此列以正确的顺序显示。

另一个简短的版本:

import pandas as pd

df = pd.DataFrame({'ID':['1','2','3'], 'J1': [0,2,3], 'J2':[1,4,5]})
print df
lst = ['a','b','c','d','e','f']

df['J3'] = df.apply(lambda row:lst[row['J1']:row['J2']],axis=1)
print df

A interesting question! my answer as below:

import pandas as pd

def sublst(row):
    return lst[row['J1']:row['J2']]

df = pd.DataFrame({'ID':['1','2','3'], 'J1': [0,2,3], 'J2':[1,4,5]})
print df
lst = ['a','b','c','d','e','f']

df['J3'] = df.apply(sublst,axis=1)
print df

Output:

  ID  J1  J2
0  1   0   1
1  2   2   4
2  3   3   5
  ID  J1  J2      J3
0  1   0   1     [a]
1  2   2   4  [c, d]
2  3   3   5  [d, e]

I changed the column name to ID,J1,J2,J3 to ensure ID < J1 < J2 < J3, so the column display in right sequence.

One more brief version:

import pandas as pd

df = pd.DataFrame({'ID':['1','2','3'], 'J1': [0,2,3], 'J2':[1,4,5]})
print df
lst = ['a','b','c','d','e','f']

df['J3'] = df.apply(lambda row:lst[row['J1']:row['J2']],axis=1)
print df

回答 4

您正在寻找的方法是Series.combine。但是,似乎必须谨慎处理数据类型。在您的示例中,您会(就像我在测试答案时所做的那样)天真地调用

df['col_3'] = df.col_1.combine(df.col_2, func=get_sublist)

但是,这将引发错误:

ValueError: setting an array element with a sequence.

我最好的猜测是,似乎期望结果与调用该方法的系列(此处为df.col_1)具有相同的类型。但是,以下工作原理:

df['col_3'] = df.col_1.astype(object).combine(df.col_2, func=get_sublist)

df

   ID   col_1   col_2   col_3
0   1   0   1   [a, b]
1   2   2   4   [c, d, e]
2   3   3   5   [d, e, f]

The method you are looking for is Series.combine. However, it seems some care has to be taken around datatypes. In your example, you would (as I did when testing the answer) naively call

df['col_3'] = df.col_1.combine(df.col_2, func=get_sublist)

However, this throws the error:

ValueError: setting an array element with a sequence.

My best guess is that it seems to expect the result to be of the same type as the series calling the method (df.col_1 here). However, the following works:

df['col_3'] = df.col_1.astype(object).combine(df.col_2, func=get_sublist)

df

   ID   col_1   col_2   col_3
0   1   0   1   [a, b]
1   2   2   4   [c, d, e]
2   3   3   5   [d, e, f]

回答 5

您的书写方式需要两个输入。如果查看错误消息,它表示您没有为f提供两个输入,仅一个。错误消息是正确的。
之所以不匹配,是因为df [[”col1’,’col2′]]返回一个包含两列而不是两列的单个数据帧。

你需要改变你的f,将它需要一个输入,保持上述数据帧作为输入,然后把它分解成X,Y 内部函数体。然后执行所需的任何操作并返回一个值。

您需要此函数签名,因为语法为.apply(f),因此f需要采用单个对象=数据帧,而不是当前f所期望的两件事。

由于您尚未提供f的正文,因此我无济于事-但这应提供解决方法,而无需从根本上更改您的代码或使用其他方法(而不是应用)

The way you have written f it needs two inputs. If you look at the error message it says you are not providing two inputs to f, just one. The error message is correct.
The mismatch is because df[[‘col1′,’col2’]] returns a single dataframe with two columns, not two separate columns.

You need to change your f so that it takes a single input, keep the above data frame as input, then break it up into x,y inside the function body. Then do whatever you need and return a single value.

You need this function signature because the syntax is .apply(f) So f needs to take the single thing = dataframe and not two things which is what your current f expects.

Since you haven’t provided the body of f I can’t help in anymore detail – but this should provide the way out without fundamentally changing your code or using some other methods rather than apply


回答 6

我将对np.vectorize进行投票。它允许您仅拍摄x列数,而不处理函数中的数据框,因此非常适合您无法控制的函数或执行向函数发送2列和常量之类的操作(例如col_1,col_2, ‘foo’)。

import numpy as np
import pandas as pd

df = pd.DataFrame({'ID':['1','2','3'], 'col_1': [0,2,3], 'col_2':[1,4,5]})
mylist = ['a','b','c','d','e','f']

def get_sublist(sta,end):
    return mylist[sta:end+1]

#df['col_3'] = df[['col_1','col_2']].apply(get_sublist,axis=1)
# expect above to output df as below 

df.loc[:,'col_3'] = np.vectorize(get_sublist, otypes=["O"]) (df['col_1'], df['col_2'])


df

ID  col_1   col_2   col_3
0   1   0   1   [a, b]
1   2   2   4   [c, d, e]
2   3   3   5   [d, e, f]

I’m going to put in a vote for np.vectorize. It allows you to just shoot over x number of columns and not deal with the dataframe in the function, so it’s great for functions you don’t control or doing something like sending 2 columns and a constant into a function (i.e. col_1, col_2, ‘foo’).

import numpy as np
import pandas as pd

df = pd.DataFrame({'ID':['1','2','3'], 'col_1': [0,2,3], 'col_2':[1,4,5]})
mylist = ['a','b','c','d','e','f']

def get_sublist(sta,end):
    return mylist[sta:end+1]

#df['col_3'] = df[['col_1','col_2']].apply(get_sublist,axis=1)
# expect above to output df as below 

df.loc[:,'col_3'] = np.vectorize(get_sublist, otypes=["O"]) (df['col_1'], df['col_2'])


df

ID  col_1   col_2   col_3
0   1   0   1   [a, b]
1   2   2   4   [c, d, e]
2   3   3   5   [d, e, f]

回答 7

从中返回列表apply是危险的操作,因为不能保证结果对象不是Series还是DataFrame。在某些情况下可能会引发exceptions情况。让我们来看一个简单的例子:

df = pd.DataFrame(data=np.random.randint(0, 5, (5,3)),
                  columns=['a', 'b', 'c'])
df
   a  b  c
0  4  0  0
1  2  0  1
2  2  2  2
3  1  2  2
4  3  0  0

从以下位置返回列表可能会导致三种结果 apply

1)如果返回列表的长度不等于列数,则返回一系列列表。

df.apply(lambda x: list(range(2)), axis=1)  # returns a Series
0    [0, 1]
1    [0, 1]
2    [0, 1]
3    [0, 1]
4    [0, 1]
dtype: object

2)当返回列表的长度等于列数时,则返回一个DataFrame,并且每一列都获得列表中的相应值。

df.apply(lambda x: list(range(3)), axis=1) # returns a DataFrame
   a  b  c
0  0  1  2
1  0  1  2
2  0  1  2
3  0  1  2
4  0  1  2

3)如果返回列表的长度等于第一行的列数,但至少具有一行,其中列表的元素数与列数不同,则引发ValueError。

i = 0
def f(x):
    global i
    if i == 0:
        i += 1
        return list(range(3))
    return list(range(4))

df.apply(f, axis=1) 
ValueError: Shape of passed values is (5, 4), indices imply (5, 3)

不申请就回答问题

apply与axis = 1一起使用非常慢。使用基本的迭代方法可能会获得更好的性能(尤其是在较大的数据集上)。

创建更大的数据框

df1 = df.sample(100000, replace=True).reset_index(drop=True)

时机

# apply is slow with axis=1
%timeit df1.apply(lambda x: mylist[x['col_1']: x['col_2']+1], axis=1)
2.59 s ± 76.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# zip - similar to @Thomas
%timeit [mylist[v1:v2+1] for v1, v2 in zip(df1.col_1, df1.col_2)]  
29.5 ms ± 534 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

@托马斯回答

%timeit list(map(get_sublist, df1['col_1'],df1['col_2']))
34 ms ± 459 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Returning a list from apply is a dangerous operation as the resulting object is not guaranteed to be either a Series or a DataFrame. And exceptions might be raised in certain cases. Let’s walk through a simple example:

df = pd.DataFrame(data=np.random.randint(0, 5, (5,3)),
                  columns=['a', 'b', 'c'])
df
   a  b  c
0  4  0  0
1  2  0  1
2  2  2  2
3  1  2  2
4  3  0  0

There are three possible outcomes with returning a list from apply

1) If the length of the returned list is not equal to the number of columns, then a Series of lists is returned.

df.apply(lambda x: list(range(2)), axis=1)  # returns a Series
0    [0, 1]
1    [0, 1]
2    [0, 1]
3    [0, 1]
4    [0, 1]
dtype: object

2) When the length of the returned list is equal to the number of columns then a DataFrame is returned and each column gets the corresponding value in the list.

df.apply(lambda x: list(range(3)), axis=1) # returns a DataFrame
   a  b  c
0  0  1  2
1  0  1  2
2  0  1  2
3  0  1  2
4  0  1  2

3) If the length of the returned list equals the number of columns for the first row but has at least one row where the list has a different number of elements than number of columns a ValueError is raised.

i = 0
def f(x):
    global i
    if i == 0:
        i += 1
        return list(range(3))
    return list(range(4))

df.apply(f, axis=1) 
ValueError: Shape of passed values is (5, 4), indices imply (5, 3)

Answering the problem without apply

Using apply with axis=1 is very slow. It is possible to get much better performance (especially on larger datasets) with basic iterative methods.

Create larger dataframe

df1 = df.sample(100000, replace=True).reset_index(drop=True)

Timings

# apply is slow with axis=1
%timeit df1.apply(lambda x: mylist[x['col_1']: x['col_2']+1], axis=1)
2.59 s ± 76.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# zip - similar to @Thomas
%timeit [mylist[v1:v2+1] for v1, v2 in zip(df1.col_1, df1.col_2)]  
29.5 ms ± 534 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

@Thomas answer

%timeit list(map(get_sublist, df1['col_1'],df1['col_2']))
34 ms ± 459 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

回答 8

我敢肯定这不如使用Pandas或Numpy操作的解决方案快,但是如果您不想重写函数,则可以使用map。使用原始示例数据-

import pandas as pd

df = pd.DataFrame({'ID':['1','2','3'], 'col_1': [0,2,3], 'col_2':[1,4,5]})
mylist = ['a','b','c','d','e','f']

def get_sublist(sta,end):
    return mylist[sta:end+1]

df['col_3'] = list(map(get_sublist,df['col_1'],df['col_2']))
#In Python 2 don't convert above to list

我们可以通过这种方式将任意数量的参数传递给函数。输出就是我们想要的

ID  col_1  col_2      col_3
0  1      0      1     [a, b]
1  2      2      4  [c, d, e]
2  3      3      5  [d, e, f]

I’m sure this isn’t as fast as the solutions using Pandas or Numpy operations, but if you don’t want to rewrite your function you can use map. Using the original example data –

import pandas as pd

df = pd.DataFrame({'ID':['1','2','3'], 'col_1': [0,2,3], 'col_2':[1,4,5]})
mylist = ['a','b','c','d','e','f']

def get_sublist(sta,end):
    return mylist[sta:end+1]

df['col_3'] = list(map(get_sublist,df['col_1'],df['col_2']))
#In Python 2 don't convert above to list

We could pass as many arguments as we wanted into the function this way. The output is what we wanted

ID  col_1  col_2      col_3
0  1      0      1     [a, b]
1  2      2      4  [c, d, e]
2  3      3      5  [d, e, f]

回答 9

我的问题示例:

def get_sublist(row, col1, col2):
    return mylist[row[col1]:row[col2]+1]
df.apply(get_sublist, axis=1, col1='col_1', col2='col_2')

My example to your questions:

def get_sublist(row, col1, col2):
    return mylist[row[col1]:row[col2]+1]
df.apply(get_sublist, axis=1, col1='col_1', col2='col_2')

回答 10

如果您有庞大的数据集,则可以使用简单但更快的(执行时间)方式使用swifter:

import pandas as pd
import swifter

def fnc(m,x,c):
    return m*x+c

df = pd.DataFrame({"m": [1,2,3,4,5,6], "c": [1,1,1,1,1,1], "x":[5,3,6,2,6,1]})
df["y"] = df.swifter.apply(lambda x: fnc(x.m, x.x, x.c), axis=1)

If you have a huge data-set, then you can use an easy but faster(execution time) way of doing this using swifter:

import pandas as pd
import swifter

def fnc(m,x,c):
    return m*x+c

df = pd.DataFrame({"m": [1,2,3,4,5,6], "c": [1,1,1,1,1,1], "x":[5,3,6,2,6,1]})
df["y"] = df.swifter.apply(lambda x: fnc(x.m, x.x, x.c), axis=1)

回答 11

我想您不想更改get_sublist功能,而只想使用DataFrame的apply方法来完成这项工作。为了获得所需的结果,我编写了两个帮助函数:get_sublist_listunlist。顾名思义,首先获取子列表,然后从列表中提取该子列表。最后,我们需要调用apply函数以df[['col_1','col_2']]随后将这两个函数应用于DataFrame。

import pandas as pd

df = pd.DataFrame({'ID':['1','2','3'], 'col_1': [0,2,3], 'col_2':[1,4,5]})
mylist = ['a','b','c','d','e','f']

def get_sublist(sta,end):
    return mylist[sta:end+1]

def get_sublist_list(cols):
    return [get_sublist(cols[0],cols[1])]

def unlist(list_of_lists):
    return list_of_lists[0]

df['col_3'] = df[['col_1','col_2']].apply(get_sublist_list,axis=1).apply(unlist)

df

如果不使用[]get_sublist函数,则该get_sublist_list函数将返回一个纯列表,它会引发ValueError: could not broadcast input array from shape (3) into shape (2)@Ted Petrou提到的列表。

I suppose you don’t want to change get_sublist function, and just want to use DataFrame’s apply method to do the job. To get the result you want, I’ve wrote two help functions: get_sublist_list and unlist. As the function name suggest, first get the list of sublist, second extract that sublist from that list. Finally, We need to call apply function to apply those two functions to the df[['col_1','col_2']] DataFrame subsequently.

import pandas as pd

df = pd.DataFrame({'ID':['1','2','3'], 'col_1': [0,2,3], 'col_2':[1,4,5]})
mylist = ['a','b','c','d','e','f']

def get_sublist(sta,end):
    return mylist[sta:end+1]

def get_sublist_list(cols):
    return [get_sublist(cols[0],cols[1])]

def unlist(list_of_lists):
    return list_of_lists[0]

df['col_3'] = df[['col_1','col_2']].apply(get_sublist_list,axis=1).apply(unlist)

df

If you don’t use [] to enclose the get_sublist function, then the get_sublist_list function will return a plain list, it’ll raise ValueError: could not broadcast input array from shape (3) into shape (2), as @Ted Petrou had mentioned.


更改字典中键的名称

问题:更改字典中键的名称

我想更改Python字典中条目的键。

有没有简单的方法可以做到这一点?

I want to change the key of an entry in a Python dictionary.

Is there a straightforward way to do this?


回答 0

只需2个步骤即可轻松完成:

dictionary[new_key] = dictionary[old_key]
del dictionary[old_key]

或第一步

dictionary[new_key] = dictionary.pop(old_key)

KeyError如果dictionary[old_key]未定义,它将引发。请注意,这删除dictionary[old_key]

>>> dictionary = { 1: 'one', 2:'two', 3:'three' }
>>> dictionary['ONE'] = dictionary.pop(1)
>>> dictionary
{2: 'two', 3: 'three', 'ONE': 'one'}
>>> dictionary['ONE'] = dictionary.pop(1)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
KeyError: 1

Easily done in 2 steps:

dictionary[new_key] = dictionary[old_key]
del dictionary[old_key]

Or in 1 step:

dictionary[new_key] = dictionary.pop(old_key)

which will raise KeyError if dictionary[old_key] is undefined. Note that this will delete dictionary[old_key].

>>> dictionary = { 1: 'one', 2:'two', 3:'three' }
>>> dictionary['ONE'] = dictionary.pop(1)
>>> dictionary
{2: 'two', 3: 'three', 'ONE': 'one'}
>>> dictionary['ONE'] = dictionary.pop(1)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
KeyError: 1

回答 1

如果要更改所有键:

d = {'x':1, 'y':2, 'z':3}
d1 = {'x':'a', 'y':'b', 'z':'c'}

In [10]: dict((d1[key], value) for (key, value) in d.items())
Out[10]: {'a': 1, 'b': 2, 'c': 3}

如果要更改单个键:可以采用上述任何建议。

if you want to change all the keys:

d = {'x':1, 'y':2, 'z':3}
d1 = {'x':'a', 'y':'b', 'z':'c'}

In [10]: dict((d1[key], value) for (key, value) in d.items())
Out[10]: {'a': 1, 'b': 2, 'c': 3}

if you want to change single key: You can go with any of the above suggestion.


回答 2

流行的

>>>a = {1:2, 3:4}
>>>a[5] = a.pop(1)
>>>a
{3: 4, 5: 2}
>>> 

pop’n’fresh

>>>a = {1:2, 3:4}
>>>a[5] = a.pop(1)
>>>a
{3: 4, 5: 2}
>>> 

回答 3

在python 2.7及更高版本中,您可以使用字典理解:这是我在使用DictReader读取CSV时遇到的示例。用户已在所有列名后添加“:”

ori_dict = {'key1:' : 1, 'key2:' : 2, 'key3:' : 3}

摆脱键后面的“:”:

corrected_dict = { k.replace(':', ''): v for k, v in ori_dict.items() }

In python 2.7 and higher, you can use dictionary comprehension: This is an example I encountered while reading a CSV using a DictReader. The user had suffixed all the column names with ‘:’

ori_dict = {'key1:' : 1, 'key2:' : 2, 'key3:' : 3}

to get rid of the trailing ‘:’ in the keys:

corrected_dict = { k.replace(':', ''): v for k, v in ori_dict.items() }


回答 4

由于键是字典用于查找值的对象,因此您无法真正更改它们。您可以做的最接近的操作是保存与旧密钥关联的值,将其删除,然后添加带有替换密钥和已保存值的新条目。其他几个答案说明了可以完成此操作的不同方式。

Since keys are what dictionaries use to lookup values, you can’t really change them. The closest thing you can do is to save the value associated with the old key, delete it, then add a new entry with the replacement key and the saved value. Several of the other answers illustrate different ways this can be accomplished.


回答 5

如果您有复杂的字典,则表示该字典中有一个字典或列表:

myDict = {1:"one",2:{3:"three",4:"four"}}
myDict[2][5] = myDict[2].pop(4)
print myDict

Output
{1: 'one', 2: {3: 'three', 5: 'four'}}

If you have a complex dict, it means there is a dict or list within the dict:

myDict = {1:"one",2:{3:"three",4:"four"}}
myDict[2][5] = myDict[2].pop(4)
print myDict

Output
{1: 'one', 2: {3: 'three', 5: 'four'}}

回答 6

没有直接的方法可以执行此操作,但是您可以删除然后分配

d = {1:2,3:4}

d[newKey] = d[1]
del d[1]

或进行批量密钥更改:

d = dict((changeKey(k), v) for k, v in d.items())

No direct way to do this, but you can delete-then-assign

d = {1:2,3:4}

d[newKey] = d[1]
del d[1]

or do mass key changes:

d = dict((changeKey(k), v) for k, v in d.items())

回答 7

转换字典中的所有键

假设这是您的字典:

>>> sample = {'person-id': '3', 'person-name': 'Bob'}

要将所有破折号转换为示例字典键中的下划线:

>>> sample = {key.replace('-', '_'): sample.pop(key) for key in sample.keys()}
>>> sample
>>> {'person_id': '3', 'person_name': 'Bob'}

To convert all the keys in the dictionary

Suppose this is your dictionary:

>>> sample = {'person-id': '3', 'person-name': 'Bob'}

To convert all the dashes to underscores in the sample dictionary key:

>>> sample = {key.replace('-', '_'): sample.pop(key) for key in sample.keys()}
>>> sample
>>> {'person_id': '3', 'person_name': 'Bob'}

回答 8

d = {1:2,3:4}

假设我们想将键更改为列表元素p = [‘a’,’b’]。以下代码将执行以下操作:

d=dict(zip(p,list(d.values()))) 

我们得到

{'a': 2, 'b': 4}
d = {1:2,3:4}

suppose that we want to change the keys to the list elements p=[‘a’ , ‘b’]. the following code will do:

d=dict(zip(p,list(d.values()))) 

and we get

{'a': 2, 'b': 4}

回答 9

您可以将相同的值与许多键相关联,或者只删除一个键并重新添加具有相同值的新键。

例如,如果您有键->值:

red->1
blue->2
green->4

没有理由您无法添加purple->2或删除red->1并添加orange->1

You can associate the same value with many keys, or just remove a key and re-add a new key with the same value.

For example, if you have keys->values:

red->1
blue->2
green->4

there’s no reason you can’t add purple->2 or remove red->1 and add orange->1


回答 10

如果一次更改所有按键。在这里,我将阻止所有键。

a = {'making' : 1, 'jumping' : 2, 'climbing' : 1, 'running' : 2}
b = {ps.stem(w) : a[w] for w in a.keys()}
print(b)
>>> {'climb': 1, 'jump': 2, 'make': 1, 'run': 2} #output

In case of changing all the keys at once. Here I am stemming the keys.

a = {'making' : 1, 'jumping' : 2, 'climbing' : 1, 'running' : 2}
b = {ps.stem(w) : a[w] for w in a.keys()}
print(b)
>>> {'climb': 1, 'jump': 2, 'make': 1, 'run': 2} #output

回答 11

我还没有看到这个确切的答案:

dict['key'] = value

您甚至可以对对象属性执行此操作。通过执行以下操作使它们成为字典:

dict = vars(obj)

然后,您可以像字典一样操作对象属性:

dict['attribute'] = value

I haven’t seen this exact answer:

dict['key'] = value

You can even do this to object attributes. Make them into a dictionary by doing this:

dict = vars(obj)

Then you can manipulate the object attributes like you would a dictionary:

dict['attribute'] = value

块数组尺寸

问题:块数组尺寸

我目前正在尝试学习Numpy和Python。给定以下数组:

import numpy as np
a = np.array([[1,2],[1,2]])

有没有返回尺寸的函数a(ega是2 x 2数组)?

size() 返回4并没有太大帮助。

I’m currently trying to learn Numpy and Python. Given the following array:

import numpy as np
a = np.array([[1,2],[1,2]])

Is there a function that returns the dimensions of a (e.g.a is a 2 by 2 array)?

size() returns 4 and that doesn’t help very much.


回答 0

.shape

ndarray。 数组尺寸的形状
元组。

从而:

>>> a.shape
(2, 2)

It is .shape:

ndarray.shape
Tuple of array dimensions.

Thus:

>>> a.shape
(2, 2)

回答 1

第一:

按照惯例,在Python世界中,的快捷方式numpynp,因此:

In [1]: import numpy as np

In [2]: a = np.array([[1,2],[3,4]])

第二:

在Numpy中,维度轴/轴形状是相关的,有时是相似的概念:

尺寸

在“ 数学/物理学”中,维或维数被非正式地定义为指定空间中任何点所需的最小坐标数。但在numpy的,根据numpy的文档,这是相同的轴线/轴:

在Numpy中,尺寸称为轴。轴数为等级。

In [3]: a.ndim  # num of dimensions/axes, *Mathematics definition of dimension*
Out[3]: 2

轴/轴

在Numpy中索引an 的第n个坐标array。多维数组每个轴可以有一个索引。

In [4]: a[1,0]  # to index `a`, we specific 1 at the first axis and 0 at the second axis.
Out[4]: 3  # which results in 3 (locate at the row 1 and column 0, 0-based index)

形状

描述沿每个可用轴有多少数据(或范围)。

In [5]: a.shape
Out[5]: (2, 2)  # both the first and second axis have 2 (columns/rows/pages/blocks/...) data

First:

By convention, in Python world, the shortcut for numpy is np, so:

In [1]: import numpy as np

In [2]: a = np.array([[1,2],[3,4]])

Second:

In Numpy, dimension, axis/axes, shape are related and sometimes similar concepts:

dimension

In Mathematics/Physics, dimension or dimensionality is informally defined as the minimum number of coordinates needed to specify any point within a space. But in Numpy, according to the numpy doc, it’s the same as axis/axes:

In Numpy dimensions are called axes. The number of axes is rank.

In [3]: a.ndim  # num of dimensions/axes, *Mathematics definition of dimension*
Out[3]: 2

axis/axes

the nth coordinate to index an array in Numpy. And multidimensional arrays can have one index per axis.

In [4]: a[1,0]  # to index `a`, we specific 1 at the first axis and 0 at the second axis.
Out[4]: 3  # which results in 3 (locate at the row 1 and column 0, 0-based index)

shape

describes how many data (or the range) along each available axis.

In [5]: a.shape
Out[5]: (2, 2)  # both the first and second axis have 2 (columns/rows/pages/blocks/...) data

回答 2

import numpy as np   
>>> np.shape(a)
(2,2)

如果输入不是numpy数组而是列表列表,则也可以使用

>>> a = [[1,2],[1,2]]
>>> np.shape(a)
(2,2)

或元组的元组

>>> a = ((1,2),(1,2))
>>> np.shape(a)
(2,2)
import numpy as np   
>>> np.shape(a)
(2,2)

Also works if the input is not a numpy array but a list of lists

>>> a = [[1,2],[1,2]]
>>> np.shape(a)
(2,2)

Or a tuple of tuples

>>> a = ((1,2),(1,2))
>>> np.shape(a)
(2,2)

回答 3

您可以使用.shape

In: a = np.array([[1,2,3],[4,5,6]])
In: a.shape
Out: (2, 3)
In: a.shape[0] # x axis
Out: 2
In: a.shape[1] # y axis
Out: 3

You can use .shape

In: a = np.array([[1,2,3],[4,5,6]])
In: a.shape
Out: (2, 3)
In: a.shape[0] # x axis
Out: 2
In: a.shape[1] # y axis
Out: 3

回答 4

您可以使用.ndim尺寸并.shape知道确切尺寸

var = np.array([[1,2,3,4,5,6], [1,2,3,4,5,6]])

var.ndim
# displays 2

var.shape
# display 6, 2

您可以使用.reshape功能更改尺寸

var = np.array([[1,2,3,4,5,6], [1,2,3,4,5,6]]).reshape(3,4)

var.ndim
#display 2

var.shape
#display 3, 4

You can use .ndim for dimension and .shape to know the exact dimension

var = np.array([[1,2,3,4,5,6], [1,2,3,4,5,6]])

var.ndim
# displays 2

var.shape
# display 6, 2

You can change the dimension using .reshape function

var = np.array([[1,2,3,4,5,6], [1,2,3,4,5,6]]).reshape(3,4)

var.ndim
#display 2

var.shape
#display 3, 4

回答 5

shape方法要求它a是一个Numpy ndarray。但是Numpy还可以计算纯python对象的可迭代对象的形状:

np.shape([[1,2],[1,2]])

The shape method requires that a be a Numpy ndarray. But Numpy can also calculate the shape of iterables of pure python objects:

np.shape([[1,2],[1,2]])

回答 6

a.shape只是的受限版本np.info()。看一下这个:

import numpy as np
a = np.array([[1,2],[1,2]])
np.info(a)

class:  ndarray
shape:  (2, 2)
strides:  (8, 4)
itemsize:  4
aligned:  True
contiguous:  True
fortran:  False
data pointer: 0x27509cf0560
byteorder:  little
byteswap:  False
type: int32

a.shape is just a limited version of np.info(). Check this out:

import numpy as np
a = np.array([[1,2],[1,2]])
np.info(a)

Out

class:  ndarray
shape:  (2, 2)
strides:  (8, 4)
itemsize:  4
aligned:  True
contiguous:  True
fortran:  False
data pointer: 0x27509cf0560
byteorder:  little
byteswap:  False
type: int32

如何禁用请求库中的日志消息?

问题:如何禁用请求库中的日志消息?

默认情况下,Requests python库按照以下方式将日志消息写入控制台:

Starting new HTTP connection (1): example.com
http://example.com:80 "GET / HTTP/1.1" 200 606

我通常对这些消息不感兴趣,因此想禁用它们。使这些消息静音或降低请求的详细程度的最佳方法是什么?

By default, the Requests python library writes log messages to the console, along the lines of:

Starting new HTTP connection (1): example.com
http://example.com:80 "GET / HTTP/1.1" 200 606

I’m usually not interested in these messages, and would like to disable them. What would be the best way to silence those messages or decrease Requests’ verbosity?


回答 0

我发现了如何配置请求的日志记录级别,这是通过标准的日志记录模块完成的。我决定将其配置为不记录消息,除非它们至少是警告:

import logging

logging.getLogger("requests").setLevel(logging.WARNING)

如果您也希望将此设置应用于urllib3库(通常由请求使用),请添加以下内容:

logging.getLogger("urllib3").setLevel(logging.WARNING)

I found out how to configure requests‘s logging level, it’s done via the standard logging module. I decided to configure it to not log messages unless they are at least warnings:

import logging

logging.getLogger("requests").setLevel(logging.WARNING)

If you wish to apply this setting for the urllib3 library (typically used by requests) too, add the following:

logging.getLogger("urllib3").setLevel(logging.WARNING)

回答 1

如果您是来这里寻找修改任何(可能是深度嵌套的)模块的日志记录的方法,请使用它logging.Logger.manager.loggerDict来获取所有记录器对象的字典。然后,返回的名称可用作以下参数logging.getLogger

import requests
import logging
for key in logging.Logger.manager.loggerDict:
    print(key)
# requests.packages.urllib3.connectionpool
# requests.packages.urllib3.util
# requests.packages
# requests.packages.urllib3
# requests.packages.urllib3.util.retry
# PYREADLINE
# requests
# requests.packages.urllib3.poolmanager

logging.getLogger('requests').setLevel(logging.CRITICAL)
# Could also use the dictionary directly:
# logging.Logger.manager.loggerDict['requests'].setLevel(logging.CRITICAL)

请注意每个用户136036中的注释,此方法仅显示运行上述代码片段时存在的记录器。例如,如果某个模块在实例化一个类时创建了一个新的记录器,则必须创建该类之后放置此代码段以打印其名称。

In case you came here looking for a way to modify logging of any (possibly deeply nested) module, use logging.Logger.manager.loggerDict to get a dictionary of all of the logger objects. The returned names can then be used as the argument to logging.getLogger:

import requests
import logging
for key in logging.Logger.manager.loggerDict:
    print(key)
# requests.packages.urllib3.connectionpool
# requests.packages.urllib3.util
# requests.packages
# requests.packages.urllib3
# requests.packages.urllib3.util.retry
# PYREADLINE
# requests
# requests.packages.urllib3.poolmanager

logging.getLogger('requests').setLevel(logging.CRITICAL)
# Could also use the dictionary directly:
# logging.Logger.manager.loggerDict['requests'].setLevel(logging.CRITICAL)

Per user136036 in a comment, be aware that this method only shows you the loggers that exist at the time you run the above snippet. If, for example, a module creates a new logger when you instantiate a class, then you must put this snippet after creating the class in order to print its name.


回答 2

import logging
urllib3_logger = logging.getLogger('urllib3')
urllib3_logger.setLevel(logging.CRITICAL)

这样,来自urllib3的所有level = INFO消息都不会出现在日志文件中。

因此,您可以继续将level = INFO用作日志消息…只需为您正在使用的库修改它即可。

import logging
urllib3_logger = logging.getLogger('urllib3')
urllib3_logger.setLevel(logging.CRITICAL)

In this way all the messages of level=INFO from urllib3 won’t be present in the logfile.

So you can continue to use the level=INFO for your log messages…just modify this for the library you are using.


回答 3

在遇到类似于您的问题之后,让我复制/粘贴一周或两周前写的文档部分:

import requests
import logging

# these two lines enable debugging at httplib level (requests->urllib3->httplib)
# you will see the REQUEST, including HEADERS and DATA, and RESPONSE with HEADERS but without DATA.
# the only thing missing will be the response.body which is not logged.
import httplib
httplib.HTTPConnection.debuglevel = 1

logging.basicConfig() # you need to initialize logging, otherwise you will not see anything from requests
logging.getLogger().setLevel(logging.DEBUG)
requests_log = logging.getLogger("requests.packages.urllib3")
requests_log.setLevel(logging.DEBUG)
requests_log.propagate = True

requests.get('http://httpbin.org/headers')

Let me copy/paste the documentation section which it I wrote about week or two ago, after having a problem similar to yours:

import requests
import logging

# these two lines enable debugging at httplib level (requests->urllib3->httplib)
# you will see the REQUEST, including HEADERS and DATA, and RESPONSE with HEADERS but without DATA.
# the only thing missing will be the response.body which is not logged.
import httplib
httplib.HTTPConnection.debuglevel = 1

logging.basicConfig() # you need to initialize logging, otherwise you will not see anything from requests
logging.getLogger().setLevel(logging.DEBUG)
requests_log = logging.getLogger("requests.packages.urllib3")
requests_log.setLevel(logging.DEBUG)
requests_log.propagate = True

requests.get('http://httpbin.org/headers')

回答 4

对于使用的任何人,logging.config.dictConfig您都可以像这样更改字典中的请求库日志级别:

'loggers': {
    '': {
        'handlers': ['file'],
        'level': level,
        'propagate': False
    },
    'requests.packages.urllib3': {
        'handlers': ['file'],
        'level': logging.WARNING
    }
}

For anybody using logging.config.dictConfig you can alter the requests library log level in the dictionary like this:

'loggers': {
    '': {
        'handlers': ['file'],
        'level': level,
        'propagate': False
    },
    'requests.packages.urllib3': {
        'handlers': ['file'],
        'level': logging.WARNING
    }
}

回答 5

将记录器名称设置为requestsrequests.urllib3对我不起作用。我必须指定确切的记录器名称才能更改记录级别。

首先查看您已定义的记录器,以查看要删除的记录器

print(logging.Logger.manager.loggerDict)

您将看到类似以下内容:

{...'urllib3.poolmanager': <logging.Logger object at 0x1070a6e10>, 'django.request': <logging.Logger object at 0x106d61290>, 'django.template': <logging.Logger object at 0x10630dcd0>, 'django.server': <logging.Logger object at 0x106dd6a50>, 'urllib3.connection': <logging.Logger object at 0x10710a350>,'urllib3.connectionpool': <logging.Logger object at 0x106e09690> ...}

然后为确切的记录器配置级别:

   'loggers': {
    '': {
        'handlers': ['default'],
        'level': 'DEBUG',
        'propagate': True
    },
    'urllib3.connectionpool': {
        'handlers': ['default'],
        'level': 'WARNING',
        'propagate' : False
    },

Setting the logger name as requests or requests.urllib3 did not work for me. I had to specify the exact logger name to change the logging level.

First See which loggers you have defined, to see which ones you want to remove

print(logging.Logger.manager.loggerDict)

And you will see something like this:

{...'urllib3.poolmanager': <logging.Logger object at 0x1070a6e10>, 'django.request': <logging.Logger object at 0x106d61290>, 'django.template': <logging.Logger object at 0x10630dcd0>, 'django.server': <logging.Logger object at 0x106dd6a50>, 'urllib3.connection': <logging.Logger object at 0x10710a350>,'urllib3.connectionpool': <logging.Logger object at 0x106e09690> ...}

Then configure the level for the exact logger:

   'loggers': {
    '': {
        'handlers': ['default'],
        'level': 'DEBUG',
        'propagate': True
    },
    'urllib3.connectionpool': {
        'handlers': ['default'],
        'level': 'WARNING',
        'propagate' : False
    },

回答 6

如果您有配置文件,则可以对其进行配置。

在loggers部分中添加urllib3:

[loggers]
keys = root, urllib3

添加logger_urllib3部分:

[logger_urllib3]
level = WARNING
handlers =
qualname = requests.packages.urllib3.connectionpool

If You have configuration file, You can configure it.

Add urllib3 in loggers section:

[loggers]
keys = root, urllib3

Add logger_urllib3 section:

[logger_urllib3]
level = WARNING
handlers =
qualname = requests.packages.urllib3.connectionpool

回答 7

答案在这里: Python:如何抑制来自第三方库的日志记录语句?

您可以保留basicConfig的默认日志记录级别,然后在获取模块的记录器时设置DEBUG级别。

logging.basicConfig(format='%(asctime)s %(module)s %(filename)s:%(lineno)s - %(message)s')
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)

logger.debug("my debug message")

This answer is here: Python: how to suppress logging statements from third party libraries?

You can leave the default logging level for basicConfig, and then you set the DEBUG level when you get the logger for your module.

logging.basicConfig(format='%(asctime)s %(module)s %(filename)s:%(lineno)s - %(message)s')
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)

logger.debug("my debug message")

回答 8

import logging

# Only show warnings
logging.getLogger("urllib3").setLevel(logging.WARNING)

# Disable all child loggers of urllib3, e.g. urllib3.connectionpool
logging.getLogger("urllib3").propagate = False
import logging

# Only show warnings
logging.getLogger("urllib3").setLevel(logging.WARNING)

# Disable all child loggers of urllib3, e.g. urllib3.connectionpool
logging.getLogger("urllib3").propagate = False

回答 9

Kbrose关于查找哪个记录器正在生成日志消息的指南非常有用。对于我的Django项目,我不得不对120种不同的记录器进行分类,直到发现是elasticsearchPython库对我造成了问题。根据大多数问题的指导,我通过将其添加到记录器中来禁用了它:

      ...
      'elasticsearch': {
          'handlers': ['console'],
          'level': logging.WARNING,
      },     
      ...

如果有人在运行Elasticsearch查询时看到其他无用的日志消息,请在此处发布。

Kbrose’s guidance on finding which logger was generating log messages was immensely useful. For my Django project, I had to sort through 120 different loggers until I found that it was the elasticsearch Python library that was causing issues for me. As per the guidance in most of the questions, I disabled it by adding this to my loggers:

      ...
      'elasticsearch': {
          'handlers': ['console'],
          'level': logging.WARNING,
      },     
      ...

Posting here in case someone else is seeing the unhelpful log messages come through whenever they run an Elasticsearch query.


回答 10

简单:只需在requests.packages.urllib3.disable_warnings()之后添加import requests

simple: just add requests.packages.urllib3.disable_warnings() after import requests


回答 11

我不确定以前的方法是否已停止工作,但是无论如何,这是消除警告的另一种方法:

PYTHONWARNINGS="ignore:Unverified HTTPS request" ./do-insecure-request.py

基本上,在脚本执行的上下文中添加环境变量。

从文档中:https : //urllib3.readthedocs.org/en/latest/security.html#disabling-warnings

I’m not sure if the previous approaches have stopped working, but in any case, here’s another way of removing the warnings:

PYTHONWARNINGS="ignore:Unverified HTTPS request" ./do-insecure-request.py

Basically, adding an environment variable in the context of the script execution.

From the documentation: https://urllib3.readthedocs.org/en/latest/security.html#disabling-warnings


如何使用请求下载图像

问题:如何使用请求下载图像

我正在尝试使用python的requests模块从网络下载并保存图像。

这是我使用的(工作)代码:

img = urllib2.urlopen(settings.STATICMAP_URL.format(**data))
with open(path, 'w') as f:
    f.write(img.read())

这是使用requests以下代码的新代码(无效):

r = requests.get(settings.STATICMAP_URL.format(**data))
if r.status_code == 200:
    img = r.raw.read()
    with open(path, 'w') as f:
        f.write(img)

您能帮助我从响应中使用什么属性requests吗?

I’m trying to download and save an image from the web using python’s requests module.

Here is the (working) code I used:

img = urllib2.urlopen(settings.STATICMAP_URL.format(**data))
with open(path, 'w') as f:
    f.write(img.read())

Here is the new (non-working) code using requests:

r = requests.get(settings.STATICMAP_URL.format(**data))
if r.status_code == 200:
    img = r.raw.read()
    with open(path, 'w') as f:
        f.write(img)

Can you help me on what attribute from the response to use from requests?


回答 0

您可以使用response.rawfile对象,也可以遍历响应。

response.raw默认情况下,使用类似文件的对象不会解码压缩的响应(使用GZIP或deflate)。您可以通过将decode_content属性设置为Truerequests将其设置False为控制自身解码)来强制为您解压缩。然后,您可以使用shutil.copyfileobj()Python将数据流式传输到文件对象:

import requests
import shutil

r = requests.get(settings.STATICMAP_URL.format(**data), stream=True)
if r.status_code == 200:
    with open(path, 'wb') as f:
        r.raw.decode_content = True
        shutil.copyfileobj(r.raw, f)        

要遍历响应,请使用循环;这样迭代可确保在此阶段对数据进行解压缩:

r = requests.get(settings.STATICMAP_URL.format(**data), stream=True)
if r.status_code == 200:
    with open(path, 'wb') as f:
        for chunk in r:
            f.write(chunk)

这将以128字节的块读取数据;如果您觉得其他块大小更好,请使用具有自定义块大小的Response.iter_content()方法

r = requests.get(settings.STATICMAP_URL.format(**data), stream=True)
if r.status_code == 200:
    with open(path, 'wb') as f:
        for chunk in r.iter_content(1024):
            f.write(chunk)

请注意,您需要以二进制模式打开目标文件,以确保python不会尝试为您翻译换行符。我们还设置stream=Truerequests不先将整个图像下载到内存中。

You can either use the response.raw file object, or iterate over the response.

To use the response.raw file-like object will not, by default, decode compressed responses (with GZIP or deflate). You can force it to decompress for you anyway by setting the decode_content attribute to True (requests sets it to False to control decoding itself). You can then use shutil.copyfileobj() to have Python stream the data to a file object:

import requests
import shutil

r = requests.get(settings.STATICMAP_URL.format(**data), stream=True)
if r.status_code == 200:
    with open(path, 'wb') as f:
        r.raw.decode_content = True
        shutil.copyfileobj(r.raw, f)        

To iterate over the response use a loop; iterating like this ensures that data is decompressed by this stage:

r = requests.get(settings.STATICMAP_URL.format(**data), stream=True)
if r.status_code == 200:
    with open(path, 'wb') as f:
        for chunk in r:
            f.write(chunk)

This’ll read the data in 128 byte chunks; if you feel another chunk size works better, use the Response.iter_content() method with a custom chunk size:

r = requests.get(settings.STATICMAP_URL.format(**data), stream=True)
if r.status_code == 200:
    with open(path, 'wb') as f:
        for chunk in r.iter_content(1024):
            f.write(chunk)

Note that you need to open the destination file in binary mode to ensure python doesn’t try and translate newlines for you. We also set stream=True so that requests doesn’t download the whole image into memory first.


回答 1

从请求中获取类似文件的对象,然后将其复制到文件中。这也将避免将整个事件立即读入内存。

import shutil

import requests

url = 'http://example.com/img.png'
response = requests.get(url, stream=True)
with open('img.png', 'wb') as out_file:
    shutil.copyfileobj(response.raw, out_file)
del response

Get a file-like object from the request and copy it to a file. This will also avoid reading the whole thing into memory at once.

import shutil

import requests

url = 'http://example.com/img.png'
response = requests.get(url, stream=True)
with open('img.png', 'wb') as out_file:
    shutil.copyfileobj(response.raw, out_file)
del response

回答 2

怎么样,一个快速的解决方案。

import requests

url = "http://craphound.com/images/1006884_2adf8fc7.jpg"
response = requests.get(url)
if response.status_code == 200:
    with open("/Users/apple/Desktop/sample.jpg", 'wb') as f:
        f.write(response.content)

How about this, a quick solution.

import requests

url = "http://craphound.com/images/1006884_2adf8fc7.jpg"
response = requests.get(url)
if response.status_code == 200:
    with open("/Users/apple/Desktop/sample.jpg", 'wb') as f:
        f.write(response.content)

回答 3

我同样需要使用请求下载图像。我首先尝试了Martijn Pieters的答案,并且效果很好。但是,当我对该简单函数进行概要分析时,发现与urllib和urllib2相比,它使用了许多函数调用。

然后,我尝试了请求模块的作者推荐方式

import requests
from PIL import Image
# python2.x, use this instead  
# from StringIO import StringIO
# for python3.x,
from io import StringIO

r = requests.get('https://example.com/image.jpg')
i = Image.open(StringIO(r.content))

这大大减少了函数调用的次数,从而加快了我的应用程序的速度。这是我的探查器的代码和结果。

#!/usr/bin/python
import requests
from StringIO import StringIO
from PIL import Image
import profile

def testRequest():
    image_name = 'test1.jpg'
    url = 'http://example.com/image.jpg'

    r = requests.get(url, stream=True)
    with open(image_name, 'wb') as f:
        for chunk in r.iter_content():
            f.write(chunk)

def testRequest2():
    image_name = 'test2.jpg'
    url = 'http://example.com/image.jpg'

    r = requests.get(url)

    i = Image.open(StringIO(r.content))
    i.save(image_name)

if __name__ == '__main__':
    profile.run('testUrllib()')
    profile.run('testUrllib2()')
    profile.run('testRequest()')

testRequest的结果:

343080 function calls (343068 primitive calls) in 2.580 seconds

以及testRequest2的结果:

3129 function calls (3105 primitive calls) in 0.024 seconds

I have the same need for downloading images using requests. I first tried the answer of Martijn Pieters, and it works well. But when I did a profile on this simple function, I found that it uses so many function calls compared to urllib and urllib2.

I then tried the way recommended by the author of requests module:

import requests
from PIL import Image
# python2.x, use this instead  
# from StringIO import StringIO
# for python3.x,
from io import StringIO

r = requests.get('https://example.com/image.jpg')
i = Image.open(StringIO(r.content))

This much more reduced the number of function calls, thus speeded up my application. Here is the code of my profiler and the result.

#!/usr/bin/python
import requests
from StringIO import StringIO
from PIL import Image
import profile

def testRequest():
    image_name = 'test1.jpg'
    url = 'http://example.com/image.jpg'

    r = requests.get(url, stream=True)
    with open(image_name, 'wb') as f:
        for chunk in r.iter_content():
            f.write(chunk)

def testRequest2():
    image_name = 'test2.jpg'
    url = 'http://example.com/image.jpg'

    r = requests.get(url)

    i = Image.open(StringIO(r.content))
    i.save(image_name)

if __name__ == '__main__':
    profile.run('testUrllib()')
    profile.run('testUrllib2()')
    profile.run('testRequest()')

The result for testRequest:

343080 function calls (343068 primitive calls) in 2.580 seconds

And the result for testRequest2:

3129 function calls (3105 primitive calls) in 0.024 seconds

回答 4

这可能比使用容易requests。这是我唯一一次建议不要使用requestsHTTP的东西。

二班轮使用urllib

>>> import urllib
>>> urllib.request.urlretrieve("http://www.example.com/songs/mp3.mp3", "mp3.mp3")

还有一个名为Python的漂亮模块wget,非常易于使用。在这里找到。

这证明了设计的简单性:

>>> import wget
>>> url = 'http://www.futurecrew.com/skaven/song_files/mp3/razorback.mp3'
>>> filename = wget.download(url)
100% [................................................] 3841532 / 3841532>
>> filename
'razorback.mp3'

请享用。

编辑:您还可以添加out参数以指定路径。

>>> out_filepath = <output_filepath>    
>>> filename = wget.download(url, out=out_filepath)

This might be easier than using requests. This is the only time I’ll ever suggest not using requests to do HTTP stuff.

Two liner using urllib:

>>> import urllib
>>> urllib.request.urlretrieve("http://www.example.com/songs/mp3.mp3", "mp3.mp3")

There is also a nice Python module named wget that is pretty easy to use. Found here.

This demonstrates the simplicity of the design:

>>> import wget
>>> url = 'http://www.futurecrew.com/skaven/song_files/mp3/razorback.mp3'
>>> filename = wget.download(url)
100% [................................................] 3841532 / 3841532>
>> filename
'razorback.mp3'

Enjoy.

Edit: You can also add an out parameter to specify a path.

>>> out_filepath = <output_filepath>    
>>> filename = wget.download(url, out=out_filepath)

回答 5

以下代码段下载文件。

该文件以其文件名保存在指定的url中。

import requests

url = "http://example.com/image.jpg"
filename = url.split("/")[-1]
r = requests.get(url, timeout=0.5)

if r.status_code == 200:
    with open(filename, 'wb') as f:
        f.write(r.content)

Following code snippet downloads a file.

The file is saved with its filename as in specified url.

import requests

url = "http://example.com/image.jpg"
filename = url.split("/")[-1]
r = requests.get(url, timeout=0.5)

if r.status_code == 200:
    with open(filename, 'wb') as f:
        f.write(r.content)

回答 6

有两种主要方法:

  1. 使用.content(最简单/官方的)(请参见Zhenyi Zhang的答案):

    import io  # Note: io.BytesIO is StringIO.StringIO on Python2.
    import requests
    
    r = requests.get('http://lorempixel.com/400/200')
    r.raise_for_status()
    with io.BytesIO(r.content) as f:
        with Image.open(f) as img:
            img.show()
  2. 使用.raw(请参阅Martijn Pieters的答案):

    import requests
    
    r = requests.get('http://lorempixel.com/400/200', stream=True)
    r.raise_for_status()
    r.raw.decode_content = True  # Required to decompress gzip/deflate compressed responses.
    with PIL.Image.open(r.raw) as img:
        img.show()
    r.close()  # Safety when stream=True ensure the connection is released.

两者的时间都没有明显差异。

There are 2 main ways:

  1. Using .content (simplest/official) (see Zhenyi Zhang’s answer):

    import io  # Note: io.BytesIO is StringIO.StringIO on Python2.
    import requests
    
    r = requests.get('http://lorempixel.com/400/200')
    r.raise_for_status()
    with io.BytesIO(r.content) as f:
        with Image.open(f) as img:
            img.show()
    
  2. Using .raw (see Martijn Pieters’s answer):

    import requests
    
    r = requests.get('http://lorempixel.com/400/200', stream=True)
    r.raise_for_status()
    r.raw.decode_content = True  # Required to decompress gzip/deflate compressed responses.
    with PIL.Image.open(r.raw) as img:
        img.show()
    r.close()  # Safety when stream=True ensure the connection is released.
    

Timing both shows no noticeable difference.


回答 7

就像导入图像和请求一样容易

from PIL import Image
import requests

img = Image.open(requests.get(url, stream = True).raw)
img.save('img1.jpg')

As easy as to import Image and requests

from PIL import Image
import requests

img = Image.open(requests.get(url, stream = True).raw)
img.save('img1.jpg')

回答 8

这是一个更加用户友好的答案,仍然使用流式传输。

只需定义这些函数并调用即可getImage()。默认情况下,它将使用与url相同的文件名并写入当前目录,但是两者都可以更改。

import requests
from StringIO import StringIO
from PIL import Image

def createFilename(url, name, folder):
    dotSplit = url.split('.')
    if name == None:
        # use the same as the url
        slashSplit = dotSplit[-2].split('/')
        name = slashSplit[-1]
    ext = dotSplit[-1]
    file = '{}{}.{}'.format(folder, name, ext)
    return file

def getImage(url, name=None, folder='./'):
    file = createFilename(url, name, folder)
    with open(file, 'wb') as f:
        r = requests.get(url, stream=True)
        for block in r.iter_content(1024):
            if not block:
                break
            f.write(block)

def getImageFast(url, name=None, folder='./'):
    file = createFilename(url, name, folder)
    r = requests.get(url)
    i = Image.open(StringIO(r.content))
    i.save(file)

if __name__ == '__main__':
    # Uses Less Memory
    getImage('http://www.example.com/image.jpg')
    # Faster
    getImageFast('http://www.example.com/image.jpg')

request胆量getImage()是根据这里的答案而的胆量getImageFast()是根据以上答案。

Here is a more user-friendly answer that still uses streaming.

Just define these functions and call getImage(). It will use the same file name as the url and write to the current directory by default, but both can be changed.

import requests
from StringIO import StringIO
from PIL import Image

def createFilename(url, name, folder):
    dotSplit = url.split('.')
    if name == None:
        # use the same as the url
        slashSplit = dotSplit[-2].split('/')
        name = slashSplit[-1]
    ext = dotSplit[-1]
    file = '{}{}.{}'.format(folder, name, ext)
    return file

def getImage(url, name=None, folder='./'):
    file = createFilename(url, name, folder)
    with open(file, 'wb') as f:
        r = requests.get(url, stream=True)
        for block in r.iter_content(1024):
            if not block:
                break
            f.write(block)

def getImageFast(url, name=None, folder='./'):
    file = createFilename(url, name, folder)
    r = requests.get(url)
    i = Image.open(StringIO(r.content))
    i.save(file)

if __name__ == '__main__':
    # Uses Less Memory
    getImage('http://www.example.com/image.jpg')
    # Faster
    getImageFast('http://www.example.com/image.jpg')

The request guts of getImage() are based on the answer here and the guts of getImageFast() are based on the answer above.


回答 9

我将发布答案,因为我没有足够的代表发表评论,但是使用Blairg23发布的wget,您还可以为路径提供out参数。

 wget.download(url, out=path)

I’m going to post an answer as I don’t have enough rep to make a comment, but with wget as posted by Blairg23, you can also provide an out parameter for the path.

 wget.download(url, out=path)

回答 10

这是谷歌搜索有关如何下载带有请求的二进制文件的第一个响应。如果您需要下载包含请求的任意文件,可以使用:

import requests
url = 'https://s3.amazonaws.com/lab-data-collections/GoogleNews-vectors-negative300.bin.gz'
open('GoogleNews-vectors-negative300.bin.gz', 'wb').write(requests.get(url, allow_redirects=True).content)

This is the first response that comes up for google searches on how to download a binary file with requests. In case you need to download an arbitrary file with requests, you can use:

import requests
url = 'https://s3.amazonaws.com/lab-data-collections/GoogleNews-vectors-negative300.bin.gz'
open('GoogleNews-vectors-negative300.bin.gz', 'wb').write(requests.get(url, allow_redirects=True).content)

回答 11

这就是我做的

import requests
from PIL import Image
from io import BytesIO

url = 'your_url'
files = {'file': ("C:/Users/shadow/Downloads/black.jpeg", open('C:/Users/shadow/Downloads/black.jpeg', 'rb'),'image/jpg')}
response = requests.post(url, files=files)

img = Image.open(BytesIO(response.content))
img.show()

This is how I did it

import requests
from PIL import Image
from io import BytesIO

url = 'your_url'
files = {'file': ("C:/Users/shadow/Downloads/black.jpeg", open('C:/Users/shadow/Downloads/black.jpeg', 'rb'),'image/jpg')}
response = requests.post(url, files=files)

img = Image.open(BytesIO(response.content))
img.show()

回答 12

您可以执行以下操作:

import requests
import random

url = "https://images.pexels.com/photos/1308881/pexels-photo-1308881.jpeg? auto=compress&cs=tinysrgb&dpr=1&w=500"
name=random.randrange(1,1000)
filename=str(name)+".jpg"
response = requests.get(url)
if response.status_code.ok:
   with open(filename,'w') as f:
    f.write(response.content)

You can do something like this:

import requests
import random

url = "https://images.pexels.com/photos/1308881/pexels-photo-1308881.jpeg? auto=compress&cs=tinysrgb&dpr=1&w=500"
name=random.randrange(1,1000)
filename=str(name)+".jpg"
response = requests.get(url)
if response.status_code.ok:
   with open(filename,'w') as f:
    f.write(response.content)

在Python中遍历一系列日期

问题:在Python中遍历一系列日期

我有以下代码可以做到这一点,但是我该如何做得更好呢?现在,我认为它比嵌套循环更好,但是当列表理解器中包含生成器时,它开始变得Perl-linerish。

day_count = (end_date - start_date).days + 1
for single_date in [d for d in (start_date + timedelta(n) for n in range(day_count)) if d <= end_date]:
    print strftime("%Y-%m-%d", single_date.timetuple())

笔记

  • 我实际上并没有用它来打印。这只是出于演示目的。
  • start_dateend_date变量是datetime.date因为我不需要时间戳对象。(它们将用于生成报告)。

样本输出

开始日期为2009-05-30,结束日期为2009-06-09

2009-05-30
2009-05-31
2009-06-01
2009-06-02
2009-06-03
2009-06-04
2009-06-05
2009-06-06
2009-06-07
2009-06-08
2009-06-09

I have the following code to do this, but how can I do it better? Right now I think it’s better than nested loops, but it starts to get Perl-one-linerish when you have a generator in a list comprehension.

day_count = (end_date - start_date).days + 1
for single_date in [d for d in (start_date + timedelta(n) for n in range(day_count)) if d <= end_date]:
    print strftime("%Y-%m-%d", single_date.timetuple())

Notes

  • I’m not actually using this to print. That’s just for demo purposes.
  • The start_date and end_date variables are datetime.date objects because I don’t need the timestamps. (They’re going to be used to generate a report).

Sample Output

For a start date of 2009-05-30 and an end date of 2009-06-09:

2009-05-30
2009-05-31
2009-06-01
2009-06-02
2009-06-03
2009-06-04
2009-06-05
2009-06-06
2009-06-07
2009-06-08
2009-06-09

回答 0

为什么会有两个嵌套的迭代?对我来说,它仅需一次迭代即可生成相同的数据列表:

for single_date in (start_date + timedelta(n) for n in range(day_count)):
    print ...

而且没有存储任何列表,仅迭代了一个生成器。同样,生成器中的“ if”似乎不必要。

毕竟,线性序列只需要一个迭代器,而不是两个。

与John Machin讨论后更新:

也许最优雅的解决方案是使用生成器函数完全隐藏/抽象日期范围内的迭代:

from datetime import timedelta, date

def daterange(start_date, end_date):
    for n in range(int ((end_date - start_date).days)):
        yield start_date + timedelta(n)

start_date = date(2013, 1, 1)
end_date = date(2015, 6, 2)
for single_date in daterange(start_date, end_date):
    print(single_date.strftime("%Y-%m-%d"))

注意:为了与内置range()函数保持一致,此迭代到达之前停止end_date。因此,对于包容迭代,请使用第二天,就像使用一样range()

Why are there two nested iterations? For me it produces the same list of data with only one iteration:

for single_date in (start_date + timedelta(n) for n in range(day_count)):
    print ...

And no list gets stored, only one generator is iterated over. Also the “if” in the generator seems to be unnecessary.

After all, a linear sequence should only require one iterator, not two.

Update after discussion with John Machin:

Maybe the most elegant solution is using a generator function to completely hide/abstract the iteration over the range of dates:

from datetime import timedelta, date

def daterange(start_date, end_date):
    for n in range(int ((end_date - start_date).days)):
        yield start_date + timedelta(n)

start_date = date(2013, 1, 1)
end_date = date(2015, 6, 2)
for single_date in daterange(start_date, end_date):
    print(single_date.strftime("%Y-%m-%d"))

NB: For consistency with the built-in range() function this iteration stops before reaching the end_date. So for inclusive iteration use the next day, as you would with range().


回答 1

这可能更清楚:

from datetime import date, timedelta

start_date = date(2019, 1, 1)
end_date = date(2020, 1, 1)
delta = timedelta(days=1)
while start_date <= end_date:
    print (start_date.strftime("%Y-%m-%d"))
    start_date += delta

This might be more clear:

from datetime import date, timedelta

start_date = date(2019, 1, 1)
end_date = date(2020, 1, 1)
delta = timedelta(days=1)
while start_date <= end_date:
    print (start_date.strftime("%Y-%m-%d"))
    start_date += delta

回答 2

使用dateutil库:

from datetime import date
from dateutil.rrule import rrule, DAILY

a = date(2009, 5, 30)
b = date(2009, 6, 9)

for dt in rrule(DAILY, dtstart=a, until=b):
    print dt.strftime("%Y-%m-%d")

该python库具有许多更高级的功能,例如relative deltas 等非常有用的功能,并以单个文件(模块)的形式实现,很容易包含在项目中。

Use the dateutil library:

from datetime import date
from dateutil.rrule import rrule, DAILY

a = date(2009, 5, 30)
b = date(2009, 6, 9)

for dt in rrule(DAILY, dtstart=a, until=b):
    print dt.strftime("%Y-%m-%d")

This python library has many more advanced features, some very useful, like relative deltas—and is implemented as a single file (module) that’s easily included into a project.


回答 3

总体而言,Pandas非常适合用于时间序列,并且直接支持日期范围。

import pandas as pd
daterange = pd.date_range(start_date, end_date)

然后,您可以遍历日期范围以打印日期:

for single_date in daterange:
    print (single_date.strftime("%Y-%m-%d"))

它还有很多选择可以使生活更轻松。例如,如果您只想要工作日,则只需交换bdate_range。请参阅http://pandas.pydata.org/pandas-docs/stable/timeseries.html#generating-ranges-of-timestamps

Pandas的强大功能实际上就是其数据帧,它支持矢量化操作(非常类似于numpy),使跨大量数据的操作变得非常快速和容易。

编辑:您也可以完全跳过for循环,而直接直接打印它,这更容易,更有效:

print(daterange)

Pandas is great for time series in general, and has direct support for date ranges.

import pandas as pd
daterange = pd.date_range(start_date, end_date)

You can then loop over the daterange to print the date:

for single_date in daterange:
    print (single_date.strftime("%Y-%m-%d"))

It also has lots of options to make life easier. For example if you only wanted weekdays, you would just swap in bdate_range. See http://pandas.pydata.org/pandas-docs/stable/timeseries.html#generating-ranges-of-timestamps

The power of Pandas is really its dataframes, which support vectorized operations (much like numpy) that make operations across large quantities of data very fast and easy.

EDIT: You could also completely skip the for loop and just print it directly, which is easier and more efficient:

print(daterange)

回答 4

import datetime

def daterange(start, stop, step=datetime.timedelta(days=1), inclusive=False):
  # inclusive=False to behave like range by default
  if step.days > 0:
    while start < stop:
      yield start
      start = start + step
      # not +=! don't modify object passed in if it's mutable
      # since this function is not restricted to
      # only types from datetime module
  elif step.days < 0:
    while start > stop:
      yield start
      start = start + step
  if inclusive and start == stop:
    yield start

# ...

for date in daterange(start_date, end_date, inclusive=True):
  print strftime("%Y-%m-%d", date.timetuple())

通过支持负步等,此函数可以完成超出您严格要求的范围。只要您考虑范围逻辑,就不需要单独使用day_count,最重要的是,当您从多个函数中调用函数时,代码变得更易于阅读的地方。

import datetime

def daterange(start, stop, step=datetime.timedelta(days=1), inclusive=False):
  # inclusive=False to behave like range by default
  if step.days > 0:
    while start < stop:
      yield start
      start = start + step
      # not +=! don't modify object passed in if it's mutable
      # since this function is not restricted to
      # only types from datetime module
  elif step.days < 0:
    while start > stop:
      yield start
      start = start + step
  if inclusive and start == stop:
    yield start

# ...

for date in daterange(start_date, end_date, inclusive=True):
  print strftime("%Y-%m-%d", date.timetuple())

This function does more than you strictly require, by supporting negative step, etc. As long as you factor out your range logic, then you don’t need the separate day_count and most importantly the code becomes easier to read as you call the function from multiple places.


回答 5

这是我能想到的最易读的解决方案。

import datetime

def daterange(start, end, step=datetime.timedelta(1)):
    curr = start
    while curr < end:
        yield curr
        curr += step

This is the most human-readable solution I can think of.

import datetime

def daterange(start, end, step=datetime.timedelta(1)):
    curr = start
    while curr < end:
        yield curr
        curr += step

回答 6

为什么不尝试:

import datetime as dt

start_date = dt.datetime(2012, 12,1)
end_date = dt.datetime(2012, 12,5)

total_days = (end_date - start_date).days + 1 #inclusive 5 days

for day_number in range(total_days):
    current_date = (start_date + dt.timedelta(days = day_number)).date()
    print current_date

Why not try:

import datetime as dt

start_date = dt.datetime(2012, 12,1)
end_date = dt.datetime(2012, 12,5)

total_days = (end_date - start_date).days + 1 #inclusive 5 days

for day_number in range(total_days):
    current_date = (start_date + dt.timedelta(days = day_number)).date()
    print current_date

回答 7

Numpy arange函数可以应用于日期:

import numpy as np
from datetime import datetime, timedelta
d0 = datetime(2009, 1,1)
d1 = datetime(2010, 1,1)
dt = timedelta(days = 1)
dates = np.arange(d0, d1, dt).astype(datetime)

的用途astype是将转换numpy.datetime64datetime.datetime对象数组。

Numpy’s arange function can be applied to dates:

import numpy as np
from datetime import datetime, timedelta
d0 = datetime(2009, 1,1)
d1 = datetime(2010, 1,1)
dt = timedelta(days = 1)
dates = np.arange(d0, d1, dt).astype(datetime)

The use of astype is to convert from numpy.datetime64 to an array of datetime.datetime objects.


回答 8

显示从今天开始的最近n天:

import datetime
for i in range(0, 100):
    print((datetime.date.today() + datetime.timedelta(i)).isoformat())

输出:

2016-06-29
2016-06-30
2016-07-01
2016-07-02
2016-07-03
2016-07-04

Show the last n days from today:

import datetime
for i in range(0, 100):
    print((datetime.date.today() + datetime.timedelta(i)).isoformat())

Output:

2016-06-29
2016-06-30
2016-07-01
2016-07-02
2016-07-03
2016-07-04

回答 9

import datetime

def daterange(start, stop, step_days=1):
    current = start
    step = datetime.timedelta(step_days)
    if step_days > 0:
        while current < stop:
            yield current
            current += step
    elif step_days < 0:
        while current > stop:
            yield current
            current += step
    else:
        raise ValueError("daterange() step_days argument must not be zero")

if __name__ == "__main__":
    from pprint import pprint as pp
    lo = datetime.date(2008, 12, 27)
    hi = datetime.date(2009, 1, 5)
    pp(list(daterange(lo, hi)))
    pp(list(daterange(hi, lo, -1)))
    pp(list(daterange(lo, hi, 7)))
    pp(list(daterange(hi, lo, -7))) 
    assert not list(daterange(lo, hi, -1))
    assert not list(daterange(hi, lo))
    assert not list(daterange(lo, hi, -7))
    assert not list(daterange(hi, lo, 7)) 
import datetime

def daterange(start, stop, step_days=1):
    current = start
    step = datetime.timedelta(step_days)
    if step_days > 0:
        while current < stop:
            yield current
            current += step
    elif step_days < 0:
        while current > stop:
            yield current
            current += step
    else:
        raise ValueError("daterange() step_days argument must not be zero")

if __name__ == "__main__":
    from pprint import pprint as pp
    lo = datetime.date(2008, 12, 27)
    hi = datetime.date(2009, 1, 5)
    pp(list(daterange(lo, hi)))
    pp(list(daterange(hi, lo, -1)))
    pp(list(daterange(lo, hi, 7)))
    pp(list(daterange(hi, lo, -7))) 
    assert not list(daterange(lo, hi, -1))
    assert not list(daterange(hi, lo))
    assert not list(daterange(lo, hi, -7))
    assert not list(daterange(hi, lo, 7)) 

回答 10

for i in range(16):
    print datetime.date.today() + datetime.timedelta(days=i)
for i in range(16):
    print datetime.date.today() + datetime.timedelta(days=i)

回答 11

为了完整起见,Pandas还提供了一个period_range超出范围的时间戳功能:

import pandas as pd

pd.period_range(start='1/1/1626', end='1/08/1627', freq='D')

For completeness, Pandas also has a period_range function for timestamps that are out of bounds:

import pandas as pd

pd.period_range(start='1/1/1626', end='1/08/1627', freq='D')

回答 12

我有一个类似的问题,但是我需要每月而不是每天进行迭代。

这是我的解决方案

import calendar
from datetime import datetime, timedelta

def days_in_month(dt):
    return calendar.monthrange(dt.year, dt.month)[1]

def monthly_range(dt_start, dt_end):
    forward = dt_end >= dt_start
    finish = False
    dt = dt_start

    while not finish:
        yield dt.date()
        if forward:
            days = days_in_month(dt)
            dt = dt + timedelta(days=days)            
            finish = dt > dt_end
        else:
            _tmp_dt = dt.replace(day=1) - timedelta(days=1)
            dt = (_tmp_dt.replace(day=dt.day))
            finish = dt < dt_end

例子1

date_start = datetime(2016, 6, 1)
date_end = datetime(2017, 1, 1)

for p in monthly_range(date_start, date_end):
    print(p)

输出量

2016-06-01
2016-07-01
2016-08-01
2016-09-01
2016-10-01
2016-11-01
2016-12-01
2017-01-01

范例#2

date_start = datetime(2017, 1, 1)
date_end = datetime(2016, 6, 1)

for p in monthly_range(date_start, date_end):
    print(p)

输出量

2017-01-01
2016-12-01
2016-11-01
2016-10-01
2016-09-01
2016-08-01
2016-07-01
2016-06-01

I have a similar problem, but I need to iterate monthly instead of daily.

This is my solution

import calendar
from datetime import datetime, timedelta

def days_in_month(dt):
    return calendar.monthrange(dt.year, dt.month)[1]

def monthly_range(dt_start, dt_end):
    forward = dt_end >= dt_start
    finish = False
    dt = dt_start

    while not finish:
        yield dt.date()
        if forward:
            days = days_in_month(dt)
            dt = dt + timedelta(days=days)            
            finish = dt > dt_end
        else:
            _tmp_dt = dt.replace(day=1) - timedelta(days=1)
            dt = (_tmp_dt.replace(day=dt.day))
            finish = dt < dt_end

Example #1

date_start = datetime(2016, 6, 1)
date_end = datetime(2017, 1, 1)

for p in monthly_range(date_start, date_end):
    print(p)

Output

2016-06-01
2016-07-01
2016-08-01
2016-09-01
2016-10-01
2016-11-01
2016-12-01
2017-01-01

Example #2

date_start = datetime(2017, 1, 1)
date_end = datetime(2016, 6, 1)

for p in monthly_range(date_start, date_end):
    print(p)

Output

2017-01-01
2016-12-01
2016-11-01
2016-10-01
2016-09-01
2016-08-01
2016-07-01
2016-06-01

回答 13

可以“T *相信这个问题已经存在了9年,没有任何人暗示一个简单的递归函数:

from datetime import datetime, timedelta

def walk_days(start_date, end_date):
    if start_date <= end_date:
        print(start_date.strftime("%Y-%m-%d"))
        next_date = start_date + timedelta(days=1)
        walk_days(next_date, end_date)

#demo
start_date = datetime(2009, 5, 30)
end_date   = datetime(2009, 6, 9)

walk_days(start_date, end_date)

输出:

2009-05-30
2009-05-31
2009-06-01
2009-06-02
2009-06-03
2009-06-04
2009-06-05
2009-06-06
2009-06-07
2009-06-08
2009-06-09

编辑: *现在我可以相信-请参阅Python是否优化了尾递归?。谢谢你

Can‘t* believe this question has existed for 9 years without anyone suggesting a simple recursive function:

from datetime import datetime, timedelta

def walk_days(start_date, end_date):
    if start_date <= end_date:
        print(start_date.strftime("%Y-%m-%d"))
        next_date = start_date + timedelta(days=1)
        walk_days(next_date, end_date)

#demo
start_date = datetime(2009, 5, 30)
end_date   = datetime(2009, 6, 9)

walk_days(start_date, end_date)

Output:

2009-05-30
2009-05-31
2009-06-01
2009-06-02
2009-06-03
2009-06-04
2009-06-05
2009-06-06
2009-06-07
2009-06-08
2009-06-09

Edit: *Now I can believe it — see Does Python optimize tail recursion? . Thank you Tim.


回答 14

您可以使用pandas库简单可靠地生成两个日期之间的一系列日期

import pandas as pd

print pd.date_range(start='1/1/2010', end='1/08/2018', freq='M')

您可以通过将频率设置为D,M,Q,Y(每天,每月,每季度,每年)来更改生成日期的频率。

You can generate a series of date between two dates using the pandas library simply and trustfully

import pandas as pd

print pd.date_range(start='1/1/2010', end='1/08/2018', freq='M')

You can change the frequency of generating dates by setting freq as D, M, Q, Y (daily, monthly, quarterly, yearly )


回答 15

> pip install DateTimeRange

from datetimerange import DateTimeRange

def dateRange(start, end, step):
        rangeList = []
        time_range = DateTimeRange(start, end)
        for value in time_range.range(datetime.timedelta(days=step)):
            rangeList.append(value.strftime('%m/%d/%Y'))
        return rangeList

    dateRange("2018-09-07", "2018-12-25", 7)  

    Out[92]: 
    ['09/07/2018',
     '09/14/2018',
     '09/21/2018',
     '09/28/2018',
     '10/05/2018',
     '10/12/2018',
     '10/19/2018',
     '10/26/2018',
     '11/02/2018',
     '11/09/2018',
     '11/16/2018',
     '11/23/2018',
     '11/30/2018',
     '12/07/2018',
     '12/14/2018',
     '12/21/2018']
> pip install DateTimeRange

from datetimerange import DateTimeRange

def dateRange(start, end, step):
        rangeList = []
        time_range = DateTimeRange(start, end)
        for value in time_range.range(datetime.timedelta(days=step)):
            rangeList.append(value.strftime('%m/%d/%Y'))
        return rangeList

    dateRange("2018-09-07", "2018-12-25", 7)  

    Out[92]: 
    ['09/07/2018',
     '09/14/2018',
     '09/21/2018',
     '09/28/2018',
     '10/05/2018',
     '10/12/2018',
     '10/19/2018',
     '10/26/2018',
     '11/02/2018',
     '11/09/2018',
     '11/16/2018',
     '11/23/2018',
     '11/30/2018',
     '12/07/2018',
     '12/14/2018',
     '12/21/2018']

回答 16

此功能有一些额外的功能:

  • 可以传递与DATE_FORMAT匹配的字符串作为开始或结束,并将其转换为日期对象
  • 可以传递日期对象作为开始或结束
  • 如果结束比开始更早,则进行错误检查

    import datetime
    from datetime import timedelta
    
    
    DATE_FORMAT = '%Y/%m/%d'
    
    def daterange(start, end):
          def convert(date):
                try:
                      date = datetime.datetime.strptime(date, DATE_FORMAT)
                      return date.date()
                except TypeError:
                      return date
    
          def get_date(n):
                return datetime.datetime.strftime(convert(start) + timedelta(days=n), DATE_FORMAT)
    
          days = (convert(end) - convert(start)).days
          if days <= 0:
                raise ValueError('The start date must be before the end date.')
          for n in range(0, days):
                yield get_date(n)
    
    
    start = '2014/12/1'
    end = '2014/12/31'
    print list(daterange(start, end))
    
    start_ = datetime.date.today()
    end = '2015/12/1'
    print list(daterange(start, end))

This function has some extra features:

  • can pass a string matching the DATE_FORMAT for start or end and it is converted to a date object
  • can pass a date object for start or end
  • error checking in case the end is older than the start

    import datetime
    from datetime import timedelta
    
    
    DATE_FORMAT = '%Y/%m/%d'
    
    def daterange(start, end):
          def convert(date):
                try:
                      date = datetime.datetime.strptime(date, DATE_FORMAT)
                      return date.date()
                except TypeError:
                      return date
    
          def get_date(n):
                return datetime.datetime.strftime(convert(start) + timedelta(days=n), DATE_FORMAT)
    
          days = (convert(end) - convert(start)).days
          if days <= 0:
                raise ValueError('The start date must be before the end date.')
          for n in range(0, days):
                yield get_date(n)
    
    
    start = '2014/12/1'
    end = '2014/12/31'
    print list(daterange(start, end))
    
    start_ = datetime.date.today()
    end = '2015/12/1'
    print list(daterange(start, end))
    

回答 17

这是通用日期范围函数的代码,类似于Ber的答案,但更灵活:

def count_timedelta(delta, step, seconds_in_interval):
    """Helper function for iterate.  Finds the number of intervals in the timedelta."""
    return int(delta.total_seconds() / (seconds_in_interval * step))


def range_dt(start, end, step=1, interval='day'):
    """Iterate over datetimes or dates, similar to builtin range."""
    intervals = functools.partial(count_timedelta, (end - start), step)

    if interval == 'week':
        for i in range(intervals(3600 * 24 * 7)):
            yield start + datetime.timedelta(weeks=i) * step

    elif interval == 'day':
        for i in range(intervals(3600 * 24)):
            yield start + datetime.timedelta(days=i) * step

    elif interval == 'hour':
        for i in range(intervals(3600)):
            yield start + datetime.timedelta(hours=i) * step

    elif interval == 'minute':
        for i in range(intervals(60)):
            yield start + datetime.timedelta(minutes=i) * step

    elif interval == 'second':
        for i in range(intervals(1)):
            yield start + datetime.timedelta(seconds=i) * step

    elif interval == 'millisecond':
        for i in range(intervals(1 / 1000)):
            yield start + datetime.timedelta(milliseconds=i) * step

    elif interval == 'microsecond':
        for i in range(intervals(1e-6)):
            yield start + datetime.timedelta(microseconds=i) * step

    else:
        raise AttributeError("Interval must be 'week', 'day', 'hour' 'second', \
            'microsecond' or 'millisecond'.")

Here’s code for a general date range function, similar to Ber’s answer, but more flexible:

def count_timedelta(delta, step, seconds_in_interval):
    """Helper function for iterate.  Finds the number of intervals in the timedelta."""
    return int(delta.total_seconds() / (seconds_in_interval * step))


def range_dt(start, end, step=1, interval='day'):
    """Iterate over datetimes or dates, similar to builtin range."""
    intervals = functools.partial(count_timedelta, (end - start), step)

    if interval == 'week':
        for i in range(intervals(3600 * 24 * 7)):
            yield start + datetime.timedelta(weeks=i) * step

    elif interval == 'day':
        for i in range(intervals(3600 * 24)):
            yield start + datetime.timedelta(days=i) * step

    elif interval == 'hour':
        for i in range(intervals(3600)):
            yield start + datetime.timedelta(hours=i) * step

    elif interval == 'minute':
        for i in range(intervals(60)):
            yield start + datetime.timedelta(minutes=i) * step

    elif interval == 'second':
        for i in range(intervals(1)):
            yield start + datetime.timedelta(seconds=i) * step

    elif interval == 'millisecond':
        for i in range(intervals(1 / 1000)):
            yield start + datetime.timedelta(milliseconds=i) * step

    elif interval == 'microsecond':
        for i in range(intervals(1e-6)):
            yield start + datetime.timedelta(microseconds=i) * step

    else:
        raise AttributeError("Interval must be 'week', 'day', 'hour' 'second', \
            'microsecond' or 'millisecond'.")

回答 18

对于以天为单位递增的范围,以下内容如何处理:

for d in map( lambda x: startDate+datetime.timedelta(days=x), xrange( (stopDate-startDate).days ) ):
  # Do stuff here
  • startDate和stopDate是datetime.date对象

对于通用版本:

for d in map( lambda x: startTime+x*stepTime, xrange( (stopTime-startTime).total_seconds() / stepTime.total_seconds() ) ):
  # Do stuff here
  • startTime和stopTime是datetime.date或datetime.datetime对象(两者应为同一类型)
  • stepTime是一个timedelta对象

请注意,仅在python 2.7之后才支持.total_seconds()。如果您使用的是早期版本,则可以编写自己的函数:

def total_seconds( td ):
  return float(td.microseconds + (td.seconds + td.days * 24 * 3600) * 10**6) / 10**6

What about the following for doing a range incremented by days:

for d in map( lambda x: startDate+datetime.timedelta(days=x), xrange( (stopDate-startDate).days ) ):
  # Do stuff here
  • startDate and stopDate are datetime.date objects

For a generic version:

for d in map( lambda x: startTime+x*stepTime, xrange( (stopTime-startTime).total_seconds() / stepTime.total_seconds() ) ):
  # Do stuff here
  • startTime and stopTime are datetime.date or datetime.datetime object (both should be the same type)
  • stepTime is a timedelta object

Note that .total_seconds() is only supported after python 2.7 If you are stuck with an earlier version you can write your own function:

def total_seconds( td ):
  return float(td.microseconds + (td.seconds + td.days * 24 * 3600) * 10**6) / 10**6

回答 19

通过将rangeargs 存储在元组中,可逆步骤的方法略有不同。

def date_range(start, stop, step=1, inclusive=False):
    day_count = (stop - start).days
    if inclusive:
        day_count += 1

    if step > 0:
        range_args = (0, day_count, step)
    elif step < 0:
        range_args = (day_count - 1, -1, step)
    else:
        raise ValueError("date_range(): step arg must be non-zero")

    for i in range(*range_args):
        yield start + timedelta(days=i)

Slightly different approach to reversible steps by storing range args in a tuple.

def date_range(start, stop, step=1, inclusive=False):
    day_count = (stop - start).days
    if inclusive:
        day_count += 1

    if step > 0:
        range_args = (0, day_count, step)
    elif step < 0:
        range_args = (day_count - 1, -1, step)
    else:
        raise ValueError("date_range(): step arg must be non-zero")

    for i in range(*range_args):
        yield start + timedelta(days=i)

回答 20

import datetime
from dateutil.rrule import DAILY,rrule

date=datetime.datetime(2019,1,10)

date1=datetime.datetime(2019,2,2)

for i in rrule(DAILY , dtstart=date,until=date1):
     print(i.strftime('%Y%b%d'),sep='\n')

输出:

2019Jan10
2019Jan11
2019Jan12
2019Jan13
2019Jan14
2019Jan15
2019Jan16
2019Jan17
2019Jan18
2019Jan19
2019Jan20
2019Jan21
2019Jan22
2019Jan23
2019Jan24
2019Jan25
2019Jan26
2019Jan27
2019Jan28
2019Jan29
2019Jan30
2019Jan31
2019Feb01
2019Feb02
import datetime
from dateutil.rrule import DAILY,rrule

date=datetime.datetime(2019,1,10)

date1=datetime.datetime(2019,2,2)

for i in rrule(DAILY , dtstart=date,until=date1):
     print(i.strftime('%Y%b%d'),sep='\n')

OUTPUT:

2019Jan10
2019Jan11
2019Jan12
2019Jan13
2019Jan14
2019Jan15
2019Jan16
2019Jan17
2019Jan18
2019Jan19
2019Jan20
2019Jan21
2019Jan22
2019Jan23
2019Jan24
2019Jan25
2019Jan26
2019Jan27
2019Jan28
2019Jan29
2019Jan30
2019Jan31
2019Feb01
2019Feb02

如何使用Flask从URL获取命名参数?

问题:如何使用Flask从URL获取命名参数?

当用户访问在我的flask应用程序上运行的URL时,我希望Web服务能够处理问号后指定的参数:

http://10.1.1.1:5000/login?username=alex&password=pw1

#I just want to be able to manipulate the parameters
@app.route('/login', methods=['GET', 'POST'])
def login():
    username = request.form['username']
    print(username)
    password = request.form['password']
    print(password)

When the user accesses this URL running on my flask app, I want the web service to be able to handle the parameters specified after the question mark:

http://10.1.1.1:5000/login?username=alex&password=pw1

#I just want to be able to manipulate the parameters
@app.route('/login', methods=['GET', 'POST'])
def login():
    username = request.form['username']
    print(username)
    password = request.form['password']
    print(password)

回答 0

使用request.args得到解析查询字符串的内容:

from flask import request

@app.route(...)
def login():
    username = request.args.get('username')
    password = request.args.get('password')

Use request.args to get parsed contents of query string:

from flask import request

@app.route(...)
def login():
    username = request.args.get('username')
    password = request.args.get('password')

回答 1

URL参数可在中使用request.args,它是具有方法的ImmutableMultiDict,具有get默认值(default)和类型(type)的可选参数-这是可调用的方法,可将输入值转换为所需的格式。(有关更多详细信息,请参见该方法文档。)

from flask import request

@app.route('/my-route')
def my_route():
  page = request.args.get('page', default = 1, type = int)
  filter = request.args.get('filter', default = '*', type = str)

上面的代码示例:

/my-route?page=34               -> page: 34  filter: '*'
/my-route                       -> page:  1  filter: '*'
/my-route?page=10&filter=test   -> page: 10  filter: 'test'
/my-route?page=10&filter=10     -> page: 10  filter: '10'
/my-route?page=*&filter=*       -> page:  1  filter: '*'

The URL parameters are available in request.args, which is an ImmutableMultiDict that has a get method, with optional parameters for default value (default) and type (type) – which is a callable that converts the input value to the desired format. (See the documentation of the method for more details.)

from flask import request

@app.route('/my-route')
def my_route():
  page = request.args.get('page', default = 1, type = int)
  filter = request.args.get('filter', default = '*', type = str)

Examples with the code above:

/my-route?page=34               -> page: 34  filter: '*'
/my-route                       -> page:  1  filter: '*'
/my-route?page=10&filter=test   -> page: 10  filter: 'test'
/my-route?page=10&filter=10     -> page: 10  filter: '10'
/my-route?page=*&filter=*       -> page:  1  filter: '*'

回答 2

您还可以在视图定义的URL上使用方括号<>,此输入将进入您的视图函数参数

@app.route('/<name>')
def my_view_func(name):
    return name

You can also use brackets <> on the URL of the view definition and this input will go into your view function arguments

@app.route('/<name>')
def my_view_func(name):
    return name

回答 3

如果您在URL中传递了一个参数,则可以按照以下步骤进行操作

from flask import request
#url
http://10.1.1.1:5000/login/alex

from flask import request
@app.route('/login/<username>', methods=['GET'])
def login(username):
    print(username)

如果您有多个参数:

#url
http://10.1.1.1:5000/login?username=alex&password=pw1

from flask import request
@app.route('/login', methods=['GET'])
    def login():
        username = request.args.get('username')
        print(username)
        password= request.args.get('password')
        print(password)

在POST请求的情况下,您尝试执行的操作将参数作为表单参数传递并且不出现在URL中。如果您实际上正在开发登录API,建议您使用POST请求而不是GET,并将数据公开给用户。

如果有职位要求,它将按以下方式工作:

#url
http://10.1.1.1:5000/login

HTML片段:

<form action="http://10.1.1.1:5000/login" method="POST">
  Username : <input type="text" name="username"><br>
  Password : <input type="password" name="password"><br>
  <input type="submit" value="submit">
</form>

路线:

from flask import request
@app.route('/login', methods=['POST'])
    def login():
        username = request.form.get('username')
        print(username)
        password= request.form.get('password')
        print(password)

If you have a single argument passed in the URL you can do it as follows

from flask import request
#url
http://10.1.1.1:5000/login/alex

from flask import request
@app.route('/login/<username>', methods=['GET'])
def login(username):
    print(username)

In case you have multiple parameters:

#url
http://10.1.1.1:5000/login?username=alex&password=pw1

from flask import request
@app.route('/login', methods=['GET'])
    def login():
        username = request.args.get('username')
        print(username)
        password= request.args.get('password')
        print(password)

What you were trying to do works in case of POST requests where parameters are passed as form parameters and do not appear in the URL. In case you are actually developing a login API, it is advisable you use POST request rather than GET and expose the data to the user.

In case of post request, it would work as follows:

#url
http://10.1.1.1:5000/login

HTML snippet:

<form action="http://10.1.1.1:5000/login" method="POST">
  Username : <input type="text" name="username"><br>
  Password : <input type="password" name="password"><br>
  <input type="submit" value="submit">
</form>

Route:

from flask import request
@app.route('/login', methods=['POST'])
    def login():
        username = request.form.get('username')
        print(username)
        password= request.form.get('password')
        print(password)

回答 4

网址:

http://0.0.0.0:5000/user/name/

码:

@app.route('/user/<string:name>/', methods=['GET', 'POST'])
def user_view(name):
    print(name)

(编辑:删除格式字符串中的空格)

url:

http://0.0.0.0:5000/user/name/

code:

@app.route('/user/<string:name>/', methods=['GET', 'POST'])
def user_view(name):
    print(name)

(Edit: removed spaces in format string)


回答 5

真的很简单。让我将此过程分为两个简单步骤。

  1. 在html模板上,您将用户名和密码的名称标签声明为

    <form method="POST">
    <input type="text" name="user_name"></input>
    <input type="text" name="password"></input>
    </form>
  2. 然后,将代码修改为:

    from flask import request
    
    @app.route('/my-route', methods=['POST']) #you should always parse username and 
    # password in a POST method not GET
    def my_route():
      username = request.form.get("user_name")
      print(username)
      password = request.form.get("password")
      print(password)
    #now manipulate the username and password variables as you wish
    #Tip: define another method instead of methods=['GET','POST'], if you want to  
    # render the same template with a GET request too

It’s really simple. Let me divide this process into two simple steps.

  1. On the html template you will declare name tag for username and password as

    <form method="POST">
    <input type="text" name="user_name"></input>
    <input type="text" name="password"></input>
    </form>
    
  2. Then, modify your code as:

    from flask import request
    
    @app.route('/my-route', methods=['POST']) #you should always parse username and 
    # password in a POST method not GET
    def my_route():
      username = request.form.get("user_name")
      print(username)
      password = request.form.get("password")
      print(password)
    #now manipulate the username and password variables as you wish
    #Tip: define another method instead of methods=['GET','POST'], if you want to  
    # render the same template with a GET request too
    

回答 6

使用request.args.get(param),例如:

http://10.1.1.1:5000/login?username=alex&password=pw1
@app.route('/login', methods=['GET', 'POST'])
def login():
    username = request.args.get('username')
    print(username)
    password = request.args.get('password')
    print(password)

这是代码的引用链接。

Use request.args.get(param), for example:

http://10.1.1.1:5000/login?username=alex&password=pw1
@app.route('/login', methods=['GET', 'POST'])
def login():
    username = request.args.get('username')
    print(username)
    password = request.args.get('password')
    print(password)

Here is the referenced link to the code.


系列的真值含糊不清。使用a.empty,a.bool(),a.item(),a.any()或a.all()

问题:系列的真值含糊不清。使用a.empty,a.bool(),a.item(),a.any()或a.all()

在用or条件过滤我的结果数据框时出现问题。我希望我的结果df提取var大于0.25且小于-0.25的所有列值。

下面的逻辑为我提供了一个模糊的真实值,但是当我将此过滤分为两个独立的操作时,它可以工作。这是怎么回事 不知道在哪里使用建议a.empty(), a.bool(), a.item(),a.any() or a.all()

 result = result[(result['var']>0.25) or (result['var']<-0.25)]

Having issue filtering my result dataframe with an or condition. I want my result df to extract all column var values that are above 0.25 and below -0.25.

This logic below gives me an ambiguous truth value however it work when I split this filtering in two separate operations. What is happening here? not sure where to use the suggested a.empty(), a.bool(), a.item(),a.any() or a.all().

 result = result[(result['var']>0.25) or (result['var']<-0.25)]

回答 0

orandPython语句需要truth-值。因为pandas这些被认为是模棱两可的,所以您应该使用“按位” |(或)或&(和)操作:

result = result[(result['var']>0.25) | (result['var']<-0.25)]

对于此类数据结构,它们会重载以生成元素级or(或and)。


只是为该语句添加更多解释:

当您想获取的时bool,将引发异常pandas.Series

>>> import pandas as pd
>>> x = pd.Series([1])
>>> bool(x)
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

什么你打是一处经营隐含转换的操作数bool(你用or,但它也恰好为andifwhile):

>>> x or x
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> x and x
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> if x:
...     print('fun')
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> while x:
...     print('fun')
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

除了这些4个语句有一些隐藏某几个Python函数bool调用(如anyallfilter,…),这些都是通常不会有问题的pandas.Series,但出于完整性我想提一提这些。


在您的情况下,该异常并不是真正有用的,因为它没有提到正确的替代方法。对于and和,or您可以使用(如果您想要逐元素比较):

  • numpy.logical_or

    >>> import numpy as np
    >>> np.logical_or(x, y)

    或简单地|算:

    >>> x | y
  • numpy.logical_and

    >>> np.logical_and(x, y)

    或简单地&算:

    >>> x & y

如果您使用的是运算符,请确保由于运算符优先级而正确设置了括号。

几个逻辑numpy的功能,应该工作的pandas.Series


如果您在执行if或时遇到异常,则异常中提到的替代方法更适合while。我将在下面简短地解释每个:

  • 如果要检查您的系列是否为

    >>> x = pd.Series([])
    >>> x.empty
    True
    >>> x = pd.Series([1])
    >>> x.empty
    False

    如果没有明确的布尔值解释,Python通常会将len容器的gth(如list,,tuple…)解释为真值。因此,如果您想进行类似python的检查,可以执行:if x.sizeif not x.empty代替if x

  • 如果您Series包含一个且只有一个布尔值:

    >>> x = pd.Series([100])
    >>> (x > 50).bool()
    True
    >>> (x < 50).bool()
    False
  • 如果要检查系列的第一个也是唯一的一项(例如,.bool()但即使不是布尔型内容也可以使用):

    >>> x = pd.Series([100])
    >>> x.item()
    100
  • 如果要检查所有任何项目是否为非零,非空或非False:

    >>> x = pd.Series([0, 1, 2])
    >>> x.all()   # because one element is zero
    False
    >>> x.any()   # because one (or more) elements are non-zero
    True

The or and and python statements require truth-values. For pandas these are considered ambiguous so you should use “bitwise” | (or) or & (and) operations:

result = result[(result['var']>0.25) | (result['var']<-0.25)]

These are overloaded for these kind of datastructures to yield the element-wise or (or and).


Just to add some more explanation to this statement:

The exception is thrown when you want to get the bool of a pandas.Series:

>>> import pandas as pd
>>> x = pd.Series([1])
>>> bool(x)
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

What you hit was a place where the operator implicitly converted the operands to bool (you used or but it also happens for and, if and while):

>>> x or x
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> x and x
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> if x:
...     print('fun')
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> while x:
...     print('fun')
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Besides these 4 statements there are several python functions that hide some bool calls (like any, all, filter, …) these are normally not problematic with pandas.Series but for completeness I wanted to mention these.


In your case the exception isn’t really helpful, because it doesn’t mention the right alternatives. For and and or you can use (if you want element-wise comparisons):

  • numpy.logical_or:

    >>> import numpy as np
    >>> np.logical_or(x, y)
    

    or simply the | operator:

    >>> x | y
    
  • numpy.logical_and:

    >>> np.logical_and(x, y)
    

    or simply the & operator:

    >>> x & y
    

If you’re using the operators then make sure you set your parenthesis correctly because of the operator precedence.

There are several logical numpy functions which should work on pandas.Series.


The alternatives mentioned in the Exception are more suited if you encountered it when doing if or while. I’ll shortly explain each of these:

  • If you want to check if your Series is empty:

    >>> x = pd.Series([])
    >>> x.empty
    True
    >>> x = pd.Series([1])
    >>> x.empty
    False
    

    Python normally interprets the length of containers (like list, tuple, …) as truth-value if it has no explicit boolean interpretation. So if you want the python-like check, you could do: if x.size or if not x.empty instead of if x.

  • If your Series contains one and only one boolean value:

    >>> x = pd.Series([100])
    >>> (x > 50).bool()
    True
    >>> (x < 50).bool()
    False
    
  • If you want to check the first and only item of your Series (like .bool() but works even for not boolean contents):

    >>> x = pd.Series([100])
    >>> x.item()
    100
    
  • If you want to check if all or any item is not-zero, not-empty or not-False:

    >>> x = pd.Series([0, 1, 2])
    >>> x.all()   # because one element is zero
    False
    >>> x.any()   # because one (or more) elements are non-zero
    True
    

回答 1

对于布尔逻辑,请使用&|

np.random.seed(0)
df = pd.DataFrame(np.random.randn(5,3), columns=list('ABC'))

>>> df
          A         B         C
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
2  0.950088 -0.151357 -0.103219
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863

>>> df.loc[(df.C > 0.25) | (df.C < -0.25)]
          A         B         C
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863

要查看发生了什么,您可以为每个比较获得一列布尔值,例如

df.C > 0.25
0     True
1    False
2    False
3     True
4     True
Name: C, dtype: bool

当您有多个条件时,将返回多个列。这就是为什么联接逻辑模棱两可的原因。单独使用andor对待每列,因此您首先需要将该列减少为单个布尔值。例如,查看每个列中的任何值或所有值是否为True。

# Any value in either column is True?
(df.C > 0.25).any() or (df.C < -0.25).any()
True

# All values in either column is True?
(df.C > 0.25).all() or (df.C < -0.25).all()
False

一种实现相同目的的复杂方法是​​将所有这些列压缩在一起,并执行适当的逻辑。

>>> df[[any([a, b]) for a, b in zip(df.C > 0.25, df.C < -0.25)]]
          A         B         C
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863

有关更多详细信息,请参阅文档中的布尔索引

For boolean logic, use & and |.

np.random.seed(0)
df = pd.DataFrame(np.random.randn(5,3), columns=list('ABC'))

>>> df
          A         B         C
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
2  0.950088 -0.151357 -0.103219
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863

>>> df.loc[(df.C > 0.25) | (df.C < -0.25)]
          A         B         C
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863

To see what is happening, you get a column of booleans for each comparison, e.g.

df.C > 0.25
0     True
1    False
2    False
3     True
4     True
Name: C, dtype: bool

When you have multiple criteria, you will get multiple columns returned. This is why the the join logic is ambiguous. Using and or or treats each column separately, so you first need to reduce that column to a single boolean value. For example, to see if any value or all values in each of the columns is True.

# Any value in either column is True?
(df.C > 0.25).any() or (df.C < -0.25).any()
True

# All values in either column is True?
(df.C > 0.25).all() or (df.C < -0.25).all()
False

One convoluted way to achieve the same thing is to zip all of these columns together, and perform the appropriate logic.

>>> df[[any([a, b]) for a, b in zip(df.C > 0.25, df.C < -0.25)]]
          A         B         C
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863

For more details, refer to Boolean Indexing in the docs.


回答 2

好吧熊猫使用按位’&”|’ 并且每个条件都应该用’()’包装

例如以下作品

data_query = data[(data['year'] >= 2005) & (data['year'] <= 2010)]

但是没有适当括号的相同查询不会

data_query = data[(data['year'] >= 2005 & data['year'] <= 2010)]

Well pandas use bitwise ‘&’ ‘|’ and each condition should be wrapped in a ‘()’

For example following works

data_query = data[(data['year'] >= 2005) & (data['year'] <= 2010)]

But the same query without proper brackets does not

data_query = data[(data['year'] >= 2005 & data['year'] <= 2010)]

回答 3

或者,您也可以使用操作员模块。更详细的信息在这里Python文档

import operator
import numpy as np
import pandas as pd
np.random.seed(0)
df = pd.DataFrame(np.random.randn(5,3), columns=list('ABC'))
df.loc[operator.or_(df.C > 0.25, df.C < -0.25)]

          A         B         C
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.4438

Or, alternatively, you could use Operator module. More detailed information is here Python docs

import operator
import numpy as np
import pandas as pd
np.random.seed(0)
df = pd.DataFrame(np.random.randn(5,3), columns=list('ABC'))
df.loc[operator.or_(df.C > 0.25, df.C < -0.25)]

          A         B         C
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.4438

回答 4

这个极好的答案很好地解释了正在发生的事情并提供了解决方案。我想添加另一种可能在类似情况下适用的解决方案:使用query方法:

result = result.query("(var > 0.25) or (var < -0.25)")

另请参见http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-query

(一些我正在使用的数据帧的测试表明,该方法比在一系列布尔值上使用按位运算符要慢一些:2 ms vs. 870 µs)

警告:至少其中一种情况不是很简单,那就是列名恰好是python表达式。我有名为的列WT_38hph_IP_2WT_38hph_input_2log2(WT_38hph_IP_2/WT_38hph_input_2)想执行以下查询:"(log2(WT_38hph_IP_2/WT_38hph_input_2) > 1) and (WT_38hph_IP_2 > 20)"

我获得了以下异常级联:

  • KeyError: 'log2'
  • UndefinedVariableError: name 'log2' is not defined
  • ValueError: "log2" is not a supported function

我猜这是因为查询解析器试图从前两列中获取内容,而不是用第三列的名称来标识表达式。

这里提出一种可能的解决方法。

This excellent answer explains very well what is happening and provides a solution. I would like to add another solution that might be suitable in similar cases: using the query method:

result = result.query("(var > 0.25) or (var < -0.25)")

See also http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-query.

(Some tests with a dataframe I’m currently working with suggest that this method is a bit slower than using the bitwise operators on series of booleans: 2 ms vs. 870 µs)

A piece of warning: At least one situation where this is not straightforward is when column names happen to be python expressions. I had columns named WT_38hph_IP_2, WT_38hph_input_2 and log2(WT_38hph_IP_2/WT_38hph_input_2) and wanted to perform the following query: "(log2(WT_38hph_IP_2/WT_38hph_input_2) > 1) and (WT_38hph_IP_2 > 20)"

I obtained the following exception cascade:

  • KeyError: 'log2'
  • UndefinedVariableError: name 'log2' is not defined
  • ValueError: "log2" is not a supported function

I guess this happened because the query parser was trying to make something from the first two columns instead of identifying the expression with the name of the third column.

A possible workaround is proposed here.


回答 5

我遇到了同样的错误,并在pyspark数据帧中停滞了几天,我能够通过将na值填充为0来成功解决它,因为我正在比较2个字段的整数值。

I encountered the same error and got stalled with a pyspark dataframe for few days, I was able to resolve it successfully by filling na values with 0 since I was comparing integer values from 2 fields.


在Python中为日期添加5天

问题:在Python中为日期添加5天

我有一个日期"10/10/11(m-d-y)",我想使用Python脚本为其添加5天。请考虑在月底也可以使用的一般解决方案。

我正在使用以下代码:

import re
from datetime import datetime

StartDate = "10/10/11"

Date = datetime.strptime(StartDate, "%m/%d/%y")

print Date ->正在打印 '2011-10-10 00:00:00'

现在,我想在此日期之前增加5天。我使用以下代码:

EndDate = Date.today()+timedelta(days=10)

哪个返回此错误:

name 'timedelta' is not defined

I have a date "10/10/11(m-d-y)" and I want to add 5 days to it using a Python script. Please consider a general solution that works on the month ends also.

I am using following code:

import re
from datetime import datetime

StartDate = "10/10/11"

Date = datetime.strptime(StartDate, "%m/%d/%y")

print Date -> is printing '2011-10-10 00:00:00'

Now I want to add 5 days to this date. I used the following code:

EndDate = Date.today()+timedelta(days=10)

Which returned this error:

name 'timedelta' is not defined

回答 0

先前的答案是正确的,但是通常这样做是更好的做法:

import datetime

然后,您将拥有datetime.timedelta

date_1 = datetime.datetime.strptime(start_date, "%m/%d/%y")

end_date = date_1 + datetime.timedelta(days=10)

The previous answers are correct but it’s generally a better practice to do:

import datetime

Then you’ll have, using datetime.timedelta:

date_1 = datetime.datetime.strptime(start_date, "%m/%d/%y")

end_date = date_1 + datetime.timedelta(days=10)

回答 1

导入timedeltadate首先。

from datetime import timedelta, date

date.today()会返回今天的日期时间,可能是您想要的

EndDate = date.today() + timedelta(days=10)

Import timedelta and date first.

from datetime import timedelta, date

And date.today() will return today’s datetime, may be you want

EndDate = date.today() + timedelta(days=10)

回答 2

如果您碰巧已经在使用pandas,则可以通过不指定格式来节省一些空间:

import pandas as pd
startdate = "10/10/2011"
enddate = pd.to_datetime(startdate) + pd.DateOffset(days=5)

If you happen to already be using pandas, you can save a little space by not specifying the format:

import pandas as pd
startdate = "10/10/2011"
enddate = pd.to_datetime(startdate) + pd.DateOffset(days=5)

回答 3

我想您缺少这样的东西:

from datetime import timedelta

I guess you are missing something like that:

from datetime import timedelta

回答 4

这是另一种使用dateutil的relativedelta添加日期的方法。

from datetime import datetime
from dateutil.relativedelta import relativedelta

print 'Today: ',datetime.now().strftime('%d/%m/%Y %H:%M:%S') 
date_after_month = datetime.now()+ relativedelta(days=5)
print 'After 5 Days:', date_after_month.strftime('%d/%m/%Y %H:%M:%S')

输出:

今天:25/06/2015 15:56:09

5天后:30/06/2015 15:56:09

Here is another method to add days on date using dateutil’s relativedelta.

from datetime import datetime
from dateutil.relativedelta import relativedelta

print 'Today: ',datetime.now().strftime('%d/%m/%Y %H:%M:%S') 
date_after_month = datetime.now()+ relativedelta(days=5)
print 'After 5 Days:', date_after_month.strftime('%d/%m/%Y %H:%M:%S')

Output:

Today: 25/06/2015 15:56:09

After 5 Days: 30/06/2015 15:56:09


回答 5

如果要立即添加日期,可以使用此代码

from datetime import datetime
from datetime import timedelta


date_now_more_5_days = (datetime.now() + timedelta(days=5) ).strftime('%Y-%m-%d')

If you want add days to date now, you can use this code

from datetime import datetime
from datetime import timedelta


date_now_more_5_days = (datetime.now() + timedelta(days=5) ).strftime('%Y-%m-%d')

回答 6

这是从现在开始+指定天数的功能

import datetime

def get_date(dateFormat="%d-%m-%Y", addDays=0):

    timeNow = datetime.datetime.now()
    if (addDays!=0):
        anotherTime = timeNow + datetime.timedelta(days=addDays)
    else:
        anotherTime = timeNow

    return anotherTime.strftime(dateFormat)

用法:

addDays = 3 #days
output_format = '%d-%m-%Y'
output = get_date(output_format, addDays)
print output

Here is a function of getting from now + specified days

import datetime

def get_date(dateFormat="%d-%m-%Y", addDays=0):

    timeNow = datetime.datetime.now()
    if (addDays!=0):
        anotherTime = timeNow + datetime.timedelta(days=addDays)
    else:
        anotherTime = timeNow

    return anotherTime.strftime(dateFormat)

Usage:

addDays = 3 #days
output_format = '%d-%m-%Y'
output = get_date(output_format, addDays)
print output

回答 7

为了减少冗长的代码,并避免datetime和datetime.datetime之间的名称冲突 ,应使用CamelCase名称重命名这些类。

from datetime import datetime as DateTime, timedelta as TimeDelta

因此,您可以执行以下操作,我认为这更清楚。

date_1 = DateTime.today() 
end_date = date_1 + TimeDelta(days=10)

另外,如果您以后想要的话,也不会出现名称冲突import datetime

In order to have have a less verbose code, and avoid name conflicts between datetime and datetime.datetime, you should rename the classes with CamelCase names.

from datetime import datetime as DateTime, timedelta as TimeDelta

So you can do the following, which I think it is clearer.

date_1 = DateTime.today() 
end_date = date_1 + TimeDelta(days=10)

Also, there would be no name conflict if you want to import datetime later on.


回答 8

使用timedeltas可以做到:

import datetime
today=datetime.date.today()


time=datetime.time()
print("today :",today)

# One day different .
five_day=datetime.timedelta(days=5)
print("one day :",five_day)
#output - 1 day , 00:00:00


# five day extend .
fitfthday=today+five_day
print("fitfthday",fitfthday)


# five day extend .
fitfthday=today+five_day
print("fitfthday",fitfthday)
#output - 
today : 2019-05-29
one day : 5 days, 0:00:00
fitfthday 2019-06-03

using timedeltas you can do:

import datetime
today=datetime.date.today()


time=datetime.time()
print("today :",today)

# One day different .
five_day=datetime.timedelta(days=5)
print("one day :",five_day)
#output - 1 day , 00:00:00


# five day extend .
fitfthday=today+five_day
print("fitfthday",fitfthday)


# five day extend .
fitfthday=today+five_day
print("fitfthday",fitfthday)
#output - 
today : 2019-05-29
one day : 5 days, 0:00:00
fitfthday 2019-06-03

回答 9

通常,您现在没有答案,但是也许我创建的我的类也会有所帮助。对我来说,它可以解决我在Pyhon项目中曾经遇到的所有要求。

class GetDate:
    def __init__(self, date, format="%Y-%m-%d"):
        self.tz = pytz.timezone("Europe/Warsaw")

        if isinstance(date, str):
            date = datetime.strptime(date, format)

        self.date = date.astimezone(self.tz)

    def time_delta_days(self, days):
        return self.date + timedelta(days=days)

    def time_delta_hours(self, hours):
        return self.date + timedelta(hours=hours)

    def time_delta_seconds(self, seconds):
        return self.date + timedelta(seconds=seconds)

    def get_minimum_time(self):
        return datetime.combine(self.date, time.min).astimezone(self.tz)

    def get_maximum_time(self):
        return datetime.combine(self.date, time.max).astimezone(self.tz)

    def get_month_first_day(self):
        return datetime(self.date.year, self.date.month, 1).astimezone(self.tz)

    def current(self):
        return self.date

    def get_month_last_day(self):
        lastDay = calendar.monthrange(self.date.year, self.date.month)[1]
        date = datetime(self.date.year, self.date.month, lastDay)
        return datetime.combine(date, time.max).astimezone(self.tz)

如何使用它

  1. self.tz = pytz.timezone("Europe/Warsaw") -在此处定义要在项目中使用的时区
  2. GetDate("2019-08-08").current()-这会将您的字符串日期转换为具有您在pt 1中定义的时区的时间敏感对象。默认字符串格式为,format="%Y-%m-%d"但可以随时更改。(例如。GetDate("2019-08-08 08:45", format="%Y-%m-%d %H:%M").current()
  3. GetDate("2019-08-08").get_month_first_day() 返回给定日期(字符串或对象)月份的第一天
  4. GetDate("2019-08-08").get_month_last_day() 返回上个月的给定日期
  5. GetDate("2019-08-08").minimum_time() 返回给定日期的开始日期
  6. GetDate("2019-08-08").maximum_time() 返回给定日期的一天结束
  7. GetDate("2019-08-08").time_delta_days({number_of_days})返回给定的日期+添加{天数}(您也可以调用:GetDate(timezone.now()).time_delta_days(-1)昨天)
  8. GetDate("2019-08-08").time_delta_haours({number_of_hours}) 与pt 7类似,但工作时间较长
  9. GetDate("2019-08-08").time_delta_seconds({number_of_seconds}) 类似于pt 7,但工作几秒钟

Generally you have’got an answer now but maybe my class I created will be also helpfull. For me it solves all my requirements I have ever had in my Pyhon projects.

class GetDate:
    def __init__(self, date, format="%Y-%m-%d"):
        self.tz = pytz.timezone("Europe/Warsaw")

        if isinstance(date, str):
            date = datetime.strptime(date, format)

        self.date = date.astimezone(self.tz)

    def time_delta_days(self, days):
        return self.date + timedelta(days=days)

    def time_delta_hours(self, hours):
        return self.date + timedelta(hours=hours)

    def time_delta_seconds(self, seconds):
        return self.date + timedelta(seconds=seconds)

    def get_minimum_time(self):
        return datetime.combine(self.date, time.min).astimezone(self.tz)

    def get_maximum_time(self):
        return datetime.combine(self.date, time.max).astimezone(self.tz)

    def get_month_first_day(self):
        return datetime(self.date.year, self.date.month, 1).astimezone(self.tz)

    def current(self):
        return self.date

    def get_month_last_day(self):
        lastDay = calendar.monthrange(self.date.year, self.date.month)[1]
        date = datetime(self.date.year, self.date.month, lastDay)
        return datetime.combine(date, time.max).astimezone(self.tz)

How to use it

  1. self.tz = pytz.timezone("Europe/Warsaw") – here you define Time Zone you want to use in project
  2. GetDate("2019-08-08").current() – this will convert your string date to time aware object with timezone you defined in pt 1. Default string format is format="%Y-%m-%d" but feel free to change it. (eg. GetDate("2019-08-08 08:45", format="%Y-%m-%d %H:%M").current())
  3. GetDate("2019-08-08").get_month_first_day() returns given date (string or object) month first day
  4. GetDate("2019-08-08").get_month_last_day() returns given date month last day
  5. GetDate("2019-08-08").minimum_time() returns given date day start
  6. GetDate("2019-08-08").maximum_time() returns given date day end
  7. GetDate("2019-08-08").time_delta_days({number_of_days}) returns given date + add {number of days} (you can also call: GetDate(timezone.now()).time_delta_days(-1) for yesterday)
  8. GetDate("2019-08-08").time_delta_haours({number_of_hours}) similar to pt 7 but working on hours
  9. GetDate("2019-08-08").time_delta_seconds({number_of_seconds}) similar to pt 7 but working on seconds

回答 10

有时我们需要使用按日期和日期进行搜索。如果我们使用date__range,那么我们需要添加1天和to_date,否则queryset将为空。

例:

从datetime导入timedelta

from_date = parse_date(request.POST [‘from_date’])

to_date = parse_date(request.POST [‘to_date’])+ timedelta(days = 1)

Attenance_list = models.DailyAttendance.objects.filter(attdate__range = [from_date,to_date])

Some time we need to use searching by from date & to date. If we use date__range then we need to add 1 days with to_date otherwise queryset will empty.

Example:

from datetime import timedelta

from_date = parse_date(request.POST[‘from_date’])

to_date = parse_date(request.POST[‘to_date’]) + timedelta(days=1)

attendance_list = models.DailyAttendance.objects.filter(attdate__range = [from_date, to_date])


对Python中的数字列表求和

问题:对Python中的数字列表求和

我有一个数字列表,例如[1,2,3,4,5...],我想计算(1+2)/2第二个,(2+3)/2第三个, (3+4)/2等等。我怎样才能做到这一点?

我想将第一个数字与第二个数字相加并除以2,然后将第二个数字与第三个数字相加并除以2,依此类推。

另外,如何求和数字列表?

a = [1, 2, 3, 4, 5, ...]

是吗:

b = sum(a)
print b

得到一个号码?

这对我不起作用。

I have a list of numbers such as [1,2,3,4,5...], and I want to calculate (1+2)/2 and for the second, (2+3)/2 and the third, (3+4)/2, and so on. How can I do that?

I would like to sum the first number with the second and divide it by 2, then sum the second with the third and divide by 2, and so on.

Also, how can I sum a list of numbers?

a = [1, 2, 3, 4, 5, ...]

Is it:

b = sum(a)
print b

to get one number?

This doesn’t work for me.


回答 0

问题1:因此,您想要(元素0 +元素1)/ 2,(元素1 +元素2)/ 2,…等等。

我们列出两个列表:除第一个元素之外的每个元素中的一个,除最后一个元素之外的每个元素中的一个。然后,我们想要的平均值是从两个列表中得出的每对平均值。我们zip过去常常从两个列表中取对。

我假设即使输入值是整数,您也希望在结果中看到小数。默认情况下,Python会进行整数除法:它会丢弃余数。要完全划分事物,我们需要使用浮点数。幸运的是,将int除以浮点数将产生一个浮点数,因此我们仅将其2.0用作除数而不是2

从而:

averages = [(x + y) / 2.0 for (x, y) in zip(my_list[:-1], my_list[1:])]

问题2:

的使用sum应该可以正常工作。以下作品:

a = range(10)
# [0,1,2,3,4,5,6,7,8,9]
b = sum(a)
print b
# Prints 45

同样,您不需要在此过程的每一步都将所有内容分配给变量。print sum(a)效果很好。

您将必须更确切地确切地写出您所写的内容以及它是如何工作的。

Question 1: So you want (element 0 + element 1) / 2, (element 1 + element 2) / 2, … etc.

We make two lists: one of every element except the first, and one of every element except the last. Then the averages we want are the averages of each pair taken from the two lists. We use zip to take pairs from two lists.

I assume you want to see decimals in the result, even though your input values are integers. By default, Python does integer division: it discards the remainder. To divide things through all the way, we need to use floating-point numbers. Fortunately, dividing an int by a float will produce a float, so we just use 2.0 for our divisor instead of 2.

Thus:

averages = [(x + y) / 2.0 for (x, y) in zip(my_list[:-1], my_list[1:])]

Question 2:

That use of sum should work fine. The following works:

a = range(10)
# [0,1,2,3,4,5,6,7,8,9]
b = sum(a)
print b
# Prints 45

Also, you don’t need to assign everything to a variable at every step along the way. print sum(a) works just fine.

You will have to be more specific about exactly what you wrote and how it isn’t working.


回答 1

总数列表:

sum(list_of_nums)

使用列表推导计算n和n-1的一半(如果我的模式正确):

[(x + (x - 1)) / 2 for x in list_of_nums]

对相邻元素求和,例如((1 + 2)/ 2)+((2 + 3)/ 2)+ …使用reducelambda

reduce(lambda x, y: (x + y) / 2, list_of_nums)

Sum list of numbers:

sum(list_of_nums)

Calculating half of n and n – 1 (if I have the pattern correct), using a list comprehension:

[(x + (x - 1)) / 2 for x in list_of_nums]

Sum adjacent elements, e.g. ((1 + 2) / 2) + ((2 + 3) / 2) + … using reduce and lambdas

reduce(lambda x, y: (x + y) / 2, list_of_nums)

回答 2

问题2: 总结一个整数列表:

a = [2, 3, 5, 8]
sum(a)
# 18
# or you can do:
sum(i for i in a)
# 18

如果列表包含整数作为字符串:

a = ['5', '6']
# import Decimal: from decimal import Decimal
sum(Decimal(i) for i in a)

Question 2: To sum a list of integers:

a = [2, 3, 5, 8]
sum(a)
# 18
# or you can do:
sum(i for i in a)
# 18

If the list contains integers as strings:

a = ['5', '6']
# import Decimal: from decimal import Decimal
sum(Decimal(i) for i in a)

回答 3

您可以这样尝试:

a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
sm = sum(a[0:len(a)]) # Sum of 'a' from 0 index to 9 index. sum(a) == sum(a[0:len(a)]
print(sm) # Python 3
print sm  # Python 2

You can try this way:

a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
sm = sum(a[0:len(a)]) # Sum of 'a' from 0 index to 9 index. sum(a) == sum(a[0:len(a)]
print(sm) # Python 3
print sm  # Python 2

回答 4

>>> a = range(10)
>>> sum(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'int' object is not callable
>>> del sum
>>> sum(a)
45

似乎sum已经在代码中的某个地方定义了该代码,并覆盖了默认功能。所以我删除了它,问题解决了。

>>> a = range(10)
>>> sum(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'int' object is not callable
>>> del sum
>>> sum(a)
45

It seems that sum has been defined in the code somewhere and overwrites the default function. So I deleted it and the problem was solved.


回答 5

使用简单list-comprehensionsum

>> sum(i for i in range(x))/2. #if x = 10 the result will be 22.5

Using a simple list-comprehension and the sum:

>> sum(i for i in range(x))/2. #if x = 10 the result will be 22.5

回答 6

所有答案均显示出程序化和通用的方法。我建议针对您的情况的数学方法。特别是对于长列表,它可能会更快。之所以有效,是因为您的列表是一个自然数列表,最多可包含n

假设我们有自然数1, 2, 3, ..., 10

>>> nat_seq = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

您可以sum在列表中使用该功能:

>>> print sum(nat_seq)
55

您还可以使用公式n*(n+1)/2where n是列表中最后一个元素的值(此处nat_seq[-1]:),因此可以避免迭代元素:

>>> print (nat_seq[-1]*(nat_seq[-1]+1))/2
55

要生成序列(1+2)/2, (2+3)/2, ..., (9+10)/2,可以使用生成器和公式(2*k-1)/2.(注意点使值成为浮点)。生成新列表时,您必须跳过第一个元素:

>>> new_seq = [(2*k-1)/2. for k in nat_seq[1:]]
>>> print new_seq
[1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5]

您也可以sum在此列表中使用该函数:

>>> print sum(new_seq)
49.5

但是您也可以使用公式(((n*2+1)/2)**2-1)/2,因此可以避免迭代元素:

>>> print (((new_seq[-1]*2+1)/2)**2-1)/2
49.5

All answers did show a programmatic and general approach. I suggest a mathematical approach specific for your case. It can be faster in particular for long lists. It works because your list is a list of natural numbers up to n:

Let’s assume we have the natural numbers 1, 2, 3, ..., 10:

>>> nat_seq = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

You can use the sum function on a list:

>>> print sum(nat_seq)
55

You can also use the formula n*(n+1)/2 where n is the value of the last element in the list (here: nat_seq[-1]), so you avoid iterating over elements:

>>> print (nat_seq[-1]*(nat_seq[-1]+1))/2
55

To generate the sequence (1+2)/2, (2+3)/2, ..., (9+10)/2 you can use a generator and the formula (2*k-1)/2. (note the dot to make the values floating points). You have to skip the first element when generating the new list:

>>> new_seq = [(2*k-1)/2. for k in nat_seq[1:]]
>>> print new_seq
[1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5]

Here too, you can use the sum function on that list:

>>> print sum(new_seq)
49.5

But you can also use the formula (((n*2+1)/2)**2-1)/2, so you can avoid iterating over elements:

>>> print (((new_seq[-1]*2+1)/2)**2-1)/2
49.5

回答 7

解决此问题的最简单方法:

l =[1,2,3,4,5]
sum=0
for element in l:
    sum+=element
print sum

The simplest way to solve this problem:

l =[1,2,3,4,5]
sum=0
for element in l:
    sum+=element
print sum

回答 8

这个问题已经在这里回答

a = [1,2,3,4] sum(a)返回10

This question has been answered here

a = [1,2,3,4] sum(a) returns 10


回答 9

import numpy as np    
x = [1,2,3,4,5]
[(np.mean((x[i],x[i+1]))) for i in range(len(x)-1)]
# [1.5, 2.5, 3.5, 4.5]
import numpy as np    
x = [1,2,3,4,5]
[(np.mean((x[i],x[i+1]))) for i in range(len(x)-1)]
# [1.5, 2.5, 3.5, 4.5]

回答 10

生成器是编写此代码的简单方法:

from __future__ import division
# ^- so that 3/2 is 1.5 not 1

def averages( lst ):
    it = iter(lst) # Get a iterator over the list
    first = next(it)
    for item in it:
        yield (first+item)/2
        first = item

print list(averages(range(1,11)))
# [1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5]

Generators are an easy way to write this:

from __future__ import division
# ^- so that 3/2 is 1.5 not 1

def averages( lst ):
    it = iter(lst) # Get a iterator over the list
    first = next(it)
    for item in it:
        yield (first+item)/2
        first = item

print list(averages(range(1,11)))
# [1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5]

回答 11

让我们为初学者轻松:-

  1. global关键字将允许主函数内分配全局变量消息,而不产生新的本地变量
    message = "This is a global!"


def main():
    global message
    message = "This is a local"
    print(message)


main()
# outputs "This is a local" - From the Function call
print(message)
# outputs "This is a local" - From the Outer scope

这个概念叫做阴影

  1. 对Python中的数字列表求和
nums = [1, 2, 3, 4, 5]

var = 0


def sums():
    for num in nums:
        global var
        var = var + num
    print(var)


if __name__ == '__main__':
    sums()

输出= 15

Let us make it easy for Beginner:-

  1. The global keyword will allow the global variable message to be assigned within the main function without producing a new local variable
    message = "This is a global!"


def main():
    global message
    message = "This is a local"
    print(message)


main()
# outputs "This is a local" - From the Function call
print(message)
# outputs "This is a local" - From the Outer scope

This concept is called Shadowing

  1. Sum a list of numbers in Python
nums = [1, 2, 3, 4, 5]

var = 0


def sums():
    for num in nums:
        global var
        var = var + num
    print(var)


if __name__ == '__main__':
    sums()

Outputs = 15


回答 12

使用pairwise itertools配方

import itertools
def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = itertools.tee(iterable)
    next(b, None)
    return itertools.izip(a, b)

def pair_averages(seq):
    return ( (a+b)/2 for a, b in pairwise(seq) )

Using the pairwise itertools recipe:

import itertools
def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = itertools.tee(iterable)
    next(b, None)
    return itertools.izip(a, b)

def pair_averages(seq):
    return ( (a+b)/2 for a, b in pairwise(seq) )

回答 13

简短:

def ave(x,y):
  return (x + y) / 2.0

map(ave, a[:-1], a[1:])

外观如下:

>>> a = range(10)
>>> map(ave, a[:-1], a[1:])
[0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5]

由于Python如何处理map两个以上的列表有些愚蠢,因此您必须截断该列表a[:-1]。如果您使用它,它的工作效果将超出您的预期itertools.imap

>>> import itertools
>>> itertools.imap(ave, a, a[1:])
<itertools.imap object at 0x1005c3990>
>>> list(_)
[0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5]

Short and simple:

def ave(x,y):
  return (x + y) / 2.0

map(ave, a[:-1], a[1:])

And here’s how it looks:

>>> a = range(10)
>>> map(ave, a[:-1], a[1:])
[0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5]

Due to some stupidity in how Python handles a map over two lists, you do have to truncate the list, a[:-1]. It works more as you’d expect if you use itertools.imap:

>>> import itertools
>>> itertools.imap(ave, a, a[1:])
<itertools.imap object at 0x1005c3990>
>>> list(_)
[0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5]

回答 14

如此众多的解决方案,但我最喜欢的仍然缺失:

>>> import numpy as np
>>> arr = np.array([1,2,3,4,5])

numpy数组与列表没有太大区别(在这种情况下),除了可以将数组视为数字:

>>> ( arr[:-1] + arr[1:] ) / 2.0
[ 1.5  2.5  3.5  4.5]

做完了!

说明

花式索引的含义是:[1:]包括从1到末尾的所有元素(因此省略元素0),并且[:-1]是除最后一个元素之外的所有元素:

>>> arr[:-1]
array([1, 2, 3, 4])
>>> arr[1:]
array([2, 3, 4, 5])

因此,将这两个元素相加即可得到一个由元素(1 + 2),(2 + 3)等组成的数组。并不是说我要除以2.0,不是2因为否则Python会认为您仅使用整数并产生四舍五入的整数结果。

使用numpy的优势

NumPy的可快于各地的号码清单循环。根据列表的大小,速度快几个数量级。而且,它的代码少得多,至少对我来说,它更易于阅读。我正在尝试养成对所有数字组都使用numpy的习惯,这对我原本必须制作的所有循环和内部循环都是一个巨大的改进。

So many solutions, but my favourite is still missing:

>>> import numpy as np
>>> arr = np.array([1,2,3,4,5])

a numpy array is not too different from a list (in this use case), except that you can treat arrays like numbers:

>>> ( arr[:-1] + arr[1:] ) / 2.0
[ 1.5  2.5  3.5  4.5]

Done!

explanation

The fancy indices mean this: [1:] includes all elements from 1 to the end (thus omitting element 0), and [:-1] are all elements except the last one:

>>> arr[:-1]
array([1, 2, 3, 4])
>>> arr[1:]
array([2, 3, 4, 5])

So adding those two gives you an array consisting of elemens (1+2), (2+3) and so on. Not that I’m dividing by 2.0, not 2 because otherwise Python believes that you’re only using integers and produces rounded integer results.

advantage of using numpy

Numpy can be much faster than loops around lists of numbers. Depending on how big your list is, several orders of magnitude faster. Also, it’s a lot less code, and at least to me, it’s easier to read. I’m trying to make a habit out of using numpy for all groups of numbers, and it is a huge improvement to all the loops and lops-within-loops I would otherwise have had to make.


回答 15

我只是将lambda与map()一起使用

a = [1,2,3,4,5,6,7,8,9,10]
b = map(lambda x, y: (x+y)/2.0, fib[:-1], fib[1:])
print b

I’d just use a lambda with map()

a = [1,2,3,4,5,6,7,8,9,10]
b = map(lambda x, y: (x+y)/2.0, fib[:-1], fib[1:])
print b

回答 16

我使用while循环来获取结果:

i = 0
while i < len(a)-1:
   result = (a[i]+a[i+1])/2
   print result
   i +=1

I use a while loop to get the result:

i = 0
while i < len(a)-1:
   result = (a[i]+a[i+1])/2
   print result
   i +=1

回答 17

遍历列表中的元素并像这样更新总数:

def sum(a):
    total = 0
    index = 0
    while index < len(a):
        total = total + a[index]
        index = index + 1
    return total

Loop through elements in the list and update the total like this:

def sum(a):
    total = 0
    index = 0
    while index < len(a):
        total = total + a[index]
        index = index + 1
    return total

回答 18

多亏了Karl Knechtel,我才能够理解您的问题。我的解释:

  1. 您想要一个包含元素i和i + 1平均值的新列表。
  2. 您要对列表中的每个元素求和。

使用匿名函数(又名Lambda函数)的第一个问题:

s = lambda l: [(l[0]+l[1])/2.] + s(l[1:]) if len(l)>1 else []  #assuming you want result as float
s = lambda l: [(l[0]+l[1])//2] + s(l[1:]) if len(l)>1 else []  #assuming you want floor result

第二个问题也使用匿名函数(又名Lambda函数):

p = lambda l: l[0] + p(l[1:]) if l!=[] else 0

这两个问题合并在一行代码中:

s = lambda l: (l[0]+l[1])/2. + s(l[1:]) if len(l)>1 else 0  #assuming you want result as float
s = lambda l: (l[0]+l[1])/2. + s(l[1:]) if len(l)>1 else 0  #assuming you want floor result

使用最适合您需求的那个

Thanks to Karl Knechtel i was able to understand your question. My interpretation:

  1. You want a new list with the average of the element i and i+1.
  2. You want to sum each element in the list.

First question using anonymous function (aka. Lambda function):

s = lambda l: [(l[0]+l[1])/2.] + s(l[1:]) if len(l)>1 else []  #assuming you want result as float
s = lambda l: [(l[0]+l[1])//2] + s(l[1:]) if len(l)>1 else []  #assuming you want floor result

Second question also using anonymous function (aka. Lambda function):

p = lambda l: l[0] + p(l[1:]) if l!=[] else 0

Both questions combined in a single line of code :

s = lambda l: (l[0]+l[1])/2. + s(l[1:]) if len(l)>1 else 0  #assuming you want result as float
s = lambda l: (l[0]+l[1])/2. + s(l[1:]) if len(l)>1 else 0  #assuming you want floor result

use the one that fits best your needs


回答 19

您也可以使用递归进行相同的操作:

Python片段:

def sumOfArray(arr, startIndex):
    size = len(arr)
    if size == startIndex:  # To Check empty list
        return 0
    elif startIndex == (size - 1): # To Check Last Value
        return arr[startIndex]
    else:
        return arr[startIndex] + sumOfArray(arr, startIndex + 1)


print(sumOfArray([1,2,3,4,5], 0))

You can also do the same using recursion:

Python Snippet:

def sumOfArray(arr, startIndex):
    size = len(arr)
    if size == startIndex:  # To Check empty list
        return 0
    elif startIndex == (size - 1): # To Check Last Value
        return arr[startIndex]
    else:
        return arr[startIndex] + sumOfArray(arr, startIndex + 1)


print(sumOfArray([1,2,3,4,5], 0))

回答 20

尝试使用列表理解。就像是:

new_list = [(old_list[i] + old_list[i+1])/2 for i in range(len(old_list-1))]

Try using a list comprehension. Something like:

new_list = [(old_list[i] + old_list[i+1])/2 for i in range(len(old_list-1))]

回答 21

本着itertools的精神。成对配方的灵感。

from itertools import tee, izip

def average(iterable):
    "s -> (s0,s1)/2.0, (s1,s2)/2.0, ..."
    a, b = tee(iterable)
    next(b, None)
    return ((x+y)/2.0 for x, y in izip(a, b))

例子:

>>>list(average([1,2,3,4,5]))
[1.5, 2.5, 3.5, 4.5]
>>>list(average([1,20,31,45,56,0,0]))
[10.5, 25.5, 38.0, 50.5, 28.0, 0.0]
>>>list(average(average([1,2,3,4,5])))
[2.0, 3.0, 4.0]

In the spirit of itertools. Inspiration from the pairwise recipe.

from itertools import tee, izip

def average(iterable):
    "s -> (s0,s1)/2.0, (s1,s2)/2.0, ..."
    a, b = tee(iterable)
    next(b, None)
    return ((x+y)/2.0 for x, y in izip(a, b))

Examples:

>>>list(average([1,2,3,4,5]))
[1.5, 2.5, 3.5, 4.5]
>>>list(average([1,20,31,45,56,0,0]))
[10.5, 25.5, 38.0, 50.5, 28.0, 0.0]
>>>list(average(average([1,2,3,4,5])))
[2.0, 3.0, 4.0]

回答 22

n = int(input("Enter the length of array: "))
list1 = []
for i in range(n):
    list1.append(int(input("Enter numbers: ")))
print("User inputs are", list1)

list2 = []
for j in range(0, n-1):
    list2.append((list1[j]+list1[j+1])/2)
print("result = ", list2)
n = int(input("Enter the length of array: "))
list1 = []
for i in range(n):
    list1.append(int(input("Enter numbers: ")))
print("User inputs are", list1)

list2 = []
for j in range(0, n-1):
    list2.append((list1[j]+list1[j+1])/2)
print("result = ", list2)

回答 23

一种简单的方法是使用iter_tools排列

# If you are given a list

numList = [1,2,3,4,5,6,7]

# and you are asked to find the number of three sums that add to a particular number

target = 10
# How you could come up with the answer?

from itertools import permutations

good_permutations = []

for p in permutations(numList, 3):
    if sum(p) == target:
        good_permutations.append(p)

print(good_permutations)

结果是:

[(1, 2, 7), (1, 3, 6), (1, 4, 5), (1, 5, 4), (1, 6, 3), (1, 7, 2), (2, 1, 7), (2, 3, 
5), (2, 5, 3), (2, 7, 1), (3, 1, 6), (3, 2, 5), (3, 5, 2), (3, 6, 1), (4, 1, 5), (4, 
5, 1), (5, 1, 4), (5, 2, 3), (5, 3, 2), (5, 4, 1), (6, 1, 3), (6, 3, 1), (7, 1, 2), 
(7, 2, 1)]

请注意,顺序很重要-意思是1、2、7也显示为2、1、7和7、1、2。您可以通过使用集合来减少此数目。

A simple way is to use the iter_tools permutation

# If you are given a list

numList = [1,2,3,4,5,6,7]

# and you are asked to find the number of three sums that add to a particular number

target = 10
# How you could come up with the answer?

from itertools import permutations

good_permutations = []

for p in permutations(numList, 3):
    if sum(p) == target:
        good_permutations.append(p)

print(good_permutations)

The result is:

[(1, 2, 7), (1, 3, 6), (1, 4, 5), (1, 5, 4), (1, 6, 3), (1, 7, 2), (2, 1, 7), (2, 3, 
5), (2, 5, 3), (2, 7, 1), (3, 1, 6), (3, 2, 5), (3, 5, 2), (3, 6, 1), (4, 1, 5), (4, 
5, 1), (5, 1, 4), (5, 2, 3), (5, 3, 2), (5, 4, 1), (6, 1, 3), (6, 3, 1), (7, 1, 2), 
(7, 2, 1)]

Note that order matters – meaning 1, 2, 7 is also shown as 2, 1, 7 and 7, 1, 2. You can reduce this by using a set.


回答 24

在Python 3.8中,可以使用新的赋值运算符

>>> my_list = [1, 2, 3, 4, 5]
>>> itr = iter(my_list)
>>> a = next(itr)
>>> [(a + (a:=x))/2 for x in itr]
[1.5, 2.5, 3.5, 4.5]

a是对列表中前一个值的运行引用,因此将其初始化为列表的第一个元素,并且迭代在列表的其余部分进行,a并在每次迭代中使用后进行更新。

显式迭代器用于避免使用创建列表的副本my_list[1:]

In Python 3.8, the new assignment operator can be used

>>> my_list = [1, 2, 3, 4, 5]
>>> itr = iter(my_list)
>>> a = next(itr)
>>> [(a + (a:=x))/2 for x in itr]
[1.5, 2.5, 3.5, 4.5]

a is a running reference to the previous value in the list, hence it is initialized to the first element of the list and the iteration occurs over the rest of the list, updating a after it is used in each iteration.

An explicit iterator is used to avoid needing to create a copy of the list using my_list[1:].


回答 25

试试以下-

mylist = [1, 2, 3, 4]   

def add(mylist):
    total = 0
    for i in mylist:
        total += i
    return total

result = add(mylist)
print("sum = ", result)

Try the following –

mylist = [1, 2, 3, 4]   

def add(mylist):
    total = 0
    for i in mylist:
        total += i
    return total

result = add(mylist)
print("sum = ", result)