
Combine two columns of text in a dataframe in pandas / python

Question: Combine two columns of text in a dataframe in pandas / python

I have a 20 x 4000 dataframe in Python using pandas. Two of these columns are named Year and quarter. I’d like to create a variable called period that makes Year = 2000 and quarter= q2 into 2000q2.

Can anyone help with that?


Answer 0


If both columns are strings, you can concatenate them directly:

df["period"] = df["Year"] + df["quarter"]

If one (or both) of the columns is not string-typed, you should convert it (them) to string first:

df["period"] = df["Year"].astype(str) + df["quarter"]

Beware of NaNs when doing this!
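To illustrate the NaN pitfall: concatenating with a missing value yields NaN for that row, so you may want to fill missing values first. A minimal sketch (column names match the question; `fillna` is one way to handle it):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({"Year": [2014, 2015], "quarter": ["q1", np.nan]})

# Direct concatenation propagates the NaN: the second row becomes NaN.
naive = df["Year"].astype(str) + df["quarter"]

# Filling missing values first keeps the row usable.
safe = df["Year"].astype(str) + df["quarter"].fillna("")

print(naive.tolist())  # second entry is NaN
print(safe.tolist())   # ['2014q1', '2015']
```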


If you need to join multiple string columns, you can use agg:

df['period'] = df[['Year', 'quarter', ...]].agg('-'.join, axis=1)

Where “-” is the separator.
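Note that `'-'.join` requires string values; if some columns are numeric, convert them first. A short sketch combining the conversion with `agg`:

```python
import pandas as pd

df = pd.DataFrame({"Year": [2014, 2015], "quarter": ["q1", "q2"]})

# astype(str) first, so numeric columns work with the string join too.
df["period"] = df[["Year", "quarter"]].astype(str).agg("-".join, axis=1)
print(df["period"].tolist())  # ['2014-q1', '2015-q2']
```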


Answer 1


df = pd.DataFrame({'Year': ['2014', '2015'], 'quarter': ['q1', 'q2']})
df['period'] = df[['Year', 'quarter']].apply(lambda x: ''.join(x), axis=1)

Yields this dataframe

   Year quarter  period
0  2014      q1  2014q1
1  2015      q2  2015q2

This method generalizes to an arbitrary number of string columns by replacing df[['Year', 'quarter']] with any column slice of your dataframe, e.g. df.iloc[:,0:2].apply(lambda x: ''.join(x), axis=1).

You can find more information about the apply() method in the pandas documentation.


Answer 2


Small datasets (< 150 rows)

[''.join(i) for i in zip(df["Year"].map(str),df["quarter"])]

or slightly slower but more compact:

df.Year.str.cat(df.quarter)

Larger datasets (> 150 rows)

df['Year'].astype(str) + df['quarter']

UPDATE: Timing graph Pandas 0.23.4


Let's test it on a 200K-row DF:

In [250]: df
Out[250]:
   Year quarter
0  2014      q1
1  2015      q2

In [251]: df = pd.concat([df] * 10**5)

In [252]: df.shape
Out[252]: (200000, 2)

UPDATE: new timings using Pandas 0.19.0

Timing without CPU/GPU optimization (sorted from fastest to slowest):

In [107]: %timeit df['Year'].astype(str) + df['quarter']
10 loops, best of 3: 131 ms per loop

In [106]: %timeit df['Year'].map(str) + df['quarter']
10 loops, best of 3: 161 ms per loop

In [108]: %timeit df.Year.str.cat(df.quarter)
10 loops, best of 3: 189 ms per loop

In [109]: %timeit df.loc[:, ['Year','quarter']].astype(str).sum(axis=1)
1 loop, best of 3: 567 ms per loop

In [110]: %timeit df[['Year','quarter']].astype(str).sum(axis=1)
1 loop, best of 3: 584 ms per loop

In [111]: %timeit df[['Year','quarter']].apply(lambda x : '{}{}'.format(x[0],x[1]), axis=1)
1 loop, best of 3: 24.7 s per loop

Timing using CPU/GPU optimization:

In [113]: %timeit df['Year'].astype(str) + df['quarter']
10 loops, best of 3: 53.3 ms per loop

In [114]: %timeit df['Year'].map(str) + df['quarter']
10 loops, best of 3: 65.5 ms per loop

In [115]: %timeit df.Year.str.cat(df.quarter)
10 loops, best of 3: 79.9 ms per loop

In [116]: %timeit df.loc[:, ['Year','quarter']].astype(str).sum(axis=1)
1 loop, best of 3: 230 ms per loop

In [117]: %timeit df[['Year','quarter']].astype(str).sum(axis=1)
1 loop, best of 3: 230 ms per loop

In [118]: %timeit df[['Year','quarter']].apply(lambda x : '{}{}'.format(x[0],x[1]), axis=1)
1 loop, best of 3: 9.38 s per loop

Answer contribution by @anton-vbr


Answer 3


The method cat() of the .str accessor works really well for this:

>>> import pandas as pd
>>> df = pd.DataFrame([["2014", "q1"], 
...                    ["2015", "q3"]],
...                   columns=('Year', 'Quarter'))
>>> print(df)
   Year Quarter
0  2014      q1
1  2015      q3
>>> df['Period'] = df.Year.str.cat(df.Quarter)
>>> print(df)
   Year Quarter  Period
0  2014      q1  2014q1
1  2015      q3  2015q3

cat() even allows you to add a separator so, for example, suppose you only have integers for year and period, you can do this:

>>> import pandas as pd
>>> df = pd.DataFrame([[2014, 1],
...                    [2015, 3]],
...                   columns=('Year', 'Quarter'))
>>> print(df)
   Year Quarter
0  2014       1
1  2015       3
>>> df['Period'] = df.Year.astype(str).str.cat(df.Quarter.astype(str), sep='q')
>>> print(df)
   Year Quarter  Period
0  2014       1  2014q1
1  2015       3  2015q3

Joining multiple columns is just a matter of passing either a list of series or a dataframe containing all but the first column as a parameter to str.cat() invoked on the first column (Series):

>>> df = pd.DataFrame(
...     [['USA', 'Nevada', 'Las Vegas'],
...      ['Brazil', 'Pernambuco', 'Recife']],
...     columns=['Country', 'State', 'City'],
... )
>>> df['AllTogether'] = df['Country'].str.cat(df[['State', 'City']], sep=' - ')
>>> print(df)
  Country       State       City                   AllTogether
0     USA      Nevada  Las Vegas      USA - Nevada - Las Vegas
1  Brazil  Pernambuco     Recife  Brazil - Pernambuco - Recife

Do note that if your pandas dataframe/series has null values, you need to include the parameter na_rep to replace the NaN values with a string, otherwise the combined column will default to NaN.
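A small sketch of the na_rep behaviour described above:

```python
import pandas as pd
import numpy as np

s = pd.Series(["USA", "Brazil", np.nan])
t = pd.Series(["Nevada", np.nan, "Recife"])

# Without na_rep, any row touching a NaN comes back as NaN.
no_rep = s.str.cat(t, sep=" - ")

# With na_rep, the NaN is replaced by the given string.
with_rep = s.str.cat(t, sep=" - ", na_rep="?")
print(with_rep.tolist())  # ['USA - Nevada', 'Brazil - ?', '? - Recife']
```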


Answer 4


This time, using a lambda function with str.format().

import pandas as pd
df = pd.DataFrame({'Year': ['2014', '2015'], 'Quarter': ['q1', 'q2']})
print(df)
df['YearQuarter'] = df[['Year', 'Quarter']].apply(lambda x: '{}{}'.format(x[0], x[1]), axis=1)
print(df)

   Year Quarter
0  2014      q1
1  2015      q2
   Year Quarter YearQuarter
0  2014      q1      2014q1
1  2015      q2      2015q2

This allows you to work with non-strings and reformat values as needed.

import pandas as pd
df = pd.DataFrame({'Year': ['2014', '2015'], 'Quarter': [1, 2]})
print(df.dtypes)
print(df)

df['YearQuarter'] = df[['Year', 'Quarter']].apply(lambda x: '{}q{}'.format(x[0], x[1]), axis=1)
print(df)

Year       object
Quarter     int64
dtype: object
   Year  Quarter
0  2014        1
1  2015        2
   Year  Quarter YearQuarter
0  2014        1      2014q1
1  2015        2      2015q2

Answer 5


A simple answer to your question, assuming both columns hold strings:

    year    quarter
0   2000    q1
1   2000    q2

> df['year_quarter'] = df['year'] + df['quarter']

> print(df['year_quarter'])
  2000q1
  2000q2

Answer 6


Although @silvado's answer is good, changing df.map(str) to df.astype(str) will be faster:

import pandas as pd
df = pd.DataFrame({'Year': ['2014', '2015'], 'quarter': ['q1', 'q2']})

In [131]: %timeit df["Year"].map(str)
10000 loops, best of 3: 132 us per loop

In [132]: %timeit df["Year"].astype(str)
10000 loops, best of 3: 82.2 us per loop

Answer 7


Let us suppose your dataframe is df with columns Year and Quarter.

import pandas as pd
df = pd.DataFrame({'Quarter':'q1 q2 q3 q4'.split(), 'Year':'2000'})

Suppose we want to see the dataframe;

df
>>>  Quarter    Year
   0    q1      2000
   1    q2      2000
   2    q3      2000
   3    q4      2000

Finally, concatenate the Year and the Quarter as follows.

df['Period'] = df['Year'] + ' ' + df['Quarter']

You can now print df to see the resulting dataframe.

df
>>>  Quarter    Year    Period
    0   q1      2000    2000 q1
    1   q2      2000    2000 q2
    2   q3      2000    2000 q3
    3   q4      2000    2000 q4

If you do not want the space between the year and quarter, simply remove it by doing;

df['Period'] = df['Year'] + df['Quarter']

Answer 8


Here is an implementation that I find very versatile:

In [1]: import pandas as pd 

In [2]: df = pd.DataFrame([[0, 'the', 'quick', 'brown'],
   ...:                    [1, 'fox', 'jumps', 'over'], 
   ...:                    [2, 'the', 'lazy', 'dog']],
   ...:                   columns=['c0', 'c1', 'c2', 'c3'])

In [3]: def str_join(df, sep, *cols):
   ...:     from functools import reduce
   ...:     return reduce(lambda x, y: x.astype(str).str.cat(y.astype(str), sep=sep), 
   ...:                   [df[col] for col in cols])
   ...: 

In [4]: df['cat'] = str_join(df, '-', 'c0', 'c1', 'c2', 'c3')

In [5]: df
Out[5]: 
   c0   c1     c2     c3                cat
0   0  the  quick  brown  0-the-quick-brown
1   1  fox  jumps   over   1-fox-jumps-over
2   2  the   lazy    dog     2-the-lazy-dog

Answer 9


Once your data is in a dataframe, this command should solve your problem:

df['period'] = df[['Year', 'quarter']].apply(lambda x: ' '.join(x.astype(str)), axis=1)

Answer 10


A more efficient approach is

def concat_df_str1(df):
    """ run time: 1.3416s """
    return pd.Series([''.join(row.astype(str)) for row in df.values], index=df.index)

and here is a time test:

import numpy as np
import pandas as pd

from time import time


def concat_df_str1(df):
    """ run time: 1.3416s """
    return pd.Series([''.join(row.astype(str)) for row in df.values], index=df.index)


def concat_df_str2(df):
    """ run time: 5.2758s """
    return df.astype(str).sum(axis=1)


def concat_df_str3(df):
    """ run time: 5.0076s """
    df = df.astype(str)
    return df[0] + df[1] + df[2] + df[3] + df[4] + \
           df[5] + df[6] + df[7] + df[8] + df[9]


def concat_df_str4(df):
    """ run time: 7.8624s """
    return df.astype(str).apply(lambda x: ''.join(x), axis=1)


def main():
    df = pd.DataFrame(np.zeros(1000000).reshape(100000, 10))
    df = df.astype(int)

    time1 = time()
    df_en = concat_df_str4(df)
    print('run time: %.4fs' % (time() - time1))
    print(df_en.head(10))


if __name__ == '__main__':
    main()

Finally, note that when sum (concat_df_str2) is used on numeric-looking strings, the result is not necessarily a simple concatenation; the values may be coerced back to integers.


Answer 11


Generalising to multiple columns, why not:

columns = ['whatever', 'columns', 'you', 'choose']
df['period'] = df[columns].astype(str).sum(axis=1)
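The snippet above can be sketched end to end. One caveat: sum(axis=1) offers no separator option, so when you need a delimiter the agg or str.cat approaches shown in other answers fit better:

```python
import pandas as pd

df = pd.DataFrame({"Year": [2014, 2015], "quarter": ["q1", "q2"]})

# Row-wise string sum concatenates the values with no separator.
columns = ["Year", "quarter"]
df["period"] = df[columns].astype(str).sum(axis=1)
print(df["period"].tolist())  # ['2014q1', '2015q2']
```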

Answer 12


Using zip could be even quicker:

df["period"] = [''.join(i) for i in zip(df["Year"].map(str),df["quarter"])]

Graph:


import pandas as pd
import numpy as np
import timeit
import matplotlib.pyplot as plt
from collections import defaultdict

df = pd.DataFrame({'Year': ['2014', '2015'], 'quarter': ['q1', 'q2']})

myfuncs = {
"df['Year'].astype(str) + df['quarter']":
    lambda: df['Year'].astype(str) + df['quarter'],
"df['Year'].map(str) + df['quarter']":
    lambda: df['Year'].map(str) + df['quarter'],
"df.Year.str.cat(df.quarter)":
    lambda: df.Year.str.cat(df.quarter),
"df.loc[:, ['Year','quarter']].astype(str).sum(axis=1)":
    lambda: df.loc[:, ['Year','quarter']].astype(str).sum(axis=1),
"df[['Year','quarter']].astype(str).sum(axis=1)":
    lambda: df[['Year','quarter']].astype(str).sum(axis=1),
    "df[['Year','quarter']].apply(lambda x : '{}{}'.format(x[0],x[1]), axis=1)":
    lambda: df[['Year','quarter']].apply(lambda x : '{}{}'.format(x[0],x[1]), axis=1),
    "[''.join(i) for i in zip(df['Year'].map(str),df['quarter'])]":
    lambda: [''.join(i) for i in zip(df["Year"].map(str),df["quarter"])]
}

d = defaultdict(dict)
step = 10
cont = True
while cont:
    lendf = len(df); print(lendf)
    for k,v in myfuncs.items():
        iters = 1
        t = 0
        while t < 0.2:
            ts = timeit.repeat(v, number=iters, repeat=3)
            t = min(ts)
            iters *= 10
        d[k][lendf] = t/iters
        if t > 2: cont = False
    df = pd.concat([df]*step)

pd.DataFrame(d).plot().legend(loc='upper center', bbox_to_anchor=(0.5, -0.15))
plt.yscale('log'); plt.xscale('log'); plt.ylabel('seconds'); plt.xlabel('df rows')
plt.show()

Answer 13


Simplest Solution:

Generic Solution

df['combined_col'] = df[['col1', 'col2']].astype(str).apply('-'.join, axis=1)

Question specific solution

df['quarter_year'] = df[['quarter', 'year']].astype(str).apply(''.join, axis=1)

Specify the preferred delimiter inside the quotes before .join


Answer 14


This solution uses an intermediate step, compressing the two columns of the DataFrame into a single column containing a list of the values. This works not only for strings but for other column dtypes as well, provided the values are converted to strings before joining (e.g. df['list'].apply(lambda lst: ''.join(map(str, lst)))).

import pandas as pd
df = pd.DataFrame({'Year': ['2014', '2015'], 'quarter': ['q1', 'q2']})
df['list']=df[['Year','quarter']].values.tolist()
df['period']=df['list'].apply(''.join)
print(df)

Result:

   Year quarter        list  period
0  2014      q1  [2014, q1]  2014q1
1  2015      q2  [2015, q2]  2015q2

Answer 15


As many have mentioned previously, you must convert each column to string and then use the plus operator to combine two string columns. You can get a large performance improvement by using NumPy.

%timeit df['Year'].values.astype(str) + df.quarter
71.1 ms ± 3.76 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit df['Year'].astype(str) + df['quarter']
565 ms ± 22.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
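On a small frame, the NumPy-backed variant produces the same values as the plain pandas version; a minimal sketch reproducing the two expressions timed above:

```python
import pandas as pd

df = pd.DataFrame({"Year": [2014, 2015], "quarter": ["q1", "q2"]})

# .values.astype(str) converts through NumPy, skipping some pandas overhead.
fast = df["Year"].values.astype(str) + df.quarter
slow = df["Year"].astype(str) + df["quarter"]
print(fast.tolist())  # ['2014q1', '2015q2']
```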

Answer 16


I think the best way to combine the columns in pandas is by converting both the columns to integer and then to str.

df[['Year', 'quarter']] = df[['Year', 'quarter']].astype(int).astype(str)
df['Period']= df['Year'] + 'q' + df['quarter']

Answer 17


Here is my summary of the above solutions to concatenate / combine two columns with int and str value into a new column, using a separator between the values of columns. Three solutions work for this purpose.

# be cautious about the separator, some symbols may cause "SyntaxError: EOL while scanning string literal".
# e.g. ";;" as separator would raise the SyntaxError

separator = "&&" 

# pd.Series.str.cat() method does not work to concatenate / combine two columns with int value and str value. This would raise "AttributeError: Can only use .cat accessor with a 'category' dtype"

df["period"] = df["Year"].map(str) + separator + df["quarter"]
df["period"] = df[['Year','quarter']].apply(lambda x : '{} && {}'.format(x[0],x[1]), axis=1)
df["period"] = df.apply(lambda x: f'{x["Year"]} && {x["quarter"]}', axis=1)

Answer 18


Use .combine_first. Note, however, that combine_first fills missing values in Year with values from Quarter; it does not concatenate the two strings.

df['Period'] = df['Year'].combine_first(df['Quarter'])

Answer 19

import numpy as np
import pandas as pd
from functools import reduce


def madd(x):
    """Performs element-wise string concatenation with multiple input arrays.

    Args:
        x: iterable of np.array.

    Returns: np.array.
    """
    for i, arr in enumerate(x):
        if type(arr.item(0)) is not str:
            x[i] = x[i].astype(str)
    return reduce(np.core.defchararray.add, x)

For example:

data = list(zip([2000]*4, ['q1', 'q2', 'q3', 'q4']))
df = pd.DataFrame(data=data, columns=['Year', 'quarter'])
df['period'] = madd([df[col].values for col in ['Year', 'quarter']])

df

    Year    quarter period
0   2000    q1  2000q1
1   2000    q2  2000q2
2   2000    q3  2000q3
3   2000    q4  2000q4

Answer 20


One can use assign method of DataFrame:

df = (pd.DataFrame({'Year': ['2014', '2015'], 'quarter': ['q1', 'q2']})
        .assign(period=lambda x: x.Year + x.quarter))

Answer 21


dataframe["period"] = dataframe["Year"].astype(str).add(dataframe["quarter"])

or, if the values are like [2000] and [4] and you want to produce [2000q4]:

dataframe["period"] = dataframe["Year"].astype(str).add('q').add(dataframe["quarter"]).astype(str)

substituting .astype(str) with .map(str) works too.


Separation of business logic and data access in django

Question: Separation of business logic and data access in django

I am writing a project in Django and I see that 80% of the code is in the file models.py. This code is confusing and, after a certain time, I cease to understand what is really happening.

Here is what bothers me:

  1. I find it ugly that my model level (which was supposed to be responsible only for the work with data from a database) is also sending email, walking on API to other services, etc.
  2. Also, I find it unacceptable to place business logic in the view, because this way it becomes difficult to control. For example, in my application there are at least three ways to create new instances of User, but technically it should create them uniformly.
  3. I do not always notice when the methods and properties of my models become non-deterministic and when they develop side effects.

Here is a simple example. At first, the User model was like this:

class User(db.Models):

    def get_present_name(self):
        return self.name or 'Anonymous'

    def activate(self):
        self.status = 'activated'
        self.save()

Over time, it turned into this:

class User(db.Models):

    def get_present_name(self): 
        # property became non-deterministic in terms of database
        # data is taken from another service by api
        return remote_api.request_user_name(self.uid) or 'Anonymous' 

    def activate(self):
        # method now has a side effect (send message to user)
        self.status = 'activated'
        self.save()
        send_mail('Your account is activated!', '…', [self.email])

What I want is to separate entities in my code:

  1. Entities of my database, database level: What contains my application?
  2. Entities of my application, business logic level: What can make my application?

What are the good practices to implement such an approach that can be applied in Django?


Answer 0

It seems that you are asking about the difference between the data model and the domain model – the latter is where you can find the business logic and entities as perceived by your end user, the former is where you actually store your data.

Furthermore, I've interpreted the third part of your question as: how to notice failure to keep these models separate.

These are two very different concepts, and it's always hard to keep them separate. However, there are some common patterns and tools that can be used for this purpose.

About the Domain Model

The first thing you need to recognize is that your domain model is not really about data; it is about actions and questions such as "activate this user", "deactivate this user", "which users are currently activated?", and "what is this user's name?". In classical terms: it's about queries and commands.

Thinking in Commands

Let's start by looking at the commands in your example: "activate this user" and "deactivate this user". The nice thing about commands is that they can easily be expressed by small given-when-then scenarios:


given an inactive user
when the admin activates this user
then the user becomes active
and a confirmation e-mail is sent to the user
and an entry is added to the system log
(etc. etc.)

Such scenarios are useful to see how different parts of your infrastructure can be affected by a single command – in this case your database (some kind of 'active' flag), your mail server, your system log, etc.

Such scenarios also really help you in setting up a test-driven development environment.

And finally, thinking in commands really helps you create a task-oriented application. Your users will appreciate this :-)

Expressing Commands

Django provides two easy ways of expressing commands; they are both valid options, and it is not unusual to mix the two approaches.

The service layer

The service module has already been described by @Hedde. Here you define a separate module and each command is represented as a function.

services.py

def activate_user(user_id):
    user = User.objects.get(pk=user_id)

    # set active flag
    user.active = True
    user.save()

    # mail user
    send_mail(...)

    # etc etc
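The value of a service layer is that all side effects of a command live in one testable function. A framework-free sketch of the same idea (the InMemoryUsers and FakeMailer stand-ins are made up for illustration, not part of Django) shows why this is easy to unit-test without a real database or mail server:

```python
class InMemoryUsers:
    """Stand-in for the ORM: maps user ids to a dict of fields."""
    def __init__(self):
        self.rows = {1: {"active": False, "email": "alice@example.com"}}

    def get(self, pk):
        return self.rows[pk]


class FakeMailer:
    """Stand-in for send_mail(): records messages instead of sending them."""
    def __init__(self):
        self.sent = []

    def send(self, subject, recipient):
        self.sent.append((subject, recipient))


def activate_user(user_id, users, mailer):
    """The command: flip the active flag, then notify the user."""
    user = users.get(user_id)
    user["active"] = True
    mailer.send("Your account is activated!", user["email"])


users, mailer = InMemoryUsers(), FakeMailer()
activate_user(1, users, mailer)
print(users.get(1)["active"])  # True
```

Because the dependencies are passed in, the test can inspect the fake mailer instead of a real outbox.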

Using forms

The other way is to use a Django Form for each command. I prefer this approach, because it combines multiple closely related aspects:

  • execution of the command (what does it do?)
  • validation of the command parameters (can it do this?)
  • presentation of the command (how can I do this?)

forms.py

class ActivateUserForm(forms.Form):

    user_id = IntegerField(widget = UsernameSelectWidget, verbose_name="Select a user to activate")
    # the username select widget is not a standard Django widget, I just made it up

    def clean_user_id(self):
        user_id = self.cleaned_data['user_id']
        if User.objects.get(pk=user_id).active:
            raise ValidationError("This user cannot be activated")
        # you can also check authorizations etc. 
        return user_id

    def execute(self):
        """
        This is not a standard method in the forms API; it is intended to replace the 
        'extract-data-from-form-in-view-and-do-stuff' pattern by a more testable pattern. 
        """
        user_id = self.cleaned_data['user_id']

        user = User.objects.get(pk=user_id)

        # set active flag
        user.active = True
        user.save()

        # mail user
        send_mail(...)

        # etc etc
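The form bundles validation ("can it do this?") with execution ("what does it do?"). The same split can be sketched without Django (ActivateUserCommand is a hypothetical plain-Python analogue, not a Django API) to show why the pattern is testable:

```python
class ActivateUserCommand:
    """Framework-free analogue of ActivateUserForm: validate, then execute."""
    def __init__(self, user):
        self.user = user
        self.errors = []

    def clean(self):
        # Validation: an already-active user cannot be activated again.
        if self.user["active"]:
            self.errors.append("This user cannot be activated")
        return not self.errors

    def execute(self):
        # Execution: only runs when validation passes.
        assert self.clean(), self.errors
        self.user["active"] = True
        return self.user


user = {"active": False}
cmd = ActivateUserCommand(user)
print(cmd.clean())              # True
print(cmd.execute()["active"])  # True
```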

Thinking in Queries

Your example did not contain any queries, so I took the liberty of making up a few useful queries. I prefer the term "question", but queries is the classical terminology. Interesting queries are: "What is the name of this user?", "Can this user log in?", "Show me a list of deactivated users", and "What is the geographical distribution of deactivated users?"

Before embarking on answering these queries, you should always ask yourself two questions: is this a presentational query just for my templates, and/or a business logic query tied to executing my commands, and/or a reporting query?

Presentational queries are merely made to improve the user interface. The answers to business logic queries directly affect the execution of your commands. Reporting queries are merely for analytical purposes and have looser time constraints. These categories are not mutually exclusive.

The other question is: "do I have complete control over the answers?" For example, when querying the user's name (in this context) we do not have any control over the outcome, because we rely on an external API.

Making Queries

The most basic query in Django is the use of the Manager object:

User.objects.filter(active=True)

当然,这仅在数据实际在数据模型中表示时才有效。这并非总是如此。在这种情况下,您可以考虑以下选项。

自定义标签和过滤器

第一种替代方法仅对表示性查询有用:自定义标记和模板过滤器。

template.html

<h1>Welcome, {{ user|friendly_name }}</h1>

template_tags.py

@register.filter
def friendly_name(user):
    return remote_api.get_cached_name(user.id)

查询方法

如果您的查询不只是表示形式的查询,则可以将查询添加到您的services.py(如果正在使用的话),或者引入queries.py模块:

queries.py

def inactive_users():
    return User.objects.filter(active=False)


def users_called_publysher():
    for user in User.objects.all():
        if remote_api.get_cached_name(user.id) == "publysher":
            yield user 

代理模型

代理模型在业务逻辑和报告的上下文中非常有用。您基本上定义了模型的一个增强子集。您可以通过覆盖 Manager.get_queryset() 方法来替换 Manager 的基本 QuerySet。

models.py

class InactiveUserManager(models.Manager):
    def get_queryset(self):
        query_set = super(InactiveUserManager, self).get_queryset()
        return query_set.filter(active=False)

class InactiveUser(User):
    """
    >>> for user in InactiveUser.objects.all():
    ...     assert user.active is False
    """

    objects = InactiveUserManager()
    class Meta:
        proxy = True

查询模型

对于本质上很复杂但经常执行的查询,存在查询模型的可能性。查询模型是非规范化的一种形式,其中单个查询的相关数据存储在单独的模型中。当然,技巧是使非规范化模型与主模型保持同步。仅当更改完全在您的控制之下时才能使用查询模型。

models.py

class InactiveUserDistribution(models.Model):
    country = CharField(max_length=200)
    inactive_user_count = IntegerField(default=0)

第一种选择是在命令中更新这些模型。如果仅通过一个或两个命令更改这些模型,这将非常有用。

表格

class ActivateUserForm(forms.Form):
    # see above

    def execute(self):
        # see above
        query_model, _ = InactiveUserDistribution.objects.get_or_create(country=user.country)
        query_model.inactive_user_count -= 1
        query_model.save()

更好的选择是使用自定义信号。这些信号当然是由您的命令发出的。信号的优点是您可以使多个查询模型与原始模型保持同步。此外,可以使用Celery或类似框架将信号处理任务转移给后台任务。

signal.py

user_activated = Signal(providing_args = ['user'])
user_deactivated = Signal(providing_args = ['user'])

表格

class ActivateUserForm(forms.Form):
    # see above

    def execute(self):
        # see above
        user_activated.send_robust(sender=self, user=user)

models.py

class InactiveUserDistribution(models.Model):
    # see above

@receiver(user_activated)
def on_user_activated(sender, **kwargs):
    user = kwargs['user']
    query_model, _ = InactiveUserDistribution.objects.get_or_create(country=user.country)
    query_model.inactive_user_count -= 1
    query_model.save()

保持清洁

使用这种方法时,很容易确定代码是否保持干净。只需遵循以下准则:

  • 我的模型是否包含除管理数据库状态之外还做更多事情的方法?您应该提取一个命令。
  • 我的模型是否包含未映射到数据库字段的属性?您应该提取一个查询。
  • 我的模型是否引用了数据库以外的基础架构(例如邮件)?您应该提取一个命令。

视图也一样(因为视图经常遇到相同的问题)。

  • 我的视图是否主动管理数据库模型?您应该提取命令。

一些参考

Django文档:代理模型

Django文档:信号

体系结构:域驱动设计

It seems like you are asking about the difference between the data model and the domain model – the latter is where you can find the business logic and entities as perceived by your end user, the former is where you actually store your data.

Furthermore, I’ve interpreted the 3rd part of your question as: how to notice failure to keep these models separate.

These are two very different concepts and it’s always hard to keep them separate. However, there are some common patterns and tools that can be used for this purpose.

About the Domain Model

The first thing you need to recognize is that your domain model is not really about data; it is about actions and questions such as “activate this user”, “deactivate this user”, “which users are currently activated?”, and “what is this user’s name?”. In classical terms: it’s about queries and commands.

Thinking in Commands

Let’s start by looking at the commands in your example: “activate this user” and “deactivate this user”. The nice thing about commands is that they can easily be expressed by small given-when-then scenarios:

given an inactive user
when the admin activates this user
then the user becomes active
and a confirmation e-mail is sent to the user
and an entry is added to the system log
(etc. etc.)

Such scenarios are useful to see how different parts of your infrastructure can be affected by a single command – in this case your database (some kind of ‘active’ flag), your mail server, your system log, etc.

Such scenarios also really help you in setting up a Test Driven Development environment.

And finally, thinking in commands really helps you create a task-oriented application. Your users will appreciate this :-)
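The given-when-then scenario above can be run as a tiny framework-free test. All names here (`FakeUser`, `activate_user`, the `sent_mails` and `log_entries` lists) are invented stand-ins for the real model, mail server and system log:

```python
# Framework-free sketch of the "activate this user" scenario.
# FakeUser, sent_mails and log_entries are invented stand-ins for the
# real model, mail server and system log.

class FakeUser:
    def __init__(self, active=False):
        self.active = active

sent_mails = []   # stand-in for the mail server
log_entries = []  # stand-in for the system log

def activate_user(user):
    """Command: flip the active flag and record the side effects."""
    user.active = True
    sent_mails.append("Your account is activated!")
    log_entries.append("user activated")

# given an inactive user
user = FakeUser(active=False)
# when the admin activates this user
activate_user(user)
# then the user becomes active, a mail is sent, and a log entry is added
assert user.active
assert len(sent_mails) == 1
assert len(log_entries) == 1
```

Each line of the scenario maps onto one assertion, which is exactly what makes command-style scenarios pleasant to test-drive.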

Expressing Commands

Django provides two easy ways of expressing commands; they are both valid options and it is not unusual to mix the two approaches.

The service layer

The service module has already been described by @Hedde. Here you define a separate module and each command is represented as a function.

services.py

def activate_user(user_id):
    user = User.objects.get(pk=user_id)

    # set active flag
    user.active = True
    user.save()

    # mail user
    send_mail(...)

    # etc etc

Using forms

The other way is to use a Django Form for each command. I prefer this approach, because it combines multiple closely related aspects:

  • execution of the command (what does it do?)
  • validation of the command parameters (can it do this?)
  • presentation of the command (how can I do this?)

forms.py

class ActivateUserForm(forms.Form):

    user_id = forms.IntegerField(widget = UsernameSelectWidget, verbose_name="Select a user to activate")
    # the username select widget is not a standard Django widget, I just made it up

    def clean_user_id(self):
        user_id = self.cleaned_data['user_id']
        if User.objects.get(pk=user_id).active:
            raise ValidationError("This user cannot be activated")
        # you can also check authorizations etc. 
        return user_id

    def execute(self):
        """
        This is not a standard method in the forms API; it is intended to replace the 
        'extract-data-from-form-in-view-and-do-stuff' pattern by a more testable pattern. 
        """
        user_id = self.cleaned_data['user_id']

        user = User.objects.get(pk=user_id)

        # set active flag
        user.active = True
        user.save()

        # mail user
        send_mail(...)

        # etc etc
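The form above bundles validation and execution. The same validate-then-execute shape can be sketched without Django as a plain command object; `ActivateUserCommand` and the dict-based “user” are illustrative only, not any framework API:

```python
# Framework-free sketch of the validate-then-execute pattern used by the
# form above. ValidationError and ActivateUserCommand are invented names.

class ValidationError(Exception):
    pass

class ActivateUserCommand:
    def __init__(self, user):
        self.user = user

    def clean(self):
        # validation: "can it do this?"
        if self.user["active"]:
            raise ValidationError("This user cannot be activated")

    def execute(self):
        # execution: "what does it do?" -- validate first, then act
        self.clean()
        self.user["active"] = True
        return self.user

activated = ActivateUserCommand({"name": "alice", "active": False}).execute()
```

In the Django version, `clean_user_id()` plays the role of `clean()` and the view calls `form.is_valid()` before `form.execute()`.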

Thinking in Queries

Your example did not contain any queries, so I took the liberty of making up a few useful queries. I prefer to use the term “question”, but queries is the classical terminology. Interesting queries are: “What is the name of this user?”, “Can this user log in?”, “Show me a list of deactivated users”, and “What is the geographical distribution of deactivated users?”

Before embarking on answering these queries, you should always ask yourself two questions: is this a presentational query just for my templates, and/or a business logic query tied to executing my commands, and/or a reporting query.

Presentational queries are merely made to improve the user interface. The answers to business logic queries directly affect the execution of your commands. Reporting queries are merely for analytical purposes and have looser time constraints. These categories are not mutually exclusive.

The other question is: “do I have complete control over the answers?” For example, when querying the user’s name (in this context) we do not have any control over the outcome, because we rely on an external API.

Making Queries

The most basic query in Django is the use of the Manager object:

User.objects.filter(active=True)

Of course, this only works if the data is actually represented in your data model. This is not always the case. In those cases, you can consider the options below.

Custom tags and filters

The first alternative is useful for queries that are merely presentational: custom tags and template filters.

template.html

<h1>Welcome, {{ user|friendly_name }}</h1>

template_tags.py

@register.filter
def friendly_name(user):
    return remote_api.get_cached_name(user.id)

Query methods

If your query is not merely presentational, you could add queries to your services.py (if you are using that), or introduce a queries.py module:

queries.py

def inactive_users():
    return User.objects.filter(active=False)


def users_called_publysher():
    for user in User.objects.all():
        if remote_api.get_cached_name(user.id) == "publysher":
            yield user 

Proxy models

Proxy models are very useful in the context of business logic and reporting. You basically define an enhanced subset of your model. You can override a Manager’s base QuerySet by overriding the Manager.get_queryset() method.

models.py

class InactiveUserManager(models.Manager):
    def get_queryset(self):
        query_set = super(InactiveUserManager, self).get_queryset()
        return query_set.filter(active=False)

class InactiveUser(User):
    """
    >>> for user in InactiveUser.objects.all():
    ...     assert user.active is False
    """

    objects = InactiveUserManager()
    class Meta:
        proxy = True

Query models

For queries that are inherently complex, but are executed quite often, there is the possibility of query models. A query model is a form of denormalization where relevant data for a single query is stored in a separate model. The trick of course is to keep the denormalized model in sync with the primary model. Query models can only be used if changes are entirely under your control.

models.py

class InactiveUserDistribution(models.Model):
    country = CharField(max_length=200)
    inactive_user_count = IntegerField(default=0)

The first option is to update these models in your commands. This is very useful if these models are only changed by one or two commands.

forms.py

class ActivateUserForm(forms.Form):
    # see above

    def execute(self):
        # see above
        query_model, _ = InactiveUserDistribution.objects.get_or_create(country=user.country)
        query_model.inactive_user_count -= 1
        query_model.save()

A better option would be to use custom signals. These signals are of course emitted by your commands. Signals have the advantage that you can keep multiple query models in sync with your original model. Furthermore, signal processing can be offloaded to background tasks, using Celery or similar frameworks.

signals.py

user_activated = Signal(providing_args = ['user'])
user_deactivated = Signal(providing_args = ['user'])

forms.py

class ActivateUserForm(forms.Form):
    # see above

    def execute(self):
        # see above
        user_activated.send_robust(sender=self, user=user)

models.py

class InactiveUserDistribution(models.Model):
    # see above

@receiver(user_activated)
def on_user_activated(sender, **kwargs):
    user = kwargs['user']
    query_model, _ = InactiveUserDistribution.objects.get_or_create(country=user.country)
    query_model.inactive_user_count -= 1
    query_model.save()
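The command-emits / receiver-syncs pattern above can be sketched without Django at all. This `Signal` class is a toy stand-in for `django.dispatch.Signal`, and the `counts` dict stands in for the `InactiveUserDistribution` rows:

```python
# Minimal, framework-free sketch of the signal pattern: a command emits an
# event, and a receiver keeps a denormalized query model in sync.
# Signal is a toy stand-in for django.dispatch.Signal.

class Signal:
    def __init__(self):
        self._receivers = []

    def connect(self, receiver):
        self._receivers.append(receiver)

    def send_robust(self, sender, **kwargs):
        # like Django's send_robust, receiver errors do not propagate
        for receiver in self._receivers:
            try:
                receiver(sender, **kwargs)
            except Exception:
                pass

user_activated = Signal()
counts = {"NL": 2}  # inactive-user count per country

def on_user_activated(sender, **kwargs):
    counts[kwargs["user"]["country"]] -= 1

user_activated.connect(on_user_activated)
user_activated.send_robust(sender=None, user={"country": "NL"})
```

The emitting command never learns about the query models listening to it, which is what lets you add more receivers (or move them to a Celery task) without touching the command.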

Keeping it clean

When using this approach, it becomes ridiculously easy to determine if your code stays clean. Just follow these guidelines:

  • Does my model contain methods that do more than managing database state? You should extract a command.
  • Does my model contain properties that do not map to database fields? You should extract a query.
  • Does my model reference infrastructure that is not my database (such as mail)? You should extract a command.

The same goes for views (because views often suffer from the same problem).

  • Does my view actively manage database models? You should extract a command.

Some References

Django documentation: proxy models

Django documentation: signals

Architecture: Domain Driven Design


回答 1

我通常在视图和模型之间实现服务层。这就像您项目的API一样,并为您提供了一个很好的直升机视图,可以了解正在发生的事情。我从我的一位同事那里继承了这种做法,该同事在Java项目(JSF)中经常使用这种分层技术,例如:

models.py

from django.db import models
from django.contrib.auth.models import User

class Book(models.Model):
    author = models.ForeignKey(User, on_delete=models.CASCADE)
    title = models.CharField(max_length=125)

    class Meta:
        app_label = "library"

services.py

from library.models import Book

def get_books(limit=None, **filters):
    """ simple service function for retrieving books can be widely extended """
    return Book.objects.filter(**filters)[:limit]  # list[:None] will return the entire list

views.py

from library.services import get_books

class BookListView(ListView):
    """ simple view, e.g. implement a _build and _apply filters function """
    queryset = get_books()

请注意,我通常将模型、视图和服务放在模块级别,并根据项目的规模进一步拆分。

I usually implement a service layer in between views and models. This acts like your project’s API and gives you a good helicopter view of what is going on. I inherited this practice from a colleague of mine that uses this layering technique a lot with Java projects (JSF), e.g:

models.py

from django.db import models
from django.contrib.auth.models import User

class Book(models.Model):
    author = models.ForeignKey(User, on_delete=models.CASCADE)
    title = models.CharField(max_length=125)

    class Meta:
        app_label = "library"

services.py

from library.models import Book

def get_books(limit=None, **filters):
    """ simple service function for retrieving books can be widely extended """
    return Book.objects.filter(**filters)[:limit]  # list[:None] will return the entire list

views.py

from library.services import get_books

class BookListView(ListView):
    """ simple view, e.g. implement a _build and _apply filters function """
    queryset = get_books()

Mind you, I usually take models, views and services to module level and separate even further depending on the project’s size.
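The same layering can be sketched without Django: the “view” only talks to the service, and the service owns the query logic. `BOOKS` and the function names below are illustrative stand-ins, not part of any library:

```python
# Framework-free sketch of a service layer: filtering lives in the service,
# formatting lives in the "view". BOOKS stands in for the database table.

BOOKS = [
    {"title": "Two Scoops of Django", "author": "audrey"},
    {"title": "Fluent Python", "author": "luciano"},
]

def get_books(limit=None, **filters):
    """Service function: filtering and slicing live here, not in the view."""
    rows = [b for b in BOOKS if all(b.get(k) == v for k, v in filters.items())]
    return rows[:limit]  # rows[:None] returns the entire list

def book_list_view():
    """The 'view' merely formats what the service hands back."""
    return [b["title"] for b in get_books()]
```

Because the view never touches the storage directly, swapping the in-memory list for an ORM query changes only the service.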


回答 2

首先,不要重复自己

然后,请注意不要过度设计,有时这只是浪费时间,并使人失去对重要内容的关注。不时回顾一下Python之禅。

看一下活跃的项目

  • 更多的人=更多需要适当组织
  • Django仓库,它有一个简单直接的结构。
  • pip仓库,它有一个简单直接的目录结构。
  • fabric仓库也很值得一看。

  • 您可以将所有模型放在 yourapp/models/logicalgroup.py 下
    • 例如 User、Group 及相关模型可以放在 yourapp/models/users.py 下
    • 例如 Poll、Question、Answer…可以放在 yourapp/models/polls.py 下
    • 在 yourapp/models/__init__.py 的 __all__ 中导入您需要的内容

有关MVC的更多信息

  • 模型就是你的数据
    • 这包括您的实际数据
    • 这还包括您的会话/ Cookie /缓存/ FS /索引数据
  • 用户与控制器交互以操纵模型
    • 这可以是API,也可以是保存/更新数据的视图
    • 可以通过 request.GET / request.POST 等进行调整
    • 也要考虑分页或过滤
  • 数据更新视图
    • 模板获取数据并相应地格式化
    • 甚至没有模板的API都是视图的一部分;例如tastypiepiston
    • 这也应该考虑中间件。

利用中间件 / 模板标签

  • 如果您需要为每个请求完成一些工作,那么中间件是一种解决方法。
    • 例如添加时间戳
    • 例如,更新有关网页点击量的指标
    • 例如,填充缓存
  • 如果您的代码片段总是在格式化对象时反复出现,那么模板标签就不错了。
    • 例如,活动选项卡/ URL面包屑

利用模型经理

  • 创建 User 的逻辑可以放进 UserManager(models.Manager)
  • 实例层面的具体细节应该放在 models.Model 上
  • 与 queryset 有关的具体细节可以放进 models.Manager
  • 您可能想一次创建一个 User 对象,因此您可能认为它应该存在于模型本身,但是在创建对象时,您可能并没有所有的细节:

例:

class UserManager(models.Manager):
   def create_user(self, username, ...):
      # plain create
   def create_superuser(self, username, ...):
      # may set is_superuser field.
   def activate(self, username):
      # may use save() and send_mail()
   def activate_in_bulk(self, queryset):
      # may use queryset.update() instead of save()
      # may use send_mass_mail() instead of send_mail()

尽可能使用表格

如果您有映射到模型的表单,则可以省去很多样板代码。ModelForm的文档还不错。如果您有很多自定义功能,最好将表单代码与模型代码分开(对于更高级的用法,有时这也能避免循环导入错误)。

尽可能使用管理命令

  • 例如 yourapp/management/commands/createsuperuser.py
  • 例如 yourapp/management/commands/activateinbulk.py

如果您有业务逻辑,可以将其分离出来

  • django.contrib.auth 使用后端,就像db有一个后端…等。
  • 为您的业务逻辑添加一个 setting(例如AUTHENTICATION_BACKENDS
  • 你可以用 django.contrib.auth.backends.RemoteUserBackend
  • 你可以用 yourapp.backends.remote_api.RemoteUserBackend
  • 你可以用 yourapp.backends.memcached.RemoteUserBackend
  • 将困难的业务逻辑委托给后端
  • 确保在输入/输出上设置期望值。
  • 更改业务逻辑就像更改设置一样简单:)

后端示例:

class User(models.Model):
    def get_present_name(self):
        # property became non-deterministic in terms of the database:
        # data is taken from another service by api
        return remote_api.request_user_name(self.uid) or 'Anonymous'

可能成为:

class User(models.Model):
    def get_present_name(self):
        for backend in get_backends():
            try:
                return backend.get_present_name(self)
            except Exception:  # fall through to the next backend
                pass
        return None

有关设计模式的更多信息

有关界面边界的更多信息

  • 您要使用的代码确实是模型的一部分吗?->yourapp.models
  • 代码是业务逻辑的一部分吗?->yourapp.vendor
  • 代码是通用工具/库的一部分吗?->yourapp.libs
  • 代码是业务逻辑库的一部分吗?-> yourapp.libs.vendoryourapp.vendor.libs
  • 这是一个很好的判断标准:您可以独立测试代码吗?
    • 对,很好 :)
    • 不,您可能有接口问题
    • 当有明确的分离时,使用mock可以使单元测试变得轻而易举
  • 分离符合逻辑吗?
    • 对,很好 :)
    • 不,您可能无法单独测试这些逻辑概念。
  • 您认为当代码量增长10倍时是否需要重构?
    • 是的,不太妙(no bueno),重构可能需要大量工作
    • 不,那太棒了!

简而言之,您可以

  • yourapp/core/backends.py
  • yourapp/core/models/__init__.py
  • yourapp/core/models/users.py
  • yourapp/core/models/questions.py
  • yourapp/core/forms.py
  • yourapp/core/handlers.py
  • yourapp/core/management/commands/__init__.py
  • yourapp/core/management/commands/closepolls.py
  • yourapp/core/management/commands/removeduplicates.py
  • yourapp/core/middleware.py
  • yourapp/core/signals.py
  • yourapp/core/templatetags/__init__.py
  • yourapp/core/templatetags/polls_extras.py
  • yourapp/core/views/__init__.py
  • yourapp/core/views/users.py
  • yourapp/core/views/questions.py
  • yourapp/lib/utils.py
  • yourapp/lib/textanalysis.py
  • yourapp/lib/ratings.py
  • yourapp/vendor/backends.py
  • yourapp/vendor/morebusinesslogic.py
  • yourapp/vendor/handlers.py
  • yourapp/vendor/middleware.py
  • yourapp/vendor/signals.py
  • yourapp/tests/test_polls.py
  • yourapp/tests/test_questions.py
  • yourapp/tests/test_duplicates.py
  • yourapp/tests/test_ratings.py

或任何其他可以帮助您的东西;找到所需的接口边界将对您有所帮助。

First of all, Don’t repeat yourself.

Then, please be careful not to overengineer, sometimes it is just a waste of time, and makes someone lose focus on what is important. Review the zen of python from time to time.

Take a look at active projects

  • more people = more need to organize properly
  • the django repository has a straightforward structure.
  • the pip repository has a straightforward directory structure.
  • the fabric repository is also a good one to look at.

  • you can place all your models under yourapp/models/logicalgroup.py
    • e.g. User, Group and related models can go under yourapp/models/users.py
    • e.g. Poll, Question, Answer … could go under yourapp/models/polls.py
    • load what you need in __all__ inside of yourapp/models/__init__.py

More about MVC

  • model is your data
    • this includes your actual data
    • this also includes your session / cookie / cache / fs / index data
  • user interacts with controller to manipulate the model
    • this could be an API, or a view that saves/updates your data
    • this can be tuned with request.GET / request.POST …etc
    • think paging or filtering too.
  • the data updates the view
    • the templates take the data and format it accordingly
    • APIs even w/o templates are part of the view; e.g. tastypie or piston
    • this should also account for the middleware.

Take advantage of middleware / templatetags

  • If you need some work to be done for each request, middleware is one way to go.
    • e.g. adding timestamps
    • e.g. updating metrics about page hits
    • e.g. populating a cache
  • If you have snippets of code that always reoccur for formatting objects, templatetags are good.
    • e.g. active tab / url breadcrumbs

Take advantage of model managers

  • creating User can go in a UserManager(models.Manager).
  • gory details for instances should go on the models.Model.
  • gory details for queryset could go in a models.Manager.
  • you might want to create a User one at a time, so you may think that it should live on the model itself, but when creating the object, you probably don’t have all the details:

Example:

class UserManager(models.Manager):
   def create_user(self, username, ...):
      # plain create
   def create_superuser(self, username, ...):
      # may set is_superuser field.
   def activate(self, username):
      # may use save() and send_mail()
   def activate_in_bulk(self, queryset):
      # may use queryset.update() instead of save()
      # may use send_mass_mail() instead of send_mail()

Make use of forms where possible

A lot of boilerplate code can be eliminated if you have forms that map to a model. The ModelForm documentation is pretty good. Separating form code from model code can be good if you have a lot of customization (and sometimes avoids cyclic import errors in more advanced uses).

Use management commands when possible

  • e.g. yourapp/management/commands/createsuperuser.py
  • e.g. yourapp/management/commands/activateinbulk.py

if you have business logic, you can separate it out

  • django.contrib.auth uses backends, just like db has a backend…etc.
  • add a setting for your business logic (e.g. AUTHENTICATION_BACKENDS)
  • you could use django.contrib.auth.backends.RemoteUserBackend
  • you could use yourapp.backends.remote_api.RemoteUserBackend
  • you could use yourapp.backends.memcached.RemoteUserBackend
  • delegate the difficult business logic to the backend
  • make sure to set the expectation right on the input/output.
  • changing business logic is as simple as changing a setting :)

backend example:

class User(models.Model):
    def get_present_name(self):
        # property became non-deterministic in terms of the database:
        # data is taken from another service by api
        return remote_api.request_user_name(self.uid) or 'Anonymous'

could become:

class User(models.Model):
    def get_present_name(self):
        for backend in get_backends():
            try:
                return backend.get_present_name(self)
            except Exception:  # fall through to the next backend
                pass
        return None
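Here is a runnable sketch of that backend chain. In Django the list would come from a setting (like `AUTHENTICATION_BACKENDS`); here `get_backends()` and both backend classes are hard-coded inventions for illustration:

```python
# Runnable sketch of a backend chain: try each backend in order and fall
# through on failure. RemoteApiBackend / CachedBackend are invented names.

class RemoteApiBackend:
    def get_present_name(self, user):
        raise ConnectionError("remote API unreachable")  # simulated outage

class CachedBackend:
    def get_present_name(self, user):
        return user["cached_name"]

def get_backends():
    # in Django this would be built from a setting
    return [RemoteApiBackend(), CachedBackend()]

def get_present_name(user):
    for backend in get_backends():
        try:
            return backend.get_present_name(user)
        except Exception:
            continue  # fall through to the next backend
    return None

name = get_present_name({"cached_name": "publysher"})
```

Swapping, reordering or adding backends then really is “as simple as changing a setting”: the calling code never changes.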

more about design patterns

more about interface boundaries

  • Is the code you want to use really part of the models? -> yourapp.models
  • Is the code part of business logic? -> yourapp.vendor
  • Is the code part of generic tools / libs? -> yourapp.libs
  • Is the code part of business logic libs? -> yourapp.libs.vendor or yourapp.vendor.libs
  • Here is a good one: can you test your code independently?
    • yes, good :)
    • no, you may have an interface problem
    • when there is clear separation, unittest should be a breeze with the use of mocking
  • Is the separation logical?
    • yes, good :)
    • no, you may have trouble testing those logical concepts separately.
  • Do you think you will need to refactor when you get 10x more code?
    • yes, no good, no bueno, refactor could be a lot of work
    • no, that’s just awesome!

In short, you could have

  • yourapp/core/backends.py
  • yourapp/core/models/__init__.py
  • yourapp/core/models/users.py
  • yourapp/core/models/questions.py
  • yourapp/core/forms.py
  • yourapp/core/handlers.py
  • yourapp/core/management/commands/__init__.py
  • yourapp/core/management/commands/closepolls.py
  • yourapp/core/management/commands/removeduplicates.py
  • yourapp/core/middleware.py
  • yourapp/core/signals.py
  • yourapp/core/templatetags/__init__.py
  • yourapp/core/templatetags/polls_extras.py
  • yourapp/core/views/__init__.py
  • yourapp/core/views/users.py
  • yourapp/core/views/questions.py
  • yourapp/lib/utils.py
  • yourapp/lib/textanalysis.py
  • yourapp/lib/ratings.py
  • yourapp/vendor/backends.py
  • yourapp/vendor/morebusinesslogic.py
  • yourapp/vendor/handlers.py
  • yourapp/vendor/middleware.py
  • yourapp/vendor/signals.py
  • yourapp/tests/test_polls.py
  • yourapp/tests/test_questions.py
  • yourapp/tests/test_duplicates.py
  • yourapp/tests/test_ratings.py

or anything else that helps you; finding the interfaces you need and the boundaries will help you.


回答 3

Django使用了一种稍作修改的MVC。Django中没有“控制器”的概念。最接近的对应物是“视图”,这常常让从其他MVC框架转过来的人感到困惑,因为在MVC中,“视图”更像Django的“模板”。

在Django中,“模型”不仅是数据库抽象。在某些方面,它与Django的“视图”共同承担MVC中控制器的职责。它包含与实例相关联的全部行为。如果该实例需要与外部API交互作为其行为的一部分,那仍然是模型代码。实际上,模型根本不需要与数据库交互,因此完全可以想象一个模型只作为外部API的交互层存在。这是一个宽泛得多的“模型”概念。

Django employs a slightly modified kind of MVC. There’s no concept of a “controller” in Django. The closest proxy is a “view”, which tends to cause confusion with MVC converts because in MVC a view is more like Django’s “template”.

In Django, a “model” is not merely a database abstraction. In some respects, it shares duty with Django’s “view” as the controller of MVC. It holds the entirety of behavior associated with an instance. If that instance needs to interact with an external API as part of its behavior, then that’s still model code. In fact, models aren’t required to interact with the database at all, so you could conceivably have models that exist entirely as an interaction layer to an external API. It’s a much freer concept of a “model”.


回答 4

正如Chris Pratt所说,在Django中,MVC结构不同于其他框架中使用的经典MVC模型,我认为这样做的主要原因是避免过于严格的应用程序结构,就像在其他MVC框架(如CakePHP)中那样。

在Django中,MVC是通过以下方式实现的:

视图层分为两部分。该视图仅应用于管理HTTP请求,它们将被调用并对其进行响应。视图与应用程序的其余部分(表单,模型表单,自定义类,在简单情况下直接与模型)进行通信。要创建界面,我们使用模板。模板就像Django的字符串一样,它将一个上下文映射到其中,并且该上下文由应用程序传达给视图(当视图询问时)。

模型层提供封装,抽象,验证,智能,并使您的数据面向对象(他们说有朝一日DBMS也将面向对象)。这并不意味着您应该制作巨大的models.py文件(实际上,一个很好的建议是将模型分成不同的文件,将它们放入名为“ models”的文件夹中,在其中创建一个“ __init__.py”文件导入所有模型并最终使用models.Model类的属性“ app_label”的文件夹)。模型应该使您摆脱对数据的操作,这将使您的应用程序更简单。如果需要,您还应该为模型创建外部类,例如“工具”。您还可以在模型中使用继承,将模型的Meta类的“抽象”属性设置为“真”。

其余在哪里?好吧,小型Web应用程序通常是数据的一种接口,在某些小型程序中,使用视图查询或插入数据就足够了。更常见的情况是使用Forms或ModelForms,它们实际上是“控制器”。这不是解决一个常见问题的实用方法,而且是非常快速的方法。这就是网站要做的事情。

如果Forms不适合您,那么您应该创建自己的类来解决问题,一个很好的例子是管理应用程序:您可以阅读ModelAdmin的代码,它实际上起到了控制器的作用。没有标准的结构,我建议您检查现有的Django应用程序,具体取决于每种情况。这就是Django开发人员的意图,您可以添加xml解析器类、API连接器类、添加Celery来执行任务、使用Twisted构建基于反应器的应用程序、仅使用ORM、制作Web服务、修改管理应用程序等等……您有责任编写高质量的代码,无论是否遵循MVC哲学,使其基于模块并创建自己的抽象层。非常灵活。

我的建议是:尽可能多地阅读代码,周围有很多django应用程序,但是不要那么认真地对待它们。每种情况都不同,模式和理论会有所帮助,但并非总是如此;这是一门不精确的科学,django只是为您提供了一些好用的工具,您可以用它们来减轻一些痛点(例如管理界面、Web表单验证、i18n、观察者模式实现,以及之前提到的所有内容和其他内容),但是好的设计来自经验丰富的设计师。

PS .:使用auth应用程序中的“ User”类(来自标准django),您可以创建用户个人资料,或者至少读取其代码,这对您的情况很有用。

In Django, MVC structure is, as Chris Pratt said, different from the classical MVC model used in other frameworks. I think the main reason for this is to avoid a too-strict application structure, as happens in other MVC frameworks like CakePHP.

In Django, MVC was implemented in the following way:

The view layer is split in two. The views should be used only to manage HTTP requests: they are called and respond to them. Views communicate with the rest of your application (forms, modelforms, custom classes, or in simple cases directly with models). To create the interface we use templates. Templates are string-like to Django: it maps a context into them, and this context is communicated to the view by the application (when the view asks).

The model layer gives encapsulation, abstraction, validation, intelligence and makes your data object-oriented (they say someday DBMSs will be too). This doesn’t mean that you should make huge models.py files (in fact a very good piece of advice is to split your models across different files, put them into a folder called ‘models’, make an ‘__init__.py’ file in this folder where you import all your models, and finally use the ‘app_label’ attribute of the models.Model class). The model should abstract you from operating on the data; it will make your application simpler. You should also, if required, create external classes, like “tools” for your models. You can also use inheritance in models, setting the ‘abstract’ attribute of your model’s Meta class to ‘True’.

Where is the rest? Well, small web applications generally are a sort of interface to data; in some small cases, using views to query or insert data would be enough. More common cases will use Forms or ModelForms, which are actually “controllers”. This is nothing other than a practical solution to a common problem, and a very fast one. It’s what a website usually does.

If Forms are not enough for you, then you should create your own classes to do the magic; a very good example of this is the admin application: you can read the ModelAdmin code, which actually works as a controller. There is no standard structure; I suggest you examine existing Django apps, it depends on each case. This is what the Django developers intended; you can add an xml parser class, an API connector class, add Celery for performing tasks, Twisted for a reactor-based application, use only the ORM, make a web service, modify the admin application and more… It’s your responsibility to make good quality code, respect the MVC philosophy or not, make it module based, and create your own abstraction layers. It’s very flexible.

My advice: read as much code as you can; there are lots of django applications around, but don’t take them so seriously. Each case is different; patterns and theory help, but not always. This is an imprecise science, django just provides you with good tools that you can use to alleviate some pains (like the admin interface, web form validation, i18n, an observer pattern implementation, all the previously mentioned and others), but good designs come from experienced designers.

PS: use the ‘User’ class from the auth application (from standard django); you can for example make user profiles, or at least read its code, it will be useful for your case.


回答 5

一个古老的问题,但是我还是想提供我的解决方案。它基于这样一个认识:模型对象也需要一些附加功能,而把这些功能放在models.py中会很尴尬。繁琐的业务逻辑可以根据个人喜好单独编写,但我至少希望模型完成与自身相关的所有事情。该解决方案也为那些喜欢把所有逻辑都放在模型中的人提供支持。

因此,我设计了一种hack,使我可以将逻辑与模型定义分开,并且仍然可以从IDE中获得所有提示。

优点应该很明显,但这列出了我观察到的一些优点:

  • 数据库定义仅保留了这一点-没有附加逻辑“垃圾”
  • 与模型相关的逻辑都整齐地放在一个地方
  • 所有服务(表单,REST,视图)都具有单个逻辑访问点
  • 最棒的是:当我意识到models.py变得过于混乱、必须把逻辑分离出来时,我不必重写任何代码。分离是平滑且迭代的:我可以一次处理一个函数、一个完整的类,或者整个models.py。

我一直在Python 3.4和更高版本以及Django 1.8和更高版本上使用它。

app / models.py

....
from app.logic.user import UserLogic

class User(models.Model, UserLogic):
    field1 = models.AnyField(....)
    ... field definitions ...

app / logic / user.py

if False:
    # This allows the IDE to know about the User model and its member fields
    from main.models import User

class UserLogic(object):
    def logic_function(self: 'User'):
        ... code with hinting working normally ...

我唯一不知道的是如何使我的IDE(在本例中为PyCharm)识别UserLogic实际上是用户模型。但是由于这显然是黑客,所以我很高兴接受总是为self参数指定类型的小小的麻烦。

An old question, but I’d like to offer my solution anyway. It’s based on accepting that model objects, too, require some additional functionality that is awkward to place within models.py. Heavy business logic may be written separately depending on personal taste, but I at least like the model to do everything related to itself. This solution also supports those who like to have all the logic placed within the models themselves.

As such, I devised a hack that allows me to separate logic from model definitions and still get all the hinting from my IDE.

The advantages should be obvious, but this lists a few that I have observed:

  • DB definitions remain just that – no logic “garbage” attached
  • Model-related logic is all placed neatly in one place
  • All the services (forms, REST, views) have a single access point to logic
  • Best of all: I did not have to rewrite any code once I realised that my models.py had become too cluttered and I had to separate the logic out. The separation is smooth and iterative: I could do one function at a time, an entire class, or the entire models.py.

I have been using this with Python 3.4 and greater and Django 1.8 and greater.

app/models.py

....
from app.logic.user import UserLogic

class User(models.Model, UserLogic):
    field1 = models.AnyField(....)
    ... field definitions ...

app/logic/user.py

if False:
    # This allows the IDE to know about the User model and its member fields
    from main.models import User

class UserLogic(object):
    def logic_function(self: 'User'):
        ... code with hinting working normally ...

The only thing I can’t figure out is how to make my IDE (PyCharm in this case) recognise that UserLogic is actually User model. But since this is obviously a hack, I’m quite happy to accept the little nuisance of always specifying type for self parameter.
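The `if False:` guard above predates `typing.TYPE_CHECKING`, which achieves the same goal more idiomatically today. This framework-free sketch keeps the same split, data in one class and logic in a mixin; `User` and `UserLogic` here are illustrative stand-ins, not Django models:

```python
# The same logic-mixin hack, using typing.TYPE_CHECKING instead of
# `if False:`. The block is seen by type checkers but never executed.

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # In the Django version this would be: from main.models import User
    pass

class UserLogic:
    def display_name(self: "User") -> str:
        # annotating `self` with the concrete class restores IDE hinting
        return self.name.title()

class User(UserLogic):
    def __init__(self, name: str):
        self.name = name

alice = User("alice")
```

Annotating `self` with the concrete class is the same trick as in the answer; `TYPE_CHECKING` merely replaces the `if False:` import guard.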


回答 6

我必须同意你的看法。Django有很多可能性,但最好的起点是回顾Django的设计理念

  1. 从模型属性中调用API并不理想,更合理的做法似乎是在视图中执行这类操作,并可能创建一个服务层来保持DRY(不重复自己)。如果对API的调用是非阻塞的,并且调用成本很高,则可以将请求交给服务工作者(从队列中消费任务的工作者)处理。

  2. 按照Django的设计理念,模型封装了“对象”的各个方面。因此,与该对象相关的所有业务逻辑都应存在于此:

包括所有相关领域逻辑

模型应遵循Martin Fowler的Active Record设计模式来封装“对象”的各个方面。

  3. 您描述的副作用是显而易见的,这里的逻辑可以更好地分解为QuerySet和Manager。这是一个例子:

    models.py

    import datetime
    
    from djongo import models
    from django.db.models.query import QuerySet
    from django.contrib import admin
    from django.db import transaction
    
    
    class MyUser(models.Model):
    
        present_name = models.TextField(null=False, blank=True)
        status = models.TextField(null=False, blank=True)
        last_active = models.DateTimeField(auto_now=True, editable=False)
    
        # As mentioned you could put this in a template tag to pull it
        # from cache there. Depending on how it is used, it could be
        # retrieved from within the admin view or from a custom view
        # if that is the only place you will use it.
        #def get_present_name(self):
        #    # property became non-deterministic in terms of database
        #    # data is taken from another service by api
        #    return remote_api.request_user_name(self.uid) or 'Anonymous'
    
        # Moved to admin as an action
        # def activate(self):
        #     # method now has a side effect (send message to user)
        #     self.status = 'activated'
        #     self.save()
        #     # send email via email service
        #     #send_mail('Your account is activated!', '…', [self.email])
    
        class Meta:
            ordering = ['-id']  # Needed for DRF pagination
    
        def __unicode__(self):
            return '{}'.format(self.pk)
    
    
    class MyUserRegistrationQuerySet(QuerySet):
    
        def for_inactive_users(self):
            new_date = datetime.datetime.now() - datetime.timedelta(days=3*365)  # 3 Years ago
            return self.filter(last_active__lte=new_date)  # compare datetimes, not the bare year
    
        def by_user_id(self, user_ids):
            return self.filter(id__in=user_ids)
    
    
    class MyUserRegistrationManager(models.Manager):
    
        def get_query_set(self):
            return MyUserRegistrationQuerySet(self.model, using=self._db)
    
        def with_no_activity(self):
            return self.get_query_set().for_inactive_users()

    admin.py

    # Then in model admin
    
    class MyUserRegistrationAdmin(admin.ModelAdmin):
        actions = (
            'send_activate_emails',  # must match the method name below
        )
    
        def send_activate_emails(self, request, queryset):
            rows_affected = 0
            for obj in queryset:
                with transaction.commit_on_success():
                    # send_email('welcome_email', request, obj) # send email via email service
                    obj.status = 'activated'
                    obj.save()
                    rows_affected += 1
    
            self.message_user(request, 'sent %d' % rows_affected)
    
    admin.site.register(MyUser, MyUserRegistrationAdmin)

I would have to agree with you. There are a lot of possibilities in django but best place to start is reviewing Django’s design philosophy.

  1. Calling an API from a model property would not be ideal, it seems like it would make more sense to do something like this in the view and possibly create a service layer to keep things dry. If the call to the API is non-blocking and the call is an expensive one, sending the request to a service worker (a worker that consumes from a queue) might make sense.

  2. As per Django’s design philosophy models encapsulate every aspect of an “object”. So all business logic related to that object should live there:

Include all relevant domain logic

Models should encapsulate every aspect of an “object,” following Martin Fowler’s Active Record design pattern.

  1. The side effects you describe are apparent, the logic here could be better broken down into Querysets and managers. Here is an example:

    models.py

    import datetime
    
    from djongo import models
    from django.db.models.query import QuerySet
    from django.contrib import admin
    from django.db import transaction
    
    
    class MyUser(models.Model):
    
        present_name = models.TextField(null=False, blank=True)
        status = models.TextField(null=False, blank=True)
        last_active = models.DateTimeField(auto_now=True, editable=False)
    
        # As mentioned you could put this in a template tag to pull it
        # from cache there. Depending on how it is used, it could be
        # retrieved from within the admin view or from a custom view
        # if that is the only place you will use it.
        #def get_present_name(self):
        #    # property became non-deterministic in terms of database
        #    # data is taken from another service by api
        #    return remote_api.request_user_name(self.uid) or 'Anonymous'
    
        # Moved to admin as an action
        # def activate(self):
        #     # method now has a side effect (send message to user)
        #     self.status = 'activated'
        #     self.save()
        #     # send email via email service
        #     #send_mail('Your account is activated!', '…', [self.email])
    
        class Meta:
            ordering = ['-id']  # Needed for DRF pagination
    
        def __unicode__(self):
            return '{}'.format(self.pk)
    
    
    class MyUserRegistrationQuerySet(QuerySet):
    
        def for_inactive_users(self):
            new_date = datetime.datetime.now() - datetime.timedelta(days=3*365)  # 3 Years ago
            return self.filter(last_active__lte=new_date)  # compare datetimes, not the bare year
    
        def by_user_id(self, user_ids):
            return self.filter(id__in=user_ids)
    
    
    class MyUserRegistrationManager(models.Manager):
    
        def get_query_set(self):
            return MyUserRegistrationQuerySet(self.model, using=self._db)
    
        def with_no_activity(self):
            return self.get_query_set().for_inactive_users()
    

    admin.py

    # Then in model admin
    
    class MyUserRegistrationAdmin(admin.ModelAdmin):
        actions = (
            'send_activate_emails',  # must match the method name below
        )
    
        def send_activate_emails(self, request, queryset):
            rows_affected = 0
            for obj in queryset:
                with transaction.commit_on_success():
                    # send_email('welcome_email', request, obj) # send email via email service
                    obj.status = 'activated'
                    obj.save()
                    rows_affected += 1
    
            self.message_user(request, 'sent %d' % rows_affected)
    
    admin.site.register(MyUser, MyUserRegistrationAdmin)
    

回答 7

我大多同意选择的答案(https://stackoverflow.com/a/12857584/871392),但想在“进行查询”部分中添加选项。

可以为模型定义QuerySet类,以进行过滤器查询等。之后,您可以将此查询集类代理给模型的管理器,就像内置管理器和QuerySet类一样。

虽然,如果必须查询多个数据模型以获得一个域模型,对我来说,将其放在像以前建议的那样的单独模块中似乎更合理。

I’m mostly agree with chosen answer (https://stackoverflow.com/a/12857584/871392), but want to add option in Making Queries section.

One can define QuerySet classes for models to make filter queries and so on. After that you can proxy this queryset class through the model’s manager, just like the built-in Manager and QuerySet classes do.

Although, if you had to query several data models to get one domain model, it seems more reasonable to me to put this in separate module like suggested before.


回答 8

关于优缺点的不同选择的最全面的文章:

  1. 想法1:胖模型
  2. 想法2:将业务逻辑放入视图/表单
  3. 理念3:服务
  4. 理念4:QuerySet / Manager
  5. 结论

资料来源:https : //sunscrapers.com/blog/where-to-put-business-logic-django/

Most comprehensive article on the different options with pros and cons:

  1. Idea #1: Fat Models
  2. Idea #2: Putting Business Logic in Views/Forms
  3. Idea #3: Services
  4. Idea #4: QuerySets/Managers
  5. Conclusion

Source: https://sunscrapers.com/blog/where-to-put-business-logic-django/


回答 9

Django设计用于轻松交付网页。如果您对此不满意,则应该使用其他解决方案。

我在模型上写根或通用操作(具有相同的接口),在模型控制器上写其他操作。如果需要其他模型的操作,请导入其控制器。

这种方法对我和应用程序的复杂性已经足够。

Hedde的回应是一个示例,展示了django和python本身的灵活性。

无论如何,这是一个非常有趣的问题!

Django is designed to be easily used to deliver web pages. If you are not comfortable with this, perhaps you should use another solution.

I’m writing the root or common operations on the model (to have the same interface) and the others on the controller of the model. If I need an operation from another model, I import its controller.

This approach is enough for me and the complexity of my applications.

Hedde’s response is an example that shows the flexibility of django and python itself.

Very interesting question anyway!


为什么“ [False,True]中的not(True)”返回False?

问题:为什么“ [False,True]中的not(True)”返回False?

如果我这样做:

>>> False in [False, True]
True

那会返回 True,仅仅因为 False 在列表中。

但是如果我这样做:

>>> not(True) in [False, True]
False

那会返回 False,而 not(True) 等于 False:

>>> not(True)
False

为什么?

If I do this:

>>> False in [False, True]
True

That returns True. Simply because False is in the list.

But if I do:

>>> not(True) in [False, True]
False

That returns False. Whereas not(True) is equal to False:

>>> not(True)
False

Why?


回答 0

运算符优先级(2.x3.x)。not 的优先级低于 in。因此,它等效于:

>>> not ((True) in [False, True])
False

这就是你想要的:

>>> (not True) in [False, True]
True

正如 @Ben 指出的:建议永远不要写 not(True),而应写 not True。前者使它看起来像一个函数调用,而 not 是一个运算符,不是函数。

Operator precedence 2.x, 3.x. The precedence of not is lower than that of in. So it is equivalent to:

>>> not ((True) in [False, True])
False

This is what you want:

>>> (not True) in [False, True]
True

As @Ben points out: It’s recommended to never write not(True), prefer not True. The former makes it look like a function call, while not is an operator, not a function.
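A two-line check of the parse described above, using only built-ins:

```python
a = not True in [False, True]    # parsed as: not (True in [False, True])
b = (not True) in [False, True]  # explicit grouping: False in [False, True]
print(a, b)  # False True
```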


回答 1

not x in y 被评估为 x not in y

您可以通过反汇编代码来确切了解正在发生的事情。第一种情况按您的预期工作:

>>> x = lambda: False in [False, True]
>>> dis.dis(x)
  1           0 LOAD_GLOBAL              0 (False)
              3 LOAD_GLOBAL              0 (False)
              6 LOAD_GLOBAL              1 (True)
              9 BUILD_LIST               2
             12 COMPARE_OP               6 (in)
             15 RETURN_VALUE

第二种情况计算为 True not in [False, True],其结果显然是 False:

>>> x = lambda: not(True) in [False, True]
>>> dis.dis(x)
  1           0 LOAD_GLOBAL              0 (True)
              3 LOAD_GLOBAL              1 (False)
              6 LOAD_GLOBAL              0 (True)
              9 BUILD_LIST               2
             12 COMPARE_OP               7 (not in)
             15 RETURN_VALUE        
>>> 

您想要表达的是 (not(True)) in [False, True],它的结果正如预期是 True,您可以看到原因:

>>> x = lambda: (not(True)) in [False, True]
>>> dis.dis(x)
  1           0 LOAD_GLOBAL              0 (True)
              3 UNARY_NOT           
              4 LOAD_GLOBAL              1 (False)
              7 LOAD_GLOBAL              0 (True)
             10 BUILD_LIST               2
             13 COMPARE_OP               6 (in)
             16 RETURN_VALUE        

not x in y is evaluated as x not in y

You can see exactly what’s happening by disassembling the code. The first case works as you expect:

>>> x = lambda: False in [False, True]
>>> dis.dis(x)
  1           0 LOAD_GLOBAL              0 (False)
              3 LOAD_GLOBAL              0 (False)
              6 LOAD_GLOBAL              1 (True)
              9 BUILD_LIST               2
             12 COMPARE_OP               6 (in)
             15 RETURN_VALUE

The second case, evaluates to True not in [False, True], which is False clearly:

>>> x = lambda: not(True) in [False, True]
>>> dis.dis(x)
  1           0 LOAD_GLOBAL              0 (True)
              3 LOAD_GLOBAL              1 (False)
              6 LOAD_GLOBAL              0 (True)
              9 BUILD_LIST               2
             12 COMPARE_OP               7 (not in)
             15 RETURN_VALUE        
>>> 

What you wanted to express instead was (not(True)) in [False, True], which as expected is True, and you can see why:

>>> x = lambda: (not(True)) in [False, True]
>>> dis.dis(x)
  1           0 LOAD_GLOBAL              0 (True)
              3 UNARY_NOT           
              4 LOAD_GLOBAL              1 (False)
              7 LOAD_GLOBAL              0 (True)
             10 BUILD_LIST               2
             13 COMPARE_OP               6 (in)
             16 RETURN_VALUE        

回答 2

运算符优先级。in 的绑定比 not 更紧密,因此您的表达式等效于 not((True) in [False, True])。

Operator precedence. in binds more tightly than not, so your expression is equivalent to not((True) in [False, True]).


回答 3

这一切都与运算符优先级有关(in 比 not 更强)。但是可以通过在适当的位置添加括号来轻松纠正它:

(not(True)) in [False, True]  # prints true

写作:

not(True) in [False, True]

就像这样:

not((True) in [False, True])

它检查 True 是否在列表中,并返回结果的“not”。

It’s all about operator precedence (in is stronger than not). But it can be easily corrected by adding parentheses at the right place:

(not(True)) in [False, True]  # prints true

writing:

not(True) in [False, True]

is the same like:

not((True) in [False, True])

which looks if True is in the list and returns the “not” of the result.


回答 4

它的计算结果为 not True in [False, True],由于 True 位于 [False, True] 中,因此返回 False。

如果你试试

>>>(not(True)) in [False, True]
True

您会得到预期的结果。

It is evaluating as not True in [False, True], which returns False because True is in [False, True]

If you try

>>>(not(True)) in [False, True]
True

You get the expected result.


回答 5

除了其他提到 not 的优先级低于 in 的答案之外,实际上您的语句还等同于:

not (True in [False, True])

但请注意,如果您不用括号把条件分开,python 将使用两种规则(precedence 或 chaining)来解析它,在这种情况下 python 使用的是优先级。另外,请注意,如果要分隔条件,则需要将整个条件(即运算符和操作数)都放在括号中,而不仅仅是对象或值:

(not True) in [False, True]

但是如前所述,python对运算符进行了另一种修改,即链接

基于python 文档

请注意,比较,成员资格测试和身份测试均具有相同的优先级,并且具有“比较”部分中所述的从左到右的链接功能。

例如,以下语句的结果是False

>>> True == False in [False, True]
False

因为python将像下面这样链接语句:

(True == False) and (False in [False, True])

这正是 False and True,即 False。

您可以假定中心对象将在2个操作和其他对象之间共享(在这种情况下为False)。

并且请注意,这同样适用于所有比较,包括成员资格测试和身份测试运算,它们使用以下运算符:

in, not in, is, is not, <, <=, >, >=, !=, ==

范例:

>>> 1 in [1,2] == True
False

另一个著名的例子是数字范围:

7<x<20

等于:

7<x and x<20   

Alongside the other answers that mentioned the precedence of not is lower than in, actually your statement is equivalent to :

not (True in [False, True])

But note that if you don’t separate your condition with parentheses, python will use two rules (precedence or chaining) to resolve it, and in this case python used precedence. Also, note that if you want to separate a condition you need to put the whole condition in parentheses, not just the object or value:

(not True) in [False, True]

But as mentioned, there is another modification by python on operators that is chaining:

Based on python documentation :

Note that comparisons, membership tests, and identity tests, all have the same precedence and have a left-to-right chaining feature as described in the Comparisons section.

For example the result of following statement is False:

>>> True == False in [False, True]
False

Because python will chain the statements like following :

(True == False) and (False in [False, True])

Which exactly is False and True that is False.

You can assume that the central object will be shared between 2 operations and other objects (False in this case).

And note that it’s also true for all comparisons, including membership test and identity test operations, which use the following operators:

in, not in, is, is not, <, <=, >, >=, !=, ==

Example :

>>> 1 in [1,2] == True
False

Another famous example is number range :

7<x<20

which is equal to :

7<x and x<20   
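The chaining rule described above can be verified directly; both lines below use only built-ins:

```python
# 1 in [1, 2] == True chains to: (1 in [1, 2]) and ([1, 2] == True)
chained = 1 in [1, 2] == True
# Parenthesizing compares the membership result itself:
explicit = (1 in [1, 2]) == True
print(chained, explicit)  # False True
```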

回答 6

让我们将其视为集合包含检查操作:[False, True]是包含一些元素的列表。

表达式 True in [False, True] 返回 True,因为 True 是列表中包含的元素。

因此,not True in [False, True] 给出上述表达式结果的“布尔相反”(在没有任何括号改变优先级的情况下,由于 in 的优先级高于 not 运算符)。因此结果为 False。

另一方面,(not True) in [False, True] 等于 False in [False, True],结果是 True(False 包含在列表中)。

Let’s see it as a collection containment checking operation: [False, True] is a list containing some elements.

The expression True in [False, True] returns True, as True is an element contained in the list.

Therefore, not True in [False, True] gives the “boolean opposite”, that is, the not of the result of the above expression (no parentheses are present to change precedence, and in has greater precedence than the not operator), so it evaluates to False.

On the other hand, (not True) in [False, True], is equal to False in [False, True], which is True (False is contained in the list).


回答 7

为了阐明其他一些答案:在一元运算符后添加括号不会更改其优先级。not(True) 不会使 not 与 True 绑定得更紧密,这只是 True 外面一组多余的括号,与 (True) in [True, False] 中的括号大致相同,不起任何作用。如果要使绑定更紧密,必须将整个表达式(即运算符和操作数)放在括号中,即 (not True) in [True, False]。

要以另一种方式查看,请考虑

>>> -2**2
-4

** 的绑定比 - 更紧密,这就是为什么得到的是“二的平方的负值”,而不是“负二的平方”(后者是正四)。

如果您确实想要负二的平方怎么办?显然,您需要添加括号:

>>> (-2)**2
4

但是,期望以下内容给出 4 是不合理的:

>>> -(2)**2
-4

因为 -(2) 和 -2 一样,括号完全不起作用。not(True) 也是完全一样的道理。

To clarify on some of the other answers, adding parentheses after a unary operator does not change its precedence. not(True) does not make not bind more tightly to True. It’s just a redundant set of parentheses around True. It’s much the same as (True) in [True, False]. The parentheses don’t do anything. If you want the binding to be more tight, you have to put the parentheses around the whole expression, meaning both the operator and the operand, i.e., (not True) in [True, False].

To see this another way, consider

>>> -2**2
-4

** binds more tightly than -, which is why you get the negative of two squared, not the square of negative two (which would be positive four).

What if you did want the square of negative two? Obviously, you’d add parentheses:

>>> (-2)**2
4

However, it’s not reasonable to expect the following to give 4

>>> -(2)**2
-4

because -(2) is the same as -2. The parentheses do absolutely nothing. not(True) is exactly the same.


如何获取当前正在执行的文件的路径和名称?

问题:如何获取当前正在执行的文件的路径和名称?

我有调用其他脚本文件的脚本,但是我需要获取该进程中当前正在运行的文件的文件路径。

例如,假设我有三个文件。使用execfile

  • script_1.py来电script_2.py
  • 依次script_2.py调用script_3.py

我怎样才能获得的文件名和路径script_3.py从内部代码script_3.py,而无需从传递这些信息作为参数script_2.py

(执行os.getcwd()将返回原始启动脚本的文件路径,而不是当前文件的路径。)

I have scripts calling other script files but I need to get the filepath of the file that is currently running within the process.

For example, let’s say I have three files. Using execfile:

  • script_1.py calls script_2.py.
  • In turn, script_2.py calls script_3.py.

How can I get the file name and path of script_3.py, from code within script_3.py, without having to pass that information as arguments from script_2.py?

(Executing os.getcwd() returns the original starting script’s filepath not the current file’s.)


回答 0

p1.py:

execfile("p2.py")

p2.py:

import inspect, os
print (inspect.getfile(inspect.currentframe())) # script filename (usually with path)
print (os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe())))) # script directory

p1.py:

execfile("p2.py")

p2.py:

import inspect, os
print (inspect.getfile(inspect.currentframe())) # script filename (usually with path)
print (os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe())))) # script directory

回答 1

__file__

正如其他人所说。您可能还想使用os.path.realpath消除符号链接:

import os

os.path.realpath(__file__)
__file__

as others have said. You may also want to use os.path.realpath to eliminate symlinks:

import os

os.path.realpath(__file__)
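To see the difference realpath makes, here is a small sketch (POSIX-only, since it creates a symlink; the file names and the temporary directory are made up for the demonstration):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "real_script.py")
    link = os.path.join(d, "link_to_script.py")
    open(target, "w").close()         # create an empty "script"
    os.symlink(target, link)          # and a symlink pointing at it
    resolved = os.path.realpath(link)
    # compare against the fully resolved target (the temp dir itself
    # may sit behind a symlink on some platforms)
    ok = resolved == os.path.realpath(target)
    print(ok)  # True: realpath followed the symlink back to the real file
```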

回答 2

更新2018-11-28:

以下是使用Python 2和3进行实验的摘要。

main.py-运行foo.py
foo.py-运行lib / bar.py
lib / bar.py-打印文件路径表达式

| Python | Run statement       | Filepath expression                    |
|--------+---------------------+----------------------------------------|
|      2 | execfile            | os.path.abspath(inspect.stack()[0][1]) |
|      2 | from lib import bar | __file__                               |
|      3 | exec                | (wasn't able to obtain it)             |
|      3 | import lib.bar      | __file__                               |

对于 Python 2,切换到包可能更清晰,这样就可以使用 from lib import bar:只需向两个文件夹中各添加一个空的 __init__.py 文件即可。

对于 Python 3,execfile 不存在,最接近的替代方法是 exec(open(<filename>).read()),尽管这会影响堆栈帧。最简单的方法是直接使用 import foo 和 import lib.bar,无需 __init__.py 文件。

另请参见import和execfile之间的区别


原始答案:

这是基于该线程答案的实验-Windows上的Python 2.7.10。

基于堆栈的方法似乎是唯一能给出可靠结果的方法。最后两个的语法最短,即:

print os.path.abspath(inspect.stack()[0][1])                   # C:\filepaths\lib\bar.py
print os.path.dirname(os.path.abspath(inspect.stack()[0][1]))  # C:\filepaths\lib

希望这些能作为函数被添加到 sys 中!归功于 @Usagi 和 @pablog。

基于以下三个文件,并在 main.py 所在文件夹中运行 python main.py(也尝试了使用绝对路径的 execfile 以及从另一个文件夹调用)。

C:\filepaths\main.py:execfile('foo.py')
C:\filepaths\foo.py:execfile('lib/bar.py')
C:\filepaths\lib\bar.py:

import sys
import os
import inspect

print "Python " + sys.version
print

print __file__                                        # main.py
print sys.argv[0]                                     # main.py
print inspect.stack()[0][1]                           # lib/bar.py
print sys.path[0]                                     # C:\filepaths
print

print os.path.realpath(__file__)                      # C:\filepaths\main.py
print os.path.abspath(__file__)                       # C:\filepaths\main.py
print os.path.basename(__file__)                      # main.py
print os.path.basename(os.path.realpath(sys.argv[0])) # main.py
print

print sys.path[0]                                     # C:\filepaths
print os.path.abspath(os.path.split(sys.argv[0])[0])  # C:\filepaths
print os.path.dirname(os.path.abspath(__file__))      # C:\filepaths
print os.path.dirname(os.path.realpath(sys.argv[0]))  # C:\filepaths
print os.path.dirname(__file__)                       # (empty string)
print

print inspect.getfile(inspect.currentframe())         # lib/bar.py

print os.path.abspath(inspect.getfile(inspect.currentframe())) # C:\filepaths\lib\bar.py
print os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe()))) # C:\filepaths\lib
print

print os.path.abspath(inspect.stack()[0][1])          # C:\filepaths\lib\bar.py
print os.path.dirname(os.path.abspath(inspect.stack()[0][1]))  # C:\filepaths\lib
print

Update 2018-11-28:

Here is a summary of experiments with Python 2 and 3. With

main.py – runs foo.py
foo.py – runs lib/bar.py
lib/bar.py – prints filepath expressions

| Python | Run statement       | Filepath expression                    |
|--------+---------------------+----------------------------------------|
|      2 | execfile            | os.path.abspath(inspect.stack()[0][1]) |
|      2 | from lib import bar | __file__                               |
|      3 | exec                | (wasn't able to obtain it)             |
|      3 | import lib.bar      | __file__                               |

For Python 2, it might be clearer to switch to packages so you can use from lib import bar – just add empty __init__.py files to the two folders.

For Python 3, execfile doesn’t exist – the nearest alternative is exec(open(<filename>).read()), though this affects the stack frames. It’s simplest to just use import foo and import lib.bar – no __init__.py files needed.

See also Difference between import and execfile
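For completeness, the standard library also offers runpy, which (unlike exec(open(...).read())) runs a script with __file__ set for it. A small sketch with a throwaway child script (all file names here are invented):

```python
import os
import runpy
import tempfile

with tempfile.TemporaryDirectory() as d:
    script = os.path.join(d, "child.py")
    with open(script, "w") as f:
        f.write("captured_file = __file__\n")  # the child records its own path
    result = runpy.run_path(script)            # returns the module's globals
    same = result["captured_file"] == script
    print(same)  # True: __file__ inside the child is the path we passed
```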


Original Answer:

Here is an experiment based on the answers in this thread – with Python 2.7.10 on Windows.

The stack-based ones are the only ones that seem to give reliable results. The last two have the shortest syntax, i.e. –

print os.path.abspath(inspect.stack()[0][1])                   # C:\filepaths\lib\bar.py
print os.path.dirname(os.path.abspath(inspect.stack()[0][1]))  # C:\filepaths\lib

Here’s to these being added to sys as functions! Credit to @Usagi and @pablog

Based on the following three files, and running main.py from its folder with python main.py (also tried execfiles with absolute paths and calling from a separate folder).

C:\filepaths\main.py: execfile('foo.py')
C:\filepaths\foo.py: execfile('lib/bar.py')
C:\filepaths\lib\bar.py:

import sys
import os
import inspect

print "Python " + sys.version
print

print __file__                                        # main.py
print sys.argv[0]                                     # main.py
print inspect.stack()[0][1]                           # lib/bar.py
print sys.path[0]                                     # C:\filepaths
print

print os.path.realpath(__file__)                      # C:\filepaths\main.py
print os.path.abspath(__file__)                       # C:\filepaths\main.py
print os.path.basename(__file__)                      # main.py
print os.path.basename(os.path.realpath(sys.argv[0])) # main.py
print

print sys.path[0]                                     # C:\filepaths
print os.path.abspath(os.path.split(sys.argv[0])[0])  # C:\filepaths
print os.path.dirname(os.path.abspath(__file__))      # C:\filepaths
print os.path.dirname(os.path.realpath(sys.argv[0]))  # C:\filepaths
print os.path.dirname(__file__)                       # (empty string)
print

print inspect.getfile(inspect.currentframe())         # lib/bar.py

print os.path.abspath(inspect.getfile(inspect.currentframe())) # C:\filepaths\lib\bar.py
print os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe()))) # C:\filepaths\lib
print

print os.path.abspath(inspect.stack()[0][1])          # C:\filepaths\lib\bar.py
print os.path.dirname(os.path.abspath(inspect.stack()[0][1]))  # C:\filepaths\lib
print

回答 3

我认为这更干净:

import inspect
print inspect.stack()[0][1]

它获得的信息与以下代码相同:

print inspect.getfile(inspect.currentframe())

其中 [0] 是堆栈中的当前帧(栈顶),[1] 对应文件名;增大索引可在堆栈中向后移动,即

print inspect.stack()[1][1]

将是调用当前框架的脚本的文件名。另外,使用[-1]将使您到达堆栈的底部,即原始调用脚本。

I think this is cleaner:

import inspect
print inspect.stack()[0][1]

and gets the same information as:

print inspect.getfile(inspect.currentframe())

Where [0] is the current frame in the stack (top of stack) and [1] is for the file name, increase to go backwards in the stack i.e.

print inspect.stack()[1][1]

would be the file name of the script that called the current frame. Also, using [-1] will get you to the bottom of the stack, the original calling script.
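In Python 3 the entries returned by inspect.stack() are named tuples, so the indexing above can be replaced by attribute access. A sketch showing frame [0] versus [1] across one call:

```python
import inspect

def whoami():
    stack = inspect.stack()
    # stack[0] is this frame, stack[1] is whoever called us
    return stack[0].function, stack[1].function

def outer():
    return whoami()

names = outer()
print(names)  # ('whoami', 'outer')
```

The same attributes expose `.filename` and `.lineno`, which is what the answers above index as `[1]`.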


回答 4

import os
os.path.dirname(__file__) # relative directory path
os.path.abspath(__file__) # absolute file path
os.path.basename(__file__) # the file name only
import os
os.path.dirname(__file__) # relative directory path
os.path.abspath(__file__) # absolute file path
os.path.basename(__file__) # the file name only
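Those calls taken apart on a made-up sample path; note that dirname and basename do pure string manipulation, so no file needs to exist, while abspath consults the current working directory:

```python
import os

sample = "/home/user/project/script.py"  # hypothetical path
print(os.path.dirname(sample))   # /home/user/project
print(os.path.basename(sample))  # script.py
# abspath resolves relative paths against os.getcwd():
print(os.path.isabs(os.path.abspath("script.py")))  # True
```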

回答 5

如果您的脚本仅包含一个文件,则标记为“最佳”的建议都是正确的。

如果要从一个可能作为模块被导入的文件中找出可执行文件的名称(即传递给 python 解释器的当前程序的根文件),则需要这样做(假设此代码位于名为 foo.py 的文件中):

import inspect

print inspect.stack()[-1][1]

因为堆栈上的最后一项([-1])是最先进入堆栈的(堆栈是 LIFO/FILO 数据结构)。

然后在文件 bar.py 中,如果您 import foo,它将打印 bar.py 而不是 foo.py,后者才是以下所有表达式的值:

  • __file__
  • inspect.getfile(inspect.currentframe())
  • inspect.stack()[0][1]

The suggestions marked as best are all true if your script consists of only one file.

If you want to find out the name of the executable (i.e. the root file passed to the python interpreter for the current program) from a file that may be imported as a module, you need to do this (let’s assume this is in a file named foo.py):

import inspect

print inspect.stack()[-1][1]

Because the last thing ([-1]) on the stack is the first thing that went into it (stacks are LIFO/FILO data structures).

Then in file bar.py if you import foo it’ll print bar.py, rather than foo.py, which would be the value of all of these:

  • __file__
  • inspect.getfile(inspect.currentframe())
  • inspect.stack()[0][1]

回答 6

import os
print os.path.basename(__file__)

这只会给我们文件名。即,如果文件的绝对路径为 c:\abcd\abc.py,则第二行将打印 abc.py。

import os
print os.path.basename(__file__)

this will give us the filename only. i.e. if abspath of file is c:\abcd\abc.py then 2nd line will print abc.py


回答 7

尚不完全清楚您所说的“进程中当前正在运行的文件的文件路径”是什么意思。sys.argv[0] 通常包含由 Python 解释器调用的脚本的位置。查看 sys 文档以获取更多详细信息。

正如 @Tim 和 @Pat Notz 指出的那样,__file__ 属性提供了对以下内容的访问:

加载该模块的文件(如果它是从文件加载的)

It’s not entirely clear what you mean by “the filepath of the file that is currently running within the process”. sys.argv[0] usually contains the location of the script that was invoked by the Python interpreter. Check the sys documentation for more details.

As @Tim and @Pat Notz have pointed out, the __file__ attribute provides access to

the file from which the module was loaded, if it was loaded from a file


回答 8

我有一个必须在 Windows 环境下工作的脚本。最终我写出了下面这段代码:

import os,sys
PROJECT_PATH = os.path.abspath(os.path.split(sys.argv[0])[0])

这是一个相当 hacky 的做法。但它不需要外部库,而这对我来说是最重要的。

I have a script that must work under windows environment. This code snipped is what I’ve finished with:

import os,sys
PROJECT_PATH = os.path.abspath(os.path.split(sys.argv[0])[0])

It’s quite a hacky decision. But it requires no external libraries, which is the most important thing in my case.


回答 9

尝试这个,

import os
os.path.dirname(os.path.realpath(__file__))

Try this,

import os
os.path.dirname(os.path.realpath(__file__))

回答 10

import os
os.path.dirname(os.path.abspath(__file__))

无需 inspect 或任何其他库。

当我必须从与被执行脚本不同的目录导入一个脚本,而该脚本使用与其位于同一文件夹中的配置文件时,此方法对我有用。

import os
os.path.dirname(os.path.abspath(__file__))

No need for inspect or any other library.

This worked for me when I had to import a script (from a different directory then the executed script), that used a configuration file residing in the same folder as the imported script.


回答 11

__file__属性适用于包含主要执行代码的文件以及导入的模块。

参见https://web.archive.org/web/20090918095828/http://pyref.infogami.com/__file__

The __file__ attribute works for both the file containing the main execution code as well as imported modules.

See https://web.archive.org/web/20090918095828/http://pyref.infogami.com/__file__


回答 12

import sys

print sys.path[0]

这将打印当前正在执行的脚本的路径

import sys

print sys.path[0]

this would print the path of the currently executing script


回答 13

我认为就是 __file__。听起来您可能还想看看 inspect 模块。

I think it’s just __file__. Sounds like you may also want to check out the inspect module.


回答 14

您可以使用 inspect.stack()

import inspect,os
inspect.stack()[0]  => (<frame object at 0x00AC2AC0>, 'g:\\Python\\Test\\_GetCurrentProgram.py', 15, '<module>', ['print inspect.stack()[0]\n'], 0)
os.path.abspath (inspect.stack()[0][1]) => 'g:\\Python\\Test\\_GetCurrentProgram.py'

You can use inspect.stack()

import inspect,os
inspect.stack()[0]  => (<frame object at 0x00AC2AC0>, 'g:\\Python\\Test\\_GetCurrentProgram.py', 15, '<module>', ['print inspect.stack()[0]\n'], 0)
os.path.abspath (inspect.stack()[0][1]) => 'g:\\Python\\Test\\_GetCurrentProgram.py'

回答 15

由于Python 3相当主流,因此我想提供一个pathlib答案,因为我认为它现在可能是访问文件和路径信息的更好工具。

from pathlib import Path

current_file: Path = Path(__file__).resolve()

如果要查找当前文件所在的目录,只需在 Path() 语句中添加 .parent 即可:

current_path: Path = Path(__file__).parent.resolve()

Since Python 3 is fairly mainstream, I wanted to include a pathlib answer, as I believe that it is probably now a better tool for accessing file and path information.

from pathlib import Path

current_file: Path = Path(__file__).resolve()

If you are seeking the directory of the current file, it is as easy as adding .parent to the Path() statement:

current_path: Path = Path(__file__).parent.resolve()
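The pathlib pieces can also be tried on a pure path, which needs no filesystem access at all (the sample path is made up):

```python
from pathlib import PurePosixPath

p = PurePosixPath("/home/user/project/script.py")
print(p.name)         # script.py
print(str(p.parent))  # /home/user/project
print(p.stem)         # script
print(p.suffix)       # .py
```

`resolve()` is the only step in the answer above that touches the filesystem; everything else is pure path arithmetic.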

回答 16

import sys
print sys.argv[0]
import sys
print sys.argv[0]

回答 17

这应该工作:

import os,sys
filename=os.path.basename(os.path.realpath(sys.argv[0]))
dirname=os.path.dirname(os.path.realpath(sys.argv[0]))

This should work:

import os,sys
filename=os.path.basename(os.path.realpath(sys.argv[0]))
dirname=os.path.dirname(os.path.realpath(sys.argv[0]))

回答 18

print(__file__)
print(__import__("pathlib").Path(__file__).parent)
print(__file__)
print(__import__("pathlib").Path(__file__).parent)

回答 19

获取执行脚本的目录

 print os.path.dirname( inspect.getfile(inspect.currentframe()))

To get directory of executing script

 print os.path.dirname( inspect.getfile(inspect.currentframe()))

回答 20

我一直只使用“当前工作目录”或CWD的os功能。这是标准库的一部分,非常容易实现。这是一个例子:

    import os
    base_directory = os.getcwd()

I have always just used the os feature of Current Working Directory, or CWD. This is part of the standard library, and is very easy to implement. Here is an example:

    import os
    base_directory = os.getcwd()

回答 21

我使用了基于 __file__ 的方法:
os.path.abspath(__file__)
但有一个小问题:第一次运行代码时它返回 .py 文件,之后的运行会给出 *.pyc 文件的名称,
所以我改用:
inspect.getfile(inspect.currentframe())

sys._getframe().f_code.co_filename

I used the approach with __file__
os.path.abspath(__file__)
but there is a little trick, it returns the .py file when the code is run the first time, next runs give the name of *.pyc file
so I stayed with:
inspect.getfile(inspect.currentframe())
or
sys._getframe().f_code.co_filename
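A quick sketch showing that the two spellings above refer to the same thing: `inspect.getfile` on the current frame just reads the frame's code object filename.

```python
import inspect
import sys

via_inspect = inspect.getfile(inspect.currentframe())
via_frame = sys._getframe().f_code.co_filename

print(via_inspect)
print(via_frame)
```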


回答 22

我编写了一个函数,它考虑了 Eclipse 调试器和 unittest 的情况。它返回您启动的第一个脚本所在的文件夹。您可以选择指定 __file__ 变量,但关键是您不必在整个调用层次中共享这个变量。

也许您可以处理其他我没看到的特殊情况,但是对我来说还可以。

import inspect, os
def getRootDirectory(_file_=None):
    """
    Get the directory of the root execution file
    Can help: http://stackoverflow.com/questions/50499/how-do-i-get-the-path-and-name-of-the-file-that-is-currently-executing
    For eclipse user with unittest or debugger, the function search for the correct folder in the stack
    You can pass __file__ (with 4 underscores) if you want the caller directory
    """
    # If we don't have the __file__ :
    if _file_ is None:
        # We get the last :
        rootFile = inspect.stack()[-1][1]
        folder = os.path.abspath(rootFile)
        # If we use unittest :
        if ("/pysrc" in folder) & ("org.python.pydev" in folder):
            previous = None
            # We search from left to right the case.py :
            for el in inspect.stack():
                currentFile = os.path.abspath(el[1])
                if ("unittest/case.py" in currentFile) | ("org.python.pydev" in currentFile):
                    break
                previous = currentFile
            folder = previous
        # We return the folder :
        return os.path.dirname(folder)
    else:
        # We return the folder according to specified __file__ :
        return os.path.dirname(os.path.realpath(_file_))

I wrote a function which takes into account the Eclipse debugger and unittest. It returns the folder of the first script you launch. You can optionally specify the __file__ var, but the main thing is that you don’t have to share this variable across your whole calling hierarchy.

There may be other particular stack cases I didn’t handle, but for me it’s OK.

import inspect, os
def getRootDirectory(_file_=None):
    """
    Get the directory of the root execution file
    Can help: http://stackoverflow.com/questions/50499/how-do-i-get-the-path-and-name-of-the-file-that-is-currently-executing
    For eclipse user with unittest or debugger, the function search for the correct folder in the stack
    You can pass __file__ (with 4 underscores) if you want the caller directory
    """
    # If we don't have the __file__ :
    if _file_ is None:
        # We get the last :
        rootFile = inspect.stack()[-1][1]
        folder = os.path.abspath(rootFile)
        # If we use unittest :
        if ("/pysrc" in folder) & ("org.python.pydev" in folder):
            previous = None
            # We search from left to right the case.py :
            for el in inspect.stack():
                currentFile = os.path.abspath(el[1])
                if ("unittest/case.py" in currentFile) | ("org.python.pydev" in currentFile):
                    break
                previous = currentFile
            folder = previous
        # We return the folder :
        return os.path.dirname(folder)
    else:
        # We return the folder according to specified __file__ :
        return os.path.dirname(os.path.realpath(_file_))

回答 23

要保持跨平台(macOS / Windows / Linux)的迁移一致性,请尝试:

path = r'%s' % os.getcwd().replace('\\','/')

To keep the migration consistency across platforms (macOS/Windows/Linux), try:

path = r'%s' % os.getcwd().replace('\\','/')


回答 24

最简单的方法是:

script_1.py中:

import subprocess
subprocess.call(['python3',<path_to_script_2.py>])

script_2.py中:

sys.argv[0]

PS:我已经尝试过execfile,但是由于它以字符串形式读取script_2.py,所以sys.argv[0]返回了<string>

Simplest way is:

in script_1.py:

import subprocess
subprocess.call(['python3',<path_to_script_2.py>])

in script_2.py:

sys.argv[0]

P.S.: I’ve tried execfile, but since it reads script_2.py as a string, sys.argv[0] returned <string>.
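A self-contained sketch of the approach above; the `script_2.py` here is a throwaway file created on the fly purely for illustration:

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical script_2.py created just for this demonstration.
script_2 = os.path.join(tempfile.mkdtemp(), "script_2.py")
with open(script_2, "w") as f:
    f.write("import sys\nprint(sys.argv[0])\n")

# The child process sees its own script path in sys.argv[0].
result = subprocess.run([sys.executable, script_2],
                        capture_output=True, text=True)
print(result.stdout.strip())
```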


回答 25

这是我所使用的方法,这样我可以把代码放到任何地方都不会出问题。__name__ 总是有定义的,而 __file__ 只有当代码作为文件运行时才有定义(例如,在 IDLE/iPython 中就没有)。

    if '__file__' in globals():
        self_name = globals()['__file__']
    elif '__file__' in locals():
        self_name = locals()['__file__']
    else:
        self_name = __name__

或者,可以这样写:

self_name = globals().get('__file__', locals().get('__file__', __name__))

Here is what I use so I can throw my code anywhere without issue. __name__ is always defined, but __file__ is only defined when the code is run as a file (e.g. not in IDLE/iPython).

    if '__file__' in globals():
        self_name = globals()['__file__']
    elif '__file__' in locals():
        self_name = locals()['__file__']
    else:
        self_name = __name__

Alternatively, this can be written as:

self_name = globals().get('__file__', locals().get('__file__', __name__))

回答 26

这些答案大多数都是用Python 2.x或更早版本编写的。在Python 3.x中,print函数的语法已更改为需要括号,即print()。

因此,Python 2.x中来自user13993的较早的高分答案:

import inspect, os
print inspect.getfile(inspect.currentframe()) # script filename (usually with path)
print os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe()))) # script directory

在Python 3.x中成为:

import inspect, os
print(inspect.getfile(inspect.currentframe())) # script filename (usually with path)
print(os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe()))) ) # script directory

Most of these answers were written in Python version 2.x or earlier. In Python 3.x the syntax for the print function has changed to require parentheses, i.e. print().

So, this earlier high score answer from user13993 in Python 2.x:

import inspect, os
print inspect.getfile(inspect.currentframe()) # script filename (usually with path)
print os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe()))) # script directory

Becomes in Python 3.x:

import inspect, os
print(inspect.getfile(inspect.currentframe())) # script filename (usually with path)
print(os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe()))) ) # script directory

回答 27

如果您只想要文件名,而不带 ./ 或 .py,可以尝试这样做:

filename = testscript.py
file_name = __file__[2:-3]

file_name 将打印 testscript。您可以通过更改 [] 中的索引来得到所需的任何部分。

if you want just the filename without ./ or .py you can try this

filename = testscript.py
file_name = __file__[2:-3]

file_name will print testscript. You can generate whatever you want by changing the index inside [].
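Index slicing like `[2:-3]` breaks as soon as the path changes shape (an absolute path, a different extension). A less brittle sketch using os.path; the path below is a hypothetical stand-in for `__file__`:

```python
import os

path = "./testscript.py"   # hypothetical value standing in for __file__
# basename strips the directory part, splitext strips the extension.
file_name = os.path.splitext(os.path.basename(path))[0]
print(file_name)  # testscript
```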


回答 28

import os

import wx


# return the full path of this file
print(os.getcwd())

icon = wx.Icon(os.getcwd() + '/img/image.png', wx.BITMAP_TYPE_PNG, 16, 16)

# put the icon on the frame
self.SetIcon(icon)
import os

import wx


# return the full path of this file
print(os.getcwd())

icon = wx.Icon(os.getcwd() + '/img/image.png', wx.BITMAP_TYPE_PNG, 16, 16)

# put the icon on the frame
self.SetIcon(icon)

Python断言的最佳实践

问题:Python断言的最佳实践

  1. assert作为标准代码的一部分而不是仅用于调试目的,是否存在性能或代码维护问题?

    assert x >= 0, 'x is less than zero'

    胜过或坏于

    if x < 0:
        raise Exception, 'x is less than zero'
  2. 另外,是否有办法设置一条始终被检查、而无需 try/except/finally 的业务规则(例如 if x < 0 raise error),使得代码中任何时候只要 x 小于 0 就会引发错误?就像在函数开头写上 assert x < 0 之后,函数内任何位置只要 x 变得小于 0 就抛出异常那样?

  1. Is there a performance or code maintenance issue with using assert as part of the standard code instead of using it just for debugging purposes?

    Is

    assert x >= 0, 'x is less than zero'
    

    better or worse than

    if x < 0:
        raise Exception, 'x is less than zero'
    
  2. Also, is there any way to set a business rule like if x < 0 raise error that is always checked without the try/except/finally so, if at any time throughout the code x is less than 0 an error is raised, like if you set assert x < 0 at the start of a function, anywhere within the function where x becomes less than 0 an exception is raised?


回答 0

为了能够在整个函数执行过程中、当 x 小于零时自动引发错误,您可以使用类描述符。这是一个例子:

class LessThanZeroException(Exception):
    pass

class variable(object):
    def __init__(self, value=0):
        self.__x = value

    def __set__(self, obj, value):
        if value < 0:
            raise LessThanZeroException('x is less than zero')

        self.__x  = value

    def __get__(self, obj, objType):
        return self.__x

class MyClass(object):
    x = variable()

>>> m = MyClass()
>>> m.x = 10
>>> m.x -= 20
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "my.py", line 7, in __set__
    raise LessThanZeroException('x is less than zero')
LessThanZeroException: x is less than zero

To be able to automatically throw an error when x becomes less than zero anywhere in the function, you can use class descriptors. Here is an example:

class LessThanZeroException(Exception):
    pass

class variable(object):
    def __init__(self, value=0):
        self.__x = value

    def __set__(self, obj, value):
        if value < 0:
            raise LessThanZeroException('x is less than zero')

        self.__x  = value

    def __get__(self, obj, objType):
        return self.__x

class MyClass(object):
    x = variable()

>>> m = MyClass()
>>> m.x = 10
>>> m.x -= 20
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "my.py", line 7, in __set__
    raise LessThanZeroException('x is less than zero')
LessThanZeroException: x is less than zero
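A related sketch: the same invariant can also be enforced with a plain property instead of a hand-written descriptor (names reused from the answer above). A property stores the value on each instance, so the state is not shared between instances:

```python
class LessThanZeroException(Exception):
    pass

class MyClass:
    def __init__(self, value=0):
        self._x = value

    @property
    def x(self):
        return self._x

    @x.setter
    def x(self, value):
        # The invariant is checked on every assignment.
        if value < 0:
            raise LessThanZeroException('x is less than zero')
        self._x = value

m = MyClass()
m.x = 10
try:
    m.x -= 20   # would set x to -10, so the setter raises
except LessThanZeroException as exc:
    print(exc)  # x is less than zero
print(m.x)      # 10: the invalid assignment never took effect
```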

回答 1

应该使用断言来测试永远不会发生的条件。目的是在程序状态损坏的情况下尽早崩溃。

应该将异常用于可能发生的错误,并且几乎应该始终创建自己的Exception类


例如,如果您要编写一个把配置文件读入 dict 的函数,当文件格式不正确时应引发 ConfigurationSyntaxError;同时您可以用 assert 断言自己不会返回 None。


在您的示例中,如果x是通过用户界面或外部来源设置的值,则最好是exceptions。

如果x仅由您自己的代码在同一程序中设置,请声明。

Asserts should be used to test conditions that should never happen. The purpose is to crash early in the case of a corrupt program state.

Exceptions should be used for errors that can conceivably happen, and you should almost always create your own Exception classes.


For example, if you’re writing a function to read from a configuration file into a dict, improper formatting in the file should raise a ConfigurationSyntaxError, while you can assert that you’re not about to return None.


In your example, if x is a value set via a user interface or from an external source, an exception is best.

If x is only set by your own code in the same program, go with an assertion.


回答 2

优化编译后,“assert”语句将被删除。因此,是的,在性能和功能上都存在差异。

当在编译时请求优化时,当前的代码生成器不会为 assert 语句生成任何代码。(Python 2 文档、Python 3 文档)

如果您使用 assert 实现应用程序功能,然后在部署到生产环境时开启优化,那么“but-it-works-in-dev”(只在开发环境里能用)这类缺陷将给您带来困扰。

参见 PYTHONOPTIMIZE 和 -O、-OO

“assert” statements are removed when the compilation is optimized. So, yes, there are both performance and functional differences.

The current code generator emits no code for an assert statement when optimization is requested at compile time. – Python 2 Docs Python 3 Docs

If you use assert to implement application functionality, then optimize the deployment to production, you will be plagued by “but-it-works-in-dev” defects.

See PYTHONOPTIMIZE and -O -OO
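A runnable sketch demonstrating the stripping described above, by running the same one-liner with and without -O in a subprocess:

```python
import subprocess
import sys

# The same one-liner run normally and with -O: the optimizer strips the
# assert, so only the second run reaches the print.
code = "assert False, 'boom'; print('assert was stripped')"

normal = subprocess.run([sys.executable, "-c", code],
                        capture_output=True, text=True)
optimized = subprocess.run([sys.executable, "-O", "-c", code],
                           capture_output=True, text=True)

print(normal.returncode != 0)     # True: AssertionError raised
print(optimized.stdout.strip())   # assert was stripped
```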


回答 3

assert 的四个目的

假设您与四个同事Alice,Bernd,Carl和Daphne一起处理了200,000行代码。他们叫您的代码,您叫他们的代码。

然后assert具有四个角色

  1. 告知Alice,Bernd,Carl和Daphne您的代码期望什么。
    假设您有一个处理元组列表的方法,并且如果这些元组不是不可变的,则程序逻辑可能会中断:

    def mymethod(listOfTuples):
        assert(all(type(tp)==tuple for tp in listOfTuples))

    比文档中的等效信息更值得信赖,并且更易于维护。

  2. 通知计算机您的代码期望什么。
    assert 强制代码的调用者采取正确的行为。如果您的代码调用 Alice 的代码,而 Bernd 的代码调用您的代码,那么在没有 assert 的情况下,若程序在 Alice 的代码中崩溃,Bernd 可能认为是 Alice 的错,Alice 调查后可能认为是您的错,您再调查后告诉 Bernd 其实错在他自己。大量工作就这样浪费了。
    有了断言,无论是谁调用出了错,都能很快看出是他们自己的错,而不是您的错。Alice、Bernd 和您都会从中受益,节省大量时间。

  3. 通知您的代码(包括您自己)的读者在某些时候取得了什么成就。
    假设您有一个条目列表,每个条目要么是干净的(这很好),要么是 smorsh、trale、gullup 或 twinkled 的(这些都不可接受)。如果它是 smorsh 的,必须对其 unsmorsh;如果是 trale 的,必须对其 baludo;如果是 gullup 的,必须对其 trot(之后可能还要 pace);如果是 twinkled 的,必须再 twinkle 一次(星期四除外)。您明白了:这是很复杂的东西。但最终结果是(或应该是)所有条目都是干净的。正确的做法(TM)是把清理循环的效果总结为

    assert(all(entry.isClean() for entry in mylist))

    这条语句让每个试图理解这个复杂循环究竟达成了什么的人都免去了头痛。而这些人中最常见的,很可能就是您自己。

  4. 通知计算机您的代码在某些时候已经实现了什么。
    如果您在 trot 之后忘记对需要 pace 的条目进行 pace,这个 assert 会拯救您的一天,避免您的代码在很久之后搞坏亲爱的 Daphne 的代码。

在我看来,assert文档的两个目的(1和3)和保障(2和4)同等重要。
通知人们甚至比通知计算机有价值,因为通知人们可以防止assert目标要抓住的错误(在情况1中)以及在任何情况下都可以避免许多后续错误。

The four purposes of assert

Assume you work on 200,000 lines of code with four colleagues Alice, Bernd, Carl, and Daphne. They call your code, you call their code.

Then assert has four roles:

  1. Inform Alice, Bernd, Carl, and Daphne what your code expects.
    Assume you have a method that processes a list of tuples and the program logic can break if those tuples are not immutable:

    def mymethod(listOfTuples):
        assert(all(type(tp)==tuple for tp in listOfTuples))
    

    This is more trustworthy than equivalent information in the documentation and much easier to maintain.

  2. Inform the computer what your code expects.
    assert enforces proper behavior from the callers of your code. If your code calls Alice’s and Bernd’s code calls yours, then without the assert, if the program crashes in Alice’s code, Bernd might assume it was Alice’s fault, Alice investigates and might assume it was your fault, you investigate and tell Bernd it was in fact his. Lots of work lost.
    With asserts, whoever gets a call wrong, they will quickly be able to see it was their fault, not yours. Alice, Bernd, and you all benefit. Saves immense amounts of time.

  3. Inform the readers of your code (including yourself) what your code has achieved at some point.
    Assume you have a list of entries and each of them can be clean (which is good) or it can be smorsh, trale, gullup, or twinkled (which are all not acceptable). If it’s smorsh it must be unsmorshed; if it’s trale it must be baludoed; if it’s gullup it must be trotted (and then possibly paced, too); if it’s twinkled it must be twinkled again except on Thursdays. You get the idea: It’s complicated stuff. But the end result is (or ought to be) that all entries are clean. The Right Thing(TM) to do is to summarize the effect of your cleaning loop as

    assert(all(entry.isClean() for entry in mylist))
    

    This statement saves a headache for everybody trying to understand what exactly it is that the wonderful loop is achieving. And the most frequent of these people will likely be yourself.

  4. Inform the computer what your code has achieved at some point.
    Should you ever forget to pace an entry needing it after trotting, the assert will save your day and keep your code from breaking dear Daphne’s much later.

In my mind, assert‘s two purposes of documentation (1 and 3) and safeguard (2 and 4) are equally valuable.
Informing the people may even be more valuable than informing the computer because it can prevent the very mistakes the assert aims to catch (in case 1) and plenty of subsequent mistakes in any case.
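A minimal sketch of purposes 1 and 2 above, using the tuple-list precondition from the answer: the assertion documents the expectation for readers and makes a bad caller fail fast at the call site.

```python
def mymethod(list_of_tuples):
    # Precondition: every element must be an immutable tuple.
    assert all(isinstance(tp, tuple) for tp in list_of_tuples), \
        'mymethod expects a list of tuples'
    return [a + b for a, b in list_of_tuples]

print(mymethod([(1, 2), (3, 4)]))  # [3, 7]
```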


回答 4

除了其他答案提到的之外,断言本身也会引发异常,但只会引发 AssertionError。从实用角度看,当您需要精细控制所捕获的异常类型时,断言并不适用。

In addition to the other answers, asserts themselves throw exceptions, but only AssertionErrors. From a utilitarian standpoint, assertions aren’t suitable when you need fine-grained control over which exceptions you catch.


回答 5

这种方法唯一真正出错的地方是,使用assert语句很难创建非常描述性的异常。如果您正在寻找更简单的语法,请记住您可以执行以下操作:

class XLessThanZeroException(Exception):
    pass

def CheckX(x):
    if x < 0:
        raise XLessThanZeroException()

def foo(x):
    CheckX(x)
    #do stuff here

另一个问题是:使用 assert 做常规条件检查,会使得用 -O 标志禁用调试断言变得困难。

The only thing that’s really wrong with this approach is that it’s hard to make a very descriptive exception using assert statements. If you’re looking for the simpler syntax, remember you can also do something like this:

class XLessThanZeroException(Exception):
    pass

def CheckX(x):
    if x < 0:
        raise XLessThanZeroException()

def foo(x):
    CheckX(x)
    #do stuff here

Another problem with using assert for normal condition-checking is that it makes it difficult to disable the debugging asserts using the -O flag.


回答 6

英语单词 assert 在这里取“发誓、申明、庄严声明”之意。它并不表示“检查”或“应该是”。它意味着您作为编码人员正在此处做出宣誓声明:

# I solemnly swear that here I will tell the truth, the whole truth, 
# and nothing but the truth, under pains and penalties of perjury, so help me FSM
assert answer == 42

如果代码正确,那么除非发生单粒子翻转(Single-event upset)、硬件故障等,否则断言永远不会失败。这就是为什么它不得影响最终用户所见的程序行为。特别是,即使在异常的程序条件下,断言也不应失败。它根本就不应该发生。如果真的发生了,程序员就该为此受罚。

The English language word assert here is used in the sense of swear, affirm, avow. It doesn’t mean “check” or “should be”. It means that you as a coder are making a sworn statement here:

# I solemnly swear that here I will tell the truth, the whole truth, 
# and nothing but the truth, under pains and penalties of perjury, so help me FSM
assert answer == 42

If the code is correct, barring Single-event upsets, hardware failures and such, no assert will ever fail. That is why the behaviour of the program to an end user must not be affected. Especially, an assert cannot fail even under exceptional programmatic conditions. It just doesn’t ever happen. If it happens, the programmer should be zapped for it.


回答 7

如前所述,当您的代码本不应该到达某个点时(也就是说那里存在一个错误),就应该使用断言。在我看来,使用断言最有用的场景大概是不变式/前置/后置条件:它们在循环或函数的每次迭代开始或结束时必须为真。

例如,一个递归函数(拆成两个独立的函数:一个处理错误的输入,另一个处理错误的代码,因为在递归中两者很难区分)。这样一来,如果我忘记编写 if 语句,哪里出了问题就会一目了然。

def SumToN(n):
    if n <= 0:
        raise ValueError, "N must be greater than or equal to 0"
    else:
        return RecursiveSum(n)

def RecursiveSum(n):
    #precondition: n >= 0
    assert(n >= 0)
    if n == 0:
        return 0
    return RecursiveSum(n - 1) + n
    #postcondition: returned sum of 1 to n

这些循环不变式通常可以用断言来表示。

As has been said previously, assertions should be used when your code SHOULD NOT ever reach a point, meaning there is a bug there. Probably the most useful reason I can see to use an assertion is an invariant/pre/postcondition. These are something that must be true at the start or end of each iteration of a loop or a function.

For example, a recursive function (two separate functions, so one handles bad input and the other handles bad code, because it’s hard to distinguish the two with recursion). This would make it obvious, if I forgot to write the if statement, what had gone wrong.

def SumToN(n):
    if n <= 0:
        raise ValueError, "N must be greater than or equal to 0"
    else:
        return RecursiveSum(n)

def RecursiveSum(n):
    #precondition: n >= 0
    assert(n >= 0)
    if n == 0:
        return 0
    return RecursiveSum(n - 1) + n
    #postcondition: returned sum of 1 to n

These loop invariants often can be represented with an assertion.


回答 8

是否存在性能问题?

  • 请记住“先使其工作,然后再使其快速工作”
    通常,程序中真正与速度相关的部分只占很小的比例。如果某个 assert 被证明确实存在性能问题,您随时可以删掉或简化它,而它们中的大多数永远不会有问题。

  • 务实
    假设您有一种处理元组的非空列表的方法,并且如果这些元组不是不可变的,则程序逻辑将中断。您应该写:

    def mymethod(listOfTuples):
        assert(all(type(tp)==tuple for tp in listOfTuples))

    如果您的列表通常只有十个条目,这样做可能没问题;但如果它们有一百万个条目,就可能成为问题。不过,与其完全丢弃这项宝贵的检查,不如将其降级为

    def mymethod(listOfTuples):
        assert(type(listOfTuples[0])==tuple)  # in fact _all_ must be tuples!

    这很便宜,但无论如何都会捕获大多数实际程序错误。

Is there a performance issue?

  • Please remember to “make it work first before you make it work fast”.
    Usually only a very small percentage of any program is relevant to its speed. You can always kick out or simplify an assert if it ever proves to be a performance problem, and most of them never will.

  • Be pragmatic:
    Assume you have a method that processes a non-empty list of tuples and the program logic will break if those tuples are not immutable. You should write:

    def mymethod(listOfTuples):
        assert(all(type(tp)==tuple for tp in listOfTuples))
    

    This is probably fine if your lists tend to be ten entries long, but it can become a problem if they have a million entries. But rather than discarding this valuable check entirely you could simply downgrade it to

    def mymethod(listOfTuples):
        assert(type(listOfTuples[0])==tuple)  # in fact _all_ must be tuples!
    

    which is cheap but will likely catch most of the actual program errors anyway.


回答 9

好吧,这是一个悬而未决的问题,我想谈谈两个方面:何时添加断言以及如何编写错误消息。

目的

向初学者解释它-断言是可能引发错误的语句,但是您不会抓住它们。而且通常不应该将它们提高,但是在现实生活中,无论如何它们有时都会得到提高。这是一种严重的情况,代码无法从中恢复,我们称之为“致命错误”。

接下来,它是出于“调试目的”,虽然正确,但听起来很不屑一顾。我更喜欢“声明不变式,永远不应该被违反”的表述,尽管它在不同的初学者中的工作方式有所不同……有些“只懂它”,而另一些要么找不到用处,要么替换正常的异常,甚至用它控制流程。

样式

在 Python 中,assert 是语句,而不是函数!(请记住,assert(False, 'is true') 不会触发断言失败,因为它断言的是一个非空元组,而非空元组恒为真。)不过,撇开这一点:

何时以及如何编写可选的“错误消息”?

这一点实际上也适用于单元测试框架,它们通常有许多专门的断言方法(assertTrue(condition)、assertFalse(condition)、assertEqual(actual, expected) 等),而且通常还提供为断言附加说明的方式。

在一次性代码中,您可以不显示错误消息。

在某些情况下,没有要添加的断言:

def dump(something): assert isinstance(something, Dumpable)  # …

但是除此之外,一条消息对于与其他程序员(有时是代码的交互用户,例如在Ipython / Jupyter等中)的交互用户很有用。

给他们提供信息,而不仅仅是泄漏内部实施细节。

代替:

assert meaningless_identifier <= MAGIC_NUMBER_XXX, 'meaningless_identifier is greater than MAGIC_NUMBER_XXX!!!'

写:

assert meaningless_identifier > MAGIC_NUMBER_XXX, 'reactor temperature above critical threshold'

甚至:

assert meaningless_identifier > MAGIC_NUMBER_XXX, f'reactor temperature({meaningless_identifier }) above critical threshold ({MAGIC_NUMBER_XXX})'

我知道,我知道-这不是静态断言的情况,但我想指出消息的信息价值。

消极或正面信息?

这一点可能有争议,但读到下面这样的内容会让我难受:

assert a == b, 'a is not equal to b'
  • 这是彼此矛盾的两件事。因此,只要我对代码库产生影响,我就会通过使用诸如“必须”和“应该”之类的多余动词来推动我们想要的内容,而不是说我们不需要的内容。

    断言a == b,’a必须等于b’

然后,获取AssertionError: a must be equal to b也是可读的,并且该语句在代码中看起来合乎逻辑。另外,您可以从中获得某些东西而无需阅读回溯(有时甚至不可用)。

Well, this is an open question, and I have two aspects that I want to touch on: when to add assertions and how to write the error messages.

Purpose

To explain it to a beginner – assertions are statements which can raise errors, but you won’t be catching them. And they normally should not be raised, but in real life they sometimes do get raised anyway. And this is a serious situation, which the code cannot recover from, what we call a ‘fatal error’.

Next, it’s for ‘debugging purposes’, which, while correct, sounds very dismissive. I like the ‘declaring invariants, which should never be violated’ formulation better, although it works differently on different beginners… Some ‘just get it’, and others either don’t find any use for it, or replace normal exceptions, or even control flow with it.

Style

In Python, assert is a statement, not a function! (Remember that assert(False, 'is true') will not raise, because it asserts a non-empty tuple, which is always truthy.) But, having that out of the way:

When, and how, to write the optional ‘error message’?

This actually applies to unit testing frameworks, which often have many dedicated methods to do assertions (assertTrue(condition), assertFalse(condition), assertEqual(actual, expected) etc.). They often also provide a way to comment on the assertion.

In throw-away code you could do without the error messages.

In some cases, there is nothing to add to the assertion:

def dump(something): assert isinstance(something, Dumpable) # …

But apart from that, a message is useful for communication with other programmers (which are sometimes interactive users of your code, e.g. in Ipython/Jupyter etc.).

Give them information, not just leak internal implementation details.

instead of:

assert meaningless_identifier <= MAGIC_NUMBER_XXX, 'meaningless_identifier is greater than MAGIC_NUMBER_XXX!!!'

write:

assert meaningless_identifier > MAGIC_NUMBER_XXX, 'reactor temperature above critical threshold'

or maybe even:

assert meaningless_identifier > MAGIC_NUMBER_XXX, f'reactor temperature({meaningless_identifier }) above critical threshold ({MAGIC_NUMBER_XXX})'

I know, I know – this is not a case for a static assertion, but I want to point to the informational value of the message.

Negative or positive message?

This may be controversial, but it hurts me to read things like:

assert a == b, 'a is not equal to b'
  • these are two contradictory things written next to each other. So whenever I have an influence on the codebase, I push for specifying what we want, by using extra verbs like ‘must’ and ‘should’, rather than saying what we don’t want.

    assert a == b, ‘a must be equal to b’

Then, getting AssertionError: a must be equal to b is also readable, and the statement looks logical in code. Also, you can get something out of it without reading the traceback (which can sometimes not even be available).
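A small sketch of the style advice above. `reactor_temperature` and `CRITICAL_THRESHOLD` are hypothetical names; the condition states what must hold, and the message repeats it positively with the offending values inlined:

```python
reactor_temperature = 650
CRITICAL_THRESHOLD = 1000  # hypothetical names, for illustration only

# The message is phrased positively ("must stay at or below"), matching
# the condition, and embeds the values so the traceback is self-explaining.
assert reactor_temperature <= CRITICAL_THRESHOLD, (
    f'reactor temperature ({reactor_temperature}) must stay at or below '
    f'the critical threshold ({CRITICAL_THRESHOLD})'
)
print('invariant holds')
```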


回答 10

assert异常的使用和引发都与沟通有关。

  • 断言是关于开发人员要解决的代码正确性的声明:代码中的断言将代码的正确性告知读者,有关正确代码必须满足的条件。在运行时失败的断言通知开发人员代码中存在需要修复的缺陷。

  • 异常是关于非典型情况的指示,这些非典型情况可能在运行时发生,但不能被手头的代码解决,请在此处处理的调用代码处解决。发生异常并不表示代码中存在错误。

最佳实践

因此,如果您把运行时出现的某种特定情况视为应当通知开发人员的错误(“开发人员您好,此情况表明某处存在错误,请修复代码。”),那么请使用断言。如果断言检查的是代码的输入参数,通常还应在文档中注明:当输入参数违反这些条件时,代码的行为是“未定义的”。

如果不是这样的情况的发生并不是您眼中的错误的迹象,而是您认为应该由客户端代码处理的(可能很少见但)可能的情况,请引发异常。引发异常的情况应该是相应代码文档的一部分。

使用时是否存在性能问题? assert

断言的评估需要一些时间。不过,可以在编译时将其消除。但是,这会带来一些后果,请参见下文。

使用时是否存在代码维护问题 assert

断言通常可以提高代码的可维护性,因为它们可以通过使假设明确化并在运行时定期验证这些假设来提高可读性。这也将有助于捕获回归。但是,需要牢记一个问题:断言中使用的表达式应该没有副作用。如上所述,可以在编译时消除断言-这意味着潜在的副作用也将消失。这可以-意外地-更改代码的行为。

Both the use of assert and the raising of exceptions are about communication.

  • Assertions are statements about the correctness of code addressed at developers: An assertion in the code informs readers of the code about conditions that have to be fulfilled for the code being correct. An assertion that fails at run-time informs developers that there is a defect in the code that needs fixing.

  • Exceptions are indications about non-typical situations that can occur at run-time but can not be resolved by the code at hand, addressed at the calling code to be handled there. The occurrence of an exception does not indicate that there is a bug in the code.

Best practice

Therefore, if you consider the occurrence of a specific situation at run-time as a bug that you would like to inform the developers about (“Hi developer, this condition indicates that there is a bug somewhere, please fix the code.”) then go for an assertion. If the assertion checks input arguments of your code, you should typically add to the documentation that your code has “undefined behaviour” when the input arguments violate those conditions.

If instead the occurrence of that very situation is not an indication of a bug in your eyes, but instead a (maybe rare but) possible situation that you think should rather be handled by the client code, raise an exception. The situations in which the exception is raised should be part of the documentation of the respective code.

Is there a performance […] issue with using assert

The evaluation of assertions takes some time. They can be eliminated at compile time, though. This has some consequences, however, see below.

Is there a […] code maintenance issue with using assert

Normally assertions improve the maintainability of the code, since they improve readability by making assumptions explicit and during run-time regularly verifying these assumptions. This will also help catching regressions. There is one issue, however, that needs to be kept in mind: Expressions used in assertions should have no side-effects. As mentioned above, assertions can be eliminated at compile time – which means that also the potential side-effects would disappear. This can – unintendedly – change the behaviour of the code.
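A sketch of the side-effect rule above: do the real work unconditionally and keep the assert expression pure, so stripping the assert with -O cannot change behaviour.

```python
def process(items):
    # Do the real work unconditionally...
    cleaned = [abs(i) for i in items]
    # ...and keep the assertion pure: it only reads state, so removing it
    # under -O cannot change what the function returns.
    assert all(c >= 0 for c in cleaned), 'cleaning must not leave negatives'
    return cleaned

print(process([-1, 2, -3]))  # [1, 2, 3]
```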


回答 11

断言用于检查源代码中:
1. 有效的条件,
2. 有效的语句,
3. 正确的逻辑。
它不会使整个项目失败,而是发出警报,指出源文件中有不适当的内容。

在示例 1 中,由于变量 'str' 不为 null,因此不会引发任何断言或异常。

范例1:

#!/usr/bin/python

str = 'hello Python!'
strNull = 'string is Null'

if __debug__:
    if not str: raise AssertionError(strNull)
print str

if __debug__:
    print 'FileName '.ljust(30,'.'),(__name__)
    print 'FilePath '.ljust(30,'.'),(__file__)


------------------------------------------------------

Output:
hello Python!
FileName ..................... hello
FilePath ..................... C:/Python\hello.py

在示例2中,var’str’为null。因此,我们可以通过assert语句来挽救用户,使其免于出现错误的程序。

范例2:

#!/usr/bin/python

str = ''
strNull = 'NULL String'

if __debug__:
    if not str: raise AssertionError(strNull)
print str

if __debug__:
    print 'FileName '.ljust(30,'.'),(__name__)
    print 'FilePath '.ljust(30,'.'),(__file__)


------------------------------------------------------

Output:
AssertionError: NULL String

当我们不再需要调试、并已排查过源代码中的断言问题时,可以通过优化标志禁用断言:

python -O assertStatement.py
此时什么也不会被打印出来。

An assert checks, in your source code:
1. that a condition is valid,
2. that a statement is valid,
3. that the logic holds true.
Instead of failing the whole project, it gives an alarm that something is not appropriate in your source file.

In example 1, since the variable ‘str’ is not null, no assert or exception gets raised.

Example 1:

#!/usr/bin/python

str = 'hello Python!'
strNull = 'string is Null'

if __debug__:
    if not str: raise AssertionError(strNull)
print str

if __debug__:
    print 'FileName '.ljust(30,'.'),(__name__)
    print 'FilePath '.ljust(30,'.'),(__file__)


------------------------------------------------------

Output:
hello Python!
FileName ..................... hello
FilePath ..................... C:/Python\hello.py

In example 2, var ‘str’ is null. So the assert statement saves the user from going ahead with a faulty program.

Example 2:

#!/usr/bin/python

str = ''
strNull = 'NULL String'

if __debug__:
    if not str: raise AssertionError(strNull)
print str

if __debug__:
    print 'FileName '.ljust(30,'.'),(__name__)
    print 'FilePath '.ljust(30,'.'),(__file__)


------------------------------------------------------

Output:
AssertionError: NULL String

Once we no longer want to debug and have understood the assertion issues in the source code, we can disable assertions with the optimization flag:

python -O assertStatement.py
nothing will get printed


回答 12

在PTVS,PyCharm等IDE中,assert isinstance()可以使用Wing 语句为一些不清楚的对象启用代码完成功能。

In IDE’s such as PTVS, PyCharm, Wing assert isinstance() statements can be used to enable code completion for some unclear objects.


回答 13

对于它的价值,如果您要处理依靠assert其正常运行的代码,那么添加以下代码将确保启用断言:

try:
    assert False
    raise Exception('Python assertions are not working. This tool relies on Python assertions to do its job. Possible causes are running with the "-O" flag or running a precompiled (".pyo" or ".pyc") module.')
except AssertionError:
    pass

For what it’s worth, if you’re dealing with code which relies on assert to function properly, then adding the following code will ensure that asserts are enabled:

try:
    assert False
    raise Exception('Python assertions are not working. This tool relies on Python assertions to do its job. Possible causes are running with the "-O" flag or running a precompiled (".pyo" or ".pyc") module.')
except AssertionError:
    pass

如何删除/删除virtualenv?

问题:如何删除/删除virtualenv?

我使用以下命令创建了一个环境: virtualenv venv --distribute

我无法用以下命令将其删除:rmvirtualenv venv。该命令属于 virtualenvwrapper,正如下面关于 virtualenvwrapper 的回答中所提到的。

ls在当前目录上执行了,但仍然看到venv

我可以删除它的唯一方法似乎是: sudo rm -rf venv

请注意该环境处于非活动状态。我正在运行Ubuntu 11.10。有任何想法吗?我尝试重新启动系统无济于事。

I created an environment with the following command: virtualenv venv --distribute

I cannot remove it with the following command: rmvirtualenv venv. This is part of virtualenvwrapper, as mentioned in the answer below about virtualenvwrapper.

I do an ls on my current directory and I still see venv.

The only way I can remove it seems to be: sudo rm -rf venv

Note that the environment is not active. I’m running Ubuntu 11.10. Any ideas? I’ve tried rebooting my system to no avail.


回答 0

而已!没有用于删除虚拟环境的命令。只需停用它,然后通过递归删除它即可消除应用程序的工件。

请注意,无论您使用哪种虚拟环境,做法都一样:virtualenv、venv、Anaconda 环境、pyenv、pipenv 在这里都基于同样的原则。

That’s it! There is no command for deleting your virtual environment. Simply deactivate it and rid your application of its artifacts by recursively removing it.

Note that this is the same regardless of what kind of virtual environment you are using. virtualenv, venv, Anaconda environments, pyenv, and pipenv are all based on the same principle here.
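A runnable sketch of the point above, using the stdlib `venv` module to build a throwaway environment and `shutil.rmtree` as the Python equivalent of `rm -rf`:

```python
import shutil
import tempfile
import venv
from pathlib import Path

# Build a throwaway environment with the stdlib venv module...
env_dir = Path(tempfile.mkdtemp()) / "venv"
venv.EnvBuilder(with_pip=False).create(env_dir)  # with_pip=False: fast, offline
print(env_dir.exists())   # True: the environment is just a directory tree

# ...and "delete" it the only way there is: remove its folder.
shutil.rmtree(env_dir)    # the equivalent of `rm -rf venv`
print(env_dir.exists())   # False
```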


回答 1

只是为了回应@skytreader先前评论的内容,它rmvirtualenv是由virtualenvwrapper而不是提供的命令virtualenv。也许您没有virtualenvwrapper安装?

有关更多详细信息,请参见VirtualEnvWrapper命令参考

Just to echo what @skytreader had previously commented, rmvirtualenv is a command provided by virtualenvwrapper, not virtualenv. Maybe you didn’t have virtualenvwrapper installed?

See VirtualEnvWrapper Command Reference for more details.


回答 2

采用 rmvirtualenv

在中删除环境$WORKON_HOME

句法:

rmvirtualenv ENVNAME

在删除当前环境之前,必须使用停用功能。

$ rmvirtualenv my_env

参考:http : //virtualenvwrapper.readthedocs.io/en/latest/command_ref.html

Use rmvirtualenv

Remove an environment, in the $WORKON_HOME.

Syntax:

rmvirtualenv ENVNAME

You must use deactivate before removing the current environment.

$ rmvirtualenv my_env

Reference: http://virtualenvwrapper.readthedocs.io/en/latest/command_ref.html


回答 3

您可以通过递归卸载所有依赖项来删除所有依赖项,然后删除venv。

编辑,包括艾萨克·特纳评论

source venv/bin/activate
pip freeze > requirements.txt
pip uninstall -r requirements.txt -y
deactivate
rm -r venv/

You can remove all the dependencies by recursively uninstalling all of them and then delete the venv.

Edit including Isaac Turner commentary

source venv/bin/activate
pip freeze > requirements.txt
pip uninstall -r requirements.txt -y
deactivate
rm -r venv/

回答 4

只需从系统中删除虚拟环境即可,无需特殊命令

rm -rf venv

Simply remove the virtual environment from the system. There's no special command for it:

rm -rf venv

回答 5

来自virtualenv的官方文档https://virtualenv.pypa.io/en/stable/userguide/

删除环境

删除虚拟环境很简单:先停用它,再删除环境文件夹及其全部内容:

(ENV)$ deactivate
$ rm -r /path/to/ENV

from virtualenv’s official document https://virtualenv.pypa.io/en/stable/userguide/

Removing an Environment

Removing a virtual environment is simply done by deactivating it and deleting the environment folder with all its contents:

(ENV)$ deactivate
$ rm -r /path/to/ENV

回答 6

如果您使用的是pyenv,则可以删除您的虚拟环境:

$ pyenv virtualenv-delete <name>

If you are using pyenv, it is possible to delete your virtual environment:

$ pyenv virtualenv-delete <name>

回答 7

以下命令对我有用。

rm -rf /path/to/virtualenv

The following command works for me.

rm -rf /path/to/virtualenv

回答 8

我使用 pyenv uninstall my_virt_env_name 删除了虚拟环境。

注意:我使用的是通过安装脚本安装的pyenv-virtualenv。

I used pyenv uninstall my_virt_env_name to delete the virtual environment.

Note: I’m using pyenv-virtualenv installed through the install script.


回答 9

如果您是Windows用户并且正在使用conda在Anaconda提示符下管理环境,则可以执行以下操作:

确保停用虚拟环境或重新启动Anaconda Prompt。使用以下命令删除虚拟环境:

$ conda env remove --name $MyEnvironmentName

或者,您可以转到

C:\Users\USERNAME\AppData\Local\Continuum\anaconda3\envs\MYENVIRONMENTNAME

(这是默认文件路径)并手动删除文件夹。

If you are a Windows user and you are using conda to manage the environment in Anaconda prompt, you can do the following:

Make sure you deactivate the virtual environment or restart Anaconda Prompt. Use the following command to remove virtual environment:

$ conda env remove --name $MyEnvironmentName

Alternatively, you can go to the

C:\Users\USERNAME\AppData\Local\Continuum\anaconda3\envs\MYENVIRONMENTNAME

(that’s the default file path) and delete the folder manually.


回答 10

如果您是 Windows 用户,它位于 C:\Users\你的用户名\Envs 中。您可以从那里删除它。

也可以在命令提示符下输入rmvirtualenv环境名称。

我用命令提示符尝试过,它显示已删除,但环境仍然存在,所以我手动删除了它。

If you are a Windows user, it's in C:\Users\your_user_name\Envs. You can delete it from there.

Also try in command prompt rmvirtualenv environment name.

I tried with the command prompt; it said deleted, but it still existed, so I manually deleted it.


回答 11

deactivate是您要查找的命令。就像已经说过的一样,没有删除虚拟环境的命令。只需停用它!

deactivate is the command you are looking for. Like what has already been said, there is no command for deleting your virtual environment. Simply deactivate it!


回答 12

如果您是 Windows 用户,还可以前往 C:/Users/username/Anaconda3/envs 来删除环境。在这里,您可以看到虚拟环境列表,并删除不再需要的环境。

If you're a Windows user, you can also delete the environment by going to C:/Users/username/Anaconda3/envs. Here you can see a list of virtual environments and delete the ones that you no longer need.


回答 13

您可以按照以下步骤删除与 virtualenv 关联的所有文件,然后重新安装 virtualenv 并使用它:

cd {python virtualenv folder}

find {broken virtualenv}/ -type l                             ## to list out all the links

deactivate                                           ## deactivate if virtualenv is active

find {broken virtualenv}/ -type l -delete                    ## to delete the broken links

virtualenv {broken virtualenv} --python=python3           ## recreate links to OS's python

workon {broken virtualenv}                       ## activate & workon the fixed virtualenv

pip3 install  ... {other packages required for the project}

You can follow these steps to remove all the files associated with the virtualenv, then reinstall the virtualenv and use it:

cd {python virtualenv folder}

find {broken virtualenv}/ -type l                             ## to list out all the links

deactivate                                           ## deactivate if virtualenv is active

find {broken virtualenv}/ -type l -delete                    ## to delete the broken links

virtualenv {broken virtualenv} --python=python3           ## recreate links to OS's python

workon {broken virtualenv}                       ## activate & workon the fixed virtualenv

pip3 install  ... {other packages required for the project}


回答 14

步骤1:复制并粘贴以下命令,卸载 virtualenv 和 virtualenvwrapper:

$ sudo pip uninstall virtualenv virtualenvwrapper

步骤2:转到.bashrc并删除所有virtualenv和virtualenvwrapper

打开终端:

$ sudo nano .bashrc

向下滚动,您将看到下面的代码,然后将其删除。

# virtualenv and virtualenvwrapper
export WORKON_HOME=$HOME/.virtualenvs
export VIRTUALENVWRAPPER_PYTHON=/usr/bin/python3
source /usr/local/bin/virtualenvwrapper.sh

接下来,获取.bashrc:

$ source ~/.bashrc

最后一步:不用终端/shell,转到 /home 查找 .virtualenv(我忘了确切名称,如果您发现类似 .virtualenv 或 .venv 的文件夹,删除它即可)。这样就可以了。

Step 1: uninstall virtualenv and virtualenvwrapper by copying and pasting the following command:

$ sudo pip uninstall virtualenv virtualenvwrapper

step 2: go to .bashrc and delete all virtualenv and virtualenvwrapper

open terminal:

$ sudo nano .bashrc

Scroll down and you will see the code below; delete it.

# virtualenv and virtualenvwrapper
export WORKON_HOME=$HOME/.virtualenvs
export VIRTUALENVWRAPPER_PYTHON=/usr/bin/python3
source /usr/local/bin/virtualenvwrapper.sh

next, source the .bashrc:

$ source ~/.bashrc

FINAL step: without a terminal/shell, go to /home and find .virtualenv (I forgot the name, so if you find something similar to .virtualenv or .venv, just delete it). That will work.


如何检查Pandas DataFrame中的值是否为NaN

问题:如何检查Pandas DataFrame中的值是否为NaN

在Python Pandas中,检查DataFrame是否具有一个(或多个)NaN值的最佳方法是什么?

我知道函数pd.isnan,但是这会为每个元素返回一个布尔值的DataFrame。此处的帖子也无法完全回答我的问题。

In Python Pandas, what’s the best way to check whether a DataFrame has one (or more) NaN values?

I know about the function pd.isnan, but this returns a DataFrame of booleans for each element. This post right here doesn’t exactly answer my question either.


回答 0

jwilner 的回答一针见血。我一直在探索是否有更快的选择,因为根据我的经验,对扁平数组求和(奇怪地)比计数更快。这段代码看起来更快:

df.isnull().values.any()

例如:

In [2]: df = pd.DataFrame(np.random.randn(1000,1000))

In [3]: df[df > 0.9] = np.nan

In [4]: %timeit df.isnull().any().any()
100 loops, best of 3: 14.7 ms per loop

In [5]: %timeit df.isnull().values.sum()
100 loops, best of 3: 2.15 ms per loop

In [6]: %timeit df.isnull().sum().sum()
100 loops, best of 3: 18 ms per loop

In [7]: %timeit df.isnull().values.any()
1000 loops, best of 3: 948 µs per loop

df.isnull().sum().sum() 稍慢一些,但当然也提供了额外信息,即 NaN 的数量。

jwilner‘s response is spot on. I was exploring to see if there’s a faster option, since in my experience, summing flat arrays is (strangely) faster than counting. This code seems faster:

df.isnull().values.any()

For example:

In [2]: df = pd.DataFrame(np.random.randn(1000,1000))

In [3]: df[df > 0.9] = np.nan

In [4]: %timeit df.isnull().any().any()
100 loops, best of 3: 14.7 ms per loop

In [5]: %timeit df.isnull().values.sum()
100 loops, best of 3: 2.15 ms per loop

In [6]: %timeit df.isnull().sum().sum()
100 loops, best of 3: 18 ms per loop

In [7]: %timeit df.isnull().values.any()
1000 loops, best of 3: 948 µs per loop

df.isnull().sum().sum() is a bit slower, but of course, has additional information — the number of NaNs.


回答 1

您有两种选择。

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(10,6))
# Make a few areas have NaN values
df.iloc[1:3,1] = np.nan
df.iloc[5,3] = np.nan
df.iloc[7:9,5] = np.nan

现在数据框看起来像这样:

          0         1         2         3         4         5
0  0.520113  0.884000  1.260966 -0.236597  0.312972 -0.196281
1 -0.837552       NaN  0.143017  0.862355  0.346550  0.842952
2 -0.452595       NaN -0.420790  0.456215  1.203459  0.527425
3  0.317503 -0.917042  1.780938 -1.584102  0.432745  0.389797
4 -0.722852  1.704820 -0.113821 -1.466458  0.083002  0.011722
5 -0.622851 -0.251935 -1.498837       NaN  1.098323  0.273814
6  0.329585  0.075312 -0.690209 -3.807924  0.489317 -0.841368
7 -1.123433 -1.187496  1.868894 -2.046456 -0.949718       NaN
8  1.133880 -0.110447  0.050385 -1.158387  0.188222       NaN
9 -0.513741  1.196259  0.704537  0.982395 -0.585040 -1.693810
  • 选项1df.isnull().any().any()-返回布尔值

您已经知道 isnull() 会返回这样的数据帧:

       0      1      2      3      4      5
0  False  False  False  False  False  False
1  False   True  False  False  False  False
2  False   True  False  False  False  False
3  False  False  False  False  False  False
4  False  False  False  False  False  False
5  False  False  False   True  False  False
6  False  False  False  False  False  False
7  False  False  False  False  False   True
8  False  False  False  False  False   True
9  False  False  False  False  False  False

如果改用 df.isnull().any(),则可以只找出包含 NaN 值的列:

0    False
1     True
2    False
3     True
4    False
5     True
dtype: bool

再加一个 .any() 就会告诉你上述结果中是否有 True:

> df.isnull().any().any()
True
  • 选项2df.isnull().sum().sum()-返回NaN值总数的整数:

它的工作方式与 .any().any() 类似:先对每列中 NaN 值的数量求和,再对这些和求和:

df.isnull().sum()
0    0
1    2
2    0
3    1
4    0
5    2
dtype: int64

最后,要获取DataFrame中NaN值的总数:

df.isnull().sum().sum()
5

You have a couple of options.

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(10,6))
# Make a few areas have NaN values
df.iloc[1:3,1] = np.nan
df.iloc[5,3] = np.nan
df.iloc[7:9,5] = np.nan

Now the data frame looks something like this:

          0         1         2         3         4         5
0  0.520113  0.884000  1.260966 -0.236597  0.312972 -0.196281
1 -0.837552       NaN  0.143017  0.862355  0.346550  0.842952
2 -0.452595       NaN -0.420790  0.456215  1.203459  0.527425
3  0.317503 -0.917042  1.780938 -1.584102  0.432745  0.389797
4 -0.722852  1.704820 -0.113821 -1.466458  0.083002  0.011722
5 -0.622851 -0.251935 -1.498837       NaN  1.098323  0.273814
6  0.329585  0.075312 -0.690209 -3.807924  0.489317 -0.841368
7 -1.123433 -1.187496  1.868894 -2.046456 -0.949718       NaN
8  1.133880 -0.110447  0.050385 -1.158387  0.188222       NaN
9 -0.513741  1.196259  0.704537  0.982395 -0.585040 -1.693810
  • Option 1: df.isnull().any().any() – This returns a boolean value

You know of the isnull() which would return a dataframe like this:

       0      1      2      3      4      5
0  False  False  False  False  False  False
1  False   True  False  False  False  False
2  False   True  False  False  False  False
3  False  False  False  False  False  False
4  False  False  False  False  False  False
5  False  False  False   True  False  False
6  False  False  False  False  False  False
7  False  False  False  False  False   True
8  False  False  False  False  False   True
9  False  False  False  False  False  False

If you make it df.isnull().any(), you can find just the columns that have NaN values:

0    False
1     True
2    False
3     True
4    False
5     True
dtype: bool

One more .any() will tell you if any of the above are True

> df.isnull().any().any()
True
  • Option 2: df.isnull().sum().sum() – This returns an integer of the total number of NaN values:

This operates the same way as the .any().any() does, by first giving a summation of the number of NaN values in a column, then the summation of those values:

df.isnull().sum()
0    0
1    2
2    0
3    1
4    0
5    2
dtype: int64

Finally, to get the total number of NaN values in the DataFrame:

df.isnull().sum().sum()
5

回答 2

要找出特定列中具有NaN的行:

nan_rows = df[df['name column'].isnull()]

To find out which rows have NaNs in a specific column:

nan_rows = df[df['name column'].isnull()]
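A runnable version of this filter, with an illustrative column name 'age' standing in for 'name column':

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "c"], "age": [10.0, np.nan, 30.0]})

# Rows where the 'age' column is NaN:
nan_rows = df[df["age"].isnull()]
# nan_rows contains only the row with name "b"
```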

回答 3

如果您需要知道带有“一个或多个NaNs”的行数:

df.isnull().T.any().T.sum()

或者,如果您需要拉出这些行并进行检查:

nan_rows = df[df.isnull().T.any().T]

If you need to know how many rows there are with “one or more NaNs”:

df.isnull().T.any().T.sum()

Or if you need to pull out these rows and examine them:

nan_rows = df[df.isnull().T.any().T]

回答 4

df.isnull().any().any() 应该这样做。

df.isnull().any().any() should do it.


回答 5

在 hobs 精彩回答的基础上补充。我对 Python 和 Pandas 还很陌生,如果我说错了请指出。

要找出哪些行具有NaN:

nan_rows = df[df.isnull().any(axis=1)]

通过将 any() 的 axis 指定为 1 来检查行中是否存在 True,可以在无需转置的情况下执行相同的操作。

Adding to hobs' brilliant answer, I am very new to Python and Pandas, so please point out if I am wrong.

To find out which rows have NaNs:

nan_rows = df[df.isnull().any(axis=1)]

would perform the same operation without the need for transposing by specifying the axis of any() as 1 to check if ‘True’ is present in rows.
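A self-contained check of this row filter; note that recent pandas versions require the keyword form any(axis=1):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]})

# Boolean mask per row, then filter; no transposing needed.
nan_rows = df[df.isnull().any(axis=1)]
# rows 1 and 2 each contain a NaN, so nan_rows has index [1, 2]
```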


回答 6

超级简单语法: df.isna().any(axis=None)

从 v0.23.2 开始,可以使用 DataFrame.isna + DataFrame.any(axis=None),其中 axis=None 指定对整个 DataFrame 进行逻辑归约。

# Setup
df = pd.DataFrame({'A': [1, 2, np.nan], 'B' : [np.nan, 4, 5]})
df
     A    B
0  1.0  NaN
1  2.0  4.0
2  NaN  5.0

df.isna()

       A      B
0  False   True
1  False  False
2   True  False

df.isna().any(axis=None)
# True

有用的替代方案

numpy.isnan
如果您运行的是旧版本的 pandas,这是另一个高性能的选择。

np.isnan(df.values)

array([[False,  True],
       [False, False],
       [ True, False]])

np.isnan(df.values).any()
# True

或者,检查总和:

np.isnan(df.values).sum()
# 2

np.isnan(df.values).sum() > 0
# True

Series.hasnans
您也可以迭代调用Series.hasnans。例如,要检查单个列是否具有NaN,

df['A'].hasnans
# True

要检查是否有任何列包含 NaN,可以将 any(短路操作)与推导式结合使用。

any(df[c].hasnans for c in df)
# True

这实际上非常快。

Super Simple Syntax: df.isna().any(axis=None)

Starting from v0.23.2, you can use DataFrame.isna + DataFrame.any(axis=None) where axis=None specifies logical reduction over the entire DataFrame.

# Setup
df = pd.DataFrame({'A': [1, 2, np.nan], 'B' : [np.nan, 4, 5]})
df
     A    B
0  1.0  NaN
1  2.0  4.0
2  NaN  5.0

df.isna()

       A      B
0  False   True
1  False  False
2   True  False

df.isna().any(axis=None)
# True

Useful Alternatives

numpy.isnan
Another performant option if you’re running older versions of pandas.

np.isnan(df.values)

array([[False,  True],
       [False, False],
       [ True, False]])

np.isnan(df.values).any()
# True

Alternatively, check the sum:

np.isnan(df.values).sum()
# 2

np.isnan(df.values).sum() > 0
# True

Series.hasnans
You can also iteratively call Series.hasnans. For example, to check if a single column has NaNs,

df['A'].hasnans
# True

And to check if any column has NaNs, you can use a comprehension with any (which is a short-circuiting operation).

any(df[c].hasnans for c in df)
# True

This is actually very fast.


回答 7

由于还没有人提到,这里再介绍一个名为 hasnans 的属性。

如果 pandas Series 中有一个或多个值是 NaN,df[i].hasnans 会输出 True,否则为 False。请注意,它不是函数。

pandas 版本 '0.19.2' 和 '0.20.2'

Since none have mentioned, there is just another variable called hasnans.

df[i].hasnans will output to True if one or more of the values in the pandas Series is NaN, False if not. Note that it's not a function.

pandas version ‘0.19.2’ and ‘0.20.2’
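A small illustration (it is an attribute, not a method, so no parentheses):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])
t = pd.Series([1.0, 2.0, 3.0])

# hasnans is an attribute of the Series, not a callable.
print(s.hasnans)  # True
print(t.hasnans)  # False
```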


回答 8

由于 pandas 在实现 DataFrame.dropna() 时必须解决这个问题,我看了看他们的实现,发现他们利用了 DataFrame.count(),它会统计 DataFrame 中所有非空值。参见 pandas 源代码。我尚未对该技术做基准测试,但我认为库的作者很可能已经做出了明智的选择。

Since pandas has to find this out for DataFrame.dropna(), I took a look to see how they implement it and discovered that they made use of DataFrame.count(), which counts all non-null values in the DataFrame. Cf. pandas source code. I haven’t benchmarked this technique, but I figure the authors of the library are likely to have made a wise choice for how to do it.


回答 9

dfPandas DataFrame的名称,以及任何numpy.nan为空值的值。

  1. 如果要查看哪些列为空,哪些不为空(仅True和False)
    df.isnull().any()
  2. 如果只想查看具有空值的列
    df.loc[:, df.isnull().any()].columns
  3. 如果要查看每列中的空值计数
    df.isna().sum()
  4. 如果要查看每列中空值的百分比

    df.isna().sum()/(len(df))*100
  5. 如果要查看仅包含空值的列中的空值百分比: df.loc[:,list(df.loc[:,df.isnull().any()].columns)].isnull().sum()/(len(df))*100

编辑1:

如果要直观地查看数据丢失的位置:

import missingno
missingdata_df = df.columns[df.isnull().any()].tolist()
missingno.matrix(df[missingdata_df])

let df be the name of the Pandas DataFrame and any value that is numpy.nan is a null value.

  1. If you want to see which columns has nulls and which not(just True and False)
    df.isnull().any()
    
  2. If you want to see only the columns that has nulls
    df.loc[:, df.isnull().any()].columns
    
  3. If you want to see the count of nulls in every column
    df.isna().sum()
    
  4. If you want to see the percentage of nulls in every column

    df.isna().sum()/(len(df))*100
    
  5. If you want to see the percentage of nulls in columns only with nulls: df.loc[:,list(df.loc[:,df.isnull().any()].columns)].isnull().sum()/(len(df))*100

EDIT 1:

If you want to see where your data is missing visually:

import missingno
missingdata_df = df.columns[df.isnull().any()].tolist()
missingno.matrix(df[missingdata_df])

回答 10

只需使用 math.isnan(x):如果 x 是 NaN(非数字)则返回 True,否则返回 False。

Just using math.isnan(x), Return True if x is a NaN (not a number), and False otherwise.


回答 11

df.isnull().sum()

这将为您提供DataFrame各个列中存在的所有NaN值的计数。

df.isnull().sum()

This will give you the count of all NaN values present in the respective columns of the DataFrame.


回答 12

这是找到空值并替换为计算值的另一种有趣方式

    #Creating the DataFrame

    testdf2 = pd.DataFrame({'Tenure':[1,2,3,4,5],'Monthly':[10,20,30,40,50],'Yearly':[10,40,np.nan,np.nan,250]})
    >>> testdf2
       Monthly  Tenure  Yearly
    0       10       1    10.0
    1       20       2    40.0
    2       30       3     NaN
    3       40       4     NaN
    4       50       5   250.0

    #Identifying the rows with empty columns
    nan_rows = testdf2[testdf2['Yearly'].isnull()]
    >>> nan_rows
       Monthly  Tenure  Yearly
    2       30       3     NaN
    3       40       4     NaN

    #Getting the rows# into a list
    >>> index = list(nan_rows.index)
    >>> index
    [2, 3]

    # Replacing null values with calculated value
    >>> for i in index:
        testdf2['Yearly'][i] = testdf2['Monthly'][i] * testdf2['Tenure'][i]
    >>> testdf2
       Monthly  Tenure  Yearly
    0       10       1    10.0
    1       20       2    40.0
    2       30       3    90.0
    3       40       4   160.0
    4       50       5   250.0

Here is another interesting way of finding null and replacing with a calculated value

    #Creating the DataFrame

    testdf2 = pd.DataFrame({'Tenure':[1,2,3,4,5],'Monthly':[10,20,30,40,50],'Yearly':[10,40,np.nan,np.nan,250]})
    >>> testdf2
       Monthly  Tenure  Yearly
    0       10       1    10.0
    1       20       2    40.0
    2       30       3     NaN
    3       40       4     NaN
    4       50       5   250.0

    #Identifying the rows with empty columns
    nan_rows = testdf2[testdf2['Yearly'].isnull()]
    >>> nan_rows
       Monthly  Tenure  Yearly
    2       30       3     NaN
    3       40       4     NaN

    #Getting the rows# into a list
    >>> index = list(nan_rows.index)
    >>> index
    [2, 3]

    # Replacing null values with calculated value
    >>> for i in index:
        testdf2['Yearly'][i] = testdf2['Monthly'][i] * testdf2['Tenure'][i]
    >>> testdf2
       Monthly  Tenure  Yearly
    0       10       1    10.0
    1       20       2    40.0
    2       30       3    90.0
    3       40       4   160.0
    4       50       5   250.0

回答 13

我一直使用以下方法:将值转换为字符串,并检查它是否为 'nan'。

   (str(df.at[index, 'column']) == 'nan')

这使我可以检查序列中的特定值,而不仅仅是返回该值是否包含在序列中。

I’ve been using the following and type casting it to a string and checking for the nan value

   (str(df.at[index, 'column']) == 'nan')

This allows me to check specific value in a series and not just return if this is contained somewhere within the series.
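The string cast works because float NaN stringifies to exactly 'nan', but pd.isna on the scalar expresses the same check without relying on string formatting. A sketch with an illustrative column name:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"column": [1.0, np.nan]})

# The string cast from the answer above:
print(str(df.at[1, "column"]) == "nan")  # True

# Same check without relying on string formatting:
print(pd.isna(df.at[1, "column"]))       # True
print(pd.isna(df.at[0, "column"]))       # False
```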


回答 14

或者,你也可以对 DataFrame 使用 .info(),例如:

df.info(null_counts=True) 返回列中的非空行数,例如:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 3276314 entries, 0 to 3276313
Data columns (total 10 columns):
n_matches                          3276314 non-null int64
avg_pic_distance                   3276314 non-null float64

Or you can use .info() on the DF such as :

df.info(null_counts=True), which returns the number of non-null rows in each column, such as:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 3276314 entries, 0 to 3276313
Data columns (total 10 columns):
n_matches                          3276314 non-null int64
avg_pic_distance                   3276314 non-null float64

回答 15

最好是使用:

df.isna().any().any()

原因如下:isna() 被用来定义 isnull(),两者当然完全相同。

这甚至比被接受的答案还要快,并且涵盖所有二维 pandas 数组。

The best would be to use:

df.isna().any().any()

Here is why. So isna() is used to define isnull(), but both of these are identical of course.

This is even faster than the accepted answer and covers all 2D panda arrays.


回答 16

import missingno as msno
msno.matrix(df)  # just to visualize. no missing value.


import missingno as msno
msno.matrix(df)  # just to visualize. no missing value.



回答 17

df.apply(axis=0, func=lambda x : any(pd.isnull(x)))

将检查每个列是否包含Nan。

df.apply(axis=0, func=lambda x : any(pd.isnull(x)))

Will check for each column if it contains Nan or not.


回答 18

我们可以使用 seaborn 模块生成热图,来查看数据集中存在的空值:

import pandas as pd
import seaborn as sns
dataset=pd.read_csv('train.csv')
sns.heatmap(dataset.isnull(),cbar=False)

We can see the null values present in the dataset by generating a heatmap using the seaborn module's heatmap:

import pandas as pd
import seaborn as sns
dataset=pd.read_csv('train.csv')
sns.heatmap(dataset.isnull(),cbar=False)

回答 19

您不仅可以检查是否存在“ NaN”,还可以使用以下命令获取每一列中“ NaN”的百分比,

df = pd.DataFrame({'col1':[1,2,3,4,5],'col2':[6,np.nan,8,9,10]})  
df  

   col1 col2  
0   1   6.0  
1   2   NaN  
2   3   8.0  
3   4   9.0  
4   5   10.0  


df.isnull().sum()/len(df)  
col1    0.0  
col2    0.2  
dtype: float64

You could not only check if any ‘NaN’ exist but also get the percentage of ‘NaN’s in each column using the following,

df = pd.DataFrame({'col1':[1,2,3,4,5],'col2':[6,np.nan,8,9,10]})  
df  

   col1 col2  
0   1   6.0  
1   2   NaN  
2   3   8.0  
3   4   9.0  
4   5   10.0  


df.isnull().sum()/len(df)  
col1    0.0  
col2    0.2  
dtype: float64

回答 20

根据要处理的数据类型,您还可以通过将dropna设置为False来在执行EDA时获取每一列的值计数。

for col in df:
   print(df[col].value_counts(dropna=False))

对于分类变量,效果很好,当您拥有许多唯一值时,效果不是很好。

Depending on the type of data you’re dealing with, you could also just get the value counts of each column while performing your EDA by setting dropna to False.

for col in df:
   print(df[col].value_counts(dropna=False))

Works well for categorical variables, not so much when you have many unique values.


“%matplotlib inline”的目的

问题:“%matplotlib inline”的目的

有人可以向我解释 %matplotlib inline 到底有什么用吗?

Could someone explain to me what exactly is the use of %matplotlib inline?


回答 0

%matplotlib是IPython中的魔术函数。为了方便起见,我在这里引用相关文档供您阅读:

IPython 有一组预定义的“魔术函数”,您可以使用命令行风格的语法来调用它们。魔术有两种:面向行的和面向单元格的。行魔术以 % 字符作为前缀,其工作方式与操作系统命令行调用非常相似:它们将行的其余部分作为参数,参数传递时不带括号或引号。行魔术可以返回结果,并可用于赋值语句的右侧。单元格魔术以 %% 为前缀,它们不仅将该行的其余部分作为参数,还将其下方的行作为单独的参数。

%matplotlib inline 将matplotlib的后端设置为’inline’后端

使用此后端,绘图命令的输出将在Jupyter笔记本之类的前端内联显示,直接位于生成它的代码单元下方。然后,生成的图也将存储在笔记本文档中。

使用 'inline' 后端时,您的 matplotlib 图将与代码一起包含在笔记本中。还值得一读《How to make IPython notebook matplotlib plot inline》,作为在代码中使用它的参考。

如果您还想要交互性,可以使用 nbagg 后端,即 %matplotlib notebook(IPython 3.x 中),如这里所述。

%matplotlib is a magic function in IPython. I’ll quote the relevant documentation here for you to read for convenience:

IPython has a set of predefined ‘magic functions’ that you can call with a command line style syntax. There are two kinds of magics, line-oriented and cell-oriented. Line magics are prefixed with the % character and work much like OS command-line calls: they get as an argument the rest of the line, where arguments are passed without parentheses or quotes. Lines magics can return results and can be used in the right hand side of an assignment. Cell magics are prefixed with a double %%, and they are functions that get as an argument not only the rest of the line, but also the lines below it in a separate argument.

%matplotlib inline sets the backend of matplotlib to the ‘inline’ backend:

With this backend, the output of plotting commands is displayed inline within frontends like the Jupyter notebook, directly below the code cell that produced it. The resulting plots will then also be stored in the notebook document.

When using the ‘inline’ backend, your matplotlib graphs will be included in your notebook, next to the code. It may be worth also reading How to make IPython notebook matplotlib plot inline for reference on how to use it in your code.

If you want interactivity as well, you can use the nbagg backend with %matplotlib notebook (in IPython 3.x), as described here.


回答 1

如果您正在运行 IPython,%matplotlib inline 会使您的绘图输出显示并存储在笔记本中。

根据文档:

要进行此设置,在执行任何绘图或导入 matplotlib 之前,必须先执行 %matplotlib 魔术命令。它会执行必要的幕后设置,使 IPython 能与 matplotlib 正确地协同工作;但它实际上并不执行任何 Python 导入命令,也就是说,不会向命名空间添加任何名称。

由IPython提供的一个特别有趣的后端是 inline后端。此功能仅适用于Jupyter Notebook和Jupyter QtConsole。可以按以下方式调用它:

%matplotlib inline

使用此后端,绘图命令的输出将在Jupyter笔记本之类的前端内联显示,直接位于生成它的代码单元下方。然后,生成的图也将存储在笔记本文档中。

Provided you are running IPython, the %matplotlib inline will make your plot outputs appear and be stored within the notebook.

According to documentation

To set this up, before any plotting or import of matplotlib is performed you must execute the %matplotlib magic command. This performs the necessary behind-the-scenes setup for IPython to work correctly hand in hand with matplotlib; it does not, however, actually execute any Python import commands, that is, no names are added to the namespace.

A particularly interesting backend, provided by IPython, is the inline backend. This is available only for the Jupyter Notebook and the Jupyter QtConsole. It can be invoked as follows:

%matplotlib inline

With this backend, the output of plotting commands is displayed inline within frontends like the Jupyter notebook, directly below the code cell that produced it. The resulting plots will then also be stored in the notebook document.


回答 2

如果要在 Jupyter 笔记本中添加绘图,%matplotlib inline 是标准解决方案。还有其他魔术命令可以在 Jupyter 中交互地使用 matplotlib。

%matplotlib:此后任何 plt 绘图命令都会打开一个图形窗口,并且可以运行其他命令来更新绘图。某些更改不会自动绘制,要强制更新,请使用 plt.draw()。

%matplotlib notebook:将导致交互式绘图嵌入到笔记本中,您可以缩放图形并调整其大小

%matplotlib inline:仅在笔记本中绘制静态图像

If you want to add plots to your Jupyter notebook, then %matplotlib inline is a standard solution. And there are other magic commands will use matplotlib interactively within Jupyter.

%matplotlib: any plt plot command will now cause a figure window to open, and further commands can be run to update the plot. Some changes will not draw automatically, to force an update, use plt.draw()

%matplotlib notebook: will lead to interactive plots embedded within the notebook, you can zoom and resize the figure

%matplotlib inline: only draw static images in the notebook


回答 3

从 IPython 5.0 和 matplotlib 2.0 开始,您可以避免使用 IPython 特有的魔术命令,改用 matplotlib.pyplot.ion()/matplotlib.pyplot.ioff(),它们还有能在 IPython 之外工作的优点。

ipython文档

Starting with IPython 5.0 and matplotlib 2.0 you can avoid the use of IPython’s specific magic and use matplotlib.pyplot.ion()/matplotlib.pyplot.ioff() which have the advantages of working outside of IPython as well.

ipython docs


回答 4

如果您不知道后端是什么,可以阅读以下内容:https://matplotlib.org/tutorials/introductory/usage.html#backends

有些人从python shell交互地使用matplotlib,并且在键入命令时弹出绘图窗口。有些人运行Jupyter笔记本并绘制内联图以进行快速数据分析。其他人则将matplotlib嵌入到wxpython或pygtk等图形用户界面中,以构建丰富的应用程序。有些人在批处理脚本中使用matplotlib从数值模拟生成后记图像,还有一些人运行Web应用程序服务器来动态提供图形。为了支持所有这些用例,matplotlib可以针对不同的输出,这些功能中的每一个都称为后端。“前端”是用户面对的代码,即绘图代码,而“后端”则是幕后的所有艰苦工作以制作图形。

因此,当您键入%matplotlib inline时,它将激活内联后端。如前几篇文章所述:

使用此后端,绘图命令的输出将在Jupyter笔记本之类的前端内联显示,直接位于生成它的代码单元下方。然后,生成的图也将存储在笔记本文档中。

If you don’t know what backend is , you can read this: https://matplotlib.org/tutorials/introductory/usage.html#backends

Some people use matplotlib interactively from the python shell and have plotting windows pop up when they type commands. Some people run Jupyter notebooks and draw inline plots for quick data analysis. Others embed matplotlib into graphical user interfaces like wxpython or pygtk to build rich applications. Some people use matplotlib in batch scripts to generate postscript images from numerical simulations, and still others run web application servers to dynamically serve up graphs. To support all of these use cases, matplotlib can target different outputs, and each of these capabilities is called a backend; the “frontend” is the user facing code, i.e., the plotting code, whereas the “backend” does all the hard work behind-the-scenes to make the figure.

So when you type %matplotlib inline , it activates the inline backend. As discussed in the previous posts :

With this backend, the output of plotting commands is displayed inline within frontends like the Jupyter notebook, directly below the code cell that produced it. The resulting plots will then also be stored in the notebook document.


回答 5

这只是意味着我们在代码中创建的任何图形都会出现在同一个笔记本中,而不是像不使用此魔术语句时那样出现在单独的窗口中。

It just means that any graph which we are creating as a part of our code will appear in the same notebook and not in separate window which would happen if we have not used this magic statement.


回答 6

解释清楚:

如果您不喜欢这样:

(截图:图形未内联显示在笔记本中)

添加 %matplotlib inline

(截图:图形内联显示在代码单元下方)

这样它就会出现在你的 Jupyter 笔记本中。

To explain it clear:

If you don’t like it like this:

(screenshot: the plot is not displayed inline in the notebook)

add %matplotlib inline

(screenshot: the plot now appears inline below the code cell)

and there you have it in your jupyter notebook.


回答 7

TL; DR

%matplotlib inline -显示输出内联


IPython内核具有通过执行代码来显示图的功能。IPython内核旨在与matplotlib绘图库无缝协作以提供此功能。

%matplotlib 是一个魔术命令,它执行必要的幕后设置,使 IPython 与 matplotlib 紧密配合。它不执行任何 Python 导入命令,即不会向命名空间添加任何名称。

在单独的窗口中显示输出

%matplotlib

内联显示输出

(仅适用于Jupyter Notebook和Jupyter QtConsole)

%matplotlib inline

与交互式后端一起显示

(有效值 'GTK3Agg', 'GTK3Cairo', 'MacOSX', 'nbAgg', 'Qt4Agg', 'Qt4Cairo', 'Qt5Agg', 'Qt5Cairo', 'TkAgg', 'TkCairo', 'WebAgg', 'WX', 'WXAgg', 'WXCairo', 'agg', 'cairo', 'pdf', 'pgf', 'ps', 'svg', 'template')

%matplotlib gtk

示例-GTK3Agg-对GTK 3.x画布的Agg渲染(需要PyGObject和pycairo或cairocffi)。

有关matplotlib交互式后端的更多详细信息:此处


从 IPython 5.0 和 matplotlib 2.0 开始,你可以避免使用 IPython 的特殊魔术命令,改用 matplotlib.pyplot.ion()/matplotlib.pyplot.ioff(),它们还有能在 IPython 之外工作的优点。

参考:IPython丰富的输出-交互式绘图

TL;DR

%matplotlib inline – Displays output inline


IPython kernel has the ability to display plots by executing code. The IPython kernel is designed to work seamlessly with the matplotlib plotting library to provide this functionality.

%matplotlib is a magic command which performs the necessary behind-the-scenes setup for IPython to work correctly hand-in-hand with matplotlib; it does not execute any Python import commands, that is, no names are added to the namespace.

Display output in separate window

%matplotlib

Display output inline

(available only for the Jupyter Notebook and the Jupyter QtConsole)

%matplotlib inline

Display with interactive backends

(valid values 'GTK3Agg', 'GTK3Cairo', 'MacOSX', 'nbAgg', 'Qt4Agg', 'Qt4Cairo', 'Qt5Agg', 'Qt5Cairo', 'TkAgg', 'TkCairo', 'WebAgg', 'WX', 'WXAgg', 'WXCairo', 'agg', 'cairo', 'pdf', 'pgf', 'ps', 'svg', 'template')

%matplotlib gtk

Example – GTK3Agg – An Agg rendering to a GTK 3.x canvas (requires PyGObject and pycairo or cairocffi).

More details about matplotlib interactive backends: here


Starting with IPython 5.0 and matplotlib 2.0 you can avoid the use of IPython’s specific magic and use matplotlib.pyplot.ion()/matplotlib.pyplot.ioff() which have the advantages of working outside of IPython as well.

Refer: IPython Rich Output – Interactive Plotting


回答 8

如果您正在运行 Jupyter Notebook,%matplotlib inline 命令会使您的绘图输出出现在笔记本中,并且可以被存储。

Provided you are running Jupyter Notebook, the %matplotlib inline command will make your plot outputs appear in the notebook, also can be stored.


回答 9

不是必须写那一行。没有 %matplotlib 魔术函数,对我来说也运行良好。我使用的是 Anaconda 自带的 Spyder。

It is not mandatory to write that. It worked fine for me without the (%matplotlib) magic function. I am using the Spyder IDE, the one that comes with Anaconda.


如何获取NumPy数组中N个最大值的索引?

问题:如何获取NumPy数组中N个最大值的索引?

NumPy 提供了通过 np.argmax 获取数组最大值索引的方法。

我想要类似的功能,但返回 N 个最大值的索引。

例如,如果我有一个数组 [1, 3, 2, 4, 5],function(array, n=3) 将返回索引 [4, 3, 1],对应于元素 [5, 4, 3]。

NumPy proposes a way to get the index of the maximum value of an array via np.argmax.

I would like a similar thing, but returning the indexes of the N maximum values.

For instance, if I have an array, [1, 3, 2, 4, 5], function(array, n=3) would return the indices [4, 3, 1] which correspond to the elements [5, 4, 3].


回答 0

我想出的最简单的方法是:

In [1]: import numpy as np

In [2]: arr = np.array([1, 3, 2, 4, 5])

In [3]: arr.argsort()[-3:][::-1]
Out[3]: array([4, 3, 1])

这涉及对数组的完整排序。我想知道 numpy 是否提供了内置的部分排序方法;到目前为止我还没有找到。

如果这种解决方案太慢(尤其是当 n 较小时),可能值得考虑用 Cython 编写代码。

The simplest I’ve been able to come up with is:

In [1]: import numpy as np

In [2]: arr = np.array([1, 3, 2, 4, 5])

In [3]: arr.argsort()[-3:][::-1]
Out[3]: array([4, 3, 1])

This involves a complete sort of the array. I wonder if numpy provides a built-in way to do a partial sort; so far I haven’t been able to find one.

If this solution turns out to be too slow (especially for small n), it may be worth looking at coding something up in Cython.
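As a runnable sketch, the one-liner above can be wrapped in a small helper (the function name top_n_indices is my own, not from the answer):

```python
import numpy as np

def top_n_indices(arr, n):
    """Indices of the n largest values, largest first (full O(n log n) sort)."""
    return arr.argsort()[-n:][::-1]

arr = np.array([1, 3, 2, 4, 5])
print(top_n_indices(arr, 3))  # -> [4 3 1]
```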


回答 1

较新的 NumPy 版本(1.8 及更高)为此提供了 argpartition 函数。要获取四个最大元素的索引,请执行

>>> a = np.array([9, 4, 4, 3, 3, 9, 0, 4, 6, 0])
>>> a
array([9, 4, 4, 3, 3, 9, 0, 4, 6, 0])
>>> ind = np.argpartition(a, -4)[-4:]
>>> ind
array([1, 5, 8, 0])
>>> a[ind]
array([4, 9, 6, 9])

与 argsort 不同,此函数在最坏情况下以线性时间运行,但返回的索引未排序——这一点从 a[ind] 的求值结果可以看出。如果您还需要排序,请在之后对它们排序:

>>> ind[np.argsort(a[ind])]
array([1, 8, 5, 0])

要以这种方式获得排序前k个元素,需要O(n + k log k)时间。

Newer NumPy versions (1.8 and up) have a function called argpartition for this. To get the indices of the four largest elements, do

>>> a = np.array([9, 4, 4, 3, 3, 9, 0, 4, 6, 0])
>>> a
array([9, 4, 4, 3, 3, 9, 0, 4, 6, 0])
>>> ind = np.argpartition(a, -4)[-4:]
>>> ind
array([1, 5, 8, 0])
>>> a[ind]
array([4, 9, 6, 9])

Unlike argsort, this function runs in linear time in the worst case, but the returned indices are not sorted, as can be seen from the result of evaluating a[ind]. If you need that too, sort them afterwards:

>>> ind[np.argsort(a[ind])]
array([1, 8, 5, 0])

To get the top-k elements in sorted order in this way takes O(n + k log k) time.
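The two steps above (linear-time partition, then a small sort of only the winners) can be combined into one helper; the name top_k_sorted is illustrative:

```python
import numpy as np

def top_k_sorted(a, k):
    """Indices of the k largest values, in descending order of value.

    O(n + k log k): linear-time partition, then sort only the k winners.
    """
    ind = np.argpartition(a, -k)[-k:]   # k largest, unsorted
    return ind[np.argsort(-a[ind])]     # sort those k, descending

a = np.array([9, 4, 4, 3, 3, 9, 0, 4, 6, 0])
print(a[top_k_sorted(a, 4)])  # -> [9 9 6 4]
```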


回答 2

更简单了:

idx = (-arr).argsort()[:n]

其中,n是最大值的数量。

Simpler yet:

idx = (-arr).argsort()[:n]

where n is the number of maximum values.
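For example, on the array from the question (note that, unlike argpartition, this returns the indices already ordered from largest value to smallest):

```python
import numpy as np

arr = np.array([1, 3, 2, 4, 5])
n = 3
idx = (-arr).argsort()[:n]  # negate so the largest values sort first
print(idx)       # -> [4 3 1]
print(arr[idx])  # -> [5 4 3]
```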


回答 3

采用:

>>> import heapq
>>> import numpy
>>> a = numpy.array([1, 3, 2, 4, 5])
>>> heapq.nlargest(3, range(len(a)), a.take)
[4, 3, 1]

对于常规的Python列表:

>>> a = [1, 3, 2, 4, 5]
>>> heapq.nlargest(3, range(len(a)), a.__getitem__)
[4, 3, 1]

如果您使用Python 2,请使用xrange代替range

来源:heapq —堆队列算法

Use:

>>> import heapq
>>> import numpy
>>> a = numpy.array([1, 3, 2, 4, 5])
>>> heapq.nlargest(3, range(len(a)), a.take)
[4, 3, 1]

For regular Python lists:

>>> a = [1, 3, 2, 4, 5]
>>> heapq.nlargest(3, range(len(a)), a.__getitem__)
[4, 3, 1]

If you use Python 2, use xrange instead of range.

Source: heapq — Heap queue algorithm


回答 4

如果碰巧正在使用多维数组,则需要展平和分解索引:

def largest_indices(ary, n):
    """Returns the n largest indices from a numpy array."""
    flat = ary.flatten()
    indices = np.argpartition(flat, -n)[-n:]
    indices = indices[np.argsort(-flat[indices])]
    return np.unravel_index(indices, ary.shape)

例如:

>>> xs = np.sin(np.arange(9)).reshape((3, 3))
>>> xs
array([[ 0.        ,  0.84147098,  0.90929743],
       [ 0.14112001, -0.7568025 , -0.95892427],
       [-0.2794155 ,  0.6569866 ,  0.98935825]])
>>> largest_indices(xs, 3)
(array([2, 0, 0]), array([2, 2, 1]))
>>> xs[largest_indices(xs, 3)]
array([ 0.98935825,  0.90929743,  0.84147098])

If you happen to be working with a multidimensional array then you’ll need to flatten and unravel the indices:

def largest_indices(ary, n):
    """Returns the n largest indices from a numpy array."""
    flat = ary.flatten()
    indices = np.argpartition(flat, -n)[-n:]
    indices = indices[np.argsort(-flat[indices])]
    return np.unravel_index(indices, ary.shape)

For example:

>>> xs = np.sin(np.arange(9)).reshape((3, 3))
>>> xs
array([[ 0.        ,  0.84147098,  0.90929743],
       [ 0.14112001, -0.7568025 , -0.95892427],
       [-0.2794155 ,  0.6569866 ,  0.98935825]])
>>> largest_indices(xs, 3)
(array([2, 0, 0]), array([2, 2, 1]))
>>> xs[largest_indices(xs, 3)]
array([ 0.98935825,  0.90929743,  0.84147098])

回答 5

如果您不在乎前 K 个最大元素之间的顺序,可以使用 argpartition,它的性能应当优于通过 argsort 进行的完整排序。

K = 4 # We want the indices of the four largest values
a = np.array([0, 8, 0, 4, 5, 8, 8, 0, 4, 2])
np.argpartition(a,-K)[-K:]
array([4, 1, 5, 6])

出处见这个问题。

我进行了一些测试,随着数组大小和 K 值的增加,argpartition 的表现似乎越来越胜过 argsort。

If you don’t care about the order of the K-th largest elements you can use argpartition, which should perform better than a full sort through argsort.

K = 4 # We want the indices of the four largest values
a = np.array([0, 8, 0, 4, 5, 8, 8, 0, 4, 2])
np.argpartition(a,-K)[-K:]
array([4, 1, 5, 6])

Credits go to this question.

I ran a few tests and it looks like argpartition outperforms argsort as the size of the array and the value of K increase.


回答 6

对于多维数组,可以使用axis关键字以沿期望的轴应用分区。

# For a 2D array
indices = np.argpartition(arr, -N, axis=1)[:, -N:]

对于抓取物品:

x = arr.shape[0]
arr[np.repeat(np.arange(x), N), indices.ravel()].reshape(x, N)

但是请注意,这不会返回排序结果。在这种情况下,您可以np.argsort()沿预期的轴使用:

indices = np.argsort(arr, axis=1)[:, -N:]

# Result
x = arr.shape[0]
arr[np.repeat(np.arange(x), N), indices.ravel()].reshape(x, N)

这是一个例子:

In [42]: a = np.random.randint(0, 20, (10, 10))

In [44]: a
Out[44]:
array([[ 7, 11, 12,  0,  2,  3,  4, 10,  6, 10],
       [16, 16,  4,  3, 18,  5, 10,  4, 14,  9],
       [ 2,  9, 15, 12, 18,  3, 13, 11,  5, 10],
       [14,  0,  9, 11,  1,  4,  9, 19, 18, 12],
       [ 0, 10,  5, 15,  9, 18,  5,  2, 16, 19],
       [14, 19,  3, 11, 13, 11, 13, 11,  1, 14],
       [ 7, 15, 18,  6,  5, 13,  1,  7,  9, 19],
       [11, 17, 11, 16, 14,  3, 16,  1, 12, 19],
       [ 2,  4, 14,  8,  6,  9, 14,  9,  1,  5],
       [ 1, 10, 15,  0,  1,  9, 18,  2,  2, 12]])

In [45]: np.argpartition(a, np.argmin(a, axis=0))[:, 1:] # 1 is because the first item is the minimum one.
Out[45]:
array([[4, 5, 6, 8, 0, 7, 9, 1, 2],
       [2, 7, 5, 9, 6, 8, 1, 0, 4],
       [5, 8, 1, 9, 7, 3, 6, 2, 4],
       [4, 5, 2, 6, 3, 9, 0, 8, 7],
       [7, 2, 6, 4, 1, 3, 8, 5, 9],
       [2, 3, 5, 7, 6, 4, 0, 9, 1],
       [4, 3, 0, 7, 8, 5, 1, 2, 9],
       [5, 2, 0, 8, 4, 6, 3, 1, 9],
       [0, 1, 9, 4, 3, 7, 5, 2, 6],
       [0, 4, 7, 8, 5, 1, 9, 2, 6]])

In [46]: np.argpartition(a, np.argmin(a, axis=0))[:, -3:]
Out[46]:
array([[9, 1, 2],
       [1, 0, 4],
       [6, 2, 4],
       [0, 8, 7],
       [8, 5, 9],
       [0, 9, 1],
       [1, 2, 9],
       [3, 1, 9],
       [5, 2, 6],
       [9, 2, 6]])

In [89]: a[np.repeat(np.arange(x), 3), ind.ravel()].reshape(x, 3)
Out[89]:
array([[10, 11, 12],
       [16, 16, 18],
       [13, 15, 18],
       [14, 18, 19],
       [16, 18, 19],
       [14, 14, 19],
       [15, 18, 19],
       [16, 17, 19],
       [ 9, 14, 14],
       [12, 15, 18]])

For multidimensional arrays you can use the axis keyword in order to apply the partitioning along the expected axis.

# For a 2D array
indices = np.argpartition(arr, -N, axis=1)[:, -N:]

And for grabbing the items:

x = arr.shape[0]
arr[np.repeat(np.arange(x), N), indices.ravel()].reshape(x, N)

But note that this won’t return a sorted result. In that case you can use np.argsort() along the intended axis:

indices = np.argsort(arr, axis=1)[:, -N:]

# Result
x = arr.shape[0]
arr[np.repeat(np.arange(x), N), indices.ravel()].reshape(x, N)

Here is an example:

In [42]: a = np.random.randint(0, 20, (10, 10))

In [44]: a
Out[44]:
array([[ 7, 11, 12,  0,  2,  3,  4, 10,  6, 10],
       [16, 16,  4,  3, 18,  5, 10,  4, 14,  9],
       [ 2,  9, 15, 12, 18,  3, 13, 11,  5, 10],
       [14,  0,  9, 11,  1,  4,  9, 19, 18, 12],
       [ 0, 10,  5, 15,  9, 18,  5,  2, 16, 19],
       [14, 19,  3, 11, 13, 11, 13, 11,  1, 14],
       [ 7, 15, 18,  6,  5, 13,  1,  7,  9, 19],
       [11, 17, 11, 16, 14,  3, 16,  1, 12, 19],
       [ 2,  4, 14,  8,  6,  9, 14,  9,  1,  5],
       [ 1, 10, 15,  0,  1,  9, 18,  2,  2, 12]])

In [45]: np.argpartition(a, np.argmin(a, axis=0))[:, 1:] # 1 is because the first item is the minimum one.
Out[45]:
array([[4, 5, 6, 8, 0, 7, 9, 1, 2],
       [2, 7, 5, 9, 6, 8, 1, 0, 4],
       [5, 8, 1, 9, 7, 3, 6, 2, 4],
       [4, 5, 2, 6, 3, 9, 0, 8, 7],
       [7, 2, 6, 4, 1, 3, 8, 5, 9],
       [2, 3, 5, 7, 6, 4, 0, 9, 1],
       [4, 3, 0, 7, 8, 5, 1, 2, 9],
       [5, 2, 0, 8, 4, 6, 3, 1, 9],
       [0, 1, 9, 4, 3, 7, 5, 2, 6],
       [0, 4, 7, 8, 5, 1, 9, 2, 6]])

In [46]: np.argpartition(a, np.argmin(a, axis=0))[:, -3:]
Out[46]:
array([[9, 1, 2],
       [1, 0, 4],
       [6, 2, 4],
       [0, 8, 7],
       [8, 5, 9],
       [0, 9, 1],
       [1, 2, 9],
       [3, 1, 9],
       [5, 2, 6],
       [9, 2, 6]])

In [89]: a[np.repeat(np.arange(x), 3), ind.ravel()].reshape(x, 3)
Out[89]:
array([[10, 11, 12],
       [16, 16, 18],
       [13, 15, 18],
       [14, 18, 19],
       [16, 18, 19],
       [14, 14, 19],
       [15, 18, 19],
       [16, 17, 19],
       [ 9, 14, 14],
       [12, 15, 18]])

回答 7

这将比完整排序要快,具体取决于原始数组的大小和所选内容的大小:

>>> A = np.random.randint(0,10,10)
>>> A
array([5, 1, 5, 5, 2, 3, 2, 4, 1, 0])
>>> B = np.zeros(3, int)
>>> for i in xrange(3):
...     idx = np.argmax(A)
...     B[i]=idx; A[idx]=0 #something smaller than A.min()
...     
>>> B
array([0, 2, 3])

当然,这会篡改原始数组。如果需要,您可以通过先复制数组、或事后恢复原始值来解决——以您的用例中哪种开销更小为准。

This will be faster than a full sort depending on the size of your original array and the size of your selection:

>>> A = np.random.randint(0,10,10)
>>> A
array([5, 1, 5, 5, 2, 3, 2, 4, 1, 0])
>>> B = np.zeros(3, int)
>>> for i in xrange(3):
...     idx = np.argmax(A)
...     B[i]=idx; A[idx]=0 #something smaller than A.min()
...     
>>> B
array([0, 2, 3])

It, of course, involves tampering with your original array. Which you could fix (if needed) by making a copy or replacing back the original values. …whichever is cheaper for your use case.
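A sketch of the non-destructive variant (making the copy up front, as the answer suggests; the helper name top_n_argmax is mine):

```python
import numpy as np

def top_n_argmax(a, n):
    """Repeated argmax on a copy, leaving the caller's array untouched."""
    work = a.astype(float)      # copy; float so -inf is representable
    out = np.empty(n, dtype=int)
    for i in range(n):
        out[i] = np.argmax(work)
        work[out[i]] = -np.inf  # knock out the current maximum
    return out

a = np.array([5, 1, 5, 5, 2, 3, 2, 4, 1, 0])
print(top_n_argmax(a, 3))  # -> [0 2 3] (ties resolved by first occurrence)
print(a)                   # original is unchanged
```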


回答 8

np.argpartition 方法仅返回 k 个最大值的索引,执行的是部分排序,当数组很大时比 np.argsort(完整排序)更快。但返回的索引并不是按升序/降序排列的。举个例子:

(原答案此处嵌入了一张代码截图,未能保留。)

我们可以看到,如果您要对前k个索引使用严格的升序,np.argpartition则不会返回您想要的结果。

除了在 np.argpartition 之后手动排序之外,我的解决方案是使用 PyTorch 的 torch.topk——一个用于构建神经网络的工具,它提供类似 NumPy 的 API,并同时支持 CPU 和 GPU。它与启用 MKL 的 NumPy 一样快,如果需要大型矩阵/向量计算,还能提供 GPU 加速。

严格的上升/下降前k个索引代码将是:

(原答案此处嵌入了一张代码截图,未能保留。)

请注意,torch.topk 接受 torch 张量,并返回 torch.Tensor 类型的前 k 个值和前 k 个索引。与 np 类似,torch.topk 也接受 axis 参数,便于处理多维数组/张量。

The np.argpartition method only returns the indices of the k largest values, performs a partial sort, and is faster than np.argsort (which performs a full sort) when the array is quite large. But the returned indices are NOT in ascending/descending order. Let’s say with an example:

(The original answer embedded a code screenshot here, which was not preserved.)

We can see that if you want a strict ascending order top k indices, np.argpartition won’t return what you want.

Apart from doing a sort manually after np.argpartition, my solution is to use PyTorch, torch.topk, a tool for neural network construction, providing NumPy-like APIs with both CPU and GPU support. It’s as fast as NumPy with MKL, and offers a GPU boost if you need large matrix/vector calculations.

Strict ascend/descend top k indices code will be:

(The original answer embedded a code screenshot here, which was not preserved.)

Note that torch.topk accepts a torch tensor, and returns both top k values and top k indices in type torch.Tensor. Similar with np, torch.topk also accepts an axis argument so that you can handle multi-dimensional arrays/tensors.
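Since the screenshots in this answer did not survive extraction, here is a NumPy-only sketch of the point being made (the torch.topk call itself is omitted): argpartition's result is unordered, and the manual sort mentioned above fixes it.

```python
import numpy as np

a = np.array([9, 4, 4, 3, 3, 9, 0, 4, 6, 0])
k = 4

part = np.argpartition(a, -k)[-k:]  # indices of the k largest; order NOT guaranteed
print(a[part])                      # the 4 largest values, in arbitrary order

# The manual fix mentioned above: sort those k indices by value (ascending here)
fixed = part[np.argsort(a[part])]
print(a[fixed])                     # -> [4 6 9 9]
```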


回答 9

采用:

from operator import itemgetter
from heapq import nlargest
result = nlargest(N, enumerate(your_list), itemgetter(1))

现在,result列表将包含N个元组(indexvalue),其中value已最大化。

Use:

from operator import itemgetter
from heapq import nlargest
result = nlargest(N, enumerate(your_list), itemgetter(1))

Now the result list would contain N tuples (index, value) where value is maximized.
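A quick runnable example of this approach, using the array from the question:

```python
from heapq import nlargest
from operator import itemgetter

your_list = [1, 3, 2, 4, 5]
result = nlargest(3, enumerate(your_list), itemgetter(1))
print(result)  # -> [(4, 5), (3, 4), (1, 3)]

# Just the indices, if that is all you need:
indices = [i for i, _ in result]
print(indices)  # -> [4, 3, 1]
```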


回答 10

采用:

def max_indices(arr, k):
    '''
    Returns the indices of the k first largest elements of arr
    (in descending order in values)
    '''
    assert k <= arr.size, 'k should be smaller or equal to the array size'
    arr_ = arr.astype(float)  # make a copy of arr
    max_idxs = []
    for _ in range(k):
        max_element = np.max(arr_)
        if np.isinf(max_element):
            break
        else:
            idx = np.where(arr_ == max_element)
        max_idxs.append(idx)
        arr_[idx] = -np.inf
    return max_idxs

它也适用于2D阵列。例如,

In [0]: A = np.array([[ 0.51845014,  0.72528114],
                     [ 0.88421561,  0.18798661],
                     [ 0.89832036,  0.19448609],
                     [ 0.89832036,  0.19448609]])
In [1]: max_indices(A, 8)
Out[1]:
    [(array([2, 3], dtype=int64), array([0, 0], dtype=int64)),
     (array([1], dtype=int64), array([0], dtype=int64)),
     (array([0], dtype=int64), array([1], dtype=int64)),
     (array([0], dtype=int64), array([0], dtype=int64)),
     (array([2, 3], dtype=int64), array([1, 1], dtype=int64)),
     (array([1], dtype=int64), array([1], dtype=int64))]

In [2]: A[max_indices(A, 8)[0]][0]
Out[2]: array([ 0.89832036])

Use:

def max_indices(arr, k):
    '''
    Returns the indices of the k first largest elements of arr
    (in descending order in values)
    '''
    assert k <= arr.size, 'k should be smaller or equal to the array size'
    arr_ = arr.astype(float)  # make a copy of arr
    max_idxs = []
    for _ in range(k):
        max_element = np.max(arr_)
        if np.isinf(max_element):
            break
        else:
            idx = np.where(arr_ == max_element)
        max_idxs.append(idx)
        arr_[idx] = -np.inf
    return max_idxs

It also works with 2D arrays. For example,

In [0]: A = np.array([[ 0.51845014,  0.72528114],
                     [ 0.88421561,  0.18798661],
                     [ 0.89832036,  0.19448609],
                     [ 0.89832036,  0.19448609]])
In [1]: max_indices(A, 8)
Out[1]:
    [(array([2, 3], dtype=int64), array([0, 0], dtype=int64)),
     (array([1], dtype=int64), array([0], dtype=int64)),
     (array([0], dtype=int64), array([1], dtype=int64)),
     (array([0], dtype=int64), array([0], dtype=int64)),
     (array([2, 3], dtype=int64), array([1, 1], dtype=int64)),
     (array([1], dtype=int64), array([1], dtype=int64))]

In [2]: A[max_indices(A, 8)[0]][0]
Out[2]: array([ 0.89832036])

回答 11

如果仅为了获得 N 个最大值就对整个数组完整排序开销太大,bottleneck 提供了部分排序函数。

我对这个模块一无所知。我只是谷歌搜索numpy partial sort

bottleneck has a partial sort function, if the expense of sorting the entire array just to get the N largest values is too great.

I know nothing about this module; I just googled numpy partial sort.


回答 12

以下是查看最大元素及其位置的一种非常简单的方法。这里 axis 指定维度:在二维情况下,axis=0 表示按列取最大值,axis=1 表示按行取最大值;更高维的情况则取决于您的需要。

M = np.random.random((3, 4))
print(M)
print(M.max(axis=1), M.argmax(axis=1))

The following is a very easy way to see the maximum elements and their positions. Here axis selects the dimension: for the 2D case, axis = 0 means the column-wise maximum and axis = 1 means the row-wise maximum. For higher dimensions it is up to you.

M = np.random.random((3, 4))
print(M)
print(M.max(axis=1), M.argmax(axis=1))

回答 13

我发现使用起来最直观np.unique

其思路是:unique 方法会返回输入值对应的索引。然后,根据最大的唯一值和这些索引,就可以重建原始值所在的位置。

multi_max = [1,1,2,2,4,0,0,4]
uniques, idx = np.unique(multi_max, return_inverse=True)
print np.squeeze(np.argwhere(idx == np.argmax(uniques)))
>> [4 7]

I found it most intuitive to use np.unique.

The idea is that the unique method returns the indices of the input values. Then, from the max unique value and the indices, the positions of the original values can be recreated.

multi_max = [1,1,2,2,4,0,0,4]
uniques, idx = np.unique(multi_max, return_inverse=True)
print np.squeeze(np.argwhere(idx == np.argmax(uniques)))
>> [4 7]

回答 14

我认为最省时的方法是手动遍历数组,同时维护一个大小为 k 的最小堆,正如其他人提到的那样。

我还提出了一种蛮力方法:

top_k_index_list = [ ]
for i in range(k):
    top_k_index_list.append(np.argmax(my_array))
    my_array[top_k_index_list[-1]] = -float('inf')

在使用argmax获取其索引之后,将最大元素设置为较大的负值。然后下一次调用argmax将返回第二大元素。您可以记录这些元素的原始值,并根据需要恢复它们。

I think the most time-efficient way is to manually iterate through the array while keeping a k-sized min-heap, as other people have mentioned.

And I also come up with a brute force approach:

top_k_index_list = [ ]
for i in range(k):
    top_k_index_list.append(np.argmax(my_array))
    my_array[top_k_index_list[-1]] = -float('inf')

Set the largest element to a large negative value after you use argmax to get its index. And then the next call of argmax will return the second largest element. And you can log the original value of these elements and recover them if you want.


回答 15

这段代码适用于numpy矩阵数组:

mat = np.array([[1, 3], [2, 5]]) # numpy matrix

n = 2  # n
n_largest_mat = np.sort(mat, axis=None)[-n:] # n_largest 
tf_n_largest = np.zeros((2,2), dtype=bool) # all false matrix
for x in n_largest_mat: 
  tf_n_largest = (tf_n_largest) | (mat == x) # true-false  

n_largest_elems = mat[tf_n_largest] # true-false indexing 

这会产生一个标记 n 个最大元素位置的布尔(真/假)索引矩阵,用它也可以从矩阵数组中提取这 n 个最大的元素。

This code works for a numpy matrix array:

mat = np.array([[1, 3], [2, 5]]) # numpy matrix

n = 2  # n
n_largest_mat = np.sort(mat, axis=None)[-n:] # n_largest 
tf_n_largest = np.zeros((2,2), dtype=bool) # all false matrix
for x in n_largest_mat: 
  tf_n_largest = (tf_n_largest) | (mat == x) # true-false  

n_largest_elems = mat[tf_n_largest] # true-false indexing 

This produces a boolean (true/false) indexing matrix marking the n largest values, which also works to extract those n largest elements from the matrix array.


为什么python在for和while循环之后使用’else’?

问题:为什么python在for和while循环之后使用’else’?

我了解此构造的工作原理:

for i in range(10):
    print(i)

    if i == 9:
        print("Too big - I'm giving up!")
        break;
else:
    print("Completed successfully")

但我不明白为什么这里用 else 作为关键字,因为它暗示这段代码只在 for 块没有完成时才运行——而实际恰恰相反!无论我怎么想,我的大脑都无法从 for 语句顺畅地过渡到 else 块。对我来说,continue 或 continuewith 更合理(我正在试着训练自己这样去读它)。

我想知道Python编码人员是如何在头脑中读取这个结构的(如果愿意,可以大声读出)。也许我缺少使这些代码块更容易理解的东西?

I understand how this construct works:

for i in range(10):
    print(i)

    if i == 9:
        print("Too big - I'm giving up!")
        break;
else:
    print("Completed successfully")

But I don’t understand why else is used as the keyword here, since it suggests the code in question only runs if the for block does not complete, which is the opposite of what it does! No matter how I think about it, my brain can’t progress seamlessly from the for statement to the else block. To me, continue or continuewith would make more sense (and I’m trying to train myself to read it as such).

I’m wondering how Python coders read this construct in their head (or aloud, if you like). Perhaps I’m missing something that would make such code blocks more easily decipherable?


回答 0

即使是经验丰富的Python程序员,这也是一个奇怪的构造。当与for循环结合使用时,它的基本含义是“在可迭代项中找到某个项目,否则,如果找不到任何项目,则执行…”。如:

found_obj = None
for obj in objects:
    if obj.key == search_key:
        found_obj = obj
        break
else:
    print('No object found.')

但是,只要您看到此构造,一个更好的选择就是将搜索封装在一个函数中:

def find_obj(search_key):
    for obj in objects:
        if obj.key == search_key:
            return obj

或使用列表理解:

matching_objs = [o for o in objects if o.key == search_key]
if matching_objs:
    print('Found {}'.format(matching_objs[0]))
else:
    print('No object found.')

它在语义上不等同于其他两个版本,但是在非性能关键代码中效果很好,在这里,您是否迭代整个列表都没有关系。其他人可能会不同意,但是我个人会避免在生产代码中使用for-else或while-else块。

另请参见[Python-思想] for … else线程的摘要

It’s a strange construct even to seasoned Python coders. When used in conjunction with for-loops it basically means “find some item in the iterable, else if none was found do …”. As in:

found_obj = None
for obj in objects:
    if obj.key == search_key:
        found_obj = obj
        break
else:
    print('No object found.')

But anytime you see this construct, a better alternative is to either encapsulate the search in a function:

def find_obj(search_key):
    for obj in objects:
        if obj.key == search_key:
            return obj

Or use a list comprehension:

matching_objs = [o for o in objects if o.key == search_key]
if matching_objs:
    print('Found {}'.format(matching_objs[0]))
else:
    print('No object found.')

It is not semantically equivalent to the other two versions, but works well enough in non-performance-critical code where it doesn’t matter whether you iterate the whole list or not. Others may disagree, but I personally would avoid ever using the for-else or while-else blocks in production code.

See also [Python-ideas] Summary of for…else threads


回答 1

一个常见的构造是运行一个循环,直到找到某些东西,然后打破循环。问题是,如果我跳出循环或循环结束,则需要确定发生哪种情况。一种方法是创建一个标志或存储变量,这将使我进行第二次测试以查看循环是如何退出的。

例如,假设我需要搜索列表并处理每个项目,直到找到标记项目,然后停止处理。如果缺少标志项,则需要引发异常。

使用Python forelse构造

for i in mylist:
    if i == theflag:
        break
    process(i)
else:
    raise ValueError("List argument missing terminal flag.")

将此与不使用此语法糖的方法进行比较:

flagfound = False
for i in mylist:
    if i == theflag:
        flagfound = True
        break
    process(i)

if not flagfound:
    raise ValueError("List argument missing terminal flag.")

在第一种情况下,raise紧密绑定到它所使用的for循环。第二,绑定不那么牢固,并且在维护期间可能会引入错误。

A common construct is to run a loop until something is found and then to break out of the loop. The problem is that if I break out of the loop or the loop ends I need to determine which case happened. One method is to create a flag or store variable that will let me do a second test to see how the loop was exited.

For example assume that I need to search through a list and process each item until a flag item is found and then stop processing. If the flag item is missing then an exception needs to be raised.

Using the Python forelse construct you have

for i in mylist:
    if i == theflag:
        break
    process(i)
else:
    raise ValueError("List argument missing terminal flag.")

Compare this to a method that does not use this syntactic sugar:

flagfound = False
for i in mylist:
    if i == theflag:
        flagfound = True
        break
    process(i)

if not flagfound:
    raise ValueError("List argument missing terminal flag.")

In the first case the raise is bound tightly to the for loop it works with. In the second the binding is not as strong and errors may be introduced during maintenance.


回答 2

Raymond Hettinger的精彩演讲名为“ 将代码转换为美丽的惯用Python”,在其中他简要介绍了该for ... else构造的历史。相关部分是“在循环中区分多个出口点”,从15:50开始,持续大约三分钟。这里是要点:

  • for ... else结构是由Donald Knuth设计的,用于替换某些GOTO用例。
  • 重用该else关键字是有道理的,因为“这是Knuth所使用的,那时人们知道,所有[ for语句]都嵌入了an ifGOTOunder,而他们期望使用else;”。
  • 事后看来,它应该被称为“不间断”(或可能称为“不间断”),这样就不会造成混淆。*

因此,如果问题是“他们为什么不更改此关键字?” 那么Cat Plus Plus可能给出了最准确的答案 –在这一点上,它对现有代码的破坏性太大,无法实用。但是,如果您真正要问的问题是为什么else首先要重用,那么显然在当时看来是个好主意。

就个人而言,我喜欢# no break在线注释的妥协之else处,因为它们一眼就可能被误认为属于循环内。相当清晰简洁。Bjorn在回答结束时链接的摘要中简要提及了该选项:

为了完整起见,我应该提到的是,语法稍有变化,想要这种语法的程序员现在可以使用它:

for item in sequence:
    process(item)
else:  # no break
    suite

*视频那部分的奖励语录:“就像我们调用lambda makefunction一样,没人会问’lambda做什么?’”

There’s an excellent presentation by Raymond Hettinger, titled Transforming Code into Beautiful, Idiomatic Python, in which he briefly addresses the history of the for ... else construct. The relevant section is “Distinguishing multiple exit points in loops” starting at 15:50 and continuing for about three minutes. Here are the high points:

  • The for ... else construct was devised by Donald Knuth as a replacement for certain GOTO use cases;
  • Reusing the else keyword made sense because “it’s what Knuth used, and people knew, at that time, all [for statements] had embedded an if and GOTO underneath, and they expected the else;”
  • In hindsight, it should have been called “no break” (or possibly “nobreak”), and then it wouldn’t be confusing.*

So, if the question is, “Why don’t they change this keyword?” then Cat Plus Plus probably gave the most accurate answer – at this point, it would be too destructive to existing code to be practical. But if the question you’re really asking is why else was reused in the first place, well, apparently it seemed like a good idea at the time.

Personally, I like the compromise of commenting # no break in-line wherever the else could be mistaken, at a glance, as belonging inside the loop. It’s reasonably clear and concise. This option gets a brief mention in the summary that Bjorn linked at the end of his answer:

For completeness, I should mention that with a slight change in syntax, programmers who want this syntax can have it right now:

for item in sequence:
    process(item)
else:  # no break
    suite

* Bonus quote from that part of the video: “Just like if we called lambda makefunction, nobody would ask, ‘What does lambda do?'”


回答 3

因为他们不想在语言中引入新的关键字。每个新关键字都会占用一个标识符并引发向后兼容性问题,因此通常是最后的手段。

Because they didn’t want to introduce a new keyword to the language. Each one steals an identifier and causes backwards compatibility problems, so it’s usually a last resort.


回答 4

简单起见,您可以这样想:

  • 如果breakfor循环中遇到命令,else则不会调用该部件。
  • 如果breakfor循环中未遇到该命令,else则将调用该部件。

换句话说,如果 for 循环的迭代没有被 break“打断”,else 部分就会被调用。

To make it simple, you can think of it like that;

  • If it encounters the break command in the for loop, the else part will not be called.
  • If it does not encounter the break command in the for loop, the else part will be called.

In other words, if for loop iteration is not “broken” with break, the else part will be called.
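These two rules can be checked directly; the function below (an illustrative sketch) reports how its loop ended:

```python
def status(items, target):
    """Classic for...else: the else clause runs only when the loop wasn't broken."""
    for x in items:
        if x == target:
            result = "found"
            break
    else:  # no break occurred
        result = "not found"
    return result

print(status([1, 2, 3], 2))  # -> found      (break hit, else skipped)
print(status([1, 2, 3], 9))  # -> not found  (loop exhausted, else runs)
```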


回答 5

我发现,要“理解” for/else 的作用——更重要的是何时使用它——最简单的方法是关注 break 语句跳转到的位置。for/else 结构是一个整体块。break 跳出这个块,因此也就跳过了 else 子句。如果 else 子句的内容只是简单地跟在 for 子句之后,它就永远不会被跳过,因此必须把等效逻辑放进一个 if 里。这一点之前已有人说过,但措辞不尽相同,所以或许能帮到其他人。试着运行下面的代码片段。为了清晰起见,我全心全意赞成加上“no break”注释。

for a in range(3):
    print(a)
    if a==4: # change value to force break or not
        break
else: #no break  +10 for whoever thought of this decoration
    print('for completed OK')

print('statement after for loop')

The easiest way I found to ‘get’ what the for/else did, and more importantly, when to use it, was to concentrate on where the break statement jumps to. The For/else construct is a single block. The break jumps out of the block, and so jumps ‘over’ the else clause. If the contents of the else clause simply followed the for clause, it would never be jumped over, and so the equivalent logic would have to be provided by putting it in an if. This has been said before, but not quite in these words, so it may help somebody else. Try running the following code fragment. I’m wholeheartedly in favour of the ‘no break’ comment for clarity.

for a in range(3):
    print(a)
    if a==4: # change value to force break or not
        break
else: #no break  +10 for whoever thought of this decoration
    print('for completed OK')

print('statement after for loop')

回答 6

我认为官方文档(break 和 continue 语句,以及循环的 else 子句)对 else 有很好的解释:

[…]当循环通过用尽列表而终止(使用for)或条件变为假(使用while)时执行,但在循环由break语句终止时则不执行。”

来源:Python 2文档:控制流教程

I think the documentation’s section on break and continue statements, and else clauses on loops, has a great explanation of else:

[…] it is executed when the loop terminates through exhaustion of the list (with for) or when the condition becomes false (with while), but not when the loop is terminated by a break statement.”

Source: Python 2 docs: Tutorial on control flow
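The same rule covers while loops, which the quote mentions but none of the surrounding examples show; the function drain below is an illustrative sketch:

```python
def drain(limit, stop_at=None):
    """while...else sketch: else runs when the condition becomes false,
    but not when the loop exits via break."""
    n = 0
    while n < limit:
        n += 1
        if n == stop_at:
            ended = "broken at {}".format(n)
            break
    else:  # condition became false; no break occurred
        ended = "exhausted at {}".format(n)
    return ended

print(drain(3))             # -> exhausted at 3
print(drain(3, stop_at=2))  # -> broken at 2
```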


回答 7

我读到类似:

如果仍满足运行循环的条件,就执行循环体;否则(else),执行另外的操作。

I read it something like:

If still on the conditions to run the loop, do stuff, else do something else.


回答 8

由于已经回答了很多技术方面的问题,因此我的评论仅与产生此回收关键字的混乱有关。

Python 是一种表达力很强(eloquent)的编程语言,因此关键字被误用时也格外显眼。else 关键字本来完美地描述了决策树流程中的一环:“如果做不到这一点,就(else)做那个。”这层含义在我们的自然语言中也是不言自明的。

相反,把这个关键字用在 while 和 for 语句上就造成了混乱。原因在于,我们作为程序员的职业经验告诉我们,else 语句属于决策树:它的逻辑作用域是一个包装器,按条件返回一条要走的路径。而循环语句则有一个明确的目标要达成——经过流程的不断迭代之后,目标得以实现。

if / else 指明一条可走的路径。循环则沿着一条路径前进,直到“目标”完成。

问题在于,else 这个词明确表示条件语句中的最后一个选项。这个词的语义在 Python 和自然语言中是共通的。但在自然语言中,else 从不用来表示某人或某事完成某件事之后将采取的行动;它只会在完成过程中出现问题时使用(更像是 break 语句)。

最终,这个关键字会保留在 Python 中。显然这是一个错误,而当每个程序员都得编出一个故事(像某种助记手段)来理解它的用法时,这一点就更清楚了。如果当初选择的是 then 关键字,我会更喜欢。我相信这个关键字非常契合迭代流程——循环之后的回报。

这就像孩子按照每个步骤组装完玩具之后问的那句:“然后(THEN)呢,爸爸?”

Since the technical part has been pretty much answered, my comment is just in relation with the confusion that produce this recycled keyword.

Being Python a very eloquent programming language, the misuse of a keyword is more notorious. The else keyword perfectly describes part of the flow of a decision tree, “if you can’t do this, (else) do that”. It’s implied in our own language.

Instead, using this keyword with while and for statements creates confusion. The reason, our career as programmers has taught us that the else statement resides within a decision tree; its logical scope, a wrapper that conditionally return a path to follow. Meanwhile, loop statements have a figurative explicit goal to reach something. The goal is met after continuous iterations of a process.

if / else indicate a path to follow. Loops follow a path until the “goal” is completed.

The issue is that else is a word that clearly define the last option in a condition. The semantics of the word are both shared by Python and Human Language. But the else word in Human Language is never used to indicate the actions someone or something will take after something is completed. It will be used if, in the process of completing it, an issue rises (more like a break statement).

At the end, the keyword will remain in Python. It’s clear it was mistake, clearer when every programmer tries to come up with a story to understand its usage like some mnemonic device. I’d have loved if they have chosen instead the keyword then. I believe that this keyword fits perfectly in that iterative flow, the payoff after the loop.

It resembles that situation that some child has after following every step in assembling a toy: And THEN what Dad?


回答 9

我把它读作:“当可迭代对象被完全耗尽、执行即将在 for 结束后继续运行下一条语句时,else 子句会先被执行。”因此,当迭代被 break 中断时,else 不会被执行。

I read it like “When the iterable is exhausted completely, and the execution is about to proceed to the next statement after finishing the for, the else clause will be executed.” Thus, when the iteration is broken by break, this will not be executed.


回答 10

我同意,它更像是“elif not [触发 break 的条件]”。

我知道这是一个老话题,但是我现在正在研究相同的问题,而且我不确定有人以我理解的方式抓住了这个问题的答案。

对我来说,有三种“读取” elsein For... elseWhile... else语句的方法,所有这些方法都是等效的:

  1. else == if the loop completes normally (without a break or error)
  2. else == if the loop does not encounter a break
  3. else == else not (condition raising break) (大概有这种情况,否则您将不会循环)

因此,从本质上讲,循环中的“ else”实际上是一个“ elif …”,其中“ …”是(1)不间断,相当于(2)NOT [引起中断的条件]。

我认为关键是else没有’break’就没有意义,因此a for...else包括:

for:
    do stuff
    conditional break # implied by else
else not break:
    do more stuff

因此,for...else循环的基本元素如下,您将以普通英语阅读它们:

for:
    do stuff
    condition:
        break
else: # read as "else not break" or "else not condition"
    do more stuff

正如其他张贴者所说的那样,当您能够找到循环要查找的内容时,通常会出现中断,因此else:变成“如果未找到目标项目该怎么办”。

您还可以一起使用异常处理,中断和for循环。

for x in range(0,3):
    print("x: {}".format(x))
    if x == 2:
        try:
            raise AssertionError("ASSERTION ERROR: x is {}".format(x))
        except:
            print(AssertionError("ASSERTION ERROR: x is {}".format(x)))
            break
else:
    print("X loop complete without error")

结果

x: 0
x: 1
x: 2
ASSERTION ERROR: x is 2
----------
# loop not completed (hit break), so else didn't run

一个简单的例子,打破休息。

for y in range(0,3):
    print("y: {}".format(y))
    if y == 2: # will be executed
        print("BREAK: y is {}\n----------".format(y))
        break
else: # not executed because break is hit
    print("y_loop completed without break----------\n")

结果

y: 0
y: 1
y: 2
BREAK: y is 2
----------
# loop not completed (hit break), so else didn't run

一个简单的示例,其中没有中断,没有引发中断的条件,也没有遇到错误。

for z in range(0,3):
     print("z: {}".format(z))
     if z == 4: # will not be executed
         print("BREAK: z is {}\n".format(y))
         break
     if z == 4: # will not be executed
         raise AssertionError("ASSERTION ERROR: x is {}".format(x))
else:
     print("z_loop complete without break or error\n----------\n")

结果

z: 0
z: 1
z: 2
z_loop complete without break or error
----------

I agree, it’s more like an ‘elif not [condition(s) raising break]’.

I know this is an old thread, but I am looking into the same question right now, and I’m not sure anyone has captured the answer to this question in the way I understand it.

For me, there are three ways of “reading” the else in For... else or While... else statements, all of which are equivalent, are:

  1. else == if the loop completes normally (without a break or error)
  2. else == if the loop does not encounter a break
  3. else == else not (condition raising break) (presumably there is such a condition, or you wouldn’t have a loop)

So, essentially, the “else” in a loop is really an “elif …” where ‘…’ is (1) no break, which is equivalent to (2) NOT [condition(s) raising break].

I think the key is that the else is pointless without the ‘break’, so a for...else includes:

for:
    do stuff
    conditional break # implied by else
else not break:
    do more stuff

So, essential elements of a for...else loop are as follows, and you would read them in plainer English as:

for:
    do stuff
    condition:
        break
else: # read as "else not break" or "else not condition"
    do more stuff

As the other posters have said, a break is generally raised when you are able to locate what your loop is looking for, so the else: becomes “what to do if target item not located”.
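
That “what to do if the target item is not located” pattern can be sketched as a small search function (a minimal example of my own; the name `find_index` is not from the answer above):

```python
def find_index(items, target):
    """Return the index of target in items, or -1 if it is not found."""
    for i, item in enumerate(items):
        if item == target:
            break        # found it: skip the else clause
    else:
        return -1        # loop finished without break: target not located
    return i

print(find_index(["a", "b", "c"], "b"))  # 1
print(find_index(["a", "b", "c"], "z"))  # -1
```

Note that an empty `items` also takes the `else` branch, since the loop completes (trivially) without a break.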

Example

You can also use exception handling, breaks, and for loops all together.

for x in range(0,3):
    print("x: {}".format(x))
    if x == 2:
        try:
            raise AssertionError("ASSERTION ERROR: x is {}".format(x))
        except:
            print(AssertionError("ASSERTION ERROR: x is {}".format(x)))
            break
else:
    print("X loop complete without error")

Result

x: 0
x: 1
x: 2
ASSERTION ERROR: x is 2
----------
# loop not completed (hit break), so else didn't run

Example

Simple example with a break being hit.

for y in range(0,3):
    print("y: {}".format(y))
    if y == 2: # will be executed
        print("BREAK: y is {}\n----------".format(y))
        break
else: # not executed because break is hit
    print("y_loop completed without break----------\n")

Result

y: 0
y: 1
y: 2
BREAK: y is 2
----------
# loop not completed (hit break), so else didn't run

Example

Simple example where no break, no condition raising a break, and no error are encountered.

for z in range(0,3):
     print("z: {}".format(z))
     if z == 4: # will not be executed
         print("BREAK: z is {}\n".format(z))
         break
     if z == 4: # will not be executed
         raise AssertionError("ASSERTION ERROR: z is {}".format(z))
else:
     print("z_loop complete without break or error\n----------\n")

Result

z: 0
z: 1
z: 2
z_loop complete without break or error
----------

回答 11

else 关键字在这里可能会引起混淆;正如许多人指出的那样,nobreak 或 notbreak 之类的名字会更合适。

为了从逻辑上理解 for...else...,请将它与 try...except...else 比较,而不是与 if...else... 比较。大多数 Python 程序员都熟悉以下代码:

try:
    do_something()
except:
    print("Error happened.") # The try block threw an exception
else:
    print("Everything is fine.") # The try block completed without an exception

同样,可以认为break是一种特殊的Exception

for x in iterable:
    do_something(x)
except break:
    pass # Implied by Python's loop semantics
else:
    print('no break encountered')  # No break statement was encountered

区别在于 Python 隐含了 except break,你无法把它写出来,因此代码变为:

for x in iterable:
    do_something(x)
else:
    print('no break encountered')  # No break statement was encountered

是的,我知道这种比较可能很困难并且很累,但是确实可以澄清这种混淆。

The else keyword can be confusing here, and as many people have pointed out, something like nobreak, notbreak is more appropriate.

In order to understand for ... else ... logically, compare it with try...except...else, not if...else..., most of python programmers are familiar with the following code:

try:
    do_something()
except:
    print("Error happened.") # The try block threw an exception
else:
    print("Everything is fine.") # The try block completed without an exception

Similarly, think of break as a special kind of Exception:

for x in iterable:
    do_something(x)
except break:
    pass # Implied by Python's loop semantics
else:
    print('no break encountered')  # No break statement was encountered

The difference is that Python implies the except break and you cannot write it out, so it becomes:

for x in iterable:
    do_something(x)
else:
    print('no break encountered')  # No break statement was encountered

Yes, I know this comparison can be difficult and tiresome, but it does clarify the confusion.
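
A small runnable sketch of the analogy (my own, not from the answer): both else clauses fire under the same “nothing abnormal happened” condition:

```python
# try/else: else runs because no exception was raised
try:
    value = int("42")
except ValueError:
    result_try = "error"
else:
    result_try = "no exception"

# for/else: else runs because no break was hit
for ch in "42":
    if not ch.isdigit():
        break                 # would act like the "special exception"
else:
    result_for = "no break"

print(result_try, result_for)  # no exception no break
```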


回答 12

当 for 循环没有被 break 中断时,else 语句块中的代码才会执行。

for x in range(1, 5):
    if x == 5:
        print('find 5')
        break
else:
    print('can not find 5!')
# can not find 5!

文档:中断并继续执行语句,否则循环中的子句

循环语句可以包含else子句;当循环通过用尽列表而终止(使用for)或条件变为假(使用while)时,将执行此命令,但当循环由break语句终止时,则不会执行该命令。以下循环示例搜索质数:

>>> for n in range(2, 10):
...     for x in range(2, n):
...         if n % x == 0:
...             print(n, 'equals', x, '*', n//x)
...             break
...     else:
...         # loop fell through without finding a factor
...         print(n, 'is a prime number')
...
2 is a prime number
3 is a prime number
4 equals 2 * 2
5 is a prime number
6 equals 2 * 3
7 is a prime number
8 equals 2 * 4
9 equals 3 * 3

(是的,这是正确的代码。仔细观察:else子句属于for循环,而不是if语句。)

与循环一起使用时,else子句与try语句的else子句比if语句具有更多的共同点:try语句的else子句在没有异常发生时运行,而循环的else子句在没有中断时发生运行。有关try语句和异常的更多信息,请参见处理异常。

也从C借用的continue语句继续循环的下一个迭代:

>>> for num in range(2, 10):
...     if num % 2 == 0:
...         print("Found an even number", num)
...         continue
...     print("Found a number", num)
Found an even number 2
Found a number 3
Found an even number 4
Found a number 5
Found an even number 6
Found a number 7
Found an even number 8
Found a number 9

The code in the else statement block is executed when the for loop was not terminated by a break.

for x in range(1, 5):
    if x == 5:
        print('find 5')
        break
else:
    print('can not find 5!')
# can not find 5!

From the docs: break and continue Statements, and else Clauses on Loops

Loop statements may have an else clause; it is executed when the loop terminates through exhaustion of the list (with for) or when the condition becomes false (with while), but not when the loop is terminated by a break statement. This is exemplified by the following loop, which searches for prime numbers:

>>> for n in range(2, 10):
...     for x in range(2, n):
...         if n % x == 0:
...             print(n, 'equals', x, '*', n//x)
...             break
...     else:
...         # loop fell through without finding a factor
...         print(n, 'is a prime number')
...
2 is a prime number
3 is a prime number
4 equals 2 * 2
5 is a prime number
6 equals 2 * 3
7 is a prime number
8 equals 2 * 4
9 equals 3 * 3

(Yes, this is the correct code. Look closely: the else clause belongs to the for loop, not the if statement.)

When used with a loop, the else clause has more in common with the else clause of a try statement than it does that of if statements: a try statement’s else clause runs when no exception occurs, and a loop’s else clause runs when no break occurs. For more on the try statement and exceptions, see Handling Exceptions.

The continue statement, also borrowed from C, continues with the next iteration of the loop:

>>> for num in range(2, 10):
...     if num % 2 == 0:
...         print("Found an even number", num)
...         continue
...     print("Found a number", num)
Found an even number 2
Found a number 3
Found an even number 4
Found a number 5
Found an even number 6
Found a number 7
Found an even number 8
Found a number 9

回答 13

这是一种我上面没有见过其他人提到的思考方式:

首先,请记住,for循环基本上只是while循环周围的语法糖。例如循环

for item in sequence:
    do_something(item)

可以(近似)重写为

item = None
while sequence.hasnext():
    item = sequence.next()
    do_something(item)

其次,请记住,while循环基本上只是重复的if块!您始终可以将while循环读为“如果满足此条件,则执行主体,然后返回并再次检查”。

因此,while / else完全有道理:它与if / else完全相同,具有附加的循环功能,直到条件变为false为止,而不仅仅是检查条件一次。

然后for / else也很有意义:由于所有for循环只是while循环之上的语法糖,您只需要弄清楚底层while循环的隐式条件是什么,然后else对应于何时条件变为False。

Here’s a way to think about it that I haven’t seen anyone else mention above:

First, remember that for-loops are basically just syntactic sugar around while-loops. For example, the loop

for item in sequence:
    do_something(item)

can be rewritten (approximately) as

item = None
while sequence.hasnext():
    item = sequence.next()
    do_something(item)

Second, remember that while-loops are basically just repeated if-blocks! You can always read a while-loop as “if this condition is true, execute the body, then come back and check again”.

So while/else makes perfect sense: It’s the exact same structure as if/else, with the added functionality of looping until the condition becomes false instead of just checking the condition once.

And then for/else makes perfect sense too: because all for-loops are just syntactic sugar on top of while-loops, you just need to figure out what the underlying while-loop’s implicit conditional is, and then the else corresponds to when that condition becomes False.
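
That implicit condition can be made explicit with Python's actual iterator protocol (a sketch of my own; the answer's `hasnext` method is approximate, real iterables use `iter`/`next` and `StopIteration`):

```python
def for_else_desugared(sequence, predicate):
    """Equivalent of: for item in sequence: if predicate(item): break
    else: ... -- written as the underlying while loop."""
    it = iter(sequence)
    while True:
        try:
            item = next(it)            # the implicit loop condition:
        except StopIteration:          # "there is a next item"
            return "exhausted"         # condition became False -> else branch
        if predicate(item):
            return "broke on {}".format(item)  # break skips the else branch

print(for_else_desugared([1, 2, 3], lambda x: x == 2))  # broke on 2
print(for_else_desugared([1, 3, 5], lambda x: x == 2))  # exhausted
```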


回答 14

很好的答案是:

  • 可以解释历史,并且
  • 这样可以正确引用,以简化您的翻译/理解。

我这里的注释来自 Donald Knuth 曾说过的一点(抱歉找不到出处):存在一种 while-else 与 if-else 无法区分的构造,即(在 Python 中):

x = 2
while x > 3:
    print("foo")
    break
else:
    print("boo")

具有与以下相同的流量(不包括低级别差异):

x = 2
if x > 3:
    print("foo")
else:
    print("boo")

关键是,if-else 可以被视为 while-else 的语法糖,后者在其 if 块的末尾有一个隐含的 break。反过来的理解,即 while 循环是 if 的扩展(只是重复/循环的条件检查),更为常见,因为 if 通常在 while 之前讲授。但这种理解并不准确,因为那将意味着每次条件为假时,while-else 的 else 块都会执行。

为了简化您的理解,请考虑以下方式:

如果不使用 break、return 等,循环只会在条件不再为真时结束,并且在这种情况下 else 块也会执行一次。对于 Python 的 for,你必须用 C 风格的 for 循环(带条件)来思考,或将其转换为 while。

另一个注意事项:

循环内部过早的 break、return 等语句,使条件不可能变为假:执行在条件仍为真时就跳出了循环,永远不会回来再检查它。

Great answers are:

  • this one, which explains the history, and
  • this one, which gives the right citation to ease your translation/understanding.

My note here comes from what Donald Knuth once said (sorry can’t find reference) that there is a construct where while-else is indistinguishable from if-else, namely (in Python):

x = 2
while x > 3:
    print("foo")
    break
else:
    print("boo")

has the same flow (excluding low level differences) as:

x = 2
if x > 3:
    print("foo")
else:
    print("boo")

The point is that if-else can be considered syntactic sugar for while-else, which has an implicit break at the end of its if block. The opposite implication, that the while loop is an extension of if (it’s just a repeated/looped conditional check), is more common, because if is often taught before while. However, that reading isn’t accurate, because it would mean the else block in while-else executed every time the condition was false.

To ease your understanding think of it that way:

Without break, return, etc., the loop ends only when the condition is no longer true, and in that case the else block also executes once. In the case of Python’s for you must think in terms of C-style for loops (with conditions) or translate them to while.

Another note:

A premature break, return, etc. inside the loop makes it impossible for the condition to become false: execution jumps out of the loop while the condition is still true and never comes back to check it again.
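
A sketch of the translation the answer mentions (my own): the C-style `for (i = 0; i < n; i++)` becomes a `while` whose condition going false is exactly what triggers the `else`:

```python
# C-style: for (i = 0; i < n; i++) { if (should_break(i)) break; } ... else ...
n = 3
i = 0                        # init
while i < n:                 # condition, checked before each pass
    if i == 99:              # the break condition; never true here
        break
    i += 1                   # increment
else:                        # condition became false without a break
    outcome = "no break"

print(i, outcome)  # 3 no break
```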


回答 15

你可以把 else 理解为“其余的部分”,即循环中没有完成的那些其他事情。

You could think of the else as covering “the rest of the stuff”, the other stuff that wasn’t done in the loop.


回答 16

for i in range(3):
    print(i)

    if i == 2:
        print("Too big - I'm giving up!")
        break
else:
    print("Completed successfully")

“else”在这里非常简单,只是意味着

1、“如果 for 子句已完成”

for i in range(3):
    print(i)

    if i == 2:
        print("Too big - I'm giving up!")
        break
if "for clause is completed":
    print("Completed successfully")

写“for 子句已完成”这样的长语句太繁琐,所以引入了“else”。

这里的 else 本质上就是一个 if。

2、但是,如果 for 子句根本没有运行呢?

In [331]: for i in range(0):
     ...:     print(i)
     ...: 
     ...:     if i == 9:
     ...:         print("Too big - I'm giving up!")
     ...:         break
     ...: else:
     ...:     print("Completed successfully")
     ...:     
Completed successfully

因此,完整的语义是这样的逻辑组合:

if "for clause is completed" or "not run at all":
     do else stuff

或这样说:

if "for clause is not partially run":
    do else stuff

或者这样:

if "for clause not encounter a break":
    do else stuff
for i in range(3):
    print(i)

    if i == 2:
        print("Too big - I'm giving up!")
        break
else:
    print("Completed successfully")

“else” here is crazily simple; it just means

1. “if the for clause is completed”

for i in range(3):
    print(i)

    if i == 2:
        print("Too big - I'm giving up!")
        break
if "for clause is completed":
    print("Completed successfully")

It’s unwieldy to write out such long statements as “for clause is completed”, so “else” was introduced.

else here is an if in its nature.

2. However, what about when the for clause is not run at all?

In [331]: for i in range(0):
     ...:     print(i)
     ...: 
     ...:     if i == 9:
     ...:         print("Too big - I'm giving up!")
     ...:         break
     ...: else:
     ...:     print("Completed successfully")
     ...:     
Completed successfully

So the complete statement is this logical combination:

if "for clause is completed" or "not run at all":
     do else stuff

or put it this way:

if "for clause is not partially run":
    do else stuff

or this way:

if "for clause not encounter a break":
    do else stuff

回答 17

除了搜索之外,这是另一个惯用的用例。假设你想等待某个条件变为真(例如,等待远程服务器上的某个端口打开),并带有超时。这时你可以像这样利用 while...else 结构:

import socket
import time

sock = socket.socket()
timeout = time.time() + 15
while time.time() < timeout:
    if sock.connect_ex(('127.0.0.1', 80)) == 0:
        print('Port is open now!')
        break
    print('Still waiting...')
else:
    raise TimeoutError()

Here’s another idiomatic use case besides searching. Let’s say you wanted to wait for a condition to be true, e.g. a port to be open on a remote server, along with some timeout. Then you could utilize a while...else construct like so:

import socket
import time

sock = socket.socket()
timeout = time.time() + 15
while time.time() < timeout:
    if sock.connect_ex(('127.0.0.1', 80)) == 0:
        print('Port is open now!')
        break
    print('Still waiting...')
else:
    raise TimeoutError()
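
The same shape also covers bounded retries (a sketch of my own; the names `fetch_with_retries` and `try_once` are illustrative, not from the answer):

```python
def fetch_with_retries(attempts, try_once):
    """Call try_once up to `attempts` times; raise if every attempt fails."""
    for attempt in range(attempts):
        result = try_once(attempt)
        if result is not None:
            break                    # success: skip the else clause
    else:
        raise TimeoutError("all attempts failed")
    return result

# succeeds on the third attempt (attempt index 2)
print(fetch_with_retries(5, lambda a: "ok" if a == 2 else None))  # ok
```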

回答 18

我只是想自己重新理解它。我发现以下帮助!

• 将 else 视为与循环内部的 if 配对(而不是与 for 配对):如果满足条件就跳出循环,否则(else)执行这个,只不过这一个 else 是和多个 if 配对的!
• 如果所有 if 都不满足,就执行 else。
• 这多个 if 实际上也可以被看作 if-elif!

I was just trying to make sense of it again myself. I found that the following helps!

• Think of the else as being paired with the if inside the loop (instead of with the for) – if condition is met then break the loop, else do this – except it’s one else paired with multiple ifs!
• If no ifs were satisfied at all, then do the else.
• The multiple ifs can also actually be thought of as if-elifs!
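
That reading can be sketched as a small example (my own, under the pairing above: one else shared by the if on every iteration):

```python
def first_even(numbers):
    """Return the first even number, or None if no "if" ever fired."""
    for n in numbers:
        if n % 2 == 0:   # the "if" that the loop's else is paired with
            found = n
            break
    else:                # none of the ifs were satisfied on any iteration
        found = None
    return found

print(first_even([1, 3, 4, 7]))  # 4
print(first_even([1, 3, 5, 7]))  # None
```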


回答 19

我认为这个结构可以看作 for(if) A else B,而 for(if)-else 大致上是一种特殊的 if-else,这可能有助于理解 else。

A 和 B 最多执行一次,这与 if-else 结构相同。

for(if) 可以看作一种特殊的 if,它会循环执行以尝试满足 if 条件。一旦满足 if 条件,就执行 A 并 break;否则执行 B。

I consider the structure as for(if) A else B, where for(if)-else is, roughly, a special kind of if-else. It may help in understanding else.

A and B is executed at most once, which is the same as if-else structure.

for(if) can be considered as a special if, which does a loop to try to meet the if condition. Once the if condition is met, do A and break; else, do B.


回答 20

Python 允许在 for 和 while 循环之后使用 else;当循环没有被 break 中断而结束时,else 子句就会执行。例如:

test = 3
while test == 4:
     print("Hello")
else:
     print("Hi")

输出将是一次 “Hi”:循环条件一开始就为假,循环体从未执行,但 else 子句仍然会运行一次。

Python allows an else after for and while loops; it runs when the loop finishes without hitting a break. For example:

test = 3
while test == 4:
     print("Hello")
else:
     print("Hi")

The output would be ‘Hi’ printed once: the loop condition is false from the start, so the body never runs, but the else clause still executes.