Tag Archives: Python

How does the class_weight parameter in scikit-learn work?

Question: How does the class_weight parameter in scikit-learn work?

I am having a lot of trouble understanding how the class_weight parameter in scikit-learn’s Logistic Regression operates.

The Situation

I want to use logistic regression to do binary classification on a very unbalanced data set. The classes are labelled 0 (negative) and 1 (positive) and the observed data is in a ratio of about 19:1 with the majority of samples having negative outcome.

First Attempt: Manually Preparing Training Data

I split the data I had into disjoint sets for training and testing (about 80/20). Then I randomly sampled the training data by hand to get training data in different proportions than 19:1; from 2:1 -> 16:1.

I then trained logistic regression on these different training data subsets and plotted recall (= TP/(TP+FN)) as a function of the different training proportions. Of course, the recall was computed on the disjoint TEST samples which had the observed proportions of 19:1. Note, although I trained the different models on different training data, I computed recall for all of them on the same (disjoint) test data.

The results were as expected: the recall was about 60% at 2:1 training proportions and fell off rather fast by the time it got to 16:1. There were several proportions 2:1 -> 6:1 where the recall was decently above 5%.

Second Attempt: Grid Search

Next, I wanted to test different regularization parameters and so I used GridSearchCV and made a grid of several values of the C parameter as well as the class_weight parameter. To translate my n:m proportions of negative:positive training samples into the dictionary language of class_weight, I thought that I would just specify several dictionaries as follows:

{ 0:0.67, 1:0.33 } #expected 2:1
{ 0:0.75, 1:0.25 } #expected 3:1
{ 0:0.8, 1:0.2 }   #expected 4:1

and I also included None and auto.

This time the results were totally whacked. All my recalls came out tiny (< 0.05) for every value of class_weight except auto. So I can only assume that my understanding of how to set the class_weight dictionary is wrong. Interestingly, the recall for the class_weight value of ‘auto’ in the grid search was around 59% for all values of C, so I guessed it balances to 1:1?

My Questions

  1. How do you properly use class_weight to achieve different balances in training data from what you actually give it? Specifically, what dictionary do I pass to class_weight to use n:m proportions of negative:positive training samples?

  2. If you pass various class_weight dictionaries to GridSearchCV, during cross-validation will it rebalance the training fold data according to the dictionary but use the true given sample proportions for computing my scoring function on the test fold? This is critical since any metric is only useful to me if it comes from data in the observed proportions.

  3. What does the auto value of class_weight do as far as proportions? I read the documentation and I assume “balances the data inversely proportional to their frequency” just means it makes it 1:1. Is this correct? If not, can someone clarify?


Answer 0

First off, it might not be good to just go by recall alone. You can simply achieve a recall of 100% by classifying everything as the positive class. I usually suggest using AUC for selecting parameters, and then finding a threshold for the operating point (say a given precision level) that you are interested in.

For how class_weight works: It penalizes mistakes in samples of class[i] with class_weight[i] instead of 1. So higher class-weight means you want to put more emphasis on a class. From what you say it seems class 0 is 19 times more frequent than class 1. So you should increase the class_weight of class 1 relative to class 0, say {0:.1, 1:.9}. If the class_weight doesn’t sum to 1, it will basically change the regularization parameter.

For how class_weight="auto" works, you can have a look at this discussion. In the dev version you can use class_weight="balanced", which is easier to understand: it basically means replicating the smaller class until you have as many samples as in the larger one, but in an implicit way.


Answer 1

The first answer is good for understanding how it works. But I wanted to understand how I should be using it in practice.

SUMMARY

  • for moderately imbalanced data WITHOUT noise, there is not much of a difference in applying class weights
  • for moderately imbalanced data WITH noise, and for strongly imbalanced data, it is better to apply class weights
  • the class_weight="balanced" param works decently well if you don't want to optimize the weights manually
  • with class_weight="balanced" you capture more true events (higher TRUE recall), but you are also more likely to get false alerts (lower TRUE precision)
    • as a result, the total % TRUE might be higher than actual because of all the false positives
    • AUC might misguide you here if the false alarms are an issue
  • there is no need to change the decision threshold to the imbalance %; even for strong imbalance it is ok to keep 0.5 (or somewhere around that, depending on what you need)

NB

The result might differ when using RF or GBM. sklearn does not have class_weight="balanced" for GBM but lightgbm has LGBMClassifier(is_unbalance=False).

CODE

# scikit-learn==0.21.3
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, classification_report
import numpy as np
import pandas as pd

# case: moderate imbalance
X, y = datasets.make_classification(n_samples=50*15, n_features=5, n_informative=2, n_redundant=0, random_state=1, weights=[0.8]) #,flip_y=0.1,class_sep=0.5)
np.mean(y) # 0.2

LogisticRegression(C=1e9).fit(X,y).predict(X).mean() # 0.184
(LogisticRegression(C=1e9).fit(X,y).predict_proba(X)[:,1]>0.5).mean() # 0.184 => same as first
LogisticRegression(C=1e9,class_weight={0:0.5,1:0.5}).fit(X,y).predict(X).mean() # 0.184 => same as first
LogisticRegression(C=1e9,class_weight={0:2,1:8}).fit(X,y).predict(X).mean() # 0.296 => seems to make things worse?
LogisticRegression(C=1e9,class_weight="balanced").fit(X,y).predict(X).mean() # 0.292 => seems to make things worse?

roc_auc_score(y,LogisticRegression(C=1e9).fit(X,y).predict(X)) # 0.83
roc_auc_score(y,LogisticRegression(C=1e9,class_weight={0:2,1:8}).fit(X,y).predict(X)) # 0.86 => about the same
roc_auc_score(y,LogisticRegression(C=1e9,class_weight="balanced").fit(X,y).predict(X)) # 0.86 => about the same

# case: strong imbalance
X, y = datasets.make_classification(n_samples=50*15, n_features=5, n_informative=2, n_redundant=0, random_state=1, weights=[0.95])
np.mean(y) # 0.06

LogisticRegression(C=1e9).fit(X,y).predict(X).mean() # 0.02
(LogisticRegression(C=1e9).fit(X,y).predict_proba(X)[:,1]>0.5).mean() # 0.02 => same as first
LogisticRegression(C=1e9,class_weight={0:0.5,1:0.5}).fit(X,y).predict(X).mean() # 0.02 => same as first
LogisticRegression(C=1e9,class_weight={0:1,1:20}).fit(X,y).predict(X).mean() # 0.25 => huh??
LogisticRegression(C=1e9,class_weight="balanced").fit(X,y).predict(X).mean() # 0.22 => huh??
(LogisticRegression(C=1e9,class_weight="balanced").fit(X,y).predict_proba(X)[:,1]>0.5).mean() # same as last

roc_auc_score(y,LogisticRegression(C=1e9).fit(X,y).predict(X)) # 0.64
roc_auc_score(y,LogisticRegression(C=1e9,class_weight={0:1,1:20}).fit(X,y).predict(X)) # 0.84 => much better
roc_auc_score(y,LogisticRegression(C=1e9,class_weight="balanced").fit(X,y).predict(X)) # 0.85 => similar to manual
roc_auc_score(y,(LogisticRegression(C=1e9,class_weight="balanced").fit(X,y).predict_proba(X)[:,1]>0.5).astype(int)) # same as last

print(classification_report(y,LogisticRegression(C=1e9).fit(X,y).predict(X)))
pd.crosstab(y,LogisticRegression(C=1e9).fit(X,y).predict(X),margins=True)
pd.crosstab(y,LogisticRegression(C=1e9).fit(X,y).predict(X),margins=True,normalize='index') # few predicted TRUE with only 28% TRUE recall and 86% TRUE precision so 6%*28%~=2%

print(classification_report(y,LogisticRegression(C=1e9,class_weight="balanced").fit(X,y).predict(X)))
pd.crosstab(y,LogisticRegression(C=1e9,class_weight="balanced").fit(X,y).predict(X),margins=True)
pd.crosstab(y,LogisticRegression(C=1e9,class_weight="balanced").fit(X,y).predict(X),margins=True,normalize='index') # 88% TRUE recall but also lot of false positives with only 23% TRUE precision, making total predicted % TRUE > actual % TRUE

How do I print Unicode characters in Python?

Question: How do I print Unicode characters in Python?

I want to make a dictionary where English words point to Russian and French translations.

How do I print out unicode characters in Python? Also, how do you store unicode chars in a variable?


Answer 0

To include Unicode characters in your Python source code, you can use Unicode escape characters in the form \u0123 in your string. In Python 2.x, you also need to prefix the string literal with ‘u’.

Here’s an example running in the Python 2.x interactive console:

>>> print u'\u0420\u043e\u0441\u0441\u0438\u044f'
Россия

In Python 2, prefixing a string literal with ‘u’ declares it as a Unicode string, as described in the Python Unicode documentation.

In Python 3, the ‘u’ prefix is now optional:

>>> print('\u0420\u043e\u0441\u0441\u0438\u044f')
Россия

If running the above commands doesn’t display the text correctly for you, perhaps your terminal isn’t capable of displaying Unicode characters.

These examples use Unicode escapes (\u...), which allows you to print Unicode characters while keeping your source code as plain ASCII. This can help when working with the same source code on different systems. You can also use Unicode characters directly in your Python source code (e.g. print u'Россия' in Python 2), if you are confident all your systems handle Unicode files properly.

For information about reading Unicode data from a file, see this answer:

Character reading from file in Python
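
As a quick aside, here is a minimal sketch of writing and reading Unicode text with an explicit encoding via io.open, which behaves the same on Python 2 and 3 (the file name is made up for illustration):

# Minimal sketch: writing and reading Unicode text with an explicit encoding.
# io.open works on both Python 2 and 3 (on Python 3 it is simply open).
import io

with io.open('russia.txt', 'w', encoding='utf-8') as f:   # hypothetical file name
    f.write(u'\u0420\u043e\u0441\u0441\u0438\u044f\n')

with io.open('russia.txt', 'r', encoding='utf-8') as f:
    print(f.read())   # Россия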


Answer 1

Print a unicode character in Python:

Print a unicode character directly from python interpreter:

el@apollo:~$ python
Python 2.7.3
>>> print u'\u2713'
✓

Unicode character u'\u2713' is a checkmark. The interpreter prints the checkmark on the screen.

Print a unicode character from a python script:

Put this in test.py:

#!/usr/bin/python
print("here is your checkmark: " + u'\u2713');

Run it like this:

el@apollo:~$ python test.py
here is your checkmark: ✓

If it doesn’t show a checkmark for you, then the problem could be elsewhere, like the terminal settings or something you are doing with stream redirection.

Store unicode characters in a file:

Save this to file: foo.py:

#!/usr/bin/python -tt
# -*- coding: utf-8 -*-
import codecs
import sys 
UTF8Writer = codecs.getwriter('utf8')
sys.stdout = UTF8Writer(sys.stdout)
print(u'e with obfuscation: é')

Run it and pipe output to file:

python foo.py > tmp.txt

Open tmp.txt and look inside, you see this:

el@apollo:~$ cat tmp.txt 
e with obfuscation: é

Thus you have saved a unicode e with an obfuscation mark on it to a file.


Answer 2

If you’re trying to print() Unicode, and getting ascii codec errors, check out this page, the TLDR of which is do export PYTHONIOENCODING=UTF-8 before firing up python (this variable controls what sequence of bytes the console tries to encode your string data as). Internally, Python3 uses UTF-8 by default (see the Unicode HOWTO) so that’s not the problem; you can just put Unicode in strings, as seen in the other answers and comments. It’s when you try and get this data out to your console that the problem happens. Python thinks your console can only handle ascii. Some of the other answers say, “Write it to a file, first” but note they specify the encoding (UTF-8) for doing so (so, Python doesn’t change anything in writing), and then use a method for reading the file that just spits out the bytes without any regard for encoding, which is why that works.
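
As a quick way to see what this is talking about, a small sketch (the exact codec name depends on your terminal):

# Sketch: check which encoding Python will use for console output.
# If this prints an ASCII-only codec (e.g. 'ANSI_X3.4-1968'), printing non-ASCII
# text will fail; `export PYTHONIOENCODING=UTF-8` before starting Python fixes it.
import sys

print(sys.stdout.encoding)
print(u'\u0420\u043e\u0441\u0441\u0438\u044f')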


Answer 3

In Python 2, you declare unicode strings with a u, as in u"猫" and use decode() and encode() to translate to and from unicode, respectively.

It’s quite a bit easier in Python 3. A very good overview can be found here. That presentation clarified a lot of things for me.
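
For example, a minimal Python 2 sketch of converting back and forth (using an escape so no source-encoding declaration is needed):

# Python 2 sketch: converting between a unicode string and UTF-8 bytes.
s = u"\u732b"                       # the character 猫 as a unicode string
utf8_bytes = s.encode("utf-8")      # unicode -> byte string (str in Python 2)
back = utf8_bytes.decode("utf-8")   # byte string -> unicode
print(back == s)                    # True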


Answer 4

Considering that this is the first stack overflow result when google searching this topic, it bears mentioning that prefixing u to unicode strings is optional in Python 3. (Python 2 example was copied from the top answer)

Python 3 (both work):

print('\u0420\u043e\u0441\u0441\u0438\u044f')
print(u'\u0420\u043e\u0441\u0441\u0438\u044f')

Python 2:

print u'\u0420\u043e\u0441\u0441\u0438\u044f'

Answer 5

I use Portable WinPython on Windows; it includes the IPython QT console, and I could achieve the following.

>>>print ("結婚")
結婚

>>>print ("おはよう")
おはよう

>>>str = "結婚"


>>>print (str)
結婚

Your console interpreter should support unicode in order to show unicode characters.


Answer 6

Just one more thing that hasn’t been added yet.

In Python 2, if you want to print a variable that has unicode and use .format(), then do this (make the base string that is being formatted a unicode string with u''):

>>> text = u"Université de Montréal"
>>> print(u"This is unicode: {}".format(text))
This is unicode: Université de Montréal

Answer 7

This fixes UTF-8 printing in python:

import codecs
import sys

UTF8Writer = codecs.getwriter('utf8')
sys.stdout = UTF8Writer(sys.stdout)

Answer 8

Replace the ‘+’ with ‘000’. For example, ‘U+1F600’ becomes ‘U0001F600’; then prepend the Unicode code with “\” and print it. Example:

>>> print("Learning : ", "\U0001F40D")
Learning :  🐍
>>> 

Check this, maybe it will help: python unicode emoji


python print end=' '

Question: python print end=' '

I have this python script where I need to run gdal_retile.py, but I get an exception on this line:

if Verbose:
   print("Building internam Index for %d tile(s) ..." % len(inputTiles), end=' ')

The end=' ' is invalid syntax. I am curious as to why, and what the author probably meant to do.

I’m new to python if you haven’t already guessed.


I think the root cause of the problem is that these imports are failing, and therefore one must add this import: from __future__ import print_function

try: 
   from osgeo import gdal
   from osgeo import ogr
   from osgeo import osr
   from osgeo.gdalconst import *
except:
   import gdal
   import ogr
   import osr
   from gdalconst import *

Answer 0

Are you sure you are using Python 3.x? The syntax isn’t available in Python 2.x because print is still a statement.

print("foo" % bar, end=" ")

in Python 2.x is identical to

print ("foo" % bar, end=" ")

or

print "foo" % bar, end=" "

i.e. as a call to print with a tuple as argument.

That’s obviously bad syntax (literals don’t take keyword arguments). In Python 3.x print is an actual function, so it takes keyword arguments, too.

The correct idiom in Python 2.x for end=" " is:

print "foo" % bar,

(note the final comma, this makes it end the line with a space rather than a linebreak)

If you want more control over the output, consider using sys.stdout directly. This won’t do any special magic with the output.

Of course, in somewhat recent versions of Python 2.x (2.6 and later), you can use the __future__ module to enable it in your script file:

from __future__ import print_function

The same goes with unicode_literals and some other nice things (with_statement, for example). This won’t work in really old versions (i.e. created before the feature was introduced) of Python 2.x, though.
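
For completeness, here is a small sketch of the sys.stdout route mentioned above, which behaves identically on Python 2 and 3:

# Sketch: writing without a trailing newline; identical on Python 2 and 3.
import sys

sys.stdout.write("Building index for %d tile(s) ... " % 3)
sys.stdout.write("done\n")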


Answer 1

How about this:

#Only for use in Python 2.6.0a2 and later
from __future__ import print_function

This allows you to use the Python 3.0 style print function without having to hand-edit all occurrences of print :)


Answer 2

In python 2.7 here is how you do it

mantra = 'Always look on the bright side of life'
for c in mantra: print c,

#output
A l w a y s   l o o k   o n   t h e   b r i g h t   s i d e   o f   l i f e

In python 3.x

myjob= 'hacker'
for c in myjob: print (c, end=' ')
#output 
h a c k e r 

Answer 3

First of all, you’re missing a quote at the beginning but this is probably a copy/paste error.

In Python 3.x, the end=' ' part will place a space after the displayed string instead of a newline. To do the same thing in Python 2.x, you’d put a comma at the end:

print "Building internam Index for %d tile(s) ..." % len(inputTiles),

Answer 4

I think he’s using Python 3.0 and you’re using Python 2.6.


Answer 5

This is just a version thing. Since Python 3.x, print is actually a function, so it now takes keyword arguments like any normal function.

The end=' ' just says that you want a space after the printed text instead of a newline character. In Python 2.x you would have to do this by placing a comma at the end of the print statement.

For example, when in a Python 3.x environment:

i = 0
while i<5:
    print(i)
    i=i+1

Will give the following output:

0
1
2
3
4

Whereas:

i = 0
while i<5:
    print(i, end = ' ')
    i=i+1

Will give as output:

0 1 2 3 4

Answer 6

It looks like you’re just missing an opening double-quote. Try:

if Verbose:
   print("Building internam Index for %d tile(s) ..." % len(inputTiles), end=' ')

Answer 7

I think the author probably meant:

if Verbose:
   print("Building internam Index for %d tile(s) ..." % len(inputTiles), end=' ')

He’s missing an initial quote after print(.

Note that as of Python 3.0, print is a function as opposed to a statement. If you’re using older versions of Python, the equivalent would be:

print "Building internam Index for %d tile(s) ..." % len(inputTiles)

The end parameter means that the line gets ' ' at the end rather than a newline character. The equivalent in earlier versions of Python is:

print "Building internam Index for %d tile(s) ..." % len(inputTiles),

(thanks Ignacio).


Answer 8

USE :: python3 filename.py

I had such an error; it occurred because I have two versions of python installed on my drive, namely python2.7 and python3. The following was my code:

#!usr/bin/python

f = open('lines.txt')
for line in f.readlines():
        print(line,end ='')

When I ran it with the command python lines.py, I got the following error:

#!usr/bin/python

f = open('lines.txt')
for line in f.readlines():
        print(line,end ='')

When I ran it with the command python3 lines.py, it executed successfully.


Answer 9

For python 2.7 I had the same issue. Just use “from __future__ import print_function” (without the quotes) to resolve it. This ensures that Python 2.6 and later Python 2.x versions can use the Python 3.x print function.


Answer 10

Try this one if you are working with python 2.7:

from __future__ import print_function

Answer 11

Even I was getting that same error today, and I’ve experienced an interesting thing. If you’re using python 3.x and still getting the error, this might be the reason:

You have multiple python versions installed on the same drive, and when you press the F5 button, a python shell window (of version < 3.x) pops up.

I was getting the same error today and noticed that. Trust me, when I executed my code from the proper shell window (of version 3.x), I got satisfactory results.


Answer 12

We need to add an import before using end='', as it is not available in Python 2’s normal runtime:

from __future__ import print_function

It should work perfectly now.


Answer 13

Compatible with both Python 2 & 3:

import sys
sys.stdout.write('mytext')

Compatible with only Python 2

print 'mytext',

Compatible with only Python 3

print('mytext', end='')

while(1) vs. while(True): why is there a difference (in python 2 bytecode)?

Question: while(1) vs. while(True): why is there a difference (in python 2 bytecode)?

Intrigued by this question about infinite loops in perl: while (1) Vs. for (;;) Is there a speed difference?, I decided to run a similar comparison in python. I expected that the compiler would generate the same byte code for while(True): pass and while(1): pass, but this is actually not the case in python2.7.

The following script:

import dis

def while_one():
    while 1:
        pass

def while_true():
    while True:
        pass

print("while 1")
print("----------------------------")
dis.dis(while_one)

print("while True")
print("----------------------------")
dis.dis(while_true)

produces the following results:

while 1
----------------------------
  4           0 SETUP_LOOP               3 (to 6)

  5     >>    3 JUMP_ABSOLUTE            3
        >>    6 LOAD_CONST               0 (None)
              9 RETURN_VALUE        
while True
----------------------------
  8           0 SETUP_LOOP              12 (to 15)
        >>    3 LOAD_GLOBAL              0 (True)
              6 JUMP_IF_FALSE            4 (to 13)
              9 POP_TOP             

  9          10 JUMP_ABSOLUTE            3
        >>   13 POP_TOP             
             14 POP_BLOCK           
        >>   15 LOAD_CONST               0 (None)
             18 RETURN_VALUE        

Using while True is noticeably more complicated. Why is this?

In other contexts, python acts as though True equals 1:

>>> True == 1
True

>>> True + True
2

Why does while distinguish the two?

I noticed that python3 does evaluate the statements using identical operations:

while 1
----------------------------
  4           0 SETUP_LOOP               3 (to 6) 

  5     >>    3 JUMP_ABSOLUTE            3 
        >>    6 LOAD_CONST               0 (None) 
              9 RETURN_VALUE         
while True
----------------------------
  8           0 SETUP_LOOP               3 (to 6) 

  9     >>    3 JUMP_ABSOLUTE            3 
        >>    6 LOAD_CONST               0 (None) 
              9 RETURN_VALUE         

Is there a change in python3 to the way booleans are evaluated?


Answer 0

In Python 2.x, True is not a keyword, but just a built-in global constant that is defined to 1 in the bool type. Therefore the interpreter still has to load the contents of True. In other words, True is reassignable:

Python 2.7 (r27:82508, Jul  3 2010, 21:12:11) 
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> True = 4
>>> True
4

In Python 3.x it truly becomes a keyword and a real constant:

Python 3.1.2 (r312:79147, Jul 19 2010, 21:03:37) 
[GCC 4.2.1 (Apple Inc. build 5664)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> True = 4
  File "<stdin>", line 1
SyntaxError: assignment to keyword

thus the interpreter can replace the while True: loop with an infinite loop.


Answer 1

This isn’t quite right,

thus the interpreter can replace the while True: loop with an infinite loop.

as one can still break out of the loop. But it is true that such a loop’s else clause would never be accessed in Python 3. And it is also true that simplifying the value lookup makes it run just as quickly as while 1 in Python 2.

Performance Comparison

Demonstrating the difference in time for a somewhat nontrivial while loop:

Setup

def while1():
    x = 0
    while 1:
        x += 1
        if x == 10:
            break

def whileTrue():
    x = 0
    while True:
        x += 1
        if x == 10:
            break

Python 2

>>> import timeit
>>> min(timeit.repeat(while1))
0.49712109565734863
>>> min(timeit.repeat(whileTrue))
0.756627082824707

Python 3

>>> import timeit
>>> min(timeit.repeat(while1))
0.6462970309949014
>>> min(timeit.repeat(whileTrue))
0.6450748789939098

Explanation

To explain the difference, in Python 2:

>>> import keyword
>>> 'True' in keyword.kwlist
False

but in Python 3:

>>> import keyword
>>> 'True' in keyword.kwlist
True
>>> True = 'true?'
  File "<stdin>", line 1
SyntaxError: can't assign to keyword

Since True is a keyword in Python 3, the interpreter doesn’t have to look up the value to see if someone replaced it with some other value. But since one can assign True to another value, the interpreter has to look it up every time.

Conclusion for Python 2

If you have a tight, long-running loop in Python 2, you probably should use while 1: instead of while True:.

Conclusion for Python 3

Use while True: if you have no condition for breaking out of your loop.


Answer 2

This is a 7-year-old question that already has a great answer, but a misconception in the question, which isn’t addressed in any of the answers, makes it potentially confusing for some of the other questions marked as duplicates.

In other contexts, python acts as though True equals 1:

>>> True == 1
True

>>> True + True
2

Why does while distinguish the two?

In fact, while isn’t doing anything different here at all. It distinguishes 1 and True in exactly the same way that the + example does.


Here’s 2.7:

>>> dis.dis('True == 1')
  1           0 LOAD_GLOBAL              0 (True)
              3 LOAD_CONST               1 (1)
              6 COMPARE_OP               2 (==)
              9 RETURN_VALUE

>>> dis.dis('True + True')
  1           0 LOAD_GLOBAL              0 (True)
              3 LOAD_GLOBAL              0 (True)
              6 BINARY_ADD
              9 RETURN_VALUE

Now compare:

>>> dis.dis('1 + 1')
  1           0 LOAD_CONST               1 (2)
              3 RETURN_VALUE

It’s emitting a LOAD_GLOBAL (True) for each True, and there’s nothing the optimizer can do with a global. So, while distinguishes 1 and True for the exact same reason that + does. (And == doesn’t distinguish them because the optimizer doesn’t optimize out comparisons.)


Now compare 3.6:

>>> dis.dis('True == 1')
  1           0 LOAD_CONST               0 (True)
              2 LOAD_CONST               1 (1)
              4 COMPARE_OP               2 (==)
              6 RETURN_VALUE

>>> dis.dis('True + True')
  1           0 LOAD_CONST               1 (2)
              2 RETURN_VALUE

Here, it’s emitting a LOAD_CONST (True) for the keyword, which the optimizer can take advantage of. So, True + 1 doesn’t distinguish, for exactly the same reason while True doesn’t. (And == still doesn’t distinguish them because the optimizer doesn’t optimize out comparisons.)


Meanwhile, if the code isn’t optimized out, the interpreter ends up treating True and 1 exactly the same in all three of these cases. bool is a subclass of int, and inherits most of its methods from int, and True has an internal integer value of 1. So, whether you’re doing a while test (__bool__ in 3.x, __nonzero__ in 2.x), a comparison (__eq__), or arithmetic (__add__), you’re calling the same method whether you use True or 1.
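
A quick illustration of that last point (the same on Python 2 and 3):

# bool is just a subclass of int, and True carries the integer value 1.
print(issubclass(bool, int))   # True
print(isinstance(True, int))   # True
print(True == 1, True + 1)     # True 2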


Create a Django model or update it if it exists

Question: Create a Django model or update it if it exists

I want to create a model object, like Person, if the person’s id doesn’t exist, or else I will get that person object.

The code to create a new person is as follows:

from django.db import models

class PersonManager(models.Manager):
    def create_person(self, identifier):
        person = self.create(identifier=identifier)
        return person

class Person(models.Model):
    identifier = models.CharField(max_length=10)
    name = models.CharField(max_length=20)
    objects = PersonManager()

But I don’t know where to check and get the existing person object.


Answer 0

If you’re looking for the “update if exists, else create” use case, please refer to @Zags’ excellent answer.


Django already has a get_or_create, https://docs.djangoproject.com/en/dev/ref/models/querysets/#get-or-create

For you it could be :

id = 'some identifier'
person, created = Person.objects.get_or_create(identifier=id)

if created:
    # means you have created a new person
    pass
else:
    # person just refers to the existing one
    pass

Answer 1

It’s unclear whether your question is asking for the get_or_create method (available from at least Django 1.3) or the update_or_create method (new in Django 1.7). It depends on how you want to update the user object.

Sample use is as follows:

# In both cases, the call will get a person object with matching
# identifier or create one if none exists; if a person is created,
# it will be created with name equal to the value in `name`.

# In this case, if the Person already exists, its existing name is preserved
person, created = Person.objects.get_or_create(
        identifier=identifier, defaults={"name": name}
)

# In this case, if the Person already exists, its name is updated
person, created = Person.objects.update_or_create(
        identifier=identifier, defaults={"name": name}
)

Answer 2

Django has support for this, check get_or_create

person, created = Person.objects.get_or_create(name='abc')
if created:
    # A new person object was created
    pass
else:
    # person object already exists
    pass

Answer 3

For only a small number of objects update_or_create works well, but if you’re doing this over a large collection it won’t scale well. update_or_create always first runs a SELECT and thereafter an UPDATE.

for the_bar in bars:
    updated_rows = SomeModel.objects.filter(bar=the_bar).update(foo=100)
    if not updated_rows:
        # if not exists, create new
        SomeModel.objects.create(bar=the_bar, foo=100)

This will at best only run the first update query, and only run another INSERT query if it matched zero rows, which will greatly improve your performance if you expect most of the rows to actually exist.

It all comes down to your use case though. If you are expecting mostly inserts then perhaps the bulk_create() command could be an option.
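
For the insert-heavy case, here is a minimal sketch of bulk_create (the model and field names follow the snippet above):

# Sketch: create many rows in a single query instead of one INSERT per object.
SomeModel.objects.bulk_create(
    [SomeModel(bar=the_bar, foo=100) for the_bar in bars]
)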


Answer 4

Thought I’d add an answer since your question title looks like it is asking how to create or update, rather than get or create as described in the question body.

If you did want to create or update an object, the .save() method already has this behaviour by default, from the docs:

Django abstracts the need to use INSERT or UPDATE SQL statements. Specifically, when you call save(), Django follows this algorithm:

If the object’s primary key attribute is set to a value that evaluates to True (i.e., a value other than None or the empty string), Django executes an UPDATE. If the object’s primary key attribute is not set or if the UPDATE didn’t update anything, Django executes an INSERT.

It’s worth noting that when they say ‘if the UPDATE didn’t update anything’ they are essentially referring to the case where the id you gave the object doesn’t already exist in the database.
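
As a small sketch of what that means in practice (using the Person model from the question; field values are illustrative):

# Sketch: save() picks INSERT or UPDATE based on the primary key.
p = Person(identifier='abc123', name='Alice')   # illustrative values
p.save()            # pk not set yet -> Django performs an INSERT

p.name = 'Alicia'
p.save()            # pk is now set -> Django performs an UPDATE on that row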


Answer 5

If one of the inputs when you create it is a primary key, this will be enough:

Person.objects.get_or_create(id=1)

It will automatically update if it exists, since two rows with the same primary key are not allowed.


Write a dictionary to a txt file and read it back?

Question: Write a dictionary to a txt file and read it back?

I am trying to write a dictionary to a txt file. Then read the dict values by typing the keys with raw_input. I feel like I am just missing one step but I have been looking for a while now.

I get this error

File "name.py", line 24, in reading
    print whip[name]
TypeError: string indices must be integers, not str

My code:

#!/usr/bin/env python
from sys import exit

class Person(object):
    def __init__(self):
        self.name = ""
        self.address = ""
        self.phone = ""
        self.age = ""
        self.whip = {}

    def writing(self):
        self.whip[p.name] = p.age, p.address, p.phone
        target = open('deed.txt', 'a')
        target.write(str(self.whip))
        print self.whip

    def reading(self):
        self.whip = open('deed.txt', 'r').read()
        name = raw_input("> ")
        if name in self.whip:
            print self.whip[name]

p = Person()

while True:
    print "Type:\n\t*read to read data base\n\t*write to write to data base\n\t*exit to exit"
    action = raw_input("\n> ")
    if "write" in action:
        p.name = raw_input("Name?\n> ")
        p.phone = raw_input("Phone Number?\n> ")
        p.age = raw_input("Age?\n> ")
        p.address = raw_input("Address?\n>")
        p.writing()
    elif "read" in action:
        p.reading()
    elif "exit" in action:
        exit(0)

Answer 0

Your code is almost right! You are right, you are just missing one step. When you read in the file, you are reading it as a string; but you want to turn the string back into a dictionary.

The error message you saw was because self.whip was a string, not a dictionary.

I first wrote that you could just feed the string into dict() but that doesn’t work! You need to do something else.

Example

Here is the simplest way: feed the string into eval(). Like so:

def reading(self):
    s = open('deed.txt', 'r').read()
    self.whip = eval(s)

You can do it in one line, but I think it looks messy this way:

def reading(self):
    self.whip = eval(open('deed.txt', 'r').read())

But eval() is sometimes not recommended. The problem is that eval() will evaluate any string, and if someone tricked you into running a really tricky string, something bad might happen. In this case, you are just running eval() on your own file, so it should be okay.

But because eval() is useful, someone made an alternative to it that is safer. This is called literal_eval and you get it from a Python module called ast.

import ast

def reading(self):
    s = open('deed.txt', 'r').read()
    self.whip = ast.literal_eval(s)

ast.literal_eval() will only evaluate strings that turn into the basic Python types, so there is no way that a tricky string can do something bad on your computer.

EDIT

Actually, best practice in Python is to use a with statement to make sure the file gets properly closed. Rewriting the above to use a with statement:

import ast

def reading(self):
    with open('deed.txt', 'r') as f:
        s = f.read()
        self.whip = ast.literal_eval(s)

In the most popular Python, known as “CPython”, you usually don’t need the with statement as the built-in “garbage collection” features will figure out that you are done with the file and will close it for you. But other Python implementations, like “Jython” (Python for the Java VM) or “PyPy” (a really cool experimental system with just-in-time code optimization) might not figure out to close the file for you. It’s good to get in the habit of using with, and I think it makes the code pretty easy to understand.


Answer 1

Have you tried the json module? The JSON format is very similar to a python dictionary, and it’s human readable/writable:

>>> import json
>>> d = {"one":1, "two":2}
>>> json.dump(d, open("text.txt",'w'))

This code dumps to a text file

$ cat text.txt 
{"two": 2, "one": 1}

Also you can load from a JSON file:

>>> d2 = json.load(open("text.txt"))
>>> print d2
{u'two': 2, u'one': 1}

Answer 2

To store Python objects in files, use the pickle module:

import pickle

a = {
  'a': 1,
  'b': 2
}

with open('file.txt', 'wb') as handle:
  pickle.dump(a, handle)

with open('file.txt', 'rb') as handle:
  b = pickle.loads(handle.read())

print a == b # True

Notice that I never set b = a, but instead pickled a to a file and then unpickled it into b.

As for your error:

self.whip = open('deed.txt', 'r').read()

self.whip was a dictionary object. deed.txt contains text, so when you load the contents of deed.txt into self.whip, self.whip becomes the string representation of itself.

You’d probably want to evaluate the string back into a Python object:

self.whip = eval(open('deed.txt', 'r').read())

Notice how eval sounds like evil. That’s intentional. Use the pickle module instead.


Answer 3

I created my own functions which work really nicely:

def writeDict(dict, filename, sep):
    with open(filename, "a") as f:
        for i in dict.keys():            
            f.write(i + " " + sep.join([str(x) for x in dict[i]]) + "\n")

It will store the keyname first, followed by all values. Note that in this case my dict contains integers so that’s why it converts to int. This is most likely the part you need to change for your situation.

def readDict(filename, sep):
    with open(filename, "r") as f:
        dict = {}
        for line in f:
            values = line.split(sep)
            dict[values[0]] = {int(x) for x in values[1:len(values)]}
        return(dict)
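
A small usage sketch under the answer's assumption that the values are collections of integers. Note that writeDict separates the key from its values with a space, so the round trip only works cleanly when sep is also a space, and readDict hands the values back as sets:

pets = {"cats": [1, 2, 3], "dogs": [4, 5]}

writeDict(pets, "pets.txt", " ")
restored = readDict("pets.txt", " ")
# restored == {"cats": {1, 2, 3}, "dogs": {4, 5}}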

回答 4

嗨,有一种方法可以读写字典到文件,您可以将字典转换为JSON格式并快速读写,只需执行以下操作:

写入数据：

 import json

 your_dictionary = {"some_date" : "date"}
 f = open('destFile.txt', 'w+')
 f.write(json.dumps(your_dictionary))

读取您的数据:

 import json

 f = open('destFile.txt', 'r')
 your_dictionary = json.loads(f.read())

Hi, there is a way to write and read a dictionary to a file: turn your dictionary into JSON format and you can read and write it quickly. Just do this:

To write your data:

 import json

 your_dictionary = {"some_date" : "date"}
 f = open('destFile.txt', 'w+')
 f.write(json.dumps(your_dictionary))

and to read your data:

 import json

 f = open('destFile.txt', 'r')
 your_dictionary = json.loads(f.read())

回答 5

您可以遍历键值对并将其写入文件

pair = {'name': name,'location': location}
with open('F:\\twitter.json', 'a') as f:
     f.writelines('{}:{}'.format(k,v) for k, v in pair.items())
     f.write('\n')

You can iterate through the key-value pairs and write them to a file:

pair = {'name': name,'location': location}
with open('F:\\twitter.json', 'a') as f:
     f.writelines('{}:{}'.format(k,v) for k, v in pair.items())
     f.write('\n')

函数式编程中的“ pythonic”等同于“ fold”函数是什么?

问题:函数式编程中的“ pythonic”等同于“ fold”函数是什么?

在Haskell中,实现以下目标的最惯用的方法是:

foldl (+) 0 [1,2,3,4,5]
--> 15

或等效的Ruby:

[1,2,3,4,5].inject(0) {|m,x| m + x}
#> 15

显然,Python提供了reduce与fold完全相同的功能,但实际上是如上所述的fold的实现,但是,有人告诉我,“ pythonic”编程方式是避免使用lambda术语和高阶函数,并尽可能使用列表理解。因此,有没有一种首选的方式来折叠列表或不是Python reduce函数的类似列表的结构,或者是reduce惯用的方式来实现此目的?

What is the most idiomatic way to achieve something like the following, in Haskell:

foldl (+) 0 [1,2,3,4,5]
--> 15

Or its equivalent in Ruby:

[1,2,3,4,5].inject(0) {|m,x| m + x}
#> 15

Obviously, Python provides the reduce function, which is an implementation of fold, exactly as above, however, I was told that the ‘pythonic’ way of programming was to avoid lambda terms and higher-order functions, preferring list-comprehensions where possible. Therefore, is there a preferred way of folding a list, or list-like structure in Python that isn’t the reduce function, or is reduce the idiomatic way of achieving this?


回答 0

用Python方式对数组求和的方法是使用sum。为了其他目的,有时可以使用reduce(来自functools模块)和operator模块的某种组合,例如:

def product(xs):
    return reduce(operator.mul, xs, 1)

请注意,reduce实际上这是一个foldl用Haskell表示的。没有执行折叠的特殊语法,没有内置函数foldr,实际上reduce与非关联运算符一起使用被认为是不良样式。

使用高阶函数是相当Python的;它很好地利用了Python的原理,即一切都是对象,包括函数和类。没错,lambda被某些Pythonista所反对,但这主要是因为它们变得复杂时往往不太可读。

The Pythonic way of summing an array is using sum. For other purposes, you can sometimes use some combination of reduce (from the functools module) and the operator module, e.g.:

def product(xs):
    return reduce(operator.mul, xs, 1)

Be aware that reduce is actually a foldl, in Haskell terms. There is no special syntax to perform folds, there’s no builtin foldr, and actually using reduce with non-associative operators is considered bad style.

Using higher-order functions is quite pythonic; it makes good use of Python’s principle that everything is an object, including functions and classes. You are right that lambdas are frowned upon by some Pythonistas, but mostly because they tend not to be very readable when they get complex.
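
For completeness, a runnable version of that sketch with the imports spelled out (on Python 3, reduce lives in functools):

import operator
from functools import reduce

def product(xs):
    return reduce(operator.mul, xs, 1)

print(sum([1, 2, 3, 4, 5]))      # 15 -- the Pythonic fold for addition
print(product([1, 2, 3, 4, 5]))  # 120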


回答 1

哈斯克尔

foldl (+) 0 [1,2,3,4,5]

Python

reduce(lambda a,b: a+b, [1,2,3,4,5], 0)

显然,这是一个说明问题的简单例子。在Python中,您只需要这样做sum([1,2,3,4,5]),甚至Haskell纯粹主义者通常也会更喜欢sum [1,2,3,4,5]

对于没有明显便利函数的非平凡场景,惯用的pythonic方法是显式写出for循环并使用可变变量分配,而不是使用reduceor fold

那根本不是功能样式,而是“ pythonic”方式。Python并非为功能纯正者设计。了解Python如何支持流控制的异常,以了解非功能性惯用python的情况。

Haskell

foldl (+) 0 [1,2,3,4,5]

Python

reduce(lambda a,b: a+b, [1,2,3,4,5], 0)

Obviously, that is a trivial example to illustrate a point. In Python you would just do sum([1,2,3,4,5]) and even Haskell purists would generally prefer sum [1,2,3,4,5].

For non-trivial scenarios when there is no obvious convenience function, the idiomatic pythonic approach is to explicitly write out the for loop and use mutable variable assignment instead of using reduce or a fold.

That is not at all the functional style, but that is the “pythonic” way. Python is not designed for functional purists. See how Python favors exceptions for flow control to see how non-functional idiomatic python is.
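
A small sketch of that "just write the loop" style for a fold with no obvious convenience function, here the longest word in a list (the data is made up for illustration):

words = ["fold", "reduce", "comprehension"]

longest = ""
for w in words:
    if len(w) > len(longest):
        longest = w

print(longest)  # comprehension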


回答 2

在Python 3中,reduce已被删除:版本说明。不过,您可以使用functools模块

import operator, functools
def product(xs):
    return functools.reduce(operator.mul, xs, 1)

另一方面,文档表达了对for-loop而不是的偏好reduce,因此:

def product(xs):
    result = 1
    for i in xs:
        result *= i
    return result

In Python 3, reduce has been removed from the built-ins (see the Release notes). Nevertheless, you can still get it from the functools module:

import operator, functools
def product(xs):
    return functools.reduce(operator.mul, xs, 1)

On the other hand, the documentation expresses preference towards for-loop instead of reduce, hence:

def product(xs):
    result = 1
    for i in xs:
        result *= i
    return result
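
A quick usage check, which gives the same result for either definition above:

print(product([1, 2, 3, 4, 5]))  # 120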

回答 3

您也可以重新发明轮子:

def fold(f, l, a):
    """
    f: the function to apply
    l: the list to fold
    a: the accumulator, who is also the 'zero' on the first call
    """ 
    return a if(len(l) == 0) else fold(f, l[1:], f(a, l[0]))

print "Sum:", fold(lambda x, y : x+y, [1,2,3,4,5], 0)

print "Any:", fold(lambda x, y : x or y, [False, True, False], False)

print "All:", fold(lambda x, y : x and y, [False, True, False], True)

# Prove that result can be of a different type of the list's elements
print "Count(x==True):", 
print fold(lambda x, y : x+1 if(y) else x, [False, True, True], 0)

You can reinvent the wheel as well:

def fold(f, l, a):
    """
    f: the function to apply
    l: the list to fold
    a: the accumulator, who is also the 'zero' on the first call
    """ 
    return a if(len(l) == 0) else fold(f, l[1:], f(a, l[0]))

print "Sum:", fold(lambda x, y : x+y, [1,2,3,4,5], 0)

print "Any:", fold(lambda x, y : x or y, [False, True, False], False)

print "All:", fold(lambda x, y : x and y, [False, True, False], True)

# Prove that result can be of a different type of the list's elements
print "Count(x==True):", 
print fold(lambda x, y : x+1 if(y) else x, [False, True, True], 0)
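
Because that fold recurses once per element, it will hit Python's recursion limit on long lists (roughly 1000 frames by default). A minimal iterative rewrite with the same signature, as a sketch:

def fold_iter(f, l, a):
    """Iterative left fold: f is the function, l the list, a the accumulator."""
    for x in l:
        a = f(a, x)
    return a

print(fold_iter(lambda x, y: x + y, [1, 2, 3, 4, 5], 0))  # 15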

回答 4

并不是真正回答这个问题,而是折叠和折叠的一线:

a = [8,3,4]

## Foldl
reduce(lambda x,y: x**y, a)
#68719476736

## Foldr
reduce(lambda x,y: y**x, a[::-1])
#14134776518227074636666380005943348126619871175004951664972849610340958208L

Not really an answer to the question, but here are one-liners for foldl and foldr:

a = [8,3,4]

## Foldl
reduce(lambda x,y: x**y, a)
#68719476736

## Foldr
reduce(lambda x,y: y**x, a[::-1])
#14134776518227074636666380005943348126619871175004951664972849610340958208L

回答 5

开始Python 3.8并引入赋值表达式(PEP 572):=运算符),这使您可以为表达式的结果命名,我们可以使用列表推导来复制其他语言称为fold / foldleft / reduce的操作:

给定一个列表,一个约简函数和一个累加器:

items = [1, 2, 3, 4, 5]
f = lambda acc, x: acc * x
accumulator = 1

我们可以折叠itemsf在为了获得所得accumulation

[accumulator := f(accumulator, x) for x in items]
# accumulator = 120

或以压缩形式形成:

acc = 1; [acc := acc * x for x in [1, 2, 3, 4, 5]]
# acc = 120

请注意,这实际上也是“ scanleft”操作,因为列表理解的结果表示每个步骤的累加状态:

acc = 1
scanned = [acc := acc * x for x in [1, 2, 3, 4, 5]]
# scanned = [1, 2, 6, 24, 120]
# acc = 120

Starting with Python 3.8 and the introduction of assignment expressions (PEP 572) (the := operator), which make it possible to name the result of an expression, we can use a list comprehension to replicate what other languages call fold/foldleft/reduce operations:

Given a list, a reducing function and an accumulator:

items = [1, 2, 3, 4, 5]
f = lambda acc, x: acc * x
accumulator = 1

we can fold items with f in order to obtain the resulting accumulation:

[accumulator := f(accumulator, x) for x in items]
# accumulator = 120

or in a condensed formed:

acc = 1; [acc := acc * x for x in [1, 2, 3, 4, 5]]
# acc = 120

Note that this is actually also a “scanleft” operation as the result of the list comprehension represents the state of the accumulation at each step:

acc = 1
scanned = [acc := acc * x for x in [1, 2, 3, 4, 5]]
# scanned = [1, 2, 6, 24, 120]
# acc = 120
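
Worth noting alongside this: the standard library already ships the "scan" as itertools.accumulate, which yields the running accumulation and, given a function argument, covers the same example:

from itertools import accumulate
import operator

scanned = list(accumulate([1, 2, 3, 4, 5], operator.mul))
# scanned == [1, 2, 6, 24, 120]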

回答 6

对这个(减少)问题的实际答案是:只需使用一个循环!

initial_value = 0
for x in the_list:
    initial_value += x #or any function.

这将比reduce更快,而PyPy之类的东西可以优化循环。

顺便说一句,总和的情况应该用sum函数来解决

The actual answer to this (reduce) problem is: Just use a loop!

initial_value = 0
for x in the_list:
    initial_value += x #or any function.

This will be faster than a reduce and things like PyPy can optimize loops like that.

BTW, the sum case should be solved with the sum function


回答 7

我相信这个问题的一些回答者错过了该fold功能作为抽象工具的广泛含义。是的,sum可以对整数列表执行相同的操作,但这是不重要的情况。fold更通用。当您具有一系列形状各异的数据结构并想要清晰地表达聚合时,此功能很有用。因此,不必for每次都用一个聚合变量建立一个循环并手动重新计算它,fold函数(或reduce似乎对应的Python版本)允许程序员通过简单地提供以下内容来更清楚地表达聚合的意图:两件事情:

  • 聚合的默认起始值​​或“种子”值。
  • 该函数采用聚合的当前值(以“种子”开头)和列表中的下一个元素,并返回下一个聚合值。

I believe some of the respondents of this question have missed the broader implication of the fold function as an abstract tool. Yes, sum can do the same thing for a list of integers, but this is a trivial case. fold is more generic. It is useful when you have a sequence of data structures of varying shape and want to cleanly express an aggregation. So instead of having to build up a for loop with an aggregate variable and manually recompute it each time, a fold function (or the Python version, which reduce appears to correspond to) allows the programmer to express the intent of the aggregation much more plainly by simply providing two things:

  • A default starting or “seed” value for the aggregation.
  • A function that takes the current value of the aggregation (starting with the “seed”) and the next element in the list, and returns the next aggregation value.
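
A minimal sketch of that idea: the same reduce call aggregates differently shaped records just by swapping the seed and the combining function (the record structure here is invented for illustration):

from functools import reduce

orders = [
    {"item": "book", "qty": 2, "price": 15.0},
    {"item": "pen", "qty": 10, "price": 1.5},
]

# Seed 0.0; the combiner adds each order's line total to the running aggregate.
total = reduce(lambda acc, o: acc + o["qty"] * o["price"], orders, 0.0)
print(total)  # 45.0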

回答 8

我参加聚会可能已经很晚了,但是我们可以foldr使用简单的lambda演算和curried函数创建自定义项。这是我在python中实现的foldr。

def foldr(func):
    def accumulator(acc):
        def listFunc(l):
            if l:
                x = l[0]
                xs = l[1:]
                return func(x)(foldr(func)(acc)(xs))
            else:
                return acc
        return listFunc
    return accumulator  


def curried_add(x):
    def inner(y):
        return x + y
    return inner

def curried_mult(x):
    def inner(y):
        return x * y
    return inner

print foldr(curried_add)(0)(range(1, 6))
print foldr(curried_mult)(1)(range(1, 6))

即使实现是递归的(可能很慢),它也会分别打印值15120

I may be quite late to the party, but we can create a custom foldr using simple lambda calculus and curried functions. Here is my implementation of foldr in Python.

def foldr(func):
    def accumulator(acc):
        def listFunc(l):
            if l:
                x = l[0]
                xs = l[1:]
                return func(x)(foldr(func)(acc)(xs))
            else:
                return acc
        return listFunc
    return accumulator  


def curried_add(x):
    def inner(y):
        return x + y
    return inner

def curried_mult(x):
    def inner(y):
        return x * y
    return inner

print foldr(curried_add)(0)(range(1, 6))
print foldr(curried_mult)(1)(range(1, 6))

Even though the implementation is recursive (might be slow), it will print the values 15 and 120 respectively


如何为容器对象实现__iter __(self)(Python)

问题:如何为容器对象实现__iter __(self)(Python)

我已经编写了一个自定义容器对象。

根据此页面,我需要在我的对象上实现此方法:

__iter__(self)

但是,在跟踪Python参考手册中指向Iterator Types的链接时,没有给出如何实现自己的示例。

有人可以发布一个片段(或链接到资源)来显示如何执行此操作吗?

我正在写的容器是一个映射(即通过唯一键存储值)。dict可以这样迭代:

for k, v in mydict.items()

在这种情况下,我需要能够在迭代器中返回两个元素(一个元组?)。尚不清楚如何实现这样的迭代器(尽管已经提供了好几个答案)。有人可以进一步说明如何为类似地图的容器对象实现迭代器吗?(即充当字典的自定义类)?

I have written a custom container object.

According to this page, I need to implement this method on my object:

__iter__(self)

However, upon following up the link to Iterator Types in the Python reference manual, there are no examples given of how to implement your own.

Can someone post a snippet (or link to a resource), that shows how to do this?

The container I am writing, is a map (i.e. stores values by unique keys). dicts can be iterated like this:

for k, v in mydict.items()

In this case I need to be able to return two elements (a tuple?) in the iterator. It is still not clear how to implement such an iterator (despite the several answers that have been kindly provided). Could someone please shed some more light on how to implement an iterator for a map-like container object? (i.e. a custom class that acts like a dict)?


回答 0

我通常会使用一个生成器函数。每次使用yield语句时,都会在序列中添加一个项目。

下面的代码将创建一个迭代器,该迭代器生成五个,然后生成some_list中的每个项目。

def __iter__(self):
   yield 5
   yield from some_list

3.3 yield from之前的版本不存在,因此您必须执行以下操作:

def __iter__(self):
   yield 5
   for x in some_list:
      yield x

I normally would use a generator function. Each time you use a yield statement, it will add an item to the sequence.

The following will create an iterator that yields five, and then every item in some_list.

def __iter__(self):
   yield 5
   yield from some_list

Pre-3.3, yield from didn’t exist, so you would have to do:

def __iter__(self):
   yield 5
   for x in some_list:
      yield x
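
A minimal self-contained sketch of a container built around that generator-style __iter__ (the class and attribute names are made up):

class Bag:
    def __init__(self, *items):
        self._items = list(items)

    def __iter__(self):
        # A generator function: each yield produces the next item.
        yield 5
        yield from self._items

print(list(Bag(1, 2, 3)))  # [5, 1, 2, 3]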

回答 1

另一种选择是从 collections 模块中合适的抽象基类继承，相关文档见这里。

如果容器是其自己的迭代器,则可以从继承 collections.Iterator。您只需要实现该next方法即可。

一个例子是:

>>> from collections import Iterator
>>> class MyContainer(Iterator):
...     def __init__(self, *data):
...         self.data = list(data)
...     def next(self):
...         if not self.data:
...             raise StopIteration
...         return self.data.pop()
...         
...     
... 
>>> c = MyContainer(1, "two", 3, 4.0)
>>> for i in c:
...     print i
...     
... 
4.0
3
two
1

在查看collections模块时,请考虑从继承SequenceMapping或者如果更合适,则从另一个抽象基类继承。这是一个Sequence子类的示例:

>>> from collections import Sequence
>>> class MyContainer(Sequence):
...     def __init__(self, *data):
...         self.data = list(data)
...     def __getitem__(self, index):
...         return self.data[index]
...     def __len__(self):
...         return len(self.data)
...         
...     
... 
>>> c = MyContainer(1, "two", 3, 4.0)
>>> for i in c:
...     print i
...     
... 
1
two
3
4.0

注意:感谢Glenn Maynard提请我注意需要澄清一方面迭代器与另一方面是可迭代容器而不是迭代器的容器之间的区别。

Another option is to inherit from the appropriate abstract base class from the collections module, as documented here.

In case the container is its own iterator, you can inherit from collections.Iterator. You only need to implement the next method then.

An example is:

>>> from collections import Iterator
>>> class MyContainer(Iterator):
...     def __init__(self, *data):
...         self.data = list(data)
...     def next(self):
...         if not self.data:
...             raise StopIteration
...         return self.data.pop()
...         
...     
... 
>>> c = MyContainer(1, "two", 3, 4.0)
>>> for i in c:
...     print i
...     
... 
4.0
3
two
1

While you are looking at the collections module, consider inheriting from Sequence, Mapping or another abstract base class if that is more appropriate. Here is an example for a Sequence subclass:

>>> from collections import Sequence
>>> class MyContainer(Sequence):
...     def __init__(self, *data):
...         self.data = list(data)
...     def __getitem__(self, index):
...         return self.data[index]
...     def __len__(self):
...         return len(self.data)
...         
...     
... 
>>> c = MyContainer(1, "two", 3, 4.0)
>>> for i in c:
...     print i
...     
... 
1
two
3
4.0

NB: Thanks to Glenn Maynard for drawing my attention to the need to clarify the difference between iterators on the one hand and containers that are iterables rather than iterators on the other.
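
On Python 3 the same approach still works, but the ABCs live in collections.abc and the iterator protocol method is spelled __next__; a minimal sketch:

from collections.abc import Iterator

class MyContainer(Iterator):
    def __init__(self, *data):
        self.data = list(data)

    def __next__(self):
        if not self.data:
            raise StopIteration
        return self.data.pop()

print(list(MyContainer(1, "two", 3, 4.0)))  # [4.0, 3, 'two', 1]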


回答 2

__iter__()如果已经定义了next()方法(生成器对象),通常只是返回self:

这是生成器的虚拟示例:

class Test(object):

    def __init__(self, data):
       self.data = data

    def next(self):
        if not self.data:
           raise StopIteration
        return self.data.pop()

    def __iter__(self):
        return self

__iter__()也可以这样使用:http : //mail.python.org/pipermail/tutor/2006-January/044455.html

Usually __iter__() just returns self if you have already defined the next() method (i.e. when the object is its own iterator):

Here is a dummy example of such an iterator:

class Test(object):

    def __init__(self, data):
       self.data = data

    def next(self):
        if not self.data:
           raise StopIteration
        return self.data.pop()

    def __iter__(self):
        return self

but __iter__() can also be used like this: http://mail.python.org/pipermail/tutor/2006-January/044455.html


回答 3

如果您的对象包含一组要绑定对象迭代器的数据,则可以作弊并执行以下操作:

>>> class foo:
    def __init__(self, *params):
           self.data = params
    def __iter__(self):
        if hasattr(self.data[0], "__iter__"):
            return self.data[0].__iter__()
        return self.data.__iter__()
>>> d=foo(6,7,3,8, "ads", 6)
>>> for i in d:
    print i
6
7
3
8
ads
6

If your object contains a set of data you want to bind your object’s iter to, you can cheat and do this:

>>> class foo:
    def __init__(self, *params):
           self.data = params
    def __iter__(self):
        if hasattr(self.data[0], "__iter__"):
            return self.data[0].__iter__()
        return self.data.__iter__()
>>> d=foo(6,7,3,8, "ads", 6)
>>> for i in d:
    print i
6
7
3
8
ads
6
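
The simpler form of this delegation (dropping the data[0] special case above) is usually spelled with the built-in iter(); a sketch assuming the data sits in self.data:

class Foo:
    def __init__(self, *params):
        self.data = params

    def __iter__(self):
        return iter(self.data)

print(list(Foo(6, 7, 3, 8, "ads", 6)))  # [6, 7, 3, 8, 'ads', 6]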

回答 4

在“可迭代的接口”的python由两个方法__next__()__iter__()。该__next__函数是最重要的,因为它定义了迭代器的行为-也就是说,该函数确定下一步应返回什么值。该__iter__()方法用于重置迭代的起点。通常,您会发现,__iter__()__init__()用于设置起点时, 它只能返回自身。

请参阅以下代码,以定义实现Reitable接口的类反向,并定义任何序列类中任何实例的迭代器。该__next__()方法从序列的末尾开始,并以相反的顺序返回值。请注意,来自实现“序列接口”的类的实例必须定义__len__()__getitem__()方法。

class Reverse:
    """Iterator for looping over a sequence backwards."""
    def __init__(self, seq):
        self.data = seq
        self.index = len(seq)

    def __iter__(self):
        return self

    def __next__(self):
        if self.index == 0:
            raise StopIteration
        self.index = self.index - 1
        return self.data[self.index]

>>> rev = Reverse('spam')
>>> next(rev)   # note no need to call iter()
'm'
>>> nums = Reverse(range(1,10))
>>> next(nums)
9

The “iterable interface” in python consists of two methods __next__() and __iter__(). The __next__ function is the most important, as it defines the iterator behavior – that is, the function determines what value should be returned next. The __iter__() method is used to reset the starting point of the iteration. Often, you will find that __iter__() can just return self when __init__() is used to set the starting point.

See the following code for defining a Class Reverse which implements the “iterable interface” and defines an iterator over any instance from any sequence class. The __next__() method starts at the end of the sequence and returns values in reverse order of the sequence. Note that instances from a class implementing the “sequence interface” must define a __len__() and a __getitem__() method.

class Reverse:
    """Iterator for looping over a sequence backwards."""
    def __init__(self, seq):
        self.data = seq
        self.index = len(seq)

    def __iter__(self):
        return self

    def __next__(self):
        if self.index == 0:
            raise StopIteration
        self.index = self.index - 1
        return self.data[self.index]

>>> rev = Reverse('spam')
>>> next(rev)   # note no need to call iter()
'm'
>>> nums = Reverse(range(1,10))
>>> next(nums)
9

回答 5

要回答有关映射的问题：您提供的 __iter__ 应该迭代映射的键。以下是一个简单的示例，它创建了一个映射 x -> x * x，并在扩展 ABC Mapping 的 Python3 上工作。

import collections.abc

class MyMap(collections.abc.Mapping):
    def __init__(self, n):
        self.n = n

    def __getitem__(self, key): # given a key, return its value
        if 0 <= key < self.n:
            return key * key
        else:
            raise KeyError('Invalid key')

    def __iter__(self): # iterate over all keys
        for x in range(self.n):
            yield x

    def __len__(self):
        return self.n

m = MyMap(5)
for k, v in m.items():
    print(k, '->', v)
# 0 -> 0
# 1 -> 1
# 2 -> 4
# 3 -> 9
# 4 -> 16

To answer the question about mappings: your provided __iter__ should iterate over the keys of the mapping. The following is a simple example that creates a mapping x -> x * x and works on Python3 extending the ABC mapping.

import collections.abc

class MyMap(collections.abc.Mapping):
    def __init__(self, n):
        self.n = n

    def __getitem__(self, key): # given a key, return its value
        if 0 <= key < self.n:
            return key * key
        else:
            raise KeyError('Invalid key')

    def __iter__(self): # iterate over all keys
        for x in range(self.n):
            yield x

    def __len__(self):
        return self.n

m = MyMap(5)
for k, v in m.items():
    print(k, '->', v)
# 0 -> 0
# 1 -> 1
# 2 -> 4
# 3 -> 9
# 4 -> 16

回答 6

如果您不想像dict别人建议的那样继承,这是对如何实现__iter__自定义字典的粗略示例的问题的直接答案:

class Attribute:
    def __init__(self, key, value):
        self.key = key
        self.value = value

class Node(collections.Mapping):
    def __init__(self):
        self.type  = ""
        self.attrs = [] # List of Attributes

    def __iter__(self):
        for attr in self.attrs:
            yield attr.key

它使用了一个生成器,在这里对此进行了详细描述。

由于我们继承自Mapping,因此您还需要实现__getitem____len__

    def __getitem__(self, key):
        for attr in self.attrs:
            if key == attr.key:
                return attr.value
        raise KeyError

    def __len__(self):
        return len(self.attrs)

In case you don’t want to inherit from dict as others have suggested, here is a direct answer to the question of how to implement __iter__, using a crude example of a custom dict:

class Attribute:
    def __init__(self, key, value):
        self.key = key
        self.value = value

class Node(collections.Mapping):
    def __init__(self):
        self.type  = ""
        self.attrs = [] # List of Attributes

    def __iter__(self):
        for attr in self.attrs:
            yield attr.key

That uses a generator, which is well described here.

Since we’re inheriting from Mapping, you need to also implement __getitem__ and __len__:

    def __getitem__(self, key):
        for attr in self.attrs:
            if key == attr.key:
                return attr.value
        raise KeyError

    def __len__(self):
        return len(self.attrs)

回答 7

在某些情况下可能有效的一种选择是使您的自定义类继承dict。如果它像字典一样,这似乎是一个合理的选择。也许应该一个命令。这样,您可以免费获得类似dict的迭代。

class MyDict(dict):
    def __init__(self, custom_attribute):
        self.bar = custom_attribute

mydict = MyDict('Some name')
mydict['a'] = 1
mydict['b'] = 2

print mydict.bar
for k, v in mydict.items():
    print k, '=>', v

输出:

Some name
a => 1
b => 2

One option that might work for some cases is to make your custom class inherit from dict. This seems like a logical choice if it acts like a dict; maybe it should be a dict. This way, you get dict-like iteration for free.

class MyDict(dict):
    def __init__(self, custom_attribute):
        self.bar = custom_attribute

mydict = MyDict('Some name')
mydict['a'] = 1
mydict['b'] = 2

print mydict.bar
for k, v in mydict.items():
    print k, '=>', v

Output:

Some name
a => 1
b => 2

回答 8

dict继承的示例,修改其iter,例如,2在for循环中跳过键

# method 1
class Dict(dict):
    def __iter__(self):
        keys = self.keys()
        for i in keys:
            if i == 2:
                continue
            yield i

# method 2
class Dict(dict):
    def __iter__(self):
        for i in super(Dict, self).__iter__():
            if i == 2:
                continue
            yield i

An example of inheriting from dict and modifying its __iter__ to, for example, skip key 2 in a for loop:

# method 1
class Dict(dict):
    def __iter__(self):
        keys = self.keys()
        for i in keys:
            if i == 2:
                continue
            yield i

# method 2
class Dict(dict):
    def __iter__(self):
        for i in super(Dict, self).__iter__():
            if i == 2:
                continue
            yield i

TensorFlow中的tf.app.flags的目的是什么?

问题:TensorFlow中的tf.app.flags的目的是什么?

我在Tensorflow中阅读一些示例代码,发现以下代码

flags = tf.app.flags
FLAGS = flags.FLAGS
flags.DEFINE_float('learning_rate', 0.01, 'Initial learning rate.')
flags.DEFINE_integer('max_steps', 2000, 'Number of steps to run trainer.')
flags.DEFINE_integer('hidden1', 128, 'Number of units in hidden layer 1.')
flags.DEFINE_integer('hidden2', 32, 'Number of units in hidden layer 2.')
flags.DEFINE_integer('batch_size', 100, 'Batch size.  '
                 'Must divide evenly into the dataset sizes.')
flags.DEFINE_string('train_dir', 'data', 'Directory to put the training data.')
flags.DEFINE_boolean('fake_data', False, 'If true, uses fake data '
                 'for unit testing.')

tensorflow/tensorflow/g3doc/tutorials/mnist/fully_connected_feed.py

但我找不到有关的用法的任何文档tf.app.flags

我发现该标志的实现在 tensorflow/tensorflow/python/platform/default/_flags.py

显然,这tf.app.flags是以某种方式用于配置网络的,所以为什么在API文档中没有呢?谁能解释这是怎么回事?

I am reading some example codes in Tensorflow, I found following code

flags = tf.app.flags
FLAGS = flags.FLAGS
flags.DEFINE_float('learning_rate', 0.01, 'Initial learning rate.')
flags.DEFINE_integer('max_steps', 2000, 'Number of steps to run trainer.')
flags.DEFINE_integer('hidden1', 128, 'Number of units in hidden layer 1.')
flags.DEFINE_integer('hidden2', 32, 'Number of units in hidden layer 2.')
flags.DEFINE_integer('batch_size', 100, 'Batch size.  '
                 'Must divide evenly into the dataset sizes.')
flags.DEFINE_string('train_dir', 'data', 'Directory to put the training data.')
flags.DEFINE_boolean('fake_data', False, 'If true, uses fake data '
                 'for unit testing.')

in tensorflow/tensorflow/g3doc/tutorials/mnist/fully_connected_feed.py

But I can’t find any docs about this usage of tf.app.flags.

And I found the implementation of this flags is in the tensorflow/tensorflow/python/platform/default/_flags.py

Obviously, this tf.app.flags is somehow used to configure a network, so why is it not in the API docs? Can anyone explain what is going on here?


回答 0

tf.app.flags模块目前是python-gflags的 一个瘦包装,因此该项目文档是如何使用它的最佳资源argparse,它实现了一部分功能python-gflags

请注意,该模块当前已打包为方便编写演示应用程序使用,从技术上讲,它不是公共API的一部分,因此将来可能会更改。

我们建议您使用argparse或任何您喜欢的库来实现自己的标志解析。

编辑:tf.app.flags模块实际上并未使用实现python-gflags,但它使用了类似的API。

The tf.app.flags module is presently a thin wrapper around python-gflags, so the documentation for that project is the best resource for how to use it.

Note that this module is currently packaged as a convenience for writing demo apps, and is not technically part of the public API, so it may change in future.

We recommend that you implement your own flag parsing using argparse (which implements a subset of the functionality in python-gflags) or whatever library you prefer.

EDIT: The tf.app.flags module is not in fact implemented using python-gflags, but it uses a similar API.


回答 1

tf.app.flags模块是Tensorflow提供的功能,用于为Tensorflow程序实现命令行标志。例如,您遇到的代码将执行以下操作:

flags.DEFINE_float('learning_rate', 0.01, 'Initial learning rate.')

第一个参数定义标志的名称,第二个参数定义默认值,以防执行文件时未指定标志。

因此,如果运行以下命令:

$ python fully_connected_feed.py --learning_rate 1.00

那么学习率将设置为1.00,如果未指定该标志,则将保持0.01。

本文所述,文档可能不存在,因为这可能是Google内部要求开发人员使用的文档。

此外,如文章中所述,使用Tensorflow标志比其他Python软件包提供的标志功能有多个优势,例如argparse在处理Tensorflow模型时尤其如此,最重要的是可以向代码提供Tensorflow特定信息,例如信息有关使用哪个GPU的信息。

The tf.app.flags module is a facility provided by TensorFlow to implement command-line flags for your TensorFlow program. As an example, the code you came across would do the following:

flags.DEFINE_float('learning_rate', 0.01, 'Initial learning rate.')

The first parameter defines the name of the flag while the second defines the default value in case the flag is not specified while executing the file.

So if you run the following:

$ python fully_connected_feed.py --learning_rate 1.00

then the learning rate is set to 1.00 and will remain 0.01 if the flag is not specified.

As mentioned in this article, the docs are probably not present because this might be something that Google requires internally for its developers to use.

Also, as mentioned in the post, there are several advantages of using Tensorflow flags over flag functionality provided by other Python packages such as argparse especially when dealing with Tensorflow models, the most important being that you can supply Tensorflow specific information to the code such as information about which GPU to use.
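
For comparison, a minimal argparse sketch reproducing a couple of the flags from the example above (argument names are copied from the TensorFlow snippet, the rest are omitted):

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--learning_rate', type=float, default=0.01,
                    help='Initial learning rate.')
parser.add_argument('--max_steps', type=int, default=2000,
                    help='Number of steps to run trainer.')
FLAGS = parser.parse_args()

print(FLAGS.learning_rate)  # 1.0 if run with --learning_rate 1.00, else 0.01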


回答 2

在Google,他们使用标记系统来设置参数的默认值。它类似于argparse。他们使用自己的标记系统,而不是argparse或sys.argv。

资料来源:我以前在那里工作过。

At Google, they use flag systems to set default values for arguments. It’s similar to argparse. They use their own flag system instead of argparse or sys.argv.

Source: I worked there before.


回答 3

使用时tf.app.run(),可以使用方便地在线程之间传输变量tf.app.flags。请参阅此内容以进一步使用tf.app.flags

When you use tf.app.run(), you can transfer the variable very conveniently between threads using tf.app.flags. See this for further usage of tf.app.flags.


回答 4

经过多次尝试后,我发现它可以打印所有FLAGS键以及实际值-

for key in tf.app.flags.FLAGS.flag_values_dict():
    print(key, FLAGS[key].value)

After trying many times, I found this prints all the FLAGS keys as well as their actual values:

for key in tf.app.flags.FLAGS.flag_values_dict():
    print(key, FLAGS[key].value)

如何在Python中记录类属性?[关闭]

问题:如何在Python中记录类属性?[关闭]

我正在编写一个轻量级的类,其属性旨在可公开访问,并且有时仅在特定的实例中被覆盖。就此而言,Python语言中没有为类属性或任何类型的属性创建文档字符串的规定。记录这些属性的预期方式和受支持方式是什么?目前,我正在做这种事情:

class Albatross(object):
    """A bird with a flight speed exceeding that of an unladen swallow.

    Attributes:
    """

    flight_speed = 691
    __doc__ += """
        flight_speed (691)
          The maximum speed that such a bird can attain.
    """

    nesting_grounds = "Raymond Luxury-Yacht"
    __doc__ += """
        nesting_grounds ("Raymond Luxury-Yacht")
          The locale where these birds congregate to reproduce.
    """

    def __init__(self, **keyargs):
        """Initialize the Albatross from the keyword arguments."""
        self.__dict__.update(keyargs)

这将导致该类的docstring包含初始的标准docstring部分,以及通过对的扩展分配为每个属性添加的行__doc__

尽管docstring样式指南中似乎并未明确禁止使用这种样式,但也没有提到它是一种选择。这样做的好处是,它提供了一种在定义时连同属性一起记录属性的方法,同时仍然创建了一个可显示的类docstring,并且避免了编写注释以重申该docstring中的信息。我仍然对必须两次写入属性感到恼火。我正在考虑使用文档字符串中值的字符串表示形式来至少避免重复默认值。

这是对特设社区惯例的严重违反吗?可以吗 有没有更好的办法?例如,可以创建一个包含属性值和文档字符串的字典,然后__dict__在类声明的末尾将内容添加到该类和文档字符串中。这样可以减少两次键入属性名称和值的需要。 编辑:我认为,最后一个想法实际上是不可能的,至少没有没有根据数据动态构建整个类的想法,除非有其他原因,否则这似乎是一个糟糕的主意。

我是python的新手,仍然在研究编码风格的细节,因此也欢迎无关的批评。

I’m writing a lightweight class whose attributes are intended to be publicly accessible, and only sometimes overridden in specific instantiations. There’s no provision in the Python language for creating docstrings for class attributes, or any sort of attributes, for that matter. What is the expected and supported way, should there be one, to document these attributes? Currently I’m doing this sort of thing:

class Albatross(object):
    """A bird with a flight speed exceeding that of an unladen swallow.

    Attributes:
    """

    flight_speed = 691
    __doc__ += """
        flight_speed (691)
          The maximum speed that such a bird can attain.
    """

    nesting_grounds = "Raymond Luxury-Yacht"
    __doc__ += """
        nesting_grounds ("Raymond Luxury-Yacht")
          The locale where these birds congregate to reproduce.
    """

    def __init__(self, **keyargs):
        """Initialize the Albatross from the keyword arguments."""
        self.__dict__.update(keyargs)

This will result in the class’s docstring containing the initial standard docstring section, as well as the lines added for each attribute via augmented assignment to __doc__.

Although this style doesn’t seem to be expressly forbidden in the docstring style guidelines, it’s also not mentioned as an option. The advantage here is that it provides a way to document attributes alongside their definitions, while still creating a presentable class docstring, and avoiding having to write comments that reiterate the information from the docstring. I’m still kind of annoyed that I have to actually write the attributes twice; I’m considering using the string representations of the values in the docstring to at least avoid duplication of the default values.

Is this a heinous breach of the ad hoc community conventions? Is it okay? Is there a better way? For example, it’s possible to create a dictionary containing values and docstrings for the attributes and then add the contents to the class __dict__ and docstring towards the end of the class declaration; this would alleviate the need to type the attribute names and values twice. edit: this last idea is, I think, not actually possible, at least not without dynamically building the entire class from data, which seems like a really bad idea unless there’s some other reason to do that.

I’m pretty new to python and still working out the details of coding style, so unrelated critiques are also welcome.


回答 0

为避免混淆:术语property在python中具有特定含义。您所说的是所谓的类属性。由于始终在类中对它们进行操作,因此我发现将它们记录在类的文档字符串中是有意义的。像这样:

class Albatross(object):
    """A bird with a flight speed exceeding that of an unladen swallow.

    Attributes:
        flight_speed     The maximum speed that such a bird can attain.
        nesting_grounds  The locale where these birds congregate to reproduce.
    """
    flight_speed = 691
    nesting_grounds = "Throatwarbler Man Grove"

我认为这比示例中的方法容易得多。如果我确实希望属性值的副本出现在doc字符串中,则可以将它们放在每个属性的描述的旁边或下方。

请记住,在Python中,文档字符串是其文档对象的实际成员,而不仅仅是源代码注释。由于类属性变量本身不是对象而是对象的引用,因此它们无法保存自己的文档字符串。我想您可以为引用中的doc字符串辩护,也许是描述“应该在这里做什么”而不是“实际在这里”,但是我发现在包含类的doc字符串中这样做很容易。

To avoid confusion: the term property has a specific meaning in python. What you’re talking about is what we call class attributes. Since they are always acted upon through their class, I find that it makes sense to document them within the class’ doc string. Something like this:

class Albatross(object):
    """A bird with a flight speed exceeding that of an unladen swallow.

    Attributes:
        flight_speed     The maximum speed that such a bird can attain.
        nesting_grounds  The locale where these birds congregate to reproduce.
    """
    flight_speed = 691
    nesting_grounds = "Throatwarbler Man Grove"

I think that’s a lot easier on the eyes than the approach in your example. If I really wanted a copy of the attribute values to appear in the doc string, I would put them beside or below the description of each attribute.

Keep in mind that in Python, doc strings are actual members of the objects they document, not merely source code annotations. Since class attribute variables are not objects themselves but references to objects, they have no way of holding doc strings of their own. I guess you could make a case for doc strings on references, perhaps to describe “what should go here” instead of “what is actually here”, but I find it easy enough to do that in the containing class doc string.


回答 1

您在“ 什么是文档字符串 ”部分中引用了PEP257:文档字符串约定

Python代码其他地方出现的字符串文字也可以用作文档。它们无法被Python字节码编译器识别,并且不能作为运行时对象属性(即未分配给__doc__)访问,但是软件工具可以提取两种类型的额外docstring:

在模块,类或__init__方法的顶级进行简单赋值后立即出现的字符串文字称为“属性文档字符串”。

这在PEP 258:属性文档字符串中有更详细的说明。正如上面的解释。属性不是可以拥有__doc__的对象,因此它们不会出现在help()或pydoc中。这些文档字符串只能用于生成的文档。

它们在Sphinx中与指令autoattribute一起使用

Sphinx可以在赋值之前的一行上使用注释,或者在赋值之后的特殊注释或定义之后的文档字符串中使用这些注释,这些注释将自动记录在案。

You cite the PEP257: Docstring Conventions, in the section What is a docstring it is stated:

String literals occurring elsewhere in Python code may also act as documentation. They are not recognized by the Python bytecode compiler and are not accessible as runtime object attributes (i.e. not assigned to __doc__), but two types of extra docstrings may be extracted by software tools:

String literals occurring immediately after a simple assignment at the top level of a module, class, or __init__ method are called “attribute docstrings”.

And this is explained in more detail in PEP 258: Attribute docstrings. As ʇsәɹoɈ explains above, an attribute is not an object that can own a __doc__, so attribute docstrings won’t appear in help() or pydoc; they can only be used for generated documentation.

They are used in Sphinx with the directive autoattribute

Sphinx can pick up a comment on the line before an assignment, a special comment following an assignment, or a docstring placed after the definition, and include it in the generated documentation.
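
A minimal sketch of the conventions just mentioned, reusing the question's example names; the #: marker and the string literal after the assignment are what sphinx.ext.autodoc looks for, and none of this is visible to help():

class Albatross:
    """A bird with a flight speed exceeding that of an unladen swallow."""

    #: The maximum speed that such a bird can attain.
    flight_speed = 691

    nesting_grounds = "Raymond Luxury-Yacht"
    """The locale where these birds congregate to reproduce."""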


回答 2

您可以滥用此属性。属性包含getter,setter,deleter 和docstring。天真的,这会变得很冗长:

class C:
    def __init__(self):
        self._x = None

    @property
    def x(self):
        """Docstring goes here."""
        return self._x

    @x.setter
    def x(self, value):
        self._x = value

    @x.deleter
    def x(self):
        del self._x

然后,您将拥有一个属于Cx的文档字符串:

In [24]: print(C.x.__doc__)
Docstring goes here.

要对许多属性执行此操作比较麻烦,但是您可以设想一个辅助函数myprop:

def myprop(x, doc):
    def getx(self):
        return getattr(self, '_' + x)

    def setx(self, val):
        setattr(self, '_' + x, val)

    def delx(self):
        delattr(self, '_' + x)

    return property(getx, setx, delx, doc)

class C:
    a = myprop("a", "Hi, I'm A!")
    b = myprop("b", "Hi, I'm B!")

In [44]: c = C()

In [46]: c.b = 42

In [47]: c.b
Out[47]: 42

In [49]: print(C.b.__doc__)
Hi, I'm B!

然后,以交互方式调用Python help将得到:

Help on class C in module __main__:

class C
 |  Data descriptors defined here:
 |  
 |  a
 |      Hi, I'm A!
 |  
 |  b
 |      Hi, I'm B!

我认为这应该是您所追求的。

编辑:现在我意识到,也许我们可以完全避免将第一个参数传递给它myprop,因为内部名称无关紧要。如果后续的调用myprop可以通过某种方式彼此通信,则它可以自动确定一个较长且不太可能的内部属性名称。我敢肯定有实现此目的的方法,但是我不确定他们是否值得。

You could abuse properties to this effect. Properties contain a getter, a setter, a deleter, and a docstring. Naively, this would get very verbose:

class C:
    def __init__(self):
        self._x = None

    @property
    def x(self):
        """Docstring goes here."""
        return self._x

    @x.setter
    def x(self, value):
        self._x = value

    @x.deleter
    def x(self):
        del self._x

Then you will have a docstring belonging to C.x:

In [24]: print(C.x.__doc__)
Docstring goes here.

To do this for many attributes is cumbersome, but you could envision a helper function myprop:

def myprop(x, doc):
    def getx(self):
        return getattr(self, '_' + x)

    def setx(self, val):
        setattr(self, '_' + x, val)

    def delx(self):
        delattr(self, '_' + x)

    return property(getx, setx, delx, doc)

class C:
    a = myprop("a", "Hi, I'm A!")
    b = myprop("b", "Hi, I'm B!")

In [44]: c = C()

In [46]: c.b = 42

In [47]: c.b
Out[47]: 42

In [49]: print(C.b.__doc__)
Hi, I'm B!

Then, calling Python’s interactive help will give:

Help on class C in module __main__:

class C
 |  Data descriptors defined here:
 |  
 |  a
 |      Hi, I'm A!
 |  
 |  b
 |      Hi, I'm B!

which I think should be pretty much what you’re after.

Edit: I realise now that we could perhaps avoid needing to pass the first argument to myprop at all, because the internal name doesn’t matter. If subsequent calls of myprop could somehow communicate with each other, they could automatically decide upon a long and unlikely internal attribute name. I’m sure there are ways to implement this, but I’m not sure they’re worth it.