Tag Archives: Python

Scaling pandas dataframe columns with sklearn

Question: Scaling pandas dataframe columns with sklearn


I have a pandas dataframe with mixed type columns, and I’d like to apply sklearn’s min_max_scaler to some of the columns. Ideally, I’d like to do these transformations in place, but haven’t figured out a way to do that yet. I’ve written the following code that works:

import pandas as pd
import numpy as np
from sklearn import preprocessing

scaler = preprocessing.MinMaxScaler()

dfTest = pd.DataFrame({'A':[14.00,90.20,90.95,96.27,91.21],'B':[103.02,107.26,110.35,114.23,114.68], 'C':['big','small','big','small','small']})
min_max_scaler = preprocessing.MinMaxScaler()

def scaleColumns(df, cols_to_scale):
    for col in cols_to_scale:
        df[col] = pd.DataFrame(min_max_scaler.fit_transform(pd.DataFrame(dfTest[col])),columns=[col])
    return df

dfTest

    A   B   C
0    14.00   103.02  big
1    90.20   107.26  small
2    90.95   110.35  big
3    96.27   114.23  small
4    91.21   114.68  small

scaled_df = scaleColumns(dfTest,['A','B'])
scaled_df

A   B   C
0    0.000000    0.000000    big
1    0.926219    0.363636    small
2    0.935335    0.628645    big
3    1.000000    0.961407    small
4    0.938495    1.000000    small

I’m curious if this is the preferred/most efficient way to do this transformation. Is there a way I could use df.apply that would be better?

I’m also surprised I can’t get the following code to work:

bad_output = min_max_scaler.fit_transform(dfTest['A'])

If I pass an entire dataframe to the scaler it works:

dfTest2 = dfTest.drop('C', axis = 1)
good_output = min_max_scaler.fit_transform(dfTest2)
good_output

I’m confused why passing a series to the scaler fails. In my full working code above I had hoped to just pass a series to the scaler then set the dataframe column = to the scaled series. I’ve seen this question asked a few other places, but haven’t found a good answer. Any help understanding what’s going on here would be greatly appreciated!


Answer 0


I am not sure if previous versions of pandas prevented this, but now the following snippet works perfectly for me and produces exactly what you want without having to use apply:

>>> import pandas as pd
>>> from sklearn.preprocessing import MinMaxScaler


>>> scaler = MinMaxScaler()

>>> dfTest = pd.DataFrame({'A':[14.00,90.20,90.95,96.27,91.21],
                           'B':[103.02,107.26,110.35,114.23,114.68],
                           'C':['big','small','big','small','small']})

>>> dfTest[['A', 'B']] = scaler.fit_transform(dfTest[['A', 'B']])

>>> dfTest
          A         B      C
0  0.000000  0.000000    big
1  0.926219  0.363636  small
2  0.935335  0.628645    big
3  1.000000  0.961407  small
4  0.938495  1.000000  small
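
As a small follow-up (my own sketch, not part of the original answer): because the scaler object keeps the fitted min/max, it can be reused to put new data on the same scale, or to undo the scaling with inverse_transform.

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
dfTest = pd.DataFrame({'A': [14.00, 90.20, 90.95, 96.27, 91.21],
                       'B': [103.02, 107.26, 110.35, 114.23, 114.68],
                       'C': ['big', 'small', 'big', 'small', 'small']})

# Fit on the training columns and scale them in place
dfTest[['A', 'B']] = scaler.fit_transform(dfTest[['A', 'B']])

# Reuse the already-fitted scaler on new rows (hypothetical data)
dfNew = pd.DataFrame({'A': [50.0], 'B': [108.0]})
dfNew[['A', 'B']] = scaler.transform(dfNew[['A', 'B']])

# Recover the original values if they are needed again
original_values = scaler.inverse_transform(dfTest[['A', 'B']])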

Answer 1


Like this?

dfTest = pd.DataFrame({
           'A':[14.00,90.20,90.95,96.27,91.21],
           'B':[103.02,107.26,110.35,114.23,114.68], 
           'C':['big','small','big','small','small']
         })
dfTest[['A','B']] = dfTest[['A','B']].apply(
                           lambda x: MinMaxScaler().fit_transform(x))
dfTest

    A           B           C
0   0.000000    0.000000    big
1   0.926219    0.363636    small
2   0.935335    0.628645    big
3   1.000000    0.961407    small
4   0.938495    1.000000    small

Answer 2


As mentioned in pir's comment, the .apply(lambda el: scale.fit_transform(el)) method will produce the following warning:

DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.

Converting your columns to numpy arrays should do the job (I prefer StandardScaler):

from sklearn.preprocessing import StandardScaler
scale = StandardScaler()

dfTest[['A','B','C']] = scale.fit_transform(dfTest[['A','B','C']].as_matrix())

Edit Nov 2018 (Tested for pandas 0.23.4)–

As Rob Murray mentions in the comments, in the current (v0.23.4) version of pandas .as_matrix() returns FutureWarning. Therefore, it should be replaced by .values:

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()

scaler.fit_transform(dfTest[['A','B']].values)

Edit May 2019 (Tested for pandas 0.24.2)–

As joelostblom mentions in the comments, “Since 0.24.0, it is recommended to use .to_numpy() instead of .values.”

Updated example:

import pandas as pd
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
dfTest = pd.DataFrame({
               'A':[14.00,90.20,90.95,96.27,91.21],
               'B':[103.02,107.26,110.35,114.23,114.68],
               'C':['big','small','big','small','small']
             })
dfTest[['A', 'B']] = scaler.fit_transform(dfTest[['A','B']].to_numpy())
dfTest
      A         B      C
0 -1.995290 -1.571117    big
1  0.436356 -0.603995  small
2  0.460289  0.100818    big
3  0.630058  0.985826  small
4  0.468586  1.088469  small

Answer 3


df = pd.DataFrame(scale.fit_transform(df.values), columns=df.columns, index=df.index)

This should work without deprecation warnings.


Answer 4


You can do it using pandas only:

In [235]:
dfTest = pd.DataFrame({'A':[14.00,90.20,90.95,96.27,91.21],'B':[103.02,107.26,110.35,114.23,114.68], 'C':['big','small','big','small','small']})
df = dfTest[['A', 'B']]
df_norm = (df - df.min()) / (df.max() - df.min())
print df_norm
print pd.concat((df_norm, dfTest.C),1)

          A         B
0  0.000000  0.000000
1  0.926219  0.363636
2  0.935335  0.628645
3  1.000000  0.961407
4  0.938495  1.000000
          A         B      C
0  0.000000  0.000000    big
1  0.926219  0.363636  small
2  0.935335  0.628645    big
3  1.000000  0.961407  small
4  0.938495  1.000000  small
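
The snippet above is written for Python 2; a minimal Python 3 equivalent of the same idea (my adaptation, using the keyword axis argument that current pandas expects) looks like this:

import pandas as pd

dfTest = pd.DataFrame({'A': [14.00, 90.20, 90.95, 96.27, 91.21],
                       'B': [103.02, 107.26, 110.35, 114.23, 114.68],
                       'C': ['big', 'small', 'big', 'small', 'small']})

df = dfTest[['A', 'B']]
# Min-max normalization done directly with pandas, no sklearn involved
df_norm = (df - df.min()) / (df.max() - df.min())
print(df_norm)
print(pd.concat([df_norm, dfTest['C']], axis=1))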

Answer 5


I know it’s a very old comment, but still:

Instead of using single bracket (dfTest['A']), use double brackets (dfTest[['A']]).

i.e: min_max_scaler.fit_transform(dfTest[['A']]).

I believe this will give the desired result.
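
For completeness, a small sketch of mine showing why the double brackets matter: the scaler expects a 2-D array of shape (n_samples, n_features). dfTest[['A']] is a one-column DataFrame, whereas dfTest['A'] is a 1-D Series; reshaping the Series values is an equivalent workaround.

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

min_max_scaler = MinMaxScaler()
dfTest = pd.DataFrame({'A': [14.00, 90.20, 90.95, 96.27, 91.21]})

# 2-D input: a one-column DataFrame, which is what the scaler expects
ok = min_max_scaler.fit_transform(dfTest[['A']])

# Equivalent: reshape the 1-D Series values into a single column
also_ok = min_max_scaler.fit_transform(dfTest['A'].values.reshape(-1, 1))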


Difference between two dates in Python

Question: Difference between two dates in Python


I have two different dates and I want to know the difference in days between them. The format of the date is YYYY-MM-DD.

I have a function that can ADD or SUBTRACT a given number to a date:

def addonDays(a, x):
   ret = time.strftime("%Y-%m-%d",time.localtime(time.mktime(time.strptime(a,"%Y-%m-%d"))+x*3600*24+3600))      
   return ret

where A is the date and x the number of days I want to add. And the result is another date.

I need a function where I can give two dates and the result would be an int with date difference in days.


Answer 0


Use - to get the difference between two datetime objects and take the days member.

from datetime import datetime

def days_between(d1, d2):
    d1 = datetime.strptime(d1, "%Y-%m-%d")
    d2 = datetime.strptime(d2, "%Y-%m-%d")
    return abs((d2 - d1).days)
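
A quick usage example (my addition), matching the YYYY-MM-DD format from the question:

>>> days_between("2011-01-01", "2011-06-30")
180
>>> days_between("2011-06-30", "2011-01-01")  # abs() makes the argument order irrelevant
180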

Answer 1


Another short solution:

from datetime import date

def diff_dates(date1, date2):
    return abs(date2-date1).days

def main():
    d1 = date(2013,1,1)
    d2 = date(2013,9,13)
    result1 = diff_dates(d2, d1)
    print('{} days between {} and {}'.format(result1, d1, d2))
    print ("Happy programmer's day!")

main()

Answer 2


I tried the code posted by larsmans above, but there are a couple of problems:

1) The code as-is will throw the error mentioned by mauguerra. 2) If you change the code to the following:

...
    d1 = d1.strftime("%Y-%m-%d")
    d2 = d2.strftime("%Y-%m-%d")
    return abs((d2 - d1).days)

This will convert your datetime objects to strings, but two things:

1) Trying to do d2 - d1 will fail, as you cannot use the minus operator on strings, and 2) if you read the first line of the above answer, it says you want to use the - operator on two datetime objects, but you just converted them to strings.

What I found is that you literally only need the following:

import datetime

end_date = datetime.datetime.utcnow()
start_date = end_date - datetime.timedelta(days=8)
difference_in_days = abs((end_date - start_date).days)

print(difference_in_days)

Answer 3


Try this:

data = pd.read_csv(r'C:\Users\Desktop\Data Exploration.csv')
data.head(5)
first=data['1st Gift']
last=data['Last Gift']
maxi=data['Largest Gift']
l_1=np.mean(first)-3*np.std(first)
u_1=np.mean(first)+3*np.std(first)


m=np.abs(data['1st Gift']-np.mean(data['1st Gift']))>3*np.std(data['1st Gift'])
pd.value_counts(m)
l=first[m]
data.loc[:,'1st Gift'][m==True]=np.mean(data['1st Gift'])+3*np.std(data['1st Gift'])
data['1st Gift'].head()




m=np.abs(data['Last Gift']-np.mean(data['Last Gift']))>3*np.std(data['Last Gift'])
pd.value_counts(m)
l=last[m]
data.loc[:,'Last Gift'][m==True]=np.mean(data['Last Gift'])+3*np.std(data['Last Gift'])
data['Last Gift'].head()

Answer 4


pd.date_range('2019-01-01', '2019-02-01').shape[0]


How to prevent Google Colab from disconnecting?

Question: How to prevent Google Colab from disconnecting?


Q: Is there any way to programmatically prevent Google Colab from disconnecting on a timeout?

The following describes the conditions causing a notebook to automatically disconnect:

Google Colab notebooks have an idle timeout of 90 minutes and absolute timeout of 12 hours. This means, if user does not interact with his Google Colab notebook for more than 90 minutes, its instance is automatically terminated. Also, maximum lifetime of a Colab instance is 12 hours.

Naturally, we want to automatically squeeze the maximum out of the instance, without having to manually interact with it constantly. Here I will assume commonly seen system requirements:

  • Ubuntu 18 LTS / Windows 10 / Mac Operating systems
  • In case of Linux-based systems, using popular DEs like Gnome 3 or Unity
  • Firefox or Chromium browsers

I should point out here that such behavior does not violate Google Colab’s Terms of Use, although it is not encouraged according to their FAQ (in short: morally it is not okay to use up all of the GPUs if you don’t really need it).


My current solution is very dumb:

  • First, I turn the screensaver off, so my screen is always on.
  • I have an Arduino board, so I just turned it into a rubber ducky usb and make it emulate primitive user interaction while I sleep (just because I have it at hand for other use-cases).

Are there better ways?


Answer 0


Edit: Apparently the solution is very easy, and doesn’t need any JavaScript. Just create a new cell at the bottom having the following line:

while True:pass

Now keep the cell in the run sequence so that the infinite loop won't stop, which keeps your session alive.

Old method: Set a JavaScript interval to click on the connect button every 60 seconds. Open the developer settings (in your web browser) with Ctrl+Shift+I, then click on the Console tab and type this at the console prompt. (For Mac, press Option+Command+I.)

function ConnectButton(){
    console.log("Connect pushed"); 
    document.querySelector("#top-toolbar > colab-connect-button").shadowRoot.querySelector("#connect").click() 
}
setInterval(ConnectButton,60000);

Answer 1


Since the id of the connect button is now changed to “colab-connect-button”, the following code can be used to keep clicking on the button.

function ClickConnect(){
    console.log("Clicked on connect button"); 
    document.querySelector("colab-connect-button").click()
}
setInterval(ClickConnect,60000)

If this still doesn't work, then follow the steps given below:

  1. Right-click on the connect button (on the top-right side of the colab)
  2. Click on inspect
  3. Get the HTML id of the button and substitute in the following code
function ClickConnect(){
    console.log("Clicked on connect button"); 
    document.querySelector("Put ID here").click() // Change id here
}
setInterval(ClickConnect,60000)

Answer 2


Well, this is working for me:

Run the following code in the console and it will prevent you from disconnecting. Press Ctrl + Shift + I to open the inspector view, then go to the console.

function ClickConnect(){
    console.log("Working"); 
    document.querySelector("colab-toolbar-button#connect").click() 
}
setInterval(ClickConnect,60000)

How to prevent google colab from disconnecting


Answer 3


For me the following examples:

  • document.querySelector("#connect").click() or
  • document.querySelector("colab-toolbar-button#connect").click() or
  • document.querySelector("colab-connect-button").click()

were throwing errors.

I had to adapt them to the following:

Version 1:

function ClickConnect(){
  console.log("Connnect Clicked - Start"); 
  document.querySelector("#top-toolbar > colab-connect-button").shadowRoot.querySelector("#connect").click();
  console.log("Connnect Clicked - End"); 
};
setInterval(ClickConnect, 60000)

Version 2: If you would like to be able to stop the function, here is the new code:

var startClickConnect = function startClickConnect(){
    var clickConnect = function clickConnect(){
        console.log("Connnect Clicked - Start");
        document.querySelector("#top-toolbar > colab-connect-button").shadowRoot.querySelector("#connect").click();
        console.log("Connnect Clicked - End"); 
    };

    var intervalId = setInterval(clickConnect, 60000);

    var stopClickConnectHandler = function stopClickConnect() {
        console.log("Connnect Clicked Stopped - Start");
        clearInterval(intervalId);
        console.log("Connnect Clicked Stopped - End");
    };

    return stopClickConnectHandler;
};

var stopClickConnect = startClickConnect();

In order to stop, call:

stopClickConnect();

Answer 4


Create a Python script on your PC with pynput:

from pynput.mouse import Button, Controller
import time

mouse = Controller()

while True:
    mouse.click(Button.left, 1)
    time.sleep(30)

Run this code on your desktop, then point the mouse arrow at the directory structure in Colab's left panel (the file section), over any directory. The code will keep clicking on that directory every 30 seconds, so it will expand and collapse every 30 seconds and your session will not expire. Important: you have to run this code on your PC, not in Colab.


Answer 5


Instead of clicking the connect button, I just click the comment button to keep my session alive. (August 2020)

function ClickConnect(){

console.log("Working"); 
document.querySelector("#comments > span").click() 
}
setInterval(ClickConnect,5000)

Answer 6


I use a macro program to periodically click on the RAM/Disk button so the model can train all night. The trick is to configure the macro program to click the RAM/Disk Colab toolbar button twice, with a short interval between the two clicks, so that even if the runtime gets disconnected it will reconnect (the first click closes the dialog box and the second click reconnects). However, you still have to leave your laptop open all night, and maybe pin the Colab tab.


Answer 7


The above answers, which rely on scripts, may work well. I have a solution (or a kind of trick) for that annoying disconnection that needs no scripts, especially when your program must read data from your Google Drive, like training a deep learning model. In that case, using a script to reconnect is of no use, because once you disconnect from Colab the program is just dead; you would have to manually connect to Google Drive again so your model can read the dataset, and a script will not do that.
I've already tested this many times and it works well.
When you run a program on the Colab page in a browser (I use Chrome), just remember not to do anything with the browser once your program starts running: don't switch to other web pages, don't open or close another web page, and so on. Just leave it alone and wait for your program to finish. You can switch to other software, like PyCharm, to keep writing code, but don't switch to another web page. I don't know why opening, closing, or switching to other pages causes connection problems with the Google Colab page, but every time I disturb my browser, like doing some searching, my connection to Colab soon breaks down.


Answer 8


Try this:

function ClickConnect(){
  console.log("Working"); 
  document
    .querySelector("#top-toolbar > colab-connect-button")
    .shadowRoot
    .querySelector("#connect")
    .click()
}

setInterval(ClickConnect,60000)

Answer 9


Using python selenium

from selenium.webdriver.common.keys import Keys
from selenium import webdriver
import time   

driver = webdriver.Chrome('/usr/lib/chromium-browser/chromedriver')

notebook_url = ''
driver.get(notebook_url)

# run all cells
driver.find_element_by_tag_name('body').send_keys(Keys.CONTROL + Keys.F9)
time.sleep(5)

# click to stay connected
start_time = time.time()
current_time = time.time()
max_time = 11*59*60 #12hours

while (current_time - start_time) < max_time:
    webdriver.ActionChains(driver).send_keys(Keys.ESCAPE).perform()
    driver.find_element_by_xpath('//*[@id="top-toolbar"]/colab-connect-button').click()
    time.sleep(30)
    current_time = time.time()
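
Note (my addition): the find_element_by_* helpers used above were deprecated in Selenium 4 and removed in later 4.x releases. On newer Selenium versions the equivalent lookups use the By locator class; the rest of the logic stays the same.

from selenium.webdriver.common.by import By

# Assumes the same `driver` and `Keys` objects as in the snippet above
driver.find_element(By.TAG_NAME, 'body').send_keys(Keys.CONTROL + Keys.F9)
driver.find_element(By.XPATH, '//*[@id="top-toolbar"]/colab-connect-button').click()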

Answer 10


I don’t believe the JavaScript solutions work anymore. I was doing it from within my notebook with:

    from IPython.display import display, HTML
    js = ('<script>function ConnectButton(){ '
           'console.log("Connect pushed"); '
           'document.querySelector("#connect").click()} '
           'setInterval(ConnectButton,3000);</script>')
    display(HTML(js))

When you first do a Run all (before the JavaScript or Python code has started), the console displays:

Connected to 
wss://colab.research.google.com/api/kernels/0e1ce105-0127-4758-90e48cf801ce01a3/channels?session_id=5d8...

However, every time the JavaScript runs, you see the console.log portion, but the click portion simply gives:

Connect pushed

Uncaught TypeError: Cannot read property 'click' of null
 at ConnectButton (<anonymous>:1:92)

Others suggested the button name has changed to #colab-connect-button, but that gives the same error.

After the runtime is started, the button changes to show RAM/DISK, and a drop-down is presented. Clicking on the drop-down creates a new <DIV class=goog menu...> that was not in the DOM previously, with two options, "Connect to hosted runtime" and "Connect to local runtime". If the console window is open and showing elements, you can see this DIV appear when you click the drop-down element. Simply moving the mouse focus between the two options in the new window adds additional elements to the DOM; as soon as the mouse loses focus, they are removed from the DOM completely, even without clicking.


Answer 11


I tried the snippets above but they did not work for me. So here is my JS code for reconnecting.

let interval = setInterval(function(){
    let ok = document.getElementById('ok');
    if (ok != null){
        console.log("Connect pushed");
        ok.click();
    }
}, 60000)

You can run it the same way (on your browser's console). If you want to stop the script, enter clearInterval(interval); to start it again, re-run the setInterval call.

I hope this helps you.


Answer 12


An updated one; it works for me.

function ClickConnect(){
console.log("Working"); 
document.querySelector("paper-icon-button").click()
}
const myjob = setInterval(ClickConnect, 60000)

If isn’t working you for you guys try clear it by running:

clearInterval(myjob)

Answer 13


This one worked for me (it seems like they changed the button classname or id) :

function ClickConnect(){
    console.log("Working"); 
    document.querySelector("colab-connect-button").click() 
}
setInterval(ClickConnect,60000)

Answer 14


The most voted answer certainly works for me, but it makes the "Manage sessions" window pop up again and again.
I've solved that by auto-clicking the refresh button from the browser console, as shown below:

function ClickRefresh(){
    console.log("Clicked on refresh button"); 
    document.querySelector("paper-icon-button").click()
}
setInterval(ClickRefresh, 60000)

Feel free to contribute more snippets for this at this gist https://gist.github.com/Subangkar/fd1ef276fd40dc374a7c80acc247613e


Answer 15


Perhaps many of the previous solutions no longer work. For example, the code below keeps creating new code cells in Colab, although it does work. Undoubtedly, creating a bunch of code cells is an inconvenience: if too many code cells are created over a few hours of running and there is not enough RAM, the browser may freeze.

This repeatedly creates code cells:

function ClickConnect(){
console.log("Working"); 
document.querySelector("colab-toolbar-button").click() 
}setInterval(ClickConnect,60000)

But I found that the code below works and doesn't cause any problems. In the Colab notebook tab, press Ctrl + Shift + I and paste the code below into the console. An interval of 120000 ms is enough.

function ClickConnect(){
console.log("Working"); 
document.querySelector("colab-toolbar-button#connect").click() 
}setInterval(ClickConnect,120000)

I tested this code in Firefox in November 2020. It works in Chrome too.


Answer 16


I would recommend using jQuery (it seems that Colab includes jQuery by default).

function ClickConnect(){
  console.log("Working");
  $("colab-toolbar-button").click();
}
setInterval(ClickConnect,60000);

Answer 17


I have a problem with these javascript functions:

function ClickConnect(){
    console.log("Clicked on connect button"); 
    document.querySelector("colab-connect-button").click()
}
setInterval(ClickConnect,60000)

They print "Clicked on connect button" to the console before the button is actually clicked. As you can see from different answers in this thread, the id of the connect button has changed a couple of times since Google Colab was launched, and it could change again in the future. So if you copy an old answer from this thread, it may say "Clicked on connect button" but actually not do that. Of course, if the clicking doesn't work it will print an error to the console, but what if you don't happen to see it? So you'd better do this:

function ClickConnect(){
    document.querySelector("colab-connect-button").click()
    console.log("Clicked on connect button"); 
}
setInterval(ClickConnect,60000)

And you’ll definitely see if it truly works or not.


Answer 18


function ClickConnect()
{
    console.log("Working...."); 
    document.querySelector("paper-button#comments").click()
}
setInterval(ClickConnect,600)

This worked for me, but use it wisely.

happy learning :)


Answer 19


The following LATEST solution works for me:

function ClickConnect(){
  colab.config
  console.log("Connnect Clicked - Start"); 
  document.querySelector("#top-toolbar > colab-connect-button").shadowRoot.querySelector("#connect").click();
  console.log("Connnect Clicked - End");
};
setInterval(ClickConnect, 60000)

Answer 20


The javascript below works for me. Credits to @artur.k.space.

function ColabReconnect() {
    var dialog = document.querySelector("colab-dialog.yes-no-dialog");
    var dialogTitle = dialog && dialog.querySelector("div.content-area>h2");
    if (dialogTitle && dialogTitle.innerText == "Runtime disconnected") {
        dialog.querySelector("paper-button#ok").click();
        console.log("Reconnecting...");
    } else {
        console.log("ColabReconnect is in service.");
    }
}
timerId = setInterval(ColabReconnect, 60000);

In the Colab notebook, press Ctrl + Shift + I. Copy and paste the script into the console prompt, then hit Enter before closing the editor.

This way, the function will check every 60 seconds whether the on-screen connection dialog is shown, and if it is, it will click the OK button for you automatically.


Answer 21


Well, I am not a Python guy, nor do I know what the actual use of this "Colab" is; I use it as a build system. I set up SSH forwarding in it, then put in this code and just leave it running, and yeah, it works.

import getpass
authtoken = getpass.getpass()

Answer 22


This code keeps clicking "Refresh folder" in the file explorer pane.

function ClickRefresh(){
  console.log("Working"); 
  document.querySelector("[icon='colab:folder-refresh']").click()
}
const myjob = setInterval(ClickRefresh, 60000)

Answer 23


GNU Colab lets you run a standard persistent desktop environment on top of a Colaboratory instance.

Indeed it contains a mechanism to not let machines die of idling.

Here’s a video demonstration.


Answer 24


You can also use Python to press the arrow keys. I added a little bit of randomness in the following code as well.

from pyautogui import press, typewrite, hotkey
import time
from random import shuffle

array = ["left", "right", "up", "down"]

while True:
    shuffle(array)
    time.sleep(10)
    press(array[0])
    press(array[1])
    press(array[2])
    press(array[3])

Answer 25


Just run the code below after the cell you want to run, to save yourself from data loss.

!python

Also to exit from this mode, write

exit()

Answer 26


I was looking for a solution until I found a Python 3 script that randomly moves the mouse back and forth and clicks, always in the same place, but that's enough to fool Colab into thinking I'm active in the notebook, so it doesn't disconnect.

import numpy as np
import time
import mouse
import threading

def move_mouse():
    while True:
        random_row = np.random.random_sample()*100
        random_col = np.random.random_sample()*10
        random_time = np.random.random_sample()*np.random.random_sample() * 100
        mouse.wheel(1000)
        mouse.wheel(-1000)
        mouse.move(random_row, random_col, absolute=False, duration=0.2)
        mouse.move(-random_row, -random_col, absolute=False, duration = 0.2)
        mouse.LEFT
        time.sleep(random_time)


x = threading.Thread(target=move_mouse)
x.start()

You need to install the required packages: sudo -H pip3 install <package_name>. Then just run it (on your local machine) with sudo (as it takes control of the mouse) and it should work, allowing you to take full advantage of Colab's 12-hour sessions.

Credits: For those using Colab (Pro): Preventing Session from disconnecting due to inactivity


Answer 27


function ClickConnect(){
    console.log("Clicked on connect button"); 
    document.querySelector("connect").click() // Change id here
}
setInterval(ClickConnect,60000)

Try the above code; it worked for me :)


What causes [*a] to overallocate?

Question: What causes [*a] to overallocate?


Apparently list(a) doesn’t overallocate, [x for x in a] overallocates at some points, and [*a] overallocates all the time?

Here are sizes n from 0 to 12 and the resulting sizes in bytes for the three methods:

0 56 56 56
1 64 88 88
2 72 88 96
3 80 88 104
4 88 88 112
5 96 120 120
6 104 120 128
7 112 120 136
8 120 120 152
9 128 184 184
10 136 184 192
11 144 184 200
12 152 184 208

Computed like this, reproducible at repl.it, using Python 3.8:

from sys import getsizeof

for n in range(13):
    a = [None] * n
    print(n, getsizeof(list(a)),
             getsizeof([x for x in a]),
             getsizeof([*a]))

So: How does this work? How does [*a] overallocate? Actually, what mechanism does it use to create the result list from the given input? Does it use an iterator over a and use something like list.append? Where is the source code?

(Colab with data and code that produced the images; the plots, one zoomed in to smaller n and one zoomed out to larger n, are not reproduced here.)


Answer 0


[*a] is internally doing the C equivalent of:

  1. Make a new, empty list
  2. Call newlist.extend(a)
  3. Return the list.

So if you expand your test to:

from sys import getsizeof

for n in range(13):
    a = [None] * n
    l = []
    l.extend(a)
    print(n, getsizeof(list(a)),
             getsizeof([x for x in a]),
             getsizeof([*a]),
             getsizeof(l))

Try it online!

you’ll see the results for getsizeof([*a]) and l = []; l.extend(a); getsizeof(l) are the same.

This is usually the right thing to do; when extending you’re usually expecting to add more later, and similarly for generalized unpacking, it’s assumed that multiple things will be added one after the other. [*a] is not the normal case; Python assumes there are multiple items or iterables being added to the list ([*a, b, c, *d]), so overallocation saves work in the common case.

By contrast, a list constructed from a single, presized iterable (with list()) may not grow or shrink during use, and overallocating is premature until proven otherwise; Python recently fixed a bug that made the constructor overallocate even for inputs with known size.

As for list comprehensions, they’re effectively equivalent to repeated appends, so you’re seeing the final result of the normal overallocation growth pattern when adding an element at a time.

To be clear, none of this is a language guarantee. It’s just how CPython implements it. The Python language spec is generally unconcerned with specific growth patterns in list (aside from guaranteeing amortized O(1) appends and pops from the end). As noted in the comments, the specific implementation changes again in 3.9; while it won’t affect [*a], it could affect other cases where what used to be “build a temporary tuple of individual items and then extend with the tuple” now becomes multiple applications of LIST_APPEND, which can change when the overallocation occurs and what numbers go into the calculation.
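
To see the "repeated appends" growth pattern mentioned above directly, you can watch getsizeof after each append (a small sketch of mine; the exact byte counts depend on the CPython version and build):

from sys import getsizeof

l = []
for n in range(13):
    l.append(None)
    # the size jumps only when the overallocated capacity is exhausted
    print(n + 1, getsizeof(l))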


Answer 1


Full picture of what happens, building on the other answers and comments (especially ShadowRanger’s answer, which also explains why it’s done like that).

Disassembling shows that BUILD_LIST_UNPACK gets used:

>>> import dis
>>> dis.dis('[*a]')
  1           0 LOAD_NAME                0 (a)
              2 BUILD_LIST_UNPACK        1
              4 RETURN_VALUE

That’s handled in ceval.c, which builds an empty list and extends it (with a):

        case TARGET(BUILD_LIST_UNPACK): {
            ...
            PyObject *sum = PyList_New(0);
              ...
                none_val = _PyList_Extend((PyListObject *)sum, PEEK(i));

_PyList_Extend uses list_extend:

_PyList_Extend(PyListObject *self, PyObject *iterable)
{
    return list_extend(self, iterable);
}

Which calls list_resize with the sum of the sizes:

list_extend(PyListObject *self, PyObject *iterable)
    ...
        n = PySequence_Fast_GET_SIZE(iterable);
        ...
        m = Py_SIZE(self);
        ...
        if (list_resize(self, m + n) < 0) {

And that overallocates as follows:

list_resize(PyListObject *self, Py_ssize_t newsize)
{
  ...
    new_allocated = (size_t)newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6);

Let’s check that. Compute the expected number of spots with the formula above, and compute the expected byte size by multiplying it with 8 (as I’m using 64-bit Python here) and adding an empty list’s byte size (i.e., a list object’s constant overhead):

from sys import getsizeof
for n in range(13):
    a = [None] * n
    expected_spots = n + (n >> 3) + (3 if n < 9 else 6)
    expected_bytesize = getsizeof([]) + expected_spots * 8
    real_bytesize = getsizeof([*a])
    print(n,
          expected_bytesize,
          real_bytesize,
          real_bytesize == expected_bytesize)

Output:

0 80 56 False
1 88 88 True
2 96 96 True
3 104 104 True
4 112 112 True
5 120 120 True
6 128 128 True
7 136 136 True
8 152 152 True
9 184 184 True
10 192 192 True
11 200 200 True
12 208 208 True

Matches except for n = 0, which list_extend actually shortcuts, so actually that matches, too:

        if (n == 0) {
            ...
            Py_RETURN_NONE;
        }
        ...
        if (list_resize(self, m + n) < 0) {

Answer 2


These are going to be implementation details of the CPython interpreter, and so may not be consistent across other interpreters.

That said, you can see where the comprehension and list(a) behaviors come in here:

https://github.com/python/cpython/blob/master/Objects/listobject.c#L36

Specifically for the comprehension:

 * The growth pattern is:  0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
...

new_allocated = (size_t)newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6);

Just below those lines, there is list_preallocate_exact which is used when calling list(a).
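
A quick way to observe the difference between the exact preallocation of list(a) and the overallocating extend path used by [*a] (my sketch, echoing the measurements in the question):

from sys import getsizeof

a = [None] * 10
print(getsizeof(list(a)))  # exact allocation via list_preallocate_exact
print(getsizeof([*a]))     # larger, because the extend path overallocates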


Concise way to write (a + b == c or a + c == b or b + c == a)

Question: Concise way to write (a + b == c or a + c == b or b + c == a)


Is there a more compact or pythonic way to write the boolean expression

a + b == c or a + c == b or b + c == a

I came up with

a + b + c in (2*a, 2*b, 2*c)

but that is a little strange.


Answer 0


If we look at the Zen of Python, emphasis mine:

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren’t special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one– and preferably only one –obvious way to do it.
Although that way may not be obvious at first unless you’re Dutch.
Now is better than never.
Although never is often better than right now.
If the implementation is hard to explain, it’s a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea — let’s do more of those!

The most Pythonic solution is the one that is clearest, simplest, and easiest to explain:

a + b == c or a + c == b or b + c == a

Even better, you don’t even need to know Python to understand this code! It’s that easy. This is, without reservation, the best solution. Anything else is intellectual masturbation.

Furthermore, this is likely the best performing solution as well, as it is the only one out of all the proposals that short circuits. If a + b == c, only a single addition and comparison is done.


Answer 1


Solving the three equalities for a:

a in (b+c, b-c, c-b)
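
A quick sanity check (my addition, not part of the original answer) that the rearranged test matches the original condition, at least for small integers:

def original(a, b, c):
    return a + b == c or a + c == b or b + c == a

def rearranged(a, b, c):
    return a in (b + c, b - c, c - b)

# brute-force check over a small integer grid
assert all(original(a, b, c) == rearranged(a, b, c)
           for a in range(-5, 6)
           for b in range(-5, 6)
           for c in range(-5, 6))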

Answer 2


Python has an any function that does an or on all the elements of a sequence. Here I’ve converted your statement into a 3-element tuple.

any((a + b == c, a + c == b, b + c == a))

Note that or is short circuiting, so if calculating the individual conditions is expensive it might be better to keep your original construct.
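
To illustrate that caveat (my sketch, not part of the original answer): passing a tuple evaluates all three comparisons up front, whereas a generator expression stops at the first true one.

def expensive_eq(x, y):
    print("comparing", x, y)   # stand-in for an expensive check
    return x == y

a, b, c = 1, 2, 3

# tuple argument: all three comparisons run before any() even looks at them
any((expensive_eq(a + b, c), expensive_eq(a + c, b), expensive_eq(b + c, a)))

# generator argument: stops after the first True result
any(expensive_eq(x + y, z) for x, y, z in ((a, b, c), (a, c, b), (b, c, a)))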


Answer 3


If you know you’re only dealing with positive numbers, this will work, and is pretty clean:

a, b, c = sorted((a, b, c))
if a + b == c:
    do_stuff()

As I said, this only works for positive numbers; but if you know they’re going to be positive, this is a very readable solution IMO, even directly in the code as opposed to in a function.

You could do this, which might do a bit of repeated computation; but you didn’t specify performance as your goal:

from itertools import permutations

if any(x + y == z for x, y, z in permutations((a, b, c), 3)):
    do_stuff()

Or without permutations() and the possibility of repeated computations:

if any(x + y == z for x, y, z in [(a, b, c), (a, c, b), (b, c, a)]):
    do_stuff()

I would probably put this, or any other solution, into a function. Then you can just cleanly call the function in your code.

Personally, unless I needed more flexibility from the code, I would just use the first method in your question. It’s simple and efficient. I still might put it into a function:

def two_add_to_third(a, b, c):
    return a + b == c or a + c == b or b + c == a

if two_add_to_third(a, b, c):
    do_stuff()

That’s pretty Pythonic, and it’s quite possibly the most efficient way to do it (the extra function call aside); although you shouldn’t worry too much about performance anyway, unless it’s actually causing an issue.


回答 4

如果只会用到三个变量,那么你最初的写法:

a + b == c or a + c == b or b + c == a

已经很pythonic了。

如果您打算使用更多的变量,那么您的推理方法如下:

a + b + c in (2*a, 2*b, 2*c)

非常聪明,但请考虑一下原因。为什么这样做?
通过一些简单的算术运算,我们可以看到:

a + b = c
c = c
a + b + c == c + c == 2*c
a + b + c == 2*c

而这对于 a、b 或 c 中的任意一个都必须成立,也就是说,它会等于 2*a、2*b 或 2*c。任意数量的变量都是如此。

因此,一个快速的写法是把变量放进一个列表,然后用它们的总和去对照一个由加倍值组成的列表。

values = [a,b,c,d,e,...]
sum(values) in [2*x for x in values]

这样,要在方程中加入更多变量,你只需往 values 列表里添加 n 个新变量,而不必再写 n 个等式。

If you will only be using three variables then your initial method:

a + b == c or a + c == b or b + c == a

Is already very pythonic.

If you plan on using more variables then your method of reasoning with:

a + b + c in (2*a, 2*b, 2*c)

Is very smart but lets think about why. Why does this work?
Well through some simple arithmetic we see that:

a + b = c
c = c
a + b + c == c + c == 2*c
a + b + c == 2*c

And this will have to hold true for either a,b, or c, meaning that yes it will equal 2*a, 2*b, or 2*c. This will be true for any number of variables.

So a good way to write this quickly would be to simply have a list of your variables and check their sum against a list of the doubled values.

values = [a,b,c,d,e,...]
sum(values) in [2*x for x in values]

This way, to add more variables into the equation all you have to do is edit your values list by ‘n’ new variables, not write ‘n’ equations
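For instance, with made-up values for four variables:

values = [1, 2, 3, 6]
print(sum(values) in [2 * x for x in values])   # True, because 1 + 2 + 3 == 6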


回答 5

以下代码可用于将每个元素与其他元素的总和进行迭代比较,这是根据整个列表的总和(不包括该元素)计算得出的。

 l = [a,b,c]
 any(sum(l)-e == e for e in l)

The following code can be used to iteratively compare each element with the sum of the others, which is computed from sum of the whole list, excluding that element.

 l = [a,b,c]
 any(sum(l)-e == e for e in l)

回答 6

不要试图简化它。取而代之,用一个函数为你正在做的事情命名:

def any_two_sum_to_third(a, b, c):
  return a + b == c or a + c == b or b + c == a

if any_two_sum_to_third(foo, bar, baz):
  ...

把条件换成"聪明"的写法可能会让它更短,但不会让它更易读。然而,原样保留也算不上易读,因为一眼很难看出为什么要检查这三个条件。封装成函数可以让你要检查的内容一目了然。

关于性能,这种方法的确增加了函数调用的开销,但是除非您发现了绝对必须解决的瓶颈,否则不要牺牲性能的可读性。并始终进行测量,因为某些巧妙的实现能够在某些情况下优化并内联某些函数调用。

Don’t try and simplify it. Instead, name what you’re doing with a function:

def any_two_sum_to_third(a, b, c):
  return a + b == c or a + c == b or b + c == a

if any_two_sum_to_third(foo, bar, baz):
  ...

Replacing the condition with something “clever” might make it shorter, but it won’t make it more readable. Leaving it how it is isn’t very readable either, however, because it’s tricky to know at a glance why you’re checking those three conditions. This makes it absolutely crystal clear what you’re checking for.

Regarding performance, this approach does add the overhead of a function call, but never sacrifice readability for performance unless you’ve found a bottleneck you absolutely must fix. And always measure, as some clever implementations are capable of optimizing away and inlining some function calls in some circumstances.


回答 7

Python 3:

(a+b+c)/2 in (a,b,c)
(a+b+c+d)/2 in (a,b,c,d)
...

它可以缩放为任意数量的变量:

arr = [a,b,c,d,...]
sum(arr)/2 in arr

但是,总的来说,我同意除非您拥有三个以上的变量,否则原始版本更具可读性。

Python 3:

(a+b+c)/2 in (a,b,c)
(a+b+c+d)/2 in (a,b,c,d)
...

It scales to any number of variables:

arr = [a,b,c,d,...]
sum(arr)/2 in arr

However, in general I agree that unless you have more than three variables, the original version is more readable.


回答 8

(a+b-c)*(a+c-b)*(b+c-a) == 0

如果任意两项的总和等于第三项,则其中一个因子将为零,从而使整个乘积为零。

(a+b-c)*(a+c-b)*(b+c-a) == 0

If the sum of any two terms is equal to the third term, then one of the factors will be zero, making the entire product zero.
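A quick brute-force sanity check (a sketch; the helper names are made up) confirms the factored form agrees with the plain or version over a small integer range:

def by_product(a, b, c):
    return (a + b - c) * (a + c - b) * (b + c - a) == 0

def by_or(a, b, c):
    return a + b == c or a + c == b or b + c == a

assert all(by_product(a, b, c) == by_or(a, b, c)
           for a in range(-5, 6) for b in range(-5, 6) for c in range(-5, 6))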


回答 9

怎么样:

a == b + c or abs(a) == abs(b - c)

请注意,如果变量是无符号的,这将不起作用。

从代码优化的角度(至少在x86平台上),这似乎是最有效的解决方案。

现代编译器将内联两个abs()函数调用,并通过使用巧妙的CDQ,XOR和SUB指令序列来避免符号测试和随后的条件分支。因此,上面的高级代码将仅由低延迟,高吞吐量的ALU指令和仅两个条件表示。

How about just:

a == b + c or abs(a) == abs(b - c)

Note that this won’t work if variables are unsigned.

From the viewpoint of code optimization (at least on x86 platform) this seems to be the most efficient solution.

Modern compilers will inline both abs() function calls and avoid sign testing and subsequent conditional branch by using a clever sequence of CDQ, XOR, and SUB instructions. The above high-level code will thus be represented with only low-latency, high-throughput ALU instructions and just two conditionals.
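The equivalence with the original expression is easy to check by brute force over a small integer range (a sketch; the helper names are made up):

def by_abs(a, b, c):
    return a == b + c or abs(a) == abs(b - c)

def by_or(a, b, c):
    return a + b == c or a + c == b or b + c == a

assert all(by_abs(a, b, c) == by_or(a, b, c)
           for a in range(-5, 6) for b in range(-5, 6) for c in range(-5, 6))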


回答 10

Alex Varga 提供的解决方案 "a in (b+c, b-c, c-b)" 很紧凑,数学上也很漂亮,但我实际上不会这样写代码,因为下一个接手的开发人员无法立刻理解这段代码的目的。

马克·兰瑟姆(Mark Ransom)的解决方案

any((a + b == c, a + c == b, b + c == a))

更清晰一些,但相比下面这种写法并没有简洁多少:

a + b == c or a + c == b or b + c == a

当编写别人将来要看的代码,或者自己很久之后早已忘记当时想法还要再看的代码时,写得太短或太"聪明"往往弊大于利。代码应当易读。简洁是好事,但不能简洁到下一个程序员看不懂的地步。

The solution provided by Alex Varga “a in (b+c, b-c, c-b)” is compact and mathematically beautiful, but I wouldn’t actually write code that way because the next developer coming along would not immediately understand the purpose of the code.

Mark Ransom’s solution of

any((a + b == c, a + c == b, b + c == a))

is more clear but not much more succinct than

a + b == c or a + c == b or b + c == a

When writing code that someone else will have to look at, or that I will have to look at a long time later when I have forgotten what I was thinking when I wrote it, being too short or clever tends to do more harm than good. Code should be readable. So succinct is good, but not so succinct that the next programmer can’t understand it.


回答 11

题目要求的是更紧凑或者更 Pythonic,我这里尝试的是更紧凑。

给定

import functools, itertools
f = functools.partial(itertools.permutations, r = 3)
def g(x,y,z):
    return x + y == z

比原版少2个字符

any(g(*args) for args in f((a,b,c)))

测试:

assert any(g(*args) for args in f((a,b,c))) == (a + b == c or a + c == b or b + c == a)

另外,给出:

h = functools.partial(itertools.starmap, g)

这是等效的

any(h(f((a,b,c))))

Request is for more compact OR more pythonic – I tried my hand at more compact.

given

import functools, itertools
f = functools.partial(itertools.permutations, r = 3)
def g(x,y,z):
    return x + y == z

This is 2 characters less than the original

any(g(*args) for args in f((a,b,c)))

test with:

assert any(g(*args) for args in f((a,b,c))) == (a + b == c or a + c == b or b + c == a)

additionally, given:

h = functools.partial(itertools.starmap, g)

This is equivalent

any(h(f((a,b,c))))

回答 12

我想介绍一下我认为是最pythonic的答案:

def one_number_is_the_sum_of_the_others(a, b, c):
    return any((a == b + c, b == a + c, c == a + b))

一般情况,未优化:

def one_number_is_the_sum_of_the_others(numbers):
    for idx in range(len(numbers)):
        remaining_numbers = numbers[:]
        sum_candidate = remaining_numbers.pop(idx)
        if sum_candidate == sum(remaining_numbers):
            return True
    return False 

就 Python 之禅而言,我认为这个答案比其他答案更符合其中强调的那几条:

提姆·彼得斯(Tim Peters)撰写的《 Python之禅》

美丽胜于丑陋。
显式胜于隐式。
简单胜于复杂。
复杂胜于难懂。
扁平比嵌套更好。
稀疏胜于密集。
可读性很重要。
特殊情况还不足以打破规则。
尽管实用性胜过纯度。
错误绝不能默默传递。
除非明确地保持沉默。
面对模棱两可的想法,拒绝猜测的诱惑。
应该有一种-最好只有一种-显而易见的方法。
尽管除非您是荷兰人,否则一开始这种方式可能并不明显。
现在总比没有好。
虽然永远不做往往比现在立刻去做更好。
如果实现难以解释,那是个坏主意。
如果实现易于解释,则可能是个好主意。
命名空间是一个很棒的主意-让我们做更多这些吧!

I want to present what I see as the most pythonic answer:

def one_number_is_the_sum_of_the_others(a, b, c):
    return any((a == b + c, b == a + c, c == a + b))

The general case, non-optimized:

def one_number_is_the_sum_of_the_others(numbers):
    for idx in range(len(numbers)):
        remaining_numbers = numbers[:]
        sum_candidate = remaining_numbers.pop(idx)
        if sum_candidate == sum(remaining_numbers):
            return True
    return False 

In terms of the Zen of Python, I think the emphasized statements are followed more closely here than in the other answers:

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren’t special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one– and preferably only one –obvious way to do it.
Although that way may not be obvious at first unless you’re Dutch.
Now is better than never.
Although never is often better than right now.
If the implementation is hard to explain, it’s a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea — let’s do more of those!


回答 13

出于我编程的一个老习惯,我认为把复杂的表达式放在子句的右侧会更易读,像这样:

a == b+c or b == a+c or c == a+b

再加上括号:

((a == b+c) or (b == a+c) or (c == a+b))

而且我认为拆成多行也更清楚,像这样:

((a == b+c) or 
 (b == a+c) or 
 (c == a+b))

As an old programming habit of mine, I think placing the complex expression on the right of a clause makes it more readable, like this:

a == b+c or b == a+c or c == a+b

Plus ():

((a == b+c) or (b == a+c) or (c == a+b))

And I also think using multiple lines can make more sense, like this:

((a == b+c) or 
 (b == a+c) or 
 (c == a+b))

回答 14

以一般方式

m = a+b-c;
if (m == 0 || m == 2*a || m == 2*b) do_stuff ();

如果您可以操作输入变量,

c = a+b-c;
if (c==0 || c == 2*a || c == 2*b) do_stuff ();

如果想利用位运算技巧,可以使用 "!"、">> 1" 和 "<< 1"。

我避免了除法;虽然除法可以省掉两次乘法,但会带来舍入误差。另外,注意检查溢出。

In a generic way,

m = a+b-c;
if (m == 0 || m == 2*a || m == 2*b) do_stuff ();

if, manipulating an input variable is OK for you,

c = a+b-c;
if (c==0 || c == 2*a || c == 2*b) do_stuff ();

if you want to exploit using bit hacks, you can use “!”, “>> 1” and “<< 1”

I avoided division; though it would let us avoid two multiplications, it can introduce round-off errors. However, check for overflows.
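The snippets above are written in C-like pseudocode; a direct Python rendition of the first variant might look like this (a sketch; the function name is made up):

def two_of_three_sum(a, b, c):
    m = a + b - c
    # m == 0      -> a + b == c
    # m == 2 * a  -> a + c == b
    # m == 2 * b  -> b + c == a
    return m == 0 or m == 2 * a or m == 2 * b

print(two_of_three_sum(1, 2, 3))   # True
print(two_of_three_sum(4, 3, 2))   # False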


回答 15

def any_sum_of_others (*nums):
    num_elements = len(nums)
    for i in range(num_elements):
        discriminating_map = map(lambda j: -1 if j == i else 1, range(num_elements))
        if sum(n * u for n, u in zip(nums, discriminating_map)) == 0:
            return True
    return False

print(any_sum_of_others(0, 0, 0)) # True
print(any_sum_of_others(1, 2, 3)) # True
print(any_sum_of_others(7, 12, 5)) # True
print(any_sum_of_others(4, 2, 2)) # True
print(any_sum_of_others(1, -1, 0)) # True
print(any_sum_of_others(9, 8, -4)) # False
print(any_sum_of_others(4, 3, 2)) # False
print(any_sum_of_others(1, 1, 1, 1, 4)) # True
print(any_sum_of_others(0)) # True
print(any_sum_of_others(1)) # False
def any_sum_of_others (*nums):
    num_elements = len(nums)
    for i in range(num_elements):
        discriminating_map = map(lambda j: -1 if j == i else 1, range(num_elements))
        if sum(n * u for n, u in zip(nums, discriminating_map)) == 0:
            return True
    return False

print(any_sum_of_others(0, 0, 0)) # True
print(any_sum_of_others(1, 2, 3)) # True
print(any_sum_of_others(7, 12, 5)) # True
print(any_sum_of_others(4, 2, 2)) # True
print(any_sum_of_others(1, -1, 0)) # True
print(any_sum_of_others(9, 8, -4)) # False
print(any_sum_of_others(4, 3, 2)) # False
print(any_sum_of_others(1, 1, 1, 1, 4)) # True
print(any_sum_of_others(0)) # True
print(any_sum_of_others(1)) # False

在python中连接字符串和整数

问题:在python中连接字符串和整数

假设在 Python 中你有:

s = "string"
i = 0
print s+i

会报错,于是你写成:

print s+str(i) 

不会出错。

我认为这是处理int和字符串连接的笨拙方式。甚至Java也不需要显式转换为String来进行这种连接。有没有更好的方法来进行这种串联,即在Python中无需显式转换?

In python say you have

s = "string"
i = 0
print s+i

will give you error so you write

print s+str(i) 

to not get error.

I think this is quite a clumsy way to handle int and string concatenation. Even Java does not need explicit casting to String to do this sort of concatenation. Is there a better way to do this sort of concatenation i.e without explicit casting in Python?


回答 0

现代字符串格式:

"{} and {}".format("string", 1)

Modern string formatting:

"{} and {}".format("string", 1)

回答 1

没有字符串格式:

>> print 'Foo',0
Foo 0

No string formatting:

>> print 'Foo',0
Foo 0
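This is Python 2 print syntax; in Python 3, print is a function and the equivalent (which also inserts a space between the arguments) would be:

print('Foo', 0)   # Foo 0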

回答 2

使用新式的 .format() 方法进行字符串格式化(采用 .format() 的默认行为):

 '{}{}'.format(s, i)

或者使用较旧但"仍然通用"的 % 格式化:

 '%s%d' %(s, i)

在以上两个示例中,拼接的两项之间没有空格。如果需要空格,直接把它加进格式字符串即可。

这些方式在如何拼接各项、项与项之间的空格等方面提供了很多控制和灵活性。有关格式规范的详细信息请参见此处。

String formatting, using the new-style .format() method (with the defaults .format() provides):

 '{}{}'.format(s, i)

Or the older, but “still sticking around”, %-formatting:

 '%s%d' %(s, i)

In both examples above there’s no space between the two items concatenated. If space is needed, it can simply be added in the format strings.

These provide a lot of control and flexibility about how to concatenate items, the space between them etc. For details about format specifications see this.
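For example, reusing the question's s and i, adding a space is just a matter of putting it into the format string:

s = "string"
i = 0
'{} {}'.format(s, i)   # 'string 0'
'%s %d' % (s, i)       # 'string 0'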


回答 3

Python是一种有趣的语言,尽管通常有一种(或两种)“显而易见”的方式来完成任何给定的任务,但灵活性仍然存在。

s = "string"
i = 0

print (s + repr(i))

上面的代码段是用 Python 3 语法写的,不过 print 后面的括号一直都是允许的(可选),直到 3.x 版本才变为强制。

希望这可以帮助。

凯特琳

Python is an interesting language in that while there is usually one (or two) “obvious” ways to accomplish any given task, flexibility still exists.

s = "string"
i = 0

print (s + repr(i))

The above code snippet is written in Python3 syntax but the parentheses after print were always allowed (optional) until version 3 made them mandatory.

Hope this helps.

Caitlin
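One caveat worth noting (not part of the original answer): for integers repr() and str() produce the same text, but for other types they can differ, since repr() of a string keeps the quotes, so str() is usually what you want for display:

print(repr(0), str(0))        # 0 0
print(repr('hi'), str('hi'))  # 'hi' hi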


回答 4

format()方法可用于连接字符串和整数

print(s+"{}".format(i))

format() method can be used to concatenate string and integer

print(s+"{}".format(i))

回答 5

如果你只是想打印出来,可以这样做:

print(s , i)

if you only want to print, you can do this:

print(s , i)

回答 6

在python 3.6及更高版本中,您可以像这样格式化它:

new_string = f'{s} {i}'
print(new_string)

要不就:

print(f'{s} {i}')

in python 3.6 and newer, you can format it just like this:

new_string = f'{s} {i}'
print(new_string)

or just:

print(f'{s} {i}')

减少Django的内存使用量。低挂水果?

问题:减少Django的内存使用量。低挂水果?

我的内存使用量随着时间的推移而增加,并且重新启动Django对用户而言并不友好。

我不确定如何分析内存使用情况,但是一些有关如何开始测量的提示将很有用。

我感觉有些简单的步骤可以带来很大的收益。确保将“调试”设置为“假”是显而易见的。

有人能建议其他措施吗?在低流量的网站上做缓存能带来多大改善?

在这种情况下,我使用mod_python在Apache 2.x下运行。我听说mod_wsgi较为精简,但在此阶段进行切换将非常棘手,除非我知道收益会很大。

编辑:感谢到目前为止的提示。关于如何找出是什么在消耗内存,有什么建议吗?有没有关于 Python 内存分析的指南?

同样如前所述,有些事情会使切换到mod_wsgi变得很棘手,因此我想对在朝这个方向努力之前所能获得的收益有所了解。

编辑:卡尔在这里发布了更详细的回复,值得一读:Django部署:减少Apache的开销

编辑:Graham Dumpleton 的文章是我在 MPM 和 mod_wsgi 相关内容上找到的最好的一篇。不过令我失望的是,没有人能提供关于如何调试应用程序本身内存使用情况的信息。

最终编辑:好吧,我一直在与Webfaction讨论这个问题,看他们是否可以协助重新编译Apache,这就是他们的话:

“我真的认为切换到MPM Worker + mod_wsgi设置不会给您带来太大的好处。我估计您可能可以节省20MB左右,但可能不超过20MB。”

所以!这让我回到了最初的问题(我仍然一头雾水)。到底该如何定位问题所在?有一条众所周知的准则:不先测量、不弄清楚哪里需要优化,就不要优化;但关于测量 Python 内存使用情况的教程非常少,针对 Django 的更是完全没有。

感谢大家的帮助,但我认为这个问题仍然悬而未决!

另一个最终编辑;-)

我在django-users列表上问了这个,并得到了一些非常有帮助的回复

老实说,有史以来最后一次更新!

这是刚刚发布。可能是迄今为止最好的解决方案:使用Pympler分析Django对象的大小和内存使用情况

My memory usage increases over time and restarting Django is not kind to users.

I am unsure how to go about profiling the memory usage but some tips on how to start measuring would be useful.

I have a feeling that there are some simple steps that could produce big gains. Ensuring ‘debug’ is set to ‘False’ is an obvious biggie.

Can anyone suggest others? How much improvement would caching on low-traffic sites?

In this case I’m running under Apache 2.x with mod_python. I’ve heard mod_wsgi is a bit leaner but it would be tricky to switch at this stage unless I know the gains would be significant.

Edit: Thanks for the tips so far. Any suggestions how to discover what’s using up the memory? Are there any guides to Python memory profiling?

Also as mentioned there’s a few things that will make it tricky to switch to mod_wsgi so I’d like to have some idea of the gains I could expect before ploughing forwards in that direction.

Edit: Carl posted a slightly more detailed reply here that is worth reading: Django Deployment: Cutting Apache’s Overhead

Edit: Graham Dumpleton’s article is the best I’ve found on the MPM and mod_wsgi related stuff. I am rather disappointed that no-one could provide any info on debugging the memory usage in the app itself though.

Final Edit: Well I have been discussing this with Webfaction to see if they could assist with recompiling Apache and this is their word on the matter:

“I really don’t think that you will get much of a benefit by switching to an MPM Worker + mod_wsgi setup. I estimate that you might be able to save around 20MB, but probably not much more than that.”

So! This brings me back to my original question (which I am still none the wiser about). How does one go about identifying where the problems lies? It’s a well known maxim that you don’t optimize without testing to see where you need to optimize but there is very little in the way of tutorials on measuring Python memory usage and none at all specific to Django.

Thanks for everyone’s assistance but I think this question is still open!

Another final edit ;-)

I asked this on the django-users list and got some very helpful replies

Honestly the last update ever!

This was just released. Could be the best solution yet: Profiling Django object size and memory usage with Pympler


回答 0

确保您没有保留对数据的全局引用。全局引用会阻止 Python 垃圾回收器释放这部分内存。

不要使用 mod_python,它会在 apache 里加载一个解释器。如果必须用 apache,请改用 mod_wsgi。切换并不棘手,非常容易。为 django 配置 mod_wsgi 要比配置脑残的 mod_python 容易得多。

如果能把 apache 从你的需求中去掉,对内存会更好。spawning 似乎是运行 python Web 应用程序的一种新的、快速且可扩展的方式。

编辑:我看不出切换到 mod_wsgi 怎么会"棘手"。这应该是一项非常容易的任务。请详细说明你在切换过程中遇到的问题。

Make sure you are not keeping global references to data. That prevents the python garbage collector from releasing the memory.

Don’t use mod_python. It loads an interpreter inside apache. If you need to use apache, use mod_wsgi instead. It is not tricky to switch. It is very easy. mod_wsgi is way easier to configure for django than brain-dead mod_python.

If you can remove apache from your requirements, that would be even better to your memory. spawning seems to be the new fast scalable way to run python web applications.

EDIT: I don’t see how switching to mod_wsgi could be “tricky“. It should be a very easy task. Please elaborate on the problem you are having with the switch.


回答 1

如果你运行在 mod_wsgi 之下,或者运行在 spawning 之下(它兼容 WSGI,所以大概也可以),就可以用 Dozer 查看你的内存使用情况。

在mod_wsgi下,只需将其添加到WSGI脚本的底部:

from dozer import Dozer
application = Dozer(application)

然后将浏览器指向http:// domain / _dozer / index以查看所有内存分配的列表。

我还要为 mod_wsgi 投上一票。无论在性能还是内存占用上,它都比 mod_python 有天壤之别。Graham Dumpleton 对 mod_wsgi 的支持非常出色,既体现在积极的开发上,也体现在帮助邮件列表里的人优化安装配置上。curse.com 的 David Cramer 发布过一些图表(可惜我现在找不到了),显示了该高流量站点切换到 mod_wsgi 之后 CPU 和内存使用量的急剧下降。有几位 django 开发者已经切换了。说真的,这根本不用犹豫 :)

If you are running under mod_wsgi, and presumably spawning since it is WSGI compliant, you can use Dozer to look at your memory usage.

Under mod_wsgi just add this at the bottom of your WSGI script:

from dozer import Dozer
application = Dozer(application)

Then point your browser at http://domain/_dozer/index to see a list of all your memory allocations.

I’ll also just add my voice of support for mod_wsgi. It makes a world of difference in terms of performance and memory usage over mod_python. Graham Dumpleton’s support for mod_wsgi is outstanding, both in terms of active development and in helping people on the mailing list to optimize their installations. David Cramer at curse.com has posted some charts (which I can’t seem to find now unfortunately) showing the drastic reduction in cpu and memory usage after they switched to mod_wsgi on that high traffic site. Several of the django devs have switched. Seriously, it’s a no-brainer :)


回答 2

这些是我知道的Python内存探查器解决方案(与Django无关):

免责声明:我与后者有一定关系。

各个项目的文档应使您了解如何使用这些工具来分析Python应用程序的内存行为。

以下是一个不错的“战争故事”,还提供了一些有用的指导:

These are the Python memory profiler solutions I’m aware of (not Django related):

Disclaimer: I have a stake in the latter.

The individual project’s documentation should give you an idea of how to use these tools to analyze memory behavior of Python applications.

The following is a nice “war story” that also gives some helpful pointers:


回答 3

另外,检查一下你有没有用到已知会泄漏内存的组件。由于 Unicode 处理中的一个 bug,MySQLdb 在 Django 下会泄漏大量内存。除此之外,Django Debug Toolbar 也许能帮你找出内存大户。

Additionally, check if you do not use any of known leakers. MySQLdb is known to leak enormous amounts of memory with Django due to bug in unicode handling. Other than that, Django Debug Toolbar might help you to track the hogs.


回答 4

除了不保留对大型数据对象的全局引用之外,还应尽可能避免将大型数据集加载到内存中。

在守护程序模式下切换到mod_wsgi,并使用Apache的worker mpm代替prefork。后面的步骤可以使您以更少的内存开销为更多的并发用户提供服务。

In addition to not keeping around global references to large data objects, try to avoid loading large datasets into memory at all wherever possible.

Switch to mod_wsgi in daemon mode, and use Apache’s worker mpm instead of prefork. This latter step can allow you to serve many more concurrent users with much less memory overhead.
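As one hedged illustration of that advice (not part of the original answer): when a large table does have to be walked, a Django queryset can be consumed with iterator() so the rows are streamed instead of being cached in memory all at once; the model and helper names here are hypothetical.

# Hypothetical model and helper; iterator() streams rows instead of
# caching the whole queryset in memory.
for row in MyModel.objects.all().iterator():
    handle(row)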


回答 5

Webfaction实际上有一些技巧可以降低Django的内存使用量。

要点:

  • 确保将debug设置为false(您已经知道)。
  • 在您的Apache配置中使用“ ServerLimit”
  • 检查内存中没有大对象
  • 考虑在单独的进程或服务器中提供静态内容。
  • 在您的apache配置中使用“ MaxRequestsPerChild”
  • 找出并了解您正在使用多少内存

Webfaction actually has some tips for keeping django memory usage down.

The major points:

  • Make sure debug is set to false (you already know that).
  • Use “ServerLimit” in your apache config
  • Check that no big objects are being loaded in memory
  • Consider serving static content in a separate process or server.
  • Use “MaxRequestsPerChild” in your apache config
  • Find out and understand how much memory you’re using

回答 6

mod_wsgi 的另一个优点:在 WSGIDaemonProcess 指令中设置 maximum-requests 参数,mod_wsgi 就会每隔一段时间重启守护进程。对用户来说应该没有可见的影响,只是新进程第一次被命中时页面加载会慢一些,因为它要把 Django 和你的应用代码加载进内存。

但这样一来,即使确实存在内存泄漏,也能防止进程变得过大,而不必中断对用户的服务。

Another plus for mod_wsgi: set a maximum-requests parameter in your WSGIDaemonProcess directive and mod_wsgi will restart the daemon process every so often. There should be no visible effect for the user, other than a slow page load the first time a fresh process is hit, as it’ll be loading Django and your application code into memory.

But even if you do have memory leaks, that should keep the process size from getting too large, without having to interrupt service to your users.
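A hedged sketch of what that might look like in the Apache config (the process group name, counts, and request limit are made-up values; maximum-requests is the relevant option):

WSGIDaemonProcess mysite processes=2 threads=15 maximum-requests=1000
WSGIProcessGroup mysite
WSGIScriptAlias / /path/to/my/wsgi.py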


回答 7

这是我用于mod_wsgi的脚本(称为wsgi.py,并放在django项目的根目录中):

import os
import sys
import django.core.handlers.wsgi

from os import path

sys.stdout = open('/dev/null', 'a+')
sys.stderr = open('/dev/null', 'a+')

sys.path.append(path.join(path.dirname(__file__), '..'))

os.environ['DJANGO_SETTINGS_MODULE'] = 'myproject.settings'
application = django.core.handlers.wsgi.WSGIHandler()

根据需要调整 myproject.settings 和路径。我把所有输出重定向到 /dev/null,因为 mod_wsgi 默认禁止打印输出。请改用 logging 记录日志。

对于apache:

<VirtualHost *>
   ServerName myhost.com

   ErrorLog /var/log/apache2/error-myhost.log
   CustomLog /var/log/apache2/access-myhost.log common

   DocumentRoot "/var/www"

   WSGIScriptAlias / /path/to/my/wsgi.py

</VirtualHost>

希望这至少应该可以帮助您设置mod_wsgi,以便您查看它是否有所作为。

Here is the script I use for mod_wsgi (called wsgi.py, and put in the root of my django project):

import os
import sys
import django.core.handlers.wsgi

from os import path

sys.stdout = open('/dev/null', 'a+')
sys.stderr = open('/dev/null', 'a+')

sys.path.append(path.join(path.dirname(__file__), '..'))

os.environ['DJANGO_SETTINGS_MODULE'] = 'myproject.settings'
application = django.core.handlers.wsgi.WSGIHandler()

Adjust myproject.settings and the path as needed. I redirect all output to /dev/null since mod_wsgi by default prevents printing. Use logging instead.

For apache:

<VirtualHost *>
   ServerName myhost.com

   ErrorLog /var/log/apache2/error-myhost.log
   CustomLog /var/log/apache2/access-myhost.log common

   DocumentRoot "/var/www"

   WSGIScriptAlias / /path/to/my/wsgi.py

</VirtualHost>

Hopefully this should at least help you set up mod_wsgi so you can see if it makes a difference.


回答 8

缓存:确保它们会被清理。东西很容易被放进缓存,却因为缓存持有引用而永远不会被 GC 回收。

Swig 封装的代码:确保所有内存管理都正确完成;这在 Python 中很容易被忽略,尤其是在第三方库里。

监视:如果可以,获取有关内存使用率和命中率的数据。通常,您会看到某种类型的请求与内存使用之间的关联。

Caches: make sure they’re being flushed. Its easy for something to land in a cache, but never be GC’d because of the cache reference.

Swig’d code: Make sure any memory management is being done correctly, its really easy to miss these in python, especially with third party libraries

Monitoring: If you can, get data about memory usage and hits. Usually you’ll see a correlation between a certain type of request and memory usage.


回答 9

我们在 Django 中碰到了一个与大型站点地图(10000 个条目)有关的 bug。Django 在生成站点地图时似乎会尝试把它们全部加载进内存:http://code.djangoproject.com/ticket/11572 。当 Google 来访问该网站时,这实际上会把 Apache 进程干掉。

We stumbled over a bug in Django with big sitemaps (10.000 items). Seems Django is trying to load them all in memory when generating the sitemap: http://code.djangoproject.com/ticket/11572 – effectively kills the apache process when Google pays a visit to the site.


除非分配输出,为什么调用Python字符串方法不做任何事情?

问题:除非分配输出,为什么调用Python字符串方法不做任何事情?

我尝试做一个简单的字符串替换,但是我不知道为什么它似乎不起作用:

X = "hello world"
X.replace("hello", "goodbye")

我想把单词 hello 改成 goodbye,也就是把字符串 "hello world" 变成 "goodbye world"。但是 X 仍然是 "hello world"。为什么我的代码不起作用?

I try to do a simple string replacement, but I don’t know why it doesn’t seem to work:

X = "hello world"
X.replace("hello", "goodbye")

I want to change the word hello to goodbye, thus it should change the string "hello world" to "goodbye world". But X just remains "hello world". Why is my code not working?


回答 0

这是因为字符串在Python中是不可变的

这意味着 X.replace("hello","goodbye") 返回的是 X 的一个副本,替换发生在这个副本里。因此,您需要把这一行:

X.replace("hello", "goodbye")

用这一行:

X = X.replace("hello", "goodbye")

更广泛地说,所有看起来会"就地"修改字符串内容的 Python 字符串方法都是如此,例如 replace、strip、translate、lower/upper、join 等。

如果想使用这些方法的结果而不是把它丢掉,就必须把返回值赋给某个变量,例如:

X  = X.strip(' \t')
X2 = X.translate(...)
Y  = X.lower()
Z  = X.upper()
A  = X.join(':')
B  = X.capitalize()
C  = X.casefold()

等等。

This is because strings are immutable in Python.

Which means that X.replace("hello","goodbye") returns a copy of X with replacements made. Because of that you need replace this line:

X.replace("hello", "goodbye")

with this line:

X = X.replace("hello", "goodbye")

More broadly, this is true for all Python string methods that change a string’s content “in-place”, e.g. replace,strip,translate,lower/upper,join,…

You must assign their output to something if you want to use it and not throw it away, e.g.

X  = X.strip(' \t')
X2 = X.translate(...)
Y  = X.lower()
Z  = X.upper()
A  = X.join(':')
B  = X.capitalize()
C  = X.casefold()

and so on.
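A quick way to see that a new string object is created rather than the original being modified in place (illustrative only):

X = "hello world"
print(id(X))
X = X.replace("hello", "goodbye")
print(id(X))   # a different id: the name X now refers to a new string object
print(X)       # goodbye world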


回答 1

所有字符串函数,比如 lower、upper、strip,都会返回一个新字符串,而不会修改原字符串。如果你尝试去修改字符串里的某个字符(你可能会想,它毕竟是个可迭代对象),那会失败。

x = 'hello'
x[0] = 'i' #'str' object does not support item assignment

关于字符串不可更改的重要性的文章很好读:为什么Python字符串不可更改?使用它们的最佳实践

All string functions such as lower, upper, and strip return a new string without modifying the original. If you try to modify a string, which you might expect to work since it is an iterable, it will fail.

x = 'hello'
x[0] = 'i' #'str' object does not support item assignment

There is a good reading about the importance of strings being immutable: Why are Python strings immutable? Best practices for using them


创建动态选择字段

问题:创建动态选择字段

我在尝试了解如何在Django中创建动态选择字段时遇到了一些麻烦。我有一个模型设置类似:

class rider(models.Model):
     user = models.ForeignKey(User)
     waypoint = models.ManyToManyField(Waypoint)

class Waypoint(models.Model):
     lat = models.FloatField()
     lng = models.FloatField()

我想做的是创建一个选择字段,其选项的值是与该骑手(即当前登录的用户)关联的那些航点。

目前,我以如下形式覆盖init:

class waypointForm(forms.Form):
     def __init__(self, *args, **kwargs):
          super(joinTripForm, self).__init__(*args, **kwargs)
          self.fields['waypoints'] = forms.ChoiceField(choices=[ (o.id, str(o)) for o in Waypoint.objects.all()])

但这样做只是列出了所有航点,它们并没有与任何特定的骑手关联。有什么想法吗?谢谢。

I’m having some trouble trying to understand how to create a dynamic choice field in django. I have a model set up something like:

class rider(models.Model):
     user = models.ForeignKey(User)
     waypoint = models.ManyToManyField(Waypoint)

class Waypoint(models.Model):
     lat = models.FloatField()
     lng = models.FloatField()

What I’m trying to do is create a choice Field whos values are the waypoints associated with that rider (which would be the person logged in).

Currently I’m overriding init in my forms like so:

class waypointForm(forms.Form):
     def __init__(self, *args, **kwargs):
          super(joinTripForm, self).__init__(*args, **kwargs)
          self.fields['waypoints'] = forms.ChoiceField(choices=[ (o.id, str(o)) for o in Waypoint.objects.all()])

But all that does is list all the waypoints, they’re not associated with any particular rider. Any ideas? Thanks.


回答 0

您可以通过将用户传递给表单init来过滤航点

class waypointForm(forms.Form):
    def __init__(self, user, *args, **kwargs):
        super(waypointForm, self).__init__(*args, **kwargs)
        self.fields['waypoints'] = forms.ChoiceField(
            choices=[(o.id, str(o)) for o in Waypoint.objects.filter(user=user)]
        )

在你的视图(view)中初始化表单时,把用户传进去:

form = waypointForm(user)

如果用的是 ModelForm:

class waypointForm(forms.ModelForm):
    def __init__(self, user, *args, **kwargs):
        super(waypointForm, self).__init__(*args, **kwargs)
        self.fields['waypoints'] = forms.ModelChoiceField(
            queryset=Waypoint.objects.filter(user=user)
        )

    class Meta:
        model = Waypoint

you can filter the waypoints by passing the user to the form init

class waypointForm(forms.Form):
    def __init__(self, user, *args, **kwargs):
        super(waypointForm, self).__init__(*args, **kwargs)
        self.fields['waypoints'] = forms.ChoiceField(
            choices=[(o.id, str(o)) for o in Waypoint.objects.filter(user=user)]
        )

from your view while initiating the form pass the user

form = waypointForm(user)

in case of model form

class waypointForm(forms.ModelForm):
    def __init__(self, user, *args, **kwargs):
        super(waypointForm, self).__init__(*args, **kwargs)
        self.fields['waypoints'] = forms.ModelChoiceField(
            queryset=Waypoint.objects.filter(user=user)
        )

    class Meta:
        model = Waypoint

回答 1

有针对您的问题的内置解决方案:ModelChoiceField

通常,当你需要创建或修改数据库对象时,总是值得先尝试使用 ModelForm。它在 95% 的情况下都适用,而且比自己从头实现要干净得多。

There’s built-in solution for your problem: ModelChoiceField.

Generally, it’s always worth trying to use ModelForm when you need to create/change database objects. Works in 95% of the cases and it’s much cleaner than creating your own implementation.


回答 2

问题在于,当你这样做时:

def __init__(self, user, *args, **kwargs):
    super(waypointForm, self).__init__(*args, **kwargs)
    self.fields['waypoints'] = forms.ChoiceField(choices=[ (o.id, str(o)) for o in Waypoint.objects.filter(user=user)])

在更新请求中,先前的值将丢失!

the problem is when you do

def __init__(self, user, *args, **kwargs):
    super(waypointForm, self).__init__(*args, **kwargs)
    self.fields['waypoints'] = forms.ChoiceField(choices=[ (o.id, str(o)) for o in Waypoint.objects.filter(user=user)])

in an update request, the previous value will be lost!


回答 3

在初始化表单时把骑手(rider)实例传给它怎么样?

class WaypointForm(forms.Form):
    def __init__(self, rider, *args, **kwargs):
      super(joinTripForm, self).__init__(*args, **kwargs)
      qs = rider.Waypoint_set.all()
      self.fields['waypoints'] = forms.ChoiceField(choices=[(o.id, str(o)) for o in qs])

# In view:
rider = request.user
form = WaypointForm(rider) 

How about passing the rider instance to the form while initializing it?

class WaypointForm(forms.Form):
    def __init__(self, rider, *args, **kwargs):
      super(joinTripForm, self).__init__(*args, **kwargs)
      qs = rider.Waypoint_set.all()
      self.fields['waypoints'] = forms.ChoiceField(choices=[(o.id, str(o)) for o in qs])

# In view:
rider = request.user
form = WaypointForm(rider) 

回答 4

下面是使用普通 ChoiceField 的可行方案。我的问题是,每个用户都要根据一些条件拥有自己定制的选择字段选项。

class SupportForm(BaseForm):

    affiliated = ChoiceField(required=False, label='Fieldname', choices=[], widget=Select(attrs={'onchange': 'sysAdminCheck();'}))

    def __init__(self, *args, **kwargs):

        self.request = kwargs.pop('request', None)
        user_id = get_user_from_request(self.request)
        self.affiliated_choices = []
        for l in get_all_choices().filter(user=user_id):
            admin = 'y' if l in self.core else 'n'
            choice = (('%s_%s' % (l.name, admin)), ('%s' % l.name))
            self.affiliated_choices.append(choice)
        super(SupportForm, self).__init__(*args, **kwargs)
        self.fields['affiliated'].choices = self.affiliated_choices

Below is a working solution with a normal choice field. My problem was that each user has their own CUSTOM choicefield options based on a few conditions.

class SupportForm(BaseForm):

    affiliated = ChoiceField(required=False, label='Fieldname', choices=[], widget=Select(attrs={'onchange': 'sysAdminCheck();'}))

    def __init__(self, *args, **kwargs):

        self.request = kwargs.pop('request', None)
        user_id = get_user_from_request(self.request)
        self.affiliated_choices = []
        for l in get_all_choices().filter(user=user_id):
            admin = 'y' if l in self.core else 'n'
            choice = (('%s_%s' % (l.name, admin)), ('%s' % l.name))
            self.affiliated_choices.append(choice)
        super(SupportForm, self).__init__(*args, **kwargs)
        self.fields['affiliated'].choices = self.affiliated_choices

回答 5

正如Breedly和Liang指出的那样,Ashok的解决方案将阻止您在发布表单时获取选择值。

一种稍微不同但仍然不完善的解决方法是:

class waypointForm(forms.Form):
    def __init__(self, user, *args, **kwargs):
        self.base_fields['waypoints'].choices = self._do_the_choicy_thing()
        super(waypointForm, self).__init__(*args, **kwargs)

但是,这可能会导致一些并发问题。

As pointed by Breedly and Liang, Ashok’s solution will prevent you from getting the select value when posting the form.

One slightly different, but still imperfect, way to solve that would be:

class waypointForm(forms.Form):
    def __init__(self, user, *args, **kwargs):
        self.base_fields['waypoints'].choices = self._do_the_choicy_thing()
        super(waypointForm, self).__init__(*args, **kwargs)

This could cause some concurrence problems, though.


回答 6

您可以将字段声明为表单的一流属性,并且可以动态设置选项:

class WaypointForm(forms.Form):
    waypoints = forms.ChoiceField(choices=[])

    def __init__(self, user, *args, **kwargs):
        super().__init__(*args, **kwargs)
        waypoint_choices = [(o.id, str(o)) for o in Waypoint.objects.filter(user=user)]
        self.fields['waypoints'].choices = waypoint_choices

您也可以使用ModelChoiceField并以类似方式在init上设置queryset。

You can declare the field as a first-class attribute of your form and just set choices dynamically:

class WaypointForm(forms.Form):
    waypoints = forms.ChoiceField(choices=[])

    def __init__(self, user, *args, **kwargs):
        super().__init__(*args, **kwargs)
        waypoint_choices = [(o.id, str(o)) for o in Waypoint.objects.filter(user=user)]
        self.fields['waypoints'].choices = waypoint_choices

You can also use a ModelChoiceField and set the queryset on init in a similar manner.


回答 7

如果你需要在 django admin 中使用动态选择字段,下面的做法适用于 Django >= 2.1。

class CarAdminForm(forms.ModelForm):
    class Meta:
        model = Car
        fields = '__all__'

    def __init__(self, *args, **kwargs):
        super(CarAdminForm, self).__init__(*args, **kwargs)

        # Now you can make it dynamic.
        choices = (
            ('audi', 'Audi'),
            ('tesla', 'Tesla')
        )

        self.fields.get('car_field').choices = choices

    car_field = forms.ChoiceField(choices=[])

@admin.register(Car)
class CarAdmin(admin.ModelAdmin):
    form = CarAdminForm

希望这可以帮助。

If you need a dynamic choice field in django admin; This works for django >=2.1.

class CarAdminForm(forms.ModelForm):
    class Meta:
        model = Car
        fields = '__all__'

    def __init__(self, *args, **kwargs):
        super(CarAdminForm, self).__init__(*args, **kwargs)

        # Now you can make it dynamic.
        choices = (
            ('audi', 'Audi'),
            ('tesla', 'Tesla')
        )

        self.fields.get('car_field').choices = choices

    car_field = forms.ChoiceField(choices=[])

@admin.register(Car)
class CarAdmin(admin.ModelAdmin):
    form = CarAdminForm

Hope this helps.


virtualenv --no-site-packages和pip仍在查找全局软件包吗?

问题:virtualenv --no-site-packages和pip仍在查找全局软件包吗?

我印象中virtualenv --no-site-packages会创建一个完全独立和隔离的Python环境,但事实并非如此。

例如,我在全局安装了python-django,但希望使用其他Django版本创建virtualenv。

$ virtualenv --no-site-packages foo       
New python executable in foo/bin/python
Installing setuptools............done.
$ pip -E foo install Django
Requirement already satisfied: Django in /usr/share/pyshared
Installing collected packages: Django
Successfully installed Django

据我所知,上面的 pip -E foo install 应该重新安装一个新的 Django。另外,如果我让 pip 冻结(freeze)这个环境,会得到一大堆软件包。我本以为对于用 --no-site-packages 新建的全新环境,这里应该是空的?

$ pip -E foo freeze
4Suite-XML==1.0.2
BeautifulSoup==3.1.0.1
Brlapi==0.5.3
BzrTools==1.17.0
Django==1.1
... and so on ...

我是否误解了--no-site-packages应该如何工作?

I was under the impression that virtualenv --no-site-packages would create a completely separate and isolated Python environment, but it doesn’t seem to.

For example, I have python-django installed globally, but wish to create a virtualenv with a different Django version.

$ virtualenv --no-site-packages foo       
New python executable in foo/bin/python
Installing setuptools............done.
$ pip -E foo install Django
Requirement already satisfied: Django in /usr/share/pyshared
Installing collected packages: Django
Successfully installed Django

From what I can tell, the pip -E foo install above is supposed to re-install a new version of Django. Also, if I tell pip to freeze the environment, I get a whole lot of packages. I would expect that for a fresh environment with --no-site-packages this would be blank?

$ pip -E foo freeze
4Suite-XML==1.0.2
BeautifulSoup==3.1.0.1
Brlapi==0.5.3
BzrTools==1.17.0
Django==1.1
... and so on ...

Am I misunderstanding how --no-site-packages is supposed to work?


回答 0

我也遇到过这样的问题,直到我意识到(远在我发现 virtualenv 之前)自己曾在 .bashrc 文件里往 PYTHONPATH 中添加过目录。由于那是一年多以前的事,我没有立刻想起来。

I had a problem like this, until I realized that (long before I had discovered virtualenv), I had gone adding directories to the PYTHONPATH in my .bashrc file. As it had been over a year beforehand, I didn’t think of that straight away.


回答 1

您必须确保pip在创建的虚拟环境中而不是全局环境中运行二进制文件。

env/bin/pip freeze

查看测试:

我们使用以下--no-site-packages选项创建virtualenv :

$ virtualenv --no-site-packages -p /usr/local/bin/python mytest
Running virtualenv with interpreter /usr/local/bin/python
New python executable in mytest/bin/python
Installing setuptools, pip, wheel...done.

我们检查新创建的 pip 的 freeze 输出:

$ mytest/bin/pip freeze
argparse==1.3.0
wheel==0.24.0

但是,如果我们使用global pip,那么我们将得到:

$ pip freeze
...
pyxdg==0.25
...
range==1.0.0
...
virtualenv==13.1.2

也就是说,这是 pip 在整个系统中安装的所有软件包。运行 which pip,(至少在我这里)得到的是类似 /usr/local/bin/pip 的路径,这意味着执行 pip freeze 时调用的是这个二进制文件,而不是 mytest/bin/pip。

You have to make sure you are running the pip binary in the virtual environment you created, not the global one.

env/bin/pip freeze

See a test:

We create the virtualenv with the --no-site-packages option:

$ virtualenv --no-site-packages -p /usr/local/bin/python mytest
Running virtualenv with interpreter /usr/local/bin/python
New python executable in mytest/bin/python
Installing setuptools, pip, wheel...done.

We check the output of freeze from the newly created pip:

$ mytest/bin/pip freeze
argparse==1.3.0
wheel==0.24.0

But if we do use the global pip, this is what we get:

$ pip freeze
...
pyxdg==0.25
...
range==1.0.0
...
virtualenv==13.1.2

That is, all the packages that pip has installed in the whole system. By checking which pip we get (at least in my case) something like /usr/local/bin/pip, meaning that when we do pip freeze it is calling this binary instead of mytest/bin/pip.


回答 2

最终,我发现无论出于什么原因,pip -E都无法正常工作。但是,如果我实际上激活了virtualenv,并使用virtualenv提供的easy_install来安装pip,然后直接从内部使用pip,那么它似乎可以正常工作,并且只显示virtualenv中的软件包。

Eventually I found that, for whatever reason, pip -E was not working. However, if I actually activate the virtualenv, and use easy_install provided by virtualenv to install pip, then use pip directly from within, it seems to work as expected and only show the packages in the virtualenv


回答 3

我知道这是一个非常老的问题,但是对于那些来到这里寻求解决方案的人来说:

在运行 pip freeze 之前,别忘了先激活 virtualenv(source bin/activate)。否则你得到的会是所有全局软件包的列表。

I know this is a very old question but for those arriving here looking for a solution:

Don’t forget to activate the virtualenv (source bin/activate) before running pip freeze. Otherwise you’ll get a list of all global packages.


回答 4

先用下面的命令暂时清空 PYTHONPATH:

export PYTHONPATH=

然后创建并激活虚拟环境:

virtualenv foo
. foo/bin/activate

只有这样:

pip freeze

Temporarily clear the PYTHONPATH with:

export PYTHONPATH=

Then create and activate the virtual environment:

virtualenv foo
. foo/bin/activate

Only then:

pip freeze

回答 5

--no-site-packages 顾名思义,会把标准的 site-packages 目录从 sys.path 中移除。标准 Python 路径中的其他内容仍会保留在那里。

--no-site-packages should, as the name suggests, remove the standard site-packages directory from sys.path. Anything else that lives in the standard Python path will remain there.
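A quick way to check what actually ended up on the path is to run the virtualenv's own interpreter and inspect sys.path (a sketch; the exact entries vary by machine):

import sys
# with --no-site-packages, only the virtualenv's own site-packages should appear here
print([p for p in sys.path if 'site-packages' in p])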


回答 6

在 Windows 上,如果你直接以 script.py 的方式调用脚本,也可能出现类似的问题,因为这样会走 Windows 的默认打开方式,在虚拟环境之外启动 Python。改用 python script.py 来调用,就会使用虚拟环境里的 Python。

A similar problem can occur on Windows if you call scripts directly as script.py which then uses the Windows default opener and opens Python outside the virtual environment. Calling it with python script.py will use Python with the virtual environment.


回答 7

当您将virtualenv目录移动到另一个目录(在linux上)或重命名父目录时,似乎也会发生这种情况。

This also seems to happen when you move the virtualenv directory to another directory (on linux), or rename a parent directory.


回答 8

我遇到了同样的问题。对我来说(在 Ubuntu 上)原因是路径名里包含 $。当我在不含 $ 的目录下创建 virtualenv 时,一切正常。

奇怪的。

I was having this same problem. The issue for me (on Ubuntu) was that my path name contained $. When I created a virtualenv outside of the $ dir, it worked fine.

Weird.


回答 9

virtualenv 里的 pip 无法正常工作的一个可能原因是某个父文件夹的名称中带有空格,例如 /Documents/project name/app;把它重命名为 /Documents/projectName/app 就能解决问题。

One of the possible reasons why virtualenv pip won’t work is if any of the parent folders had space in its name /Documents/project name/app renaming it to /Documents/projectName/app solves the problem.


回答 10

我遇到了同样的问题:venv 中的 pip 仍然表现得像全局 pip。
搜索了很多页面之后,我是这样解决的:

1. 使用 virtualenv 的 "--no-site-packages" 选项创建一个新的 venv:

virtualenv --no-site-packages --python=/xx/xx/bin/python my_env_name

请注意,虽然 virtualenv 的文档里说从 1.7.0 版本开始 "--no-site-packages" 选项默认为 true,但我发现除非手动加上它,否则并不生效。为了得到一个纯净的 venv,我强烈建议显式打开这个选项。

2. 激活你刚创建的新环境:

source ./my_env_name/bin/activate

3. 检查 pip 和 python 的位置,确保这两个命令都来自虚拟环境:

pip --version
which python

4. 在虚拟环境下用 pip 安装软件包,不再受全局软件包的干扰:

pip install package_name

希望这个答案对你有帮助!

I came across the same problem where pip in the venv still behaved like the global pip.
After searching many pages, I figured it out this way:

1. Create a new venv with virtualenv, using the "--no-site-packages" option:

virtualenv --no-site-packages --python=/xx/xx/bin/python my_env_name

Please note that although the "--no-site-packages" option has been the default since version 1.7.0 according to the virtualenv docs, I found it did not take effect unless I set it manually. To get a pure venv, I strongly suggest turning this option on explicitly.

2. Activate the new env you created:

source ./my_env_name/bin/activate

3. Check your pip location and python location and make sure both commands come from the virtual environment:

pip --version
which python

4. Use pip inside the virtual env to install packages, free from interference by the global packages:

pip install package_name

Hope this answer helps you!


回答 11

这里是 pip install 全部选项的列表,我没有找到任何 '-E' 选项,可能是旧版本才有。下面我用通俗的语言介绍一下 virtualenv 的用法和工作方式,供后来的 SO 用户参考。


一切看起来都没问题,只差激活 virtualenv(foo)这一步。它的作用是让我们可以拥有多个(且各不相同的)Python 环境,比如不同的 Python 版本、不同的 Django 版本或任何其他 Python 包:例如生产环境里还是旧版本,而我们想用最新的 Django 版本来测试我们的应用。

简而言之,创建并使用(激活)虚拟环境(virtualenv)让我们可以用不同的 Python 解释器(比如 Python 2.7 和 3.3)来运行或测试应用程序或简单的 Python 脚本:可以是全新安装(使用 --no-site-packages 选项),也可以沿用现有/上一次安装的所有软件包(使用 --system-site-packages 选项)。要使用它,必须先激活它:

$ pip install django 会把它安装到全局 site-packages 中;同样,此时 pip freeze 给出的也是全局 site-packages 中的包名。

而在 venv 目录(foo)里执行 $ source bin/activate 就会激活 venv:此后用 pip 安装的任何东西都只会装进这个虚拟环境,而且此时 pip freeze 也不会再列出全局 site-packages 里的 Python 包。激活之后:

$ virtualenv --no-site-packages foo       
New python executable in foo/bin/python
Installing setuptools............done.
$ cd foo
$ source bin/activate 
(foo)$ pip install django

$ 前面的 (foo) 表明我们正在使用虚拟 Python 环境,也就是说,任何 pip 操作(install、freeze、uninstall)都只作用于这个 venv,而不会影响全局/默认的 Python 安装和软件包。

Here’s the list of all the pip install options; I didn’t find any ‘-E‘ option, maybe an older version had it. Below I am sharing a plain-English explanation of the usage and workings of virtualenv for upcoming SO users.


Everything seems fine, except activating the virtualenv (foo). All it does is allow us to have multiple (and different) Python environments, i.e. various Python versions, various Django versions, or any other Python package, in case we have a previous version in production and want to test the latest Django release with our application.

In short creating and using (activating) virtual environment (virtualenv) makes it possible to run or test our application or simple python scripts with different Python interpreter i.e. Python 2.7 and 3.3 – can be a fresh installation (using --no-site-packages option) or all the packages from existing/last setup (using --system-site-packages option). To use it we have to activate it:

$ pip install django will install it into the global site-packages, and similarly getting the pip freeze will give names of the global site-packages.

while inside the venv dir (foo), executing $ source bin/activate will activate the venv, i.e. now anything installed with pip will only be installed in the virtual env, and only now will pip freeze not list the global site-packages. Once activated:

$ virtualenv --no-site-packages foo       
New python executable in foo/bin/python
Installing setuptools............done.
$ cd foo
$ source bin/activate 
(foo)$ pip install django

(foo) before the $ sign indicates we are using a virtual python environment, i.e. anything done with pip (install, freeze, uninstall) will be limited to this venv, with no effect on the global/default Python installation/packages.