Tag archive: Python

Conditional Replace Pandas

Question: Conditional Replace Pandas

I have a DataFrame, and I want to replace the values in a particular column that exceed a value with zero. I had thought this was a way of achieving this:

df[df.my_channel > 20000].my_channel = 0

If I copy the channel into a new data frame it’s simple:

df2 = df.my_channel 

df2[df2 > 20000] = 0

This does exactly what I want, but seems not to work with the channel as part of the original DataFrame.


Answer 0

The .ix indexer works for pandas versions prior to 0.20.0, but since pandas 0.20.0 the .ix indexer is deprecated, so you should avoid using it. Instead, you can use the .loc or .iloc indexers. You can solve this problem by:

mask = df.my_channel > 20000
column_name = 'my_channel'
df.loc[mask, column_name] = 0

Or, in one line,

df.loc[df.my_channel > 20000, 'my_channel'] = 0

mask selects the rows in which df.my_channel > 20000 is True, while df.loc[mask, column_name] = 0 sets the value 0 in the column named column_name for the rows where mask holds.

Update: In this case, you should use loc because if you use iloc, you will get a NotImplementedError telling you that iLocation based boolean indexing on an integer type is not available.
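
As a minimal sketch of the approach above, assuming a small made-up DataFrame (only the column name my_channel comes from the question; the values and the second frame df2 are illustrative). The last lines show one way positional indexing can still be used, by handing iloc a plain boolean array and a column position instead of a boolean Series and a label:

```python
import pandas as pd

# Hypothetical sample data; only the column name comes from the question.
df = pd.DataFrame({"my_channel": [100, 25000, 20000, 31000],
                   "other": [1, 2, 3, 4]})

mask = df["my_channel"] > 20000      # boolean Series aligned with df's index
df.loc[mask, "my_channel"] = 0       # rows where mask is True, one column

print(df["my_channel"].tolist())

# iloc is positional: convert the mask to a NumPy array and look up the
# integer position of the column before assigning.
df2 = pd.DataFrame({"my_channel": [100, 25000, 20000, 31000]})
col_pos = df2.columns.get_loc("my_channel")
df2.iloc[(df2["my_channel"] > 20000).to_numpy(), col_pos] = 0
```

The comparison is strict, so a value of exactly 20000 is left unchanged.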


Answer 1

Try

df.loc[df.my_channel > 20000, 'my_channel'] = 0

Note: Since v0.20.0, ix has been deprecated in favour of loc / iloc.


Answer 2

np.where function works as follows:

df['X'] = np.where(df['Y']>=50, 'yes', 'no')

In your case you would want:

import numpy as np
df['my_channel'] = np.where(df.my_channel > 20000, 0, df.my_channel)
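
To see both calls run end to end, here is a small sketch on made-up data (the values and the is_zeroed column name are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"my_channel": [100, 25000, 20000, 31000]})

# np.where(condition, value_if_true, value_if_false) builds a whole new array,
# so the values to keep must be passed back in as the third argument.
df["my_channel"] = np.where(df["my_channel"] > 20000, 0, df["my_channel"])

# The same call can derive a label column, as in the 'yes'/'no' example.
df["is_zeroed"] = np.where(df["my_channel"] == 0, "yes", "no")

print(df)
```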

Answer 3

The reason your original dataframe does not update is that chained indexing may cause you to modify a copy rather than a view of your dataframe. The docs give this advice:

When setting values in a pandas object, care must be taken to avoid what is called chained indexing.
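
The effect can be reproduced by splitting the chained form into its two steps. This is a sketch on made-up data; the variable names subset, unchanged and changed are illustrative:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"my_channel": [100, 25000, 31000]})

# Step 1 of the chained form: boolean indexing builds a new DataFrame.
subset = df[df["my_channel"] > 20000]
# Step 2: the assignment lands on that new object
# (older pandas versions may emit a SettingWithCopyWarning here).
subset["my_channel"] = 0

unchanged = df["my_channel"].tolist()   # the original frame is untouched

# A single .loc call indexes rows and column at once, so it writes to df.
df.loc[df["my_channel"] > 20000, "my_channel"] = 0
changed = df["my_channel"].tolist()

print(unchanged, changed)
```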

You have a few alternatives:

loc + Boolean indexing

loc may be used for setting values and supports Boolean masks:

df.loc[df['my_channel'] > 20000, 'my_channel'] = 0

mask + Boolean indexing

You can assign to your series:

df['my_channel'] = df['my_channel'].mask(df['my_channel'] > 20000, 0)

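Or you can update your series in place (this relies on df['my_channel'] giving access to the underlying column rather than a copy, which recent pandas versions with copy-on-write do not guarantee, so the assignment form above is the safer choice):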

df['my_channel'].mask(df['my_channel'] > 20000, 0, inplace=True)

np.where + Boolean indexing

You can use NumPy by assigning your original series when your condition is not satisfied; however, the first two solutions are cleaner since they explicitly change only specified values.

df['my_channel'] = np.where(df['my_channel'] > 20000, 0, df['my_channel'])

Answer 4

I would use a lambda function on a Series of the DataFrame like this:

f = lambda x: 0 if x>100 else 1
df['my_column'] = df['my_column'].map(f)

I do not assert that this is an efficient way, but it works fine.
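
Note that the lambda above returns 1 for every value that does not exceed the threshold, so it overwrites the whole column. A sketch of an element-wise variant that keeps the other values, assuming the question's column name and threshold of 20000 (the helper name cap_to_zero is made up):

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"my_channel": [100, 25000, 20000, 31000]})

def cap_to_zero(x):
    # Return x itself in the else branch so smaller values survive.
    return 0 if x > 20000 else x

df["my_channel"] = df["my_channel"].map(cap_to_zero)

print(df["my_channel"].tolist())
```

Because map calls the function once per element in Python, the vectorized loc, mask or np.where forms are likely to be faster on large frames.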


Answer 5

Try this:

df.my_channel = df.my_channel.where(df.my_channel <= 20000, other= 0)

or

df.my_channel = df.my_channel.mask(df.my_channel > 20000, other= 0)
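
The two forms agree on ordinary numbers but can differ on missing values, because any comparison with NaN is False. A small sketch on a made-up Series (the names via_where and via_mask are illustrative):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value.
s = pd.Series([100.0, np.nan, 31000.0])

# where keeps values where the condition is True and replaces the rest.
# NaN <= 20000 is False, so the missing value is replaced as well.
via_where = s.where(s <= 20000, other=0)

# mask replaces values where the condition is True and keeps the rest.
# NaN > 20000 is also False, so the missing value is left alone.
via_mask = s.mask(s > 20000, other=0)

print(via_where.tolist(), via_mask.tolist())
```

If the column may contain NaN, pick the form whose treatment of missing values matches what you want.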


Model summary in pytorch

Question: Model summary in pytorch

Is there any way I can print the summary of a model in PyTorch, like the model.summary() method does in Keras, as follows?

Model Summary:
____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
input_1 (InputLayer)             (None, 1, 15, 27)     0                                            
____________________________________________________________________________________________________
convolution2d_1 (Convolution2D)  (None, 8, 15, 27)     872         input_1[0][0]                    
____________________________________________________________________________________________________
maxpooling2d_1 (MaxPooling2D)    (None, 8, 7, 27)      0           convolution2d_1[0][0]            
____________________________________________________________________________________________________
flatten_1 (Flatten)              (None, 1512)          0           maxpooling2d_1[0][0]             
____________________________________________________________________________________________________
dense_1 (Dense)                  (None, 1)             1513        flatten_1[0][0]                  
====================================================================================================
Total params: 2,385
Trainable params: 2,385
Non-trainable params: 0

Answer 0

While you will not get as detailed information about the model as in Keras’ model.summary, simply printing the model will give you some idea about the different layers involved and their specifications.

For instance:

from torchvision import models
model = models.vgg16()
print(model)

The output in this case would be something as follows:

VGG (
  (features): Sequential (
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU (inplace)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU (inplace)
    (4): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU (inplace)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU (inplace)
    (9): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU (inplace)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU (inplace)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU (inplace)
    (16): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
    (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (18): ReLU (inplace)
    (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (20): ReLU (inplace)
    (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (22): ReLU (inplace)
    (23): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
    (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (25): ReLU (inplace)
    (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (27): ReLU (inplace)
    (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (29): ReLU (inplace)
    (30): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
  )
  (classifier): Sequential (
    (0): Dropout (p = 0.5)
    (1): Linear (25088 -> 4096)
    (2): ReLU (inplace)
    (3): Dropout (p = 0.5)
    (4): Linear (4096 -> 4096)
    (5): ReLU (inplace)
    (6): Linear (4096 -> 1000)
  )
)

Now you could, as mentioned by Kashyap, use the state_dict method to get the weights of the different layers. But this listing of the layers would perhaps provide more direction in creating a helper function to get that Keras-like model summary! Hope this helps!
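
One possible shape for such a helper, offered as a sketch rather than a definitive implementation: it registers forward hooks on the leaf layers, runs a dummy input through the model, and records each layer's output shape and parameter count. The function name summarize and the toy model are assumptions made for illustration; note that the helper switches the model to eval mode.

```python
import torch
import torch.nn as nn

def summarize(model, input_size):
    """Collect (layer name, output shape, parameter count) for each leaf module."""
    rows, handles = [], []

    def make_hook(name):
        def hook(module, inputs, output):
            n_params = sum(p.numel() for p in module.parameters())
            rows.append((name, tuple(output.shape), n_params))
        return hook

    for name, module in model.named_modules():
        if len(list(module.children())) == 0:      # leaf layers only
            handles.append(module.register_forward_hook(make_hook(name)))

    model.eval()
    with torch.no_grad():
        model(torch.zeros(1, *input_size))         # dummy batch of size 1
    for handle in handles:
        handle.remove()                            # leave the model hook-free
    return rows

# Hypothetical toy model, not the one from the question.
toy = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(8 * 7 * 13, 1),
)
rows = summarize(toy, (1, 15, 27))
for name, shape, n_params in rows:
    print(f"{name:<4} {str(shape):<20} {n_params:>8,}")
print("Total params:", sum(r[2] for r in rows))
```

This simple version assumes every leaf layer returns a single tensor and that no parameters are shared between layers.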


Answer 1

Yes, you can get the exact Keras representation using the pytorch-summary package.

Example for VGG16

from torchvision import models
from torchsummary import summary

vgg = models.vgg16()
summary(vgg, (3, 224, 224))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 64, 224, 224]           1,792
              ReLU-2         [-1, 64, 224, 224]               0
            Conv2d-3         [-1, 64, 224, 224]          36,928
              ReLU-4         [-1, 64, 224, 224]               0
         MaxPool2d-5         [-1, 64, 112, 112]               0
            Conv2d-6        [-1, 128, 112, 112]          73,856
              ReLU-7        [-1, 128, 112, 112]               0
            Conv2d-8        [-1, 128, 112, 112]         147,584
              ReLU-9        [-1, 128, 112, 112]               0
        MaxPool2d-10          [-1, 128, 56, 56]               0
           Conv2d-11          [-1, 256, 56, 56]         295,168
             ReLU-12          [-1, 256, 56, 56]               0
           Conv2d-13          [-1, 256, 56, 56]         590,080
             ReLU-14          [-1, 256, 56, 56]               0
           Conv2d-15          [-1, 256, 56, 56]         590,080
             ReLU-16          [-1, 256, 56, 56]               0
        MaxPool2d-17          [-1, 256, 28, 28]               0
           Conv2d-18          [-1, 512, 28, 28]       1,180,160
             ReLU-19          [-1, 512, 28, 28]               0
           Conv2d-20          [-1, 512, 28, 28]       2,359,808
             ReLU-21          [-1, 512, 28, 28]               0
           Conv2d-22          [-1, 512, 28, 28]       2,359,808
             ReLU-23          [-1, 512, 28, 28]               0
        MaxPool2d-24          [-1, 512, 14, 14]               0
           Conv2d-25          [-1, 512, 14, 14]       2,359,808
             ReLU-26          [-1, 512, 14, 14]               0
           Conv2d-27          [-1, 512, 14, 14]       2,359,808
             ReLU-28          [-1, 512, 14, 14]               0
           Conv2d-29          [-1, 512, 14, 14]       2,359,808
             ReLU-30          [-1, 512, 14, 14]               0
        MaxPool2d-31            [-1, 512, 7, 7]               0
           Linear-32                 [-1, 4096]     102,764,544
             ReLU-33                 [-1, 4096]               0
          Dropout-34                 [-1, 4096]               0
           Linear-35                 [-1, 4096]      16,781,312
             ReLU-36                 [-1, 4096]               0
          Dropout-37                 [-1, 4096]               0
           Linear-38                 [-1, 1000]       4,097,000
================================================================
Total params: 138,357,544
Trainable params: 138,357,544
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.57
Forward/backward pass size (MB): 218.59
Params size (MB): 527.79
Estimated Total Size (MB): 746.96
----------------------------------------------------------------
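
Some of the figures in this table can be checked by hand. The sketch below assumes the usual parameter formulas for convolution and fully connected layers with a bias term, 4 bytes per float32 value, and 1 MB = 1024 * 1024 bytes; those size assumptions are inferred from the numbers in the table, not taken from the package documentation.

```python
def conv2d_params(in_ch, out_ch, kernel):
    # One kernel x kernel filter per (input, output) channel pair,
    # plus one bias per output channel.
    return out_ch * in_ch * kernel * kernel + out_ch

def linear_params(in_features, out_features):
    # One weight per (input, output) pair, plus one bias per output.
    return out_features * in_features + out_features

first_conv = conv2d_params(3, 64, 3)          # row Conv2d-1
first_fc = linear_params(512 * 7 * 7, 4096)   # row Linear-32

total_params = 138_357_544                    # copied from the table
params_mb = total_params * 4 / 1024 ** 2      # assumes 4-byte float32 values
input_mb = 3 * 224 * 224 * 4 / 1024 ** 2      # one 3x224x224 float32 image

print(first_conv, first_fc, round(params_mb, 2), round(input_mb, 2))
```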

Answer 2

To use torchsummary, type:

from torchsummary import summary

Install it first if you don’t have it.

pip install torchsummary 

Then you can try it, but note that for some reason it does not work unless I move the model to CUDA with alexnet.cuda():

from torchsummary import summary
help(summary)
import torchvision.models as models
alexnet = models.alexnet(pretrained=False)
alexnet.cuda()
summary(alexnet, (3, 224, 224))
print(alexnet)

summary must be given the input size; batch_size defaults to -1, meaning any batch size we provide.

If we call summary(alexnet, (3, 224, 224), 32), this means use bs=32.

summary(model, input_size, batch_size=-1, device='cuda')

Out:

Help on function summary in module torchsummary.torchsummary:

summary(model, input_size, batch_size=-1, device='cuda')

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1           [32, 64, 55, 55]          23,296
              ReLU-2           [32, 64, 55, 55]               0
         MaxPool2d-3           [32, 64, 27, 27]               0
            Conv2d-4          [32, 192, 27, 27]         307,392
              ReLU-5          [32, 192, 27, 27]               0
         MaxPool2d-6          [32, 192, 13, 13]               0
            Conv2d-7          [32, 384, 13, 13]         663,936
              ReLU-8          [32, 384, 13, 13]               0
            Conv2d-9          [32, 256, 13, 13]         884,992
             ReLU-10          [32, 256, 13, 13]               0
           Conv2d-11          [32, 256, 13, 13]         590,080
             ReLU-12          [32, 256, 13, 13]               0
        MaxPool2d-13            [32, 256, 6, 6]               0
AdaptiveAvgPool2d-14            [32, 256, 6, 6]               0
          Dropout-15                 [32, 9216]               0
           Linear-16                 [32, 4096]      37,752,832
             ReLU-17                 [32, 4096]               0
          Dropout-18                 [32, 4096]               0
           Linear-19                 [32, 4096]      16,781,312
             ReLU-20                 [32, 4096]               0
           Linear-21                 [32, 1000]       4,097,000
================================================================
Total params: 61,100,840
Trainable params: 61,100,840
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 18.38
Forward/backward pass size (MB): 268.12
Params size (MB): 233.08
Estimated Total Size (MB): 519.58
----------------------------------------------------------------
AlexNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(inplace)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU(inplace)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(6, 6))
  (classifier): Sequential(
    (0): Dropout(p=0.5)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
    (2): ReLU(inplace)
    (3): Dropout(p=0.5)
    (4): Linear(in_features=4096, out_features=4096, bias=True)
    (5): ReLU(inplace)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)
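
The spatial sizes in the Output Shape column follow from the kernel, stride and padding values in this printout. A minimal sketch of that arithmetic, assuming the standard output-size formula for convolution and pooling with dilation 1 (the helper name out_size is made up):

```python
def out_size(size, kernel, stride=1, padding=0):
    # floor((size + 2 * padding - kernel) / stride) + 1, valid for dilation 1
    return (size + 2 * padding - kernel) // stride + 1

size = 224
trace = []
# (kernel, stride, padding) for the conv and pooling layers of `features`, in order
layers = [(11, 4, 2), (3, 2, 0), (5, 1, 2), (3, 2, 0),
          (3, 1, 1), (3, 1, 1), (3, 1, 1), (3, 2, 0)]
for kernel, stride, padding in layers:
    size = out_size(size, kernel, stride, padding)
    trace.append(size)

flattened = 256 * size * size   # 256 channels feed the classifier

print(trace, flattened)
```

The flattened size matches the in_features of the first Linear layer in the classifier.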

Answer 3

This will show a model’s weights and parameters (but not output shape).

from torch.nn.modules.module import _addindent
import torch
import numpy as np
def torch_summarize(model, show_weights=True, show_parameters=True):
    """Summarizes torch model by showing trainable parameters and weights."""
    tmpstr = model.__class__.__name__ + ' (\n'
    for key, module in model._modules.items():
        # if it contains layers let call it recursively to get params and weights
        if type(module) in [
            torch.nn.modules.container.Container,
            torch.nn.modules.container.Sequential
        ]:
            modstr = torch_summarize(module)
        else:
            modstr = module.__repr__()
        modstr = _addindent(modstr, 2)

        params = sum([np.prod(p.size()) for p in module.parameters()])
        weights = tuple([tuple(p.size()) for p in module.parameters()])

        tmpstr += '  (' + key + '): ' + modstr 
        if show_weights:
            tmpstr += ', weights={}'.format(weights)
        if show_parameters:
            tmpstr +=  ', parameters={}'.format(params)
        tmpstr += '\n'   

    tmpstr = tmpstr + ')'
    return tmpstr

# Test
import torchvision.models as models
model = models.alexnet()
print(torch_summarize(model))

# # Output
# AlexNet (
#   (features): Sequential (
#     (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2)), weights=((64, 3, 11, 11), (64,)), parameters=23296
#     (1): ReLU (inplace), weights=(), parameters=0
#     (2): MaxPool2d (size=(3, 3), stride=(2, 2), dilation=(1, 1)), weights=(), parameters=0
#     (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2)), weights=((192, 64, 5, 5), (192,)), parameters=307392
#     (4): ReLU (inplace), weights=(), parameters=0
#     (5): MaxPool2d (size=(3, 3), stride=(2, 2), dilation=(1, 1)), weights=(), parameters=0
#     (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)), weights=((384, 192, 3, 3), (384,)), parameters=663936
#     (7): ReLU (inplace), weights=(), parameters=0
#     (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)), weights=((256, 384, 3, 3), (256,)), parameters=884992
#     (9): ReLU (inplace), weights=(), parameters=0
#     (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)), weights=((256, 256, 3, 3), (256,)), parameters=590080
#     (11): ReLU (inplace), weights=(), parameters=0
#     (12): MaxPool2d (size=(3, 3), stride=(2, 2), dilation=(1, 1)), weights=(), parameters=0
#   ), weights=((64, 3, 11, 11), (64,), (192, 64, 5, 5), (192,), (384, 192, 3, 3), (384,), (256, 384, 3, 3), (256,), (256, 256, 3, 3), (256,)), parameters=2469696
#   (classifier): Sequential (
#     (0): Dropout (p = 0.5), weights=(), parameters=0
#     (1): Linear (9216 -> 4096), weights=((4096, 9216), (4096,)), parameters=37752832
#     (2): ReLU (inplace), weights=(), parameters=0
#     (3): Dropout (p = 0.5), weights=(), parameters=0
#     (4): Linear (4096 -> 4096), weights=((4096, 4096), (4096,)), parameters=16781312
#     (5): ReLU (inplace), weights=(), parameters=0
#     (6): Linear (4096 -> 1000), weights=((1000, 4096), (1000,)), parameters=4097000
#   ), weights=((4096, 9216), (4096,), (4096, 4096), (4096,), (1000, 4096), (1000,)), parameters=58631144
# )

Edit: isaykatsman has a pytorch PR to add a model.summary() that is exactly like keras https://github.com/pytorch/pytorch/pull/3043/files


回答 4

最容易记住的(不像Keras那样漂亮):

print(model)

这也可以:

repr(model)

如果只需要参数数量:

sum([param.nelement() for param in model.parameters()])

来自:是否有与keras类似的pytorch函数与model.summary()?(论坛.PyTorch.org)

Simplest to remember (not as pretty as Keras):

print(model)

This also works:

repr(model)

If you just want the number of parameters:

sum([param.nelement() for param in model.parameters()])

From: Is there similar pytorch function as model.summary() as keras? (forum.PyTorch.org)


回答 5

您可以使用

from torchsummary import summary

您可以指定设备

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

您可以创建一个网络,如果您使用的是MNIST数据集,则以下命令将起作用并向您显示摘要

model = Network().to(device)
summary(model,(1,28,28))

You can use

from torchsummary import summary

You can specify device

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

You can create a network and, if you are using the MNIST dataset, the following commands will work and show you the summary:

model = Network().to(device)
summary(model,(1,28,28))

回答 6

AFAK没有pytorch中的等效model.summary()

同时,您可以引用szagoruyko的脚本,它提供了一个很好的可视化效果,如resnet18-example

干杯

AFAIK there is no equivalent of model.summary() in PyTorch.

Meanwhile, you can refer to the script by szagoruyko, which gives a nice visualization, as in the resnet18 example.

Cheers


回答 7

在为模型类定义对象后,只需打印模型

class RNN(nn.Module):
    def __init__(self, input_dim, embedding_dim, hidden_dim, output_dim):
        super().__init__()

        self.embedding = nn.Embedding(input_dim, embedding_dim)
        self.rnn = nn.RNN(embedding_dim, hidden_dim)
        self.fc = nn.Linear(hidden_dim, output_dim)
    def forward(self, x):
        ...

model = RNN(input_dim, embedding_dim, hidden_dim, output_dim)
print(model)

Simply print the model after defining an object for the model class

class RNN(nn.Module):
    def __init__(self, input_dim, embedding_dim, hidden_dim, output_dim):
        super().__init__()

        self.embedding = nn.Embedding(input_dim, embedding_dim)
        self.rnn = nn.RNN(embedding_dim, hidden_dim)
        self.fc = nn.Linear(hidden_dim, output_dim)
    def forward(self, x):
        ...

model = RNN(input_dim, embedding_dim, hidden_dim, output_dim)
print(model)

回答 8

您可以使用x.shape,以测量张量的x尺寸

You can simply use x.shape to get the dimensions of the tensor x.


回答 9

为了可视化和总结PyTorch模型,也可以使用tensorboardX

For visualization and summary of PyTorch models, tensorboardX can also be used.


socket.shutdown与socket.close

问题:socket.shutdown与socket.close

我最近看到了一些看起来像这样的代码(袜子当然是套接字对象):

sock.shutdown(socket.SHUT_RDWR)
sock.close()

在套接字上调用shutdown然后关闭它的目的是什么?如果有所不同,则此套接字用于非阻塞IO。

I recently saw a bit of code that looked like this (with sock being a socket object of course):

sock.shutdown(socket.SHUT_RDWR)
sock.close()

What exactly is the purpose of calling shutdown on the socket and then closing it? If it makes a difference, this socket is being used for non-blocking IO.


回答 0

这是一个解释

一旦不再需要套接字,调用程序就可以通过对套接字描述符应用close子例程来丢弃该套接字。如果在关闭时可靠的传输套接字具有与之关联的数据,则系统将继续尝试进行数据传输。但是,如果仍未交付数据,则系统将丢弃该数据。如果应用程序不使用任何暂挂数据,则可以在关闭套接字之前使用套接字上的shutdown子例程。

Here’s one explanation:

Once a socket is no longer required, the calling program can discard the socket by applying a close subroutine to the socket descriptor. If a reliable delivery socket has data associated with it when a close takes place, the system continues to attempt data transfer. However, if the data is still undelivered, the system discards the data. Should the application program have no use for any pending data, it can use the shutdown subroutine on the socket prior to closing it.


回答 1

调用closeshutdown对基础套接字有两种不同的影响。

首先要指出的是,套接字是基础操作系统中的资源,并且多个进程可以具有同一基础套接字的句柄。

您打电话的时候 close它时,将句柄计数减一,如果句柄计数达到零,则套接字和关联的连接将通过正常的关闭过程(有效地将FIN / EOF发送到对等方)来释放套接字。

这里要注意的是,如果句柄计数没有达到零,因为另一个进程仍然具有套接字的句柄,则连接不会关闭并且套接字不会被释放。

另一方面,调用shutdown读写会关闭基础连接,并向对等方发送FIN / EOF,而不管套接字有多少个进程。但是,它不会取消分配套接字,您仍然需要在事后调用close。

Calling close and shutdown have two different effects on the underlying socket.

The first thing to point out is that the socket is a resource in the underlying OS and multiple processes can have a handle for the same underlying socket.

When you call close it decrements the handle count by one and if the handle count has reached zero then the socket and associated connection goes through the normal close procedure (effectively sending a FIN / EOF to the peer) and the socket is deallocated.

The thing to pay attention to here is that if the handle count does not reach zero because another process still has a handle to the socket then the connection is not closed and the socket is not deallocated.

On the other hand calling shutdown for reading and writing closes the underlying connection and sends a FIN / EOF to the peer regardless of how many processes have handles to the socket. However, it does not deallocate the socket and you still need to call close afterward.
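The handle-count behaviour described above can be observed with a small experiment. This is only a sketch: it assumes a local pair from `socket.socketpair()` behaves like the connection in question, and it uses `dup()` within one process to stand in for "another process holding a handle". The function name `close_vs_shutdown` is made up for illustration.

```python
import socket


def close_vs_shutdown():
    """Contrast close() on one handle with shutdown() on the shared socket."""
    a, b = socket.socketpair()      # two connected stream sockets
    a_dup = a.dup()                 # second handle to the same underlying socket

    a.close()                       # drops one handle; the connection stays up
    a_dup.sendall(b"still open")
    first = b.recv(1024)

    spare = a_dup.dup()             # keep yet another handle alive on purpose
    a_dup.shutdown(socket.SHUT_WR)  # acts on the socket itself, not on a handle
    eof = b.recv(1024)              # empty bytes: the peer sees end-of-stream

    for s in (a_dup, spare, b):
        s.close()
    return first, eof


if __name__ == "__main__":
    print(close_vs_shutdown())
```

Closing `a` alone does not end the conversation because `a_dup` still refers to the socket, whereas the `shutdown` call is seen by the peer even though `spare` is still open.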


回答 2

关闭和关闭的说明:正常关闭(msdn)

关机(针对您的情况)表示连接的另一端不再有读写套接字的意图。然后关闭释放与套接字关联的所有内存。

忽略关闭可能会导致套接字在操作系统堆栈中徘徊,直到正常关闭连接为止。

在国际海事组织中,“关闭”和“关闭”这两个名称具有误导性,“关闭”和“破坏”将强调它们之间的差异。

Explanation of shutdown and close: Graceful shutdown (msdn)

Shutdown (in your case) indicates to the other end of the connection there is no further intention to read from or write to the socket. Then close frees up any memory associated with the socket.

Omitting shutdown may cause the socket to linger in the OS's stack until the connection has been closed gracefully.

IMO the names ‘shutdown’ and ‘close’ are misleading, ‘close’ and ‘destroy’ would emphasise their differences.


回答 3

在Socket Programming HOWTO(py2 / py3)中已经提到了

断开连接

严格来说,应该先shutdown在套接字上使用close它。该shutdown是在另一端的咨询到插座。根据您传递的参数,它可能表示“ 我不再发送了,但我仍会听 ”,或“ 我不在听,很好的摆脱!”。但是,大多数套接字库都习惯于程序员忽略使用此礼节,通常a close与相同shutdown(); close()。因此,在大多数情况下,不需要显式关闭。

It’s mentioned right in the Socket Programming HOWTO (py2/py3):

Disconnecting

Strictly speaking, you’re supposed to use shutdown on a socket before you close it. The shutdown is an advisory to the socket at the other end. Depending on the argument you pass it, it can mean “I’m not going to send anymore, but I’ll still listen”, or “I’m not listening, good riddance!”. Most socket libraries, however, are so used to programmers neglecting to use this piece of etiquette that normally a close is the same as shutdown(); close(). So in most situations, an explicit shutdown is not needed.


回答 4

上面的代码难道不是错误的吗?

在shutdown调用之后直接执行close调用可能会使内核无论如何都丢弃所有传出缓冲区。

根据 http://blog.netherlabs.nl/articles/2009/01/18/the-ultimate-so_linger-page-or-why-is-my-tcp-not-reliable, 需要在关机和关机之间等待关闭,直到读取返回0。

Isn’t this code above wrong?

The close call directly after the shutdown call might make the kernel discard all outgoing buffers anyway.

According to http://blog.netherlabs.nl/articles/2009/01/18/the-ultimate-so_linger-page-or-why-is-my-tcp-not-reliable one needs to wait between the shutdown and the close until read returns 0.
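One way to read that advice is: shut down the sending side, keep reading until the peer signals end-of-stream, and only then close. The sketch below assumes a blocking socket (a non-blocking one, as in the question, would need a select/poll loop instead); `graceful_close` and `demo` are hypothetical helper names, not part of the socket module.

```python
import socket


def graceful_close(sock, bufsize=4096):
    """Half-close, drain until the peer closes its side, then close."""
    sock.shutdown(socket.SHUT_WR)       # tell the peer we are done sending
    received = b""
    while True:
        chunk = sock.recv(bufsize)      # blocks until data or end-of-stream
        if not chunk:                   # b"" means the peer finished too
            break
        received += chunk
    sock.close()
    return received


def demo():
    a, b = socket.socketpair()
    b.sendall(b"late data")             # data still in flight towards a
    b.shutdown(socket.SHUT_WR)          # peer finishes sending
    leftover = graceful_close(a)        # a drains before closing
    b.close()
    return leftover


if __name__ == "__main__":
    print(demo())
```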


回答 5

There are several flavors of shutdown: http://msdn.microsoft.com/en-us/library/system.net.sockets.socket.shutdown.aspx. *nix is similar.


回答 6

Shutdown(1),强制套接字no发送更多数据

这在

1-缓冲液冲洗

2-奇怪的错误检测

3-安全防护

让我解释更多,当您将数据从A发送到B时,不保证将其发送到B,仅保证将其发送到A os缓冲区,然后缓冲区又将其发送到B os缓冲区。

因此,通过在A上调用shutdown(1),您将刷新A的缓冲区,如果缓冲区不为空,则会引发错误,即:尚未将数据发送到对等方

但是,这是不可挽回的,因此您可以在完全发送完所有数据之后,并确保至少在对等os缓冲区中执行此操作

shutdown(1) forces the socket not to send any more data.

This is useful for:

1- Buffer flushing

2- Strange error detection

3- Safeguarding

Let me explain further: when you send data from A to B, it is not guaranteed to have reached B; it is only guaranteed to have reached A's OS buffer, which in turn sends it to B's OS buffer.

So by calling shutdown(1) on A, you flush A’s buffer, and an error is raised if the buffer is not empty, i.e. data has not been sent to the peer yet.

However, this is irreversible, so you should do it only after you have completely sent all your data and want to be sure that it has at least reached the peer's OS buffer.
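The "no more sending" part of this answer is easy to check. The sketch below only demonstrates that sending fails once the write side has been shut down; it does not attempt to verify the buffer-flushing claim. It assumes a local `socket.socketpair()` and catches the broad `OSError`, since the exact exception subclass depends on the platform. The function name is hypothetical.

```python
import socket


def send_after_shutdown():
    """Return the name of the exception raised when sending after shutdown."""
    a, b = socket.socketpair()
    try:
        a.shutdown(socket.SHUT_WR)      # SHUT_WR is the constant for "1"
        try:
            a.sendall(b"too late")
        except OSError as exc:          # platform-specific subclass
            return type(exc).__name__
        return None                     # no error was raised
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    print(send_after_shutdown())
```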


为什么要在python中通过字符串声明unicode?

问题:为什么要在python中通过字符串声明unicode?

我仍在学习python,我对此表示怀疑:

在python 2.6.x中,我通常像这样在文件头中声明编码(如在PEP 0263中

# -*- coding: utf-8 -*-

之后,我的字符串照常编写:

a = "A normal string without declared Unicode"

但是每次我看到python项目代码时,都不会在标头中声明编码。而是在每个这样的字符串处声明它:

a = u"A string with declared Unicode"

有什么不同?目的是什么?我知道Python 2.6.x默认设置了ASCII编码,但是它可以被标头声明覆盖,那么每个字符串声明的意义是什么?

附录:似乎我将文件编码和字符串编码混为一谈了。感谢您的解释:)

I’m still learning python and I have a doubt:

In python 2.6.x I usually declare encoding in the file header like this (as in PEP 0263)

# -*- coding: utf-8 -*-

After that, my strings are written as usual:

a = "A normal string without declared Unicode"

But every time I see a Python project's code, the encoding is not declared in the header. Instead, it is declared at every string like this:

a = u"A string with declared Unicode"

What’s the difference? What’s the purpose of this? I know Python 2.6.x sets ASCII encoding by default, but it can be overridden by the header declaration, so what’s the point of per-string declaration?

Addendum: Seems that I’ve mixed up file encoding with string encoding. Thanks for explaining it :)


回答 0

正如其他人所提到的,这是两件事。

指定时# -*- coding: utf-8 -*-,就是告诉Python保存的源文件是utf-8。Python 2的默认值为ASCII(Python 3的默认值为utf-8)。这只会影响解释器读取文件中字符的方式。

通常,无论编码是什么,将高unicode字符嵌入文件中可能都不是最好的主意。您可以使用字符串unicode转义,这两种编码都可以使用。


当您在字符串的u前面声明一个字符串(如)时u'This is a string',它会告诉Python编译器该字符串是Unicode而不是字节。这大部分由解释器透明地处理。最明显的区别是您现在可以在字符串中嵌入unicode字符(即u'\u2665'现在合法)。您可以使用from __future__ import unicode_literals使其成为默认值。

这仅适用于Python 2;在Python 3中,默认值为Unicode,您需要b在前面指定a (例如b'These are bytes',以声明字节序列)。

Those are two different things, as others have mentioned.

When you specify # -*- coding: utf-8 -*-, you’re telling Python the source file you’ve saved is utf-8. The default for Python 2 is ASCII (for Python 3 it’s utf-8). This just affects how the interpreter reads the characters in the file.

In general, it’s probably not the best idea to embed high unicode characters into your file no matter what the encoding is; you can use string unicode escapes, which work in either encoding.


When you declare a string with a u in front, like u'This is a string', it tells the Python compiler that the string is Unicode, not bytes. This is handled mostly transparently by the interpreter; the most obvious difference is that you can now embed unicode characters in the string (that is, u'\u2665' is now legal). You can use from __future__ import unicode_literals to make it the default.

This only applies to Python 2; in Python 3 the default is Unicode, and you need to specify a b in front (like b'These are bytes', to declare a sequence of bytes).
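For readers on Python 3, the text/bytes distinction described above can be seen directly. This is a small illustrative sketch (variable names are arbitrary), using the same `\u2665` escape mentioned in the answer so that the source file stays pure ASCII:

```python
# Python 3: a plain literal is text (str); a b'' literal is bytes.
text = "\u2665"                    # one Unicode code point, written as an escape
data = text.encode("utf-8")        # explicit conversion from text to bytes
round_trip = data.decode("utf-8")  # and back again

print(type(text).__name__, len(text))   # text is measured in code points
print(type(data).__name__, len(data))   # bytes are measured in bytes
print(round_trip == text)
```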


回答 1

就像其他人所说的,# coding:指定保存源文件的编码。这是一些示例来说明这一点:

作为cp437(我的控制台编码)保存在磁盘上的文件,但未声明编码

b = 'über'
u = u'über'
print b,repr(b)
print u,repr(u)

输出:

  File "C:\ex.py", line 1
SyntaxError: Non-ASCII character '\x81' in file C:\ex.py on line 1, but no
encoding declared; see http://www.python.org/peps/pep-0263.html for details

带有以下内容的文件输出# coding: cp437

über '\x81ber'
über u'\xfcber'

刚开始,Python不知道编码,并抱怨非ASCII字符。一旦知道了编码,字节字符串就会获取磁盘上实际存在的字节。对于Unicode字符串,Python读取\ x81,知道在cp437中是ü,并将其解码为ü的Unicode代码点,即U + 00FC。打印字节字符串时,Python将十六进制值81直接发送到控制台。当印刷Unicode字符串,Python的正确检测我的控制台的编码作为CP437和翻译的Unicode ü为CP437值ü

这是在UTF-8中声明并保存的文件发生的情况:

├╝ber '\xc3\xbcber'
über u'\xfcber'

在UTF-8中,ü编码为十六进制字节C3 BC,因此字节字符串包含这些字节,但是Unicode字符串与第一个示例相同。Python读取了两个字节并将其正确解码。Python错误地打印了字节字符串,因为它直接将代表ü的两个UTF-8字节发送到了我的cp437控制台。

在这里,该文件被声明为cp437,但保存在UTF-8中:

├╝ber '\xc3\xbcber'
├╝ber u'\u251c\u255dber'

字节字符串仍然在磁盘上获得了字节(UTF-8十六进制字节C3 BC),但是将它们解释为两个cp437字符,而不是单个UTF-8编码的字符。转换为Unicode代码点的那两个字符,所有内容打印不正确。

As others have said, # coding: specifies the encoding the source file is saved in. Here are some examples to illustrate this:

A file saved on disk as cp437 (my console encoding), but no encoding declared

b = 'über'
u = u'über'
print b,repr(b)
print u,repr(u)

Output:

  File "C:\ex.py", line 1
SyntaxError: Non-ASCII character '\x81' in file C:\ex.py on line 1, but no
encoding declared; see http://www.python.org/peps/pep-0263.html for details

Output of file with # coding: cp437 added:

über '\x81ber'
über u'\xfcber'

At first, Python didn’t know the encoding and complained about the non-ASCII character. Once it knew the encoding, the byte string got the bytes that were actually on disk. For the Unicode string, Python read \x81, knew that in cp437 that was a ü, and decoded it into the Unicode codepoint for ü which is U+00FC. When the byte string was printed, Python sent the hex value 81 to the console directly. When the Unicode string was printed, Python correctly detected my console encoding as cp437 and translated Unicode ü to the cp437 value for ü.

Here’s what happens with a file declared and saved in UTF-8:

├╝ber '\xc3\xbcber'
über u'\xfcber'

In UTF-8, ü is encoded as the hex bytes C3 BC, so the byte string contains those bytes, but the Unicode string is identical to the first example. Python read the two bytes and decoded it correctly. Python printed the byte string incorrectly, because it sent the two UTF-8 bytes representing ü directly to my cp437 console.

Here the file is declared cp437, but saved in UTF-8:

├╝ber '\xc3\xbcber'
├╝ber u'\u251c\u255dber'

The byte string still got the bytes on disk (UTF-8 hex bytes C3 BC), but interpreted them as two cp437 characters instead of a single UTF-8-encoded character. Those two characters were translated to Unicode code points, and everything prints incorrectly.
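The three cases above can be approximated in Python 3 without touching the console, by encoding with one codec and decoding with another. This is a sketch rather than a reproduction of the original Python 2 session; `misread` is a hypothetical helper name, and the ü is written as the escape `\u00fc` so the snippet does not depend on how it is saved.

```python
def misread(text, saved_as, declared_as):
    """Encode text as it would be saved, then decode it as it was declared."""
    return text.encode(saved_as).decode(declared_as)


word = "\u00fcber"                              # 'über'

on_disk_cp437 = word.encode("cp437")            # bytes when saved as cp437
on_disk_utf8 = word.encode("utf-8")             # bytes when saved as UTF-8
garbled = misread(word, "utf-8", "cp437")       # saved UTF-8, declared cp437

print(on_disk_cp437, on_disk_utf8, ascii(garbled))
```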


回答 2

那没有设置字符串的格式。它设置文件的格式。即使具有该标头,它"hello"还是一个字节字符串,而不是Unicode字符串。要使其成为Unicode,您将不得不在u"hello"任何地方使用它。标头只是在读取.py文件时使用哪种格式的提示。

That doesn’t set the format of the string; it sets the format of the file. Even with that header, "hello" is a byte string, not a Unicode string. To make it Unicode, you’re going to have to use u"hello" everywhere. The header is just a hint of what format to use when reading the .py file.


回答 3

标头定义是定义代码本身的编码,而不是运行时的结果字符串。

在不带utf-8标头定义的python脚本中放置诸如۲之类的非ascii字符将引发警告

错误

The header definition is to define the encoding of the code itself, not the resulting strings at runtime.

Putting a non-ASCII character like ۲ in a Python script without the utf-8 header definition will throw a warning.


回答 4

我制作了以下名为unicoder的模块,以便能够对变量进行转换:

import sys
import os

def ustr(string):

    string = 'u"%s"'%string

    with open('_unicoder.py', 'w') as script:

        script.write('# -*- coding: utf-8 -*-\n')
        script.write('_ustr = %s'%string)

    import _unicoder
    value = _unicoder._ustr

    del _unicoder
    del sys.modules['_unicoder']

    os.system('del _unicoder.py')
    os.system('del _unicoder.pyc')

    return value

然后,您可以在程序中执行以下操作:

# -*- coding: utf-8 -*-

from unicoder import ustr

txt = 'Hello, Unicode World'
txt = ustr(txt)

print type(txt) # <type 'unicode'>

I made the following module called unicoder to be able to do the transformation on variables:

import sys
import os

def ustr(string):

    string = 'u"%s"'%string

    with open('_unicoder.py', 'w') as script:

        script.write('# -*- coding: utf-8 -*-\n')
        script.write('_ustr = %s'%string)

    import _unicoder
    value = _unicoder._ustr

    del _unicoder
    del sys.modules['_unicoder']

    os.system('del _unicoder.py')
    os.system('del _unicoder.pyc')

    return value

Then in your program you could do the following:

# -*- coding: utf-8 -*-

from unicoder import ustr

txt = 'Hello, Unicode World'
txt = ustr(txt)

print type(txt) # <type 'unicode'>

统计信息:Python中的组合

问题:统计信息:Python中的组合

我需要计算在Python combinatorials(NCR),但无法找到的功能做在mathnumpystat 图书馆。类似于函数的类型:

comb = calculate_combinations(n, r)

我需要可能的组合数量,而不是实际组合,因此itertools.combinations我对此并不感兴趣。

最后,我要避免使用阶乘,因为我将要计算其组合的数字可能会太大,并且阶乘将变得非常可怕。

这似乎是一个非常容易回答的问题,但是我被有关生成所有实际组合的问题淹没了,这不是我想要的。

I need to compute combinatorials (nCr) in Python but cannot find the function to do that in math, numpy or stat libraries. Something like a function of the type:

comb = calculate_combinations(n, r)

I need the number of possible combinations, not the actual combinations, so itertools.combinations does not interest me.

Finally, I want to avoid using factorials, as the numbers I’ll be calculating the combinations for can get too big and the factorials are going to be monstrous.

This seems like a REALLY easy to answer question, however I am being drowned in questions about generating all the actual combinations, which is not what I want.


回答 0

请参阅scipy.special.comb(旧版本的scipy中的scipy.misc.comb)。当exact为False时,它使用伽马函数来获得良好的精度而无需花费很多时间。在确切的情况下,它返回一个任意精度的整数,这可能需要很长时间才能计算出来。

See scipy.special.comb (scipy.misc.comb in older versions of scipy). When exact is False, it uses the gammaln function to obtain good precision without taking much time. In the exact case it returns an arbitrary-precision integer, which might take a long time to compute.


回答 1

为什么不自己写呢?这是一线之类的:

from operator import mul    # or mul=lambda x,y:x*y
from fractions import Fraction

def nCk(n,k): 
  return int( reduce(mul, (Fraction(n-i, i+1) for i in range(k)), 1) )

测试-打印Pascal的三角形:

>>> for n in range(17):
...     print ' '.join('%5d'%nCk(n,k) for k in range(n+1)).center(100)
...     
                                                   1                                                
                                                1     1                                             
                                             1     2     1                                          
                                          1     3     3     1                                       
                                       1     4     6     4     1                                    
                                    1     5    10    10     5     1                                 
                                 1     6    15    20    15     6     1                              
                              1     7    21    35    35    21     7     1                           
                           1     8    28    56    70    56    28     8     1                        
                        1     9    36    84   126   126    84    36     9     1                     
                     1    10    45   120   210   252   210   120    45    10     1                  
                  1    11    55   165   330   462   462   330   165    55    11     1               
               1    12    66   220   495   792   924   792   495   220    66    12     1            
            1    13    78   286   715  1287  1716  1716  1287   715   286    78    13     1         
         1    14    91   364  1001  2002  3003  3432  3003  2002  1001   364    91    14     1      
      1    15   105   455  1365  3003  5005  6435  6435  5005  3003  1365   455   105    15     1   
    1    16   120   560  1820  4368  8008 11440 12870 11440  8008  4368  1820   560   120    16     1
>>> 

PS。编辑以替换int(round(reduce(mul, (float(n-i)/(i+1) for i in range(k)), 1)))int(reduce(mul, (Fraction(n-i, i+1) for i in range(k)), 1))因此对于大N / K不会出错

Why not write it yourself? It’s a one-liner or such:

from operator import mul    # or mul=lambda x,y:x*y
from fractions import Fraction

def nCk(n,k): 
  return int( reduce(mul, (Fraction(n-i, i+1) for i in range(k)), 1) )

Test – printing Pascal’s triangle:

>>> for n in range(17):
...     print ' '.join('%5d'%nCk(n,k) for k in range(n+1)).center(100)
...     
                                                   1                                                
                                                1     1                                             
                                             1     2     1                                          
                                          1     3     3     1                                       
                                       1     4     6     4     1                                    
                                    1     5    10    10     5     1                                 
                                 1     6    15    20    15     6     1                              
                              1     7    21    35    35    21     7     1                           
                           1     8    28    56    70    56    28     8     1                        
                        1     9    36    84   126   126    84    36     9     1                     
                     1    10    45   120   210   252   210   120    45    10     1                  
                  1    11    55   165   330   462   462   330   165    55    11     1               
               1    12    66   220   495   792   924   792   495   220    66    12     1            
            1    13    78   286   715  1287  1716  1716  1287   715   286    78    13     1         
         1    14    91   364  1001  2002  3003  3432  3003  2002  1001   364    91    14     1      
      1    15   105   455  1365  3003  5005  6435  6435  5005  3003  1365   455   105    15     1   
    1    16   120   560  1820  4368  8008 11440 12870 11440  8008  4368  1820   560   120    16     1
>>> 

PS. edited to replace int(round(reduce(mul, (float(n-i)/(i+1) for i in range(k)), 1))) with int(reduce(mul, (Fraction(n-i, i+1) for i in range(k)), 1)) so it won’t err for big N/K
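The snippet above is written for Python 2, where `reduce` is a builtin and `print` is a statement. A Python 3 rendering of the same idea might look like the sketch below; the only changes are the `functools.reduce` import and the use of `print()` as a function.

```python
from fractions import Fraction
from functools import reduce   # reduce is no longer a builtin in Python 3
from operator import mul


def nCk(n, k):
    # Multiply the exact ratios (n-i)/(i+1); Fraction keeps every step exact.
    return int(reduce(mul, (Fraction(n - i, i + 1) for i in range(k)), 1))


if __name__ == "__main__":
    for n in range(6):
        print(" ".join("%3d" % nCk(n, k) for k in range(n + 1)).center(30))
```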


回答 2

在Google代码上快速搜索给出了(它使用了@Mark Byers的答案中的公式):

def choose(n, k):
    """
    A fast way to calculate binomial coefficients by Andrew Dalke (contrib).
    """
    if 0 <= k <= n:
        ntok = 1
        ktok = 1
        for t in xrange(1, min(k, n - k) + 1):
            ntok *= n
            ktok *= t
            n -= 1
        return ntok // ktok
    else:
        return 0

choose()scipy.misc.comb()您需要确切答案快10倍(在所有0 <=(n,k)<1e3对上测试)。

def comb(N,k): # from scipy.comb(), but MODIFIED!
    if (k > N) or (N < 0) or (k < 0):
        return 0L
    N,k = map(long,(N,k))
    top = N
    val = 1L
    while (top > (N-k)):
        val *= top
        top -= 1
    n = 1L
    while (n < k+1L):
        val /= n
        n += 1
    return val

A quick search on google code gives (it uses formula from @Mark Byers’s answer):

def choose(n, k):
    """
    A fast way to calculate binomial coefficients by Andrew Dalke (contrib).
    """
    if 0 <= k <= n:
        ntok = 1
        ktok = 1
        for t in xrange(1, min(k, n - k) + 1):
            ntok *= n
            ktok *= t
            n -= 1
        return ntok // ktok
    else:
        return 0

choose() is 10 times faster (tested on all 0 <= (n,k) < 1e3 pairs) than scipy.misc.comb() if you need an exact answer.

def comb(N,k): # from scipy.comb(), but MODIFIED!
    if (k > N) or (N < 0) or (k < 0):
        return 0L
    N,k = map(long,(N,k))
    top = N
    val = 1L
    while (top > (N-k)):
        val *= top
        top -= 1
    n = 1L
    while (n < k+1L):
        val /= n
        n += 1
    return val

回答 3

如果您想要确切的结果速度,请尝试gmpygmpy.comb应该完全按照您的要求进行操作,而且速度非常快(当然,作为gmpy的原始作者,我偏见;-)。

If you want exact results and speed, try gmpy. gmpy.comb should do exactly what you ask for, and it’s pretty fast (of course, as gmpy’s original author, I am biased ;-).


回答 4

如果您想要精确的结果,请使用sympy.binomial。看来这是最快的方法。

x = 1000000
y = 234050

%timeit scipy.misc.comb(x, y, exact=True)
1 loops, best of 3: 1min 27s per loop

%timeit gmpy.comb(x, y)
1 loops, best of 3: 1.97 s per loop

%timeit int(sympy.binomial(x, y))
100000 loops, best of 3: 5.06 µs per loop

If you want an exact result, use sympy.binomial. It seems to be the fastest method, hands down.

x = 1000000
y = 234050

%timeit scipy.misc.comb(x, y, exact=True)
1 loops, best of 3: 1min 27s per loop

%timeit gmpy.comb(x, y)
1 loops, best of 3: 1.97 s per loop

%timeit int(sympy.binomial(x, y))
100000 loops, best of 3: 5.06 µs per loop

回答 5

在许多情况下,数学定义的字面翻译是足够的(记住Python将自动使用大数算法):

from math import factorial

def calculate_combinations(n, r):
    return factorial(n) // factorial(r) // factorial(n-r)

对于我测试的某些输入(例如n = 1000 r = 500),这比reduce另一种(目前投票率最高)答案中建议的一种衬板的速度快10倍以上。另一方面,@ JF Sebastian提供的代码片段的性能优于。

A literal translation of the mathematical definition is quite adequate in a lot of cases (remembering that Python will automatically use big number arithmetic):

from math import factorial

def calculate_combinations(n, r):
    return factorial(n) // factorial(r) // factorial(n-r)

For some inputs I tested (e.g. n=1000 r=500) this was more than 10 times faster than the one-liner reduce suggested in another (currently highest voted) answer. On the other hand, it is out-performed by the snippet provided by @J.F. Sebastian.


回答 6

从开始Python 3.8,标准库现在包括math.comb用于计算二项式系数的函数:

math.comb(n,k)

这是从n个项中不重复选择k个项的方法的数量
n! / (k! (n - k)!)

import math
math.comb(10, 5) # 252

Starting with Python 3.8, the standard library includes the math.comb function to compute the binomial coefficient:

math.comb(n, k)

which is the number of ways to choose k items from n items without repetition
n! / (k! (n - k)!):

import math
math.comb(10, 5) # 252
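If the code may also run on interpreters older than 3.8, one possible approach is to fall back to the factorial definition when `math.comb` is missing. The wrapper name `n_choose_k` is hypothetical, and the fallback is just the textbook formula quoted above, not an optimized routine.

```python
import math


def n_choose_k(n, k):
    """Use math.comb when available, else fall back to factorials."""
    if hasattr(math, "comb"):           # Python 3.8 and later
        return math.comb(n, k)
    if k > n:                           # mirror math.comb, which returns 0 here
        return 0
    # math.factorial raises ValueError for negative input, as math.comb does.
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))


if __name__ == "__main__":
    print(n_choose_k(10, 5))
```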

回答 7

这是另一种选择。该代码最初是用C ++编写的,因此可以将其反向移植到C ++以获取有限精度的整数(例如__int64)。优点是(1)它仅涉及整数运算,(2)通过执行连续的乘法和除法对,避免了膨胀整数值。我已经用Nas Banov的Pascal三角形测试了结果,它得到了正确的答案:

def choose(n,r):
  """Computes n! / (r! (n-r)!) exactly. Returns a python long int."""
  assert n >= 0
  assert 0 <= r <= n

  c = 1L
  denom = 1
  for (num,denom) in zip(xrange(n,n-r,-1), xrange(1,r+1,1)):
    c = (c * num) // denom
  return c

基本原理:为了最小化乘法和除法的数量,我们将表达式重写为

    n!      n(n-1)...(n-r+1)
--------- = ----------------
 r!(n-r)!          r!

为了尽可能避免乘法溢出,我们将按照以下STRICT顺序从左到右进行评估:

n / 1 * (n-1) / 2 * (n-2) / 3 * ... * (n-r+1) / r

我们可以证明按此顺序运算的整数算术是精确的(即无舍入误差)。

Here’s another alternative. This one was originally written in C++, so it can be backported to C++ for a finite-precision integer (e.g. __int64). The advantage is (1) it involves only integer operations, and (2) it avoids bloating the integer value by doing successive pairs of multiplication and division. I’ve tested the result with Nas Banov’s Pascal triangle, it gets the correct answer:

def choose(n,r):
  """Computes n! / (r! (n-r)!) exactly. Returns a python long int."""
  assert n >= 0
  assert 0 <= r <= n

  c = 1L
  denom = 1
  for (num,denom) in zip(xrange(n,n-r,-1), xrange(1,r+1,1)):
    c = (c * num) // denom
  return c

Rationale: To minimize the # of multiplications and divisions, we rewrite the expression as

    n!      n(n-1)...(n-r+1)
--------- = ----------------
 r!(n-r)!          r!

To avoid multiplication overflow as much as possible, we will evaluate in the following STRICT order, from left to right:

n / 1 * (n-1) / 2 * (n-2) / 3 * ... * (n-r+1) / r

We can show that integer arithmetic operated in this order is exact (i.e. no roundoff error).
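The exactness claim can be checked mechanically. The sketch below is a Python 3 port of the function above (plain `int` and `range` instead of `1L` and `xrange`), with an added assertion, not present in the original, that each intermediate product is divisible by the current denominator. That holds because after i steps the running value equals C(n, i), which is an integer.

```python
def choose(n, r):
    """Compute n! / (r! (n-r)!) using alternating multiply/divide steps."""
    assert 0 <= r <= n
    c = 1
    for num, denom in zip(range(n, n - r, -1), range(1, r + 1)):
        product = c * num
        # Added check: the division below never discards a remainder.
        assert product % denom == 0
        c = product // denom
    return c


if __name__ == "__main__":
    print(choose(30, 7))
```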


回答 8

使用动态编程,时间复杂度为Θ(n * m),空间复杂度为Θ(m):

def binomial(n, k):
    """ (int, int) -> int

             | c(n-1, k-1) + c(n-1, k), if 0 < k < n
    c(n,k) = | 1                      , if n = k
             | 1                      , if k = 0

    Precondition: n > k

    >>> binomial(9, 2)
    36
    """

    c = [0] * (n + 1)
    c[0] = 1
    for i in range(1, n + 1):
        c[i] = 1
        j = i - 1
        while j > 0:
            c[j] += c[j - 1]
            j -= 1

    return c[k]

Using dynamic programming, the time complexity is Θ(n*m) and space complexity Θ(m):

def binomial(n, k):
    """ (int, int) -> int

             | c(n-1, k-1) + c(n-1, k), if 0 < k < n
    c(n,k) = | 1                      , if n = k
             | 1                      , if k = 0

    Precondition: n > k

    >>> binomial(9, 2)
    36
    """

    c = [0] * (n + 1)
    c[0] = 1
    for i in range(1, n + 1):
        c[i] = 1
        j = i - 1
        while j > 0:
            c[j] += c[j - 1]
            j -= 1

    return c[k]
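The complexity figures quoted for this answer use m without defining it. Assuming m is meant to be k, a variant that stores only k + 1 entries of the current Pascal row could look like the following sketch. The name `binomial_k_space` and the early return for out-of-range k are additions for illustration, not part of the original answer.

```python
def binomial_k_space(n, k):
    """Pascal's rule with a row truncated to k + 1 entries."""
    if k < 0 or k > n:
        return 0
    row = [1] + [0] * k                 # row[j] holds C(i, j) for the current i
    for i in range(1, n + 1):
        # Update right to left so row[j - 1] is still the previous row's value.
        for j in range(min(i, k), 0, -1):
            row[j] += row[j - 1]
    return row[k]


if __name__ == "__main__":
    print(binomial_k_space(9, 2))
```

The loop does roughly n * k additions and keeps k + 1 integers, which matches the stated bounds under that reading of m.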

回答 9

如果您的程序有上限n(例如n <= N),并且需要重复计算nCr(最好是>> N次),则使用lru_cache可以极大地提高性能:

from functools import lru_cache

@lru_cache(maxsize=None)
def nCr(n, r):
    return 1 if r == 0 or r == n else nCr(n - 1, r - 1) + nCr(n - 1, r)

构造缓存(隐式完成)需要花费O(N^2)时间。随后的所有对的调用都nCr将返回O(1)

If your program has an upper bound to n (say n <= N) and needs to repeatedly compute nCr (preferably for >>N times), using lru_cache can give you a huge performance boost:

from functools import lru_cache

@lru_cache(maxsize=None)
def nCr(n, r):
    return 1 if r == 0 or r == n else nCr(n - 1, r - 1) + nCr(n - 1, r)

Constructing the cache (which is done implicitly) takes up to O(N^2) time. Any subsequent calls to nCr will return in O(1).
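To see the cache at work, `cache_info()` (provided by `functools.lru_cache`) reports hits and misses. The sketch below repeats the function from this answer and compares the hit counter around a repeated call. One caveat worth keeping in mind: the recursion goes roughly n levels deep, so with CPython's default recursion limit of 1000 a large first call would need the cache warmed up from smaller n.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def nCr(n, r):
    return 1 if r == 0 or r == n else nCr(n - 1, r - 1) + nCr(n - 1, r)


first = nCr(20, 10)                     # fills the cache as a side effect
hits_before = nCr.cache_info().hits
second = nCr(20, 10)                    # answered straight from the cache
hits_after = nCr.cache_info().hits

print(first, hits_after - hits_before, nCr.cache_info().currsize)
```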


回答 10

您可以编写2个简单的函数,实际上比使用scipy.special.comb快5到8倍。实际上,您不需要导入任何额外的程序包,并且该函数非常易于阅读。诀窍是使用备忘录存储先前计算的值,并使用nCr的定义

# create a memoization dictionary
memo = {}
def factorial(n):
    """
    Calculate the factorial of an input using memoization
    :param n: int
    :rtype value: int
    """
    if n in [1,0]:
        return 1
    if n in memo:
        return memo[n]
    value = n*factorial(n-1)
    memo[n] = value
    return value

def ncr(n, k):
    """
    Choose k elements from a set of n elements - n must be larger than or equal to k
    :param n: int
    :param k: int
    :rtype: int
    """
    return factorial(n)//(factorial(k)*factorial(n-k))

如果我们比较时间

from scipy.special import comb
%timeit comb(100,48)
>>> 100000 loops, best of 3: 6.78 µs per loop

%timeit ncr(100,48)
>>> 1000000 loops, best of 3: 1.39 µs per loop

You can write 2 simple functions that actually turn out to be about 5-8 times faster than using scipy.special.comb. In fact, you don’t need to import any extra packages, and the functions are quite easily readable. The trick is to use memoization to store previously computed values, together with the definition of nCr:

# create a memoization dictionary
memo = {}
def factorial(n):
    """
    Calculate the factorial of an input using memoization
    :param n: int
    :rtype value: int
    """
    if n in [1,0]:
        return 1
    if n in memo:
        return memo[n]
    value = n*factorial(n-1)
    memo[n] = value
    return value

def ncr(n, k):
    """
    Choose k elements from a set of n elements - n must be larger than or equal to k
    :param n: int
    :param k: int
    :rtype: int
    """
    return factorial(n)//(factorial(k)*factorial(n-k))

If we compare times

from scipy.special import comb
%timeit comb(100,48)
>>> 100000 loops, best of 3: 6.78 µs per loop

%timeit ncr(100,48)
>>> 1000000 loops, best of 3: 1.39 µs per loop

回答 11

使用sympy很容易。

import sympy

comb = sympy.binomial(n, r)

It’s pretty easy with sympy.

import sympy

comb = sympy.binomial(n, r)

回答 12

仅使用随Python分发的标准库

import itertools

def nCk(n, k):
    return len(list(itertools.combinations(range(n), k)))

Using only standard library distributed with Python:

import itertools

def nCk(n, k):
    return len(list(itertools.combinations(range(n), k)))

回答 13

当n大于20时,直接公式会产生大整数。

因此,另一个回应是:

from math import factorial

reduce(long.__mul__, range(n-r+1, n+1), 1L) // factorial(r)

简短,准确和高效,因为它通过坚持使用long避免了python大整数。

与scipy.special.comb相比,它更准确,更快捷:

 >>> from scipy.special import comb
 >>> nCr = lambda n,r: reduce(long.__mul__, range(n-r+1, n+1), 1L) // factorial(r)
 >>> comb(128,20)
 1.1965669823265365e+23
 >>> nCr(128,20)
 119656698232656998274400L  # accurate, no loss
 >>> from timeit import timeit
 >>> timeit(lambda: comb(n,r))
 8.231969118118286
 >>> timeit(lambda: nCr(128, 20))
 3.885951042175293

The direct formula produces big integers when n is bigger than 20.

So, yet another response:

from math import factorial

reduce(long.__mul__, range(n-r+1, n+1), 1L) // factorial(r)

Short, accurate and efficient: it multiplies only the r terms of the numerator, using exact integer arithmetic (Python 2 longs), instead of computing the full n!.

It is more accurate and faster when comparing to scipy.special.comb:

 >>> from scipy.special import comb
 >>> nCr = lambda n,r: reduce(long.__mul__, range(n-r+1, n+1), 1L) // factorial(r)
 >>> comb(128,20)
 1.1965669823265365e+23
 >>> nCr(128,20)
 119656698232656998274400L  # accurate, no loss
 >>> from timeit import timeit
 >>> timeit(lambda: comb(128, 20))
 8.231969118118286
 >>> timeit(lambda: nCr(128, 20))
 3.885951042175293
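The snippet above is Python 2 only (long, the 1L literal and the builtin reduce no longer exist in Python 3). A rough Python 3 equivalent, assuming the same numerator-product idea, might look like this:

```python
from functools import reduce
from math import factorial
from operator import mul

def nCr(n, r):
    # Product of the r numerator terms (n-r+1 ... n), then exact floor
    # division by r!. Python 3 ints are arbitrary precision, so no
    # accuracy is lost.
    return reduce(mul, range(n - r + 1, n + 1), 1) // factorial(r)

print(nCr(128, 20))
```

The timings quoted above were measured on the Python 2 version and are not guaranteed to carry over.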

回答 14

这是使用内置备忘录修饰器的@ killerT2333代码。

from functools import lru_cache

@lru_cache()
def factorial(n):
    """
    Calculate the factorial of an input using memoization
    :param n: int
    :rtype value: int
    """
    return 1 if n in (1, 0) else n * factorial(n-1)

@lru_cache()
def ncr(n, k):
    """
    Choose k elements from a set of n elements,
    n must be greater than or equal to k.
    :param n: int
    :param k: int
    :rtype: int
    """
    return factorial(n) / (factorial(k) * factorial(n - k))

print(ncr(6, 3))

This is @killerT2333’s code using the built-in memoization decorator.

from functools import lru_cache

@lru_cache()
def factorial(n):
    """
    Calculate the factorial of an input using memoization
    :param n: int
    :rtype value: int
    """
    return 1 if n in (1, 0) else n * factorial(n-1)

@lru_cache()
def ncr(n, k):
    """
    Choose k elements from a set of n elements,
    n must be greater than or equal to k.
    :param n: int
    :param k: int
    :rtype: int
    """
    return factorial(n) // (factorial(k) * factorial(n - k))  # // returns an int, not a float

print(ncr(6, 3))

回答 15

这是为您提供的高效算法

for i = 1.....r

   p = p * ( n - i ) / i

print(p)

例如nCr(30,7)= fact(30)/(fact(7)* fact(23))=(30 * 29 * 28 * 27 * 26 * 25 * 24)/(1 * 2 * 3 * 4 * 5 * 6 * 7)

因此,只需从1到r运行循环即可获得结果。

Here is an efficient algorithm for you

p = 1
for i in range(1, r + 1):
    p = p * (n - i + 1) // i

print(p)

For example nCr(30,7) = fact(30) / ( fact(7) * fact(23)) = ( 30 * 29 * 28 * 27 * 26 * 25 * 24 ) / (1 * 2 * 3 * 4 * 5 * 6 * 7)

So just running the loop from 1 to r gives the result.
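As a self-contained sketch of this loop (the function name ncr and the symmetry shortcut are additions for illustration, not part of the answer), the multiplicative formula can be written so that every intermediate value is itself a binomial coefficient:

```python
def ncr(n, r):
    # Symmetry: C(n, r) == C(n, n - r), so iterate over the smaller count.
    r = min(r, n - r)
    if r < 0:
        return 0
    p = 1
    for i in range(1, r + 1):
        # After this step p equals C(n, i), so the floor division is exact.
        p = p * (n - i + 1) // i
    return p

print(ncr(30, 7))  # 2035800
```

Multiplying before dividing at each step is what keeps the integer division exact.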


回答 16

对于相当大的输入,这可能与在纯python中完成的速度一样快:

def choose(n, k):
    if k == n: return 1
    if k > n: return 0
    d, q = max(k, n-k), min(k, n-k)
    num =  1
    for n in xrange(d+1, n+1): num *= n
    denom = 1
    for d in xrange(1, q+1): denom *= d
    return num / denom

That’s probably as fast as you can do it in pure python for reasonably large inputs:

def choose(n, k):
    if k == n: return 1
    if k > n: return 0
    d, q = max(k, n-k), min(k, n-k)
    # numerator: (d+1) * (d+2) * ... * n
    num = 1
    for i in range(d+1, n+1): num *= i
    # denominator: q!
    denom = 1
    for j in range(1, q+1): denom *= j
    return num // denom

回答 17

此功能非常优化。

def nCk(n,k):
    m=0
    if k==0:
        m=1
    if k==1:
        m=n
    if k>=2:
        num,dem,op1,op2=1,1,k,n
        while(op1>=1):
            num*=op2
            dem*=op1
            op1-=1
            op2-=1
        m=num//dem
    return m

This function is very optimized.

def nCk(n,k):
    m=0
    if k==0:
        m=1
    if k==1:
        m=n
    if k>=2:
        num,dem,op1,op2=1,1,k,n
        while(op1>=1):
            num*=op2
            dem*=op1
            op1-=1
            op2-=1
        m=num//dem
    return m

如何构造包含Cython代码的Python包

问题:如何构造包含Cython代码的Python包

我想制作一个包含一些Cython代码的Python包。我的Cython代码运行良好。但是,现在我想知道如何最好地打包它。

对于大多数只想安装软件包的人,我想包括.cCython创建的文件,并安排对其setup.py进行编译以生成模块。然后,用户不需要安装Cython即可安装软件包。

但是对于那些可能想要修改程序包的人,我也想提供Cython .pyx文件,并以某种方式还允许setup.py使用Cython构建它们(因此这些用户需要安装Cython)。

我应该如何构造软件包中的文件以适应这两种情况?

用Cython文档提供了一些指导。但这并没有说明如何制作一个setup.py处理Cython情况的单例。

I’d like to make a Python package containing some Cython code. I’ve got the Cython code working nicely. However, now I want to know how best to package it.

For most people who just want to install the package, I’d like to include the .c file that Cython creates, and arrange for setup.py to compile that to produce the module. Then the user doesn’t need Cython installed in order to install the package.

But for people who may want to modify the package, I’d also like to provide the Cython .pyx files, and somehow also allow for setup.py to build them using Cython (so those users would need Cython installed).

How should I structure the files in the package to cater for both these scenarios?

The Cython documentation gives a little guidance. But it doesn’t say how to make a single setup.py that handles both the with/without Cython cases.


回答 0

我现在已经在Python程序包simplerandomBitBucket repo-编辑:now github)中亲自完成了这个任务(我不希望这是一个受欢迎的程序包,但这是学习Cython的好机会)。

此方法依赖于以下事实:.pyx使用Cython.Distutils.build_ext(至少使用Cython版本0.14)构建文件似乎总是.c在与源.pyx文件相同的目录中创建文件。

这是一个精简版setup.py,我希望其中显示要点:

from distutils.core import setup
from distutils.extension import Extension

try:
    from Cython.Distutils import build_ext
except ImportError:
    use_cython = False
else:
    use_cython = True

cmdclass = {}
ext_modules = []

if use_cython:
    ext_modules += [
        Extension("mypackage.mycythonmodule", ["cython/mycythonmodule.pyx"]),
    ]
    cmdclass.update({'build_ext': build_ext})
else:
    ext_modules += [
        Extension("mypackage.mycythonmodule", ["cython/mycythonmodule.c"]),
    ]

setup(
    name='mypackage',
    ...
    cmdclass=cmdclass,
    ext_modules=ext_modules,
    ...
)

我还进行了编辑,MANIFEST.in以确保将mycythonmodule.c其包含在源分发中(使用创建的源分发python setup.py sdist):

...
recursive-include cython *
...

我不承诺mycythonmodule.c版本控制“ trunk”(或Mercurial的“ default”)。发布时,我需要记住先进行操作python setup.py build_ext,以确保mycythonmodule.c该源代码是最新的并且是最新的。我还创建了一个release分支,并将C文件提交到该分支中。这样,我就拥有与该发行版一起分发的C文件的历史记录。

I’ve done this myself now, in a Python package simplerandom (BitBucket repo – EDIT: now github) (I don’t expect this to be a popular package, but it was a good chance to learn Cython).

This method relies on the fact that building a .pyx file with Cython.Distutils.build_ext (at least with Cython version 0.14) always seems to create a .c file in the same directory as the source .pyx file.

Here is a cut-down version of setup.py which I hope shows the essentials:

from distutils.core import setup
from distutils.extension import Extension

try:
    from Cython.Distutils import build_ext
except ImportError:
    use_cython = False
else:
    use_cython = True

cmdclass = {}
ext_modules = []

if use_cython:
    ext_modules += [
        Extension("mypackage.mycythonmodule", ["cython/mycythonmodule.pyx"]),
    ]
    cmdclass.update({'build_ext': build_ext})
else:
    ext_modules += [
        Extension("mypackage.mycythonmodule", ["cython/mycythonmodule.c"]),
    ]

setup(
    name='mypackage',
    ...
    cmdclass=cmdclass,
    ext_modules=ext_modules,
    ...
)

I also edited MANIFEST.in to ensure that mycythonmodule.c is included in a source distribution (a source distribution that is created with python setup.py sdist):

...
recursive-include cython *
...

I don’t commit mycythonmodule.c to version control ‘trunk’ (or ‘default’ for Mercurial). When I make a release, I need to remember to do a python setup.py build_ext first, to ensure that mycythonmodule.c is present and up-to-date for the source code distribution. I also make a release branch, and commit the C file into the branch. That way I have a historical record of the C file that was distributed with that release.


回答 1

克雷格·麦昆(Craig McQueen)的答案有所添加:请参见下文,了解如何覆盖sdist命令以使Cython在创建源代码分发之前自动编译源文件。

这样一来,您就可以避免意外分发过期C资源的风险。在您对分发过程的控制有限的情况下(例如,通过持续集成自动创建分发时),这也很有帮助。

from distutils.command.sdist import sdist as _sdist

...

class sdist(_sdist):
    def run(self):
        # Make sure the compiled Cython files in the distribution are up-to-date
        from Cython.Build import cythonize
        cythonize(['cython/mycythonmodule.pyx'])
        _sdist.run(self)
cmdclass['sdist'] = sdist

Adding to Craig McQueen’s answer: see below for how to override the sdist command to have Cython automatically compile your source files before creating a source distribution.

That way you run no risk of accidentally distributing outdated C sources. It also helps when you have limited control over the distribution process, e.g. when distributions are created automatically by continuous integration.

from distutils.command.sdist import sdist as _sdist

...

class sdist(_sdist):
    def run(self):
        # Make sure the compiled Cython files in the distribution are up-to-date
        from Cython.Build import cythonize
        cythonize(['cython/mycythonmodule.pyx'])
        _sdist.run(self)
cmdclass['sdist'] = sdist

回答 2

http://docs.cython.org/en/latest/src/userguide/source_files_and_compilation.html#distributing-cython-modules

强烈建议您分发生成的.c文件以及Cython源,以便用户无需使用Cython即可安装模块。

还建议您分发的版本中默认不启用Cython编译。即使用户安装了Cython,他也可能不想仅使用它来安装模块。另外,他使用的版本可能与您使用的版本不同,并且可能无法正确编译您的源代码。

这只是意味着您附带的setup.py文件将只是生成的.c文件上的常规distutils文件,对于基本示例,我们将使用:

from distutils.core import setup
from distutils.extension import Extension
 
setup(
    ext_modules = [Extension("example", ["example.c"])]
)

http://docs.cython.org/en/latest/src/userguide/source_files_and_compilation.html#distributing-cython-modules

It is strongly recommended that you distribute the generated .c files as well as your Cython sources, so that users can install your module without needing to have Cython available.

It is also recommended that Cython compilation not be enabled by default in the version you distribute. Even if the user has Cython installed, he probably doesn’t want to use it just to install your module. Also, the version he has may not be the same one you used, and may not compile your sources correctly.

This simply means that the setup.py file that you ship with will just be a normal distutils file on the generated .c files, for the basic example we would have instead:

from distutils.core import setup
from distutils.extension import Extension
 
setup(
    ext_modules = [Extension("example", ["example.c"])]
)

回答 3

最简单的方法是同时包含两者,而仅使用c文件?包括.pyx文件是不错的选择,但是无论如何只要有了.c文件就不需要了。想要重新编译.pyx的人可以安装Pyrex并手动进行。

否则,您需要有一个用于distutils的自定义build_ext命令,该命令首先生成C文件。Cython已经包含一个。http://docs.cython.org/src/userguide/source_files_and_compilation.html

该文档没有做的是说如何使其成为条件,但是

try:
     from Cython.distutils import build_ext
except ImportError:
     from distutils.command import build_ext

应该处理。

The easiest is to include both but just use the c-file? Including the .pyx file is nice, but it’s not needed once you have the .c file anyway. People who want to recompile the .pyx can install Pyrex and do it manually.

Otherwise you need to have a custom build_ext command for distutils that builds the C file first. Cython already includes one. http://docs.cython.org/src/userguide/source_files_and_compilation.html

What that documentation doesn’t do is say how to make this conditional, but

try:
    from Cython.Distutils import build_ext
except ImportError:
    from distutils.command.build_ext import build_ext

Should handle it.


回答 4

包含(Cython)生成的.c文件非常奇怪。尤其是当我们在git中包含它时。我更喜欢使用setuptools_cython。当Cython不可用时,它将构建一个具有内置Cython环境的鸡蛋,然后使用该鸡蛋构建代码。

一个可能的示例:https : //github.com/douban/greenify/blob/master/setup.py


更新(2017-01-05):

因为setuptools 18.0,没有必要使用setuptools_cython是一个从头开始构建Cython项目而无需的示例setuptools_cython

Including (Cython) generated .c files is pretty weird, especially when we commit them to git. I’d prefer to use setuptools_cython. When Cython is not available, it builds an egg that bundles a Cython environment, and then builds your code using the egg.

A possible example: https://github.com/douban/greenify/blob/master/setup.py


Update(2017-01-05):

Since setuptools 18.0, there’s no need to use setuptools_cython. Here is an example to build Cython project from scratch without setuptools_cython.


回答 5

这是我编写的安装脚本,它使在构建中包括嵌套目录更加容易。需要从一个程序包中的文件夹运行它。

Givig结构如下:

__init__.py
setup.py
test.py
subdir/
      __init__.py
      anothertest.py

setup.py

from setuptools import setup, Extension
from Cython.Distutils import build_ext
# from os import path
ext_names = (
    'test',
    'subdir.anothertest',       
) 

cmdclass = {'build_ext': build_ext}
# for modules in main dir      
ext_modules = [
    Extension(
        ext,
        [ext + ".py"],            
    ) 
    for ext in ext_names if ext.find('.') < 0] 
# for modules in subdir ONLY ONE LEVEL DOWN!! 
# modify it if you need more !!!
ext_modules += [
    Extension(
        ext,
        ["/".join(ext.split('.')) + ".py"],     
    )
    for ext in ext_names if ext.find('.') > 0]

setup(
    name='name',
    ext_modules=ext_modules,
    cmdclass=cmdclass,
    packages=["base", "base.subdir"],
)
#  Build --------------------------
#  python setup.py build_ext --inplace

编译愉快;)

This is a setup script I wrote which makes it easier to include nested directories inside the build. It needs to be run from a folder within a package.

Given a structure like this:

__init__.py
setup.py
test.py
subdir/
      __init__.py
      anothertest.py

setup.py

from setuptools import setup, Extension
from Cython.Distutils import build_ext
# from os import path
ext_names = (
    'test',
    'subdir.anothertest',       
) 

cmdclass = {'build_ext': build_ext}
# for modules in main dir      
ext_modules = [
    Extension(
        ext,
        [ext + ".py"],            
    ) 
    for ext in ext_names if ext.find('.') < 0] 
# for modules in subdir ONLY ONE LEVEL DOWN!! 
# modify it if you need more !!!
ext_modules += [
    Extension(
        ext,
        ["/".join(ext.split('.')) + ".py"],     
    )
    for ext in ext_names if ext.find('.') > 0]

setup(
    name='name',
    ext_modules=ext_modules,
    cmdclass=cmdclass,
    packages=["base", "base.subdir"],
)
#  Build --------------------------
#  python setup.py build_ext --inplace

Happy compiling ;)


回答 6

我想到的简单技巧:

from distutils.core import setup

try:
    from Cython.Build import cythonize
except ImportError:
    from pip import pip

    pip.main(['install', 'cython'])

    from Cython.Build import cythonize


setup(…)

如果无法导入,只需安装Cython。一个人可能不应该共享此代码,但是对于我自己的依赖关系来说已经足够了。

The simple hack I came up with:

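import subprocess
import sys
from distutils.core import setup

try:
    from Cython.Build import cythonize
except ImportError:
    # pip.main() is not a supported API; run pip as a subprocess instead
    subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'cython'])

    from Cython.Build import cythonize


setup(…)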

Just install Cython if it could not be imported. One should probably not share this code, but for my own dependencies it’s good enough.


回答 7

所有其他答案都依赖

  • 发行版
  • 从导入Cython.Build,在通过cython setup_requires导入和导入cython之间会产生鸡与蛋的问题。

一种现代的解决方案是改用setuptools,请参见以下答案(自动处理Cython扩展需要setuptools 18.0,即,它已经可用了很多年)。setup.py具有需求处理,入口点和cython模块的现代标准可能如下所示:

from setuptools import setup, Extension

with open('requirements.txt') as f:
    requirements = f.read().splitlines()

setup(
    name='MyPackage',
    install_requires=requirements,
    setup_requires=[
        'setuptools>=18.0',  # automatically handles Cython extensions
        'cython>=0.28.4',
    ],
    entry_points={
        'console_scripts': [
            'mymain = mypackage.main:main',
        ],
    },
    ext_modules=[
        Extension(
            'mypackage.my_cython_module',
            sources=['mypackage/my_cython_module.pyx'],
        ),
    ],
)

All other answers either rely on

  • distutils
  • importing from Cython.Build, which creates a chicken-and-egg problem between requiring cython via setup_requires and importing it.

A modern solution is to use setuptools instead, see this answer (automatic handling of Cython extensions requires setuptools 18.0, which has been available for many years already). A modern standard setup.py with requirements handling, an entry point, and a cython module could look like this:

from setuptools import setup, Extension

with open('requirements.txt') as f:
    requirements = f.read().splitlines()

setup(
    name='MyPackage',
    install_requires=requirements,
    setup_requires=[
        'setuptools>=18.0',  # automatically handles Cython extensions
        'cython>=0.28.4',
    ],
    entry_points={
        'console_scripts': [
            'mymain = mypackage.main:main',
        ],
    },
    ext_modules=[
        Extension(
            'mypackage.my_cython_module',
            sources=['mypackage/my_cython_module.pyx'],
        ),
    ],
)

回答 8

我发现仅使用setuptools而非功能受限的distutils的最简单方法是

from setuptools import setup
from setuptools.extension import Extension
try:
    from Cython.Build import cythonize
except ImportError:
    use_cython = False
else:
    use_cython = True

ext_modules = []
if use_cython:
    ext_modules += cythonize('package/cython_module.pyx')
else:
    ext_modules += [Extension('package.cython_module',
                              ['package/cython_modules.c'])]

setup(name='package_name', ext_modules=ext_modules)

The easiest way I found, using only setuptools instead of the feature-limited distutils, is

from setuptools import setup
from setuptools.extension import Extension
try:
    from Cython.Build import cythonize
except ImportError:
    use_cython = False
else:
    use_cython = True

ext_modules = []
if use_cython:
    ext_modules += cythonize('package/cython_module.pyx')
else:
    ext_modules += [Extension('package.cython_module',
                              ['package/cython_module.c'])]

setup(name='package_name', ext_modules=ext_modules)
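The fallback only works if the generated .c file sits next to the .pyx file and shares its base name. One way to avoid spelling the path twice is to derive it; the helper below is a hypothetical sketch (select_source is not part of setuptools or Cython):

```python
import os

def select_source(pyx_path, use_cython):
    """Return the .pyx path when Cython is usable, else the sibling .c path."""
    if use_cython:
        return pyx_path
    # Assumption: Cython wrote the .c file beside the .pyx file.
    base, _ext = os.path.splitext(pyx_path)
    return base + '.c'

print(select_source('package/cython_module.pyx', use_cython=False))
```

The returned path can then be passed as the single entry in an Extension's source list.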

回答 9

我想我通过提供自定义build_ext命令找到了一种很好的方法。这个想法如下:

  1. 我通过重写finalize_options()import numpy在函数的主体中添加numpy标头,很好地避免了numpy在setup()安装之前不可用的问题。

  2. 如果cython在系统上可用,它将挂接到命令的check_extensions_list()方法中,并通过cython化所有过时的cython模块,将其替换为C扩展,稍后可通过该build_extension() 方法处理。我们也只是在模块中提供功能的后一部分:这意味着,如果cython不可用,但是我们有C扩展名,它仍然可以工作,从而可以进行源代码分发。

这是代码:

import re, sys, os.path
from distutils import dep_util, log
from setuptools.command.build_ext import build_ext

try:
    import Cython.Build
    HAVE_CYTHON = True
except ImportError:
    HAVE_CYTHON = False

class BuildExtWithNumpy(build_ext):
    def check_cython(self, ext):
        c_sources = []
        for fname in ext.sources:
            cname, matches = re.subn(r"(?i)\.pyx$", ".c", fname, 1)
            c_sources.append(cname)
            if matches and dep_util.newer(fname, cname):
                if HAVE_CYTHON:
                    return ext
                raise RuntimeError("Cython and C module unavailable")
        ext.sources = c_sources
        return ext

    def check_extensions_list(self, extensions):
        extensions = [self.check_cython(ext) for ext in extensions]
        return build_ext.check_extensions_list(self, extensions)

    def finalize_options(self):
        import numpy as np
        build_ext.finalize_options(self)
        self.include_dirs.append(np.get_include())

这样一来,人们就可以编写setup()参数而不必担心导入以及是否有可用的cython的问题:

setup(
    # ...
    ext_modules=[Extension("_my_fast_thing", ["src/_my_fast_thing.pyx"])],
    setup_requires=['numpy'],
    cmdclass={'build_ext': BuildExtWithNumpy}
    )

I think I found a pretty good way of doing this by providing a custom build_ext command. The idea is the following:

  1. I add the numpy headers by overriding finalize_options() and doing import numpy in the body of the function, which nicely avoids the problem of numpy not being available before setup() installs it.

  2. If cython is available on the system, it hooks into the command’s check_extensions_list() method and cythonizes all out-of-date cython modules, replacing them with C extensions that can later be handled by the build_extension() method. We provide the latter part of the functionality in our module too: this means that if cython is not available but a C extension is present, it still works, which allows you to do source distributions.

Here’s the code:

import re, sys, os.path
from distutils import dep_util, log
from setuptools.command.build_ext import build_ext

try:
    import Cython.Build
    HAVE_CYTHON = True
except ImportError:
    HAVE_CYTHON = False

class BuildExtWithNumpy(build_ext):
    def check_cython(self, ext):
        c_sources = []
        for fname in ext.sources:
            cname, matches = re.subn(r"(?i)\.pyx$", ".c", fname, 1)
            c_sources.append(cname)
            if matches and dep_util.newer(fname, cname):
                if HAVE_CYTHON:
                    return ext
                raise RuntimeError("Cython and C module unavailable")
        ext.sources = c_sources
        return ext

    def check_extensions_list(self, extensions):
        extensions = [self.check_cython(ext) for ext in extensions]
        return build_ext.check_extensions_list(self, extensions)

    def finalize_options(self):
        import numpy as np
        build_ext.finalize_options(self)
        self.include_dirs.append(np.get_include())

This allows one to just write the setup() arguments without worrying about imports and whether one has cython available:

setup(
    # ...
    ext_modules=[Extension("_my_fast_thing", ["src/_my_fast_thing.pyx"])],
    setup_requires=['numpy'],
    cmdclass={'build_ext': BuildExtWithNumpy}
    )
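The core of check_cython() is the regular-expression substitution that maps each .pyx source to the .c file expected beside it. A small standalone sketch of just that step (c_source_for is an illustrative name, not part of the class above):

```python
import re

def c_source_for(fname):
    # Swap a trailing .pyx (any letter case) for .c; report whether it matched.
    cname, matches = re.subn(r"(?i)\.pyx$", ".c", fname, count=1)
    return cname, bool(matches)

for name in ["src/_my_fast_thing.pyx", "src/helper.c", "src/Other.PYX"]:
    print(name, "->", c_source_for(name))
```

Files that are not .pyx pass through unchanged, which is why plain C sources can be mixed into the same extension.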

从字符串列表的元素中删除结尾的换行符

问题:从字符串列表的元素中删除结尾的换行符

我必须采用以下形式的大量单词:

['this\n', 'is\n', 'a\n', 'list\n', 'of\n', 'words\n']

然后使用strip功能,将其转换为:

['this', 'is', 'a', 'list', 'of', 'words']

我以为我写的东西行得通,但是我不断收到错误消息:

“’list’对象没有属性’strip’”

这是我尝试过的代码:

strip_list = []
for lengths in range(1,20):
    strip_list.append(0) #longest word in the text file is 20 characters long
for a in lines:
    strip_list.append(lines[a].strip())

I have to take a large list of words in the form:

['this\n', 'is\n', 'a\n', 'list\n', 'of\n', 'words\n']

and then using the strip function, turn it into:

['this', 'is', 'a', 'list', 'of', 'words']

I thought that what I had written would work, but I keep getting an error saying:

"'list' object has no attribute 'strip'"

Here is the code that I tried:

strip_list = []
for lengths in range(1,20):
    strip_list.append(0) #longest word in the text file is 20 characters long
for a in lines:
    strip_list.append(lines[a].strip())

回答 0

>>> my_list = ['this\n', 'is\n', 'a\n', 'list\n', 'of\n', 'words\n']
>>> list(map(str.strip, my_list))
['this', 'is', 'a', 'list', 'of', 'words']

回答 1

清单理解力? [x.strip() for x in lst]

list comprehension? [x.strip() for x in lst]


回答 2

您可以使用列表推导

strip_list = [item.strip() for item in lines]

map功能:

# with a lambda
strip_list = map(lambda it: it.strip(), lines)

# without a lambda
strip_list = map(str.strip, lines)

You can use list comprehensions:

strip_list = [item.strip() for item in lines]

Or the map function:

# with a lambda (list() is needed on Python 3, where map returns an iterator)
strip_list = list(map(lambda it: it.strip(), lines))

# without a lambda
strip_list = list(map(str.strip, lines))

回答 3

这可以使用PEP 202中定义的列表理解来完成

[w.strip() for w in  ['this\n', 'is\n', 'a\n', 'list\n', 'of\n', 'words\n']]

This can be done using list comprehensions as defined in PEP 202

[w.strip() for w in  ['this\n', 'is\n', 'a\n', 'list\n', 'of\n', 'words\n']]

回答 4

所有其他答案,主要是关于列表理解的,都很棒。但是只是为了解释您的错误:

strip_list = []
for lengths in range(1,20):
    strip_list.append(0) #longest word in the text file is 20 characters long
for a in lines:
    strip_list.append(lines[a].strip())

a是您列表的成员,而不是索引。您可以这样写:

[...]
for a in lines:
    strip_list.append(a.strip())

另一个重要的评论:您可以通过以下方式创建一个空列表:

strip_list = [0] * 20

但这不是那么有用,因为可以.append 内容追加到列表中。在您的情况下,创建带有默认值的列表是没有用的,因为在附加剥离字符串时,将逐项构建该列表。

因此,您的代码应类似于:

strip_list = []
for a in lines:
    strip_list.append(a.strip())

但是,可以肯定的是,最好的选择就是这个,因为这是完全一样的:

stripped = [line.strip() for line in lines]

如果您遇到的不仅仅是a复杂的事情.strip,请将其放在函数中并执行相同的操作。这是使用列表最易读的方式。

All the other answers, mainly the ones about list comprehensions, are great. But just to explain your error:

strip_list = []
for lengths in range(1,20):
    strip_list.append(0) #longest word in the text file is 20 characters long
for a in lines:
    strip_list.append(lines[a].strip())

a is a member of your list, not an index. What you could write is this:

[...]
for a in lines:
    strip_list.append(a.strip())

Another important comment: you can create a list pre-filled with 20 zeros this way:

strip_list = [0] * 20

But this is not so useful, as .append adds items to the end of your list. In your case, there is no point in creating a list with default values, since you build it item by item when appending the stripped strings.

So your code should be like:

strip_list = []
for a in lines:
    strip_list.append(a.strip())

But the best option is this one, which does exactly the same thing:

stripped = [line.strip() for line in lines]

In case you have something more complicated than just a .strip, put this in a function, and do the same. That’s the most readable way to work with lists.
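As a small illustration of that last point, a sketch with a made-up cleanup function (clean and its lowercase step are hypothetical, chosen only to show the pattern):

```python
def clean(word):
    # Hypothetical multi-step cleanup: strip whitespace, then lowercase.
    return word.strip().lower()

lines = ['This\n', 'is\n', 'A\n', 'list\n']
cleaned = [clean(line) for line in lines]
print(cleaned)  # ['this', 'is', 'a', 'list']
```

The comprehension stays one line long no matter how much work the function does.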


回答 5

如果您只需要删除结尾的空格,则可以使用str.rstrip(),它的效率应比str.strip()

>>> lst = ['this\n', 'is\n', 'a\n', 'list\n', 'of\n', 'words\n']
>>> [x.rstrip() for x in lst]
['this', 'is', 'a', 'list', 'of', 'words']
>>> list(map(str.rstrip, lst))
['this', 'is', 'a', 'list', 'of', 'words']

If you need to remove just trailing whitespace, you could use str.rstrip(), which should be slightly more efficient than str.strip():

>>> lst = ['this\n', 'is\n', 'a\n', 'list\n', 'of\n', 'words\n']
>>> [x.rstrip() for x in lst]
['this', 'is', 'a', 'list', 'of', 'words']
>>> list(map(str.rstrip, lst))
['this', 'is', 'a', 'list', 'of', 'words']
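The three variants differ in what they remove, which matters when a word has leading or inner-trailing spaces. A short sketch with made-up sample data:

```python
words = ['  this\n', 'is \n', 'a\n']

both_sides = [w.strip() for w in words]          # all surrounding whitespace
right_only = [w.rstrip() for w in words]         # trailing whitespace only
newline_only = [w.rstrip('\n') for w in words]   # trailing newlines only

print(both_sides)    # ['this', 'is', 'a']
print(right_only)    # ['  this', 'is', 'a']
print(newline_only)  # ['  this', 'is ', 'a']
```

If only the line ending should go, rstrip('\n') is the most conservative choice.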

回答 6

my_list = ['this\n', 'is\n', 'a\n', 'list\n', 'of\n', 'words\n']
print([l.strip() for l in my_list])

输出:

['this', 'is', 'a', 'list', 'of', 'words']
my_list = ['this\n', 'is\n', 'a\n', 'list\n', 'of\n', 'words\n']
print([l.strip() for l in my_list])

Output:

['this', 'is', 'a', 'list', 'of', 'words']

熊猫groupby:如何获得字符串的并集

问题:熊猫groupby:如何获得字符串的并集

我有一个这样的数据框:

   A         B       C
0  1  0.749065    This
1  2  0.301084      is
2  3  0.463468       a
3  4  0.643961  random
4  1  0.866521  string
5  2  0.120737       !

呼唤

In [10]: print df.groupby("A")["B"].sum()

将返回

A
1    1.615586
2    0.421821
3    0.463468
4    0.643961

现在,我想对列“ C”执行“相同”操作。因为该列包含字符串,所以sum()不起作用(尽管您可能认为它将字符串连接在一起)。我真正想看到的是每个组的字符串列表或一组字符串,即

A
1    {This, string}
2    {is, !}
3    {a}
4    {random}

我一直在尝试找到方法来做到这一点。

尽管Series.unique()(http://pandas.pydata.org/pandas-docs/stable/genic/pandas.Series.unique.html)无效,但是

df.groupby("A")["B"]

是一个

pandas.core.groupby.SeriesGroupBy object

所以我希望任何Series方法都可以。有任何想法吗?

I have a dataframe like this:

   A         B       C
0  1  0.749065    This
1  2  0.301084      is
2  3  0.463468       a
3  4  0.643961  random
4  1  0.866521  string
5  2  0.120737       !

Calling

In [10]: print df.groupby("A")["B"].sum()

will return

A
1    1.615586
2    0.421821
3    0.463468
4    0.643961

Now I would like to do “the same” for column “C”. Because that column contains strings, sum() doesn’t work (although you might think that it would concatenate the strings). What I would really like to see is a list or set of the strings for each group, i.e.

A
1    {This, string}
2    {is, !}
3    {a}
4    {random}

I have been trying to find ways to do this.

Series.unique() (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.unique.html) doesn’t work, although

df.groupby("A")["B"]

is a

pandas.core.groupby.SeriesGroupBy object

so I was hoping any Series method would work. Any ideas?


回答 0

In [4]: df = read_csv(StringIO(data),sep='\s+')

In [5]: df
Out[5]: 
   A         B       C
0  1  0.749065    This
1  2  0.301084      is
2  3  0.463468       a
3  4  0.643961  random
4  1  0.866521  string
5  2  0.120737       !

In [6]: df.dtypes
Out[6]: 
A      int64
B    float64
C     object
dtype: object

当您应用自己的功能时,不会自动排除非数字列。这会慢一些,但比应用.sum()groupby

In [8]: df.groupby('A').apply(lambda x: x.sum())
Out[8]: 
   A         B           C
A                         
1  2  1.615586  Thisstring
2  4  0.421821         is!
3  3  0.463468           a
4  4  0.643961      random

sum 默认情况下串联

In [9]: df.groupby('A')['C'].apply(lambda x: x.sum())
Out[9]: 
A
1    Thisstring
2           is!
3             a
4        random
dtype: object

你几乎可以做你想做的

In [11]: df.groupby('A')['C'].apply(lambda x: "{%s}" % ', '.join(x))
Out[11]: 
A
1    {This, string}
2           {is, !}
3               {a}
4          {random}
dtype: object

在整个框架上进行一次,一次一组。关键是要返回一个Series

def f(x):
     return Series(dict(A = x['A'].sum(), 
                        B = x['B'].sum(), 
                        C = "{%s}" % ', '.join(x['C'])))

In [14]: df.groupby('A').apply(f)
Out[14]: 
   A         B               C
A                             
1  2  1.615586  {This, string}
2  4  0.421821         {is, !}
3  3  0.463468             {a}
4  4  0.643961        {random}
In [4]: df = read_csv(StringIO(data),sep='\s+')

In [5]: df
Out[5]: 
   A         B       C
0  1  0.749065    This
1  2  0.301084      is
2  3  0.463468       a
3  4  0.643961  random
4  1  0.866521  string
5  2  0.120737       !

In [6]: df.dtypes
Out[6]: 
A      int64
B    float64
C     object
dtype: object

When you apply your own function, there is no automatic exclusion of non-numeric columns. This is slower, though, than applying .sum() to the groupby

In [8]: df.groupby('A').apply(lambda x: x.sum())
Out[8]: 
   A         B           C
A                         
1  2  1.615586  Thisstring
2  4  0.421821         is!
3  3  0.463468           a
4  4  0.643961      random

sum by default concatenates

In [9]: df.groupby('A')['C'].apply(lambda x: x.sum())
Out[9]: 
A
1    Thisstring
2           is!
3             a
4        random
dtype: object

You can do pretty much what you want

In [11]: df.groupby('A')['C'].apply(lambda x: "{%s}" % ', '.join(x))
Out[11]: 
A
1    {This, string}
2           {is, !}
3               {a}
4          {random}
dtype: object

To do this on the whole frame, one group at a time, the key is to return a Series

def f(x):
     return Series(dict(A = x['A'].sum(), 
                        B = x['B'].sum(), 
                        C = "{%s}" % ', '.join(x['C'])))

In [14]: df.groupby('A').apply(f)
Out[14]: 
   A         B               C
A                             
1  2  1.615586  {This, string}
2  4  0.421821         {is, !}
3  3  0.463468             {a}
4  4  0.643961        {random}

回答 1

您可以使用该apply方法将任意函数应用于分组数据。因此,如果您想要一套,请套用set。如果需要列表,请应用list

>>> d
   A       B
0  1    This
1  2      is
2  3       a
3  4  random
4  1  string
5  2       !
>>> d.groupby('A')['B'].apply(list)
A
1    [This, string]
2           [is, !]
3               [a]
4          [random]
dtype: object

如果您还需要其他功能,只需编写一个函数即可执行所需的操作apply

You can use the apply method to apply an arbitrary function to the grouped data. So if you want a set, apply set. If you want a list, apply list.

>>> d
   A       B
0  1    This
1  2      is
2  3       a
3  4  random
4  1  string
5  2       !
>>> d.groupby('A')['B'].apply(list)
A
1    [This, string]
2           [is, !]
3               [a]
4          [random]
dtype: object

If you want something else, just write a function that does what you want and then apply that.
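A self-contained sketch of both variants, assuming the question's frame with the strings in column C (the example above keeps them in column B of its own frame d):

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 1, 2],
    'C': ['This', 'is', 'a', 'random', 'string', '!'],
})

# One Python object per group: a set drops duplicates, a list keeps row order.
as_sets = df.groupby('A')['C'].apply(set)
as_lists = df.groupby('A')['C'].apply(list)

print(as_sets)
print(as_lists)
```

Both results are Series indexed by the group key, so as_sets.loc[1] gives the set for group 1.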


回答 2

您可能可以使用aggregate(或agg)函数来连接值。(未经测试的代码)

df.groupby('A')['B'].agg(lambda col: ''.join(col))

You may be able to use the aggregate (or agg) function to concatenate the values. (Untested code)

df.groupby('A')['B'].agg(lambda col: ''.join(col))

回答 3

您可以尝试以下方法:

df.groupby('A').agg({'B':'sum','C':'-'.join})

You could try this:

df.groupby('A').agg({'B':'sum','C':'-'.join})

回答 4

一个简单的解决方案是:

>>> df.groupby(['A','B']).c.unique().reset_index()

A simple solution would be:

>>> df.groupby(['A','B']).C.unique().reset_index()

Answer 5

Named aggregations with pandas >= 0.25.0

Since pandas version 0.25.0 we have named aggregations where we can groupby, aggregate and at the same time assign new names to our columns. This way we won’t get the MultiIndex columns, and the column names make more sense given the data they contain:


aggregate and get a list of strings

grp = df.groupby('A').agg(B_sum=('B','sum'),
                          C=('C', list)).reset_index()

print(grp)
   A     B_sum               C
0  1  1.615586  [This, string]
1  2  0.421821         [is, !]
2  3  0.463468             [a]
3  4  0.643961        [random]

aggregate and join the strings

grp = df.groupby('A').agg(B_sum=('B','sum'),
                          C=('C', ', '.join)).reset_index()

print(grp)
   A     B_sum             C
0  1  1.615586  This, string
1  2  0.421821         is, !
2  3  0.463468             a
3  4  0.643961        random

Answer 6

If you’d like to overwrite column B in the dataframe, this should work:

    df = df.groupby('A',as_index=False).agg(lambda x:'\n'.join(x))

Answer 7

Following @Erfan’s good answer: most of the time, when analyzing aggregated values, you want only the unique values of these existing strings within each group:

unique_chars = lambda x: ', '.join(x.unique())
(df
 .groupby(['A'])
 .agg({'C': unique_chars}))
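
To see what the unique() call changes, here is a small sketch on made-up data (the frame below is hypothetical and only exists to show a repeated string inside one group). It compares joining every value with joining only the distinct ones.

```python
import pandas as pd

# Hypothetical data: "x" appears twice in group 1.
df = pd.DataFrame({"A": [1, 1, 1, 2],
                   "C": ["x", "y", "x", "z"]})

# Join every string in the group, duplicates included.
all_values = df.groupby("A")["C"].agg(", ".join)

# Join only the distinct strings; unique() keeps first-appearance order.
unique_values = df.groupby("A")["C"].agg(lambda s: ", ".join(s.unique()))

print(all_values.loc[1])     # x, y, x
print(unique_values.loc[1])  # x, y
```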

How to execute a Python file in Notepad++?

Question: How to execute a Python file in Notepad++?

I prefer using Notepad++ for development.

How do I execute Python files through Notepad++?


Answer 0

First option: (Easiest, recommended)

Open Notepad++. On the menu go to: Run -> Run.. (F5). Type in:

C:\Python26\python.exe "$(FULL_CURRENT_PATH)"

Now, instead of pressing run, press save to create a shortcut for it.

Notes

  • If you have Python 3.1: type in Python31 instead of Python26
  • Add -i if you want the command line window to stay open after the script has finished

Second option

Use a batch script that runs the Python script and then create a shortcut to that from Notepad++.

As explained here: http://it-ride.blogspot.com/2009/08/notepad-and-python.html


Third option: (Not safe)

The code opens “HKEY_CURRENT_USER\Software\Python\PythonCore”; if the key exists, it gets the path from the first child key of this key.

Check whether this key exists, and if it does not, you could try creating it.
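
The code this option refers to is not reproduced here. As a minimal sketch of the lookup it describes, the following reads the first child key of PythonCore with the standard winreg module. It assumes the install path is stored as the default value of an InstallPath subkey (the usual layout, but an assumption for your machine), and first_python_install_path is a hypothetical name.

```python
try:
    import winreg  # standard library, but only available on Windows
except ImportError:
    winreg = None


def first_python_install_path():
    """Return the path stored under the first PythonCore child key.

    Returns None when the key is missing or winreg is unavailable.
    Assumption: the path is the default value of <version>\\InstallPath.
    """
    if winreg is None:
        return None
    try:
        with winreg.OpenKey(winreg.HKEY_CURRENT_USER,
                            r"Software\Python\PythonCore") as core:
            version = winreg.EnumKey(core, 0)  # name of the first child key
            with winreg.OpenKey(core, version + r"\InstallPath") as install:
                return winreg.QueryValue(install, None)  # default value
    except OSError:
        return None


print(first_python_install_path())
```

This only reads the registry; it does not create the key if it is missing.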


Answer 1

@Ramiz Uddin’s answer definitely deserves more visibility:

  • Open Notepad++
  • On the menu go to: Run → Run... (F5)
  • Type in: cmd /K python "$(FULL_CURRENT_PATH)"

Answer 2

Here is what worked for me:

Open Notepad++ and press F5. You’ll get a little popup box:

Pop up box for entering the program to run

Type: C:\Python27\python.exe -i "$(FULL_CURRENT_PATH)" for Python 2.7.

Then click Save As…, and pick your own key combo to start it each time you want to run something.


Answer 3

On the menu go to: “Run” –> “Run…” (or just press F5).

For Python 2 type in:

py -2 -i "$(FULL_CURRENT_PATH)"

For Python 3 type in:

py -3 -i "$(FULL_CURRENT_PATH)"

References:

To understand the py command better:

py -h

Another helpful link to understand the py command: How do I run python 2 and 3 in windows 7?

Thanks to Reshure for his answer that got me on the right track to figure this out.
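
To check which interpreter a given Run command actually started, a tiny script like the one below can be saved and run with both py -2 and py -3. It is only a sketch; describe_interpreter is a hypothetical helper name.

```python
import sys


def describe_interpreter():
    """Return a one-line summary of the interpreter running this script."""
    major, minor = sys.version_info[:2]
    return "Python {}.{} at {}".format(major, minor, sys.executable)


print(describe_interpreter())
```

The printed path shows which python.exe the launcher picked, which helps when several versions are installed side by side.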


Answer 4

First install Python from https://www.python.org/downloads/

Run the installer

** IMPORTANT ** Be sure you check both:

  • Install launcher for all users
  • Add Python 3.6 to path

Click install now and finish the installation.

Open notepad++ and install plugin PyNPP from Plugin Manager. I’m using N++ 6.9.2

Save a new file as new.py

Type in N++

import sys

print("Hello from Python!")
print("Your Python version is: " + sys.version) 

Press Alt+Shift+F5

Simple as that.


Answer 5

All the answers for the Run->Run menu option go with the “/K” switch of cmd, so the terminal stays open, or “-i” for python.exe so python forces interactive mode – both to preserve the output for you to observe.

Yet in cmd /k you have to type exit to close it, and in python -i you have to type quit(). If that is too much typing for your liking (for me it sure is :), the Run command to use is

cmd /k C:\Python27\python.exe  "$(FULL_CURRENT_PATH)" & pause & exit

C:\Python27\python.exe – obviously the full path to your python install (or just python if you want to go with the first executable in your user’s path).

& is unconditional execution of the next command in Windows – unconditional as it runs regardless of the RC of the previous command (&& is “and” – run only if the previous completed successfully, || – is “or”).

pause – prints “Press any key to continue . . .” and waits for any key (that output can be suppressed if needed).

exit – well, types the exit for you :)

So at the end, cmd runs python.exe, which executes the current file and keeps the window open, pause waits for you to press any key, and exit finally closes the window once you press that key.
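
A different route to the same effect, not the one used above, is to let the script itself wait before the window closes. The sketch below assumes Python 3 (where input returns a plain string); run_then_wait and main are hypothetical names.

```python
import sys


def main():
    print("Hello from the script")


def run_then_wait(task, wait=input):
    """Run task(), then block until the user confirms, even if task() raised."""
    try:
        task()
    finally:
        # With python -i the interactive prompt already keeps the window
        # open, so the extra prompt is skipped in that case.
        if not sys.flags.interactive:
            wait("Press Enter to close...")


if __name__ == "__main__":
    if sys.stdin is not None and sys.stdin.isatty():
        run_then_wait(main)  # console attached: pause before closing
    else:
        main()               # piped or redirected input: nothing to keep open
```

With this in the script, a plain python.exe "$(FULL_CURRENT_PATH)" Run command is enough, at the cost of adding the wrapper to every script you want to pause.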


Answer 6

I also wanted to run Python files directly from Notepad++. The most common option found online is the built-in Run option. With it, you have two choices:

  1. Run the Python file in a console (on Windows, the Command Prompt) with a command like this:

    C:\Path\to\Python\python.exe "$(FULL_CURRENT_PATH)"
    

    (If your console window closes immediately after running, you can add cmd /k to your command.) This works fine, and you can even run files in interactive mode by adding -i to your command.

  2. Run the Python program in IDLE with a command like this (the links I found use C:\Path\to\Python\Lib\idlelib\idle.py, but I use C:\Path\to\Python\Lib\idlelib\idle.bat instead, because idle.bat sets the right current working directory automatically):

    C:\Path\to\Python\Lib\idlelib\idle.bat "$(FULL_CURRENT_PATH)"
    

    Actually, this doesn’t run your program in the IDLE Shell. Instead, it opens your Python file in the IDLE Editor, and you then need to click Run Module (or press F5) to run the program, which defeats the purpose of running Python files from Notepad++.

    But, searching online, I found an option that adds ‘-r’ to your command:

    C:\Path\to\Python\Lib\idlelib\idle.bat -r "$(FULL_CURRENT_PATH)"
    

    This will run your python program in IDLE Shell and because it is in IDLE it is by default in interactive mode.

The problem with running Python files via the built-in Run option is that each run opens a new console or IDLE window, so you lose all output from previous executions. This might not matter to some, but when I started programming in Python I used IDLE, so I got used to running a file multiple times in the same IDLE Shell window. Another problem is that you need to save your file manually and then click Run (or press F5). To solve these problems (AFAIK*) you need a Notepad++ plugin. The best plugin for running Python files from Notepad++ is NppExec. (I also tried PyNPP and Python Script. PyNPP runs Python files in a console; it works, but you can do that without a plugin via the built-in Run option. Python Script is used for running scripts that interact with Notepad++ itself, so you can’t run your Python files with it.) To run your Python file with the NppExec plugin, go to Plugins -> NppExec -> Execute and type in something like this:

C:\Path\to\Python\python.exe "$(FULL_CURRENT_PATH)"

With NppExec you can also save your Python file before running it with the npp_save command, set the working directory with the cd "$(CURRENT_DIRECTORY)" command, or run the Python program in interactive mode with the -i option. I found many links online that mention these options, but the best use of NppExec to run Python programs that I found is in NppExec’s Manual, whose chapter 4.6.4. Running Python & wxPython has this code:

npp_console -  // disable any output to the Console
npp_save  // save current file (a .py file is expected)
cd "$(CURRENT_DIRECTORY)"  // use the current file's dir
set local @exit_cmd_silent = exit()  // allows to exit Python automatically
set local PATH_0 = $(SYS.PATH)  // current value of %PATH%
env_set PATH = $(SYS.PATH);C:\Python27  // use Python 2.7
npp_setfocus con  // set the focus to the Console
npp_console +  // enable output to the Console
python -i -u "$(FILE_NAME)"  // run Python's program interactively
npp_console -  // disable any output to the Console
env_set PATH = $(PATH_0)  // restore the value of %PATH%
npp_console +  // enable output to the Console

All you need to do is copy this code and change the Python directory if you use some other Python version (e.g. I am using Python 3.4, so my directory is C:\Python34). This code works perfectly, but I added one line so I can run a Python program multiple times without losing previous output:

npe_console m- a+

a+ is to enable the “append” mode which keeps the previous Console’s text and does not clear it.

m- turns off console’s internal messages (those are in green color)

The final code that I use in NppExec’s Execute window is:

npp_console -  // disable any output to the Console
npp_save  // save current file (a .py file is expected)
cd "$(CURRENT_DIRECTORY)"  // use the current file's dir
set local @exit_cmd_silent = exit()  // allows to exit Python automatically
set local PATH_0 = $(SYS.PATH)  // current value of %PATH%
env_set PATH = $(SYS.PATH);C:\Python34  // use Python 3.4
npp_setfocus con  // set the focus to the Console
npe_console m- a+
npp_console +  // enable output to the Console
python -i -u "$(FILE_NAME)"  // run Python's program interactively
npp_console -  // disable any output to the Console
env_set PATH = $(PATH_0)  // restore the value of %PATH%
npp_console +  // enable output to the Console

You can save your NppExec code and assign a shortcut key to the saved script. (Open the Advanced Options of the NppExec plugin, select your script in the Associated script drop-down list, press Add/Modify, restart Notepad++, go to Settings -> Shortcut Mapper -> Plugin commands in Notepad++, select your script, click Modify and assign a shortcut key. I wanted F5 as my shortcut key; to do that, you first need to change the shortcut key of the built-in Run option to something else.) The chapters of NppExec’s Manual that explain how to save your NppExec code and assign a shortcut key are: NppExec's "Execute...", NppExec's script.

P.S.*: With the NppExec plugin you can add Highlight Filters (found in Console Output Filters...) that highlight certain lines. I use them to highlight error lines in red; to do that, add these Highlight masks: *File "%FILE%", line %LINE%, in <*> and Traceback (most recent call last):
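
To make it clearer which console lines such a mask is aimed at, here is a rough Python regex counterpart. It is an approximation written for illustration, not NppExec’s own matching logic, and it also accepts frames inside named functions, which the <*> part of the mask may not. find_error_locations is a hypothetical helper.

```python
import re

# Approximate counterpart of the mask:  *File "%FILE%", line %LINE%, in <*>
FRAME_LINE = re.compile(
    r'File "(?P<file>[^"]+)", line (?P<line>\d+), in (?P<where>.+)')


def find_error_locations(console_text):
    """Return (file, line_number) pairs for each traceback frame line."""
    return [(m.group("file"), int(m.group("line")))
            for m in FRAME_LINE.finditer(console_text)]


sample = r'''Traceback (most recent call last):
  File "C:\scripts\demo.py", line 3, in <module>
    print(1 / 0)
ZeroDivisionError: division by zero
'''

print(find_error_locations(sample))
```

The file name and line number captured here correspond to the %FILE% and %LINE% placeholders in the mask.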


Answer 7

None of the previously proposed solutions worked for me. Slight modification needed.

After hitting F5 in Notepad++, type:

cmd /k "C:\Python27\python.exe $(FULL_CURRENT_PATH)"

The command prompt stays open so you can see the output of your script.


Answer 8

I use the NPP_Exec plugin (Found in the plugins manager). Once that is installed, open the console window (ctrl+~) and type:

cmd

This will launch command prompt. Then type:

C:\Program Files\Notepad++> python "$(FULL_CURRENT_PATH)"

to execute the current file you are working with.


Answer 9

I wish people here would post steps instead of just overall concepts. I eventually got the cmd /k version to work.

The step-by-step instructions are:

  1. In NPP, click on the menu item: Run
  2. In the submenu, click on: Run
  3. In the Run… dialog box, in the field The Program to Run, delete any existing text and type in: cmd /K "$(FULL_CURRENT_PATH)" The /K is optional; it keeps open the window created when the script runs, if you want that.
  4. Hit the Save… button.
  5. The Shortcut dialogue box opens; fill it out if you want a keyboard shortcut (there’s a note saying “This will disable the accelerator” whatever that is, so maybe you don’t want to use the keyboard shortcut, though it probably doesn’t hurt to assign one when you don’t need an accelerator). Somewhere I think you have to tell NPP where the Python.exe file is (e.g., for me: C:\Python33\python.exe). I don’t know where or how you do this, but in trying various things here, I was able to do that–I don’t recall which attempt did the trick.

Answer 10

No answer here, and no plugin I found, provided what I wanted: a minimalist method to launch the Python code I wrote in Notepad++ with the press of a shortcut, preferably with no plugins.

I have Python 3.6 (64-bit) on Windows 8.1 x86_64 and Notepad++ 32-bit. After you write your Python script in Notepad++ and save it, hit F5 for Run. Then write:

"C:\Path\to\Python\python.exe" -i "$(FULL_CURRENT_PATH)"

and hit the Run button. The -i flag keeps the terminal open after the code has finished executing, so you can inspect the output. This command launches the script in a cmd terminal, and the terminal stays there until you close it by typing exit().

You can save this to a shortcut for convenience (mine is CTRL + SHIFT + P).


Answer 11

There is one issue that I didn’t see resolved in the above solutions. Python sets the current working directory to wherever you start the interpreter from. If you need the current working directory to be the directory where you saved the file, you could hit F5 and type this:

cmd /K cd "$(CURRENT_DIRECTORY)"&C:\Users\username\Python36-32\python.exe -i "$(FULL_CURRENT_PATH)"

Except you would replace C:\Users\username\Python36-32\python.exe with whatever the path to the python interpreter is on your machine.

Basically you’re starting up command line, changing the directory to the directory containing the .py file you’re trying to run, and then running it. You can string together as many command line commands as you like with the ‘&’ symbol.
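
If you would rather not depend on the Run command for this, the script can also switch to its own directory when it starts. This is a small sketch of that alternative, not part of the command above; chdir_to_script_dir is a hypothetical helper name.

```python
import os


def chdir_to_script_dir(script_path):
    """Make the directory containing script_path the current working directory."""
    script_dir = os.path.dirname(os.path.abspath(script_path))
    os.chdir(script_dir)
    return script_dir


# Typical use near the top of a script:
# chdir_to_script_dir(__file__)
```

After the call, relative paths such as open("data.txt") resolve next to the script regardless of where the interpreter was launched from.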


Answer 12

My problem was, as it was mentioned by copeland3300, that my script is running from notepad++ folder, so it was impossible to locate other project files, such as database file, modules etc. I solved the problem using standard notepad++ “Run” command (F5) and typing in:

cmd /k  "cd /d "$(CURRENT_DIRECTORY)" & python "$(FULL_CURRENT_PATH)""

Python WAS in my PATH. Cmd window stayed open after script finished.


Answer 13

Extending Reshure’s answer

  1. Open Run → Run… from the menubar in Notepad++ (shortcut: F5)

  2. In the given space, enter:

    "$(FULL_CURRENT_PATH)"  -1
    
  3. Click Run

ta da!


Answer 14

I would like to avoid using full python directory path in the Notepad++ macro. I tried other solutions given in this page, they failed.

The one working on my PC is:

In Notepad++, press F5.

Copy/paste this:

cmd /k cd /d $(CURRENT_DIRECTORY) && py -3 -i $(FULL_CURRENT_PATH)

Enter.


Answer 15

I started using Notepad++ for Python very recently and I found this method very easy. Once you are ready to run the code, right-click on the tab of your code in the Notepad++ window and select “Open Containing Folder in cmd”. This will open the Command Prompt in the folder where the current program is stored. All you need to do now is to execute:

python

This was done on Notepad++ (Build 10 Jan 2015).

I can’t add the screenshots, so here’s a blog post with the screenshots – http://coder-decoder.blogspot.in/2015/03/using-notepad-in-windows-to-edit-and.html


Answer 16

In Notepad++, go to Run → Run…, select the path and idle.py file of your Python installation:

C:\Python27\Lib\idlelib\idle.py

add a space and this:

"$(FULL_CURRENT_PATH)"

and here you are!

Video demonstration:

https://www.youtube.com/watch?v=sJipYE1JT38


Answer 17

In case someone is interested in passing arguments to cmd.exe and running the python script in a Virtual Environment, these are the steps I used:

On the Notepad++ -> Run -> Run , I enter the following:

cmd /C cd $(CURRENT_DIRECTORY) && "PATH_to_.bat_file" $(FULL_CURRENT_PATH)

Here I cd into the directory in which the .py file exists, so that it enables accessing any other relevant files which are in the directory of the .py code.

And on the .bat file I have:

@ECHO off
set File_Path=%1

call activate Venv
python %File_Path%
pause

Answer 18

You can run your script via cmd and be in script-directory:

cmd /k cd /d $(CURRENT_DIRECTORY) && python $(FULL_CURRENT_PATH)

Answer 19

I usually prefer running my python scripts on python native IDLE interactive shell rather than from command prompt or something like that. I’ve tried it, and it works for me. Just open “Run > Run…”, then paste the code below

python  -m idlelib.idle -r "$(FULL_CURRENT_PATH)"

After that, you can save it with your hotkey.

You must ensure your desired python is added and registered in your environment variables.