Tag archive: pytorch

Large-scale prediction with PyTorch + Spark

Deep learning is increasingly popular, but Spark does not yet support deep learning algorithms. There is a library, sparktorch, that glues Spark and PyTorch together, but in practice it is not that pleasant to use, the project is not very active, and it is hard to debug. Training a deep learning model locally and then deploying it to Spark is therefore an effective way to use deep learning for large-scale prediction.

Embedding and deploying a PyTorch model in Spark for large-scale prediction involves three main steps:

1. Use Spark for feature-engineering preprocessing, so that the training and test sets are processed consistently;

2. Train the PyTorch model locally on the preprocessed training data;

3. Distribute (broadcast) the trained model to Spark and run distributed prediction.

Steps 1 and 2 are straightforward and are omitted here. The following focuses on step 3.

Model distribution (broadcast) falls into two cases. The first is a simple model that can be defined with nn.Sequential; such a model can be broadcast directly, as follows:

# Generate test data
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from pyspark.sql import functions as Func
from pyspark.sql.functions import col

X, y = make_classification(n_samples=50000, n_features=100, random_state=0)
df = pd.DataFrame(X)
df['label'] = np.random.randint(2, size=(50000))
df1 = spark.createDataFrame(df)
df1 = df1.withColumn('features', Func.array([col(f"{i}") for i in range(0, 100)])).repartition(1000)

# Build the model and run prediction
%spark2_1.pyspark
import torch
import torch.nn as nn
from pyspark.sql.functions import udf, col
from pyspark.sql.types import ArrayType, DoubleType, FloatType

# Defined with nn.Sequential:
network_seq = nn.Sequential(
    nn.Linear(100, 2560),
    nn.ReLU(),
    nn.Linear(2560, 2560),
    nn.ReLU(),
    nn.Linear(2560, 2)
    #nn.Softmax(dim=1)
)

# The equivalent definition as an nn.Module subclass:
class Network(nn.Module):
    def __init__(self):
        super(Network, self).__init__()
        self.l1 = nn.Linear(100, 2560)
        self.l2 = nn.Linear(2560, 2560)
        self.l3 = nn.Linear(2560, 2)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(self.l1(x))
        x = self.relu(self.l2(x))
        x = self.l3(x)
        return x

net = Network()
bc_model_state = spark.sparkContext.broadcast(net.state_dict())

def get_model_for_eval():
    # Load the broadcast state_dict into a local model copy
    net.load_state_dict(bc_model_state.value)
    net.eval()
    return net
  
def one_row_predict(x):
    model = get_model_for_eval()
    t = torch.tensor(x, dtype=torch.float32)
    t = model(t).cpu().detach().numpy()
    #prediction = model(t).cpu().detach().item()
    # return prediction
    return list([float(i) for i in t])

one_row_udf = udf(one_row_predict, ArrayType(FloatType()))
df1 = df1.withColumn('pred_one_row', one_row_udf(col('features')))

Above we defined a simple model and broadcast it directly for prediction (model training is omitted here).

However, when we want to predict with a more complex model (roughly speaking, one that cannot be rewritten with nn.Sequential), the approach above raises an error.

In that case, write the model into a file, say /export/models/item2vec.py, distribute it with pyspark's addFile, and then import the model.

Suppose our model file /export/models/item2vec.py looks like this:

import torch.nn as nn

class Item2vec(nn.Module):
    def __init__(self, cv_dict, csr_cols):
        super(Item2vec, self).__init__()
        pass

    def forward(self, x):
        pass

    def predict(self, x):
        pass

Assuming the model has been trained, we now use it for large-scale prediction:

from pyspark import SparkFiles
sc.addFile('/export/models/item2vec.py')
import sys
sys.path.append('/export/models/')

from item2vec import Item2vec

# model is the trained model
bc_model_state = sc.broadcast(model.state_dict())
net = Item2vec(cv_dict, csr_cols)

def get_model_for_eval_demo():
    # Load the broadcast state_dict into a local model copy
    net.load_state_dict(bc_model_state.value)
    net.eval()
    return net

The steps above have broadcast the model; now we can run prediction.

Two prediction approaches are covered here: udf + withColumn, and rdd + mapPartitions.

Since pyspark 2.1 is used here, pandas udfs are not yet available, so udf + withColumn can only predict one row at a time, which is slower than rdd + mapPartitions.

From pyspark 2.3 onwards, pandas udfs enable batch prediction; see:

https://docs.databricks.com/static/notebooks/deep-learning/pytorch-images.html

The udf + withColumn approach:

# udf + withColumn
def one_row_predict_demo(x):
    model = get_model_for_eval_demo()
    x = torch.tensor(x, dtype=torch.float)
    _, prob = model.predict(x)

    return round(float(prob[0]), 4)

one_row_predict_demo_udf = udf(one_row_predict_demo, DoubleType())
df = demo.withColumn('demo_prob', one_row_predict_demo_udf('features'))
The rdd + mapPartitions approach:

def one_row_predict_map(rdds):
    model = get_model_for_eval_demo()
    for row in rdds:
        x = torch.tensor(row['features'], dtype=torch.float)
        _, prob = model.predict(x)
        yield (row['id'], round(float(prob[0]), 4))

df = demo.rdd.mapPartitions(one_row_predict_map).toDF(['id', 'pred_prob'])

2. Efficiency optimization (1): mapPartitions

The approach above already lets us deploy a trained deep learning model to Spark for large-scale prediction, but it is very slow. With some handling inside mapPartitions, prediction can be sped up considerably:

# Code adapted from https://github.com/SaeedNajafi/infer-pytorch-pyspark
# (LocalPredictor is defined in that repo)
from pyspark.sql import Row

def basic_row_handler(row):
    return row

def predict_map(index, partition, ml_task,
                batch_size=16,
                row_preprocessor=basic_row_handler,
                row_postprocessor=basic_row_handler):

    # local model loading within each executor
    model = LocalPredictor(ml_task=ml_task, batch_size=batch_size,
                           partition_index=index)

    batch = []
    count = 0
    for row in partition:
        row_dict = row.asDict()
        # apply preprocessor on each row.
        row_dict_prep = row_preprocessor(row_dict)
        batch.append(row_dict_prep)
        count += 1
        if count == batch_size:
            # predict the ml and apply the postprocessor.
            for ret_row in model.predict(batch):  # ml prediction
                ret_row_post = row_postprocessor(ret_row)
                if ret_row_post is not None:
                    yield Row(**ret_row_post)

            batch = []
            count = 0

    # Flush remaining rows in the batches.
    if count != 0:
        for ret_row in model.predict(batch):  # ml prediction
            ret_row_post = row_postprocessor(ret_row)
            if ret_row_post is not None:
                yield Row(**ret_row_post)

        batch = []
        count = 0

The code above can be seen as doing "deferred" prediction inside mapPartitions: rows from a partition are first collected into a batch and then predicted together, which greatly improves throughput. The extreme case is a single prediction per whole partition, as sketched below.
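For illustration, here is a minimal sketch of that extreme case: one forward pass per partition. It assumes the get_model_for_eval_demo() helper and the id/features columns from the earlier example, and that model.predict accepts a batched tensor; note that it trades memory for speed, since a whole partition must fit in executor memory at once.

import torch

def whole_partition_predict(rows):
    model = get_model_for_eval_demo()
    rows = list(rows)
    if not rows:
        return
    # Stack the whole partition into one batch tensor and predict once.
    batch = torch.tensor([row['features'] for row in rows], dtype=torch.float)
    with torch.no_grad():
        _, probs = model.predict(batch)
    for row, p in zip(rows, probs):
        yield (row['id'], round(float(p), 4))

df = demo.rdd.mapPartitions(whole_partition_predict).toDF(['id', 'pred_prob'])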

3. Efficiency optimization (2): pandas_udf

pandas_udf builds on udf with further optimizations, so programs using it run more efficiently. Here we can use pandas_udf to speed things up:

import numpy as np
import pandas as pd
import torch
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import FloatType

# Enable Arrow support.
spark.conf.set("spark.sql.execution.arrow.enabled", "true")
spark.conf.set("spark.sql.execution.arrow.maxRecordsPerBatch", "64")

sc.addFile('get_model.py')
from get_model import get_model

model_path = '/path/to/model.pt'
data_path = '/path/to/data'

# model is the trained model
model = torch.load(model_path)
bc_model_state = sc.broadcast(model.state_dict())


def get_model_for_eval():
    # Load the broadcast state_dict into a local model copy
    model = get_model()
    model.load_state_dict(bc_model_state.value)
    model.eval()
    return model

# model = torch.load(model_path)
# model = sc.broadcast(model)


@pandas_udf(FloatType())
def predict_batch_udf(arr: pd.Series) -> pd.Series:
    model = get_model_for_eval()
    # model.to(device)
    arr = np.vstack(arr.map(lambda x: eval(x)).values)
    arr = torch.tensor(arr).long()
    with torch.no_grad():
        predictions = list(model(arr).cpu().numpy())
            
    return pd.Series(predictions)

# Predict
data = data.withColumn('predictions', predict_batch_udf('features'))

Author: 井底蛙蛙呱呱呱
Source: https://www.jianshu.com/p/fc60c967c8b8


Model summary in PyTorch

Question: Model summary in PyTorch

Is there any way to print the summary of a model in PyTorch like the model.summary() method does in Keras, as follows?

Model Summary:
____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
input_1 (InputLayer)             (None, 1, 15, 27)     0                                            
____________________________________________________________________________________________________
convolution2d_1 (Convolution2D)  (None, 8, 15, 27)     872         input_1[0][0]                    
____________________________________________________________________________________________________
maxpooling2d_1 (MaxPooling2D)    (None, 8, 7, 27)      0           convolution2d_1[0][0]            
____________________________________________________________________________________________________
flatten_1 (Flatten)              (None, 1512)          0           maxpooling2d_1[0][0]             
____________________________________________________________________________________________________
dense_1 (Dense)                  (None, 1)             1513        flatten_1[0][0]                  
====================================================================================================
Total params: 2,385
Trainable params: 2,385
Non-trainable params: 0


Answer 0

While you will not get as detailed information about the model as in Keras' model.summary, simply printing the model will give you some idea of the layers involved and their specifications.

For instance:

from torchvision import models
model = models.vgg16()
print(model)

The output in this case would be something like:

VGG (
  (features): Sequential (
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU (inplace)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU (inplace)
    (4): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU (inplace)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU (inplace)
    (9): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU (inplace)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU (inplace)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU (inplace)
    (16): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
    (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (18): ReLU (inplace)
    (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (20): ReLU (inplace)
    (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (22): ReLU (inplace)
    (23): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
    (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (25): ReLU (inplace)
    (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (27): ReLU (inplace)
    (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (29): ReLU (inplace)
    (30): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
  )
  (classifier): Sequential (
    (0): Dropout (p = 0.5)
    (1): Linear (25088 -> 4096)
    (2): ReLU (inplace)
    (3): Dropout (p = 0.5)
    (4): Linear (4096 -> 4096)
    (5): ReLU (inplace)
    (6): Linear (4096 -> 1000)
  )
)

Now you could, as mentioned by Kashyap, use the state_dict method to get the weights of the different layers. But using this listing of the layers would perhaps provide more direction for creating a helper function that produces a Keras-like model summary! Hope this helps!
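As a rough sketch of such a helper (an illustration, not a standard API): walk named_modules(), keep only leaf layers, and print each with its parameter count.

from torchvision import models

def simple_summary(model):
    total = 0
    for name, module in model.named_modules():
        if len(list(module.children())) == 0:  # leaf layers only
            n_params = sum(p.numel() for p in module.parameters())
            total += n_params
            print(f"{name:<22} {module.__class__.__name__:<12} {n_params:>12,}")
    print(f"Total params: {total:,}")

simple_summary(models.vgg16())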



Answer 1

Yes, you can get an exact Keras-like representation using the pytorch-summary package.

Example for VGG16:

from torchvision import models
from torchsummary import summary

vgg = models.vgg16()
summary(vgg, (3, 224, 224))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 64, 224, 224]           1,792
              ReLU-2         [-1, 64, 224, 224]               0
            Conv2d-3         [-1, 64, 224, 224]          36,928
              ReLU-4         [-1, 64, 224, 224]               0
         MaxPool2d-5         [-1, 64, 112, 112]               0
            Conv2d-6        [-1, 128, 112, 112]          73,856
              ReLU-7        [-1, 128, 112, 112]               0
            Conv2d-8        [-1, 128, 112, 112]         147,584
              ReLU-9        [-1, 128, 112, 112]               0
        MaxPool2d-10          [-1, 128, 56, 56]               0
           Conv2d-11          [-1, 256, 56, 56]         295,168
             ReLU-12          [-1, 256, 56, 56]               0
           Conv2d-13          [-1, 256, 56, 56]         590,080
             ReLU-14          [-1, 256, 56, 56]               0
           Conv2d-15          [-1, 256, 56, 56]         590,080
             ReLU-16          [-1, 256, 56, 56]               0
        MaxPool2d-17          [-1, 256, 28, 28]               0
           Conv2d-18          [-1, 512, 28, 28]       1,180,160
             ReLU-19          [-1, 512, 28, 28]               0
           Conv2d-20          [-1, 512, 28, 28]       2,359,808
             ReLU-21          [-1, 512, 28, 28]               0
           Conv2d-22          [-1, 512, 28, 28]       2,359,808
             ReLU-23          [-1, 512, 28, 28]               0
        MaxPool2d-24          [-1, 512, 14, 14]               0
           Conv2d-25          [-1, 512, 14, 14]       2,359,808
             ReLU-26          [-1, 512, 14, 14]               0
           Conv2d-27          [-1, 512, 14, 14]       2,359,808
             ReLU-28          [-1, 512, 14, 14]               0
           Conv2d-29          [-1, 512, 14, 14]       2,359,808
             ReLU-30          [-1, 512, 14, 14]               0
        MaxPool2d-31            [-1, 512, 7, 7]               0
           Linear-32                 [-1, 4096]     102,764,544
             ReLU-33                 [-1, 4096]               0
          Dropout-34                 [-1, 4096]               0
           Linear-35                 [-1, 4096]      16,781,312
             ReLU-36                 [-1, 4096]               0
          Dropout-37                 [-1, 4096]               0
           Linear-38                 [-1, 1000]       4,097,000
================================================================
Total params: 138,357,544
Trainable params: 138,357,544
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.57
Forward/backward pass size (MB): 218.59
Params size (MB): 527.79
Estimated Total Size (MB): 746.96
----------------------------------------------------------------


Answer 2

In order to use torchsummary:

from torchsummary import summary

Install it first if you don't have it:

pip install torchsummary 

Then you can try it, but note that for some reason it did not work for me unless the model was set to cuda (alexnet.cuda()):

from torchsummary import summary
help(summary)
import torchvision.models as models
alexnet = models.alexnet(pretrained=False)
alexnet.cuda()
summary(alexnet, (3, 224, 224))
print(alexnet)

summary must be given the input size; batch_size defaults to -1, meaning any batch size we provide.

If you set summary(alexnet, (3, 224, 224), 32), then bs=32 is used:

summary(model, input_size, batch_size=-1, device='cuda')

Out:

Help on function summary in module torchsummary.torchsummary:

summary(model, input_size, batch_size=-1, device='cuda')

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1           [32, 64, 55, 55]          23,296
              ReLU-2           [32, 64, 55, 55]               0
         MaxPool2d-3           [32, 64, 27, 27]               0
            Conv2d-4          [32, 192, 27, 27]         307,392
              ReLU-5          [32, 192, 27, 27]               0
         MaxPool2d-6          [32, 192, 13, 13]               0
            Conv2d-7          [32, 384, 13, 13]         663,936
              ReLU-8          [32, 384, 13, 13]               0
            Conv2d-9          [32, 256, 13, 13]         884,992
             ReLU-10          [32, 256, 13, 13]               0
           Conv2d-11          [32, 256, 13, 13]         590,080
             ReLU-12          [32, 256, 13, 13]               0
        MaxPool2d-13            [32, 256, 6, 6]               0
AdaptiveAvgPool2d-14            [32, 256, 6, 6]               0
          Dropout-15                 [32, 9216]               0
           Linear-16                 [32, 4096]      37,752,832
             ReLU-17                 [32, 4096]               0
          Dropout-18                 [32, 4096]               0
           Linear-19                 [32, 4096]      16,781,312
             ReLU-20                 [32, 4096]               0
           Linear-21                 [32, 1000]       4,097,000
================================================================
Total params: 61,100,840
Trainable params: 61,100,840
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 18.38
Forward/backward pass size (MB): 268.12
Params size (MB): 233.08
Estimated Total Size (MB): 519.58
----------------------------------------------------------------
AlexNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(inplace)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU(inplace)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(6, 6))
  (classifier): Sequential(
    (0): Dropout(p=0.5)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
    (2): ReLU(inplace)
    (3): Dropout(p=0.5)
    (4): Linear(in_features=4096, out_features=4096, bias=True)
    (5): ReLU(inplace)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)
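Going by the signature shown in the help output above (not verified here), the device argument can presumably be set to 'cpu' when no GPU is available, instead of moving the model to CUDA:

summary(alexnet, (3, 224, 224), device='cpu')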


Answer 3

This will show a model's weights and parameters (but not output shapes).

from torch.nn.modules.module import _addindent
import torch
import numpy as np
def torch_summarize(model, show_weights=True, show_parameters=True):
    """Summarizes torch model by showing trainable parameters and weights."""
    tmpstr = model.__class__.__name__ + ' (\n'
    for key, module in model._modules.items():
        # if it contains layers let call it recursively to get params and weights
        if type(module) in [
            torch.nn.modules.container.Container,
            torch.nn.modules.container.Sequential
        ]:
            modstr = torch_summarize(module)
        else:
            modstr = module.__repr__()
        modstr = _addindent(modstr, 2)

        params = sum([np.prod(p.size()) for p in module.parameters()])
        weights = tuple([tuple(p.size()) for p in module.parameters()])

        tmpstr += '  (' + key + '): ' + modstr 
        if show_weights:
            tmpstr += ', weights={}'.format(weights)
        if show_parameters:
            tmpstr +=  ', parameters={}'.format(params)
        tmpstr += '\n'   

    tmpstr = tmpstr + ')'
    return tmpstr

# Test
import torchvision.models as models
model = models.alexnet()
print(torch_summarize(model))

# # Output
# AlexNet (
#   (features): Sequential (
#     (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2)), weights=((64, 3, 11, 11), (64,)), parameters=23296
#     (1): ReLU (inplace), weights=(), parameters=0
#     (2): MaxPool2d (size=(3, 3), stride=(2, 2), dilation=(1, 1)), weights=(), parameters=0
#     (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2)), weights=((192, 64, 5, 5), (192,)), parameters=307392
#     (4): ReLU (inplace), weights=(), parameters=0
#     (5): MaxPool2d (size=(3, 3), stride=(2, 2), dilation=(1, 1)), weights=(), parameters=0
#     (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)), weights=((384, 192, 3, 3), (384,)), parameters=663936
#     (7): ReLU (inplace), weights=(), parameters=0
#     (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)), weights=((256, 384, 3, 3), (256,)), parameters=884992
#     (9): ReLU (inplace), weights=(), parameters=0
#     (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)), weights=((256, 256, 3, 3), (256,)), parameters=590080
#     (11): ReLU (inplace), weights=(), parameters=0
#     (12): MaxPool2d (size=(3, 3), stride=(2, 2), dilation=(1, 1)), weights=(), parameters=0
#   ), weights=((64, 3, 11, 11), (64,), (192, 64, 5, 5), (192,), (384, 192, 3, 3), (384,), (256, 384, 3, 3), (256,), (256, 256, 3, 3), (256,)), parameters=2469696
#   (classifier): Sequential (
#     (0): Dropout (p = 0.5), weights=(), parameters=0
#     (1): Linear (9216 -> 4096), weights=((4096, 9216), (4096,)), parameters=37752832
#     (2): ReLU (inplace), weights=(), parameters=0
#     (3): Dropout (p = 0.5), weights=(), parameters=0
#     (4): Linear (4096 -> 4096), weights=((4096, 4096), (4096,)), parameters=16781312
#     (5): ReLU (inplace), weights=(), parameters=0
#     (6): Linear (4096 -> 1000), weights=((1000, 4096), (1000,)), parameters=4097000
#   ), weights=((4096, 9216), (4096,), (4096, 4096), (4096,), (1000, 4096), (1000,)), parameters=58631144
# )

Edit: isaykatsman has a PyTorch PR to add a model.summary() exactly like Keras: https://github.com/pytorch/pytorch/pull/3043/files



Answer 4

Simplest to remember (not as pretty as Keras):

print(model)

This also works:

repr(model)

If you just want the number of parameters:

sum([param.nelement() for param in model.parameters()])

From: Is there similar pytorch function as model.summary() as keras? (forum.PyTorch.org)
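As a small extension of the count above (a sketch, not part of the original answer, assuming a model as in the rest of this thread): separating total from trainable parameters mirrors the "Total params / Trainable params" lines of Keras' summary.

total = sum(p.nelement() for p in model.parameters())
trainable = sum(p.nelement() for p in model.parameters() if p.requires_grad)
print(f"Total params: {total:,}")
print(f"Trainable params: {trainable:,}")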



Answer 5

You can use

from torchsummary import summary

You can specify the device:

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

You can create a network, and if you are using the MNIST dataset, the following commands will work and show you the summary:

model = Network().to(device)
summary(model, (1, 28, 28))


Answer 6

AFAIK there is no model.summary() equivalent in pytorch.

Meanwhile you can refer to the script by szagoruyko, which gives a nice visualization, as in resnet18-example.

Cheers



Answer 7

Simply print the model after defining an object for the model class:

class RNN(nn.Module):
    def __init__(self, input_dim, embedding_dim, hidden_dim, output_dim):
        super().__init__()

        self.embedding = nn.Embedding(input_dim, embedding_dim)
        self.rnn = nn.RNN(embedding_dim, hidden_dim)
        self.fc = nn.Linear(hidden_dim, output_dim)
    def forward():
        ...

model = RNN(input_dim, embedding_dim, hidden_dim, output_dim)
print(model)


Answer 8

You can just use x.shape to measure the dimensions of a tensor x.



Answer 9

For visualization and summary of PyTorch models, tensorboardX can also be utilized.

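For instance, a minimal sketch (assuming tensorboardX is installed via pip install tensorboardX, and a model taking 3x224x224 images):

import torch
from tensorboardX import SummaryWriter
from torchvision import models

model = models.vgg16()
dummy_input = torch.zeros(1, 3, 224, 224)  # must match the model's expected input shape

writer = SummaryWriter(log_dir='runs/vgg16')
writer.add_graph(model, dummy_input)  # writes the model graph for TensorBoard
writer.close()
# Then inspect with: tensorboard --logdir runs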


Why do we need to call zero_grad() in PyTorch?

Question: Why do we need to call zero_grad() in PyTorch?

The method zero_grad() needs to be called during training. But the documentation is not very helpful:

|  zero_grad(self)
|      Sets gradients of all model parameters to zero.

Why do we need to call this method?



回答 0

在中PyTorch,我们需要在开始进行反向传播之前将梯度设置为零,因为PyTorch在随后的反向传递中累积梯度。在训练RNN时这很方便。因此,默认操作是在每次调用时累积(即求和)梯度loss.backward()

因此,理想情况下,当您开始训练循环时,应该zero out the gradients正确进行参数更新。否则,梯度将指向预期方向以外的其他方向,即朝向最小值(或最大化,如果达到最大化目标)。

这是一个简单的示例:

import torch
from torch.autograd import Variable
import torch.optim as optim

def linear_model(x, W, b):
    return torch.matmul(x, W) + b

data, targets = ...

W = Variable(torch.randn(4, 3), requires_grad=True)
b = Variable(torch.randn(3), requires_grad=True)

optimizer = optim.Adam([W, b])

for sample, target in zip(data, targets):
    # clear out the gradients of all Variables 
    # in this optimizer (i.e. W, b)
    optimizer.zero_grad()
    output = linear_model(sample, W, b)
    loss = (output - target) ** 2
    loss.backward()
    optimizer.step()

Alternatively, if you're doing vanilla gradient descent:

W = Variable(torch.randn(4, 3), requires_grad=True)
b = Variable(torch.randn(3), requires_grad=True)

for sample, target in zip(data, targets):
    # clear out the gradients of Variables 
    # (i.e. W, b)
    W.grad.data.zero_()
    b.grad.data.zero_()

    output = linear_model(sample, W, b)
    loss = (output - target) ** 2
    loss.backward()

    W -= learning_rate * W.grad.data
    b -= learning_rate * b.grad.data

Note: the accumulation (i.e. sum) of gradients happens when .backward() is called on the loss tensor.
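A small sketch (using the current tensor API rather than the Variable wrapper above) that makes the accumulation visible: calling .backward() twice without zeroing doubles the gradient.

import torch

w = torch.randn(3, requires_grad=True)

loss = (w ** 2).sum()
loss.backward()
print(w.grad)      # first gradient: 2 * w

loss = (w ** 2).sum()   # rebuild the graph
loss.backward()
print(w.grad)      # now 4 * w: gradients were summed, not replaced

w.grad.zero_()     # what optimizer.zero_grad() does for every parameter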



Answer 1

When you use gradient methods to decrease the error (or loss), zero_grad() lets each step start over without the gradients left over from the previous step.

If you don't use zero_grad(), the loss will increase, not decrease as required.

For example, if you use zero_grad() you will see output like:

model training loss is 1.5
model training loss is 1.4
model training loss is 1.3
model training loss is 1.2

If you don't use zero_grad() you will see output like:

model training loss is 1.4
model training loss is 1.9
model training loss is 2
model training loss is 2.8
model training loss is 3.5


How to initialize weights in PyTorch?

Question: How to initialize weights in PyTorch?

How do you initialize the weights and biases (for example, with He or Xavier initialization) in a network in PyTorch?



Answer 0

Single layer

To initialize the weights of a single layer, use a function from torch.nn.init. For instance:

conv1 = torch.nn.Conv2d(...)
torch.nn.init.xavier_uniform(conv1.weight)

Alternatively, you can modify the parameters by writing to conv1.weight.data (which is a torch.Tensor). Example:

conv1.weight.data.fill_(0.01)

The same applies for biases:

conv1.bias.data.fill_(0.01)

nn.Sequential or custom nn.Module

Pass an initialization function to torch.nn.Module.apply. It will initialize the weights of the entire nn.Module recursively.

apply(fn): Applies fn recursively to every submodule (as returned by .children()) as well as self. Typical use includes initializing the parameters of a model (see also torch-nn-init).

Example:

def init_weights(m):
    if type(m) == nn.Linear:
        torch.nn.init.xavier_uniform(m.weight)
        m.bias.data.fill_(0.01)

net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))
net.apply(init_weights)


Answer 1

We compare different modes of weight initialization using the same neural network (NN) architecture.

All zeros or ones

If you follow the principle of Occam's razor, you might think setting all the weights to 0 or 1 would be the best solution. This is not the case.

With every weight the same, all the neurons at each layer produce the same output. This makes it hard to decide which weights to adjust.

    # initialize two NN's with 0 and 1 constant weights
    model_0 = Net(constant_weight=0)
    model_1 = Net(constant_weight=1)
  • After 2 epochs:

Validation Accuracy
9.625% -- All Zeros
10.050% -- All Ones
Training Loss
2.304  -- All Zeros
1552.281  -- All Ones

Uniform initialization

A uniform distribution has an equal probability of picking any number from a set of numbers.

Let's see how well the neural network trains using a uniform weight initialization, where low=0.0 and high=1.0.

Below, we'll see another way (besides in the Net class code) to initialize the weights of a network. To define weights outside of the model definition, we can:

  1. Define a function that assigns weights by the type of network layer, then
  2. Apply those weights to an initialized model using model.apply(fn), which applies a function to each model layer.
    # takes in a module and applies the specified weight initialization
    def weights_init_uniform(m):
        classname = m.__class__.__name__
        # for every Linear layer in a model..
        if classname.find('Linear') != -1:
            # apply a uniform distribution to the weights and a bias=0
            m.weight.data.uniform_(0.0, 1.0)
            m.bias.data.fill_(0)

    model_uniform = Net()
    model_uniform.apply(weights_init_uniform)
  • After 2 epochs:

Validation Accuracy
36.667% -- Uniform Weights
Training Loss
3.208  -- Uniform Weights

General rule for setting weights

The general rule for setting the weights in a neural network is to set them close to zero without being too small.

Good practice is to start your weights in the range [-y, y], where y = 1/sqrt(n) (n is the number of inputs to a given neuron).

    # takes in a module and applies the specified weight initialization
    def weights_init_uniform_rule(m):
        classname = m.__class__.__name__
        # for every Linear layer in a model..
        if classname.find('Linear') != -1:
            # get the number of the inputs
            n = m.in_features
            y = 1.0/np.sqrt(n)
            m.weight.data.uniform_(-y, y)
            m.bias.data.fill_(0)

    # create a new model with these weights
    model_rule = Net()
    model_rule.apply(weights_init_uniform_rule)

Below we compare the performance of a NN whose weights are initialized with the uniform distribution [-0.5, 0.5) versus one whose weights are initialized using the general rule.

  • After 2 epochs:

Validation Accuracy
75.817% -- Centered Weights [-0.5, 0.5)
85.208% -- General Rule [-y, y)
Training Loss
0.705  -- Centered Weights [-0.5, 0.5)
0.469  -- General Rule [-y, y)

Normal distribution to initialize the weights

The normal distribution should have a mean of 0 and a standard deviation of y = 1/sqrt(n), where n is the number of inputs to the NN.

    ## takes in a module and applies the specified weight initialization
    def weights_init_normal(m):
        '''Takes in a module and initializes all linear layers with weight
           values taken from a normal distribution.'''

        classname = m.__class__.__name__
        # for every Linear layer in a model
        if classname.find('Linear') != -1:
            y = m.in_features
        # m.weight.data should be taken from a normal distribution
            m.weight.data.normal_(0.0,1/np.sqrt(y))
        # m.bias.data should be 0
            m.bias.data.fill_(0)

Below we show the performance of two NNs, one initialized using the uniform distribution and the other using the normal distribution.

  • After 2 epochs:

Validation Accuracy
85.775% -- Uniform Rule [-y, y)
84.717% -- Normal Distribution
Training Loss
0.329  -- Uniform Rule [-y, y)
0.443  -- Normal Distribution


Answer 2

To initialize layers you typically don't need to do anything.

PyTorch will do it for you. If you think about it, this makes a lot of sense. Why should we initialize layers, when PyTorch can do it following the latest trends?

Check, for instance, the Linear layer.

In the __init__ method it will call the Kaiming He init function:

    def reset_parameters(self):
        init.kaiming_uniform_(self.weight, a=math.sqrt(3))
        if self.bias is not None:
            fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
            bound = 1 / math.sqrt(fan_in)
            init.uniform_(self.bias, -bound, bound)

The same goes for other layer types. For conv2d, for instance, check here.

Note: the gain of proper initialization is faster training. If your problem calls for special initialization, you can do it afterwards.
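For example, a minimal sketch of "doing it afterwards" (the Xavier scheme here is just an illustration): build the model, then override the default initialization with apply.

import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))

def special_init(m):
    if isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight)   # replaces the default Kaiming init
        nn.init.zeros_(m.bias)

model.apply(special_init)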



Answer 3

    import torch.nn as nn        

    # a simple network
    rand_net = nn.Sequential(nn.Linear(in_features, h_size),
                             nn.BatchNorm1d(h_size),
                             nn.ReLU(),
                             nn.Linear(h_size, h_size),
                             nn.BatchNorm1d(h_size),
                             nn.ReLU(),
                             nn.Linear(h_size, 1),
                             nn.ReLU())

    # initialization function, first checks the module type,
    # then applies the desired changes to the weights
    def init_normal(m):
        if type(m) == nn.Linear:
            nn.init.uniform_(m.weight)

    # use the modules apply function to recursively apply the initialization
    rand_net.apply(init_normal)

Answer 4

Sorry for being so late; I hope my answer will help.

To initialize weights with a normal distribution use:

torch.nn.init.normal_(tensor, mean=0, std=1)

Or to initialize with a constant value, write:

torch.nn.init.constant_(tensor, value)

Or to use a uniform distribution:

torch.nn.init.uniform_(tensor, a=0, b=1) # a: lower_bound, b: upper_bound

You can check other methods to initialize tensors here.



Answer 5

If you want some extra flexibility, you can also set the weights manually.

Say you have an input of all ones:

import torch
import torch.nn as nn

input = torch.ones((8, 8))
print(input)
tensor([[1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1.]])

And you want to make a dense layer with no bias (so we can visualize the effect):

d = nn.Linear(8, 8, bias=False)

Set all the weights to 0.5 (or anything else):

d.weight.data = torch.full((8, 8), 0.5)
print(d.weight.data)

The weights:

Out[14]: 
tensor([[0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
        [0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
        [0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
        [0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
        [0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
        [0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
        [0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
        [0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000]])

All your weights are now 0.5. Pass the data through:

d(input)
Out[13]: 
tensor([[4., 4., 4., 4., 4., 4., 4., 4.],
        [4., 4., 4., 4., 4., 4., 4., 4.],
        [4., 4., 4., 4., 4., 4., 4., 4.],
        [4., 4., 4., 4., 4., 4., 4., 4.],
        [4., 4., 4., 4., 4., 4., 4., 4.],
        [4., 4., 4., 4., 4., 4., 4., 4.],
        [4., 4., 4., 4., 4., 4., 4., 4.],
        [4., 4., 4., 4., 4., 4., 4., 4.]], grad_fn=<MmBackward>)

Remember that each neuron receives 8 inputs, all of which have weight 0.5 and value 1 (and no bias), so the sum is 4 for each output.



Answer 6

Iterate over parameters

If you cannot use apply, for instance because the model does not implement Sequential directly:

Same for all

# see UNet at https://github.com/milesial/Pytorch-UNet/tree/master/unet


def init_all(model, init_func, *params, **kwargs):
    for p in model.parameters():
        init_func(p, *params, **kwargs)

model = UNet(3, 10)
init_all(model, torch.nn.init.normal_, mean=0., std=1) 
# or
init_all(model, torch.nn.init.constant_, 1.) 

Depending on shape

def init_all(model, init_funcs):
    for p in model.parameters():
        init_func = init_funcs.get(len(p.shape), init_funcs["default"])
        init_func(p)

model = UNet(3, 10)
init_funcs = {
    1: lambda x: torch.nn.init.normal_(x, mean=0., std=1.), # can be bias
    2: lambda x: torch.nn.init.xavier_normal_(x, gain=1.), # can be weight
    3: lambda x: torch.nn.init.xavier_uniform_(x, gain=1.), # can be conv1D filter
    4: lambda x: torch.nn.init.xavier_uniform_(x, gain=1.), # can be conv2D filter
    "default": lambda x: torch.nn.init.constant(x, 1.), # everything else
}

init_all(model, init_funcs)

You can try with torch.nn.init.constant_(x, len(x.shape)) to check that they are appropriately initialized:

init_funcs = {
    "default": lambda x: torch.nn.init.constant_(x, len(x.shape))
}

Answer 7

If you see a deprecation warning (@Fábio Perez)…

def init_weights(m):
    if type(m) == nn.Linear:
        torch.nn.init.xavier_uniform_(m.weight)
        m.bias.data.fill_(0.01)

net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))
net.apply(init_weights)

Answer 8

Since I don't have enough reputation so far, I can't add a comment under the answer posted by prosti on Jun 26 '19 at 13:16.

    def reset_parameters(self):
        init.kaiming_uniform_(self.weight, a=math.sqrt(3))
        if self.bias is not None:
            fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
            bound = 1 / math.sqrt(fan_in)
            init.uniform_(self.bias, -bound, bound)

But I want to point out that some assumptions in the paper of Kaiming He, Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, are actually not appropriate, even though the deliberately designed initialization method is a hit in practice.

E.g., within the subsection on the Backward Propagation Case, they assume that $w_l$ and $\delta y_l$ are independent of each other. But as we all know, taking the score map $\delta y^L_i$ as an instance, it often is $y_i - \mathrm{softmax}(y^L_i) = y_i - \mathrm{softmax}(w^L_i x^L_i)$ if we use a typical cross-entropy loss as the objective.

So I think the true underlying reason why He's initialization works well remains to be unraveled, since everyone has witnessed its power in boosting deep learning training.
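
For reference, a minimal sketch of applying Kaiming (He) initialization explicitly to a layer (the sizes here are arbitrary):

import torch.nn as nn

layer = nn.Linear(64, 32)
# Kaiming (He) initialization, designed with ReLU-family activations in mind
nn.init.kaiming_uniform_(layer.weight, nonlinearity='relu')
nn.init.zeros_(layer.bias)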


How to check if PyTorch is using the GPU?

Question: How to check if PyTorch is using the GPU?

I would like to know if pytorch is using my GPU. It’s possible to detect with nvidia-smi if there is any activity from the GPU during the process, but I want something written in a python script.

Is there a way to do so?


Answer 0

This is going to work:

In [1]: import torch

In [2]: torch.cuda.current_device()
Out[2]: 0

In [3]: torch.cuda.device(0)
Out[3]: <torch.cuda.device at 0x7efce0b03be0>

In [4]: torch.cuda.device_count()
Out[4]: 1

In [5]: torch.cuda.get_device_name(0)
Out[5]: 'GeForce GTX 950M'

In [6]: torch.cuda.is_available()
Out[6]: True

This tells me the GPU GeForce GTX 950M is being used by PyTorch.


Answer 1

As it hasn’t been proposed here, I’m adding a method using torch.device, as this is quite handy, also when initializing tensors on the correct device.

# setting device on GPU if available, else CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Using device:', device)
print()

#Additional Info when using cuda
if device.type == 'cuda':
    print(torch.cuda.get_device_name(0))
    print('Memory Usage:')
    print('Allocated:', round(torch.cuda.memory_allocated(0)/1024**3,1), 'GB')
    print('Cached:   ', round(torch.cuda.memory_reserved(0)/1024**3,1), 'GB')

Edit: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved. So use memory_cached for older versions.

Output:

Using device: cuda

Tesla K80
Memory Usage:
Allocated: 0.3 GB
Cached:    0.6 GB

As mentioned above, using device it is possible to:

  • To move tensors to the respective device:

      torch.rand(10).to(device)
    
  • To create a tensor directly on the device:

      torch.rand(10, device=device)
    

Which makes switching between CPU and GPU comfortable without changing the actual code.


Edit:

As there has been some questions and confusion about the cached and allocated memory I’m adding some additional information about it:


You can either directly hand over a device as specified further above in the post or you can leave it None and it will use the current_device().


Additional note: Old graphic cards with Cuda compute capability 3.0 or lower may be visible but cannot be used by Pytorch!
Thanks to hekimgil for pointing this out! – “Found GPU0 GeForce GT 750M which is of cuda capability 3.0. PyTorch no longer supports this GPU because it is too old. The minimum cuda capability that we support is 3.5.”


Answer 2

After you start running the training loop, if you want to manually watch it from the terminal whether your program is utilizing the GPU resources and to what extent, then you can simply use watch as in:

$ watch -n 2 nvidia-smi

This will continuously update the usage stats for every 2 seconds until you press ctrl+c


If you need more control over which GPU stats are shown, you can use a more sophisticated version of nvidia-smi with --query-gpu=.... Below is a simple illustration of this:

$ watch -n 3 nvidia-smi --query-gpu=index,gpu_name,memory.total,memory.used,memory.free,temperature.gpu,pstate,utilization.gpu,utilization.memory --format=csv

which would output the stats in CSV format, one row per GPU.

Note: There should not be any space between the comma separated query names in --query-gpu=.... Else those values will be ignored and no stats are returned.


Also, you can check whether your installation of PyTorch detects your CUDA installation correctly by doing:

In [13]: import  torch

In [14]: torch.cuda.is_available()
Out[14]: True

A True status means that PyTorch is configured correctly and can use the GPU, although you still have to move/place the tensors onto it with the necessary statements in your code.


If you want to do this inside Python code, then look into this module:

https://github.com/jonsafari/nvidia-ml-py or in pypi here: https://pypi.python.org/pypi/nvidia-ml-py/
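
For a rough sketch of what that looks like, assuming the pynvml bindings installed by nvidia-ml-py (the function names below come from that package):

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)        # first GPU
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
util = pynvml.nvmlDeviceGetUtilizationRates(handle)
print('used memory:', mem.used / 1024**2, 'MiB')
print('gpu utilization:', util.gpu, '%')
pynvml.nvmlShutdown()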


Answer 3

On the official site, on the Get Started page, you can check GPU availability for PyTorch as below:

import torch
torch.cuda.is_available()

Reference: PyTorch | Get Started


Answer 4

From a practical standpoint, just one minor digression:

import torch
dev = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

This dev now knows whether it is cuda or cpu.

And there is a difference in how you deal with a model and with tensors when moving them to cuda. It is a bit strange at first.

import torch
import torch.nn as nn
dev = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
t1 = torch.randn(1,2)
t2 = torch.randn(1,2).to(dev)
print(t1)  # tensor([[-0.2678,  1.9252]])
print(t2)  # tensor([[ 0.5117, -3.6247]], device='cuda:0')
t1.to(dev) 
print(t1)  # tensor([[-0.2678,  1.9252]]) 
print(t1.is_cuda) # False
t1 = t1.to(dev)
print(t1)  # tensor([[-0.2678,  1.9252]], device='cuda:0') 
print(t1.is_cuda) # True

class M(nn.Module):
    def __init__(self):        
        super().__init__()        
        self.l1 = nn.Linear(1,2)

    def forward(self, x):                      
        x = self.l1(x)
        return x
model = M()   # not on cuda
model.to(dev) # is on cuda (all parameters)
print(next(model.parameters()).is_cuda) # True

This is all a bit tricky, but understanding it once helps you work fast with less debugging.


Answer 5

To check if there is a GPU available:

torch.cuda.is_available()

If the above function returns False,

  1. you either have no GPU,
  2. or the Nvidia drivers have not been installed so the OS does not see the GPU,
  3. or the GPU is being hidden by the environmental variable CUDA_VISIBLE_DEVICES. When the value of CUDA_VISIBLE_DEVICES is -1, then all your devices are being hidden. You can check that value in code with this line: os.environ['CUDA_VISIBLE_DEVICES']

If the above function returns True that does not necessarily mean that you are using the GPU. In Pytorch you can allocate tensors to devices when you create them. By default, tensors get allocated to the cpu. To check where your tensor is allocated do:

# assuming that 'a' is a tensor created somewhere else
a.device  # returns the device where the tensor is allocated

Note that you cannot operate on tensors allocated in different devices. To see how to allocate a tensor to the GPU, see here: https://pytorch.org/docs/stable/notes/cuda.html
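
As a minimal sketch of checking where a tensor lives and moving it to the GPU (guarded so it also runs on a CPU-only machine):

import torch

a = torch.zeros(4)
print(a.device)               # cpu -- tensors are allocated on the CPU by default

if torch.cuda.is_available():
    b = a.to('cuda')          # copy of `a` on the first GPU
    print(b.device)           # cuda:0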


Answer 6

Almost all answers here reference torch.cuda.is_available(). However, that’s only one part of the coin. It tells you whether the GPU (actually CUDA) is available, not whether it’s actually being used. In a typical setup, you would set your device with something like this:

device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

but in larger environments (e.g. research) it is also common to give the user more options, so based on input they can disable CUDA, specify CUDA IDs, and so on. In such case, whether or not the GPU is used is not only based on whether it is available or not. After the device has been set to a torch device, you can get its type property to verify whether it’s CUDA or not.

if device.type == 'cuda':
    # do something
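
For illustration, a small sketch of such a setup (the command-line flag names here are hypothetical), where the user can disable CUDA or pick a CUDA ID:

import argparse
import torch

parser = argparse.ArgumentParser()
parser.add_argument('--disable-cuda', action='store_true')   # hypothetical flag
parser.add_argument('--cuda-id', type=int, default=0)        # hypothetical flag
args = parser.parse_args()

if not args.disable_cuda and torch.cuda.is_available():
    device = torch.device(f'cuda:{args.cuda_id}')
else:
    device = torch.device('cpu')

if device.type == 'cuda':
    print('Running on', torch.cuda.get_device_name(device))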

Answer 7

Simply from command prompt or Linux environment run the following command.

python -c 'import torch; print(torch.cuda.is_available())'

The above should print True

python -c 'import torch; print(torch.rand(2,3).cuda())'

This one should print the following:

tensor([[0.7997, 0.6170, 0.7042], [0.4174, 0.1494, 0.0516]], device='cuda:0')

Answer 8

If you are here because your pytorch always gives False for torch.cuda.is_available(), that's probably because you installed your pytorch version without GPU support (e.g., you coded on a laptop and are then testing on a server).

The solution is to uninstall and install pytorch again with the right command from pytorch downloads page. Also refer this pytorch issue.


Answer 9

Create a tensor on the GPU as follows:

$ python
>>> import torch
>>> print(torch.rand(3,3).cuda()) 

Do not quit, open another terminal and check if the python process is using the GPU using:

$ nvidia-smi

How does the "view" method work in PyTorch?

Question: How does the "view" method work in PyTorch?

I am confused about the method view() in the following code snippet.

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool  = nn.MaxPool2d(2,2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1   = nn.Linear(16*5*5, 120)
        self.fc2   = nn.Linear(120, 84)
        self.fc3   = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16*5*5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()

My confusion is regarding the following line.

x = x.view(-1, 16*5*5)

What does the tensor.view() function do? I have seen it used in many places, but I can't understand how it interprets its parameters.

What happens if I give negative values as parameters to the view() function? For example, what happens if I call, tensor_variable.view(1, 1, -1)?

Can anyone explain the main principle of view() function with some examples?


Answer 0

The view function is meant to reshape the tensor.

Say you have a tensor

import torch
a = torch.range(1, 16)  # note: torch.range is deprecated; torch.arange(1, 17) is the modern equivalent

a is a tensor that has 16 elements, from 1 to 16 (inclusive). If you want to reshape this tensor to make it a 4 x 4 tensor, then you can use

a = a.view(4, 4)

Now a will be a 4 x 4 tensor. Note that after the reshape, the total number of elements needs to remain the same. Reshaping the tensor a into a 3 x 5 tensor would not be valid.

What is the meaning of parameter -1?

If there is a situation where you don't know how many rows you want but are sure of the number of columns, you can specify that with a -1. (Note that you can extend this to tensors with more dimensions; only one of the axis values can be -1.) This is a way of telling the library: "give me a tensor that has this many columns, and you compute the appropriate number of rows necessary to make this happen".

This can be seen in the neural network code that you have given above. After the line x = self.pool(F.relu(self.conv2(x))) in the forward function, you will have a feature map of depth 16. You have to flatten it before giving it to the fully connected layer. So you tell pytorch to reshape the tensor you obtained to have a specific number of columns, and to decide the number of rows by itself.

Drawing a similarity between numpy and pytorch, view is similar to numpy’s reshape function.
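
For instance, a side-by-side sketch of the two:

import numpy as np
import torch

n = np.arange(16).reshape(4, 4)       # numpy
t = torch.arange(16).view(4, 4)       # the pytorch counterpart
print(n.shape, t.shape)               # (4, 4) torch.Size([4, 4])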


Answer 1

Let’s do some examples, from simpler to more difficult.

  1. The view method returns a tensor with the same data as the self tensor (which means that the returned tensor has the same number of elements), but with a different shape. For example:

    a = torch.arange(1, 17)  # a's shape is (16,)
    
    a.view(4, 4) # output below
      1   2   3   4
      5   6   7   8
      9  10  11  12
     13  14  15  16
    [torch.FloatTensor of size 4x4]
    
    a.view(2, 2, 4) # output below
    (0 ,.,.) = 
    1   2   3   4
    5   6   7   8
    
    (1 ,.,.) = 
     9  10  11  12
    13  14  15  16
    [torch.FloatTensor of size 2x2x4]
    
  2. Assuming that -1 is not one of the parameters, when you multiply them together, the result must be equal to the number of elements in the tensor. If you do: a.view(3, 3), it will raise a RuntimeError because shape (3 x 3) is invalid for input with 16 elements. In other words: 3 x 3 does not equal 16 but 9.

  3. You can use -1 as one of the parameters that you pass to the function, but only once. All that happens is that the method will do the math for you on how to fill that dimension. For example a.view(2, -1, 4) is equivalent to a.view(2, 2, 4). [16 / (2 x 4) = 2]

  4. Notice that the returned tensor shares the same data. If you make a change in the “view” you are changing the original tensor’s data:

    b = a.view(4, 4)
    b[0, 2] = 2
    a[2] == 3.0
    False
    
  5. Now, for a more complex use case. The documentation says that each new view dimension must either be a subspace of an original dimension, or only span d, d + 1, …, d + k that satisfy the following contiguity-like condition that for all i = 0, …, k – 1, stride[i] = stride[i + 1] x size[i + 1]. Otherwise, contiguous() needs to be called before the tensor can be viewed. For example:

    a = torch.rand(5, 4, 3, 2) # size (5, 4, 3, 2)
    a_t = a.permute(0, 2, 3, 1) # size (5, 3, 2, 4)
    
    # The commented line below will raise a RuntimeError, because one dimension
    # spans across two contiguous subspaces
    # a_t.view(-1, 4)
    
    # instead do:
    a_t.contiguous().view(-1, 4)
    
    # To see why the first one does not work and the second does,
    # compare a.stride() and a_t.stride()
    a.stride() # (24, 6, 2, 1)
    a_t.stride() # (24, 2, 1, 6)
    

    Notice that for a_t, stride[0] != stride[1] x size[1] since 24 != 2 x 3


Answer 2

torch.Tensor.view()

Simply put, torch.Tensor.view() which is inspired by numpy.ndarray.reshape() or numpy.reshape(), creates a new view of the tensor, as long as the new shape is compatible with the shape of the original tensor.

Let’s understand this in detail using a concrete example.

In [43]: t = torch.arange(18) 

In [44]: t 
Out[44]: 
tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17])

With this tensor t of shape (18,), new views can only be created for the following shapes:

(1, 18) or equivalently (1, -1) or (-1, 18)
(2, 9) or equivalently (2, -1) or (-1, 9)
(3, 6) or equivalently (3, -1) or (-1, 6)
(6, 3) or equivalently (6, -1) or (-1, 3)
(9, 2) or equivalently (9, -1) or (-1, 2)
(18, 1) or equivalently (18, -1) or (-1, 1)

As we can already observe from the above shape tuples, the multiplication of the elements of the shape tuple (e.g. 2*9, 3*6 etc.) must always be equal to the total number of elements in the original tensor (18 in our example).

Another thing to observe is that we used a -1 in one of the places in each of the shape tuples. By using a -1, we are being lazy in doing the computation ourselves and rather delegate the task to PyTorch to do calculation of that value for the shape when it creates the new view. One important thing to note is that we can only use a single -1 in the shape tuple. The remaining values should be explicitly supplied by us. Else PyTorch will complain by throwing a RuntimeError:

RuntimeError: only one dimension can be inferred

So, with all of the above mentioned shapes, PyTorch will always return a new view of the original tensor t. This basically means that it just changes the stride information of the tensor for each of the new views that are requested.

Below are some examples illustrating how the strides of the tensors are changed with each new view.

# stride of our original tensor `t`
In [53]: t.stride() 
Out[53]: (1,)

Now, we will see the strides for the new views:

# shape (1, 18)
In [54]: t1 = t.view(1, -1)
# stride tensor `t1` with shape (1, 18)
In [55]: t1.stride() 
Out[55]: (18, 1)

# shape (2, 9)
In [56]: t2 = t.view(2, -1)
# stride of tensor `t2` with shape (2, 9)
In [57]: t2.stride()       
Out[57]: (9, 1)

# shape (3, 6)
In [59]: t3 = t.view(3, -1) 
# stride of tensor `t3` with shape (3, 6)
In [60]: t3.stride() 
Out[60]: (6, 1)

# shape (6, 3)
In [62]: t4 = t.view(6,-1)
# stride of tensor `t4` with shape (6, 3)
In [63]: t4.stride() 
Out[63]: (3, 1)

# shape (9, 2)
In [65]: t5 = t.view(9, -1) 
# stride of tensor `t5` with shape (9, 2)
In [66]: t5.stride()
Out[66]: (2, 1)

# shape (18, 1)
In [68]: t6 = t.view(18, -1)
# stride of tensor `t6` with shape (18, 1)
In [69]: t6.stride()
Out[69]: (1, 1)

So that’s the magic of the view() function. It just changes the strides of the (original) tensor for each of the new views, as long as the shape of the new view is compatible with the original shape.

Another interesting thing one might observe is that the value of the element in the 0th position of the stride tuple is equal to the value of the element in the 1st position of the shape tuple.

In [74]: t3.shape 
Out[74]: torch.Size([3, 6])
                        |
In [75]: t3.stride()    |
Out[75]: (6, 1)         |
          |_____________|

This is because:

In [76]: t3 
Out[76]: 
tensor([[ 0,  1,  2,  3,  4,  5],
        [ 6,  7,  8,  9, 10, 11],
        [12, 13, 14, 15, 16, 17]])

the stride (6, 1) says that to go from one element to the next element along the 0th dimension, we have to jump or take 6 steps (i.e., to go from 0 to 6, one has to take 6 steps), but to go from one element to the next element along the 1st dimension, we need only one step (e.g., to go from 2 to 3).

Thus, the strides information is at the heart of how the elements are accessed from memory for performing the computation.


torch.reshape()

This function would return a view and is exactly the same as using torch.Tensor.view() as long as the new shape is compatible with the shape of the original tensor. Otherwise, it will return a copy.

However, the notes of torch.reshape() warns that:

contiguous inputs and inputs with compatible strides can be reshaped without copying, but one should not depend on the copying vs. viewing behavior.
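
A minimal sketch illustrating that note: for a non-contiguous tensor, view() fails while reshape() silently falls back to copying:

import torch

t = torch.arange(12).view(3, 4)
tt = t.t()                               # transpose: a non-contiguous view
# tt.view(12)                            # would raise a RuntimeError
flat = tt.reshape(12)                    # works: reshape copies when it has to
print(tt.is_contiguous())                # False
print(flat.data_ptr() == tt.data_ptr())  # False: a new copy was made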


Answer 3

I figured out that x.view(-1, 16 * 5 * 5) is equivalent to x.flatten(1), where the parameter 1 indicates that the flattening starts from the 1st dimension (so the 'sample' dimension is not flattened). As you can see, the latter usage is semantically clearer and easier to use, so I prefer flatten().
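
A quick sketch confirming the equivalence (the shapes are chosen to match the network above):

import torch

x = torch.randn(8, 16, 5, 5)          # (batch, channels, height, width)
a = x.view(-1, 16 * 5 * 5)
b = x.flatten(1)                      # flatten everything after the batch dimension
print(a.shape, b.shape)               # torch.Size([8, 400]) torch.Size([8, 400])
print(torch.equal(a, b))              # True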


Answer 4

What is the meaning of the parameter -1?

You can read -1 as a dynamically inferred number, or "anything". Because of that, there can be only one -1 parameter in view().

If you ask for x.view(-1,1), this will output a tensor of shape [anything, 1], depending on the number of elements in x. For example:

import torch
x = torch.tensor([1, 2, 3, 4])
print(x,x.shape)
print("...")
print(x.view(-1,1), x.view(-1,1).shape)
print(x.view(1,-1), x.view(1,-1).shape)

Will output:

tensor([1, 2, 3, 4]) torch.Size([4])
...
tensor([[1],
        [2],
        [3],
        [4]]) torch.Size([4, 1])
tensor([[1, 2, 3, 4]]) torch.Size([1, 4])

Answer 5

weights.reshape(a, b) will return a new tensor with the same data as weights, with size (a, b); it copies the data to another part of memory when a view is not possible.

weights.resize_(a, b) returns the same tensor with a different shape. However, if the new shape results in fewer elements than the original tensor, some elements will be removed from the tensor (but not from memory). If the new shape results in more elements than the original tensor, the new elements will be uninitialized in memory.

weights.view(a, b) will return a new tensor with the same data as weights, with size (a, b); it shares the underlying storage with weights.
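
A small sketch of the three behaviours described above:

import torch

w = torch.arange(6.)
v = w.view(3, 2)            # shares storage with w
v[0, 0] = 100.
print(w[0])                 # tensor(100.) -- the view writes through to w

r = w.reshape(2, 3)         # same data; copies only when a view is impossible

w2 = torch.arange(6.)
w2.resize_(2, 2)            # in-place: silently keeps just the first 4 elements
print(w2)                   # tensor([[0., 1.], [2., 3.]])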


Answer 6

I really liked @Jadiel de Armas's examples.

I would like to add a small insight into how elements are ordered for .view(…):

  • For a tensor with shape (a,b,c), the order of its elements is determined by a numbering system: the first digit has a numbers, the second digit has b numbers, and the third digit has c numbers.
  • The mapping of the elements in the new tensor returned by .view(…) preserves this order of the original tensor.

Answer 7

Let’s try to understand view by the following examples:

    a = torch.range(1, 16)  # deprecated; torch.arange(1, 17) is the modern equivalent

print(a)

    tensor([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12., 13., 14.,
            15., 16.])

print(a.view(-1,2))

    tensor([[ 1.,  2.],
            [ 3.,  4.],
            [ 5.,  6.],
            [ 7.,  8.],
            [ 9., 10.],
            [11., 12.],
            [13., 14.],
            [15., 16.]])

print(a.view(2,-1,4))   #3d tensor

    tensor([[[ 1.,  2.,  3.,  4.],
             [ 5.,  6.,  7.,  8.]],

            [[ 9., 10., 11., 12.],
             [13., 14., 15., 16.]]])
print(a.view(2,-1,2))

    tensor([[[ 1.,  2.],
             [ 3.,  4.],
             [ 5.,  6.],
             [ 7.,  8.]],

            [[ 9., 10.],
             [11., 12.],
             [13., 14.],
             [15., 16.]]])

print(a.view(4,-1,2))

    tensor([[[ 1.,  2.],
             [ 3.,  4.]],

            [[ 5.,  6.],
             [ 7.,  8.]],

            [[ 9., 10.],
             [11., 12.]],

            [[13., 14.],
             [15., 16.]]])

Passing -1 as an argument value is an easy way to have the library compute a dimension for us: in the 3D case we only need to know the values of, say, y and z for x to be inferred (or the other way round), and in the 2D case we only need to know the value of y (or vice versa).


Best way to save a trained model in PyTorch?

Question: Best way to save a trained model in PyTorch?

I was looking for alternative ways to save a trained model in PyTorch. So far, I have found two alternatives.

  1. torch.save() to save a model and torch.load() to load a model.
  2. model.state_dict() to save a trained model and model.load_state_dict() to load the saved model.

I have come across this discussion where approach 2 is recommended over approach 1.

My question is, why is the second approach preferred? Is it only because torch.nn modules have those two functions and we are encouraged to use them?


Answer 0

I’ve found this page on their github repo, I’ll just paste the content here.


Recommended approach for saving a model

There are two main approaches for serializing and restoring a model.

The first (recommended) saves and loads only the model parameters:

torch.save(the_model.state_dict(), PATH)

Then later:

the_model = TheModelClass(*args, **kwargs)
the_model.load_state_dict(torch.load(PATH))

The second saves and loads the entire model:

torch.save(the_model, PATH)

Then later:

the_model = torch.load(PATH)

However in this case, the serialized data is bound to the specific classes and the exact directory structure used, so it can break in various ways when used in other projects, or after some serious refactors.


Answer 1

It depends on what you want to do.

Case # 1: Save the model to use it yourself for inference: You save the model, you restore it, and then you change the model to evaluation mode. This is done because you usually have BatchNorm and Dropout layers that by default are in train mode on construction:

torch.save(model.state_dict(), filepath)

#Later to restore:
model.load_state_dict(torch.load(filepath))
model.eval()

Case # 2: Save model to resume training later: If you need to keep training the model that you are about to save, you need to save more than just the model. You also need to save the state of the optimizer, epochs, score, etc. You would do it like this:

state = {
    'epoch': epoch,
    'state_dict': model.state_dict(),
    'optimizer': optimizer.state_dict(),
    ...
}
torch.save(state, filepath)

To resume training you would do things like: state = torch.load(filepath), and then, to restore the state of each individual object, something like this:

model.load_state_dict(state['state_dict'])
optimizer.load_state_dict(state['optimizer'])

Since you are resuming training, DO NOT call model.eval() once you restore the states when loading.

Case # 3: Model to be used by someone else with no access to your code: In Tensorflow you can create a .pb file that defines both the architecture and the weights of the model. This is very handy, specially when using Tensorflow serve. The equivalent way to do this in Pytorch would be:

torch.save(model, filepath)

# Then later:
model = torch.load(filepath)

This way is still not bulletproof, and since pytorch is still undergoing a lot of changes, I wouldn't recommend it.


Answer 2

The pickle Python library implements binary protocols for serializing and de-serializing a Python object.

When you import torch (or when you use PyTorch) it will import pickle for you and you don’t need to call pickle.dump() and pickle.load() directly, which are the methods to save and to load the object.

In fact, torch.save() and torch.load() will wrap pickle.dump() and pickle.load() for you.

The state_dict the other answer mentioned deserves just a few more notes.

What state_dict do we have inside PyTorch? There are actually two state_dicts.

A PyTorch model is a torch.nn.Module; it has a model.parameters() call to get the learnable parameters (w and b). These learnable parameters, once randomly set, will update over time as we learn. The learnable parameters form the first state_dict.

The second state_dict is the optimizer state dict. You recall that the optimizer is used to improve our learnable parameters. But the optimizer state_dict is fixed. Nothing to learn in there.

Because state_dict objects are Python dictionaries, they can be easily saved, updated, altered, and restored, adding a great deal of modularity to PyTorch models and optimizers.

Let’s create a super simple model to explain this:

import torch
import torch.optim as optim

model = torch.nn.Linear(5, 2)

# Initialize optimizer
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

print("Model's state_dict:")
for param_tensor in model.state_dict():
    print(param_tensor, "\t", model.state_dict()[param_tensor].size())

print("Model weight:")    
print(model.weight)

print("Model bias:")    
print(model.bias)

print("---")
print("Optimizer's state_dict:")
for var_name in optimizer.state_dict():
    print(var_name, "\t", optimizer.state_dict()[var_name])

This code will output the following:

Model's state_dict:
weight   torch.Size([2, 5])
bias     torch.Size([2])
Model weight:
Parameter containing:
tensor([[ 0.1328,  0.1360,  0.1553, -0.1838, -0.0316],
        [ 0.0479,  0.1760,  0.1712,  0.2244,  0.1408]], requires_grad=True)
Model bias:
Parameter containing:
tensor([ 0.4112, -0.0733], requires_grad=True)
---
Optimizer's state_dict:
state    {}
param_groups     [{'lr': 0.001, 'momentum': 0.9, 'dampening': 0, 'weight_decay': 0, 'nesterov': False, 'params': [140695321443856, 140695321443928]}]

Note this is a minimal model. You may try to add a stack of sequential layers (D_in, H, A, B, C, and D_out are placeholder sizes):

model = torch.nn.Sequential(
          torch.nn.Linear(D_in, H),
          torch.nn.Conv2d(A, B, C),
          torch.nn.Linear(H, D_out),
        )

Note that only layers with learnable parameters (convolutional layers, linear layers, etc.) and registered buffers (batchnorm layers) have entries in the model’s state_dict.

Non learnable things, belong to the optimizer object state_dict, which contains information about the optimizer’s state, as well as the hyperparameters used.

The rest of the story is the same. In the inference phase (the phase when we use the model after training), we predict based on the parameters we learned. So for inference, we just need to save the parameters with model.state_dict().

torch.save(model.state_dict(), filepath)

And to use it later: model.load_state_dict(torch.load(filepath)) followed by model.eval().

Note: Don’t forget the last line model.eval() this is crucial after loading the model.

Also, don't try to save with torch.save(model.parameters(), filepath); model.parameters() just returns a generator object.

On the other hand, torch.save(model, filepath) saves the model object itself, but keep in mind that the model doesn't include the optimizer's state_dict. Check the other excellent answer by @Jadiel de Armas for saving the optimizer's state dict.


Answer 3

A common PyTorch convention is to save models using either a .pt or .pth file extension.

Save/Load Entire Model Save:

path = "username/directory/lstmmodelgpu.pth"
torch.save(trainer, path)

Load:

Model class must be defined somewhere

model = torch.load(PATH)
model.eval()

Answer 4

If you want to save the model and resume training later:

Single GPU: Save:

state = {
        'epoch': epoch,
        'state_dict': model.state_dict(),
        'optimizer': optimizer.state_dict(),
}
savepath='checkpoint.t7'
torch.save(state,savepath)

Load:

checkpoint = torch.load('checkpoint.t7')
model.load_state_dict(checkpoint['state_dict'])
optimizer.load_state_dict(checkpoint['optimizer'])
epoch = checkpoint['epoch']

Multiple GPU: Save

state = {
        'epoch': epoch,
        'state_dict': model.module.state_dict(),
        'optimizer': optimizer.state_dict(),
}
savepath='checkpoint.t7'
torch.save(state,savepath)

Load:

checkpoint = torch.load('checkpoint.t7')
model.load_state_dict(checkpoint['state_dict'])
optimizer.load_state_dict(checkpoint['optimizer'])
epoch = checkpoint['epoch']

#Don't call DataParallel before loading the model otherwise you will get an error

model = nn.DataParallel(model) #ignore the line if you want to load on Single GPU

Pyro - Deep universal probabilistic programming with Python and PyTorch

Pyro is a flexible, scalable deep probabilistic programming library built on PyTorch. Notably, it was designed with these principles in mind:
  • Universal: Pyro is a universal PPL; it can represent any computable probability distribution
  • Scalable: Pyro scales to large data sets with little overhead compared to hand-written code
  • Minimal: Pyro is agile and maintainable. It is implemented with a small core of powerful, composable abstractions
  • Flexible: Pyro aims for automation when you want it and control when you need it. This is accomplished through high-level abstractions to express generative and inference models, while allowing experts easy access to customize inference

Pyro was originally developed at Uber AI and is now actively maintained by community contributors, including at the Broad Institute. In 2019, Pyro became a project of the Linux Foundation, a neutral space for collaboration on open source software, open standards, open data, and open hardware.

For more information about the high-level motivation for Pyro, check out our launch blog post. For additional blog posts, check out work on experimental design and time-to-event modeling in Pyro.
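
As a taste of the API, here is a minimal sketch of a Pyro generative model (the classic "weather" toy example; the names and numbers are purely illustrative):

import pyro
import pyro.distributions as dist

def weather():
    # sample a latent Bernoulli variable: is it cloudy?
    cloudy = pyro.sample('cloudy', dist.Bernoulli(0.3))
    mean_temp = 55.0 if cloudy.item() == 1.0 else 75.0
    # the temperature depends on the cloudiness
    temp = pyro.sample('temp', dist.Normal(mean_temp, 10.0))
    return cloudy, temp

print(weather())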

Installing

Installing a stable Pyro release

Install using pip:

Pyro supports Python 3.6+.

pip install pyro-ppl

Install from source:

git clone git@github.com:pyro-ppl/pyro.git
cd pyro
git checkout master  # master is pinned to the latest release
pip install .

Install with extra packages:

To install the dependencies required to run the probabilistic models included in the examples/tutorials directories, use the following command:

pip install pyro-ppl[extras]

Make sure that the models come from the same release of the Pyro source code as you have installed.

Installing the Pyro dev branch

For the latest features, you can install Pyro from source.

Install Pyro using pip:

pip install git+https://github.com/pyro-ppl/pyro.git

Or, with the extras dependencies, to run the models included in the examples/tutorials directories:

pip install git+https://github.com/pyro-ppl/pyro.git#egg=project[extras]

Install Pyro from source:

git clone https://github.com/pyro-ppl/pyro
cd pyro
pip install .  # pip install .[extras] for running models in examples/tutorials

Running Pyro from a Docker container

See the instructions here.

Citation

If you use Pyro, please consider citing:

@article{bingham2019pyro,
  author    = {Eli Bingham and
               Jonathan P. Chen and
               Martin Jankowiak and
               Fritz Obermeyer and
               Neeraj Pradhan and
               Theofanis Karaletsos and
               Rohit Singh and
               Paul A. Szerlip and
               Paul Horsfall and
               Noah D. Goodman},
  title     = {Pyro: Deep Universal Probabilistic Programming},
  journal   = {J. Mach. Learn. Res.},
  volume    = {20},
  pages     = {28:1--28:6},
  year      = {2019},
  url       = {http://jmlr.org/papers/v20/18-403.html}
}

DeepLearningZeroToAll - Lab codes for the basic TensorFlow tutorial

This is the lab code covered in the basic TensorFlow tutorial (in Korean) at https://youtu.be/BS6O0zOGX4E. (We also plan to record videos in English.)

This is work in progress and may contain errors. However, we welcome your comments and pull requests. Please check out our style guide:

Lab slides:

We welcome your comments on the slides.

File naming rule:

  • kLab-XX-X-[name].py: Keras lab code
  • Lab-XX-X-[name].py: TensorFlow lab code
  • mxlab-XX-X-[name].py: MXNet lab code

Install requirements

pip install -r requirements.txt

Run tests and autopep8

TODO: Need to add more test cases.

python -m unittest discover -s tests;

# http://stackoverflow.com/questions/14328406/
pip install autopep8 # if you haven't install
autopep8 . --recursive --in-place --pep8-passes 2000 --verbose

Automatically create requirements.txt

pip install pipreqs

pipreqs /path/to/project

http://stackoverflow.com/questions/31684375

Contributions/Comments

We always welcome your comments and pull requests.

Reference implementations

PySyft - A library for answering questions using data you cannot see

A library for computing on data
you do not own and cannot see


PySyft is a Python library for secure and private deep learning.

PySyft decouples private data from model training, using Federated Learning, Differential Privacy, and encrypted computation (such as Multi-Party Computation (MPC) and Homomorphic Encryption (HE)) within the main deep learning frameworks like PyTorch and TensorFlow. Join the movement on Slack.


Most software libraries let you compute over information you own and see inside of machines you control. However, this means that you cannot compute on information without first obtaining (at least partial) ownership of that information. It also means that you cannot use machines without first obtaining control over them. This is very limiting to human collaboration, and it systematically drives the centralization of data, because you cannot work with a bunch of data without first putting it all in one (central) place.

The Syft ecosystem seeks to change this system, allowing you to write software which can compute over information you do not own, on machines you do not have (total) control over. This not only includes servers in the cloud, but also personal desktops, laptops, mobile phones, websites, and edge devices. Wherever your data wants to live in your ownership, the Syft ecosystem exists to help keep it there while allowing it to be used privately for computation.

Mono-repo 🚝

This repo contains multiple projects that work together, namely PySyft and PyGrid. PyGrid will be added shortly; in the meantime, this is the directory structure:

OpenMined/PySyft
├── README.md   <-- You are here 📌
└── packages
    ├── grid    <-- The Grid droids from OpenMined/PyGrid
    └── syft    <-- The Syft droids you are looking for 👋🏽

Note: changing the entire folder structure may cause a few minor issues. If you spot one, please notify us or open a PR.

PySyft

PySyft is the centerpiece of the Syft ecosystem. It has two primary purposes. You can perform two types of computation using PySyft:

  1. Dynamic: directly compute over data you cannot see
  2. Static: create static graphs of computation which can be deployed/scaled at a later date on different compute

The PyGrid library serves as an API for the management and deployment of PySyft at scale. It also allows you to extend PySyft for the purposes of federated learning on web, mobile, and edge devices using the following Syft worker libraries:

  • KotlinSyft (Android)
  • SwiftSyft (iOS)
  • syft.js (JavaScript)
  • PySyft (Python; you can use PySyft itself as one of these "FL worker libraries")

However, the Syft ecosystem only focuses on consistent object serialization/deserialization, core abstractions, and algorithm design/execution across these languages. These libraries alone will not connect you with data in the real world. The Syft ecosystem is supported by the Grid ecosystem, which focuses on deployment, scalability, and the other concerns around running real-world systems to compute over and process data (such as data-compliance web applications).

  • PySyft is the library that defines objects, abstractions, and algorithms
  • PyGrid is the platform which lets you deploy them within a real institution
  • PyGrid Admin is a UI which allows a data owner to manage their PyGrid deployment

For a more detailed explanation of PySyft, see the white paper on Arxiv.

PySyft is also explained in videos on YouTube:

Pre-installation

PySyft is available on PyPI and Conda.

We recommend that you install PySyft within a virtual environment like Conda, due to its ease of use. If you are using Windows, we suggest installing Anaconda and using the Anaconda Prompt to work from the command line.

$ conda create -n pysyft python=3.9
$ conda activate pysyft
$ conda install jupyter notebook

Version support

We support Linux, macOS, and Windows, as well as the following Python and torch versions. Older versions may work, but we have stopped testing and supporting them.

Supported Python versions: 3.7, 3.8, 3.9
Supported torch versions: 1.6, 1.7, 1.8

Installation

Pip

$ pip install syft

This will auto-install PyTorch and other dependencies as required to run the examples and tutorials. For more information on building from source, see the contribution guide here.

Documentation

Coming soon! In the meantime, please check out the examples below.

Examples

A comprehensive list of examples can be found here.

These tutorials cover a variety of Python libraries for data science and machine learning.

All of the examples can be played with by launching a Jupyter notebook and navigating to the examples folder:

$ jupyter notebook

Duet

Duet is a peer-to-peer tool within PySyft that provides a research-friendly API for a data owner to privately expose their data, while a data scientist can access or manipulate the data on the owner's side through a zero-knowledge access-control mechanism. It is designed to lower the barrier between research and privacy-preserving mechanisms, so that scientific progress can be made on data that is currently inaccessible or tightly controlled. The main benefit of using Duet is that it allows you to get started with PySyft without needing to manage a full PyGrid deployment. It is the simplest path to using Syft, without needing to install anything (except Syft 😉).

You can find all the Duet examples in the examples/duet folder.

Contributing

The contributor guidelines can be found here. They cover all you need to know to start contributing code to PySyft today.

Also, join the rapidly growing community of over 12,000 on Slack. The Slack community is very friendly and great about quickly answering questions about the use and development of PySyft!

Disclaimer

This software is in beta. Use at your own risk.

A quick note about 0.2.x

The PySyft 0.2.x codebase is now in its own branch here, but OpenMined will not offer official support for this version range. We have compiled a list of FAQs relating to this version.

Support

For support in using this library, please join the #support channel on Slack. Click here to join our Slack community!

Organizational contributions

We are very grateful for the following organizations' contributions to PySyft!


License

Apache License 2.0