在PyTorch中保存经过训练的模型的最佳方法?

问题:在PyTorch中保存经过训练的模型的最佳方法?

我一直在寻找其他方法来在PyTorch中保存经过训练的模型。到目前为止,我发现了两种选择。

  1. 使用torch.save()保存模型,使用torch.load()加载模型。
  2. model.state_dict()保存训练的模型,model.load_state_dict()加载保存的模型。

我碰到过这种讨论,其中建议方法2优于方法1。

我的问题是,为什么选择第二种方法呢?仅仅是因为torch.nn模块具有这两个功能,我们被鼓励使用它们吗?

I was looking for alternative ways to save a trained model in PyTorch. So far, I have found two alternatives.

  1. torch.save() to save a model and torch.load() to load a model.
  2. model.state_dict() to save a trained model and model.load_state_dict() to load the saved model.

I have come across to this discussion where approach 2 is recommended over approach 1.

My question is, why the second approach is preferred? Is it only because torch.nn modules have those two function and we are encouraged to use them?


回答 0

我在他们的github仓库中找到了此页面,我将内容粘贴在这里。


推荐的模型保存方法

序列化和还原模型有两种主要方法。

第一个(推荐)仅保存和加载模型参数:

torch.save(the_model.state_dict(), PATH)

然后再:

the_model = TheModelClass(*args, **kwargs)
the_model.load_state_dict(torch.load(PATH))

第二个保存并加载整个模型:

torch.save(the_model, PATH)

然后再:

the_model = torch.load(PATH)

但是,在这种情况下,序列化的数据将绑定到所使用的特定类和确切的目录结构,因此在其他项目中使用时或经过一些严重的重构后,它可能以各种方式中断。

I’ve found this page on their github repo, I’ll just paste the content here.


Recommended approach for saving a model

There are two main approaches for serializing and restoring a model.

The first (recommended) saves and loads only the model parameters:

torch.save(the_model.state_dict(), PATH)

Then later:

the_model = TheModelClass(*args, **kwargs)
the_model.load_state_dict(torch.load(PATH))

The second saves and loads the entire model:

torch.save(the_model, PATH)

Then later:

the_model = torch.load(PATH)

However in this case, the serialized data is bound to the specific classes and the exact directory structure used, so it can break in various ways when used in other projects, or after some serious refactors.


回答 1

这取决于您想做什么。

案例1:保存模型以供您自己进行推断:保存模型,还原模型,然后将模型更改为评估模式。这样做是因为您通常在构造上具有BatchNormDropout图层,这些图层默认情况下处于训练模式:

torch.save(model.state_dict(), filepath)

#Later to restore:
model.load_state_dict(torch.load(filepath))
model.eval()

案例2:保存模型以便以后继续训练:如果您需要继续训练将要保存的模型,则需要保存的不仅仅是模型。您还需要保存优化器的状态,时期,得分等。您可以这样操作:

state = {
    'epoch': epoch,
    'state_dict': model.state_dict(),
    'optimizer': optimizer.state_dict(),
    ...
}
torch.save(state, filepath)

要恢复训练,您将执行以下操作:state = torch.load(filepath),然后恢复每个对象的状态,如下所示:

model.load_state_dict(state['state_dict'])
optimizer.load_state_dict(state['optimizer'])

由于您正在恢复训练,因此在加载时恢复状态后,请勿model.eval()再调用。

案例3:无法访问您的代码的其他人可以使用的模型:在Tensorflow中,您可以创建一个.pb文件,该文件定义了体系结构和模型权重。这非常方便,尤其是在使用时Tensorflow serve。在Pytorch中执行此操作的等效方法是:

torch.save(model, filepath)

# Then later:
model = torch.load(filepath)

这种方式仍然不能保证安全,并且由于pytorch仍在进行大量更改,因此我不建议这样做。

It depends on what you want to do.

Case # 1: Save the model to use it yourself for inference: You save the model, you restore it, and then you change the model to evaluation mode. This is done because you usually have BatchNorm and Dropout layers that by default are in train mode on construction:

torch.save(model.state_dict(), filepath)

#Later to restore:
model.load_state_dict(torch.load(filepath))
model.eval()

Case # 2: Save model to resume training later: If you need to keep training the model that you are about to save, you need to save more than just the model. You also need to save the state of the optimizer, epochs, score, etc. You would do it like this:

state = {
    'epoch': epoch,
    'state_dict': model.state_dict(),
    'optimizer': optimizer.state_dict(),
    ...
}
torch.save(state, filepath)

To resume training you would do things like: state = torch.load(filepath), and then, to restore the state of each individual object, something like this:

model.load_state_dict(state['state_dict'])
optimizer.load_state_dict(state['optimizer'])

Since you are resuming training, DO NOT call model.eval() once you restore the states when loading.

Case # 3: Model to be used by someone else with no access to your code: In Tensorflow you can create a .pb file that defines both the architecture and the weights of the model. This is very handy, specially when using Tensorflow serve. The equivalent way to do this in Pytorch would be:

torch.save(model, filepath)

# Then later:
model = torch.load(filepath)

This way is still not bullet proof and since pytorch is still undergoing a lot of changes, I wouldn’t recommend it.


回答 2

泡菜的Python库实现二进制协议的序列化和反序列化Python对象。

当您import torch(或当您使用PyTorch)时,它将import pickle为您而您不需要调用pickle.dump()pickle.load()直接调用,这是保存和加载对象的方法。

事实上,torch.save()torch.load()将包裹pickle.dump()pickle.load()为您服务。

一个state_dict对方的回答值得提及的只是几个音符。

什么state_dict我们有内部PyTorch?实际上有两个state_dict秒。

PyTorch模型torch.nn.Module具有model.parameters()调用以获取可学习的参数(w和b)。这些可学习的参数,一旦被随机设置,将随着我们的学习而随着时间而更新。可学习的参数是第一个state_dict

第二个state_dict是优化器状态字典。您还记得优化器用于改善我们的可学习参数。但是优化器state_dict是固定的。在那没什么可学的。

由于state_dict对象是Python字典,因此可以轻松地保存,更新,更改和还原对象,从而为PyTorch模型和优化器增加了很多模块化。

让我们创建一个超级简单的模型来解释这一点:

import torch
import torch.optim as optim

model = torch.nn.Linear(5, 2)

# Initialize optimizer
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

print("Model's state_dict:")
for param_tensor in model.state_dict():
    print(param_tensor, "\t", model.state_dict()[param_tensor].size())

print("Model weight:")    
print(model.weight)

print("Model bias:")    
print(model.bias)

print("---")
print("Optimizer's state_dict:")
for var_name in optimizer.state_dict():
    print(var_name, "\t", optimizer.state_dict()[var_name])

此代码将输出以下内容:

Model's state_dict:
weight   torch.Size([2, 5])
bias     torch.Size([2])
Model weight:
Parameter containing:
tensor([[ 0.1328,  0.1360,  0.1553, -0.1838, -0.0316],
        [ 0.0479,  0.1760,  0.1712,  0.2244,  0.1408]], requires_grad=True)
Model bias:
Parameter containing:
tensor([ 0.4112, -0.0733], requires_grad=True)
---
Optimizer's state_dict:
state    {}
param_groups     [{'lr': 0.001, 'momentum': 0.9, 'dampening': 0, 'weight_decay': 0, 'nesterov': False, 'params': [140695321443856, 140695321443928]}]

请注意,这是最小模型。您可以尝试添加顺序堆栈

model = torch.nn.Sequential(
          torch.nn.Linear(D_in, H),
          torch.nn.Conv2d(A, B, C)
          torch.nn.Linear(H, D_out),
        )

请注意,只有具有可学习参数的层(卷积层,线性层等)和已注册的缓冲区(batchnorm层)才在模型的中具有条目state_dict

不可学习的东西属于优化器对象state_dict,该对象包含有关优化器状态以及所用超参数的信息。

故事的其余部分是相同的。在推理阶段(这是我们训练后使用模型的阶段)进行预测;我们会根据所学的参数进行预测。因此,为了进行推断,我们只需要保存参数model.state_dict()

torch.save(model.state_dict(), filepath)

并在以后使用model.load_state_dict(torch.load(filepath))model.eval()

注意:不要忘了最后一行,model.eval()在加载模型之后,这是至关重要的。

也不要试图保存torch.save(model.parameters(), filepath)。该model.parameters()只是生成对象。

另一方面,torch.save(model, filepath)保存模型对象本身,但请记住,模型没有优化程序state_dict。检查@Jadiel de Armas的其他出色答案,以保存优化程序的状态字典。

The pickle Python library implements binary protocols for serializing and de-serializing a Python object.

When you import torch (or when you use PyTorch) it will import pickle for you and you don’t need to call pickle.dump() and pickle.load() directly, which are the methods to save and to load the object.

In fact, torch.save() and torch.load() will wrap pickle.dump() and pickle.load() for you.

A state_dict the other answer mentioned deserves just few more notes.

What state_dict do we have inside PyTorch? There are actually two state_dicts.

The PyTorch model is torch.nn.Module has model.parameters() call to get learnable parameters (w and b). These learnable parameters, once randomly set, will update over time as we learn. Learnable parameters are the first state_dict.

The second state_dict is the optimizer state dict. You recall that the optimizer is used to improve our learnable parameters. But the optimizer state_dict is fixed. Nothing to learn in there.

Because state_dict objects are Python dictionaries, they can be easily saved, updated, altered, and restored, adding a great deal of modularity to PyTorch models and optimizers.

Let’s create a super simple model to explain this:

import torch
import torch.optim as optim

model = torch.nn.Linear(5, 2)

# Initialize optimizer
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

print("Model's state_dict:")
for param_tensor in model.state_dict():
    print(param_tensor, "\t", model.state_dict()[param_tensor].size())

print("Model weight:")    
print(model.weight)

print("Model bias:")    
print(model.bias)

print("---")
print("Optimizer's state_dict:")
for var_name in optimizer.state_dict():
    print(var_name, "\t", optimizer.state_dict()[var_name])

This code will output the following:

Model's state_dict:
weight   torch.Size([2, 5])
bias     torch.Size([2])
Model weight:
Parameter containing:
tensor([[ 0.1328,  0.1360,  0.1553, -0.1838, -0.0316],
        [ 0.0479,  0.1760,  0.1712,  0.2244,  0.1408]], requires_grad=True)
Model bias:
Parameter containing:
tensor([ 0.4112, -0.0733], requires_grad=True)
---
Optimizer's state_dict:
state    {}
param_groups     [{'lr': 0.001, 'momentum': 0.9, 'dampening': 0, 'weight_decay': 0, 'nesterov': False, 'params': [140695321443856, 140695321443928]}]

Note this is a minimal model. You may try to add stack of sequential

model = torch.nn.Sequential(
          torch.nn.Linear(D_in, H),
          torch.nn.Conv2d(A, B, C)
          torch.nn.Linear(H, D_out),
        )

Note that only layers with learnable parameters (convolutional layers, linear layers, etc.) and registered buffers (batchnorm layers) have entries in the model’s state_dict.

Non learnable things, belong to the optimizer object state_dict, which contains information about the optimizer’s state, as well as the hyperparameters used.

The rest of the story is the same; in the inference phase (this is a phase when we use the model after training) for predicting; we do predict based on the parameters we learned. So for the inference, we just need to save the parameters model.state_dict().

torch.save(model.state_dict(), filepath)

And to use later model.load_state_dict(torch.load(filepath)) model.eval()

Note: Don’t forget the last line model.eval() this is crucial after loading the model.

Also don’t try to save torch.save(model.parameters(), filepath). The model.parameters() is just the generator object.

On the other side, torch.save(model, filepath) saves the model object itself, but keep in mind the model doesn’t have the optimizer’s state_dict. Check the other excellent answer by @Jadiel de Armas to save the optimizer’s state dict.


回答 3

常见的PyTorch约定是使用.pt或.pth文件扩展名保存模型。

保存/加载整个模型 保存:

path = "username/directory/lstmmodelgpu.pth"
torch.save(trainer, path)

加载:

模型类必须在某处定义

model = torch.load(PATH)
model.eval()

A common PyTorch convention is to save models using either a .pt or .pth file extension.

Save/Load Entire Model Save:

path = "username/directory/lstmmodelgpu.pth"
torch.save(trainer, path)

Load:

Model class must be defined somewhere

model = torch.load(PATH)
model.eval()

回答 4

如果您要保存模型并希望以后继续训练,请执行以下操作:

单个GPU: 保存:

state = {
        'epoch': epoch,
        'state_dict': model.state_dict(),
        'optimizer': optimizer.state_dict(),
}
savepath='checkpoint.t7'
torch.save(state,savepath)

加载:

checkpoint = torch.load('checkpoint.t7')
model.load_state_dict(checkpoint['state_dict'])
optimizer.load_state_dict(checkpoint['optimizer'])
epoch = checkpoint['epoch']

多GPU: 保存

state = {
        'epoch': epoch,
        'state_dict': model.module.state_dict(),
        'optimizer': optimizer.state_dict(),
}
savepath='checkpoint.t7'
torch.save(state,savepath)

加载:

checkpoint = torch.load('checkpoint.t7')
model.load_state_dict(checkpoint['state_dict'])
optimizer.load_state_dict(checkpoint['optimizer'])
epoch = checkpoint['epoch']

#Don't call DataParallel before loading the model otherwise you will get an error

model = nn.DataParallel(model) #ignore the line if you want to load on Single GPU

If you want to save the model and wants to resume the training later:

Single GPU: Save:

state = {
        'epoch': epoch,
        'state_dict': model.state_dict(),
        'optimizer': optimizer.state_dict(),
}
savepath='checkpoint.t7'
torch.save(state,savepath)

Load:

checkpoint = torch.load('checkpoint.t7')
model.load_state_dict(checkpoint['state_dict'])
optimizer.load_state_dict(checkpoint['optimizer'])
epoch = checkpoint['epoch']

Multiple GPU: Save

state = {
        'epoch': epoch,
        'state_dict': model.module.state_dict(),
        'optimizer': optimizer.state_dict(),
}
savepath='checkpoint.t7'
torch.save(state,savepath)

Load:

checkpoint = torch.load('checkpoint.t7')
model.load_state_dict(checkpoint['state_dict'])
optimizer.load_state_dict(checkpoint['optimizer'])
epoch = checkpoint['epoch']

#Don't call DataParallel before loading the model otherwise you will get an error

model = nn.DataParallel(model) #ignore the line if you want to load on Single GPU