Question: Why do we need to call zero_grad() in PyTorch?

The method zero_grad() needs to be called during training. But the documentation is not very helpful:

|  zero_grad(self)
|      Sets gradients of all model parameters to zero.

Why do we need to call this method?


Answer 0

In PyTorch, we need to set the gradients to zero before starting backpropagation because PyTorch accumulates the gradients on subsequent backward passes. This is convenient while training RNNs. So, the default action is to accumulate (i.e. sum) the gradients on every loss.backward() call.
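
As a minimal, self-contained illustration of this accumulation behavior (the tensor and the names below are made up for this sketch, not taken from the original answer):

import torch

x = torch.ones(2, requires_grad=True)

loss = (3 * x).sum()
loss.backward()
print(x.grad)          # tensor([3., 3.])

loss = (3 * x).sum()   # a fresh forward pass
loss.backward()        # the new gradients are added on top of the existing x.grad
print(x.grad)          # tensor([6., 6.])

x.grad.zero_()         # clearing by hand, similar to what optimizer.zero_grad() does for its parameters
print(x.grad)          # tensor([0., 0.])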

Because of this, when you start your training loop, ideally you should zero out the gradients so that the parameter update is done correctly. Otherwise, the gradient would point in some direction other than the intended direction towards the minimum (or maximum, in the case of a maximization objective).

Here is a simple example:

import torch
from torch.autograd import Variable
import torch.optim as optim

def linear_model(x, W, b):
    return torch.matmul(x, W) + b

data, targets = ...

W = Variable(torch.randn(4, 3), requires_grad=True)
b = Variable(torch.randn(3), requires_grad=True)

optimizer = optim.Adam([W, b])

for sample, target in zip(data, targets):
    # clear out the gradients of all Variables 
    # in this optimizer (i.e. W, b)
    optimizer.zero_grad()
    output = linear_model(sample, W, b)
    # reduce to a scalar so loss.backward() can be called without arguments
    loss = ((output - target) ** 2).sum()
    loss.backward()
    optimizer.step()

Alternatively, if you’re doing vanilla gradient descent, then:

W = Variable(torch.randn(4, 3), requires_grad=True)
b = Variable(torch.randn(3), requires_grad=True)

learning_rate = 0.01  # illustrative step size; pick whatever suits your problem

for sample, target in zip(data, targets):
    # clear out the gradients of Variables (i.e. W, b);
    # .grad is None until the first backward pass, hence the guards
    if W.grad is not None:
        W.grad.data.zero_()
    if b.grad is not None:
        b.grad.data.zero_()

    output = linear_model(sample, W, b)
    # reduce to a scalar so loss.backward() can be called without arguments
    loss = ((output - target) ** 2).sum()
    loss.backward()

    # update through .data so the update step itself is not tracked by autograd
    W.data -= learning_rate * W.grad.data
    b.data -= learning_rate * b.grad.data
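
Note that torch.autograd.Variable is deprecated in current PyTorch; the same parameters can be created as plain tensors with requires_grad=True, and the rest of the examples stay the same:

W = torch.randn(4, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)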

Note: The accumulation (i.e. sum) of gradients happens when .backward() is called on the loss tensor.


Answer 1

Calling zero_grad() lets each pass of the loop start cleanly, without gradients carried over from the last step, when you use a gradient method for decreasing the error (or loss).

If you don’t use zero_grad(), the gradients from previous steps keep accumulating, so the loss will increase rather than decrease as required.

For example, if you use zero_grad(), you will see output like the following:

model training loss is 1.5
model training loss is 1.4
model training loss is 1.3
model training loss is 1.2

If you don’t use zero_grad(), you will see output like the following:

model training loss is 1.4
model training loss is 1.9
model training loss is 2
model training loss is 2.8
model training loss is 3.5
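
A minimal sketch of the kind of loop that produces output like the above; the model, data, and optimizer here (nn.Linear, MSELoss, SGD, random tensors) are stand-ins chosen for this illustration. Commenting out the zero_grad() call lets the gradients accumulate, which typically turns the first (decreasing) pattern into the second (increasing) one.

import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Linear(4, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

X = torch.randn(64, 4)
y = torch.randn(64, 1)

for epoch in range(5):
    optimizer.zero_grad()   # comment this line out to watch the loss grow instead
    output = model(X)
    loss = criterion(output, y)
    loss.backward()
    optimizer.step()
    print(f"model training loss is {loss.item():.2f}")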
