Tag Archives: tensor

How does the "view" method work in PyTorch?

Question: How does the "view" method work in PyTorch?


I am confused about the method view() in the following code snippet.

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool  = nn.MaxPool2d(2,2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1   = nn.Linear(16*5*5, 120)
        self.fc2   = nn.Linear(120, 84)
        self.fc3   = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16*5*5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()

My confusion is regarding the following line.

x = x.view(-1, 16*5*5)

What does the tensor.view() function do? I have seen its usage in many places, but I can't understand how it interprets its parameters.

What happens if I give negative values as parameters to the view() function? For example, what happens if I call tensor_variable.view(1, 1, -1)?

Can anyone explain the main principle of the view() function with some examples?


Answer 0


The view function is meant to reshape the tensor.

Say you have a tensor

import torch
a = torch.range(1, 16)  # deprecated; torch.arange(1., 17.) gives the same 16 values

a is a tensor that has 16 elements, from 1 to 16 (inclusive). If you want to reshape this tensor to make it a 4 x 4 tensor, then you can use

a = a.view(4, 4)

Now a will be a 4 x 4 tensor. Note that after the reshape, the total number of elements needs to remain the same. Reshaping the tensor a to a 3 x 5 tensor would not be appropriate.

What is the meaning of parameter -1?

If there is a situation where you don't know how many rows you want but are sure of the number of columns, then you can specify this with a -1. (Note that you can extend this to tensors with more dimensions; only one of the axis values can be -1.) This is a way of telling the library: "give me a tensor that has this many columns, and you compute the appropriate number of rows necessary to make this happen".

This can be seen in the neural network code that you have given above. After the line x = self.pool(F.relu(self.conv2(x))) in the forward function, you will have a feature map of depth 16. You have to flatten this to give it to the fully connected layer. So you tell PyTorch to reshape the tensor you obtained to have a specific number of columns, and to decide the number of rows by itself.

Drawing a similarity between numpy and pytorch, view is similar to numpy’s reshape function.
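A minimal sketch of that parallel (nothing here beyond standard NumPy/PyTorch calls):

import numpy as np
import torch

n = np.arange(16).reshape(4, 4)   # numpy: reshape
t = torch.arange(16).view(4, 4)   # pytorch: view does the same job here
print(n.shape, tuple(t.shape))    # (4, 4) (4, 4)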


Answer 1


Let’s do some examples, from simpler to more difficult.

  1. The view method returns a tensor with the same data as the self tensor (which means that the returned tensor has the same number of elements), but with a different shape. For example:

    a = torch.arange(1, 17)  # a's shape is (16,)
    
    a.view(4, 4) # output below
      1   2   3   4
      5   6   7   8
      9  10  11  12
     13  14  15  16
    [torch.FloatTensor of size 4x4]
    
    a.view(2, 2, 4) # output below
    (0 ,.,.) = 
    1   2   3   4
    5   6   7   8
    
    (1 ,.,.) = 
     9  10  11  12
    13  14  15  16
    [torch.FloatTensor of size 2x2x4]
    
  2. Assuming that -1 is not one of the parameters, when you multiply them together, the result must be equal to the number of elements in the tensor. If you do: a.view(3, 3), it will raise a RuntimeError because shape (3 x 3) is invalid for input with 16 elements. In other words: 3 x 3 does not equal 16 but 9.

  3. You can use -1 as one of the parameters that you pass to the function, but only once. All that happens is that the method will do the math for you on how to fill that dimension. For example a.view(2, -1, 4) is equivalent to a.view(2, 2, 4). [16 / (2 x 4) = 2]

  4. Notice that the returned tensor shares the same data. If you make a change in the “view” you are changing the original tensor’s data:

    b = a.view(4, 4)
    b[0, 2] = 2   # modify the view...
    a[2] == 3.0   # ...and the original tensor's data changed too (a[2] is now 2.0)
    False
    
  5. Now, for a more complex use case. The documentation says that each new view dimension must either be a subspace of an original dimension, or only span original dimensions d, d + 1, …, d + k that satisfy the following contiguity-like condition: for all i = 0, …, k - 1, stride[i] = stride[i + 1] x size[i + 1]. Otherwise, contiguous() needs to be called before the tensor can be viewed. For example:

    a = torch.rand(5, 4, 3, 2) # size (5, 4, 3, 2)
    a_t = a.permute(0, 2, 3, 1) # size (5, 3, 2, 4)
    
    # The commented line below will raise a RuntimeError, because one dimension
    # spans across two contiguous subspaces
    # a_t.view(-1, 4)
    
    # instead do:
    a_t.contiguous().view(-1, 4)
    
    # To see why the first one does not work and the second does,
    # compare a.stride() and a_t.stride()
    a.stride() # (24, 6, 2, 1)
    a_t.stride() # (24, 2, 1, 6)
    

    Notice that for a_t, stride[0] != stride[1] x size[1] since 24 != 2 x 3


Answer 2


torch.Tensor.view()

Simply put, torch.Tensor.view(), which is inspired by numpy.ndarray.reshape() / numpy.reshape(), creates a new view of the tensor, as long as the new shape is compatible with the shape of the original tensor.

Let’s understand this in detail using a concrete example.

In [43]: t = torch.arange(18) 

In [44]: t 
Out[44]: 
tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17])

With this tensor t of shape (18,), new 2-D views can only be created for the following shapes:

(1, 18) or equivalently (1, -1) or (-1, 18)
(2, 9) or equivalently (2, -1) or (-1, 9)
(3, 6) or equivalently (3, -1) or (-1, 6)
(6, 3) or equivalently (6, -1) or (-1, 3)
(9, 2) or equivalently (9, -1) or (-1, 2)
(18, 1) or equivalently (18, -1) or (-1, 1)

As we can already observe from the above shape tuples, the product of the elements of the shape tuple (e.g. 2*9, 3*6, etc.) must always equal the total number of elements in the original tensor (18 in our example).

Another thing to observe is that we used a -1 in one of the places in each of the shape tuples. By using a -1, we avoid doing the computation ourselves and instead delegate the task to PyTorch, which calculates that value for the shape when it creates the new view. One important thing to note is that we can only use a single -1 in the shape tuple. The remaining values should be explicitly supplied by us; otherwise PyTorch will complain by throwing a RuntimeError:

RuntimeError: only one dimension can be inferred
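A minimal sketch of triggering that error:

import torch

t = torch.arange(18)
t.view(2, -1)    # fine: the -1 is inferred as 9
t.view(-1, -1)   # RuntimeError: only one dimension can be inferred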

So, with all of the above mentioned shapes, PyTorch will always return a new view of the original tensor t. This basically means that it just changes the stride information of the tensor for each of the new views that are requested.

Below are some examples illustrating how the strides of the tensors are changed with each new view.

# stride of our original tensor `t`
In [53]: t.stride() 
Out[53]: (1,)

Now, we will see the strides for the new views:

# shape (1, 18)
In [54]: t1 = t.view(1, -1)
# stride tensor `t1` with shape (1, 18)
In [55]: t1.stride() 
Out[55]: (18, 1)

# shape (2, 9)
In [56]: t2 = t.view(2, -1)
# stride of tensor `t2` with shape (2, 9)
In [57]: t2.stride()       
Out[57]: (9, 1)

# shape (3, 6)
In [59]: t3 = t.view(3, -1) 
# stride of tensor `t3` with shape (3, 6)
In [60]: t3.stride() 
Out[60]: (6, 1)

# shape (6, 3)
In [62]: t4 = t.view(6,-1)
# stride of tensor `t4` with shape (6, 3)
In [63]: t4.stride() 
Out[63]: (3, 1)

# shape (9, 2)
In [65]: t5 = t.view(9, -1) 
# stride of tensor `t5` with shape (9, 2)
In [66]: t5.stride()
Out[66]: (2, 1)

# shape (18, 1)
In [68]: t6 = t.view(18, -1)
# stride of tensor `t6` with shape (18, 1)
In [69]: t6.stride()
Out[69]: (1, 1)

So that’s the magic of the view() function. It just changes the strides of the (original) tensor for each of the new views, as long as the shape of the new view is compatible with the original shape.

Another interesting thing one might observe from the stride tuples is that the value of the element in the 0th position of the stride tuple is equal to the value of the element in the 1st position of the shape tuple.

In [74]: t3.shape 
Out[74]: torch.Size([3, 6])
                        |
In [75]: t3.stride()    |
Out[75]: (6, 1)         |
          |_____________|

This is because:

In [76]: t3 
Out[76]: 
tensor([[ 0,  1,  2,  3,  4,  5],
        [ 6,  7,  8,  9, 10, 11],
        [12, 13, 14, 15, 16, 17]])

the stride (6, 1) says that to go from one element to the next element along the 0th dimension, we have to jump, or take 6 steps (i.e. to go from 0 to 6, one has to take 6 steps). But to go from one element to the next element along the 1st dimension, we need only one step (e.g. to go from 2 to 3).

Thus, the strides information is at the heart of how the elements are accessed from memory for performing the computation.


torch.reshape()

This function returns a view, and is exactly the same as using torch.Tensor.view() as long as the new shape is compatible with the shape of the original tensor. Otherwise, it returns a copy.

However, the notes of torch.reshape() warn that:

contiguous inputs and inputs with compatible strides can be reshaped without copying, but one should not depend on the copying vs. viewing behavior.
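A minimal sketch of that caveat, using data_ptr() to check whether storage is shared:

import torch

t = torch.arange(18)
v = t.reshape(2, 9)                   # contiguous input: reshape returns a view
print(v.data_ptr() == t.data_ptr())   # True: same underlying storage

nc = t.view(2, 9).t()                 # the transpose is non-contiguous
c = nc.reshape(18)                    # incompatible strides: reshape must copy
print(c.data_ptr() == nc.data_ptr())  # False: new storage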


Answer 3


I figured out that x.view(-1, 16 * 5 * 5) is equivalent to x.flatten(1), where the argument 1 indicates that flattening starts from the 1st dimension (so the 'sample' dimension is not flattened). As you can see, the latter usage is semantically clearer and easier to use, so I prefer flatten().
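A minimal sketch of that equivalence, using the feature-map shape from the network above (the batch size of 4 is arbitrary):

import torch

x = torch.randn(4, 16, 5, 5)   # e.g. a batch of 4 feature maps
a = x.view(-1, 16 * 5 * 5)     # shape (4, 400)
b = x.flatten(1)               # flatten everything from dim 1 onwards
print(torch.equal(a, b))       # True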


Answer 4


What is the meaning of parameter -1?

You can read -1 as a dynamically inferred size, or "anything". Because of that, there can be only one -1 parameter in view().

If you call x.view(-1, 1), this will output a tensor of shape [anything, 1], depending on the number of elements in x. For example:

import torch
x = torch.tensor([1, 2, 3, 4])
print(x,x.shape)
print("...")
print(x.view(-1,1), x.view(-1,1).shape)
print(x.view(1,-1), x.view(1,-1).shape)

Will output:

tensor([1, 2, 3, 4]) torch.Size([4])
...
tensor([[1],
        [2],
        [3],
        [4]]) torch.Size([4, 1])
tensor([[1, 2, 3, 4]]) torch.Size([1, 4])

Answer 5


weights.reshape(a, b) will return a new tensor with the same data as weights, with shape (a, b); it may copy the data to another part of memory (it returns a view when the existing strides allow it, and a copy otherwise).

weights.resize_(a, b) returns the same tensor with a different shape. However, if the new shape results in fewer elements than the original tensor, some elements will be removed from the tensor (but not from memory). If the new shape results in more elements than the original tensor, new elements will be uninitialized in memory.

weights.view(a, b) will return a new tensor with the same data as weights, with shape (a, b); it never copies, and it raises an error if the requested shape is incompatible with the tensor's current strides.
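A minimal sketch contrasting the three (the shapes are arbitrary):

import torch

w = torch.arange(6.)
print(w.reshape(2, 3))   # same data, shape (2, 3); copies only when strides require it
print(w.view(2, 3))      # same data, shape (2, 3); never copies

w.resize_(2, 2)          # in-place: only the first 4 elements remain in the shape
print(w)                 # tensor([[0., 1.], [2., 3.]])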


Answer 6


I really liked @Jadiel de Armas' examples.

I would like to add a small insight into how elements are ordered for .view(…):

  • For a Tensor with shape (a, b, c), the order of its elements is determined by a numbering system: the first digit has a possible values, the second digit has b values, and the third has c values. In other words, the elements are laid out in row-major order, with the last index varying fastest.
  • The mapping of the elements into the new Tensor returned by .view(…) preserves this order of the original Tensor, as the sketch below shows.
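A minimal sketch of that ordering guarantee:

import torch

t = torch.arange(24).view(2, 3, 4)
print(t.view(-1))        # 0, 1, 2, ..., 23: row-major order, last index fastest
print(t.view(4, 6)[0])   # tensor([0, 1, 2, 3, 4, 5]): the same order, regrouped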

Answer 7


Let’s try to understand view by the following examples:

    a = torch.range(1, 16)  # deprecated; torch.arange(1., 17.) gives the same values

print(a)

    tensor([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12., 13., 14.,
            15., 16.])

print(a.view(-1,2))

    tensor([[ 1.,  2.],
            [ 3.,  4.],
            [ 5.,  6.],
            [ 7.,  8.],
            [ 9., 10.],
            [11., 12.],
            [13., 14.],
            [15., 16.]])

print(a.view(2,-1,4))   #3d tensor

    tensor([[[ 1.,  2.,  3.,  4.],
             [ 5.,  6.,  7.,  8.]],

            [[ 9., 10., 11., 12.],
             [13., 14., 15., 16.]]])
print(a.view(2,-1,2))

    tensor([[[ 1.,  2.],
             [ 3.,  4.],
             [ 5.,  6.],
             [ 7.,  8.]],

            [[ 9., 10.],
             [11., 12.],
             [13., 14.],
             [15., 16.]]])

print(a.view(4,-1,2))

    tensor([[[ 1.,  2.],
             [ 3.,  4.]],

            [[ 5.,  6.],
             [ 7.,  8.]],

            [[ 9., 10.],
             [11., 12.]],

            [[13., 14.],
             [15., 16.]]])

Passing -1 as an argument value is an easy way to have PyTorch compute one dimension for you: in the 3-D case, if we know two of the three sizes the third can be inferred, and in the 2-D case, knowing one size is enough to infer the other.


Best way to save a trained model in PyTorch?

Question: Best way to save a trained model in PyTorch?


I was looking for alternative ways to save a trained model in PyTorch. So far, I have found two alternatives.

  1. torch.save() to save a model and torch.load() to load a model.
  2. model.state_dict() to save a trained model and model.load_state_dict() to load the saved model.

I have come across this discussion, where approach 2 is recommended over approach 1.

My question is, why is the second approach preferred? Is it only because torch.nn modules have those two functions and we are encouraged to use them?


Answer 0


I've found this page on their GitHub repo; I'll just paste the content here.


Recommended approach for saving a model

There are two main approaches for serializing and restoring a model.

The first (recommended) saves and loads only the model parameters:

torch.save(the_model.state_dict(), PATH)

Then later:

the_model = TheModelClass(*args, **kwargs)
the_model.load_state_dict(torch.load(PATH))

The second saves and loads the entire model:

torch.save(the_model, PATH)

Then later:

the_model = torch.load(PATH)

However, in this case, the serialized data is bound to the specific classes and the exact directory structure used, so it can break in various ways when used in other projects, or after some serious refactoring.


Answer 1


It depends on what you want to do.

Case # 1: Save the model to use it yourself for inference: You save the model, you restore it, and then you change the model to evaluation mode. This is done because you usually have BatchNorm and Dropout layers that by default are in train mode on construction:

torch.save(model.state_dict(), filepath)

#Later to restore:
model.load_state_dict(torch.load(filepath))
model.eval()

Case # 2: Save model to resume training later: If you need to keep training the model that you are about to save, you need to save more than just the model. You also need to save the state of the optimizer, epochs, score, etc. You would do it like this:

state = {
    'epoch': epoch,
    'state_dict': model.state_dict(),
    'optimizer': optimizer.state_dict(),
    ...
}
torch.save(state, filepath)

To resume training you would do things like: state = torch.load(filepath), and then, to restore the state of each individual object, something like this:

model.load_state_dict(state['state_dict'])
optimizer.load_state_dict(state['optimizer'])

Since you are resuming training, DO NOT call model.eval() once you restore the states when loading.

Case # 3: Model to be used by someone else with no access to your code: In TensorFlow you can create a .pb file that defines both the architecture and the weights of the model. This is very handy, especially when using TensorFlow Serving. The equivalent way to do this in PyTorch would be:

torch.save(model, filepath)

# Then later:
model = torch.load(filepath)

This way is still not bulletproof, and since PyTorch is still undergoing a lot of changes, I wouldn't recommend it.


Answer 2


The pickle Python library implements binary protocols for serializing and de-serializing a Python object.

When you import torch (or when you use PyTorch) it will import pickle for you and you don’t need to call pickle.dump() and pickle.load() directly, which are the methods to save and to load the object.

In fact, torch.save() and torch.load() will wrap pickle.dump() and pickle.load() for you.

The state_dict the other answer mentioned deserves just a few more notes.

What state_dict do we have inside PyTorch? There are actually two state_dicts.

A PyTorch model is a torch.nn.Module, which has a model.parameters() call to get the learnable parameters (w and b). These learnable parameters, once randomly set, will update over time as we learn. The learnable parameters are the first state_dict.

The second state_dict is the optimizer state dict. You recall that the optimizer is used to improve our learnable parameters. But the optimizer state_dict is fixed; there is nothing to learn in there.

Because state_dict objects are Python dictionaries, they can be easily saved, updated, altered, and restored, adding a great deal of modularity to PyTorch models and optimizers.

Let’s create a super simple model to explain this:

import torch
import torch.optim as optim

model = torch.nn.Linear(5, 2)

# Initialize optimizer
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

print("Model's state_dict:")
for param_tensor in model.state_dict():
    print(param_tensor, "\t", model.state_dict()[param_tensor].size())

print("Model weight:")    
print(model.weight)

print("Model bias:")    
print(model.bias)

print("---")
print("Optimizer's state_dict:")
for var_name in optimizer.state_dict():
    print(var_name, "\t", optimizer.state_dict()[var_name])

This code will output the following:

Model's state_dict:
weight   torch.Size([2, 5])
bias     torch.Size([2])
Model weight:
Parameter containing:
tensor([[ 0.1328,  0.1360,  0.1553, -0.1838, -0.0316],
        [ 0.0479,  0.1760,  0.1712,  0.2244,  0.1408]], requires_grad=True)
Model bias:
Parameter containing:
tensor([ 0.4112, -0.0733], requires_grad=True)
---
Optimizer's state_dict:
state    {}
param_groups     [{'lr': 0.001, 'momentum': 0.9, 'dampening': 0, 'weight_decay': 0, 'nesterov': False, 'params': [140695321443856, 140695321443928]}]

Note this is a minimal model. You may try adding a stack of sequential layers:

model = torch.nn.Sequential(
          torch.nn.Linear(D_in, H),
          torch.nn.Conv2d(A, B, C),
          torch.nn.Linear(H, D_out),
        )

Note that only layers with learnable parameters (convolutional layers, linear layers, etc.) and registered buffers (batchnorm layers) have entries in the model’s state_dict.

Non-learnable things belong to the optimizer object's state_dict, which contains information about the optimizer's state, as well as the hyperparameters used.

The rest of the story is the same. In the inference phase (the phase where we use the model after training) we predict based on the parameters we learned, so for inference we only need to save the parameters from model.state_dict().

torch.save(model.state_dict(), filepath)

And later, to use it: model.load_state_dict(torch.load(filepath)), followed by model.eval().

Note: don't forget the last line, model.eval(); this is crucial after loading the model.

Also, don't try torch.save(model.parameters(), filepath); model.parameters() is just a generator object.

On the other hand, torch.save(model, filepath) saves the model object itself, but keep in mind the model doesn't have the optimizer's state_dict. Check the other excellent answer by @Jadiel de Armas for saving the optimizer's state dict.
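A minimal sketch of those two points (the file name 'model.pth' is just an example):

import torch

model = torch.nn.Linear(5, 2)
print(type(model.parameters()))   # <class 'generator'>: nothing useful to save

# save the learnable parameters instead:
torch.save(model.state_dict(), 'model.pth')
restored = torch.nn.Linear(5, 2)
restored.load_state_dict(torch.load('model.pth'))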


Answer 3


A common PyTorch convention is to save models using either a .pt or .pth file extension.

Save/Load Entire Model

Save:

path = "username/directory/lstmmodelgpu.pth"
torch.save(trainer, path)

Load:

Model class must be defined somewhere

model = torch.load(PATH)
model.eval()

Answer 4


If you want to save the model and resume training later:

Single GPU: Save:

state = {
        'epoch': epoch,
        'state_dict': model.state_dict(),
        'optimizer': optimizer.state_dict(),
}
savepath='checkpoint.t7'
torch.save(state,savepath)

Load:

checkpoint = torch.load('checkpoint.t7')
model.load_state_dict(checkpoint['state_dict'])
optimizer.load_state_dict(checkpoint['optimizer'])
epoch = checkpoint['epoch']

Multiple GPU: Save

state = {
        'epoch': epoch,
        'state_dict': model.module.state_dict(),
        'optimizer': optimizer.state_dict(),
}
savepath='checkpoint.t7'
torch.save(state,savepath)

Load:

checkpoint = torch.load('checkpoint.t7')
model.load_state_dict(checkpoint['state_dict'])
optimizer.load_state_dict(checkpoint['optimizer'])
epoch = checkpoint['epoch']

#Don't call DataParallel before loading the model otherwise you will get an error

model = nn.DataParallel(model) #ignore the line if you want to load on Single GPU

How to print the value of a Tensor object in TensorFlow?

Question: How to print the value of a Tensor object in TensorFlow?


I have been using the introductory example of matrix multiplication in TensorFlow.

matrix1 = tf.constant([[3., 3.]])
matrix2 = tf.constant([[2.],[2.]])
product = tf.matmul(matrix1, matrix2)

When I print the product, it is displaying it as a Tensor object:

<tensorflow.python.framework.ops.Tensor object at 0x10470fcd0>

But how do I know the value of product?

The following doesn’t help:

print product
Tensor("MatMul:0", shape=TensorShape([Dimension(1), Dimension(1)]), dtype=float32)

I know that graphs run on Sessions, but isn’t there any way I can check the output of a Tensor object without running the graph in a session?


Answer 0


The easiest[A] way to evaluate the actual value of a Tensor object is to pass it to the Session.run() method, or call Tensor.eval() when you have a default session (i.e. in a with tf.Session(): block, or see below). In general[B], you cannot print the value of a tensor without running some code in a session.

If you are experimenting with the programming model, and want an easy way to evaluate tensors, the tf.InteractiveSession lets you open a session at the start of your program, and then use that session for all Tensor.eval() (and Operation.run()) calls. This can be easier in an interactive setting, such as the shell or an IPython notebook, when it’s tedious to pass around a Session object everywhere. For example, the following works in a Jupyter notebook:

with tf.Session() as sess:  print(product.eval()) 

This might seem silly for such a small expression, but one of the key ideas in Tensorflow 1.x is deferred execution: it’s very cheap to build a large and complex expression, and when you want to evaluate it, the back-end (to which you connect with a Session) is able to schedule its execution more efficiently (e.g. executing independent parts in parallel and using GPUs).


[A]: To print the value of a tensor without returning it to your Python program, you can use the tf.print() operator, as Andrzej suggests in another answer. According to the official documentation:

To make sure the operator runs, users need to pass the produced op to tf.compat.v1.Session‘s run method, or to use the op as a control dependency for executed ops by specifying with tf.compat.v1.control_dependencies([print_op]), which is printed to standard output.

Also note that:

In Jupyter notebooks and colabs, tf.print prints to the notebook cell outputs. It will not write to the notebook kernel’s console logs.

[B]: You might be able to use the tf.get_static_value() function to get the constant value of the given tensor if its value is efficiently calculable.
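A minimal sketch of that last point (tf.get_static_value() returns None when the value cannot be computed statically, instead of failing):

import tensorflow as tf

t = tf.constant([[3., 3.]])
print(tf.get_static_value(t))   # [[3. 3.]]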


Answer 1


While other answers are correct that you cannot print the value until you evaluate the graph, they do not talk about one easy way of actually printing a value inside the graph, once you evaluate it.

The easiest way to see a value of a tensor whenever the graph is evaluated (using run or eval) is to use the Print operation as in this example:

# Initialize session
import tensorflow as tf
sess = tf.InteractiveSession()

# Some tensor we want to print the value of
a = tf.constant([1.0, 3.0])

# Add print operation
a = tf.Print(a, [a], message="This is a: ")

# Add more elements of the graph using a
b = tf.add(a, a)

Now, whenever we evaluate the whole graph, e.g. using b.eval(), we get:

I tensorflow/core/kernels/logging_ops.cc:79] This is a: [1 3]

Answer 2


Reiterating what others said, it's not possible to check the values without running the graph.

A simple snippet for anyone looking for an easy example of printing values is below. The code can be executed without any modification in an IPython notebook.

import tensorflow as tf

#define a variable to hold normal random values 
normal_rv = tf.Variable( tf.truncated_normal([2,3],stddev = 0.1))

#initialize the variable
init_op = tf.initialize_all_variables() # deprecated; later TF 1.x uses tf.global_variables_initializer()

#run the graph
with tf.Session() as sess:
    sess.run(init_op) #execute init_op
    #print the random values that we sample
    print (sess.run(normal_rv))

Output:

[[-0.16702934  0.07173464 -0.04512421]
 [-0.02265321  0.06509651 -0.01419079]]

Answer 3


No, you can not see the content of the tensor without running the graph (doing session.run()). The only things you can see are:

  • the dimensionality of the tensor (though I assume it is not hard to calculate it from the list of operations that TF has)
  • the type of the operation that will be used to generate the tensor (e.g. transpose_1:0, random_uniform:0)
  • the type of the elements in the tensor (float32)

I have not found this in the documentation, but I believe that the values of the variables (and some of the constants) are not calculated at the time of assignment.


Take a look at this example:

import tensorflow as tf
from datetime import datetime
dim = 7000

In the first example, where I just initiate a constant Tensor of random numbers, the run time is approximately the same irrespective of dim (0:00:00.003261):

startTime = datetime.now()
m1 = tf.truncated_normal([dim, dim], mean=0.0, stddev=0.02, dtype=tf.float32, seed=1)
print(datetime.now() - startTime)

In the second case, where the constant actually gets evaluated and the values are assigned, the time clearly depends on dim (0:00:01.244642):

startTime = datetime.now()
m1 = tf.truncated_normal([dim, dim], mean=0.0, stddev=0.02, dtype=tf.float32, seed=1)
sess = tf.Session()
sess.run(m1)
print(datetime.now() - startTime)

And you can make it even clearer by calculating something (d = tf.matrix_determinant(m1), keeping in mind that the time will scale as O(dim^2.8)).

P.S. I found where it is explained in the documentation:

A Tensor object is a symbolic handle to the result of an operation, but does not actually hold the values of the operation’s output.


Answer 4


I think you need to get some fundamentals right. With the examples above you have created tensors (multi-dimensional arrays). But for TensorFlow to really work you have to initiate a "session" and run your "operations" in the session. Notice the words "session" and "operation". You need to know 4 things to work with TensorFlow:

  1. Tensors
  2. Operations
  3. Sessions
  4. Graphs

Now, from what you wrote, you have created the tensor and the operation, but you have no session running and no graph being executed. Tensors (edges of the graph) flow through graphs and are manipulated by operations (nodes of the graph). There is a default graph, but you can initiate your own within a session.

When you call print, you only access the shape of the variable or constant you defined.

So you can see what you are missing:

 with tf.Session() as sess:     
           print(sess.run(product))
           print (product.eval())

Hope it helps!


Answer 5


In Tensorflow 1.x

import tensorflow as tf
tf.enable_eager_execution()
matrix1 = tf.constant([[3., 3.]])
matrix2 = tf.constant([[2.],[2.]])
product = tf.matmul(matrix1, matrix2)

#print the product
print(product)         # tf.Tensor([[12.]], shape=(1, 1), dtype=float32)
print(product.numpy()) # [[12.]]

With TensorFlow 2.x, eager mode is enabled by default, so the following code works with TF 2.0:

import tensorflow as tf
matrix1 = tf.constant([[3., 3.]])
matrix2 = tf.constant([[2.],[2.]])
product = tf.matmul(matrix1, matrix2)

#print the product
print(product)         # tf.Tensor([[12.]], shape=(1, 1), dtype=float32)
print(product.numpy()) # [[12.]]

Answer 6


Based on the answers above, with your particular code snippet you can print the product like this:

import tensorflow as tf
#Initialize the session
sess = tf.InteractiveSession()

matrix1 = tf.constant([[3., 3.]])
matrix2 = tf.constant([[2.],[2.]])
product = tf.matmul(matrix1, matrix2)

#print the product
print(product.eval())

#close the session to release resources
sess.close()

Answer 7


In TensorFlow 2.0+ (or in an eager-mode environment) you can call the .numpy() method:

import tensorflow as tf

matrix1 = tf.constant([[3., 3.0]])
matrix2 = tf.constant([[2.0],[2.0]])
product = tf.matmul(matrix1, matrix2)

print(product.numpy()) 

Answer 8


tf.keras.backend.eval is useful for evaluating small expressions.

tf.keras.backend.eval(op)

TF 1.x and TF 2.0 compatible.


Minimal Verifiable Example

import tensorflow as tf
from tensorflow.keras.backend import eval

m1 = tf.constant([[3., 3.]])
m2 = tf.constant([[2.],[2.]])

eval(tf.matmul(m1, m2))
# array([[12.]], dtype=float32)

This is useful because you do not have to explicitly create a Session or InteractiveSession.


Answer 9


You can check the output of a Tensor object without running the graph in a session, by enabling eager execution.

Simply add the following two lines of code:

import tensorflow.contrib.eager as tfe
tfe.enable_eager_execution()

right after you import tensorflow.

The output of print product in your example will now be: tf.Tensor([[ 12.]], shape=(1, 1), dtype=float32)

Note that as of now (November 2017) you’ll have to install a Tensorflow nightly build to enable eager execution. Pre-built wheels can be found here.


Answer 10


Please note that tf.Print() will change the tensor name. If the tensor you seek to print is a placeholder, feeding data to it will fail as the original name will not be found during feeding. For example:

import tensorflow as tf
tens = tf.placeholder(tf.float32,[None,2],name="placeholder")
print(eval("tens"))
tens = tf.Print(tens,[tens, tf.shape(tens)],summarize=10,message="tens:")
print(eval("tens"))
res = tens + tens
sess = tf.Session()
sess.run(tf.global_variables_initializer())

print(sess.run(res))

Output is:

python test.py
Tensor("placeholder:0", shape=(?, 2), dtype=float32)
Tensor("Print:0", shape=(?, 2), dtype=float32)
Traceback (most recent call last):
[...]
InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'placeholder' with dtype float

Answer 11


You should think of TensorFlow Core programs as consisting of two discrete sections:

  • Building the computational graph.
  • Running the computational graph.

So for the code below you just Build the computational graph.

matrix1 = tf.constant([[3., 3.]])
matrix2 = tf.constant([[2.],[2.]])
product = tf.matmul(matrix1, matrix2)

Note also that to initialize all the variables in a TensorFlow program, you must explicitly call a special operation, as follows:

init = tf.global_variables_initializer()

Now that you have built the graph and initialized all the variables, the next step is to evaluate the nodes; you must run the computational graph within a session. A session encapsulates the control and state of the TensorFlow runtime.

The following code creates a Session object and then invokes its run method to run enough of the computational graph to evaluate product:

sess = tf.Session()
# run the variables initializer
sess.run(init)

print(sess.run([product]))

Answer 12


You can use Keras; the one-line answer is to use the eval method, like so:

import keras.backend as K
print(K.eval(your_tensor))

Answer 13


Try this simple code! (it is self explanatory)

import tensorflow as tf
sess = tf.InteractiveSession() # see the answers above :)
x = [[1.,2.,1.],[1.,1.,1.]]    # a 2D matrix as input to softmax
y = tf.nn.softmax(x)           # this is the softmax function
                               # you can have anything you like here
u = y.eval()
print(u)

Answer 14


I didn't find it easy to understand what was required, even after reading all the answers, until I executed this. TensorFlow is new to me too.

def printtest():
    x = tf.constant([1.0, 3.0])
    x = tf.Print(x, [x], message="Test")
    init = (tf.global_variables_initializer(), tf.local_variables_initializer())
    b = tf.add(x, x)
    with tf.Session() as sess:
        sess.run(init)
        print(sess.run(b))
        sess.close()

But you may still need the value returned by executing the session:

def printtest():
    x = tf.constant([100.0])
    x = tf.Print(x,[x],message="Test")
    init = (tf.global_variables_initializer(), tf.local_variables_initializer())
    b = tf.add(x, x)
    with tf.Session() as sess:
        sess.run(init)
        c = sess.run(b)
        print(c)
        sess.close()

Answer 15


Basically, in TensorFlow, when you create a tensor of any sort, it is created and stored inside the graph, and can be accessed only when you run a TensorFlow session. Say you have created a constant tensor:

c = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

Without running a session, you can get only its metadata:

  • op: An Operation. The operation that computes this tensor.
  • value_index: An int. The index of the operation's endpoint that produces this tensor.
  • dtype: A DType. The type of the elements stored in this tensor.

To get the values, you can run a session with the tensor you require:

with tf.Session() as sess:
    print(sess.run(c))
    sess.close()

The output will be something like this:

array([[1., 2., 3.], [4., 5., 6.]], dtype=float32)


Answer 16


Enable eager execution, which was introduced in TensorFlow after version 1.10. It's very easy to use:

# Enable eager execution (no session needed)
import tensorflow as tf
tf.enable_eager_execution()


# Some tensor we want to print the value of
a = tf.constant([1.0, 3.0])

print(a)

Answer 17


Using the tips provided at https://www.tensorflow.org/api_docs/python/tf/print, I use the log_d function to print formatted strings.

import tensorflow as tf

def log_d(fmt, *args):
    op = tf.py_func(func=lambda fmt_, *args_: print(fmt%(*args_,)),
                    inp=[fmt]+[*args], Tout=[])
    return tf.control_dependencies([op])


# actual code starts now...

matrix1 = tf.constant([[3., 3.]])
matrix2 = tf.constant([[2.],[2.]])
product = tf.matmul(matrix1, matrix2)

with log_d('MAT1: %s, MAT2: %s', matrix1, matrix2): # this will print the log line
    product = tf.matmul(matrix1, matrix2)

with tf.Session() as sess:
    sess.run(product)

Answer 18

import tensorflow as tf
sess = tf.InteractiveSession()
x = [[1.,2.,1.],[1.,1.,1.]]    
y = tf.nn.softmax(x)           

matrix1 = tf.constant([[3., 3.]])
matrix2 = tf.constant([[2.],[2.]])
product = tf.matmul(matrix1, matrix2)

print(product.eval())
tf.reset_default_graph()
sess.close()

Answer 19


tf.Print is now deprecated, here’s how to use tf.print (lowercase p) instead.

While running a session is a good option, it is not always the way to go. For instance, you may want to print some tensor in a particular session.

The new print method returns a print operation which has no output tensors:

print_op = tf.print(tensor_to_print)

Since it has no outputs, you can't insert it into a graph the same way as you could with tf.Print. Instead, you can add it to the control dependencies in your session in order to make it print.

sess = tf.compat.v1.Session()
with sess.as_default():
  tensor_to_print = tf.range(10)
  print_op = tf.print(tensor_to_print)
with tf.control_dependencies([print_op]):
  tripled_tensor = tensor_to_print * 3
sess.run(tripled_tensor)

Sometimes, in a larger graph, maybe created partly in subfunctions, it is cumbersome to propagate the print_op to the session call. Then, tf.tuple can be used to couple the print operation with another operation, which will then run with that operation whichever session executes the code. Here’s how that is done:

print_op = tf.print(tensor_to_print)
some_tensor_list = tf.tuple([some_tensor], control_inputs=[print_op])
# Use some_tensor_list[0] instead of any_tensor below.
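
As a side note not covered above: in TensorFlow 2.x, eager execution is the default, so tf.print fires immediately and none of the control-dependency plumbing is needed. A minimal sketch, assuming TF 2.x:

import tensorflow as tf  # assumes TF 2.x, where eager execution is on by default

tensor_to_print = tf.range(10)
tf.print(tensor_to_print)         # prints [0 1 2 ... 9] right away
tripled_tensor = tensor_to_print * 3
tf.print(tripled_tensor)          # no session or control_dependencies needed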

Answer 20

Question: How to print the value of a Tensor object in TensorFlow?

Answer:

import tensorflow as tf

# Variable
x = tf.Variable([[1,2,3]])

# initialize
init = (tf.global_variables_initializer(), tf.local_variables_initializer())

# Create a session
sess = tf.Session()

# run the session
sess.run(init)

# print the value (sess.run returns it as a NumPy array)
print(sess.run(x))

PyTorch - Tensors and dynamic neural networks in Python with strong GPU acceleration

PyTorch is a Python package that provides two high-level features:

Tensor computation (like NumPy) with strong GPU acceleration, and deep neural networks built on a tape-based autograd system. When needed, you can reuse your favorite Python packages, such as NumPy, SciPy, and Cython, to extend PyTorch.

Build-status badges (Python 3.6/3.7/3.8 on Linux CPU, Linux GPU, Windows CPU/GPU, Linux (ppc64le) CPU/GPU, and Linux (aarch64) CPU) are shown on the project page; also see the ci.pytorch.org HUD.

More About PyTorch

At a granular level, PyTorch is a library that consists of the following components:

Component              Description
torch                  A Tensor library like NumPy, with strong GPU support
torch.autograd         A tape-based automatic differentiation library that supports all differentiable Tensor operations in torch
torch.jit              A compilation stack (TorchScript) to create serializable and optimizable models from PyTorch code
torch.nn               A neural networks library deeply integrated with autograd, designed for maximum flexibility
torch.multiprocessing  Python multiprocessing, but with magical memory sharing of torch Tensors across processes. Useful for data loading and Hogwild training
torch.utils            DataLoader and other utility functions for convenience

Usually, PyTorch is used either as:

  • A replacement for NumPy, to use the power of GPUs
  • A deep learning research platform that provides maximum flexibility and speed

Elaborating further:

A GPU-Ready Tensor Library

If you use NumPy, then you have used Tensors (a.k.a. ndarray).

PyTorch provides Tensors that can live either on the CPU or the GPU, accelerating the computation by a huge amount.

We provide a wide variety of tensor routines to accelerate and fit your scientific computation needs, such as slicing, indexing, math operations, linear algebra, and reductions. And they are fast!
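
As a small illustration of those routines (a sketch, not from the README; the CUDA branch assumes an NVIDIA GPU is available):

import torch

x = torch.rand(4, 4)            # CPU tensor, analogous to np.random.rand(4, 4)
top_left = x[:2, :2]            # slicing / indexing
row_sums = x.sum(dim=1)         # reduction
gram = x @ x.t()                # linear algebra: matrix product

if torch.cuda.is_available():   # the same API runs on the GPU
    xg = x.to('cuda')
    gram_gpu = xg @ xg.t()      # computed on the GPU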

Dynamic Neural Networks: Tape-Based Autograd

PyTorch has a unique way of building neural networks: using and replaying a tape recorder.

Most frameworks, such as TensorFlow, Theano, Caffe, and CNTK, have a static view of the world. One has to build a neural network and reuse the same structure again and again. Changing the way the network behaves means that one has to start from scratch.

With PyTorch, we use a technique called reverse-mode automatic differentiation, which allows you to change the way your network behaves arbitrarily, with zero lag or overhead. Our inspiration comes from several research papers on this topic, as well as current and past work such as torch-autograd, autograd, and Chainer.

While this technique is not unique to PyTorch, it is one of the fastest implementations of it to date. You get the best of speed and flexibility for your crazy research.
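
A minimal sketch of the tape in action: the forward pass records whatever actually ran, even through data-dependent control flow, and backward() replays it in reverse:

import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# The tape records only the branch that actually executes on this pass.
if x.sum() > 0:
    y = (x ** 2).sum()
else:
    y = (x ** 3).sum()

y.backward()     # replay the tape in reverse (reverse-mode autodiff)
print(x.grad)    # -> tensor([2., 4., 6.]) for the quadratic branch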

Python First

PyTorch is not a Python binding into a monolithic C++ framework. It is built to be deeply integrated into Python. You can use it as naturally as you would use NumPy / SciPy / scikit-learn etc. You can write your new neural network layers in Python itself, using your favorite libraries, and use packages such as Cython and Numba. Our goal is to not reinvent the wheel where appropriate.
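
One concrete example of that integration (a sketch): CPU tensors convert to and from NumPy arrays without copying, so both views see in-place updates:

import numpy as np
import torch

a = np.ones(3)
t = torch.from_numpy(a)   # shares memory with the NumPy array
t += 1                    # in-place update, visible on both sides
print(a)                  # -> [2. 2. 2.]
print(t.numpy())          # back to NumPy, again zero-copy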

Imperative Experience

PyTorch is designed to be intuitive, linear in thought, and easy to use. When you execute a line of code, it gets executed. There isn't an asynchronous view of the world. When you drop into a debugger or receive error messages and stack traces, understanding them is straightforward: the stack trace points to exactly where your code was defined. We hope you never spend hours debugging your code because of bad stack traces or asynchronous and opaque execution engines.

Fast and Lean

PyTorch has minimal framework overhead. We integrate acceleration libraries such as Intel MKL and NVIDIA (cuDNN, NCCL) to maximize speed. At the core, its CPU and GPU Tensor and neural network backends (TH, THC, THNN, THCUNN) are mature and have been tested for years.

Hence, PyTorch is quite fast, whether you run small or large neural networks.

The memory usage in PyTorch is extremely efficient compared to Torch or some of the alternatives. We've written custom memory allocators for the GPU to make sure that your deep learning models are maximally memory efficient. This enables you to train bigger deep learning models than before.

Extensions Without Pain

Writing new neural network modules, or interfacing with PyTorch's Tensor API, was designed to be straightforward and with minimal abstractions.

You can write new neural network layers in Python using the torch API or your favorite NumPy-based libraries such as SciPy.

If you want to write your layers in C/C++, we provide a convenient extension API that is efficient and has minimal boilerplate. No wrapper code needs to be written. You can see a tutorial here and an example here.
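
As a taste of the Python route, a new layer is just an nn.Module subclass. A minimal hypothetical example (the layer name and design are illustrative, not part of PyTorch):

import torch
import torch.nn as nn

class ScaledReLU(nn.Module):
    """A toy custom layer: ReLU followed by a learnable scale."""
    def __init__(self):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(1))

    def forward(self, x):
        return self.scale * torch.relu(x)

layer = ScaledReLU()
out = layer(torch.randn(2, 5))
out.sum().backward()      # autograd derives the backward pass automatically
print(layer.scale.grad)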

Installation

Binaries

Commands to install from binaries via Conda or pip wheels are on our website: https://pytorch.org

NVIDIA Jetson Platforms

Python wheels for NVIDIA's Jetson Nano, Jetson TX2, and Jetson AGX Xavier are available via the following URLs:

They require JetPack 4.2 and above, and @dusty-nv maintains them.

From Source

If you are installing from source, you will need Python 3.6.2 or later and a C++14 compiler. Also, we highly recommend installing an Anaconda environment: you will get a high-quality BLAS library (MKL), and you get controlled dependency versions regardless of your Linux distro.

Once you have Anaconda installed, here are the instructions.

If you want to compile with CUDA support, install the CUDA prerequisites (the NVIDIA CUDA toolkit and cuDNN) first.

If you want to disable CUDA support, export the environment variable USE_CUDA=0. Other potentially useful environment variables may be found in setup.py.

If you are building for NVIDIA's Jetson platforms (Jetson Nano, TX1, TX2, AGX Xavier), instructions to install PyTorch for Jetson Nano are available here.

If you want to compile with ROCm support, install:

  • AMD ROCm 4.0 and above
  • ROCm is currently supported only for Linux systems

If you want to disable ROCm support, export the environment variable USE_ROCM=0. Other potentially useful environment variables may be found in setup.py.

Install Dependencies

Common

conda install astunparse numpy ninja pyyaml mkl mkl-include setuptools cmake cffi typing_extensions future six requests dataclasses

On Linux

# CUDA only: Add LAPACK support for the GPU if needed
conda install -c pytorch magma-cuda110  # or the magma-cuda* that matches your CUDA version from https://anaconda.org/pytorch/repo

On macOS

# Add these packages if torch.distributed is needed
conda install pkg-config libuv

On Windows

# Add these packages if torch.distributed is needed.
# Distributed package support on Windows is a prototype feature and is subject to changes.
conda install -c conda-forge libuv=1.39

Get the PyTorch Source

git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
# if you are updating an existing checkout
git submodule sync
git submodule update --init --recursive --jobs 0

Install PyTorch

On Linux

export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
python setup.py install

Note that if you are compiling for ROCm, you must run this command first:

python tools/amd_build/build_amd.py

Note that if you are using Anaconda, you may experience an error caused by the linker:

build/temp.linux-x86_64-3.7/torch/csrc/stub.o: file not recognized: file format not recognized
collect2: error: ld returned 1 exit status
error: command 'g++' failed with exit status 1

This is caused by ld from the Conda environment shadowing the system ld. You should use a newer version of Python to fix this issue. The recommended Python versions are 3.6.10+, 3.7.6+, and 3.8.1+.

On macOS

export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py install

Each CUDA version only supports one particular Xcode version. The following combinations have been reported to work with PyTorch:

CUDA version  Xcode version
10.0          Xcode 9.4
10.1          Xcode 10.1

On Windows

Choose the correct Visual Studio version.

Regressions sometimes appear in new versions of Visual Studio, so it is best to use the same Visual Studio version 16.8.5 as the PyTorch CI does. While the PyTorch CI uses Visual Studio BuildTools, you can use Visual Studio Enterprise, Professional, or Community.

If you want to build legacy Python code, please refer to "Building on legacy code and CUDA".

Build with CPU

It is fairly easy to build with CPU.

Note on OpenMP: the desired OpenMP implementation is Intel OpenMP (iomp). In order to link against iomp, you will need to manually download the library and set up the build environment by tweaking CMAKE_INCLUDE_PATH and LIB. The instructions here are an example of setting up both MKL and Intel OpenMP. Without these configurations for CMake, the Microsoft Visual C OpenMP runtime (vcomp) will be used.

Build with CUDA

NVTX is needed to build PyTorch with CUDA. NVTX is part of the CUDA distribution, where it is called "Nsight Compute". To install it onto an already installed CUDA, run the CUDA installation once again and check the corresponding checkbox. Make sure that CUDA with Nsight Compute is installed after Visual Studio.

Currently, VS 2017/2019 and Ninja are supported as the generators of CMake. If ninja.exe is detected in PATH, then Ninja will be used as the default generator; otherwise, VS 2017/2019 will be used. If Ninja is selected as the generator, the latest MSVC will be selected as the underlying toolchain.

Additional libraries such as Magma, oneDNN (also known as MKLDNN or DNNL), and Sccache are often needed. Please refer to the installation helper to install them.

You can refer to the build_pytorch.bat script for some other environment variable configurations.

:: [Optional] If you want to build with the VS 2017 generator for old CUDA and PyTorch, please change the value in the next line to `Visual Studio 15 2017`.
:: Note: This value is useless if Ninja is detected. However, you can force that by using `set USE_NINJA=OFF`.
set CMAKE_GENERATOR=Visual Studio 16 2019

:: Read the content in the previous section carefully before you proceed.
:: [Optional] If you want to override the underlying toolset used by Ninja and Visual Studio with CUDA, please run the following script block.
:: "Visual Studio 2019 Developer Command Prompt" will be run automatically.
:: Make sure you have CMake >= 3.12 before you do this when you use the Visual Studio generator.
set CMAKE_GENERATOR_TOOLSET_VERSION=14.27
set DISTUTILS_USE_SDK=1
for /f "usebackq tokens=*" %i in (`"%ProgramFiles(x86)%\Microsoft Visual Studio\Installer\vswhere.exe" -version [15^,16^) -products * -latest -property installationPath`) do call "%i\VC\Auxiliary\Build\vcvarsall.bat" x64 -vcvars_ver=%CMAKE_GENERATOR_TOOLSET_VERSION%

:: [Optional] If you want to override the CUDA host compiler
set CUDAHOSTCXX=C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.27.29110\bin\HostX64\x64\cl.exe

python setup.py install

Adjust Build Options (Optional)

You can adjust the configuration of CMake variables optionally (without building first) by doing the following. For example, adjusting the pre-detected directories for CuDNN or BLAS can be done with such a step.

On Linux

export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
python setup.py build --cmake-only
ccmake build  # or cmake-gui build

On macOS

export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py build --cmake-only
ccmake build  # or cmake-gui build

Docker Image

Using pre-built images

You can also pull a pre-built docker image from Docker Hub and run it with docker v19.03+:

docker run --gpus all --rm -ti --ipc=host pytorch/pytorch:latest

Please note that PyTorch uses shared memory to share data between processes, so if torch multiprocessing is used (e.g., for multithreaded data loaders), the default shared memory segment size that the container runs with is not enough; you should increase the shared memory size with either the --ipc=host or --shm-size command line option to nvidia-docker run.

Building the image yourself

NOTE: It must be built with a docker version > 18.06.

The Dockerfile is supplied to build images with CUDA 11.1 support and cuDNN v8. You can pass the PYTHON_VERSION=x.y make variable to specify which Python version is to be used by Miniconda, or leave it unset to use the default.

make -f docker.Makefile
# images are tagged as docker.io/${your_docker_username}/pytorch

Building the Documentation

To build documentation in various formats, you will need Sphinx and the readthedocs theme.

cd docs/
pip install -r requirements.txt

You can then build the documentation by running make <format> from the docs/ folder. Run make to get a list of all available output formats; see the example after this note.

If you get a katex error, run npm install katex. If it persists, try npm install -g katex.
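
For example, to build the HTML docs (a typical Sphinx invocation, assuming the requirements above are installed):

cd docs/
make html   # one of the formats listed by running a bare `make`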

Previous Versions

Installation instructions and binaries for previous PyTorch versions may be found on our website.

Getting Started

Three pointers to get you started:

Resources

Communication

Releases and Contributing

PyTorch has a 90-day release cycle (major releases). Please let us know if you encounter a bug by filing an issue.

We appreciate all contributions. If you are planning to contribute back bug-fixes, please do so without any further discussion.

If you plan to contribute new features, utility functions, or extensions to the core, please first open an issue and discuss the feature with us. Sending a PR without discussion might end up resulting in a rejected PR, because we might be taking the core in a different direction than you might be aware of.

To learn more about making a contribution to PyTorch, please see our Contribution page.

The Team

PyTorch is a community-driven project with several skillful engineers and researchers contributing to it.

PyTorch is currently maintained by Adam Paszke, Sam Gross, Soumith Chintala, and Gregory Chanan, with major contributions coming from hundreds of talented individuals in various forms and means. A non-exhaustive list includes: Trevor Killeen, Sasank Chilamkurthy, Sergey Zagoruyko, Adam Lerer, Francisco Massa, Alykhan Tejani, Luca Antiga, Alban Desmaison, Andreas Koepf, James Bradbury, Zeming Lin, Yuandong Tian, and Guillaume Lample.

Note: This project is unrelated to hughperkins/pytorch, which has the same name. Hugh is a valuable contributor to the Torch community and has helped with many things Torch and PyTorch.

License

PyTorch has a BSD-style license, as found in the LICENSE file.