How does the "view" method work in PyTorch?

Question: How does the "view" method work in PyTorch?

I am confused about the method view() in the following code snippet.

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool  = nn.MaxPool2d(2,2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1   = nn.Linear(16*5*5, 120)
        self.fc2   = nn.Linear(120, 84)
        self.fc3   = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16*5*5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()

My confusion is regarding the following line.

x = x.view(-1, 16*5*5)

What does the tensor.view() function do? I have seen it used in many places, but I can't understand how it interprets its parameters.

What happens if I give negative values as parameters to the view() function? For example, what happens if I call tensor_variable.view(1, 1, -1)?

Can anyone explain the main principle of the view() function with some examples?


Answer 0

The view function is meant to reshape the tensor.

Say you have a tensor

import torch
a = torch.range(1, 16)  # deprecated in newer PyTorch; torch.arange(1., 17.) produces the same values

a is a tensor with 16 elements, from 1 to 16 (inclusive). If you want to reshape this tensor to make it a 4 x 4 tensor, then you can use

a = a.view(4, 4)

Now a will be a 4 x 4 tensor. Note that after the reshape the total number of elements needs to remain the same. Reshaping the tensor a to a 3 x 5 tensor would not be appropriate.

What is the meaning of parameter -1?

If there is a situation where you don't know how many rows you want but are sure of the number of columns, you can specify the rows with -1. (Note that you can extend this to tensors with more dimensions; only one of the axis values can be -1.) This is a way of telling the library: "give me a tensor with this many columns, and you compute the appropriate number of rows needed to make this happen".

This can be seen in the neural network code that you have given above. After the line x = self.pool(F.relu(self.conv2(x))) in the forward function, you will have a feature map with 16 channels. You have to flatten this before giving it to the fully connected layer. So you tell PyTorch to reshape the tensor you obtained to have a specific number of columns, and to decide the number of rows by itself, as in the sketch below.
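
For example, a minimal sketch of the shapes involved; the batch size of 8 is made up for illustration:

import torch

# stand-in for the output of the second pooling layer: (batch, channels, height, width)
x = torch.randn(8, 16, 5, 5)

flat = x.view(-1, 16 * 5 * 5)   # -1 is inferred as the batch size
print(flat.shape)               # torch.Size([8, 400])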

Drawing a parallel between NumPy and PyTorch, view is similar to NumPy's reshape function.
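
For instance, the NumPy analogue of the example above:

import numpy as np

a = np.arange(1, 17)   # 16 elements, like torch.range(1, 16) above
a.reshape(4, 4)        # same effect as a.view(4, 4) in PyTorch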


Answer 1

Let’s do some examples, from simpler to more difficult.

  1. The view method returns a tensor with the same data as the self tensor (which means that the returned tensor has the same number of elements), but with a different shape. For example:

    a = torch.arange(1, 17)  # a's shape is (16,)
    
    a.view(4, 4) # output below
      1   2   3   4
      5   6   7   8
      9  10  11  12
     13  14  15  16
    [torch.FloatTensor of size 4x4]
    
    a.view(2, 2, 4) # output below
    (0 ,.,.) = 
    1   2   3   4
    5   6   7   8
    
    (1 ,.,.) = 
     9  10  11  12
    13  14  15  16
    [torch.FloatTensor of size 2x2x4]
    
  2. Assuming that -1 is not one of the parameters, when you multiply them together the result must equal the number of elements in the tensor. If you do a.view(3, 3), it will raise a RuntimeError because shape (3 x 3) is invalid for an input with 16 elements. In other words: 3 x 3 = 9, not 16.

  3. You can use -1 as one of the parameters that you pass to the function, but only once. All that happens is that the method will do the math for you on how to fill that dimension. For example a.view(2, -1, 4) is equivalent to a.view(2, 2, 4). [16 / (2 x 4) = 2]

  4. Notice that the returned tensor shares the same data. If you make a change in the “view” you are changing the original tensor’s data:

    b = a.view(4, 4)
    b[0, 2] = 2      # writes through the view into the shared storage
    a[2] == 3.0      # a[2] has become 2, so this comparison is now
    False
    
  5. Now, for a more complex use case. The documentation says that each new view dimension must either be a subspace of an original dimension, or only span across original dimensions d, d + 1, …, d + k that satisfy the following contiguity-like condition: for all i = 0, …, k − 1, stride[i] = stride[i + 1] x size[i + 1]. Otherwise, contiguous() needs to be called before the tensor can be viewed. For example:

    a = torch.rand(5, 4, 3, 2) # size (5, 4, 3, 2)
    a_t = a.permute(0, 2, 3, 1) # size (5, 3, 2, 4)
    
    # The commented line below will raise a RuntimeError, because one dimension
    # spans across two contiguous subspaces
    # a_t.view(-1, 4)
    
    # instead do:
    a_t.contiguous().view(-1, 4)
    
    # To see why the first one does not work and the second does,
    # compare a.stride() and a_t.stride()
    a.stride() # (24, 6, 2, 1)
    a_t.stride() # (24, 2, 1, 6)
    

    Notice that for a_t, stride[0] != stride[1] x size[1] since 24 != 2 x 3


Answer 2

torch.Tensor.view()

Simply put, torch.Tensor.view(), which is inspired by numpy.ndarray.reshape() / numpy.reshape(), creates a new view of the tensor, as long as the new shape is compatible with the shape of the original tensor.

Let’s understand this in detail using a concrete example.

In [43]: t = torch.arange(18) 

In [44]: t 
Out[44]: 
tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17])

With this tensor t of shape (18,), new 2-D views can be created only for the following shapes:

(1, 18) or equivalently (1, -1) or (-1, 18)
(2, 9) or equivalently (2, -1) or (-1, 9)
(3, 6) or equivalently (3, -1) or (-1, 6)
(6, 3) or equivalently (6, -1) or (-1, 3)
(9, 2) or equivalently (9, -1) or (-1, 2)
(18, 1) or equivalently (18, -1) or (-1, 1)

As we can already observe from the above shape tuples, the product of the elements of the shape tuple (e.g. 2*9, 3*6, etc.) must always equal the total number of elements in the original tensor (18 in our example).

Another thing to observe is that we used a -1 in one of the places in each of the shape tuples. By using a -1, we avoid doing the computation ourselves and instead delegate to PyTorch the calculation of that value when it creates the new view. One important thing to note is that we can use only a single -1 in the shape tuple. The remaining values must be supplied explicitly; otherwise PyTorch will complain by throwing a RuntimeError:

RuntimeError: only one dimension can be inferred
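
A minimal way to trigger that error (my own sketch, using t from above):

t.view(-1, -1)   # leaving two dimensions to be inferred raises this RuntimeError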

So, with all of the above mentioned shapes, PyTorch will always return a new view of the original tensor t. This basically means that it just changes the stride information of the tensor for each of the new views that are requested.

Below are some examples illustrating how the strides of the tensors are changed with each new view.

# stride of our original tensor `t`
In [53]: t.stride() 
Out[53]: (1,)

Now, we will see the strides for the new views:

# shape (1, 18)
In [54]: t1 = t.view(1, -1)
# stride tensor `t1` with shape (1, 18)
In [55]: t1.stride() 
Out[55]: (18, 1)

# shape (2, 9)
In [56]: t2 = t.view(2, -1)
# stride of tensor `t2` with shape (2, 9)
In [57]: t2.stride()       
Out[57]: (9, 1)

# shape (3, 6)
In [59]: t3 = t.view(3, -1) 
# stride of tensor `t3` with shape (3, 6)
In [60]: t3.stride() 
Out[60]: (6, 1)

# shape (6, 3)
In [62]: t4 = t.view(6,-1)
# stride of tensor `t4` with shape (6, 3)
In [63]: t4.stride() 
Out[63]: (3, 1)

# shape (9, 2)
In [65]: t5 = t.view(9, -1) 
# stride of tensor `t5` with shape (9, 2)
In [66]: t5.stride()
Out[66]: (2, 1)

# shape (18, 1)
In [68]: t6 = t.view(18, -1)
# stride of tensor `t6` with shape (18, 1)
In [69]: t6.stride()
Out[69]: (1, 1)

So that’s the magic of the view() function. It just changes the strides of the (original) tensor for each of the new views, as long as the shape of the new view is compatible with the original shape.

Another interesting thing one might observe is that the value of the element in the 0th position of the strides tuple is equal to the value of the element in the 1st position of the shape tuple.

In [74]: t3.shape 
Out[74]: torch.Size([3, 6])
                        |
In [75]: t3.stride()    |
Out[75]: (6, 1)         |
          |_____________|

This is because:

In [76]: t3 
Out[76]: 
tensor([[ 0,  1,  2,  3,  4,  5],
        [ 6,  7,  8,  9, 10, 11],
        [12, 13, 14, 15, 16, 17]])

the stride (6, 1) says that to go from one element to the next along the 0th dimension, we have to jump 6 steps (i.e. to go from 0 to 6, one has to skip over 6 elements), but to go from one element to the next along the 1st dimension, we need only one step (e.g. to go from 2 to 3).

Thus, the strides information is at the heart of how the elements are accessed from memory for performing the computation.
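
As a small sanity check (a sketch of my own, not part of the original answer), the stride arithmetic can be reproduced by hand using t3 from above:

i, j = 2, 4
flat = i * t3.stride(0) + j * t3.stride(1)    # 2*6 + 4*1 = 16
t3[i, j].item() == t3.view(-1)[flat].item()   # True, both are 16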


torch.reshape()

This function returns a view and behaves exactly like torch.Tensor.view(), as long as the new shape is compatible with the shape of the original tensor. Otherwise, it returns a copy.

However, the notes of torch.reshape() warn that:

contiguous inputs and inputs with compatible strides can be reshaped without copying, but one should not depend on the copying vs. viewing behavior.
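
Here is a small sketch (my own, not from the docs) of reshape silently copying in a case where view would fail:

a = torch.rand(5, 4, 3, 2)
a_t = a.permute(0, 2, 3, 1)            # non-contiguous; a_t.view(-1, 4) would raise a RuntimeError
r = a_t.reshape(-1, 4)                 # reshape falls back to making a copy
r.shape                                # torch.Size([30, 4])
r.data_ptr() == a_t.data_ptr()         # False -> the data was copied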


Answer 3

I figured out that x.view(-1, 16 * 5 * 5) is equivalent to x.flatten(1), where the argument 1 indicates that flattening starts from the 1st dimension (so the 'sample' dimension is not flattened). As you can see, the latter usage is semantically clearer and easier to use, so I prefer flatten().
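
A quick check of the equivalence (my own sketch, with a made-up batch shape):

import torch

x = torch.randn(8, 16, 5, 5)
a = x.view(-1, 16 * 5 * 5)   # explicit flatten via view
b = x.flatten(1)             # flatten everything from dim 1 onward
print(a.shape, b.shape)      # torch.Size([8, 400]) torch.Size([8, 400])
print(torch.equal(a, b))     # True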


Answer 4

What is the meaning of parameter -1?

You can read -1 as a dynamically inferred dimension, i.e. "whatever fits". Because of that, there can be only one -1 argument in view().

If you call x.view(-1,1), this will output a tensor of shape [anything, 1], depending on the number of elements in x. For example:

import torch
x = torch.tensor([1, 2, 3, 4])
print(x,x.shape)
print("...")
print(x.view(-1,1), x.view(-1,1).shape)
print(x.view(1,-1), x.view(1,-1).shape)

Will output:

tensor([1, 2, 3, 4]) torch.Size([4])
...
tensor([[1],
        [2],
        [3],
        [4]]) torch.Size([4, 1])
tensor([[1, 2, 3, 4]]) torch.Size([1, 4])

Answer 5

weights.reshape(a, b) will return a new tensor with the same data as weights, with shape (a, b); it may copy the data to another part of memory when a view is not possible.

weights.resize_(a, b) returns the same tensor with a different shape. However, if the new shape results in fewer elements than the original tensor, some elements will be removed from the tensor (but not from memory). If the new shape results in more elements than the original tensor, new elements will be uninitialized in memory.

weights.view(a, b) will return a new tensor that shares its data with weights, with shape (a, b).
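
A minimal sketch of the three calls side by side (my own illustration; the values and shapes are made up):

import torch

weights = torch.arange(6.)    # tensor([0., 1., 2., 3., 4., 5.])

v = weights.view(2, 3)        # shares storage with `weights`
r = weights.reshape(2, 3)     # also a view here, because `weights` is contiguous

w2 = torch.arange(6.)
w2.resize_(2, 2)              # in-place: keeps only the first 4 elements
print(w2)                     # tensor([[0., 1.], [2., 3.]])

w3 = torch.arange(6.)
w3.resize_(3, 3)              # in-place: the 3 extra slots are uninitialized memory
print(w3.shape)               # torch.Size([3, 3])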


Answer 6

I really liked @Jadiel de Armas's examples.

I would like to add a small insight into how elements are ordered for .view(…):

  • For a Tensor with shape (a, b, c), the order of its elements is determined by a positional numbering system: the first digit has a possible values, the second digit has b possible values, and the third digit has c possible values.
  • The mapping of the elements in the new Tensor returned by .view(…) preserves this order of the original Tensor, as the sketch below shows.
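
A tiny sketch of that ordering (my own example): the last index varies fastest, and view() keeps that flat order.

import torch

t = torch.arange(6).view(2, 3)
print(t)            # tensor([[0, 1, 2],
                    #         [3, 4, 5]])

print(t.view(3, 2)) # same flat sequence 0..5, just regrouped:
                    # tensor([[0, 1],
                    #         [2, 3],
                    #         [4, 5]])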

Answer 7

Let’s try to understand view by the following examples:

    a = torch.range(1, 16)   # deprecated in newer PyTorch; torch.arange(1., 17.) gives the same values

print(a)

    tensor([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12., 13., 14.,
            15., 16.])

print(a.view(-1,2))

    tensor([[ 1.,  2.],
            [ 3.,  4.],
            [ 5.,  6.],
            [ 7.,  8.],
            [ 9., 10.],
            [11., 12.],
            [13., 14.],
            [15., 16.]])

print(a.view(2,-1,4))   #3d tensor

    tensor([[[ 1.,  2.,  3.,  4.],
             [ 5.,  6.,  7.,  8.]],

            [[ 9., 10., 11., 12.],
             [13., 14., 15., 16.]]])

print(a.view(2,-1,2))

    tensor([[[ 1.,  2.],
             [ 3.,  4.],
             [ 5.,  6.],
             [ 7.,  8.]],

            [[ 9., 10.],
             [11., 12.],
             [13., 14.],
             [15., 16.]]])

print(a.view(4,-1,2))

    tensor([[[ 1.,  2.],
             [ 3.,  4.]],

            [[ 5.,  6.],
             [ 7.,  8.]],

            [[ 9., 10.],
             [11., 12.]],

            [[13., 14.],
             [15., 16.]]])

Passing -1 as an argument is an easy way to have one dimension computed for you: in the 3-D case, when you know, say, y and z, PyTorch works out x (or the other way round), and likewise in the 2-D case, when you know y it works out x (or vice versa).