
How do I check if PyTorch is using the GPU?

Question:


I would like to know if pytorch is using my GPU. It’s possible to detect with nvidia-smi if there is any activity from the GPU during the process, but I want something written in a python script.

Is there a way to do so?


Answer 0


This is going to work:

In [1]: import torch

In [2]: torch.cuda.current_device()
Out[2]: 0

In [3]: torch.cuda.device(0)
Out[3]: <torch.cuda.device at 0x7efce0b03be0>

In [4]: torch.cuda.device_count()
Out[4]: 1

In [5]: torch.cuda.get_device_name(0)
Out[5]: 'GeForce GTX 950M'

In [6]: torch.cuda.is_available()
Out[6]: True

This tells me the GPU GeForce GTX 950M is being used by PyTorch.


Answer 1


As it hasn't been proposed here, I'm adding a method using torch.device, as this is quite handy, and it also lets you initialize tensors on the correct device.

# setting device on GPU if available, else CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Using device:', device)
print()

# Additional info when using CUDA
if device.type == 'cuda':
    print(torch.cuda.get_device_name(0))
    print('Memory Usage:')
    print('Allocated:', round(torch.cuda.memory_allocated(0)/1024**3,1), 'GB')
    print('Cached:   ', round(torch.cuda.memory_reserved(0)/1024**3,1), 'GB')

Edit: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved. So use memory_cached for older versions.

Output:

Using device: cuda

Tesla K80
Memory Usage:
Allocated: 0.3 GB
Cached:    0.6 GB

As mentioned above, using device it is possible to:

  • move tensors to the respective device:

      torch.rand(10).to(device)

  • create a tensor directly on the device:

      torch.rand(10, device=device)

This makes switching between CPU and GPU comfortable without changing the actual code.


Edit:

As there have been some questions and confusion about cached and allocated memory, I'm adding some additional information about it:


You can either directly hand over a device as specified further above in the post, or you can leave it None and it will use the current_device().
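
Since the memory-inspection call was renamed across versions (see the edit note above), a small version-tolerant sketch may help; this fallback helper is my own illustration, not part of the original answer:

import torch

def gpu_memory_report(device_index=0):
    # Prefer memory_reserved (PyTorch >= 1.4) and fall back to the
    # older memory_cached name on earlier versions.
    reserved = getattr(torch.cuda, 'memory_reserved',
                       getattr(torch.cuda, 'memory_cached', None))
    print('Allocated:', round(torch.cuda.memory_allocated(device_index)/1024**3, 1), 'GB')
    if reserved is not None:
        print('Reserved: ', round(reserved(device_index)/1024**3, 1), 'GB')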


Additional note: Old graphics cards with CUDA compute capability 3.0 or lower may be visible but cannot be used by PyTorch!
Thanks to hekimgil for pointing this out! – "Found GPU0 GeForce GT 750M which is of cuda capability 3.0. PyTorch no longer supports this GPU because it is too old. The minimum cuda capability that we support is 3.5."


Answer 2


After you start running the training loop, if you want to watch from the terminal whether your program is utilizing the GPU resources, and to what extent, you can simply use watch, as in:

$ watch -n 2 nvidia-smi

This will continuously update the usage stats every 2 seconds until you press ctrl+c.


If you need more control over which GPU stats are shown, you can use a more sophisticated version of nvidia-smi with --query-gpu=.... Below is a simple illustration:

$ watch -n 3 nvidia-smi --query-gpu=index,gpu_name,memory.total,memory.used,memory.free,temperature.gpu,pstate,utilization.gpu,utilization.memory --format=csv

which would output the stats as comma-separated values, one row per GPU.

Note: There should not be any space between the comma-separated query names in --query-gpu=..., else those values will be ignored and no stats are returned.


Also, you can check whether your installation of PyTorch detects your CUDA installation correctly by doing:

In [13]: import torch

In [14]: torch.cuda.is_available()
Out[14]: True

A True status means that PyTorch is configured correctly and can use the GPU, although you still have to move/place the tensors with the necessary statements in your code.


If you want to do this inside Python code, then look into this module:

https://github.com/jonsafari/nvidia-ml-py or on PyPI here: https://pypi.python.org/pypi/nvidia-ml-py/
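
For instance, a minimal sketch using those bindings (the pynvml module; the exact NVML wrapper names below assume its standard API) could look like this:

from pynvml import (nvmlInit, nvmlShutdown, nvmlDeviceGetCount,
                    nvmlDeviceGetHandleByIndex, nvmlDeviceGetName,
                    nvmlDeviceGetMemoryInfo, nvmlDeviceGetUtilizationRates)

nvmlInit()
for i in range(nvmlDeviceGetCount()):
    handle = nvmlDeviceGetHandleByIndex(i)
    mem = nvmlDeviceGetMemoryInfo(handle)
    util = nvmlDeviceGetUtilizationRates(handle)
    # nvmlDeviceGetName may return bytes on some versions of the bindings
    print('GPU %d: %s' % (i, nvmlDeviceGetName(handle)))
    print('  memory: %d / %d MiB used' % (mem.used // 1024**2, mem.total // 1024**2))
    print('  utilization: gpu %d%%, memory %d%%' % (util.gpu, util.memory))
nvmlShutdown()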


Answer 3


On the official site's Get Started page, check GPU support for PyTorch as below:

import torch
torch.cuda.is_available()

Reference: PyTorch | Get Started


Answer 4


From a practical standpoint, just one minor digression:

import torch
dev = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

This dev now knows whether it is cuda or cpu.

And there is a difference in how you deal with models and with tensors when moving to cuda. It is a bit strange at first.

import torch
import torch.nn as nn
dev = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
t1 = torch.randn(1,2)
t2 = torch.randn(1,2).to(dev)
print(t1)  # tensor([[-0.2678,  1.9252]])
print(t2)  # tensor([[ 0.5117, -3.6247]], device='cuda:0')
t1.to(dev) 
print(t1)  # tensor([[-0.2678,  1.9252]]) 
print(t1.is_cuda) # False
t1 = t1.to(dev)
print(t1)  # tensor([[-0.2678,  1.9252]], device='cuda:0') 
print(t1.is_cuda) # True

class M(nn.Module):
    def __init__(self):        
        super().__init__()        
        self.l1 = nn.Linear(1,2)

    def forward(self, x):                      
        x = self.l1(x)
        return x
model = M()   # not on cuda
model.to(dev) # is on cuda (all parameters)
print(next(model.parameters()).is_cuda) # True

This is all a bit tricky, but understanding it once helps you deal with it quickly and with less debugging.
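
One more consequence of the snippet above (reusing model and dev from it): the inputs must live on the same device as the model, otherwise the forward pass fails with a device-mismatch error. A minimal sketch:

x = torch.randn(4, 1)        # created on the CPU by default
# model(x)                   # would fail when the model is on cuda: device mismatch
out = model(x.to(dev))       # move the input to the model's device first
print(out.device)            # cuda:0 when a GPU is available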


Answer 5


To check if there is a GPU available:

torch.cuda.is_available()

If the above function returns False,

  1. you either have no GPU,
  2. or the Nvidia drivers have not been installed so the OS does not see the GPU,
  3. or the GPU is being hidden by the environment variable CUDA_VISIBLE_DEVICES. When the value of CUDA_VISIBLE_DEVICES is -1, all of your devices are being hidden. You can check that value in code with this line: os.environ['CUDA_VISIBLE_DEVICES']

If the above function returns True, that does not necessarily mean that you are using the GPU. In PyTorch you can allocate tensors to devices when you create them. By default, tensors are allocated to the cpu. To check where your tensor is allocated, do:

# assuming that 'a' is a tensor created somewhere else
a.device  # returns the device where the tensor is allocated

Note that you cannot operate on tensors allocated on different devices. To see how to allocate a tensor to the GPU, see here: https://pytorch.org/docs/stable/notes/cuda.html
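
A short sketch combining both checks from this answer (the tensor a below is just an illustration):

import os
import torch

# None means CUDA_VISIBLE_DEVICES is unset; a value of '-1' hides all devices
print(os.environ.get('CUDA_VISIBLE_DEVICES'))

a = torch.rand(3)
print(a.device)              # cpu (the default allocation)
if torch.cuda.is_available():
    a = a.to('cuda')
    print(a.device)          # cuda:0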


Answer 6


Almost all answers here reference torch.cuda.is_available(). However, that's only one side of the coin. It tells you whether the GPU (actually CUDA) is available, not whether it's actually being used. In a typical setup, you would set your device with something like this:

device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

but in larger environments (e.g. research) it is also common to give the user more options, so based on input they can disable CUDA, specify CUDA IDs, and so on. In such a case, whether or not the GPU is used is not based only on whether it is available. After the device has been set to a torch device, you can check its type property to verify whether it's CUDA or not.

if device.type == 'cuda':
    # do something
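
As a sketch of that "more options" pattern (the --disable-cuda flag name is just an illustration, not a standard option):

import argparse
import torch

parser = argparse.ArgumentParser()
parser.add_argument('--disable-cuda', action='store_true',
                    help='disable CUDA even when it is available')
args = parser.parse_args()

if not args.disable_cuda and torch.cuda.is_available():
    device = torch.device('cuda')
else:
    device = torch.device('cpu')

if device.type == 'cuda':
    print('Using', torch.cuda.get_device_name(0))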

Answer 7


Simply run the following commands from a command prompt or Linux shell.

python -c 'import torch; print(torch.cuda.is_available())'

The above should print True.

python -c 'import torch; print(torch.rand(2,3).cuda())'

This one should print the following:

tensor([[0.7997, 0.6170, 0.7042], [0.4174, 0.1494, 0.0516]], device='cuda:0')

Answer 8


If you are here because your PyTorch always gives False for torch.cuda.is_available(), that's probably because you installed a PyTorch version without GPU support. (E.g., you coded on a laptop and are then testing on a server.)

The solution is to uninstall PyTorch and install it again with the right command from the PyTorch downloads page. Also refer to this PyTorch issue.
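
One quick way to tell whether the installed build itself lacks CUDA support (rather than, say, a driver problem) is to inspect the build metadata; torch.version.cuda is None on CPU-only builds:

import torch

print(torch.__version__)   # CPU-only pip wheels often carry a '+cpu' suffix
print(torch.version.cuda)  # None on a CPU-only build, a version string otherwise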


Answer 9


Create a tensor on the GPU as follows:

$ python
>>> import torch
>>> print(torch.rand(3,3).cuda()) 

Do not quit; open another terminal and check whether the Python process is using the GPU with:

$ nvidia-smi