
How to prevent TensorFlow from allocating the totality of a GPU's memory?


I work in an environment in which computational resources are shared, i.e., we have a few server machines equipped with a few Nvidia Titan X GPUs each.

For small to moderate size models, the 12 GB of the Titan X is usually enough for 2–3 people to run training concurrently on the same GPU. If the models are small enough that a single model does not take full advantage of all the computational units of the GPU, this can actually result in a speedup compared with running one training process after the other. Even in cases where the concurrent access to the GPU does slow down the individual training time, it is still nice to have the flexibility of having multiple users simultaneously train on the GPU.

The problem with TensorFlow is that, by default, it allocates the full amount of available GPU memory when it is launched. Even for a small two-layer neural network, I see that all 12 GB of the GPU memory is used up.

Is there a way to make TensorFlow only allocate, say, 4 GB of GPU memory, if one knows that this is enough for a given model?


Answer 0


You can set the fraction of GPU memory to be allocated when you construct a tf.Session by passing a tf.GPUOptions as part of the optional config argument:

# Assume that you have 12GB of GPU memory and want to allocate ~4GB:
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)

sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))

The per_process_gpu_memory_fraction acts as a hard upper bound on the amount of GPU memory that will be used by the process on each GPU on the same machine. Currently, this fraction is applied uniformly to all of the GPUs on the same machine; there is no way to set this on a per-GPU basis.
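Since the fraction cannot be set per GPU, a common workaround is to restrict which GPUs the process can see before TensorFlow starts. A minimal sketch, assuming your environment respects the standard CUDA_VISIBLE_DEVICES variable:

import os
# Expose only GPU 0 to this process; this must happen before
# TensorFlow initializes its GPU devices.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import tensorflow as tf

# The fraction now applies to the single visible GPU only.
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))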


Answer 1

# Let the TensorFlow allocator start small and grow its GPU memory region on demand
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)

https://github.com/tensorflow/tensorflow/issues/1578



Answer 2


Here is an excerpt from the book Deep Learning with TensorFlow:

In some cases it is desirable for the process to allocate only a subset of the available memory, or to grow the memory usage only as it is needed by the process. TensorFlow provides two configuration options on the session to control this. The first is the allow_growth option, which attempts to allocate only as much GPU memory as is needed by runtime allocations: it starts out allocating very little memory, and as sessions run and more GPU memory is needed, the GPU memory region used by the TensorFlow process is extended.

1) Allow growth: (more flexible)

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config, ...)

The second method is the per_process_gpu_memory_fraction option, which determines the fraction of the overall amount of memory that each visible GPU should be allocated. Note: the memory is not released afterwards, since doing so can lead to even worse memory fragmentation.

2) Allocate fixed memory:

To allocate only 40% of the total memory of each GPU:

config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4
session = tf.Session(config=config, ...)

Note: this is only useful if you truly want to bound the amount of GPU memory available to the TensorFlow process.
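The two options can also be combined. A sketch, assuming the documented semantics above hold (growth on demand, bounded by the fraction):

import tensorflow as tf

config = tf.ConfigProto()
# Start with a small allocation and grow on demand...
config.gpu_options.allow_growth = True
# ...but never beyond ~40% of each visible GPU's memory.
config.gpu_options.per_process_gpu_memory_fraction = 0.4
session = tf.Session(config=config)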


Answer 3


For TensorFlow 2.0 and 2.1 (docs):

import tensorflow as tf
tf.config.gpu.set_per_process_memory_growth(True)

For TensorFlow 2.2+ (docs):

import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
  tf.config.experimental.set_memory_growth(gpu, True)

The docs also list some more methods:

  • Set environment variable TF_FORCE_GPU_ALLOW_GROWTH to true.
  • Use tf.config.experimental.set_virtual_device_configuration to set a hard limit on a Virtual GPU device (see the sketch below).
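A minimal sketch of the virtual-device method, assuming the TF 2.x experimental API referenced in the docs (memory_limit is in megabytes, so 4096 caps the device at roughly 4 GB):

import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    # Create one logical GPU on the first physical GPU, hard-capped at ~4 GB.
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=4096)])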

Answer 4


All the answers above assume execution with a sess.run() call, which is becoming the exception rather than the rule in recent versions of TensorFlow.

When using the tf.Estimator framework (TensorFlow 1.4 and above), the way to pass the fraction along to the implicitly created MonitoredTrainingSession is:

opts = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)
conf = tf.ConfigProto(gpu_options=opts)
trainingConfig = tf.estimator.RunConfig(session_config=conf, ...)
tf.estimator.Estimator(model_fn=..., 
                       config=trainingConfig)

Similarly, in eager mode (TensorFlow 1.5 and above):

import tensorflow.contrib.eager as tfe  # import added; the original snippet assumed it

opts = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)
conf = tf.ConfigProto(gpu_options=opts)
tfe.enable_eager_execution(config=conf)

Edit: 11-04-2018. As an example, if you are going to use tf.contrib.gan.train, you can use something similar to the snippet below:

tf.contrib.gan.gan_train(........, config=conf)

Answer 5


For Tensorflow version 2.0 and 2.1 use the following snippet:

import tensorflow as tf
gpu_devices = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(gpu_devices[0], True)

For prior versions, the following snippet used to work for me:

import tensorflow as tf
tf_config = tf.ConfigProto()
tf_config.gpu_options.allow_growth = True
sess = tf.Session(config=tf_config)

Answer 6


Tensorflow 2.0 Beta and (probably) beyond

The API changed again. It can now be found at:

tf.config.experimental.set_memory_growth(
    device,
    enable
)
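For example, a minimal usage sketch that pairs the call with the device list from the same experimental namespace:

import tensorflow as tf

# Enable on-demand growth for every visible GPU, before any of them is used.
for gpu in tf.config.experimental.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)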

Aliases:

  • tf.compat.v1.config.experimental.set_memory_growth
  • tf.compat.v2.config.experimental.set_memory_growth

References:

  • Tensorflow – Use a GPU: https://www.tensorflow.org/guide/gpu
  • For Tensorflow 2.0 Alpha, see this answer.


Answer 7


You can use

TF_FORCE_GPU_ALLOW_GROWTH=true

in your environment variables.
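If you prefer to set it from Python, a minimal sketch (the variable must be set before TensorFlow initializes its GPU allocator):

import os

# Must run before TensorFlow creates its GPU allocator.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

import tensorflow as tf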

In the TensorFlow source code:

bool GPUBFCAllocator::GetAllowGrowthValue(const GPUOptions& gpu_options) {
  const char* force_allow_growth_string =
      std::getenv("TF_FORCE_GPU_ALLOW_GROWTH");
  if (force_allow_growth_string == nullptr) {
    return gpu_options.allow_growth();
  }
  // ... (the rest of the function parses the environment variable)
}

Answer 8


Shameless plug: if you install GPU-supported TensorFlow, the session will first allocate all of the GPU memory, whether you set it to use only the CPU or the GPU. I may add my tip: even if you set the graph to use the CPU only, you should set the same configuration (as answered above) to prevent unwanted GPU occupation.

And in an interactive interface like IPython or Jupyter, you should also set this configuration; otherwise it will allocate all of the memory and leave almost none for others. This is sometimes hard to notice.
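For example, a minimal sketch for notebook use, reusing the TF 1.x config from the answers above (tf.InteractiveSession accepts the same config argument as tf.Session):

import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
# InteractiveSession installs itself as the default session in the notebook.
sess = tf.InteractiveSession(config=config)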


Answer 9


For Tensorflow 2.0, this solution worked for me. (TF-GPU 2.0, Windows 10, GeForce RTX 2070)

physical_devices = tf.config.experimental.list_physical_devices('GPU')
assert len(physical_devices) > 0, "Not enough GPU hardware devices available"
tf.config.experimental.set_memory_growth(physical_devices[0], True)

Answer 10


If you're using Tensorflow 2, try the following:

config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.compat.v1.Session(config=config)

Answer 11


I tried to train a U-Net on the VOC data set, but because of the huge image size, memory ran out. I tried all of the above tips, and even tried with a batch size of 1, yet with no improvement. Sometimes the TensorFlow version also causes memory issues. Try using:

pip install tensorflow-gpu==1.8.0


Answer 12


Well, I am new to TensorFlow. I have a GeForce 740M GPU with 2 GB of RAM. I was running an MNIST-style handwriting example for a native language, with training data containing 38,700 images and 4,300 testing images, and I was trying to get precision, recall, and F1 using the following code, as sklearn was not giving me precise results. Once I added this to my existing code, I started getting GPU errors.

# Confusion-matrix counts computed from 0/1 prediction and label tensors
TP = tf.count_nonzero(predicted * actual)
TN = tf.count_nonzero((predicted - 1) * (actual - 1))
FP = tf.count_nonzero(predicted * (actual - 1))
FN = tf.count_nonzero((predicted - 1) * actual)

prec = TP / (TP + FP)
recall = TP / (TP + FN)
f1 = 2 * prec * recall / (prec + recall)

Plus, my model was heavy, I guess; I was getting a memory error after 147 or 148 epochs. Then I thought: why not create functions for these tasks? I don't know if TensorFlow works this way, but I figured that if a local variable is used and goes out of scope, it may release memory. So I defined the above elements for training and testing in modules, and I was able to run 10,000 epochs without any issues. I hope this helps.


Answer 13

# allocate 60% of GPU memory 
from keras.backend.tensorflow_backend import set_session
import tensorflow as tf 
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.6
set_session(tf.Session(config=config))