
Why is TensorFlow 2 much slower than TensorFlow 1?

Question: Why is TensorFlow 2 much slower than TensorFlow 1?





TF2's slowness has been cited by many users as the reason for switching to PyTorch, but I've yet to find a justification or explanation for sacrificing the most important practical quality, speed, for eager execution.

Below is code benchmarking performance, TF1 vs. TF2 – with TF1 running anywhere from 47% to 276% faster.

My question is: what is it, at the graph or hardware level, that yields such a significant slowdown?

Looking for a detailed answer; I am already familiar with the broad concepts. Relevant Git issue.

Specs: CUDA 10.0.130, cuDNN 7.4.2, Python 3.7.4, Windows 10, GTX 1070

Benchmark results:

UPDATE: Disabling Eager Execution per below code does not help. The behavior, however, is inconsistent: sometimes running in graph mode helps considerably, other times it runs slower relative to Eager.

As TF devs don’t appear around anywhere, I’ll be investigating this matter myself – can follow progress in the linked Github issue.

UPDATE 2: tons of experimental results to share, along with explanations; should be done today.

Benchmark code:

# use tensorflow.keras... to benchmark tf.keras; used GPU for all above benchmarks
from keras.layers import Input, Dense, LSTM, Bidirectional, Conv1D
from keras.layers import Flatten, Dropout
from keras.models import Model
from keras.optimizers import Adam
import keras.backend as K
import numpy as np
from time import time

batch_shape = (32, 400, 16)
X, y = make_data(batch_shape)

model_small = make_small_model(batch_shape)
model_small.train_on_batch(X, y)  # skip first iteration which builds graph
timeit(model_small.train_on_batch, 200, X, y)

K.clear_session()  # in my testing, kernel was restarted instead

model_medium = make_medium_model(batch_shape)
model_medium.train_on_batch(X, y)  # skip first iteration which builds graph
timeit(model_medium.train_on_batch, 10, X, y)

Functions used:

def timeit(func, iterations, *args):
    t0 = time()
    for _ in range(iterations):
        func(*args)  # call under test, e.g. model.train_on_batch(X, y)
    print("Time/iter: %.4f sec" % ((time() - t0) / iterations))

def make_small_model(batch_shape):
    ipt   = Input(batch_shape=batch_shape)
    x     = Conv1D(128, 400, strides=4, padding='same')(ipt)
    x     = Flatten()(x)
    x     = Dropout(0.5)(x)
    x     = Dense(64, activation='relu')(x)
    out   = Dense(1,  activation='sigmoid')(x)
    model = Model(ipt, out)
    model.compile(Adam(lr=1e-4), 'binary_crossentropy')
    return model

def make_medium_model(batch_shape):
    ipt   = Input(batch_shape=batch_shape)
    x     = Bidirectional(LSTM(512, activation='relu', return_sequences=True))(ipt)
    x     = LSTM(512, activation='relu', return_sequences=True)(x)
    x     = Conv1D(128, 400, strides=4, padding='same')(x)
    x     = Flatten()(x)
    x     = Dense(256, activation='relu')(x)
    x     = Dropout(0.5)(x)
    x     = Dense(128, activation='relu')(x)
    x     = Dense(64,  activation='relu')(x)
    out   = Dense(1,   activation='sigmoid')(x)
    model = Model(ipt, out)
    model.compile(Adam(lr=1e-4), 'binary_crossentropy')
    return model

def make_data(batch_shape):
    return np.random.randn(*batch_shape), np.random.randint(0, 2, (batch_shape[0], 1))
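
The timeit helper above reports a single mean, which one slow iteration can skew. Below is a minimal sketch of a variant that runs untimed warm-up calls and also reports the median and maximum. The names `bench` and `warmup` and the returned keys are made up for illustration and are not part of the original benchmark.

```python
from statistics import mean, median
from time import perf_counter


def bench(func, iterations, *args, warmup=1, **kwargs):
    """Time `func` per call; warm-up calls run first and are not timed."""
    for _ in range(warmup):
        func(*args, **kwargs)  # e.g. the first train_on_batch, which builds the graph

    times = []
    for _ in range(iterations):
        t0 = perf_counter()
        func(*args, **kwargs)
        times.append(perf_counter() - t0)

    return {"mean": mean(times), "median": median(times), "max": max(times)}


# Usage with the models above would look like:
#   stats = bench(model_small.train_on_batch, 200, X, y)
#   print("Time/iter: %.4f sec (median)" % stats["median"])
```

A large gap between the mean and the median hints that a few iterations dominate the total, which is worth knowing before comparing two configurations.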

Answer 0


UPDATE 2/18/2020: I've benched 2.1 and 2.1-nightly; the results are mixed. All configs (model & data size) but one are as fast as or much faster than the best of TF2 & TF1. The one that's slower, and dramatically so, is Large-Large, esp. in Graph execution (1.6x to 2.5x slower).

Furthermore, there are extreme reproducibility differences between Graph and Eager for a large model I tested – one not explainable via randomness/compute-parallelism. I can’t currently present reproducible code for these claims per time constraints, so instead I strongly recommend testing this for your own models.

Haven’t opened a Git issue on these yet, but I did comment on the original – no response yet. I’ll update the answer(s) once progress is made.

VERDICT: it isn’t, IF you know what you’re doing. But if you don’t, it could cost you, lots – by a few GPU upgrades on average, and by multiple GPUs worst-case.

THIS ANSWER: aims to provide a high-level description of the issue, as well as guidelines for how to decide on the training configuration specific to your needs. For a detailed, low-level description, which includes all benchmarking results + code used, see my other answer.

I’ll be updating my answer(s) w/ more info if I learn any – can bookmark / “star” this question for reference.

ISSUE SUMMARY: as confirmed by a TensorFlow developer, Q. Scott Zhu, TF2 focused development on Eager execution & tight integration w/ Keras, which involved sweeping changes in TF source – including at graph-level. Benefits: greatly expanded processing, distribution, debug, and deployment capabilities. The cost of some of these, however, is speed.

The matter, however, is considerably more complex. It isn't just TF1 vs. TF2; factors yielding significant differences in train speed include:

  1. TF2 vs. TF1
  2. Eager vs. Graph mode
  3. keras vs. tf.keras
  4. numpy vs. tf.data.Dataset vs. …
  5. train_on_batch() vs. fit()
  6. GPU vs. CPU
  7. model(x) vs. model.predict(x) vs. …

Unfortunately, almost none of the above are independent of one another, and each can at least double execution time relative to another. Fortunately, you can determine what'll work best systematically, and with a few shortcuts, as I'll be showing.
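
One way to search systematically is to enumerate the factor combinations and time each one. The sketch below only builds and ranks the grid; `run_one` is a hypothetical callback you would write yourself to build the model for a given configuration and return seconds per iteration. TF version and GPU vs. CPU are left out because they usually can't be switched inside one process.

```python
from itertools import product

# Factor levels taken from the list above; labels are plain strings, nothing is imported.
FACTORS = {
    "execution": ["Eager", "Graph"],
    "keras_import": ["keras", "tf.keras"],
    "input_format": ["numpy", "tf.data.Dataset"],
    "train_method": ["train_on_batch", "fit"],
}


def config_grid(factors):
    """Yield one dict per combination of factor levels."""
    names = list(factors)
    for levels in product(*(factors[name] for name in names)):
        yield dict(zip(names, levels))


def run_grid(factors, run_one):
    """Call the hypothetical `run_one(config) -> seconds per iteration` for each
    combination and return (config, seconds) pairs sorted fastest first."""
    results = [(cfg, run_one(cfg)) for cfg in config_grid(factors)]
    return sorted(results, key=lambda pair: pair[1])
```

With four two-level factors this is 16 runs per model and data size, so pruning with the DO's and DON'T's below is likely worthwhile.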

WHAT SHOULD I DO? Currently, the only way is – experiment for your specific model, data, and hardware. No single configuration will always work best – but there are do’s and don’t’s to simplify your search:

>> DO:

  • train_on_batch() + numpy + tf.keras + TF1 + Eager/Graph
  • train_on_batch() + numpy + tf.keras + TF2 + Graph
  • fit() + numpy + tf.keras + TF1/TF2 + Graph + large model & data

>> DON’T:

  • fit() + numpy + keras for small & medium models and data
  • fit() + numpy + tf.keras + TF1/TF2 + Eager
  • train_on_batch() + numpy + keras + TF1 + Eager

  • [Major] tf.python.keras; it can run 10-100x slower, and w/ plenty of bugs; more info

    • This includes layers, models, optimizers, & related “out-of-box” usage imports; ops, utils, & related ‘private’ imports are fine – but to be sure, check for alts, & whether they’re used in tf.keras

Refer to code at bottom of my other answer for an example benchmarking setup. The list above is based mainly on the “BENCHMARKS” tables in the other answer.

LIMITATIONS of the above DO’s & DON’T’s:

  • This question’s titled “Why is TF2 much slower than TF1?”, and while its body concerns training explicitly, the matter isn’t limited to it; inference, too, is subject to major speed differences, even within the same TF version, import, data format, etc. – see this answer.
  • RNNs are likely to notably change the data grid in the other answer, as they’ve been improved in TF2
  • Models primarily used Conv1D and Dense – no RNNs, sparse data/targets, 4/5D inputs, & other configs
  • Input data limited to numpy and tf.data.Dataset, while many other formats exist; see other answer
  • GPU was used; results will differ on a CPU. In fact, when I asked the question, my CUDA wasn’t properly configured, and some of the results were CPU-based.

Why did TF2 sacrifice the most practical quality, speed, for eager execution? It hasn’t, clearly – graph is still available. But if the question is “why eager at all”:

  • Superior debugging: you’ve likely come across multitudes of questions asking “how do I get intermediate layer outputs” or “how do I inspect weights”; with eager, it’s (almost) as simple as .__dict__. Graph, in contrast, requires familiarity with special backend functions – greatly complicating the entire process of debugging & introspection.
  • Faster prototyping: per ideas similar to above; faster understanding = more time left for actual DL.


HOW TO ENABLE/DISABLE EAGER?

tf.enable_eager_execution()  # TF1; must be done before any model/tensor creation
tf.compat.v1.disable_eager_execution() # TF2; above holds
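
The two calls above can be wrapped so that one benchmark script runs under either TF version. This is a sketch only: `set_execution_mode` is a name invented here, and it takes the imported tensorflow module as an argument rather than importing it, so the helper itself has no dependencies.

```python
def set_execution_mode(tf, eager):
    """Switch `tf` (the imported tensorflow module) to Eager or Graph mode.

    Must run before any model or tensor is created. Returns tf.executing_eagerly().
    """
    major = int(tf.__version__.split(".")[0])
    if major >= 2:
        if not eager:   # TF2 starts in Eager, so only Graph needs a call
            tf.compat.v1.disable_eager_execution()
    elif eager:         # TF1 starts in Graph, so only Eager needs a call
        tf.enable_eager_execution()
    return tf.executing_eagerly()


# Usage (requires TensorFlow to be installed):
#   import tensorflow as tf
#   print(set_execution_mode(tf, eager=False))
```

Printing the returned value at the top of each run is a cheap guard against benchmarking a different mode than intended.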


  • Careful with _on_batch() methods in TF2; according to the TF dev, they still use a slower implementation, but not intentionally – i.e. it’s to be fixed. See other answer for details.


REQUESTS TO TENSORFLOW DEVS:

  1. Please fix train_on_batch(), and the performance aspect of calling fit() iteratively; custom train loops are important to many, especially to me.
  2. Add documentation / docstring mention of these performance differences for users’ knowledge.
  3. Improve general execution speed to keep peeps from hopping to Pytorch.



  • 11/14/19 – found a model (in my real application) that runs slower on TF2 for all* configurations w/ Numpy input data. Differences ranged 13-19%, averaging 17%. Differences between keras and tf.keras, however, were more dramatic: 18-40%, avg. 32% (both TF1 & 2). (* – except Eager, for which TF2 OOM'd)

  • 11/17/19 – devs updated on_batch() methods in a recent commit, stating to have improved speed – to be released in TF 2.1, or available now as tf-nightly. As I’m unable to get latter running, will delay benching until 2.1.

  • 2/20/20 – prediction performance is also worth benching; in TF2, for example, CPU prediction times can involve periodic spikes
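
To check for the periodic spikes mentioned in the last update, per-call durations have to be kept rather than averaged. A minimal sketch follows; `time_calls`, `find_spikes`, and the threshold of 3x the median are arbitrary choices made here for illustration.

```python
from statistics import median
from time import perf_counter


def time_calls(func, iterations, *args, **kwargs):
    """Return a list with the wall-clock duration of each call."""
    durations = []
    for _ in range(iterations):
        t0 = perf_counter()
        func(*args, **kwargs)
        durations.append(perf_counter() - t0)
    return durations


def find_spikes(durations, factor=3.0):
    """Return (index, duration) for calls slower than `factor` x the median."""
    threshold = factor * median(durations)
    return [(i, d) for i, d in enumerate(durations) if d > threshold]


# Usage would look like:
#   durations = time_calls(model.predict, 100, X)
#   print(find_spikes(durations))
```

If the returned indices are evenly spaced, the spikes are periodic rather than random noise.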

Answer 1




THIS ANSWER: aims to provide a detailed, graph/hardware-level description of the issue – including TF2 vs. TF1 train loops, input data processors, and Eager vs. Graph mode executions. For an issue summary & resolution guidelines, see my other answer.

PERFORMANCE VERDICT: sometimes one is faster, sometimes the other, depending on configuration. As far as TF2 vs TF1 goes, they’re about on par on average, but significant config-based differences do exist, and TF1 trumps TF2 more often than vice versa. See “BENCHMARKING” below.

EAGER VS. GRAPH: the meat of this entire answer for some: TF2’s eager is slower than TF1’s, according to my testing. Details further down.

The fundamental difference between the two is: Graph sets up a computational network proactively, and executes when ‘told to’ – whereas Eager executes everything upon creation. But the story only begins here:

  • Eager is NOT devoid of Graph, and may in fact be mostly Graph, contrary to expectation. What it largely is, is executed Graph – this includes model & optimizer weights, comprising a great portion of the graph.

  • Eager rebuilds part of its own graph at execution, a direct consequence of Graph not being fully built; see profiler results. This has a computational overhead.

  • Eager is slower w/ Numpy inputs; per this Git comment & code, Numpy inputs in Eager include the overhead cost of copying tensors from CPU to GPU. Stepping through source code, data handling differences are clear; Eager directly passes Numpy, while Graph passes tensors which then evaluate to Numpy; uncertain of the exact process, but latter should involve GPU-level optimizations

  • TF2 Eager is slower than TF1 Eager – this is… unexpected. See benchmarking results below. Differences span from negligible to significant, but are consistent. Unsure why it’s the case – if a TF dev clarifies, will update answer.

TF2 vs. TF1: quoting relevant portions of a TF dev’s, Q. Scott Zhu’s, response – w/ bit of my emphasis & rewording:

In eager, the runtime needs to execute the ops and return the numerical value for every line of python code. The nature of single step execution causes it to be slow.

In TF2, Keras leverages tf.function to build its graph for training, eval and prediction. We call them “execution function” for the model. In TF1, the “execution function” was a FuncGraph, which shared some common component as TF function, but has a different implementation.

During the process, we somehow left an incorrect implementation for train_on_batch(), test_on_batch() and predict_on_batch(). They are still numerically correct, but the execution function for x_on_batch is a pure python function, rather than a tf.function wrapped python function. This will cause slowness

In TF2, we convert all input data into a tf.data.Dataset, by which we can unify our execution function to handle the single type of the inputs. There might be some overhead in the dataset conversion, and I think this is a one-time only overhead, rather than a per-batch cost

With the last sentence of last paragraph above, and last clause of below paragraph:

To overcome the slowness in eager mode, we have @tf.function, which will turn a python function into a graph. When feed numerical value like np array, the body of the tf.function is converted into static graph, being optimized, and return the final value, which is fast and should have similar performance as TF1 graph mode.

I disagree – per my profiling results, which show Eager’s input data processing to be substantially slower than Graph’s. Also, unsure about tf.data.Dataset in particular, but Eager does repeatedly call multiple of the same data conversion methods – see profiler.

Lastly, dev’s linked commit: Significant number of changes to support the Keras v2 loops.

Train Loops: depending on (1) Eager vs. Graph and (2) input data format, training will proceed with a distinct train loop. In TF2, this is chosen in _select_training_loop() in training.py, and is one of:

              training_v2.Loop()) # multi-worker mode
# Case 1: distribution strategy
# Case 2: generator-like. Input is Python generator, or Sequence object,
# or a non-distributed Dataset or iterator in eager execution.
# Case 3: Symbolic tensors or Numpy array-like. This includes Datasets and iterators 
# in graph mode (since they generate symbolic tensors).
training_generator.GeneratorLikeTrainingLoop() # Eager
training_arrays.ArrayLikeTrainingLoop() # Graph

Each handles resource allocation differently, and bears consequences on performance & capability.

Train Loops: fit vs train_on_batch, keras vs. tf.keras: each of the four uses different train loops, though perhaps not in every possible combination. keras' fit, for example, uses a form of fit_loop, e.g. training_arrays.fit_loop(), and its train_on_batch may use K.function(). tf.keras has a more sophisticated hierarchy, described in part in the previous section.

Train Loops: documentation — relevant source docstring on some of the different execution methods:

Unlike other TensorFlow operations, we don’t convert python numerical inputs to tensors. Moreover, a new graph is generated for each distinct python numerical value

function instantiates a separate graph for every unique set of input shapes and datatypes.

A single tf.function object might need to map to multiple computation graphs under the hood. This should be visible only as performance (tracing graphs has a nonzero computational and memory cost)
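
The docstrings quoted above describe a cache of graphs keyed by input signature. The toy model below imitates only that bookkeeping in plain Python to show when a new trace would happen; it is not TensorFlow's implementation, and `FakeTensor` and `ToyTracedFunction` are hypothetical names.

```python
from collections import namedtuple

# Hypothetical stand-in for an array or tensor: only shape and dtype matter here.
FakeTensor = namedtuple("FakeTensor", ["shape", "dtype"])


class ToyTracedFunction:
    """Counts how many distinct 'graphs' a function would need."""

    def __init__(self, fn):
        self.fn = fn
        self.graphs = {}      # signature -> "graph" (here simply the Python function)
        self.trace_count = 0

    @staticmethod
    def _signature(arg):
        # Tensor-like inputs are keyed by shape and dtype, not by their values.
        if hasattr(arg, "shape") and hasattr(arg, "dtype"):
            return ("tensor", tuple(arg.shape), str(arg.dtype))
        # Plain Python values are keyed by the value itself.
        return ("python", type(arg).__name__, arg)

    def __call__(self, *args):
        key = tuple(self._signature(a) for a in args)
        if key not in self.graphs:
            self.trace_count += 1          # a new signature costs one trace
            self.graphs[key] = self.fn
        return self.graphs[key](*args)
```

In this model, feeding batches of one fixed shape traces once, a new sequence length traces again, and every distinct Python number traces again, which mirrors the cost the docstrings warn about.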

Input data processors: similar to above, the processor is selected case-by-case, depending on internal flags set according to runtime configurations (execution mode, data format, distribution strategy). The simplest case’s with Eager, which works directly w/ Numpy arrays. For some specific examples, see this answer.


MODEL SIZE, DATA SIZE:

  • Is decisive; no single configuration crowned itself atop all model & data sizes.
  • Data size relative to model size is important; for small data & model, data transfer (e.g. CPU to GPU) overhead can dominate. Likewise, processors with small overhead can run slower on large data, where data conversion time dominates (see convert_to_tensor in “PROFILER”)
  • Speed differs per train loops’ and input data processors’ differing means of handling resources.

BENCHMARKS: the grinded meat. Word Document, Excel Spreadsheet


  • %-less numbers are all seconds
  • % computed as (1 - longer_time / shorter_time)*100; rationale: we’re interested by what factor one is faster than the other; shorter / longer is actually a non-linear relation, not useful for direct comparison
  • % sign determination:
    • TF2 vs TF1: + if TF2 is faster
    • GvE (Graph vs. Eager): + if Graph is faster
  • TF2 = TensorFlow 2.0.0 + Keras 2.3.1; TF1 = TensorFlow 1.14.0 + Keras 2.2.5
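
The percentage convention above can be written out as a small function. This is a sketch of my reading of it: the magnitude is the absolute value of the stated formula, and the sign is then set by which side is faster. `signed_pct` is a name introduced here, and the timings are made up for illustration.

```python
def signed_pct(time_a, time_b):
    """Percentage by which the faster of two timings wins; + if `time_a` is faster.

    Magnitude follows the formula above, (1 - longer / shorter) * 100, taken as an
    absolute value; the sign is then assigned by the convention in the list.
    """
    shorter, longer = min(time_a, time_b), max(time_a, time_b)
    magnitude = abs(1 - longer / shorter) * 100
    return magnitude if time_a <= time_b else -magnitude


# Made-up timings in seconds per iteration, for illustration only:
tf2_time, tf1_time = 0.30, 0.20
print("TF2 vs TF1: %+.0f%%" % signed_pct(tf2_time, tf1_time))
```

With these numbers TF2 takes 1.5x as long and is reported as -50%; the alternative ratio shorter / longer would give about 33%, which understates the factor.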


PROFILER – Explanation: Spyder 3.3.6 IDE profiler.

  • Some functions are repeated in nests of others; hence, it’s hard to track down the exact separation between “data processing” and “training” functions, so there will be some overlap – as pronounced in the very last result.

  • % figures computed w.r.t. runtime minus build time

  • Build time computed by summing all (unique) runtimes which were called 1 or 2 times
  • Train time computed by summing all (unique) runtimes which were called the same # of times as the # of iterations, and some of their nests’ runtimes
  • Functions are profiled according to their original names, unfortunately (i.e. _func = func will profile as func), which mixes in build time – hence the need to exclude it
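
The build/train split described above amounts to bucketing profiler rows by call count. The sketch below assumes the profiler table has been exported as flat, non-overlapping (name, n_calls, total_seconds) rows, which is a hypothetical format; it ignores the nested runtimes mentioned above.

```python
def split_build_and_train(records, iterations):
    """Split profiler rows into build and train time using call counts.

    `records` is a list of (function_name, n_calls, total_seconds) rows.
    Rows called 1 or 2 times count as build; rows called exactly
    `iterations` times count as train. Other rows are ignored.
    """
    build = sum(t for _, n, t in records if n in (1, 2))
    train = sum(t for _, n, t in records if n == iterations)
    return build, train


def pct_of_runtime(seconds, total_runtime, build):
    """Share of (runtime minus build time), as used for the % figures."""
    return 100 * seconds / (total_runtime - build)
```

The heuristic breaks down when `iterations` is itself 1 or 2, since build and train rows can no longer be told apart by call count.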


TESTING ENVIRONMENT:

  • Executed code at bottom w/ minimal background tasks running
  • GPU was “warmed up” w/ a few iterations before timing iterations, as suggested in this post
  • CUDA 10.0.130, cuDNN 7.6.0, TensorFlow 1.14.0, & TensorFlow 2.0.0 built from source, plus Anaconda
  • Python 3.7.4, Spyder 3.3.6 IDE
  • GTX 1070, Windows 10, 24GB DDR4 2.4-GHz RAM, i7-7700HQ 2.8-GHz CPU


METHODOLOGY:

  • Benchmark ‘small’, ‘medium’, & ‘large’ model & data sizes
  • Fix # of parameters for each model size, independent of input data size
  • “Larger” model has more parameters and layers
  • “Larger” data has a longer sequence, but same batch_size and num_channels
  • Models only use Conv1D, Dense ‘learnable’ layers; RNNs avoided due to implementation differences between TF versions
  • Always ran one train fit outside of benchmarking loop, to omit model & optimizer graph building
  • Not using sparse data (e.g. layers.Embedding()) or sparse targets (e.g. SparseCategoricalCrossentropy())

LIMITATIONS: a “complete” answer would explain every possible train loop & iterator, but that’s surely beyond my time ability, nonexistent paycheck, or general necessity. The results are only as good as the methodology – interpret with an open mind.


import numpy as np
import tensorflow as tf
import random
from termcolor import cprint
from time import time

from tensorflow.keras.layers import Input, Dense, Conv1D
from tensorflow.keras.layers import Dropout, GlobalAveragePooling1D
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
import tensorflow.keras.backend as K
#from keras.layers import Input, Dense, Conv1D
#from keras.layers import Dropout, GlobalAveragePooling1D
#from keras.models import Model 
#from keras.optimizers import Adam
#import keras.backend as K


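def reset_seeds(reset_graph_with_backend=None, verbose=1):
    # Optionally clear the Keras session and the default graph between runs
    if reset_graph_with_backend is not None:
        K = reset_graph_with_backend
        K.clear_session()
        tf.compat.v1.reset_default_graph()
        if verbose:
            print("KERAS AND TENSORFLOW GRAPHS RESET")

    # Reseed every RNG in use; the TF seeding call differs between TF1 and TF2
    np.random.seed(1)
    random.seed(2)
    if tf.__version__[0] == '2':
        tf.random.set_seed(3)
    else:
        tf.set_random_seed(3)
    if verbose:
        print("RANDOM SEEDS RESET")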

print("TF version: {}".format(tf.__version__))

def timeit(func, iterations, *args, _verbose=0, **kwargs):
    t0 = time()
    for _ in range(iterations):
        func(*args, **kwargs)
    print("Time/iter: %.4f sec" % ((time() - t0) / iterations))

def make_model_small(batch_shape):
    ipt   = Input(batch_shape=batch_shape)
    x     = Conv1D(128, 40, strides=4, padding='same')(ipt)
    x     = GlobalAveragePooling1D()(x)
    x     = Dropout(0.5)(x)
    x     = Dense(64, activation='relu')(x)
    out   = Dense(1,  activation='sigmoid')(x)
    model = Model(ipt, out)
    model.compile(Adam(lr=1e-4), 'binary_crossentropy')
    return model

def make_model_medium(batch_shape):
    ipt = Input(batch_shape=batch_shape)
    x = ipt
    for filters in [64, 128, 256, 256, 128, 64]:
        x  = Conv1D(filters, 20, strides=1, padding='valid')(x)
    x     = GlobalAveragePooling1D()(x)
    x     = Dense(256, activation='relu')(x)
    x     = Dropout(0.5)(x)
    x     = Dense(128, activation='relu')(x)
    x     = Dense(64,  activation='relu')(x)
    out   = Dense(1,   activation='sigmoid')(x)
    model = Model(ipt, out)
    model.compile(Adam(lr=1e-4), 'binary_crossentropy')
    return model

def make_model_large(batch_shape):
    ipt   = Input(batch_shape=batch_shape)
    x     = Conv1D(64,  400, strides=4, padding='valid')(ipt)
    x     = Conv1D(128, 200, strides=1, padding='valid')(x)
    for _ in range(40):
        x = Conv1D(256,  12, strides=1, padding='same')(x)
    x     = Conv1D(512,  20, strides=2, padding='valid')(x)
    x     = Conv1D(1028, 10, strides=2, padding='valid')(x)
    x     = Conv1D(256,   1, strides=1, padding='valid')(x)
    x     = GlobalAveragePooling1D()(x)
    x     = Dense(256, activation='relu')(x)
    x     = Dropout(0.5)(x)
    x     = Dense(128, activation='relu')(x)
    x     = Dense(64,  activation='relu')(x)    
    out   = Dense(1,   activation='sigmoid')(x)
    model = Model(ipt, out)
    model.compile(Adam(lr=1e-4), 'binary_crossentropy')
    return model

def make_data(batch_shape):
    return np.random.randn(*batch_shape), \
           np.random.randint(0, 2, (batch_shape[0], 1))

def make_data_tf(batch_shape, n_batches, iters):
    data = np.random.randn(n_batches, *batch_shape),
    trgt = np.random.randint(0, 2, (n_batches, batch_shape[0], 1))
    return tf.data.Dataset.from_tensor_slices((data, trgt))#.repeat(iters)

batch_shape_small  = (32, 140,   30)
batch_shape_medium = (32, 1400,  30)
batch_shape_large  = (32, 14000, 30)

batch_shapes = batch_shape_small, batch_shape_medium, batch_shape_large
make_model_fns = make_model_small, make_model_medium, make_model_large
iterations = [200, 100, 50]
shape_names = ["Small data",  "Medium data",  "Large data"]
model_names = ["Small model", "Medium model", "Large model"]

def test_all(fit=False, tf_dataset=False):
    for model_fn, model_name, iters in zip(make_model_fns, model_names, iterations):
        for batch_shape, shape_name in zip(batch_shapes, shape_names):
            # the large model's 'valid' convolutions do not fit the small input
            if (model_fn is make_model_large) and (batch_shape is batch_shape_small):
                continue
            if tf_dataset:
                data = make_data_tf(batch_shape, iters, iters)
            else:
                data = make_data(batch_shape)
            model = model_fn(batch_shape)

            if fit:
                if tf_dataset:
                    t0 = time()
                    model.fit(data, steps_per_epoch=iters)
                    print("Time/iter: %.4f sec" % ((time() - t0) / iters))
                else:
                    timeit(model.fit, iters, *data, _verbose=1, verbose=0)
            else:
                timeit(model.train_on_batch, iters, *data, _verbose=1)
            cprint(">> {}, {} done <<\n".format(model_name, shape_name), 'blue')
            del model

test_all(fit=True, tf_dataset=False)

Tensorflow 2.0 - AttributeError: module 'tensorflow' has no attribute 'Session'

Question: Tensorflow 2.0 - AttributeError: module 'tensorflow' has no attribute 'Session'

When I execute the command sess = tf.Session() in a Tensorflow 2.0 environment, I get the error message below:

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: module 'tensorflow' has no attribute 'Session'

System Information:

  • OS Platform and Distribution: Windows 10
  • Python Version: 3.7.1
  • Tensorflow Version: 2.0.0-alpha0 (installed with pip)

Steps to reproduce:


  1. pip install --upgrade pip
  2. pip install tensorflow==2.0.0-alpha0
  3. pip install keras
  4. pip install numpy==1.16.2


  1. Execute command: import tensorflow as tf
  2. Execute command: sess = tf.Session()

Answer 0

According to TF 1:1 Symbols Map, in TF 2.0 you should use tf.compat.v1.Session() instead of tf.Session()


To get TF 1.x like behaviour in TF 2.0 one can run

import tensorflow.compat.v1 as tf

but then one cannot benefit from many of the improvements made in TF 2.0. For more details please refer to the migration guide https://www.tensorflow.org/guide/migrate
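
For code that has to run under both major versions, the lookup described above can be wrapped in a small helper. This is only a sketch: get_session_class is a hypothetical name, and it assumes the module exposes Session either at the top level (TF 1.x) or under compat.v1 (TF 2.x). The module is passed in as an argument, so the helper itself does not import TensorFlow.

```python
def get_session_class(tf_module):
    """Return the Session class of the given TensorFlow module.

    Hypothetical helper: prefers the TF 1.x top-level name and falls
    back to the compat.v1 alias that TF 2.x provides.
    """
    if hasattr(tf_module, "Session"):
        return tf_module.Session
    return tf_module.compat.v1.Session


# Usage (requires TensorFlow to be installed):
#   import tensorflow as tf
#   Session = get_session_class(tf)
#   sess = Session()
```

Note that obtaining a Session this way does not by itself switch TF 2.x out of eager execution; tensors still have to be created in graph mode before sess.run() can evaluate them.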

Answer 1

TF2 runs eager execution by default, which removes the need for Sessions. If you want to run static graphs, the more appropriate way in TF2 is to use tf.function(). While Session can still be accessed via tf.compat.v1.Session() in TF2, I would discourage using it. Comparing the two hello worlds may help to show the difference:

TF1.x hello world:

import tensorflow as tf
msg = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(msg))

TF2.x hello world:

import tensorflow as tf
msg = tf.constant('Hello, TensorFlow!')
tf.print(msg)

For more info, see Effective TensorFlow 2

Answer 2

I faced this problem when I first tried Python after installing Windows 10 + Python 3.7 (64-bit) + Anaconda3 + Jupyter notebook.

I solved this problem by referring to “https://vispud.blogspot.com/2019/05/tensorflow200a0-attributeerror-module.html”

I agree with the statement:

I believe “Session()” has been removed with TF 2.0.

I inserted two lines. One is tf.compat.v1.disable_eager_execution() and the other is sess = tf.compat.v1.Session()

My Hello.py is as follows:

import tensorflow as tf

tf.compat.v1.disable_eager_execution()

hello = tf.constant('Hello, TensorFlow!')

sess = tf.compat.v1.Session()

print(sess.run(hello))

Answer 3

For TF2.x, you can do it like this:

import tensorflow as tf
with tf.compat.v1.Session() as sess:
    hello = tf.constant('hello world')
    print(sess.run(hello))

>>> b'hello world'

Answer 4

Try this:

import tensorflow as tf

tf.compat.v1.disable_eager_execution()

hello = tf.constant('Hello, TensorFlow!')

sess = tf.compat.v1.Session()

print(sess.run(hello))

Answer 5

If this is your code, the correct solution is to rewrite it to not use Session(), since that’s no longer necessary in TensorFlow 2

If this is just code you’re running, you can downgrade to TensorFlow 1 by running

pip3 install --upgrade --force-reinstall tensorflow-gpu==1.15.0 

(or whatever the latest version of TensorFlow 1 is)

Answer 6

Tensorflow 2.x uses eager execution by default, hence Session is not supported.

Answer 7

Using Anaconda + Spyder (Python 3.7)


import tensorflow as tf
tf.compat.v1.disable_eager_execution()  # needed on TF 2.x so that sess.run() works
valor1 = tf.constant(2)
valor2 = tf.constant(3)
soma = valor1 + valor2  # sum of the two constants (variable name chosen here)
sess = tf.compat.v1.Session()
with sess:
    print(sess.run(soma))

Console output:

print(valor1)
Tensor("Const_8:0", shape=(), dtype=int32)

type(valor1)
Out[18]: tensorflow.python.framework.ops.Tensor

print(soma)
Tensor("add_4:0", shape=(), dtype=int32)

with sess:
    print(sess.run(soma))
5

Answer 8

TF v2.0 uses eager mode by default, as opposed to the graph mode of v1.0. Hence, tf.Session() is not supported in v2.0, and I would suggest rewriting your code to work in eager mode.

Answer 9

import tensorflow as tf
sess = tf.Session()

This code raises an AttributeError on version 2.x.

To use version 1.x code in version 2.x, try this:

import tensorflow.compat.v1 as tf
sess = tf.Session()





I’m running a Keras model with a submission deadline in 36 hours. If I train my model on the CPU it will take approximately 50 hours. Is there a way to run Keras on a GPU?

I’m using the Tensorflow backend and running it in a Jupyter notebook, without Anaconda installed.

Answer 0

Yes, you can run Keras models on a GPU. There are a few things you will have to check first:

  1. Your system has a GPU (Nvidia, as AMD doesn’t work yet).
  2. You have installed the GPU version of tensorflow.
  3. You have installed CUDA (installation instructions).
  4. Verify that tensorflow is running with the GPU (check if GPU is working):

sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

or

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

The output will be something like this:

  name: "/cpu:0" device_type: "CPU",
  name: "/gpu:0" device_type: "GPU"

Once all this is done, your model will run on the GPU.

To check if keras (>=2.1.1) is using the GPU:

from keras import backend as K
K.tensorflow_backend._get_available_gpus()

All the best.

Answer 1

Sure. I suppose that you have already installed TensorFlow for GPU.

You need to add the following block after importing keras. I am working on a machine that has a 56-core CPU and a GPU.

import keras
import tensorflow as tf

config = tf.ConfigProto( device_count = {'GPU': 1 , 'CPU': 56} )
sess = tf.Session(config=config)
keras.backend.set_session(sess)

Of course, this usage pushes my machine to its maximum limits. You can decrease the CPU and GPU consumption values.

Answer 2

2.0 Compatible Answer: While the answers above explain in detail how to use a GPU with a Keras model, I want to explain how it can be done for Tensorflow version 2.0.

To know how many GPUs are available, we can use the below code:

print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))

To find out which devices your operations and tensors are assigned to, put tf.debugging.set_log_device_placement(True) as the first statement of your program.

Enabling device placement logging causes any Tensor allocations or operations to be printed. For example, running the below code:

import tensorflow as tf
tf.debugging.set_log_device_placement(True)

# Create some tensors
a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
c = tf.matmul(a, b)

print(c)

gives the output shown below:

Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0
tf.Tensor(
[[22. 28.]
 [49. 64.]], shape=(2, 2), dtype=float32)

For more information, refer to this link

Answer 3

Of course. If you are running on the Tensorflow or CNTK backends, your code will run on your GPU devices by default. But with the Theano backend, you can use the following

Theano flags:

THEANO_FLAGS=device=gpu,floatX=float32 python my_keras_script.py
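
The same flags can also be set from inside Python, as long as this happens before Theano (or Keras with the Theano backend) is imported. A minimal sketch; theano_flags is a hypothetical helper that only builds the comma-separated string:

```python
import os


def theano_flags(**flags):
    """Build a THEANO_FLAGS string such as 'device=gpu,floatX=float32'."""
    return ",".join("{}={}".format(key, value) for key, value in flags.items())


# Must be set before Theano is imported, because the flags are read at import time.
os.environ["THEANO_FLAGS"] = theano_flags(device="gpu", floatX="float32")

# import keras  # import only after the variable is set
```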

Answer 4

Check in Task Manager whether your script is actually using the GPU. If not, check whether your CUDA version is the right one for the tensorflow version you are using, as the other answers already suggested.

Additionally, a cuDNN library matching the CUDA version is required to run tensorflow on the GPU. Download/extract it from here and put the DLL (e.g., cudnn64_7.dll) into the CUDA bin folder (e.g., C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin).
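
To check whether the DLL can actually be found, one option is to search the directories on PATH for it. This is a rough sketch with a hypothetical helper name (find_on_path); the file name is the example from the text and will differ for other cuDNN versions.

```python
import os


def find_on_path(filename, search_dirs=None):
    """Return the directories in search_dirs that contain filename.

    search_dirs defaults to the entries of the PATH environment variable.
    """
    if search_dirs is None:
        search_dirs = os.environ.get("PATH", "").split(os.pathsep)
    return [d for d in search_dirs
            if d and os.path.isfile(os.path.join(d, filename))]


if __name__ == "__main__":
    # cudnn64_7.dll is the example name from the text; adjust it for your version.
    hits = find_on_path("cudnn64_7.dll")
    print(hits if hits else "cudnn64_7.dll not found on PATH")
```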

How to fix 'Object arrays cannot be loaded when allow_pickle=False' for imdb.load_data() function?

Question: How to fix 'Object arrays cannot be loaded when allow_pickle=False' for imdb.load_data() function?

I’m trying to implement the binary classification example using the IMDb dataset in Google Colab. I have implemented this model before. But when I tried to do it again after a few days, it returned a value error: 'Object arrays cannot be loaded when allow_pickle=False' for the load_data() function.

I have already tried solving this, referring to an existing answer for a similar problem: How to fix ‘Object arrays cannot be loaded when allow_pickle=False’ in the sketch_rnn algorithm. But it turns out that just adding an allow_pickle argument isn’t sufficient.

My code:

from keras.datasets import imdb
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)

The error:

ValueError                                Traceback (most recent call last)
<ipython-input-1-2ab3902db485> in <module>()
      1 from keras.datasets import imdb
----> 2 (train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)

2 frames
/usr/local/lib/python3.6/dist-packages/keras/datasets/imdb.py in load_data(path, num_words, skip_top, maxlen, seed, start_char, oov_char, index_from, **kwargs)
     57                     file_hash='599dadb1135973df5b59232a0e9a887c')
     58     with np.load(path) as f:
---> 59         x_train, labels_train = f['x_train'], f['y_train']
     60         x_test, labels_test = f['x_test'], f['y_test']

/usr/local/lib/python3.6/dist-packages/numpy/lib/npyio.py in __getitem__(self, key)
    260                 return format.read_array(bytes,
    261                                          allow_pickle=self.allow_pickle,
--> 262                                          pickle_kwargs=self.pickle_kwargs)
    263             else:
    264                 return self.zip.read(key)

/usr/local/lib/python3.6/dist-packages/numpy/lib/format.py in read_array(fp, allow_pickle, pickle_kwargs)
    690         # The array contained Python objects. We need to unpickle the data.
    691         if not allow_pickle:
--> 692             raise ValueError("Object arrays cannot be loaded when "
    693                              "allow_pickle=False")
    694         if pickle_kwargs is None:

ValueError: Object arrays cannot be loaded when allow_pickle=False

Answer 0

Here’s a trick to force imdb.load_data to allow pickle by, in your notebook, replacing this line:

(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)

by this:

import numpy as np
# save np.load
np_load_old = np.load

# modify the default parameters of np.load
np.load = lambda *a,**k: np_load_old(*a, allow_pickle=True, **k)

# call load_data with allow_pickle implicitly set to true
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)

# restore np.load for future normal usage
np.load = np_load_old
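
The same trick can be wrapped in a context manager so that np.load is restored even if load_data raises. This is a sketch, not part of Keras or NumPy: allow_pickle_load is a hypothetical name, and the module is passed in as an argument so the helper works with any object that has a load attribute.

```python
from contextlib import contextmanager
import functools


@contextmanager
def allow_pickle_load(np_module):
    """Temporarily make np_module.load default to allow_pickle=True."""
    original = np_module.load

    @functools.wraps(original)
    def patched(*args, **kwargs):
        kwargs.setdefault("allow_pickle", True)  # an explicit argument still wins
        return original(*args, **kwargs)

    np_module.load = patched
    try:
        yield
    finally:
        np_module.load = original  # restored even if loading fails


# Usage (requires numpy and keras):
#   import numpy as np
#   from keras.datasets import imdb
#   with allow_pickle_load(np):
#       (train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)
```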

Answer 1

This issue is still open on the keras git repository. I hope it gets solved as soon as possible. Until then, try downgrading your numpy version to 1.16.2 or earlier (the command below installs 1.16.1). It seems to solve the problem.

!pip install numpy==1.16.1
import numpy as np

This version of numpy has the default value of allow_pickle as True.
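
To tell whether an installed version is affected, the version string can be compared against the release where the default changed. A small sketch, assuming the change landed in NumPy 1.16.3 (consistent with the versions reported in this thread); allow_pickle_defaults_to_true is a hypothetical helper name.

```python
import re


def allow_pickle_defaults_to_true(numpy_version):
    """Guess np.load's allow_pickle default from a version string.

    Assumption: the default became False in NumPy 1.16.3.
    Only the leading major.minor.patch numbers are compared.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", numpy_version)
    if match is None:
        raise ValueError("unrecognised version: %r" % numpy_version)
    major, minor, patch = (int(group or 0) for group in match.groups())
    return (major, minor, patch) < (1, 16, 3)


# Usage (requires numpy):
#   import numpy as np
#   print(allow_pickle_defaults_to_true(np.__version__))
```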

Answer 2

Following this issue on GitHub, the official solution is to edit the imdb.py file. This fix worked well for me without the need to downgrade numpy. Find the imdb.py file at tensorflow/python/keras/datasets/imdb.py (full path for me was: C:\Anaconda\Lib\site-packages\tensorflow\python\keras\datasets\imdb.py – other installs will be different) and change line 85 as per the diff:

-  with np.load(path) as f:
+  with np.load(path, allow_pickle=True) as f:

The reason for the change is security: it prevents the Python equivalent of an SQL injection via a pickled file. The change above will ONLY affect the imdb data, so you retain the security elsewhere (by not downgrading numpy).

Answer 3

I just used allow_pickle = True as an argument to np.load() and it worked for me.

Answer 4

In my case, it worked with:

np.load(path, allow_pickle=True)

Answer 5

I think the answer from cheez (https://stackoverflow.com/users/122933/cheez) is the easiest and most effective one. I’d elaborate on it a little so that it does not modify a numpy function for the whole session.

My suggestion is below. I’m using it to download the reuters dataset from keras, which shows the same kind of error:

import numpy as np

old = np.load
np.load = lambda *a,**k: old(*a,**k,allow_pickle=True)

from keras.datasets import reuters
(train_data, train_labels), (test_data, test_labels) = reuters.load_data(num_words=10000)

np.load = old

Answer 6



You can try changing the flag’s value


Answer 7

None of the solutions listed above worked for me: I run Anaconda with Python 3.7.3. What worked for me was:

  • run “conda install numpy==1.16.1” from Anaconda PowerShell

  • close and reopen the notebook

Answer 8

In Jupyter notebook, using

np_load_old = np.load

# modify the default parameters of np.load
np.load = lambda *a,**k: np_load_old(*a, allow_pickle=True, **k)

worked fine, but a problem appears when you use this method in Spyder (you have to restart the kernel every time, or you will get an error like):

TypeError: <lambda>() got multiple values for keyword argument 'allow_pickle'

I solved this issue using the solution here:
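
One plausible cause of this error is running the patch a second time without restoring np.load in between: np_load_old then points at the already patched lambda, which receives allow_pickle twice. The sketch below reproduces that with a stand-in object instead of numpy, so it runs without NumPy installed; fake_np and run_patch_cell are illustrative names only.

```python
import types

# Stand-in for the numpy module so the demo runs without NumPy installed.
fake_np = types.SimpleNamespace(
    load=lambda path, allow_pickle=False: (path, allow_pickle))


def run_patch_cell():
    """Mimics re-running the notebook cell that installs the patch."""
    global np_load_old
    np_load_old = fake_np.load
    fake_np.load = lambda *a, **k: np_load_old(*a, allow_pickle=True, **k)


run_patch_cell()
first_result = fake_np.load("data.npz")   # patch applied once: works

run_patch_cell()                          # applied again, never restored
try:
    fake_np.load("data.npz")
    second_error = None
except TypeError as exc:
    second_error = exc                    # allow_pickle was passed twice

print(first_result)
print(type(second_error).__name__)
```

Restoring np.load after each use, or restarting the kernel as the answer notes, avoids the double wrapping.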

Answer 9

I landed here, tried the approaches above, and could not figure it out.

I was actually working on pre-given code where


was used, so I replaced it with

np.load(path, allow_pickle=True)

Answer 10

Yes, installing a previous version of numpy solved the problem.

For those who use the PyCharm IDE:

In my IDE (PyCharm), under File->Settings->Project Interpreter, I found my numpy to be 1.16.3, so I reverted to 1.16.1. Click +, type numpy in the search, tick “specify version”: 1.16.1 and choose install package.

Answer 11

Find the path to imdb.py, then just add the flag to np.load(path,…flag…)

    def load_data(.......):
    - with np.load(path) as f:
    + with np.load(path,allow_pickle=True) as f:

Answer 12

It works for me:

import numpy as np
from keras.datasets import reuters

np_load_old = np.load
np.load = lambda *a: np_load_old(*a, allow_pickle=True)
(x_train, y_train), (x_test, y_test) = reuters.load_data(num_words=None, test_split=0.2)
np.load = np_load_old

Answer 13

What I have found is that TensorFlow 2.0 (I am using 2.0.0-alpha0) is not compatible with the latest version of Numpy i.e. v1.17.0 (and possibly v1.16.5+). As soon as TF2 is imported, it throws a huge list of FutureWarning, that looks something like this:

FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/anaconda3/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/anaconda3/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/anaconda3/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.

This also resulted in the allow_pickle error when I tried to load the imdb dataset from keras.

I tried the following solution, which worked just fine, but I had to apply it in every single project where I was importing TF2 or tf.keras.

np.load = lambda *a,**k: np_load_old(*a, allow_pickle=True, **k)

The easiest solution I found was to either install numpy 1.16.1 globally, or use compatible versions of tensorflow and numpy in a virtual environment.

My goal with this answer is to point out that it’s not just a problem with imdb.load_data, but a larger problem caused by the incompatibility of TF2 and Numpy versions, which may result in many other hidden bugs or issues.

Answer 14

Tensorflow has a fix in tf-nightly version.

!pip install tf-nightly

The current version is ‘2.0.0-dev20190511’.

Answer 15

The answer from @cheez sometimes doesn’t work and recursively calls the function again and again. To solve this problem you should keep a separate copy of the function. You can do this with functools.partial, so the final code is:

import numpy as np
from functools import partial

# save np.load
np_load_old = partial(np.load)

# modify the default parameters of np.load
np.load = lambda *a,**k: np_load_old(*a, allow_pickle=True, **k)

# call load_data with allow_pickle implicitly set to true
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)

# restore np.load for future normal usage
np.load = np_load_old
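
Another way to guard against wrapping an already wrapped function is to mark the patched loader and refuse to patch it twice. This is only a sketch under that assumption, not the approach from the answer above; patch_load_once and the marker attributes are hypothetical names.

```python
def patch_load_once(np_module):
    """Make np_module.load default to allow_pickle=True, at most once.

    Returns the unpatched loader so the caller can restore it later.
    """
    current = np_module.load
    if getattr(current, "_is_allow_pickle_patch", False):
        return current._original           # already patched, do not wrap again

    def patched(*args, **kwargs):
        kwargs.setdefault("allow_pickle", True)
        return current(*args, **kwargs)     # 'current' is bound once, right here

    patched._is_allow_pickle_patch = True
    patched._original = current
    np_module.load = patched
    return current


# Usage (requires numpy and keras):
#   import numpy as np
#   original = patch_load_once(np)
#   ... imdb.load_data(num_words=10000) ...
#   np.load = original
```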

Answer 16

I don’t usually post to these things but this was super annoying. The confusion comes from the fact that some of the Keras imdb.py files have already updated:

with np.load(path) as f:

to the version with allow_pickle=True. Make sure to check the imdb.py file to see if this change has already been implemented. If it has been adjusted, the following works fine:

from keras.datasets import imdb
(train_text, train_labels), (test_text, test_labels) = imdb.load_data(num_words=10000)

Answer 17


The easiest way is to edit imdb.py and pass allow_pickle=True to np.load at the line where imdb.py throws the error.




I’m training a neural network for my project using Keras. Keras provides a function for early stopping. Which parameters should I monitor so that early stopping prevents my neural network from overfitting?

Answer 0

Early stopping basically means stopping the training once your loss starts to increase (or, in other words, validation accuracy starts to decrease). According to the documentation, it is used as follows:

keras.callbacks.EarlyStopping(monitor='val_loss',
                              min_delta=0,
                              patience=0,
                              verbose=0, mode='auto')

The values depend on your implementation (problem, batch size, etc.), but generally, to prevent overfitting, I would use the following:

  1. Monitor the validation loss (you need to use cross validation or at least train/test sets) by setting the monitor argument to 'val_loss'.
  2. min_delta is a threshold for whether to count a loss at some epoch as an improvement or not. If the difference in loss is below min_delta, it is counted as no improvement. It is better to leave it at 0, since we’re interested in when the loss becomes worse.
  3. The patience argument represents the number of epochs before stopping once your loss starts to increase (stops improving). This depends on your implementation: if you use very small batches or a large learning rate, your loss will zig-zag (accuracy will be noisier), so it is better to set a large patience argument. If you use large batches and a small learning rate, your loss will be smoother, so you can use a smaller patience argument. Either way, I’ll leave it at 2 to give the model more chance.
  4. verbose decides what to print; leave it at the default (0).
  5. The mode argument depends on the direction of your monitored quantity (is it supposed to be decreasing or increasing). Since we monitor the loss, we can use min. But let’s leave Keras to handle that for us and set it to auto.

So I would use something like this and experiment by plotting the loss with and without early stopping.

keras.callbacks.EarlyStopping(monitor='val_loss',
                              min_delta=0,
                              patience=2,
                              verbose=0, mode='auto')

To clear up possible ambiguity about how callbacks work, I’ll try to explain more. Once you call fit(... callbacks=[es]) on your model, Keras calls predetermined functions on the given callback objects. These functions are named on_train_begin, on_train_end, on_epoch_begin, on_epoch_end, on_batch_begin and on_batch_end. The early stopping callback is called at the end of every epoch; it compares the best monitored value with the current one and stops if the conditions are met (how many epochs have passed since the best monitored value was observed and whether that is more than the patience argument, whether the difference from the last value is bigger than min_delta, etc.).

As pointed out by @BrentFaust in the comments, the model’s training will continue until either the early stopping conditions are met or the epochs parameter in fit() is reached (default=10 in Keras 1.x; Keras 2.x defaults to epochs=1). Setting an early stopping callback will not make the model train beyond its epochs parameter. So calling fit() with a larger epochs value benefits more from the early stopping callback.
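
The stopping rule described above can be sketched in plain Python. This is not Keras’s source code, only an illustration of the patience and min_delta logic for a monitored loss (mode 'min'); early_stopping_epoch is a hypothetical function name.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return the 0-based epoch at which training would stop, or None.

    A loss counts as an improvement only if it is lower than the best
    value seen so far by more than min_delta.
    """
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None  # ran through all epochs without triggering


losses = [1.00, 0.80, 0.90, 0.85, 0.70]
print(early_stopping_epoch(losses, patience=2))  # prints 3
```

With patience=2 the run stops at epoch 3 and never sees the better loss at epoch 4, which is why noisy loss curves call for a larger patience value.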




I have Keras installed with the Tensorflow backend and CUDA. I’d like to sometimes on demand force Keras to use CPU. Can this be done without say installing a separate CPU-only Tensorflow in a virtual environment? If so how? If the backend were Theano, the flags could be set, but I have not heard of Tensorflow flags accessible via Keras.

Answer 0

If you want to force Keras to use CPU

Way 1

import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"   # see issue #152
os.environ["CUDA_VISIBLE_DEVICES"] = ""

before Keras / Tensorflow is imported.

Way 2

Run your script as

$ CUDA_VISIBLE_DEVICES="" ./your_keras_code.py

See also

  1. https://github.com/keras-team/keras/issues/152
  2. https://github.com/fchollet/keras/issues/4613
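
Way 1 can be wrapped so that the choice is made on demand within one script. This is only a sketch: hide_gpus and the FORCE_CPU switch are hypothetical names, and the approach relies on the environment variables being set before Keras / TensorFlow is imported.

```python
import os


def hide_gpus():
    """Hide all CUDA devices from libraries imported afterwards."""
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    os.environ["CUDA_VISIBLE_DEVICES"] = ""


# Hypothetical switch; it could equally come from a command-line flag.
FORCE_CPU = True

if FORCE_CPU:
    hide_gpus()

# Only now import the framework, e.g.:
# import keras
print(repr(os.environ["CUDA_VISIBLE_DEVICES"]))
```

Setting the variables after the framework has already initialised CUDA has no effect, which is why the import is deferred until after the switch is evaluated.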

Answer 1

A rather separable way of doing this is to use

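import tensorflow as tf
from keras import backend as K

num_cores = 4

# GPU and CPU are booleans defined beforehand; set exactly one to True
if GPU:
    num_GPU = 1
    num_CPU = 1
if CPU:
    num_CPU = 1
    num_GPU = 0

# TF 1.x API (tf.ConfigProto / tf.Session)
config = tf.ConfigProto(intra_op_parallelism_threads=num_cores,
                        inter_op_parallelism_threads=num_cores,
                        allow_soft_placement=True,
                        device_count={'CPU': num_CPU,
                                      'GPU': num_GPU})

session = tf.Session(config=config)
K.set_session(session)  # make Keras use this session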

Here, with booleans GPU and CPU, we indicate whether we would like to run our code with the GPU or CPU by rigidly defining the number of GPUs and CPUs the Tensorflow session is allowed to access. The variables num_GPU and num_CPU define this value. num_cores then sets the number of CPU cores available for usage via intra_op_parallelism_threads and inter_op_parallelism_threads.

intra_op_parallelism_threads dictates the number of threads that a single parallel operation (one node in the computation graph) is allowed to use (intra), while inter_op_parallelism_threads defines the number of threads available for running independent operations across the nodes of the computation graph in parallel (inter).

allow_soft_placement allows operations to be run on the CPU if any of the following criteria are met:

  1. there is no GPU implementation for the operation

  2. there are no GPU devices known or registered

  3. there is a need to co-locate with other inputs from the CPU

All of this is executed in the constructor of my class before any other operations, and is completely separable from any model or other code I use.

Note: This requires tensorflow-gpu and cuda/cudnn to be installed because the option is given to use a GPU.


Answer 2

This worked for me (win10), place before you import keras:

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'

Answer 3

Just import tensorflow and use keras, it’s that easy. (Use '/cpu:0' instead of '/gpu:0' to keep the ops on the CPU.)

import tensorflow as tf
# your code here
with tf.device('/gpu:0'):
    model.fit(X, y, epochs=20, batch_size=128, callbacks=callbacks_list)

Answer 4

As per keras tutorial, you can simply use the same tf.device scope as in regular tensorflow:

with tf.device('/gpu:0'):
    x = tf.placeholder(tf.float32, shape=(None, 20, 64))
    y = LSTM(32)(x)  # all ops in the LSTM layer will live on GPU:0

with tf.device('/cpu:0'):
    x = tf.placeholder(tf.float32, shape=(None, 20, 64))
    y = LSTM(32)(x)  # all ops in the LSTM layer will live on CPU:0

Answer 5

I just spent some time figuring this out. Thoma’s answer is not complete. Say your program is test.py, and you want to use gpu0 to run it and keep the other GPUs free.

You should write CUDA_VISIBLE_DEVICES=0 python test.py

Notice it’s DEVICES not DEVICE
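
If you start your runs from a Python launcher rather than a shell, the same per-run setting can be sketched with subprocess. The helper name run_with_devices is hypothetical, and the child command below is only a stand-in for `python test.py` that reports what it sees.

```python
import os
import subprocess
import sys


def run_with_devices(args, devices):
    """Run a command with CUDA_VISIBLE_DEVICES set only for that child."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=devices)
    return subprocess.run(args, env=env, capture_output=True,
                          text=True, check=True)


# Stand-in for `python test.py`: a child that just reports the variable.
child = [sys.executable, "-c",
         "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"]
result = run_with_devices(child, "0")
print(result.stdout.strip())
```

Because the variable is passed only to the child process, the parent’s environment stays untouched, just like the shell prefix form.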

Answer 6



For people working on PyCharm, and for forcing CPU, you can add the following line in the Run/Debug configuration, under Environment variables:







I was wondering if it was possible to save a partly trained Keras model and continue the training after loading the model again.

The reason for this is that I will have more training data in the future and I do not want to retrain the whole model again.

The functions which I am using are:

#Partly train model
model.fit(first_training, first_classes, batch_size=32, nb_epoch=20)

#Save partly trained model
model.save('partly_trained.h5')

#Load partly trained model
from keras.models import load_model
model = load_model('partly_trained.h5')

#Continue training
model.fit(second_training, second_classes, batch_size=32, nb_epoch=20)

Edit 1: added fully working example

With the first dataset after 10 epochs the loss of the last epoch will be 0.0748 and the accuracy 0.9863.

After saving, deleting and reloading the model the loss and accuracy of the model trained on the second dataset will be 0.1711 and 0.9504 respectively.

Is this caused by the new training data or by a completely re-trained model?

# Model by: http://machinelearningmastery.com/
# load (downloaded if needed) the MNIST dataset
import numpy
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import np_utils
from keras.models import load_model

def baseline_model():
    model = Sequential()
    model.add(Dense(num_pixels, input_dim=num_pixels, init='normal', activation='relu'))
    model.add(Dense(num_classes, init='normal', activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

if __name__ == '__main__':
    # load data
    (X_train, y_train), (X_test, y_test) = mnist.load_data()

    # flatten 28*28 images to a 784 vector for each image
    num_pixels = X_train.shape[1] * X_train.shape[2]
    X_train = X_train.reshape(X_train.shape[0], num_pixels).astype('float32')
    X_test = X_test.reshape(X_test.shape[0], num_pixels).astype('float32')
    # normalize inputs from 0-255 to 0-1
    X_train = X_train / 255
    X_test = X_test / 255
    # one hot encode outputs
    y_train = np_utils.to_categorical(y_train)
    y_test = np_utils.to_categorical(y_test)
    num_classes = y_test.shape[1]

    # build the model
    model = baseline_model()

    #Partly train model
    dataset1_x = X_train[:3000]
    dataset1_y = y_train[:3000]
    model.fit(dataset1_x, dataset1_y, nb_epoch=10, batch_size=200, verbose=2)

    # Final evaluation of the model
    scores = model.evaluate(X_test, y_test, verbose=0)
    print("Baseline Error: %.2f%%" % (100-scores[1]*100))

    #Save partly trained model
    model.save('partly_trained.h5')
    del model

    #Reload model
    model = load_model('partly_trained.h5')

    #Continue training
    dataset2_x = X_train[3000:]
    dataset2_y = y_train[3000:]
    model.fit(dataset2_x, dataset2_y, nb_epoch=10, batch_size=200, verbose=2)
    scores = model.evaluate(X_test, y_test, verbose=0)
    print("Baseline Error: %.2f%%" % (100-scores[1]*100))

Answer 0

Actually, model.save saves all the information needed for restarting training in your case. The only thing that could be spoiled by reloading the model is your optimizer state. To check that, try to save and reload the model and train it on the training data.

Answer 1

The problem might be that you use a different optimizer – or different arguments to your optimizer. I just had the same issue with a custom pretrained model, using

reduce_lr = ReduceLROnPlateau(monitor='loss', factor=lr_reduction_factor,
                              patience=patience, min_lr=min_lr, verbose=1)

for the pretrained model, whereby the original learning rate starts at 0.0003 and during pre-training it is reduced to the minimum learning rate (min_lr), which is 0.000003.

I just copied that line over to the script which uses the pre-trained model and got really bad accuracies. Until I noticed that the last learning rate of the pretrained model was the min learning rate, i.e. 0.000003. And if I start with that learning rate, I get exactly the same accuracies to start with as the output of the pretrained model – which makes sense, as starting with a learning rate that is 100 times bigger than the last learning rate used in the pretrained model will result in a huge overshoot of GD and hence in heavily decreased accuracies.
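
The size of the mismatch can be checked with a little arithmetic. The sketch below is not Keras code: it models the plateau schedule as repeated multiplication by a factor with a floor at min_lr. The factor 0.1 and the number of plateaus are assumptions (the answer does not state lr_reduction_factor); 0.0003 and 0.000003 are the values quoted above.

```python
def lr_after_plateaus(initial_lr, factor, min_lr, n_plateaus):
    """Learning rate after n reductions, each multiplying by `factor`,
    never going below `min_lr` (hypothetical helper)."""
    lr = initial_lr
    for _ in range(n_plateaus):
        lr = max(lr * factor, min_lr)
    return lr


initial_lr = 0.0003   # from the answer
min_lr = 0.000003     # from the answer
factor = 0.1          # assumed; not given in the answer

final_lr = lr_after_plateaus(initial_lr, factor, min_lr, n_plateaus=5)
print(final_lr)
print(round(initial_lr / final_lr))  # how many times larger a fresh start would be
```

Once the schedule has bottomed out at min_lr, restarting from the initial value means taking steps roughly 100 times larger than the ones the saved model was last trained with.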

Answer 2

Most of the above answers covered important points. If you are using a recent TensorFlow (TF 2.1 or above), then the following example will help you. The model part of the code is from the TensorFlow website.

import tensorflow as tf
from tensorflow import keras
mnist = tf.keras.datasets.mnist

(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

def create_model():
  model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(512, activation=tf.nn.relu),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
  ])

  model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
  return model

# Create a basic model instance
model = create_model()
model.fit(x_train, y_train, epochs=10, validation_data=(x_test, y_test), verbose=1)

Please save the model in *.tf format. In my experience, if you have a custom loss defined, the *.h5 format will not save the optimizer state and hence will not serve your purpose if you want to resume training from where you left off.

# saving the model in tensorflow format
model.save('./MyModel_tf', save_format='tf')

# loading the saved model
loaded_model = tf.keras.models.load_model('./MyModel_tf')

# retraining the model
loaded_model.fit(x_train, y_train, epochs = 10, validation_data = (x_test,y_test),verbose=1)

This approach resumes training from where we left off before saving the model. As mentioned by others, if you want to save the weights of the best model, or the weights of the model at every epoch, you need to use the Keras callback ModelCheckpoint with options such as save_weights_only=True, save_freq='epoch', and save_best_only.

For more details, please check here and another example here.

Answer 3


Notice that Keras sometimes has issues with loaded models, as in here. This might explain cases in which you don’t start from the same trained accuracy.

Answer 4



All of the above helps. You must also resume from the same learning rate (LR) that was in use when the model and weights were saved. Set it directly on the optimizer.

Note that improvement from there is not guaranteed, because the model may have reached a local minimum, which may even be the global one. There is no point in resuming a model in order to search for another local minimum, unless you intend to increase the learning rate in a controlled fashion and nudge the model into a possibly better minimum not far away.

Answer 5


You might also be hitting concept drift; see “Should you retrain a model when new observations are available”. There’s also the concept of catastrophic forgetting, which a number of academic papers discuss. Here’s one with MNIST: “Empirical investigation of catastrophic forgetting”.





If I want to use the BatchNormalization function in Keras, then do I need to call it once only at the beginning?

I read this documentation for it: http://keras.io/layers/normalization/

I don’t see where I’m supposed to call it. Below is my code attempting to use it:

model = Sequential()
keras.layers.normalization.BatchNormalization(epsilon=1e-06, mode=0, momentum=0.9, weights=None)
model.add(Dense(64, input_dim=14, init='uniform'))
model.add(Dense(64, init='uniform'))
model.add(Dense(2, init='uniform'))

sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='binary_crossentropy', optimizer=sgd)
model.fit(X_train, y_train, nb_epoch=20, batch_size=16, show_accuracy=True, validation_split=0.2, verbose = 2)

I ask because if I run the code with the second line including the batch normalization and if I run the code without the second line I get similar outputs. So either I’m not calling the function in the right place, or I guess it doesn’t make that much of a difference.

Answer 0

Just to answer this question in a little more detail, and as Pavel said, Batch Normalization is just another layer, so you can use it as such to create your desired network architecture.

The general use case is to use BN between the linear and non-linear layers in your network, because it normalizes the input to your activation function, so that you’re centered in the linear section of the activation function (such as Sigmoid). There’s a small discussion of it here

In your case above, this might look like:

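# import BatchNormalization (Sequential, Dense, Activation and SGD are assumed to be imported already)
from keras.layers.normalization import BatchNormalization

# instantiate model
model = Sequential()

# we can think of this chunk as the input layer
model.add(Dense(64, input_dim=14, init='uniform'))
model.add(BatchNormalization())
model.add(Activation('tanh'))  # activation choice is illustrative

# we can think of this chunk as the hidden layer
model.add(Dense(64, init='uniform'))
model.add(BatchNormalization())
model.add(Activation('tanh'))  # activation choice is illustrative

# we can think of this chunk as the output layer
model.add(Dense(2, init='uniform'))
model.add(BatchNormalization())
model.add(Activation('softmax'))  # activation choice is illustrative

# setting up the optimization of our weights
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='binary_crossentropy', optimizer=sgd)

# running the fitting
model.fit(X_train, y_train, nb_epoch=20, batch_size=16, show_accuracy=True, validation_split=0.2, verbose=2)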

Hope this clarifies things a bit more.

Answer 1



This thread is misleading. Tried commenting on Lucas Ramadan’s answer, but I don’t have the right privileges yet, so I’ll just put this here.

Batch normalization works best after the activation function, and here or here is why: it was developed to prevent internal covariate shift. Internal covariate shift occurs when the distribution of the activations of a layer shifts significantly throughout training. Batch normalization is used so that the distribution of the inputs (and these inputs are literally the result of an activation function) to a specific layer doesn’t change over time due to parameter updates from each batch (or at least, allows it to change in an advantageous way). It uses batch statistics to do the normalizing, and then uses the batch normalization parameters (gamma and beta in the original paper) “to make sure that the transformation inserted in the network can represent the identity transform” (quote from original paper). But the point is that we’re trying to normalize the inputs to a layer, so it should always go immediately before the next layer in the network. Whether or not that’s after an activation function is dependent on the architecture in question.

Answer 2

This thread has some considerable debate about whether BN should be applied before the non-linearity of the current layer or to the activations of the previous layer.

Although there is no single correct answer, the authors of Batch Normalization say that it should be applied immediately before the non-linearity of the current layer. The reason (quoted from the original paper):

“We add the BN transform immediately before the nonlinearity, by normalizing x = Wu+b. We could have also normalized the layer inputs u, but since u is likely the output of another nonlinearity, the shape of its distribution is likely to change during training, and constraining its first and second moments would not eliminate the covariate shift. In contrast, Wu + b is more likely to have a symmetric, non-sparse distribution, that is “more Gaussian” (Hyvärinen & Oja, 2000); normalizing it is likely to produce activations with a stable distribution.”

Answer 3

Keras now supports the use_bias=False option. Since BatchNormalization adds its own learned shift (beta), the bias of the preceding layer is redundant, so we can save some computation by writing

model.add(Dense(64, use_bias=False))

or

model.add(Convolution2D(64, 3, 3, use_bias=False))

Answer 4

It’s almost become a trend now to have a Conv2D followed by a ReLu followed by a BatchNormalization layer. So I made up a small function to call all of them at once. Makes the model definition look a whole lot cleaner and easier to read.

def Conv2DReluBatchNorm(n_filter, w_filter, h_filter, inputs):
    return BatchNormalization()(Activation(activation='relu')(Convolution2D(n_filter, w_filter, h_filter, border_mode='same')(inputs)))

Answer 5

It is another type of layer, so you should add it as a layer in an appropriate place of your model


See an example here: https://github.com/fchollet/keras/blob/master/examples/kaggle_otto_nn.py

Answer 6

Batch normalization is used to normalize the input layer as well as the hidden layers by adjusting the mean and scale of the activations. Because of this normalizing effect, a deep network can often use a higher learning rate without running into vanishing or exploding gradients. Batch normalization also has a regularizing effect, which can reduce the need for dropout to mitigate overfitting.

Right after computing the linear function with, say, Dense() or Conv2D() in Keras, we use BatchNormalization(), which normalizes the output of that linear function, and then we add the non-linearity to the layer using Activation().

from keras.layers.normalization import BatchNormalization
model = Sequential()
model.add(Dense(64, input_dim=14, init='uniform'))
model.add(BatchNormalization(epsilon=1e-06, mode=0, momentum=0.9, weights=None))
model.add(Activation('tanh'))
model.add(Dense(64, init='uniform'))
model.add(BatchNormalization(epsilon=1e-06, mode=0, momentum=0.9, weights=None))
model.add(Activation('tanh'))
model.add(Dense(2, init='uniform'))
model.add(BatchNormalization(epsilon=1e-06, mode=0, momentum=0.9, weights=None))
model.add(Activation('softmax'))  # output activation is an assumption

sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='binary_crossentropy', optimizer=sgd)
model.fit(X_train, y_train, nb_epoch=20, batch_size=16, show_accuracy=True,
          validation_split=0.2, verbose=2)

How is Batch Normalization applied?

Suppose we have the input a[l-1] to a layer l, and weights W[l] and a bias unit b[l] for that layer. Let a[l] be the activation vector computed for layer l (i.e. after applying the non-linearity) and z[l] be the vector before applying the non-linearity.

  1. Using a[l-1] and W[l], we can calculate z[l] for layer l.
  2. Usually in feed-forward propagation we would add the bias unit at this stage, as in z[l] + b[l]. With batch normalization this step is not required and no b[l] parameter is used, because the mean subtraction in step 3 would cancel it.
  3. Calculate the mean of z[l] over the mini-batch and subtract it from each element.
  4. Divide (z[l] - mean) by the standard deviation. Call the result Z_temp[l].
  5. Now define new parameters γ and β that rescale and shift the normalized values as follows:

    z_norm[l] = γ.Z_temp[l] + β

In this code excerpt, the Dense() takes the a[l-1], uses W[l] and calculates z[l]. Then the immediate BatchNormalization() will perform the above steps to give z_norm[l]. And then the immediate Activation() will calculate tanh(z_norm[l]) to give a[l] i.e.

a[l] = tanh(z_norm[l])
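
The five steps can be sketched in plain Python for a single unit across one mini-batch. This is an illustration under stated assumptions, not what Keras does internally: the function name batch_norm_unit and the sample values are made up, γ and β are fixed here rather than learned, and only the training-time batch statistics are modelled (no moving averages for inference).

```python
import math
from statistics import fmean, pvariance


def batch_norm_unit(z, gamma=1.0, beta=0.0, eps=1e-6):
    """Normalize the pre-activations z of ONE unit across a mini-batch,
    then scale by gamma and shift by beta."""
    mean = fmean(z)                          # step 3: batch mean
    std = math.sqrt(pvariance(z) + eps)      # step 4: eps avoids division by zero
    z_temp = [(v - mean) / std for v in z]   # normalized values, Z_temp
    return [gamma * v + beta for v in z_temp]  # step 5: z_norm


# Made-up pre-activations z[l] for one unit over a batch of four samples.
z = [2.0, 4.0, 6.0, 8.0]
z_norm = batch_norm_unit(z, gamma=1.0, beta=0.0)
a = [math.tanh(v) for v in z_norm]  # a[l] = tanh(z_norm[l])

print([round(v, 3) for v in z_norm])
print([round(v, 3) for v in a])
```

With γ = 1 and β = 0 the normalized values have mean 0 and (up to eps) unit variance, so tanh is evaluated around its near-linear region; other values of γ and β let the network move away from that if training favours it.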





I have trained a binary classification model with CNN, and here is my code

model = Sequential()
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1],
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1]))
# (16, 16, 32)
model.add(Convolution2D(nb_filters*2, kernel_size[0], kernel_size[1]))
model.add(Convolution2D(nb_filters*2, kernel_size[0], kernel_size[1]))
# (8, 8, 64) = (2048)
model.add(Dense(2))  # define a binary classification problem

model.fit(x_train, y_train,
          validation_data=(x_test, y_test))

And here, I wanna get the output of each layer just like TensorFlow, how can I do that?

Answer 0

You can easily get the outputs of any layer by using: model.layers[index].output

For all layers use this:

import numpy as np
from keras import backend as K

inp = model.input                                           # input placeholder
outputs = [layer.output for layer in model.layers]          # all layer outputs
functors = [K.function([inp, K.learning_phase()], [out]) for out in outputs]    # evaluation functions

# Testing
test = np.random.random(input_shape)[np.newaxis,...]        # one random sample with a batch axis
layer_outs = [func([test, 1.]) for func in functors]        # 1. = training phase, 0. = test phase
print(layer_outs)

Note: To simulate Dropout use learning_phase as 1. in layer_outs otherwise use 0.

Edit: (based on comments)

K.function creates Theano/TensorFlow tensor functions, which are later used to get the output from the symbolic graph given the input.

K.learning_phase() is required as an input because many Keras layers, such as Dropout and BatchNormalization, depend on it to change behavior between training and test time.

So if you remove the dropout layer in your code you can simply use:

import numpy as np
from keras import backend as K

inp = model.input                                           # input placeholder
outputs = [layer.output for layer in model.layers]          # all layer outputs
functors = [K.function([inp], [out]) for out in outputs]    # evaluation functions

# Testing
test = np.random.random(input_shape)[np.newaxis,...]        # one random sample with a batch axis
layer_outs = [func([test]) for func in functors]
print(layer_outs)

Edit 2: More optimized

I just realized that the previous answer is not that optimized: for each function evaluation the data is transferred from CPU to GPU memory, and the tensor computations for the lower layers have to be redone over and over.

A much better way is to use a single function that returns the list of all outputs, instead of multiple functions:

import numpy as np
from keras import backend as K

inp = model.input                                           # input placeholder
outputs = [layer.output for layer in model.layers]          # all layer outputs
functor = K.function([inp, K.learning_phase()], outputs)    # single evaluation function

# Testing
test = np.random.random(input_shape)[np.newaxis,...]        # one random sample with a batch axis
layer_outs = functor([test, 1.])                            # 1. = training phase, 0. = test phase
print(layer_outs)
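
To see why the single-function version is cheaper, here is a toy pure-Python model of the two strategies. Layers are plain functions and a counter records how many layer evaluations each strategy performs. None of this is Keras API, and the three toy layers are invented for illustration.

```python
counter = {"calls": 0}  # counts layer evaluations


def make_layer(fn):
    """Wrap a function so that every evaluation is counted."""
    def layer(x):
        counter["calls"] += 1
        return fn(x)
    return layer


# Three made-up "layers" applied in sequence.
layers = [make_layer(lambda x: x + 1),
          make_layer(lambda x: x * 2),
          make_layer(lambda x: x - 3)]


def output_of_layer(k, x):
    """One function per layer: recomputes layers 0..k on every call."""
    for layer in layers[:k + 1]:
        x = layer(x)
    return x


def all_outputs(x):
    """Single pass: each layer is evaluated once and its output collected."""
    outs = []
    for layer in layers:
        x = layer(x)
        outs.append(x)
    return outs


counter["calls"] = 0
per_layer = [output_of_layer(k, 5) for k in range(len(layers))]
calls_per_layer = counter["calls"]

counter["calls"] = 0
single_pass = all_outputs(5)
calls_single_pass = counter["calls"]

print(per_layer == single_pass, calls_per_layer, calls_single_pass)
```

Both strategies return the same outputs, but the per-layer version performs 1 + 2 + 3 = 6 evaluations where the single pass needs 3, and the gap grows quadratically with the number of layers.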

Answer 1

From https://keras.io/getting-started/faq/#how-can-i-obtain-the-output-of-an-intermediate-layer

One simple way is to create a new Model that will output the layers that you are interested in:

from keras.models import Model

model = ...  # include here your original model

layer_name = 'my_layer'
intermediate_layer_model = Model(inputs=model.input,
                                 outputs=model.get_layer(layer_name).output)
intermediate_output = intermediate_layer_model.predict(data)

Alternatively, you can build a Keras function that will return the output of a certain layer given a certain input, for example:

from keras import backend as K

# with a Sequential model
get_3rd_layer_output = K.function([model.layers[0].input],
                                  [model.layers[3].output])
layer_output = get_3rd_layer_output([x])[0]

Answer 2





Based on all the good answers in this thread, I wrote a library to fetch the output of each layer. It abstracts all the complexity and has been designed to be as user-friendly as possible:


It handles almost all the edge cases.

Hope it helps!

Answer 3




The following looks very simple to me:

model.layers[idx].output

The above is a tensor object, so you can modify it using any operation that can be applied to a tensor object.

For example, to get the shape: model.layers[idx].output.get_shape()

idx is the index of the layer; you can find it from model.summary().

Answer 4


I wrote this function for myself (in Jupyter); it was inspired by indraforyou’s answer. It plots the feature maps of a layer’s output automatically. Your images must have an (x, y, 1) shape, where 1 stands for 1 channel. Just call plot_layer_outputs(…) to plot.

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from keras import backend as K

def get_layer_outputs():
    test_image = ...  # YOUR IMAGE GOES HERE!!!
    outputs    = [layer.output for layer in model.layers]          # all layer outputs
    comp_graph = [K.function([model.input] + [K.learning_phase()], [output]) for output in outputs]  # evaluation functions

    # Testing
    layer_outputs_list = [op([test_image, 1.]) for op in comp_graph]
    layer_outputs = []

    for layer_output in layer_outputs_list:
        print(layer_output[0][0].shape, end='\n-------------------\n')
        layer_outputs.append(layer_output[0][0])  # keep the output for the single test image

    return layer_outputs

def plot_layer_outputs(layer_number):
    layer_outputs = get_layer_outputs()

    x_max = layer_outputs[layer_number].shape[0]
    y_max = layer_outputs[layer_number].shape[1]
    n     = layer_outputs[layer_number].shape[2]

    # One (x_max, y_max) image per channel
    L = []
    for i in range(n):
        L.append(np.zeros((x_max, y_max)))

    for i in range(n):
        for x in range(x_max):
            for y in range(y_max):
                L[i][x][y] = layer_outputs[layer_number][x][y][i]

    for img in L:
        plt.figure()  # new figure per channel so the images do not overwrite each other
        plt.imshow(img, interpolation='nearest')
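The triple loop in plot_layer_outputs copies a layer output of shape (x, y, n) into n separate (x, y) images, one element at a time. As a sketch of an alternative (not part of the original answer), the same split can be done with NumPy's moveaxis; the function names and the small test array below are made up for illustration.

```python
import numpy as np

def split_channels_loop(feature_map):
    """Element-by-element copy, mirroring the loops in plot_layer_outputs."""
    x_max, y_max, n = feature_map.shape
    images = [np.zeros((x_max, y_max)) for _ in range(n)]
    for i in range(n):
        for x in range(x_max):
            for y in range(y_max):
                images[i][x][y] = feature_map[x][y][i]
    return images

def split_channels_vectorized(feature_map):
    """Move the channel axis to the front, (x, y, n) -> (n, x, y), then split."""
    return list(np.moveaxis(feature_map, -1, 0))

# Hypothetical layer output with 2x3 spatial size and 4 channels
feature_map = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)
loop_images = split_channels_loop(feature_map)
fast_images = split_channels_vectorized(feature_map)
print(len(fast_images), fast_images[0].shape)
```

For large feature maps the vectorized version avoids the Python-level loops; the resulting images can be passed to plt.imshow in the same way.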

Answer 5

From: https://github.com/philipperemy/keras-visualize-activations/blob/master/read_activations.py

import keras.backend as K

def get_activations(model, model_inputs, print_shape_only=False, layer_name=None):
    print('----- activations -----')
    activations = []
    inp = model.input

    model_multi_inputs_cond = True
    if not isinstance(inp, list):
        # only one input! let's wrap it in a list.
        inp = [inp]
        model_multi_inputs_cond = False

    outputs = [layer.output for layer in model.layers if
               layer.name == layer_name or layer_name is None]  # all layer outputs

    funcs = [K.function(inp + [K.learning_phase()], [out]) for out in outputs]  # evaluation functions

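    if model_multi_inputs_cond:
        # several inputs: feed each of them, followed by the learning phase
        list_inputs = []
        list_inputs.extend(model_inputs)
        list_inputs.append(0.)
    else:
        list_inputs = [model_inputs, 0.]

    # Learning phase. 0 = Test mode (no dropout or batch normalization)
    # layer_outputs = [func([model_inputs, 0.])[0] for func in funcs]
    layer_outputs = [func(list_inputs)[0] for func in funcs]
    for layer_activations in layer_outputs:
        activations.append(layer_activations)
        if print_shape_only:
            print(layer_activations.shape)
        else:
            print(layer_activations)
    return activations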

Answer 6

I wanted to add this as a comment on @indraforyou’s answer (but don’t have enough reputation), to correct the issue mentioned in @mathtick’s comment. To avoid the "InvalidArgumentError: input_X:Y is both fed and fetched." exception, simply replace the line outputs = [layer.output for layer in model.layers] with outputs = [layer.output for layer in model.layers][1:], i.e.

adapting indraforyou’s minimal working example:

import numpy as np
from keras import backend as K

inp = model.input                                           # input placeholder
outputs = [layer.output for layer in model.layers][1:]      # all layer outputs except first (input) layer
functor = K.function([inp, K.learning_phase()], outputs)    # evaluation function

# Testing
test = np.random.random(input_shape)[np.newaxis, ...]
layer_outs = functor([test, 1.])
print(layer_outs)

P.S. My attempts at variations such as outputs = [layer.output for layer in model.layers[1:]] did not work.
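As a side note on the two slice placements: for a plain Python list, slicing the result of the comprehension and slicing the list being iterated select the same elements. The sketch below uses hypothetical stand-in objects rather than real Keras layers, so it only illustrates the list semantics and does not reproduce the failure reported above.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for model.layers; each object has an `output` attribute.
layers = [SimpleNamespace(name=n, output=n + "/out")
          for n in ("input_1", "conv", "dense")]

sliced_after = [layer.output for layer in layers][1:]    # slice the outputs
sliced_before = [layer.output for layer in layers[1:]]   # slice the layers first

print(sliced_after)
print(sliced_before)
```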

Answer 7


Assuming you have:

1- A pre-trained Keras model.

2- Input x as an image or a set of images. The image resolution should be compatible with the dimensions of the input layer, for example 80*80*3 for a 3-channel (RGB) image.

3- The name of the layer whose activations you want, for example the “flatten_2” layer. It must be one of the names in the layer_names variable, which holds the names of the layers of the given model.

4- batch_size is an optional argument.

Then you can use the get_activations function to get the activations of that layer for a given input x and pre-trained model:

import six
import numpy as np
import keras.backend as k
from numpy import float32
def get_activations(x, model, layer, batch_size=128):
    """
    Return the output of the specified layer for input `x`. `layer` is specified by layer index (between 0 and
    `nb_layers - 1`) or by name. The number of layers can be determined by counting the results returned by
    calling `layer_names`.
    :param x: Input for computing the activations.
    :type x: `np.ndarray`. Example: x.shape = (80, 80, 3)
    :param model: pre-trained Keras model. Including weights.
    :type model: keras.engine.sequential.Sequential. Example: model.input_shape = (None, 80, 80, 3)
    :param layer: Layer for computing the activations
    :type layer: `int` or `str`. Example: layer = 'flatten_2'
    :param batch_size: Size of batches.
    :type batch_size: `int`
    :return: The output of `layer`, where the first dimension is the batch size corresponding to `x`.
    :rtype: `np.ndarray`. Example: activations.shape = (1, 2000)
    """

    layer_names = [layer.name for layer in model.layers]
    if isinstance(layer, six.string_types):
        if layer not in layer_names:
            raise ValueError('Layer name %s is not part of the graph.' % layer)
        layer_name = layer
    elif isinstance(layer, int):
        if layer < 0 or layer >= len(layer_names):
            raise ValueError('Layer index %d is outside of range (0 to %d included).'
                             % (layer, len(layer_names) - 1))
        layer_name = layer_names[layer]
    else:
        raise TypeError('Layer must be of type `str` or `int`.')

    layer_output = model.get_layer(layer_name).output
    layer_input = model.input
    output_func = k.function([layer_input], [layer_output])

    # Apply preprocessing: add a batch dimension if a single image was passed
    if x.shape == k.int_shape(model.input)[1:]:
        x_preproc = np.expand_dims(x, 0)
    else:
        x_preproc = x
    assert len(x_preproc.shape) == 4

    # Determine shape of expected output and prepare array
    output_shape = output_func([x_preproc[0][None, ...]])[0].shape
    activations = np.zeros((x_preproc.shape[0],) + output_shape[1:], dtype=float32)

    # Get activations with batching
    for batch_index in range(int(np.ceil(x_preproc.shape[0] / float(batch_size)))):
        begin, end = batch_index * batch_size, min((batch_index + 1) * batch_size, x_preproc.shape[0])
        activations[begin:end] = output_func([x_preproc[begin:end]])[0]

    return activations
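The batching loop at the end of get_activations walks over the input in slices of at most batch_size samples. As a small standalone sketch of just that index arithmetic (the helper name batch_bounds is made up for this illustration), the begin/end pairs can be computed like this:

```python
import math

def batch_bounds(n_samples, batch_size):
    """Return (begin, end) index pairs that cover range(n_samples) in order."""
    n_batches = int(math.ceil(n_samples / float(batch_size)))
    return [(i * batch_size, min((i + 1) * batch_size, n_samples))
            for i in range(n_batches)]

bounds = batch_bounds(300, 128)
print(bounds)
```

The last batch is simply shorter when the number of samples is not a multiple of batch_size, which is why the end index is clipped with min.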

Answer 8


If you run into one of the following cases:

  • error: InvalidArgumentError: input_X:Y is both fed and fetched
  • a model with multiple inputs

you need to make the following changes:

  • filter the input layers out of the outputs variable
  • make a minor change to the functors loop

Minimal example:

from keras import backend as K
from keras.engine.input_layer import InputLayer

inp = model.input  # a list of input tensors for a multi-input model
outputs = [layer.output for layer in model.layers if not isinstance(layer, InputLayer)]
functors = [K.function(inp + [K.learning_phase()], [x]) for x in outputs]
layer_outputs = [fun([x1, x2, xn, 1]) for fun in functors]  # x1, x2, xn: one array per model input
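To see why the isinstance filter matters for multiple inputs, here is a sketch with hypothetical stand-in classes (these are not the real Keras classes, and the toy layer list is an assumption): dropping only the first element still leaves the second input layer in the outputs, while the type filter removes every input layer.

```python
class Layer:
    """Hypothetical stand-in for a Keras layer (not the real class)."""
    def __init__(self, name):
        self.name = name
        self.output = name + "/output"

class InputLayer(Layer):
    """Hypothetical stand-in for keras.engine.input_layer.InputLayer."""

# A toy two-input model: both input layers are listed first.
layers = [InputLayer("input_1"), InputLayer("input_2"),
          Layer("concat"), Layer("dense")]

drop_first = [layer.output for layer in layers][1:]
drop_inputs = [layer.output for layer in layers
               if not isinstance(layer, InputLayer)]

print(drop_first)
print(drop_inputs)
```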

Answer 9



Well, the other answers are very complete, but there is a very basic way to “see”, rather than “get”, the shapes.

Just call model.summary(). It prints all layers and their output shapes. “None” values indicate variable dimensions, and the first dimension is the batch size.
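As a small illustration of how to read those shapes, the sketch below treats None as a wildcard when comparing a printed shape with the shape of an actual array. The helper shape_matches and the example shape are made up for this illustration and are not part of Keras.

```python
def shape_matches(spec, actual):
    """True if `actual` fits `spec`, where None in `spec` matches any size."""
    return len(spec) == len(actual) and all(
        s is None or s == a for s, a in zip(spec, actual))

# Hypothetical output shape as printed by model.summary()
output_shape = (None, 80, 80, 3)

print(shape_matches(output_shape, (1, 80, 80, 3)))    # batch of 1
print(shape_matches(output_shape, (32, 80, 80, 3)))   # batch of 32
print(shape_matches(output_shape, (32, 64, 64, 3)))   # wrong spatial size
```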



AutoKeras: an AutoML system based on Keras. It is developed by DATA Lab at Texas A&M University. The goal of AutoKeras is to make machine learning accessible to everyone.


  • A short example:
import autokeras as ak

clf = ak.ImageClassifier()
clf.fit(x_train, y_train)
results = clf.predict(x_test)



pip3 install autokeras

Please follow the installation guide for more details.





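Emails: Subscribe to our email list to receive announcements.


GitHub Discussions: Ask your questions on our GitHub Discussions. It is a forum hosted on GitHub. We will monitor and answer your questions there.


Slack: Request an invitation. Use the #autokeras channel for communication.


Online Meetings: Join the online meeting Google group. The calendar event will appear on your Google Calendar.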



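  • Triage the issues: We pick the critical issues to work on from GitHub issues. They will be added to this Project. Some of the issues will then be added to the milestones, which are used to plan the releases.
  • Assign the tasks: We assign the tasks to people during the online meetings.
  • Discuss: We can have discussions in multiple places. The code reviews are on GitHub. Questions can be asked in Slack or during the meetings.

Please join our Slack and send Haifeng Jin a message. Or drop by our online meetings and talk to us. We will help you get started!

Refer to our Contributing Guide to learn the best practices.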



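We accept financial support on Open Collective. Thank you to every backer for supporting us!


Haifeng Jin, Qingquan Song, and Xia Hu. "Auto-Keras: An efficient neural architecture search system." Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 2019.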


  title={Auto-Keras: An Efficient Neural Architecture Search System},
  author={Jin, Haifeng and Song, Qingquan and Hu, Xia},
  booktitle={Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining},

