Tag archive: tensorflow

How to apply gradient clipping in TensorFlow?

Question: How to apply gradient clipping in TensorFlow?

Considering the example code.

I would like to know how to apply gradient clipping to the RNN in this network, where there is a possibility of exploding gradients.

tf.clip_by_value(t, clip_value_min, clip_value_max, name=None)

This is an example that could be used, but where do I introduce it? In the definition of the RNN?

    lstm_cell = rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0)
    # Split data because rnn cell needs a list of inputs for the RNN inner loop
    _X = tf.split(0, n_steps, _X) # n_steps
    tf.clip_by_value(_X, -1, 1, name=None)

But this doesn’t make sense, as the tensor _X is the input and not the gradient that is to be clipped.

Do I have to define my own Optimizer for this or is there a simpler option?


Answer 0

Gradient clipping needs to happen after computing the gradients, but before applying them to update the model’s parameters. In your example, both of those things are handled by the AdamOptimizer.minimize() method.

In order to clip your gradients you’ll need to explicitly compute, clip, and apply them as described in this section in TensorFlow’s API documentation. Specifically you’ll need to substitute the call to the minimize() method with something like the following:

optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
gvs = optimizer.compute_gradients(cost)
capped_gvs = [(tf.clip_by_value(grad, -1., 1.), var) for grad, var in gvs]
train_op = optimizer.apply_gradients(capped_gvs)
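
One practical detail worth keeping in mind with this pattern: compute_gradients() can return None as the gradient for a variable that does not contribute to the loss, and tf.clip_by_value cannot clip None. The sketch below mimics the compute, clip, apply flow in plain Python (no TensorFlow) just to show the shape of the logic. The helper names clip_by_value and cap_gradients and the toy numbers are made up for illustration.

```python
def clip_by_value(grad, lo, hi):
    """Clamp every component of a gradient (a list of floats) to [lo, hi]."""
    return [max(lo, min(hi, g)) for g in grad]


def cap_gradients(grads_and_vars, lo=-1.0, hi=1.0):
    """Clip each gradient element-wise, passing None gradients through untouched."""
    return [
        (None if grad is None else clip_by_value(grad, lo, hi), var)
        for grad, var in grads_and_vars
    ]


# Toy (gradient, variable-name) pairs standing in for compute_gradients() output.
gvs = [([0.5, -3.0, 7.2], "w"), (None, "unused"), ([-0.1], "b")]
capped_gvs = cap_gradients(gvs)
print(capped_gvs)
```

In the TensorFlow version the same guard would sit inside the list comprehension that builds capped_gvs.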

Answer 1

Despite what seems to be popular, you probably want to clip the whole gradient by its global norm:

optimizer = tf.train.AdamOptimizer(1e-3)
gradients, variables = zip(*optimizer.compute_gradients(loss))
gradients, _ = tf.clip_by_global_norm(gradients, 5.0)
optimize = optimizer.apply_gradients(zip(gradients, variables))

Clipping each gradient matrix individually changes their relative scale but is also possible:

optimizer = tf.train.AdamOptimizer(1e-3)
gradients, variables = zip(*optimizer.compute_gradients(loss))
gradients = [
    None if gradient is None else tf.clip_by_norm(gradient, 5.0)
    for gradient in gradients]
optimize = optimizer.apply_gradients(zip(gradients, variables))

In TensorFlow 2, a tape computes the gradients, the optimizers come from Keras, and we don’t need to store the update op because it runs automatically without passing it to a session:

optimizer = tf.keras.optimizers.Adam(1e-3)
# ...
with tf.GradientTape() as tape:
  loss = ...
variables = ...
gradients = tape.gradient(loss, variables)
gradients, _ = tf.clip_by_global_norm(gradients, 5.0)
optimizer.apply_gradients(zip(gradients, variables))
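
To make the "relative scale" point concrete, here is a small plain-Python sketch (no TensorFlow) of both strategies. It assumes the usual definitions: global-norm clipping multiplies every tensor by clip_norm / max(global_norm, clip_norm), while per-tensor clipping rescales each tensor by its own norm. The function names mirror the TensorFlow ones for readability only; the numbers are invented.

```python
import math


def l2_norm(tensor):
    return math.sqrt(sum(x * x for x in tensor))


def clip_by_norm(tensor, clip_norm):
    """Rescale one tensor so its own L2 norm is at most clip_norm."""
    norm = l2_norm(tensor)
    if norm <= clip_norm:
        return list(tensor)
    return [x * clip_norm / norm for x in tensor]


def clip_by_global_norm(tensors, clip_norm):
    """Rescale all tensors by one shared factor based on their joint L2 norm."""
    global_norm = math.sqrt(sum(l2_norm(t) ** 2 for t in tensors))
    scale = clip_norm / max(global_norm, clip_norm)
    return [[x * scale for x in t] for t in tensors], global_norm


grads = [[30.0, 40.0], [3.0, 4.0]]  # individual norms 50 and 5, ratio 10:1

global_clipped, global_norm = clip_by_global_norm(grads, 5.0)
per_tensor_clipped = [clip_by_norm(g, 5.0) for g in grads]

ratio_global = l2_norm(global_clipped[0]) / l2_norm(global_clipped[1])
ratio_per_tensor = l2_norm(per_tensor_clipped[0]) / l2_norm(per_tensor_clipped[1])
print(ratio_global, ratio_per_tensor)
```

With global-norm clipping the two gradients keep their roughly 10:1 ratio; with per-tensor clipping both end up with the same norm, so the update direction of the full parameter vector changes.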

Answer 2

This is actually properly explained in the documentation:

Calling minimize() takes care of both computing the gradients and applying them to the variables. If you want to process the gradients before applying them you can instead use the optimizer in three steps:

  • Compute the gradients with compute_gradients().
  • Process the gradients as you wish.
  • Apply the processed gradients with apply_gradients().

And in the example they provide they use these 3 steps:

# Create an optimizer.
opt = GradientDescentOptimizer(learning_rate=0.1)

# Compute the gradients for a list of variables.
grads_and_vars = opt.compute_gradients(loss, <list of variables>)

# grads_and_vars is a list of tuples (gradient, variable).  Do whatever you
# need to the 'gradient' part, for example cap them, etc.
capped_grads_and_vars = [(MyCapper(gv[0]), gv[1]) for gv in grads_and_vars]

# Ask the optimizer to apply the capped gradients.
opt.apply_gradients(capped_grads_and_vars)

Here MyCapper is any function that caps your gradient. The list of useful functions (other than tf.clip_by_value()) is here.


Answer 3

For those who would like to understand the idea of gradient clipping (by norm):

Whenever the gradient norm is greater than a particular threshold, we clip the gradient norm so that it stays within the threshold. This threshold is sometimes set to 5.

Let the gradient be g and the max_norm_threshold be j.

Now, if ||g|| > j , we do:

g = ( j * g ) / ||g||

This is what tf.clip_by_norm implements.
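
Written out with the same symbols (g for the gradient, j for the threshold), the rule and a small worked case look like this. The norm is assumed to be the L2 norm, and the numbers are made up for illustration:

```latex
\[
\hat{g} =
\begin{cases}
g, & \lVert g \rVert_2 \le j \\[4pt]
\dfrac{j \, g}{\lVert g \rVert_2}, & \lVert g \rVert_2 > j
\end{cases}
\]

% Worked case: g = (3, 4), j = 2.5
\[
\lVert g \rVert_2 = \sqrt{3^2 + 4^2} = 5 > 2.5
\quad\Longrightarrow\quad
\hat{g} = \frac{2.5}{5}\,(3, 4) = (1.5,\ 2),
\qquad
\lVert \hat{g} \rVert_2 = 2.5 .
\]
```

The direction of g is unchanged; only its length is capped at j.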


Answer 4

IMO the best solution is wrapping your optimizer with TF’s estimator decorator tf.contrib.estimator.clip_gradients_by_norm:

original_optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
optimizer = tf.contrib.estimator.clip_gradients_by_norm(original_optimizer, clip_norm=5.0)
train_op = optimizer.minimize(loss)

This way you only have to define the clipping once, rather than applying it after every gradient calculation.

Documentation: https://www.tensorflow.org/api_docs/python/tf/contrib/estimator/clip_gradients_by_norm


Answer 5

Gradient clipping helps in the case of exploding gradients (it does not help with vanishing gradients, since it only ever shrinks values). Say your loss is very high; this can cause exponentially growing gradients to flow through the network, which may result in NaN values. To overcome this, we clip the gradients to a specific range (-1 to 1, or any range that suits the problem).

clipped_value = [(tf.clip_by_value(grad, -clip_range, +clip_range), var) for grad, var in grads_and_vars]

where grads_and_vars are the pairs of gradients (which you calculate via optimizer.compute_gradients) and the variables they will be applied to.

After clipping, we simply apply the clipped values using the optimizer:

optimizer.apply_gradients(clipped_value)


Using a pre-trained word embedding (word2vec or GloVe) in TensorFlow

Question: Using a pre-trained word embedding (word2vec or GloVe) in TensorFlow

I’ve recently reviewed an interesting implementation for convolutional text classification. However, all the TensorFlow code I’ve reviewed uses random (not pre-trained) embedding vectors like the following:

with tf.device('/cpu:0'), tf.name_scope("embedding"):
    W = tf.Variable(
        tf.random_uniform([vocab_size, embedding_size], -1.0, 1.0),
        name="W")
    self.embedded_chars = tf.nn.embedding_lookup(W, self.input_x)
    self.embedded_chars_expanded = tf.expand_dims(self.embedded_chars, -1)

Does anybody know how to use the results of Word2vec or a GloVe pre-trained word embedding instead of a random one?


Answer 0

There are a few ways that you can use a pre-trained embedding in TensorFlow. Let’s say that you have the embedding in a NumPy array called embedding, with vocab_size rows and embedding_dim columns and you want to create a tensor W that can be used in a call to tf.nn.embedding_lookup().

  1. Simply create W as a tf.constant() that takes embedding as its value:

    W = tf.constant(embedding, name="W")
    

    This is the easiest approach, but it is not memory efficient because the value of a tf.constant() is stored multiple times in memory. Since embedding can be very large, you should only use this approach for toy examples.

  2. Create W as a tf.Variable and initialize it from the NumPy array via a tf.placeholder():

    W = tf.Variable(tf.constant(0.0, shape=[vocab_size, embedding_dim]),
                    trainable=False, name="W")
    
    embedding_placeholder = tf.placeholder(tf.float32, [vocab_size, embedding_dim])
    embedding_init = W.assign(embedding_placeholder)
    
    # ...
    sess = tf.Session()
    
    sess.run(embedding_init, feed_dict={embedding_placeholder: embedding})
    

    This avoids storing a copy of embedding in the graph, but it does require enough memory to keep two copies of the matrix in memory at once (one for the NumPy array, and one for the tf.Variable). Note that I’ve assumed that you want to hold the embedding matrix constant during training, so W is created with trainable=False.

  3. If the embedding was trained as part of another TensorFlow model, you can use a tf.train.Saver to load the value from the other model’s checkpoint file. This means that the embedding matrix can bypass Python altogether. Create W as in option 2, then do the following:

    W = tf.Variable(...)
    
    embedding_saver = tf.train.Saver({"name_of_variable_in_other_model": W})
    
    # ...
    sess = tf.Session()
    embedding_saver.restore(sess, "checkpoint_filename.ckpt")
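
All three options start from an embedding matrix that already exists. As a hedged sketch of how such a matrix might be built from a GloVe-style text file (one token per line followed by its space-separated components), here is a plain-Python loader. The sample data, the function name load_text_embeddings, and the use of nested lists instead of a NumPy array are assumptions made to keep the example self-contained; in practice you would read the real file and wrap the rows with np.array.

```python
import io

# Three made-up 4-dimensional vectors in a GloVe-style plain-text layout.
SAMPLE = """the 0.1 0.2 0.3 0.4
cat -0.5 0.0 0.5 1.0
sat 0.9 0.8 0.7 0.6
"""


def load_text_embeddings(handle):
    """Return (word -> row index, list of row vectors) from a GloVe-style text stream."""
    vocab, rows = {}, []
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vocab[parts[0]] = len(rows)
        rows.append([float(x) for x in parts[1:]])
    return vocab, rows


vocab, embedding = load_text_embeddings(io.StringIO(SAMPLE))
vocab_size, embedding_dim = len(embedding), len(embedding[0])
print(vocab_size, embedding_dim, embedding[vocab["cat"]])
```

The vocab dictionary gives the row index to feed into tf.nn.embedding_lookup() for each token.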
    

Answer 1

I use this method to load and share embedding.

W = tf.get_variable(name="W", shape=embedding.shape, initializer=tf.constant_initializer(embedding), trainable=False)

Answer 2

The answer of @mrry is not right because it causes the embedding weights to be overwritten each time the network is run, so if you are following a minibatch approach to train your network, you are overwriting the weights of the embeddings. So, from my point of view, the right way to use pre-trained embeddings is:

embeddings = tf.get_variable("embeddings", shape=[dim1, dim2], initializer=tf.constant_initializer(np.array(embeddings_matrix)))

Answer 3

2.0 Compatible Answer: There are many pre-trained embeddings that have been developed and open-sourced by Google.

Some of them are Universal Sentence Encoder (USE), ELMo, BERT, etc., and it is very easy to reuse them in your code.

Code to reuse a pre-trained embedding, the Universal Sentence Encoder, is shown below:

  !pip install "tensorflow_hub>=0.6.0"
  !pip install "tensorflow>=2.0.0"

  import tensorflow as tf
  import tensorflow_hub as hub

  module_url = "https://tfhub.dev/google/universal-sentence-encoder/4"
  embed = hub.KerasLayer(module_url)
  embeddings = embed(["A long sentence.", "single-word",
                      "http://example.com"])
  print(embeddings.shape)  # (3, 512)

For more information on the pre-trained embeddings developed and open-sourced by Google, refer to the TF Hub link.


Answer 4

With TensorFlow version 2 it’s quite easy if you use the Embedding layer:

X = tf.keras.layers.Embedding(input_dim=vocab_size,
                              output_dim=300,
                              input_length=Length_of_input_sequences,
                              # wrap the pre-trained matrix (shape [vocab_size, 300]) in an initializer
                              embeddings_initializer=tf.keras.initializers.Constant(matrix_of_pretrained_weights)
                              )(ur_inp)


Answer 5

I was also facing an embedding issue, so I wrote a detailed tutorial with a dataset. Here I would like to add what I tried; you can also try this method:

import numpy as np
import tensorflow as tf

tf.reset_default_graph()

# batch of token-id sequences
input_x = tf.placeholder(tf.int32, shape=[None, None])

# You have to edit the shape according to your vocabulary and embedding size.
# word_embedding is the pre-trained matrix, loaded elsewhere.
Word_embedding = tf.get_variable(name="W", shape=[400000, 100], initializer=tf.constant_initializer(np.array(word_embedding)), trainable=False)
embedding_lookup = tf.nn.embedding_lookup(Word_embedding, input_x)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # final_ holds the id sequences to look up, prepared elsewhere
    for ii in final_:
        print(sess.run(embedding_lookup, feed_dict={input_x: [ii]}))

Here is a detailed, working IPython tutorial example if you want to understand it from scratch; take a look.


Loading a trained Keras model and continuing training

Question: Loading a trained Keras model and continuing training

I was wondering if it was possible to save a partly trained Keras model and continue the training after loading the model again.

The reason for this is that I will have more training data in the future and I do not want to retrain the whole model again.

The functions which I am using are:

#Partly train model
model.fit(first_training, first_classes, batch_size=32, nb_epoch=20)

#Save partly trained model
model.save('partly_trained.h5')

#Load partly trained model
from keras.models import load_model
model = load_model('partly_trained.h5')

#Continue training
model.fit(second_training, second_classes, batch_size=32, nb_epoch=20)

Edit 1: added fully working example

With the first dataset after 10 epochs the loss of the last epoch will be 0.0748 and the accuracy 0.9863.

After saving, deleting and reloading the model the loss and accuracy of the model trained on the second dataset will be 0.1711 and 0.9504 respectively.

Is this caused by the new training data or by a completely re-trained model?

"""
Model by: http://machinelearningmastery.com/
"""
# load (downloaded if needed) the MNIST dataset
import numpy
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import np_utils
from keras.models import load_model
numpy.random.seed(7)

def baseline_model():
    model = Sequential()
    model.add(Dense(num_pixels, input_dim=num_pixels, init='normal', activation='relu'))
    model.add(Dense(num_classes, init='normal', activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

if __name__ == '__main__':
    # load data
    (X_train, y_train), (X_test, y_test) = mnist.load_data()

    # flatten 28*28 images to a 784 vector for each image
    num_pixels = X_train.shape[1] * X_train.shape[2]
    X_train = X_train.reshape(X_train.shape[0], num_pixels).astype('float32')
    X_test = X_test.reshape(X_test.shape[0], num_pixels).astype('float32')
    # normalize inputs from 0-255 to 0-1
    X_train = X_train / 255
    X_test = X_test / 255
    # one hot encode outputs
    y_train = np_utils.to_categorical(y_train)
    y_test = np_utils.to_categorical(y_test)
    num_classes = y_test.shape[1]

    # build the model
    model = baseline_model()

    #Partly train model
    dataset1_x = X_train[:3000]
    dataset1_y = y_train[:3000]
    model.fit(dataset1_x, dataset1_y, nb_epoch=10, batch_size=200, verbose=2)

    # Final evaluation of the model
    scores = model.evaluate(X_test, y_test, verbose=0)
    print("Baseline Error: %.2f%%" % (100-scores[1]*100))

    #Save partly trained model
    model.save('partly_trained.h5')
    del model

    #Reload model
    model = load_model('partly_trained.h5')

    #Continue training
    dataset2_x = X_train[3000:]
    dataset2_y = y_train[3000:]
    model.fit(dataset2_x, dataset2_y, nb_epoch=10, batch_size=200, verbose=2)
    scores = model.evaluate(X_test, y_test, verbose=0)
    print("Baseline Error: %.2f%%" % (100-scores[1]*100))

Answer 0

Actually, model.save saves all the information needed for restarting training in your case. The only thing that could be spoiled by reloading the model is your optimizer state. To check that, try to save and reload the model and train it on the training data.
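
To see why optimizer state matters when resuming, here is a toy sketch in plain Python (not Keras). It assumes SGD with momentum on f(w) = w**2 as a stand-in for a real optimizer such as Adam; the function name sgd_momentum and all numbers are invented for illustration.

```python
def sgd_momentum(w, v, steps, lr=0.1, momentum=0.9):
    """Run `steps` momentum-SGD updates on f(w) = w**2 and return (w, v)."""
    for _ in range(steps):
        grad = 2.0 * w
        v = momentum * v - lr * grad
        w = w + v
    return w, v


# Reference run: ten uninterrupted steps.
w_full, _ = sgd_momentum(w=5.0, v=0.0, steps=10)

# Interrupted run: five steps, "save", then five more.
w_half, v_half = sgd_momentum(w=5.0, v=0.0, steps=5)
w_with_state, _ = sgd_momentum(w_half, v_half, steps=5)  # weights + optimizer state
w_without_state, _ = sgd_momentum(w_half, 0.0, steps=5)  # weights only

print(w_full, w_with_state, w_without_state)
```

Resuming with the saved velocity reproduces the uninterrupted run, while resetting it lands somewhere else, which is the kind of discrepancy a lost optimizer state can cause.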


Answer 1

The problem might be that you use a different optimizer – or different arguments to your optimizer. I just had the same issue with a custom pretrained model, using

reduce_lr = ReduceLROnPlateau(monitor='loss', factor=lr_reduction_factor,
                              patience=patience, min_lr=min_lr, verbose=1)

for the pretrained model, whereby the original learning rate starts at 0.0003 and during pre-training it is reduced to the minimum learning rate, which is 0.000003.

I just copied that line over to the script which uses the pre-trained model and got really bad accuracies, until I noticed that the last learning rate of the pretrained model was the minimum learning rate, i.e. 0.000003. If I start with that learning rate, I get exactly the same accuracies to start with as the output of the pretrained model. That makes sense: starting with a learning rate that is 100 times bigger than the last learning rate used in the pretrained model results in a huge overshoot of gradient descent and hence in heavily decreased accuracies.


Answer 2

Most of the above answers cover important points. If you are using a recent TensorFlow (TF 2.1 or above), then the following example will help you. The model part of the code is from the TensorFlow website.

import tensorflow as tf
from tensorflow import keras
mnist = tf.keras.datasets.mnist

(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

def create_model():
  model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(512, activation=tf.nn.relu),  
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
    ])

  model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',metrics=['accuracy'])
  return model

# Create a basic model instance
model=create_model()
model.fit(x_train, y_train, epochs = 10, validation_data = (x_test,y_test),verbose=1)

Please save the model in *.tf format. From my experience, if you have any custom_loss defined, the *.h5 format will not save the optimizer state and hence will not serve your purpose if you want to resume training the model from where you left off.

# saving the model in tensorflow format
model.save('./MyModel_tf',save_format='tf')


# loading the saved model
loaded_model = tf.keras.models.load_model('./MyModel_tf')

# retraining the model
loaded_model.fit(x_train, y_train, epochs = 10, validation_data = (x_test,y_test),verbose=1)

This approach will resume the training where we left off before saving the model. As mentioned by others, if you want to save the weights of the best model, or the weights of the model every epoch, you need to use the Keras callback ModelCheckpoint with options such as save_weights_only=True, save_freq='epoch', and save_best_only.

For more details, please check here and another example here.


Answer 3

Notice that Keras sometimes has issues with loaded models, as in here. This might explain cases in which you don’t start from the same trained accuracy.


Answer 4

All of the above helps. You must resume from the same learning rate (LR) that was in use when the model and weights were saved. Set it directly on the optimizer.

Note that improvement from there is not guaranteed, because the model may have reached a local minimum, which may be the global one. There is no point in resuming a model in order to search for another local minimum, unless you intend to increase the learning rate in a controlled fashion and nudge the model into a possibly better minimum not far away.


Answer 5

You might also be hitting concept drift; see "Should you retrain a model when new observations are available". There’s also the concept of catastrophic forgetting, which a bunch of academic papers discuss. Here’s one with MNIST: "Empirical investigation of catastrophic forgetting".


How to add regularization in TensorFlow?

Question: How to add regularization in TensorFlow?

I found that in much of the available neural network code implemented using TensorFlow, regularization terms are often implemented by manually adding an additional term to the loss value.

My questions are:

  1. Is there a more elegant or recommended way of regularization than doing it manually?

  2. I also find that get_variable has an argument regularizer. How should it be used? According to my observation, if we pass a regularizer to it (such as tf.contrib.layers.l2_regularizer), a tensor representing the regularization term will be computed and added to a graph collection named tf.GraphKeys.REGULARIZATION_LOSSES. Will that collection be automatically used by TensorFlow (e.g. used by optimizers when training)? Or is it expected that I should use that collection myself?


Answer 0

As you say in the second point, using the regularizer argument is the recommended way. You can use it in get_variable, or set it once in your variable_scope and have all your variables regularized.

The losses are collected in the graph, and you need to manually add them to your cost function like this.

  reg_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
  reg_constant = 0.01  # Choose an appropriate one.
  loss = my_normal_loss + reg_constant * sum(reg_losses)

Hope that helps!
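
For intuition about what ends up in that collection, here is a plain-Python sketch (no TensorFlow) of an L2 penalty per weight tensor being summed into the total loss. It assumes the penalty has the form scale * sum(w**2) / 2; the layer names, weights, and data loss are made up for illustration.

```python
def l2_penalty(weights, scale):
    """scale * sum(w**2) / 2 for one flat list of weights."""
    return scale * sum(w * w for w in weights) / 2.0


# Made-up weights for two layers and a made-up data loss.
layers = {"dense1": [0.5, -1.0, 2.0], "dense2": [3.0]}
data_loss = 0.8

# One penalty per regularized variable, like the entries of the collection.
reg_losses = [l2_penalty(w, scale=0.1) for w in layers.values()]
total_loss = data_loss + sum(reg_losses)
print(reg_losses, total_loss)
```

The optimizer then minimizes total_loss rather than data_loss, which is why the collected terms have to be added to the cost explicitly.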


Answer 1

A few aspects of the existing answer were not immediately clear to me, so here is a step-by-step guide:

  1. Define a regularizer. This is where the regularization constant can be set, e.g.:

    regularizer = tf.contrib.layers.l2_regularizer(scale=0.1)
    
  2. Create variables via:

        weights = tf.get_variable(
            name="weights",
            regularizer=regularizer,
            ...
        )
    

    Equivalently, variables can be created via the regular weights = tf.Variable(...) constructor, followed by tf.add_to_collection(tf.GraphKeys.REGULARIZATION_LOSSES, weights).

  3. Define some loss term and add the regularization term:

    reg_variables = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
    reg_term = tf.contrib.layers.apply_regularization(regularizer, reg_variables)
    loss += reg_term
    

    Note: It looks like tf.contrib.layers.apply_regularization is implemented as an AddN, so more or less equivalent to sum(reg_variables).


Answer 2

I’ll provide a simple correct answer since I didn’t find one. You need two simple steps, the rest is done by tensorflow magic:

  1. Add regularizers when creating variables or layers:

    tf.layers.dense(x, kernel_regularizer=tf.contrib.layers.l2_regularizer(0.001))
    # or
    tf.get_variable('a', regularizer=tf.contrib.layers.l2_regularizer(0.001))
    
  2. Add the regularization term when defining loss:

    loss = ordinary_loss + tf.losses.get_regularization_loss()
    

Answer 3

Another option to do this with the contrib.learn library is as follows, based on the Deep MNIST tutorial on the Tensorflow website. First, assuming you’ve imported the relevant libraries (such as import tensorflow.contrib.layers as layers), you can define a network in a separate method:

def easier_network(x, reg):
    """ A network based on tf.contrib.learn, with input `x`. """
    with tf.variable_scope('EasyNet'):
        out = layers.flatten(x)
        out = layers.fully_connected(out, 
                num_outputs=200,
                weights_initializer = layers.xavier_initializer(uniform=True),
                weights_regularizer = layers.l2_regularizer(scale=reg),
                activation_fn = tf.nn.tanh)
        out = layers.fully_connected(out, 
                num_outputs=200,
                weights_initializer = layers.xavier_initializer(uniform=True),
                weights_regularizer = layers.l2_regularizer(scale=reg),
                activation_fn = tf.nn.tanh)
        out = layers.fully_connected(out, 
                num_outputs=10, # Because there are ten digits!
                weights_initializer = layers.xavier_initializer(uniform=True),
                weights_regularizer = layers.l2_regularizer(scale=reg),
                activation_fn = None)
        return out 

Then, in a main method, you can use the following code snippet:

def main(_):
    mnist = input_data.read_data_sets(FLAGS.data_dir, one_hot=True)
    x = tf.placeholder(tf.float32, [None, 784])
    y_ = tf.placeholder(tf.float32, [None, 10])

    # Make a network with regularization
    y_conv = easier_network(x, FLAGS.regu)
    weights = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, 'EasyNet') 
    print("")
    for w in weights:
        shp = w.get_shape().as_list()
        print("- {} shape:{} size:{}".format(w.name, shp, np.prod(shp)))
    print("")
    reg_ws = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES, 'EasyNet')
    for w in reg_ws:
        shp = w.get_shape().as_list()
        print("- {} shape:{} size:{}".format(w.name, shp, np.prod(shp)))
    print("")

    # Make the loss function `loss_fn` with regularization.
    cross_entropy = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))
    loss_fn = cross_entropy + tf.reduce_sum(reg_ws)
    train_step = tf.train.AdamOptimizer(1e-4).minimize(loss_fn)

To get this to work you need to follow the MNIST tutorial I linked to earlier and import the relevant libraries, but it’s a nice exercise to learn TensorFlow and it’s easy to see how the regularization affects the output. If you apply a regularization as an argument, you can see the following:

- EasyNet/fully_connected/weights:0 shape:[784, 200] size:156800
- EasyNet/fully_connected/biases:0 shape:[200] size:200
- EasyNet/fully_connected_1/weights:0 shape:[200, 200] size:40000
- EasyNet/fully_connected_1/biases:0 shape:[200] size:200
- EasyNet/fully_connected_2/weights:0 shape:[200, 10] size:2000
- EasyNet/fully_connected_2/biases:0 shape:[10] size:10

- EasyNet/fully_connected/kernel/Regularizer/l2_regularizer:0 shape:[] size:1.0
- EasyNet/fully_connected_1/kernel/Regularizer/l2_regularizer:0 shape:[] size:1.0
- EasyNet/fully_connected_2/kernel/Regularizer/l2_regularizer:0 shape:[] size:1.0

Notice that the regularization portion gives you three items, based on the items available.

With regularizations of 0, 0.0001, 0.01, and 1.0, I get test accuracy values of 0.9468, 0.9476, 0.9183, and 0.1135, respectively, showing the dangers of high regularization terms.


Answer 4

If anyone’s still looking, I’d like to add that in tf.keras you can add weight regularization by passing a regularizer as an argument to your layers. An example of adding L2 regularization, taken wholesale from the TensorFlow Keras tutorials site:

model = keras.models.Sequential([
    keras.layers.Dense(16, kernel_regularizer=keras.regularizers.l2(0.001),
                       activation=tf.nn.relu, input_shape=(NUM_WORDS,)),
    keras.layers.Dense(16, kernel_regularizer=keras.regularizers.l2(0.001),
                       activation=tf.nn.relu),
    keras.layers.Dense(1, activation=tf.nn.sigmoid)
])

There’s no need to manually add in the regularization losses with this method as far as I know.

Reference: https://www.tensorflow.org/tutorials/keras/overfit_and_underfit#add_weight_regularization


Answer 5

I tested tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES) and tf.losses.get_regularization_loss() with one l2_regularizer in the graph, and found that they return the same value. Judging by the magnitude of that value, I believe the regularization constant (reg_constant in the earlier answer) has already been applied through the scale parameter of tf.contrib.layers.l2_regularizer.
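
The observation above can be checked with plain arithmetic. The sketch below is pure Python rather than TensorFlow. It assumes that each term produced by tf.contrib.layers.l2_regularizer(scale) equals scale * sum(w ** 2) / 2, and the helper names (l2_regularizer, get_variable, regularization_losses) are hypothetical stand-ins for the real API.

```python
def l2_regularizer(scale):
    """Mimic tf.contrib.layers.l2_regularizer (assumed: scale * sum(w^2) / 2)."""
    def penalty(weights):
        return scale * sum(w * w for w in weights) / 2.0
    return penalty

# Hypothetical stand-in for the REGULARIZATION_LOSSES graph collection.
regularization_losses = []

def get_variable(values, regularizer=None):
    """Pretend to create a variable and record its penalty in the collection."""
    if regularizer is not None:
        regularization_losses.append(regularizer(values))
    return values

reg = l2_regularizer(scale=0.1)
get_variable([1.0, 2.0], regularizer=reg)  # adds 0.1 * (1 + 4) / 2
get_variable([3.0], regularizer=reg)       # adds 0.1 * 9 / 2

# The scale is already inside every entry, so the total is just the sum.
total_reg_loss = sum(regularization_losses)
print(total_reg_loss)
```

Under that assumption, multiplying the summed collection by a further constant rescales a penalty that has already been scaled, so the effective strength is the product of the two constants.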


Answer 6

If you have a CNN, you can do the following:

In your model function:

conv = tf.layers.conv2d(inputs=input_layer,
                        filters=32,
                        kernel_size=[3, 3],
                        kernel_initializer=tf.contrib.layers.xavier_initializer(),
                        kernel_regularizer=tf.contrib.layers.l2_regularizer(1e-5),
                        padding="same",
                        activation=None) 
...

In your loss function:

onehot_labels = tf.one_hot(indices=tf.cast(labels, tf.int32), depth=num_classes)
loss = tf.losses.softmax_cross_entropy(onehot_labels=onehot_labels, logits=logits)
regularization_losses = tf.losses.get_regularization_losses()
loss = tf.add_n([loss] + regularization_losses)

Answer 7

Some of the answers left me more confused, so here are two methods spelled out clearly.

# 1. Add all regularization terms by hand.
var1 = tf.get_variable(name='v1', shape=[1], dtype=tf.float32)
var2 = tf.Variable(name='v2', initial_value=1.0, dtype=tf.float32)
regularizer = tf.contrib.layers.l1_regularizer(0.1)
reg_term = tf.contrib.layers.apply_regularization(regularizer, [var1, var2])
# Here reg_term is a scalar.

# 2. Terms are added to the collection automatically, but only for variables created with get_variable.
with tf.variable_scope('x',
        regularizer=tf.contrib.layers.l2_regularizer(0.1)):
    var1 = tf.get_variable(name='v1', shape=[1], dtype=tf.float32)
    var2 = tf.get_variable(name='v2', shape=[1], dtype=tf.float32)
reg_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
# Here reg_losses is a list and should be summed.

Either result can then be added to the total loss.


Answer 8

cross_entropy = tf.losses.softmax_cross_entropy(
  logits=logits, onehot_labels=labels)

l2_loss = weight_decay * tf.add_n(
     [tf.nn.l2_loss(tf.cast(v, tf.float32)) for v in tf.trainable_variables()])

loss = cross_entropy + l2_loss
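
To make the arithmetic of that snippet concrete, here is a minimal pure-Python sketch. It relies on tf.nn.l2_loss computing half the sum of squares. The variable names, the value of weight_decay, and the cross_entropy number are placeholders chosen for illustration, not values from the answer.

```python
def l2_loss(values):
    """Half the sum of squares, which is what tf.nn.l2_loss computes."""
    return sum(v * v for v in values) / 2.0

# Hypothetical flattened trainable variables (weights and biases alike).
trainable_variables = {
    "dense/kernel": [0.5, -1.0, 2.0],
    "dense/bias": [0.1],
}
weight_decay = 1e-4   # assumed value; the answer does not specify one
cross_entropy = 0.35  # placeholder for the data loss

# Sum the per-variable penalties, then scale once by weight_decay.
l2_term = weight_decay * sum(l2_loss(v) for v in trainable_variables.values())
loss = cross_entropy + l2_term
print(l2_term, loss)
```

Because the answer iterates over tf.trainable_variables(), biases are penalized along with the weights. Filtering the list by variable name is one way to exclude them if that is not wanted.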

Answer 9

tf.GraphKeys.REGULARIZATION_LOSSES will not be added automatically, but there is a simple way to add them:

reg_loss = tf.losses.get_regularization_loss()
total_loss = loss + reg_loss

tf.losses.get_regularization_loss() uses tf.add_n to sum the entries of tf.GraphKeys.REGULARIZATION_LOSSES element-wise. tf.GraphKeys.REGULARIZATION_LOSSES will typically be a list of scalars, calculated using regularizer functions. It gets entries from calls to tf.get_variable that have the regularizer parameter specified. You can also add to that collection manually. That would be useful when using tf.Variable and also when specifying activity regularizers or other custom regularizers. For instance:

#This will add an activity regularizer on y to the regloss collection
regularizer = tf.contrib.layers.l2_regularizer(0.1)
y = tf.nn.sigmoid(x)
act_reg = regularizer(y)
tf.add_to_collection(tf.GraphKeys.REGULARIZATION_LOSSES, act_reg)

(In this example it would presumably be more effective to regularize x, as y really flattens out for large x.)


How to set an adaptive learning rate for GradientDescentOptimizer?

Question: How to set an adaptive learning rate for GradientDescentOptimizer?

I am using TensorFlow to train a neural network. This is how I am initializing the GradientDescentOptimizer:

init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)

mse        = tf.reduce_mean(tf.square(out - out_))
train_step = tf.train.GradientDescentOptimizer(0.3).minimize(mse)

The thing here is that I don’t know how to set an update rule for the learning rate or a decay value for that.

How can I use an adaptive learning rate here?


Answer 0

First of all, tf.train.GradientDescentOptimizer is designed to use a constant learning rate for all variables in all steps. TensorFlow also provides out-of-the-box adaptive optimizers including the tf.train.AdagradOptimizer and the tf.train.AdamOptimizer, and these can be used as drop-in replacements.

However, if you want to control the learning rate with otherwise-vanilla gradient descent, you can take advantage of the fact that the learning_rate argument to the tf.train.GradientDescentOptimizer constructor can be a Tensor object. This allows you to compute a different value for the learning rate in each step, for example:

learning_rate = tf.placeholder(tf.float32, shape=[])
# ...
train_step = tf.train.GradientDescentOptimizer(
    learning_rate=learning_rate).minimize(mse)

sess = tf.Session()

# Feed different values for learning rate to each training step.
sess.run(train_step, feed_dict={learning_rate: 0.1})
sess.run(train_step, feed_dict={learning_rate: 0.1})
sess.run(train_step, feed_dict={learning_rate: 0.01})
sess.run(train_step, feed_dict={learning_rate: 0.01})

Alternatively, you could create a scalar tf.Variable that holds the learning rate, and assign it each time you want to change the learning rate.
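
The effect of supplying a different learning rate at each step can be seen without TensorFlow. The following is a minimal sketch, assuming a toy objective f(w) = (w - 3)^2. The function train_step and the schedule list are hypothetical and only play the role of the train_step op and the values passed through feed_dict.

```python
def train_step(w, learning_rate):
    """One gradient-descent step on f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)."""
    grad = 2.0 * (w - 3.0)
    return w - learning_rate * grad

# The schedule plays the role of the values passed through feed_dict.
schedule = [0.1, 0.1, 0.01, 0.01]

w = 0.0
history = []
for lr in schedule:
    w = train_step(w, lr)
    history.append(w)

print(history)
```

The optimizer itself stays plain gradient descent. Only the number that multiplies the gradient changes from step to step, which is the whole point of passing the learning rate in as a tensor.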


Answer 1

Tensorflow provides an op to automatically apply an exponential decay to a learning rate tensor: tf.train.exponential_decay. For an example of it in use, see this line in the MNIST convolutional model example. Then use @mrry’s suggestion above to supply this variable as the learning_rate parameter to your optimizer of choice.

The key excerpt to look at is:

# Optimizer: set up a variable that's incremented once per batch and
# controls the learning rate decay.
batch = tf.Variable(0)

learning_rate = tf.train.exponential_decay(
  0.01,                # Base learning rate.
  batch * BATCH_SIZE,  # Current index into the dataset.
  train_size,          # Decay step.
  0.95,                # Decay rate.
  staircase=True)
# Use simple momentum for the optimization.
optimizer = tf.train.MomentumOptimizer(learning_rate,
                                     0.9).minimize(loss,
                                                   global_step=batch)

Note the global_step=batch parameter to minimize. That tells the optimizer to helpfully increment the ‘batch’ parameter for you every time it trains.
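
For intuition, the decay rule can be written out in a few lines of plain Python. This sketch follows the formula given in the TensorFlow documentation for tf.train.exponential_decay, learning_rate * decay_rate ^ (global_step / decay_steps), with integer division when staircase=True. The function below is an illustration, not the library code, and the train_size value is an assumed example.

```python
def exponential_decay(learning_rate, global_step, decay_steps, decay_rate,
                      staircase=False):
    """Decayed rate: learning_rate * decay_rate ** (global_step / decay_steps)."""
    if staircase:
        # Integer division makes the rate drop in discrete steps.
        exponent = global_step // decay_steps
    else:
        exponent = global_step / decay_steps
    return learning_rate * decay_rate ** exponent

train_size = 1000  # assumed dataset size for the example
for examples_seen in (0, 999, 1000, 2500):
    print(examples_seen,
          exponential_decay(0.01, examples_seen, train_size, 0.95, staircase=True))
```

In the excerpt above, the step argument is batch * BATCH_SIZE and the decay step is train_size, so with staircase=True the rate is multiplied by 0.95 once per pass over the training set.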


Answer 2

The gradient descent algorithm uses a constant learning rate, which you provide during initialization. You can pass varying learning rates in the way shown by Mrry.

Instead, you can also use more advanced optimizers, which converge faster and adapt to the situation.

Here is a brief explanation based on my understanding:

  • Momentum helps SGD navigate along the relevant directions and softens the oscillations in the irrelevant ones. It simply adds a fraction of the previous step's direction to the current step. This amplifies speed in the correct direction and softens oscillation in the wrong directions. The fraction is usually in the (0, 1) range. It also makes sense to use adaptive momentum. At the beginning of learning a big momentum will only hinder your progress, so it makes sense to use something like 0.01, and once all the high gradients have disappeared you can use a bigger momentum. There is one problem with momentum: when we are very close to the goal, our momentum is in most cases very high and it does not know that it should slow down. This can cause it to miss the minimum or oscillate around it.
  • Nesterov accelerated gradient (NAG) overcomes this problem by starting to slow down early. With momentum we first compute the gradient and then make a jump in that direction, amplified by whatever momentum we had previously. NAG does the same thing in another order: first we make a big jump based on our stored information, and then we calculate the gradient and make a small correction. This seemingly irrelevant change gives significant practical speedups.
  • AdaGrad, or adaptive gradient, allows the learning rate to adapt per parameter. It performs larger updates for infrequent parameters and smaller updates for frequent ones. Because of this it is well suited for sparse data (NLP or image recognition). Another advantage is that it basically eliminates the need to tune the learning rate. Each parameter has its own learning rate, and due to the peculiarities of the algorithm the learning rate is monotonically decreasing. This causes the biggest problem: at some point the learning rate is so small that the system stops learning.
  • AdaDelta resolves the problem of the monotonically decreasing learning rate in AdaGrad. In AdaGrad the learning rate is roughly the base rate divided by the square root of the sum of all past squared gradients. At each step another squared gradient is added to the sum, so the denominator keeps growing. Instead of summing all past squared gradients, AdaDelta uses a sliding window, which allows the accumulated value to decrease. RMSprop is very similar to AdaDelta.
  • Adam, or adaptive moment estimation, is an algorithm similar to AdaDelta. In addition to storing a learning rate for each parameter, it also stores momentum changes for each of them separately.
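
Two of the update rules described above can be sketched in plain Python on the toy objective f(w) = w^2. This is a simplified illustration under assumed hyperparameters, not the TensorFlow implementation. The function names are hypothetical and the real optimizers differ in details such as initial accumulator values.

```python
import math

def momentum_step(w, velocity, grad, learning_rate=0.1, momentum=0.9):
    """Classical momentum: keep a fraction of the previous step, add the new gradient step."""
    velocity = momentum * velocity - learning_rate * grad
    return w + velocity, velocity

def adagrad_step(w, accum, grad, learning_rate=0.1, eps=1e-8):
    """AdaGrad: divide by the square root of the running sum of squared gradients."""
    accum = accum + grad * grad
    return w - learning_rate * grad / (math.sqrt(accum) + eps), accum

def grad_fn(w):
    return 2.0 * w  # gradient of f(w) = w^2, minimum at w = 0

w_m, v = 5.0, 0.0
w_a, acc = 5.0, 0.0
effective_rates = []
for _ in range(5):
    w_m, v = momentum_step(w_m, v, grad_fn(w_m))
    w_a, acc = adagrad_step(w_a, acc, grad_fn(w_a))
    # AdaGrad's per-parameter rate shrinks as the accumulator grows.
    effective_rates.append(0.1 / (math.sqrt(acc) + 1e-8))

print(w_m, w_a, effective_rates)
```

With these settings the momentum iterate runs past the minimum at 0 within a few steps, and AdaGrad's effective rate only ever shrinks. These are the two weaknesses the list attributes to momentum and AdaGrad.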



Answer 3

From the official TensorFlow docs:

global_step = tf.Variable(0, trainable=False)
starter_learning_rate = 0.1
learning_rate = tf.train.exponential_decay(starter_learning_rate, global_step,
                                       100000, 0.96, staircase=True)

# Passing global_step to minimize() will increment it at each step.
learning_step = (
tf.train.GradientDescentOptimizer(learning_rate)
.minimize(...my loss..., global_step=global_step))

Answer 4

If you want to set specific learning rates for intervals of epochs, with boundaries 0 < a < b < c < ..., you can define your learning rate as a conditional tensor, conditional on the global step, and feed it to the optimiser as normal.

You could achieve this with a bunch of nested tf.cond statements, but it's easier to build the tensor recursively:

def make_learning_rate_tensor(reduction_steps, learning_rates, global_step):
    assert len(reduction_steps) + 1 == len(learning_rates)
    if len(reduction_steps) == 1:
        return tf.cond(
            global_step < reduction_steps[0],
            lambda: learning_rates[0],
            lambda: learning_rates[1]
        )
    else:
        return tf.cond(
            global_step < reduction_steps[0],
            lambda: learning_rates[0],
            lambda: make_learning_rate_tensor(
                reduction_steps[1:],
                learning_rates[1:],
                global_step,)
            )

Then to use it you need to know how many training steps there are in a single epoch, so that we can use the global step to switch at the right time, and finally define the epochs and learning rates you want. So if I want the learning rates [0.1, 0.01, 0.001, 0.0001] during the epoch intervals of [0, 19], [20, 59], [60, 99], [100, \infty] respectively, I would do:

global_step = tf.train.get_or_create_global_step()
learning_rates = [0.1, 0.01, 0.001, 0.0001]
steps_per_epoch = 225
epochs_to_switch_at = [20, 60, 100]
epochs_to_switch_at = [x*steps_per_epoch for x in epochs_to_switch_at ]
learning_rate = make_learning_rate_tensor(epochs_to_switch_at , learning_rates, global_step)
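
The lookup that the recursive tf.cond construction performs can be checked in plain Python. This is a minimal sketch with a hypothetical helper, piecewise_learning_rate, that uses the same strict "step < boundary" comparison as the answer. TensorFlow 1.x also ships tf.train.piecewise_constant for this purpose, though it may treat a step that falls exactly on a boundary differently, so check its documentation before swapping it in.

```python
from bisect import bisect_right

def piecewise_learning_rate(global_step, boundaries, learning_rates):
    """Return learning_rates[i] for the first i with global_step < boundaries[i],
    or the last rate once every boundary has been passed."""
    assert len(boundaries) + 1 == len(learning_rates)
    return learning_rates[bisect_right(boundaries, global_step)]

steps_per_epoch = 225
boundaries = [epoch * steps_per_epoch for epoch in [20, 60, 100]]
learning_rates = [0.1, 0.01, 0.001, 0.0001]

for step in (0, 20 * 225 - 1, 20 * 225, 60 * 225, 100 * 225):
    print(step, piecewise_learning_rate(step, boundaries, learning_rates))
```

Running the lookup on the steps around each boundary is a quick way to confirm that the switch happens at the first step of epochs 20, 60, and 100, as intended.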

TensorFlow: saving a graph to a file / loading a graph from a file

Question: TensorFlow: saving a graph to a file / loading a graph from a file

From what I’ve gathered so far, there are several different ways of dumping a TensorFlow graph into a file and then loading it into another program, but I haven’t been able to find clear examples/information on how they work. What I already know is this:

  1. Save the model’s variables into a checkpoint file (.ckpt) using a tf.train.Saver() and restore them later (source)
  2. Save a model into a .pb file and load it back in using tf.train.write_graph() and tf.import_graph_def() (source)
  3. Load in a model from a .pb file, retrain it, and dump it into a new .pb file using Bazel (source)
  4. Freeze the graph to save the graph and weights together (source)
  5. Use as_graph_def() to save the model, and for weights/variables, map them into constants (source)

However, I haven’t been able to clear up several questions regarding these different methods:

  1. Regarding checkpoint files, do they only save the trained weights of a model? Could checkpoint files be loaded into a new program, and be used to run the model, or do they simply serve as ways to save the weights in a model at a certain time/stage?
  2. Regarding tf.train.write_graph(), are the weights/variables saved as well?
  3. Regarding Bazel, can it only save into/load from .pb files for retraining? Is there a simple Bazel command just to dump a graph into a .pb?
  4. Regarding freezing, can a frozen graph be loaded in using tf.import_graph_def()?
  5. The Android demo for TensorFlow loads in Google’s Inception model from a .pb file. If I wanted to substitute my own .pb file, how would I go about doing that? Would I need to change any native code/methods?
  6. In general, what exactly is the difference between all these methods? Or more broadly, what is the difference between as_graph_def()/.ckpt/.pb?

In short, what I’m looking for is a method to save both a graph (as in, the various operations and such) and its weights/variables into a file, which can then be used to load the graph and weights into another program, for use (not necessarily continuing/retraining).

Documentation about this topic isn’t very straightforward, so any answers/information would be greatly appreciated.


Answer 0

There are many ways to approach the problem of saving a model in TensorFlow, which can make it a bit confusing. Taking each of your sub-questions in turn:

  1. The checkpoint files (produced e.g. by calling saver.save() on a tf.train.Saver object) contain only the weights, and any other variables defined in the same program. To use them in another program, you must re-create the associated graph structure (e.g. by running code to build it again, or calling tf.import_graph_def()), which tells TensorFlow what to do with those weights. Note that calling saver.save() also produces a file containing a MetaGraphDef, which contains a graph and details of how to associate the weights from a checkpoint with that graph. See the tutorial for more details.

  2. tf.train.write_graph() only writes the graph structure; not the weights.

  3. Bazel is unrelated to reading or writing TensorFlow graphs. (Perhaps I misunderstand your question: feel free to clarify it in a comment.)

  4. A frozen graph can be loaded using tf.import_graph_def(). In this case, the weights are (typically) embedded in the graph, so you don’t need to load a separate checkpoint.

  5. The main change would be to update the names of the tensor(s) that are fed into the model, and the names of the tensor(s) that are fetched from the model. In the TensorFlow Android demo, this would correspond to the inputName and outputName strings that are passed to TensorFlowClassifier.initializeTensorFlow().

  6. The GraphDef is the program structure, which typically does not change through the training process. The checkpoint is a snapshot of the state of a training process, which typically changes at every step of the training process. As a result, TensorFlow uses different storage formats for these types of data, and the low-level API provides different ways to save and load them. Higher-level libraries, such as the MetaGraphDef libraries, Keras, and skflow build on these mechanisms to provide more convenient ways to save and restore an entire model.


回答 1

您可以尝试以下代码:

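import tensorflow as tf

with tf.gfile.FastGFile('model/frozen_inference_graph.pb', "rb") as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
# import_graph_def() returns None here, so import into an explicit graph and give that to the session
graph = tf.Graph()
with graph.as_default():
    tf.import_graph_def(graph_def, name="")
sess = tf.Session(graph=graph)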

You can try the following code:

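import tensorflow as tf

with tf.gfile.FastGFile('model/frozen_inference_graph.pb', "rb") as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
# import_graph_def() returns None here, so import into an explicit graph and give that to the session
graph = tf.Graph()
with graph.as_default():
    tf.import_graph_def(graph_def, name="")
sess = tf.Session(graph=graph)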

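Stanford-tensorflow-tutorials: this repository contains code examples for the Stanford course


stanford-tensorflow-tutorials

This repository contains code examples for the course CS 20: TensorFlow for Deep Learning Research.
It will be updated as the class progresses.
The detailed syllabus and lecture notes can be found here.
For this course, I use python3.6 and TensorFlow 1.4.1.

For the code and notes of the previous year's course, please see the folder 2017 and the website https://web.stanford.edu/class/cs20si/2017

For setup instructions and the list of dependencies, please see the setup folder of this repository.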

如何在张量流中获取当前可用的GPU?

问题:如何在张量流中获取当前可用的GPU?

我有一个使用分布式TensorFlow的计划,并且看到TensorFlow可以使用GPU进行培训和测试。在集群环境中,每台机器可能具有0个或1个或更多个GPU,我想将TensorFlow图运行到尽可能多的机器上的GPU中。

我发现运行tf.Session()TensorFlow时会在如下所示的日志消息中提供有关GPU的信息:

I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)

我的问题是如何从TensorFlow获取有关当前可用GPU的信息?我可以从日志中获取已加载的GPU信息,但我想以更复杂的编程方式进行操作。我还可以使用CUDA_VISIBLE_DEVICES环境变量有意地限制GPU,所以我不想知道一种从OS内核获取GPU信息的方法。

简而言之,如果机器中有两个GPU ,我希望这样的函数tf.get_available_gpus()将返回['/gpu:0', '/gpu:1']。我该如何实施?

I have a plan to use distributed TensorFlow, and I saw TensorFlow can use GPUs for training and testing. In a cluster environment, each machine could have 0 or 1 or more GPUs, and I want to run my TensorFlow graph into GPUs on as many machines as possible.

I found that when running tf.Session() TensorFlow gives information about GPU in the log messages like below:

I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)

My question is how do I get information about current available GPU from TensorFlow? I can get loaded GPU information from the log, but I want to do it in a more sophisticated, programmatic way. I also could restrict GPUs intentionally using the CUDA_VISIBLE_DEVICES environment variable, so I don’t want to know a way of getting GPU information from OS kernel.

In short, I want a function like tf.get_available_gpus() that will return ['/gpu:0', '/gpu:1'] if there are two GPUs available in the machine. How can I implement this?


回答 0

有一个未记录的方法device_lib.list_local_devices(),该方法使您可以列出本地进程中可用的设备。(注意,作为一种未公开的方法,此方法可能会向后不兼容更改。)该函数返回DeviceAttributes协议缓冲区对象的列表。您可以按以下方式提取GPU设备的字符串设备名称列表:

from tensorflow.python.client import device_lib

def get_available_gpus():
    local_device_protos = device_lib.list_local_devices()
    return [x.name for x in local_device_protos if x.device_type == 'GPU']

请注意(至少在TensorFlow 1.4之前),调用device_lib.list_local_devices()将运行一些初始化代码,默认情况下,这些初始化代码将在所有设备上分配所有GPU内存(GitHub issue)。为避免这种情况,请首先使用一个显着小的per_process_gpu_fraction或创建一个会话allow_growth=True,以防止分配所有内存。有关更多详细信息,请参见此问题

There is an undocumented method called device_lib.list_local_devices() that enables you to list the devices available in the local process. (N.B. As an undocumented method, this is subject to backwards incompatible changes.) The function returns a list of DeviceAttributes protocol buffer objects. You can extract a list of string device names for the GPU devices as follows:

from tensorflow.python.client import device_lib

def get_available_gpus():
    local_device_protos = device_lib.list_local_devices()
    return [x.name for x in local_device_protos if x.device_type == 'GPU']

Note that (at least up to TensorFlow 1.4), calling device_lib.list_local_devices() will run some initialization code that, by default, will allocate all of the GPU memory on all of the devices (GitHub issue). To avoid this, first create a session with an explicitly small per_process_gpu_memory_fraction, or allow_growth=True, to prevent all of the memory being allocated. See this question for more details.


回答 1

您可以使用以下代码检查所有设备列表:

from tensorflow.python.client import device_lib

device_lib.list_local_devices()

You can list all devices using the following code:

from tensorflow.python.client import device_lib

device_lib.list_local_devices()

回答 2

测试工具中还有一种方法。因此,所有要做的就是:

tf.test.is_gpu_available()

和/或

tf.test.gpu_device_name()

在Tensorflow文档中查找参数。

There is also a method in the test util. So all that has to be done is:

tf.test.is_gpu_available()

and/or

tf.test.gpu_device_name()

Look up the Tensorflow docs for arguments.


回答 3

在TensorFlow 2.0中,您可以使用 tf.config.experimental.list_physical_devices('GPU')

import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    print("Name:", gpu.name, "  Type:", gpu.device_type)

如果您安装了两个GPU,它将输出以下内容:

Name: /physical_device:GPU:0   Type: GPU
Name: /physical_device:GPU:1   Type: GPU

从2.1开始,您可以 experimental

gpus = tf.config.list_physical_devices('GPU')

看到:

In TensorFlow 2.0, you can use tf.config.experimental.list_physical_devices('GPU'):

import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    print("Name:", gpu.name, "  Type:", gpu.device_type)

If you have two GPUs installed, it outputs this:

Name: /physical_device:GPU:0   Type: GPU
Name: /physical_device:GPU:1   Type: GPU

From 2.1, you can drop experimental:

gpus = tf.config.list_physical_devices('GPU')

See:


回答 4

接受的答案给你GPU的数量,但它也分配所有这些GPU的内存。您可以通过在调用device_lib.list_local_devices()之前创建具有固定较低内存的会话来避免这种情况,这对于某些应用程序可能是不需要的。

我最终使用nvidia-smi来获取GPU的数量,而没有在其上分配任何内存。

import subprocess

n = str(subprocess.check_output(["nvidia-smi", "-L"])).count('UUID')

The accepted answer gives you the number of GPUs but it also allocates all the memory on those GPUs. You can avoid this by creating a session with fixed lower memory before calling device_lib.list_local_devices() which may be unwanted for some applications.

I ended up using nvidia-smi to get the number of GPUs without allocating any memory on them.

import subprocess

n = str(subprocess.check_output(["nvidia-smi", "-L"])).count('UUID')
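
Counting the substring 'UUID' works, but it relies on the string form of a bytes object and raises an exception when nvidia-smi is not installed. A slightly more defensive sketch is below. The helper names are hypothetical, the UUIDs in the sample are made-up placeholders, and it assumes each device line printed by nvidia-smi -L starts with "GPU <index>:".

```python
import re
import subprocess


def count_gpus_from_listing(listing):
    """Count device lines in the text printed by `nvidia-smi -L`.

    Assumes each device is reported on its own line starting with "GPU <index>:".
    """
    return len(re.findall(r"^GPU \d+:", listing, flags=re.MULTILINE))


def count_gpus():
    """Return the number of GPUs reported by nvidia-smi, or 0 if it cannot be run."""
    try:
        output = subprocess.check_output(["nvidia-smi", "-L"])
    except (OSError, subprocess.CalledProcessError):
        # nvidia-smi is missing or exited with an error: treat that as "no GPUs".
        return 0
    return count_gpus_from_listing(output.decode("utf-8", errors="replace"))


# Placeholder listing in the assumed format, so the parser can be tried without a GPU.
sample = (
    "GPU 0: GeForce GTX 1080 (UUID: GPU-aaaaaaaa)\n"
    "GPU 1: GeForce GTX 1080 (UUID: GPU-bbbbbbbb)\n"
)
print(count_gpus_from_listing(sample))  # 2
```

Keeping the parsing separate from the subprocess call makes it possible to check the counting logic on a machine without a GPU.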

回答 5

除了Mrry的出色解释之外,他建议在哪里使用,device_lib.list_local_devices()我可以向您展示如何从命令行检查GPU相关信息。

因为目前只有Nvidia的GPU适用于NN框架,所以答案只涵盖了它们。Nvidia上有一个页面,其中记录了如何使用/ proc文件系统接口来获取有关驱动程序,任何已安装的NVIDIA图形卡以及AGP状态的运行时信息。

/proc/driver/nvidia/gpus/0..N/information

提供有关每个已安装的NVIDIA图形适配器的信息(型号名称,IRQ,BIOS版本,总线类型)。请注意,BIOS版本仅在X运行时可用。

因此,您可以从命令行运行此命令,cat /proc/driver/nvidia/gpus/0/information并查看有关第一个GPU的信息。从python运行它很容易并且您可以检查第二,第三,第四GPU直到失败。

肯定Mrry的答案更可靠,而且我不确定我的答案是否可以在非Linux机器上使用,但是Nvidia的页面提供了其他有趣的信息,但鲜为人知。

Apart from the excellent explanation by Mrry, where he suggested to use device_lib.list_local_devices() I can show you how you can check for GPU related information from the command line.

Because currently only Nvidia’s gpus work for NN frameworks, the answer covers only them. Nvidia has a page where they document how you can use the /proc filesystem interface to obtain run-time information about the driver, any installed NVIDIA graphics cards, and the AGP status.

/proc/driver/nvidia/gpus/0..N/information

Provide information about each of the installed NVIDIA graphics adapters (model name, IRQ, BIOS version, Bus Type). Note that the BIOS version is only available while X is running.

So you can run cat /proc/driver/nvidia/gpus/0/information from the command line and see information about your first GPU. It is easy to run this from Python, and you can check the second, third, fourth GPU, and so on, until it fails.

Mrry's answer is definitely more robust, and I am not sure whether my answer will work on a non-Linux machine, but that Nvidia page provides other interesting information that not many people know about.
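
To illustrate the "run this from Python" remark, here is a minimal sketch. The function name is hypothetical, and it assumes that each GPU has its own sub-directory containing an information file under /proc/driver/nvidia/gpus. It lists the directory instead of guessing names, since the sub-directories may not be plain indices on every driver version.

```python
import os


def read_nvidia_gpu_info(root="/proc/driver/nvidia/gpus"):
    """Return {gpu_dir_name: text of its `information` file} for each entry under root.

    Returns an empty dict when the directory does not exist (no NVIDIA driver
    loaded, or a non-Linux machine).
    """
    info = {}
    if not os.path.isdir(root):
        return info
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name, "information")
        if os.path.isfile(path):
            with open(path) as f:
                info[name] = f.read()
    return info


# Print whatever the driver exposes; prints nothing if the directory is absent.
for gpu, text in read_nvidia_gpu_info().items():
    print(gpu)
    print(text)
```

Like the cat command, this reads driver state only and does not go through TensorFlow, so it says nothing about which GPUs TensorFlow will actually use (for example under CUDA_VISIBLE_DEVICES).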


回答 6

以下工作在tensorflow 2中:

import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    print("Name:", gpu.name, "  Type:", gpu.device_type)

从2.1开始,您可以删除experimental

    gpus = tf.config.list_physical_devices('GPU')

https://www.tensorflow.org/api_docs/python/tf/config/list_physical_devices

The following works in tensorflow 2:

import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    print("Name:", gpu.name, "  Type:", gpu.device_type)

From 2.1, you can drop experimental:

    gpus = tf.config.list_physical_devices('GPU')

https://www.tensorflow.org/api_docs/python/tf/config/list_physical_devices


回答 7

NVIDIA GTX GeForce 1650 Ti我的机器中调用了一个GPUtensorflow-gpu==2.2.0

运行以下两行代码:

import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))

输出:

Num GPUs Available:  1

I got a GPU called NVIDIA GTX GeForce 1650 Ti in my machine with tensorflow-gpu==2.2.0

Run the following two lines of code:

import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))

Output:

Num GPUs Available:  1

回答 8

使用这种方式并检查所有零件:

from __future__ import absolute_import, division, print_function, unicode_literals

import numpy as np
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_datasets as tfds


version = tf.__version__
executing_eagerly = tf.executing_eagerly()
hub_version = hub.__version__
available = tf.config.experimental.list_physical_devices("GPU")

print("Version: ", version)
print("Eager mode: ", executing_eagerly)
print("Hub Version: ", hub_version)
print("GPU is", "available" if available else "NOT AVAILABLE")

Use this way and check all parts :

from __future__ import absolute_import, division, print_function, unicode_literals

import numpy as np
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_datasets as tfds


version = tf.__version__
executing_eagerly = tf.executing_eagerly()
hub_version = hub.__version__
available = tf.config.experimental.list_physical_devices("GPU")

print("Version: ", version)
print("Eager mode: ", executing_eagerly)
print("Hub Version: ", hub_version)
print("GPU is", "available" if available else "NOT AVAILABLE")

回答 9

确保在您的GPU支持计算机中安装了最新的TensorFlow 2.x GPU,在python中执行以下代码,

from __future__ import absolute_import, division, print_function, unicode_literals

import tensorflow as tf 

print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))

会得到一个输出看起来像,

2020-02-07 10:45:37.587838: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-07 10:45:37.588896: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0, 1, 2, 3, 4, 5, 6, 7
Num GPUs Available: 8

Ensure you have the latest TensorFlow 2.x GPU build installed on a machine with a supported GPU, then execute the following code in Python:

from __future__ import absolute_import, division, print_function, unicode_literals

import tensorflow as tf 

print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))

You will get output that looks like this:

2020-02-07 10:45:37.587838: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-07 10:45:37.588896: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0, 1, 2, 3, 4, 5, 6, 7
Num GPUs Available: 8


tf.nn.embedding_lookup函数有什么作用?

问题:tf.nn.embedding_lookup函数有什么作用?

tf.nn.embedding_lookup(params, ids, partition_strategy='mod', name=None)

我不了解此功能的职责。像查找表吗?用哪种方法返回每个ID对应的参数(以ID为单位)?

例如,在skip-gram模型中,如果使用tf.nn.embedding_lookup(embeddings, train_inputs),则为每个train_input找到对应的嵌入?

tf.nn.embedding_lookup(params, ids, partition_strategy='mod', name=None)

I cannot understand the duty of this function. Is it like a lookup table? Which means to return the parameters corresponding to each id (in ids)?

For instance, in the skip-gram model if we use tf.nn.embedding_lookup(embeddings, train_inputs), then for each train_input it finds the correspond embedding?


回答 0

embedding_lookup函数检索params张量的行。该行为类似于对numpy中的数组使用索引。例如

matrix = np.random.random([1024, 64])  # 64-dimensional embeddings
ids = np.array([0, 5, 17, 33])
print(matrix[ids])  # prints a matrix of shape [4, 64]

params参数也可以是张量的列表,在这种情况下,ids将在张量之间分配。例如,给定的3张量列表[2, 64],默认行为是,他们将代表ids[0, 3][1, 4][2, 5]

partition_strategy控制ids列表中的分布方式。当矩阵可能太大而无法合为一体时,分区对于较大规模的问题很有用。

The embedding_lookup function retrieves rows of the params tensor. The behavior is similar to using indexing with arrays in numpy. E.g.

import numpy as np

matrix = np.random.random([1024, 64])  # 64-dimensional embeddings
ids = np.array([0, 5, 17, 33])
print(matrix[ids])  # prints a matrix of shape [4, 64]

The params argument can also be a list of tensors, in which case the ids will be distributed among the tensors. For example, given a list of 3 tensors of shape [2, 64], the default behavior is that they will represent ids: [0, 3], [1, 4], [2, 5].

partition_strategy controls how the ids are distributed among the list. The partitioning is useful for larger scale problems when the matrix might be too large to keep in one piece.
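
To make the id distribution concrete, here is a plain-Python sketch (not TensorFlow code) of how ids are assigned to partitions under the two strategies, as I understand the documented behavior. The function name partition_ids is hypothetical. With 'mod', id i goes to partition i % n; with 'div', ids are assigned in contiguous blocks and the first few partitions take one extra id when the split is uneven.

```python
def partition_ids(num_ids, num_partitions, strategy="mod"):
    """Return, for each partition, the list of ids it holds."""
    if strategy == "mod":
        # id i goes to partition i % num_partitions
        return [list(range(p, num_ids, num_partitions)) for p in range(num_partitions)]
    if strategy == "div":
        # contiguous blocks; the first (num_ids % num_partitions) partitions get one extra id
        base, extra = divmod(num_ids, num_partitions)
        parts, start = [], 0
        for p in range(num_partitions):
            size = base + (1 if p < extra else 0)
            parts.append(list(range(start, start + size)))
            start += size
        return parts
    raise ValueError("unknown strategy: %r" % strategy)


print(partition_ids(6, 3, "mod"))  # [[0, 3], [1, 4], [2, 5]]
print(partition_ids(6, 3, "div"))  # [[0, 1], [2, 3], [4, 5]]
```

The 'mod' result reproduces the [0, 3], [1, 4], [2, 5] assignment described above for 3 tensors of shape [2, 64].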


回答 1

是的,在您明白这一点之前,很难理解此功能。

最简单的形式类似于tf.gather。它params根据所指定的索引返回的元素ids

例如(假设您在里面tf.InteractiveSession()

params = tf.constant([10,20,30,40])
ids = tf.constant([0,1,2,3])
print(tf.nn.embedding_lookup(params,ids).eval())

将返回[10 20 30 40],因为params的第一个元素(索引0)为,params 10的第二个元素(索引1)为20,依此类推。

同样,

params = tf.constant([10,20,30,40])
ids = tf.constant([1,1,3])
print(tf.nn.embedding_lookup(params,ids).eval())

会回来的[20 20 40]

embedding_lookup比这更。该params参数可以是张量列表,而不是单个张量。

params1 = tf.constant([1,2])
params2 = tf.constant([10,20])
ids = tf.constant([2,0,2,1,2,3])
result = tf.nn.embedding_lookup([params1, params2], ids)

在这种情况下,ids根据分区策略,在中指定的索引对应于张量的元素,其中默认分区策略为’mod’。

在’mod’策略中,索引0对应于列表中第一个张量的第一个元素。索引1对应于第二张量的第一元素。索引2对应于第三张量的第一个元素,依此类推。假设params是张量的列表,对于所有索引,简单地index 对应第(i + 1)张量的第一个元素。i0..(n-1)n

现在,索引n不能对应于张量n + 1,因为列表params仅包含n张量。因此index n对应于第一个张量的第二个元素。类似地,index n+1对应于第二张量的第二个元素,依此类推。

因此,在代码中

params1 = tf.constant([1,2])
params2 = tf.constant([10,20])
ids = tf.constant([2,0,2,1,2,3])
result = tf.nn.embedding_lookup([params1, params2], ids)

下标0对应于第一个张量的第一个元素:1

索引1对应于第二张量的第一个元素:10

索引2对应于第一个张量的第二个元素:2

索引3对应于第二张量的第二个元素:20

因此,结果将是:

[ 2  1  2 10  2 20]

Yes, this function is hard to understand, until you get the point.

In its simplest form, it is similar to tf.gather. It returns the elements of params according to the indexes specified by ids.

For example (assuming you are inside tf.InteractiveSession())

params = tf.constant([10,20,30,40])
ids = tf.constant([0,1,2,3])
print(tf.nn.embedding_lookup(params,ids).eval())

would return [10 20 30 40], because the first element (index 0) of params is 10, the second element of params (index 1) is 20, etc.

Similarly,

params = tf.constant([10,20,30,40])
ids = tf.constant([1,1,3])
print(tf.nn.embedding_lookup(params,ids).eval())

would return [20 20 40].

But embedding_lookup is more than that. The params argument can be a list of tensors, rather than a single tensor.

params1 = tf.constant([1,2])
params2 = tf.constant([10,20])
ids = tf.constant([2,0,2,1,2,3])
result = tf.nn.embedding_lookup([params1, params2], ids)

In such a case, the indexes, specified in ids, correspond to elements of tensors according to a partition strategy, where the default partition strategy is ‘mod’.

In the ‘mod’ strategy, index 0 corresponds to the first element of the first tensor in the list. Index 1 corresponds to the first element of the second tensor. Index 2 corresponds to the first element of the third tensor, and so on. Simply index i corresponds to the first element of the (i+1)th tensor , for all the indexes 0..(n-1), assuming params is a list of n tensors.

Now, index n cannot correspond to tensor n+1, because the list params contains only n tensors. So index n corresponds to the second element of the first tensor. Similarly, index n+1 corresponds to the second element of the second tensor, etc.

So, in the code

params1 = tf.constant([1,2])
params2 = tf.constant([10,20])
ids = tf.constant([2,0,2,1,2,3])
result = tf.nn.embedding_lookup([params1, params2], ids)

index 0 corresponds to the first element of the first tensor: 1

index 1 corresponds to the first element of the second tensor: 10

index 2 corresponds to the second element of the first tensor: 2

index 3 corresponds to the second element of the second tensor: 20

Thus, the result would be:

[ 2  1  2 10  2 20]
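
The index arithmetic above can be checked without a TensorFlow session. The following is a small NumPy emulation of the 'mod' strategy, not the library's implementation; the helper name mod_embedding_lookup is hypothetical.

```python
import numpy as np


def mod_embedding_lookup(shards, ids):
    """Emulate the 'mod' partition strategy: id i is element i // n of shard i % n."""
    n = len(shards)
    return np.array([shards[i % n][i // n] for i in ids])


params1 = np.array([1, 2])
params2 = np.array([10, 20])
ids = [2, 0, 2, 1, 2, 3]
print(mod_embedding_lookup([params1, params2], ids))  # [ 2  1  2 10  2 20]
```

The remainder i % n picks the tensor and the quotient i // n picks the position inside it, which is exactly the walk-through given for indices 0 to 3.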

回答 2

是的,该tf.nn.embedding_lookup()函数的目的是在嵌入矩阵中执行查找并返回单词的嵌入(或简单地说是矢量表示)。

一个简单的嵌入矩阵(形状vocabulary_size x embedding_dimension:)如下所示。(即每个单词将由一个数字向量表示;因此,名称为word2vec


嵌入矩阵

the 0.418 0.24968 -0.41242 0.1217 0.34527 -0.044457 -0.49688 -0.17862
like 0.36808 0.20834 -0.22319 0.046283 0.20098 0.27515 -0.77127 -0.76804
between 0.7503 0.71623 -0.27033 0.20059 -0.17008 0.68568 -0.061672 -0.054638
did 0.042523 -0.21172 0.044739 -0.19248 0.26224 0.0043991 -0.88195 0.55184
just 0.17698 0.065221 0.28548 -0.4243 0.7499 -0.14892 -0.66786 0.11788
national -1.1105 0.94945 -0.17078 0.93037 -0.2477 -0.70633 -0.8649 -0.56118
day 0.11626 0.53897 -0.39514 -0.26027 0.57706 -0.79198 -0.88374 0.30119
country -0.13531 0.15485 -0.07309 0.034013 -0.054457 -0.20541 -0.60086 -0.22407
under 0.13721 -0.295 -0.05916 -0.59235 0.02301 0.21884 -0.34254 -0.70213
such 0.61012 0.33512 -0.53499 0.36139 -0.39866 0.70627 -0.18699 -0.77246
second -0.29809 0.28069 0.087102 0.54455 0.70003 0.44778 -0.72565 0.62309 

我分裂上述嵌入基质并装载仅vocab,这将是我们的词汇并在相应的向量emb阵列。

vocab = ['the','like','between','did','just','national','day','country','under','such','second']

emb = np.array([[0.418, 0.24968, -0.41242, 0.1217, 0.34527, -0.044457, -0.49688, -0.17862],
   [0.36808, 0.20834, -0.22319, 0.046283, 0.20098, 0.27515, -0.77127, -0.76804],
   [0.7503, 0.71623, -0.27033, 0.20059, -0.17008, 0.68568, -0.061672, -0.054638],
   [0.042523, -0.21172, 0.044739, -0.19248, 0.26224, 0.0043991, -0.88195, 0.55184],
   [0.17698, 0.065221, 0.28548, -0.4243, 0.7499, -0.14892, -0.66786, 0.11788],
   [-1.1105, 0.94945, -0.17078, 0.93037, -0.2477, -0.70633, -0.8649, -0.56118],
   [0.11626, 0.53897, -0.39514, -0.26027, 0.57706, -0.79198, -0.88374, 0.30119],
   [-0.13531, 0.15485, -0.07309, 0.034013, -0.054457, -0.20541, -0.60086, -0.22407],
   [ 0.13721, -0.295, -0.05916, -0.59235, 0.02301, 0.21884, -0.34254, -0.70213],
   [ 0.61012, 0.33512, -0.53499, 0.36139, -0.39866, 0.70627, -0.18699, -0.77246 ],
   [ -0.29809, 0.28069, 0.087102, 0.54455, 0.70003, 0.44778, -0.72565, 0.62309 ]])


emb.shape
# (11, 8)

在TensorFlow中嵌入查找

现在,我们将看到如何对某些任意输入语句执行嵌入查找

In [54]: from collections import OrderedDict

# embedding as TF tensor (for now constant; could be tf.Variable() during training)
In [55]: tf_embedding = tf.constant(emb, dtype=tf.float32)

# input for which we need the embedding
In [56]: input_str = "like the country"

# build index based on our `vocabulary`
In [57]: word_to_idx = OrderedDict({w:vocab.index(w) for w in input_str.split() if w in vocab})

# lookup in embedding matrix & return the vectors for the input words
In [58]: tf.nn.embedding_lookup(tf_embedding, list(word_to_idx.values())).eval()
Out[58]: 
array([[ 0.36807999,  0.20834   , -0.22318999,  0.046283  ,  0.20097999,
         0.27515   , -0.77126998, -0.76804   ],
       [ 0.41800001,  0.24968   , -0.41242   ,  0.1217    ,  0.34527001,
        -0.044457  , -0.49687999, -0.17862   ],
       [-0.13530999,  0.15485001, -0.07309   ,  0.034013  , -0.054457  ,
        -0.20541   , -0.60086   , -0.22407   ]], dtype=float32)

注意我们是怎么得到的嵌入使用从我们原来的嵌入矩阵(文字)的话指数在我们的词汇。

通常,此类嵌入查找是由第一层(称为“ 嵌入层”)执行的,然后将这些嵌入传递到RNN / LSTM / GRU层以进行进一步处理。


旁注:通常,词汇表还将具有特殊unk标记。因此,如果词汇表中不存在来自我们输入句子的标记,则将unk在嵌入矩阵中查找与之相对应的索引。


PS注意,embedding_dimension是一个超参数是一个具有调整他们的应用程序,但受欢迎的车型,如Word2Vec手套使用300维向量表示每个字。

奖励阅读 word2vec跳过语法模型

Yes, the purpose of tf.nn.embedding_lookup() function is to perform a lookup in the embedding matrix and return the embeddings (or in simple terms the vector representation) of words.

A simple embedding matrix (of shape: vocabulary_size x embedding_dimension) would look like below. (i.e. each word will be represented by a vector of numbers; hence the name word2vec)


Embedding Matrix

the 0.418 0.24968 -0.41242 0.1217 0.34527 -0.044457 -0.49688 -0.17862
like 0.36808 0.20834 -0.22319 0.046283 0.20098 0.27515 -0.77127 -0.76804
between 0.7503 0.71623 -0.27033 0.20059 -0.17008 0.68568 -0.061672 -0.054638
did 0.042523 -0.21172 0.044739 -0.19248 0.26224 0.0043991 -0.88195 0.55184
just 0.17698 0.065221 0.28548 -0.4243 0.7499 -0.14892 -0.66786 0.11788
national -1.1105 0.94945 -0.17078 0.93037 -0.2477 -0.70633 -0.8649 -0.56118
day 0.11626 0.53897 -0.39514 -0.26027 0.57706 -0.79198 -0.88374 0.30119
country -0.13531 0.15485 -0.07309 0.034013 -0.054457 -0.20541 -0.60086 -0.22407
under 0.13721 -0.295 -0.05916 -0.59235 0.02301 0.21884 -0.34254 -0.70213
such 0.61012 0.33512 -0.53499 0.36139 -0.39866 0.70627 -0.18699 -0.77246
second -0.29809 0.28069 0.087102 0.54455 0.70003 0.44778 -0.72565 0.62309 

I split the above embedding matrix, loading only the words into vocab, which will be our vocabulary, and the corresponding vectors into the emb array.

vocab = ['the','like','between','did','just','national','day','country','under','such','second']

emb = np.array([[0.418, 0.24968, -0.41242, 0.1217, 0.34527, -0.044457, -0.49688, -0.17862],
   [0.36808, 0.20834, -0.22319, 0.046283, 0.20098, 0.27515, -0.77127, -0.76804],
   [0.7503, 0.71623, -0.27033, 0.20059, -0.17008, 0.68568, -0.061672, -0.054638],
   [0.042523, -0.21172, 0.044739, -0.19248, 0.26224, 0.0043991, -0.88195, 0.55184],
   [0.17698, 0.065221, 0.28548, -0.4243, 0.7499, -0.14892, -0.66786, 0.11788],
   [-1.1105, 0.94945, -0.17078, 0.93037, -0.2477, -0.70633, -0.8649, -0.56118],
   [0.11626, 0.53897, -0.39514, -0.26027, 0.57706, -0.79198, -0.88374, 0.30119],
   [-0.13531, 0.15485, -0.07309, 0.034013, -0.054457, -0.20541, -0.60086, -0.22407],
   [ 0.13721, -0.295, -0.05916, -0.59235, 0.02301, 0.21884, -0.34254, -0.70213],
   [ 0.61012, 0.33512, -0.53499, 0.36139, -0.39866, 0.70627, -0.18699, -0.77246 ],
   [ -0.29809, 0.28069, 0.087102, 0.54455, 0.70003, 0.44778, -0.72565, 0.62309 ]])


emb.shape
# (11, 8)

Embedding Lookup in TensorFlow

Now we will see how we can perform an embedding lookup for an arbitrary input sentence.

In [54]: from collections import OrderedDict

# embedding as TF tensor (for now constant; could be tf.Variable() during training)
In [55]: tf_embedding = tf.constant(emb, dtype=tf.float32)

# input for which we need the embedding
In [56]: input_str = "like the country"

# build index based on our `vocabulary`
In [57]: word_to_idx = OrderedDict({w:vocab.index(w) for w in input_str.split() if w in vocab})

# lookup in embedding matrix & return the vectors for the input words
In [58]: tf.nn.embedding_lookup(tf_embedding, list(word_to_idx.values())).eval()
Out[58]: 
array([[ 0.36807999,  0.20834   , -0.22318999,  0.046283  ,  0.20097999,
         0.27515   , -0.77126998, -0.76804   ],
       [ 0.41800001,  0.24968   , -0.41242   ,  0.1217    ,  0.34527001,
        -0.044457  , -0.49687999, -0.17862   ],
       [-0.13530999,  0.15485001, -0.07309   ,  0.034013  , -0.054457  ,
        -0.20541   , -0.60086   , -0.22407   ]], dtype=float32)

Observe how we got the embeddings from our original embedding matrix (with words) using the indices of words in our vocabulary.

Usually, such an embedding lookup is performed by the first layer (called Embedding layer) which then passes these embeddings to RNN/LSTM/GRU layers for further processing.


Side Note: Usually the vocabulary will also have a special unk token. So, if a token from our input sentence is not present in our vocabulary, then the index corresponding to unk will be looked up in the embedding matrix.
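
A minimal NumPy sketch of the unk fallback is shown below. The toy vocabulary, the "<unk>" spelling, and the random vectors are all placeholders chosen for illustration, not values from any real model.

```python
import numpy as np

# Hypothetical toy vocabulary; "<unk>" is the catch-all token for unseen words.
vocab = ["<unk>", "the", "like", "country"]
word_to_idx = {w: i for i, w in enumerate(vocab)}

# One random 8-dimensional vector per vocabulary entry (placeholder values).
rng = np.random.RandomState(0)
emb = rng.uniform(-1.0, 1.0, size=(len(vocab), 8))


def sentence_to_ids(sentence):
    """Map each token to its index, falling back to the <unk> index."""
    unk = word_to_idx["<unk>"]
    return [word_to_idx.get(tok, unk) for tok in sentence.split()]


ids = sentence_to_ids("like the big country")
print(ids)             # [2, 1, 0, 3]
print(emb[ids].shape)  # (4, 8)
```

Because the ids are kept in a list with one entry per token, word order and repeated words are preserved, and out-of-vocabulary words such as "big" are mapped to the unk row instead of being dropped.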


P.S. Note that embedding_dimension is a hyperparameter that one has to tune for their application, but popular models like Word2Vec and GloVe use 300-dimensional vectors to represent each word.

Bonus Reading word2vec skip-gram model


回答 3

这是描述嵌入查找过程的图像。



简而言之,它获取由ID列表指定的嵌入层的相应行,并将其作为张量提供。它是通过以下过程实现的。

  1. 定义一个占位符 lookup_ids = tf.placeholder([10])
  2. 定义嵌入层 embeddings = tf.Variable([100,10],...)
  3. 定义张量流操作 embed_lookup = tf.embedding_lookup(embeddings, lookup_ids)
  4. 通过运行获取结果 lookup = session.run(embed_lookup, feed_dict={lookup_ids:[95,4,14]})

Here’s an image depicting the process of embedding lookup.



Concisely, it gets the rows of an embedding layer specified by a list of IDs and provides them as a tensor. It is achieved through the following process.

  1. Define a placeholder lookup_ids = tf.placeholder(tf.int32, shape=[None])
  2. Define an embedding layer embeddings = tf.Variable(tf.random_uniform([100, 10]), ...)
  3. Define the tensorflow operation embed_lookup = tf.nn.embedding_lookup(embeddings, lookup_ids)
  4. Get the results by running lookup = session.run(embed_lookup, feed_dict={lookup_ids:[95,4,14]})

回答 4

当参数张量为高维时,id仅指最大维。也许对大多数人来说这很明显,但是我必须运行以下代码来理解这一点:

embeddings = tf.constant([[[1,1],[2,2],[3,3],[4,4]],[[11,11],[12,12],[13,13],[14,14]],
                          [[21,21],[22,22],[23,23],[24,24]]])
ids=tf.constant([0,2,1])
embed = tf.nn.embedding_lookup(embeddings, ids, partition_strategy='div')

with tf.Session() as session:
    result = session.run(embed)
    print (result)

只是尝试“ div”策略,对于一个张量,这没有什么区别。

这是输出:

[[[ 1  1]
  [ 2  2]
  [ 3  3]
  [ 4  4]]

 [[21 21]
  [22 22]
  [23 23]
  [24 24]]

 [[11 11]
  [12 12]
  [13 13]
  [14 14]]]

When the params tensor has more than two dimensions, the ids refer only to the first (outermost) dimension. Maybe that's obvious to most people, but I had to run the following code to understand it:

embeddings = tf.constant([[[1,1],[2,2],[3,3],[4,4]],[[11,11],[12,12],[13,13],[14,14]],
                          [[21,21],[22,22],[23,23],[24,24]]])
ids=tf.constant([0,2,1])
embed = tf.nn.embedding_lookup(embeddings, ids, partition_strategy='div')

with tf.Session() as session:
    result = session.run(embed)
    print (result)

Just trying the ‘div’ strategy and for one tensor, it makes no difference.

Here is the output:

[[[ 1  1]
  [ 2  2]
  [ 3  3]
  [ 4  4]]

 [[21 21]
  [22 22]
  [23 23]
  [24 24]]

 [[11 11]
  [12 12]
  [13 13]
  [14 14]]]

回答 5

另一种查看方式是,假设您将张量展平为一维数组,然后执行查找。

(例如)Tensor0 = [1,2,3],Tensor1 = [4,5,6],Tensor2 = [7,8,9]

展平的张量将如下[1,4,7,2,5,8,3,6,9]

现在,当您执行[0,3,4,1,7]的查找时,将会产生[1,2,5,4,6]

(i,e)例如,如果lookup值为7,而我们有3个张量(或具有3行的张量),

7/3 :(提醒为1,商为2)因此将显示Tensor1的第二个元素,即6

Another way to look at it is to assume that you flatten the tensors out into a one-dimensional array and then perform the lookup.

(e.g.) Tensor0=[1,2,3], Tensor1=[4,5,6], Tensor2=[7,8,9]

The flattened tensor will be as follows: [1,4,7,2,5,8,3,6,9]

Now when you do a lookup of [0,3,4,1,7] it will yield [1,2,5,4,6]

(i.e.) if the lookup value is 7, for example, and we have 3 tensors (or a tensor with 3 rows), then

7 / 3: (remainder is 1, quotient is 2) so the element at index 2 of Tensor1 will be returned, which is 6
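
The flattened view can be reproduced in a few lines of NumPy. This is only an illustration of the interleaving described here, using the same three example tensors.

```python
import numpy as np

tensors = [np.array([1, 2, 3]), np.array([4, 5, 6]), np.array([7, 8, 9])]

# Interleave the tensors: element j of tensor k lands at position j * 3 + k.
flat = np.stack(tensors, axis=1).ravel()
print(flat)       # [1 4 7 2 5 8 3 6 9]

ids = [0, 3, 4, 1, 7]
print(flat[ids])  # [1 2 5 4 6]

# Same lookup via remainder (which tensor) and quotient (which position).
print([int(tensors[i % 3][i // 3]) for i in ids])  # [1, 2, 5, 4, 6]
```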


回答 6

由于我也对此功能感兴趣,因此我将给我两分钱。

我在2D情况下看到它的方式就像矩阵乘法(很容易推广到其他维度)。

考虑一个带有N个符号的词汇表。然后,您可以将符号x表示为尺寸为Nx1的矢量,并进行一次热编码。

但是,您不希望将此符号表示为Nx1的矢量,而是表示为尺寸为Mx1的y

因此,要将x转换为y,可以使用和嵌入尺寸为MxN的矩阵E

y = E x

本质上,这就是tf.nn.embedding_lookup(params,ids,…)所做的事情,细微的差别是ids只是一个数字,代表1在热编码矢量x中的位置1 。

Since I was also intrigued by this function, I’ll give my two cents.

The way I see it in the 2D case is just as a matrix multiplication (it’s easy to generalize to other dimensions).

Consider a vocabulary with N symbols. Then, you can represent a symbol x as a vector of dimensions Nx1, one-hot-encoded.

But you want a representation of this symbol not as a vector of Nx1, but as one with dimensions Mx1, called y.

So, to transform x into y, you can use an embedding matrix E, with dimensions MxN:

y = E x.

This is essentially what tf.nn.embedding_lookup(params, ids, …) is doing, with the nuance that each entry of ids is just a number that represents the position of the 1 in the one-hot-encoded vector x.
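
A quick NumPy check of this equivalence, with arbitrary sizes and random values chosen only for illustration:

```python
import numpy as np

N, M = 5, 3                 # vocabulary size and embedding size (arbitrary)
rng = np.random.RandomState(0)
E = rng.rand(M, N)          # embedding matrix, one column per symbol

symbol_id = 2
x = np.zeros(N)
x[symbol_id] = 1.0          # one-hot encoding of the symbol

y_matmul = E.dot(x)         # y = E x
y_lookup = E[:, symbol_id]  # simply pick column `symbol_id`
print(np.allclose(y_matmul, y_lookup))  # True
```

Note that tf.nn.embedding_lookup retrieves rows of params, so in practice the matrix is stored as N x M (the transpose of E here) and the lookup picks row symbol_id. Either way, indexing avoids building the one-hot vector and doing the multiplication.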


回答 7

添加到Asher Stern的答案中, params被解释为大嵌入张量的划分。它可以是表示完整嵌入张量的单个张量,也可以是X形张量的列表,除了第一维以外,它们均具有相同的形状,表示分片嵌入张量。

tf.nn.embedding_lookup考虑到嵌入(参数)会很大这一事实来编写函数。因此我们需要partition_strategy

Adding to Asher Stern’s answer, params is interpreted as a partitioning of a large embedding tensor. It can be a single tensor representing the complete embedding tensor, or a list of X tensors all of same shape except for the first dimension, representing sharded embedding tensors.

The function tf.nn.embedding_lookup is written considering the fact that embedding (params) will be large. Therefore we need partition_strategy.


Keras,如何获得每一层的输出?

问题:Keras,如何获得每一层的输出?

我已经使用CNN训练了二进制分类模型,这是我的代码

model = Sequential()
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1],
                        border_mode='valid',
                        input_shape=input_shape))
model.add(Activation('relu'))
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=pool_size))
# (16, 16, 32)
model.add(Convolution2D(nb_filters*2, kernel_size[0], kernel_size[1]))
model.add(Activation('relu'))
model.add(Convolution2D(nb_filters*2, kernel_size[0], kernel_size[1]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=pool_size))
# (8, 8, 64) = (2048)
model.add(Flatten())
model.add(Dense(1024))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(2))  # define a binary classification problem
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='adadelta',
              metrics=['accuracy'])
model.fit(x_train, y_train,
          batch_size=batch_size,
          nb_epoch=nb_epoch,
          verbose=1,
          validation_data=(x_test, y_test))

在这里,我想像TensorFlow一样获得每一层的输出,我该怎么做?

I have trained a binary classification model with CNN, and here is my code

model = Sequential()
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1],
                        border_mode='valid',
                        input_shape=input_shape))
model.add(Activation('relu'))
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=pool_size))
# (16, 16, 32)
model.add(Convolution2D(nb_filters*2, kernel_size[0], kernel_size[1]))
model.add(Activation('relu'))
model.add(Convolution2D(nb_filters*2, kernel_size[0], kernel_size[1]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=pool_size))
# (8, 8, 64) = (2048)
model.add(Flatten())
model.add(Dense(1024))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(2))  # define a binary classification problem
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='adadelta',
              metrics=['accuracy'])
model.fit(x_train, y_train,
          batch_size=batch_size,
          nb_epoch=nb_epoch,
          verbose=1,
          validation_data=(x_test, y_test))

I want to get the output of each layer, as I could in TensorFlow. How can I do that?


Answer 0

You can easily get the outputs of any layer by using: model.layers[index].output

For all layers use this:

import numpy as np
from keras import backend as K

inp = model.input                                           # input placeholder
outputs = [layer.output for layer in model.layers]          # all layer outputs
functors = [K.function([inp, K.learning_phase()], [out]) for out in outputs]    # evaluation functions

# Testing
test = np.random.random(input_shape)[np.newaxis,...]
layer_outs = [func([test, 1.]) for func in functors]
print(layer_outs)

Note: To simulate Dropout, pass 1. as the learning_phase value when computing layer_outs; otherwise pass 0.

Edit: (based on comments)

K.function creates a Theano/TensorFlow tensor function, which is later used to get the outputs from the symbolic graph for a given input.

K.learning_phase() is required as an input because many Keras layers, such as Dropout and BatchNormalization, depend on it to change their behavior between training and test time.

So if you remove the dropout layer in your code you can simply use:

import numpy as np
from keras import backend as K

inp = model.input                                           # input placeholder
outputs = [layer.output for layer in model.layers]          # all layer outputs
functors = [K.function([inp], [out]) for out in outputs]    # evaluation functions

# Testing
test = np.random.random(input_shape)[np.newaxis,...]
layer_outs = [func([test]) for func in functors]
print(layer_outs)
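
To make the role of the learning-phase flag concrete, here is a minimal pure-Python sketch of a dropout-like operation that behaves differently in training and test mode. It is a toy stand-in, not Keras code: the function name `dropout`, its parameters, and the rescaling of surviving values are assumptions made for illustration.

```python
import random

def dropout(values, rate, training, rng):
    """Toy dropout on a list of floats (hypothetical helper, not the Keras layer)."""
    if not training:
        # Test phase (learning_phase = 0): values pass through unchanged.
        return list(values)
    keep = 1.0 - rate
    # Training phase (learning_phase = 1): zero each value with probability
    # `rate` and rescale the survivors so the expected value is preserved.
    return [v / keep if rng.random() < keep else 0.0 for v in values]

rng = random.Random(0)
activations = [1.0, 2.0, 3.0, 4.0]
print(dropout(activations, rate=0.5, training=False, rng=rng))  # unchanged copy
print(dropout(activations, rate=0.5, training=True, rng=rng))   # some zeros, rest rescaled
```

This is why the value fed for the learning phase changes what the intermediate outputs look like whenever such layers are present.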

Edit 2: More optimized

I just realized that the previous answer is not well optimized: for each function evaluation the data is transferred from CPU to GPU memory, and the tensor computations for the lower layers are repeated over and over.

Instead, this is a much better way, as you don't need multiple functions but a single function that gives you the list of all outputs:

import numpy as np
from keras import backend as K

inp = model.input                                           # input placeholder
outputs = [layer.output for layer in model.layers]          # all layer outputs
functor = K.function([inp, K.learning_phase()], outputs )   # evaluation function

# Testing
test = np.random.random(input_shape)[np.newaxis,...]
layer_outs = functor([test, 1.])
print(layer_outs)
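
The cost difference between the two approaches can be illustrated with a toy model in plain Python. This sketch does not use Keras: the "layers" are hypothetical callables that add 1 and count how often they run, and the two helper functions only mimic the evaluation pattern of one function per layer versus one function for all outputs.

```python
def make_layers(n, counter):
    """Build n toy layers; each adds 1 and records that it ran."""
    def make(i):
        def layer(x):
            counter[i] += 1
            return x + 1
        return layer
    return [make(i) for i in range(n)]

def outputs_one_function_per_layer(layers, x):
    """Mimic one function per layer: every call restarts from the input."""
    results = []
    for depth in range(1, len(layers) + 1):
        value = x
        for layer in layers[:depth]:
            value = layer(value)
        results.append(value)
    return results

def outputs_single_function(layers, x):
    """Mimic a single function returning all outputs: one forward pass."""
    results = []
    value = x
    for layer in layers:
        value = layer(value)
        results.append(value)
    return results

counter = [0] * 4
layers = make_layers(4, counter)
per_layer = outputs_one_function_per_layer(layers, 0)
calls_per_layer = sum(counter)

counter[:] = [0] * 4  # reset the shared counter in place
single = outputs_single_function(layers, 0)
calls_single = sum(counter)

print(per_layer == single, calls_per_layer, calls_single)
```

Both variants return the same outputs, but with n layers the per-layer variant runs n(n+1)/2 layer evaluations while the single pass runs n.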

Answer 1

From https://keras.io/getting-started/faq/#how-can-i-obtain-the-output-of-an-intermediate-layer

One simple way is to create a new Model that will output the layers that you are interested in:

from keras.models import Model

model = ...  # include here your original model

layer_name = 'my_layer'
intermediate_layer_model = Model(inputs=model.input,
                                 outputs=model.get_layer(layer_name).output)
intermediate_output = intermediate_layer_model.predict(data)

Alternatively, you can build a Keras function that will return the output of a certain layer given a certain input, for example:

from keras import backend as K

# with a Sequential model
get_3rd_layer_output = K.function([model.layers[0].input],
                                  [model.layers[3].output])
layer_output = get_3rd_layer_output([x])[0]

Answer 2

Based on all the good answers of this thread, I wrote a library to fetch the output of each layer. It abstracts all the complexity and has been designed to be as user-friendly as possible:

https://github.com/philipperemy/keract

It handles almost all the edge cases.

Hope it helps!


Answer 3

The following looks very simple to me:

model.layers[idx].output

Above is a tensor object, so you can modify it using operations that can be applied to a tensor object.

For example, to get the shape: model.layers[idx].output.get_shape()

idx is the index of the layer; you can find it from model.summary().


Answer 4

I wrote this function for myself (in Jupyter), inspired by indraforyou's answer. It plots all the layer outputs automatically. Your images must have shape (x, y, 1), where 1 stands for a single channel. Call plot_layer_outputs(...) to plot.

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from keras import backend as K

def get_layer_outputs():
    test_image = ...  # YOUR IMAGE GOES HERE (replace this placeholder before running)
    outputs    = [layer.output for layer in model.layers]          # all layer outputs
    comp_graph = [K.function([model.input]+ [K.learning_phase()], [output]) for output in outputs]  # evaluation functions

    # Testing
    layer_outputs_list = [op([test_image, 1.]) for op in comp_graph]
    layer_outputs = []

    for layer_output in layer_outputs_list:
        print(layer_output[0][0].shape, end='\n-------------------\n')
        layer_outputs.append(layer_output[0][0])

    return layer_outputs

def plot_layer_outputs(layer_number):    
    layer_outputs = get_layer_outputs()

    x_max = layer_outputs[layer_number].shape[0]
    y_max = layer_outputs[layer_number].shape[1]
    n     = layer_outputs[layer_number].shape[2]

    L = []
    for i in range(n):
        L.append(np.zeros((x_max, y_max)))

    for i in range(n):
        for x in range(x_max):
            for y in range(y_max):
                L[i][x][y] = layer_outputs[layer_number][x][y][i]


    for img in L:
        plt.figure()
        plt.imshow(img, interpolation='nearest')
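
The triple loop in plot_layer_outputs copies each channel of an (x, y, n) array into its own 2-D image. As a sketch, assuming the layer output is a NumPy array with the channel axis last, the same split can be done by moving the channel axis to the front. The helper names below are made up for this comparison.

```python
import numpy as np

def split_channels_loop(feature_map):
    """Explicit triple loop, mirroring the one in plot_layer_outputs."""
    x_max, y_max, n = feature_map.shape
    channels = [np.zeros((x_max, y_max)) for _ in range(n)]
    for i in range(n):
        for x in range(x_max):
            for y in range(y_max):
                channels[i][x][y] = feature_map[x][y][i]
    return channels

def split_channels_vectorized(feature_map):
    """Same result: move the channel axis to the front, then iterate over it."""
    return list(np.moveaxis(feature_map, -1, 0))

feature_map = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)
looped = split_channels_loop(feature_map)
vectorized = split_channels_vectorized(feature_map)
print(all(np.array_equal(a, b) for a, b in zip(looped, vectorized)))
```

For large feature maps the vectorized form avoids the Python-level loops, which are usually the slow part.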

Answer 5

From: https://github.com/philipperemy/keras-visualize-activations/blob/master/read_activations.py

import keras.backend as K

def get_activations(model, model_inputs, print_shape_only=False, layer_name=None):
    print('----- activations -----')
    activations = []
    inp = model.input

    model_multi_inputs_cond = True
    if not isinstance(inp, list):
        # only one input! let's wrap it in a list.
        inp = [inp]
        model_multi_inputs_cond = False

    outputs = [layer.output for layer in model.layers if
               layer.name == layer_name or layer_name is None]  # all layer outputs

    funcs = [K.function(inp + [K.learning_phase()], [out]) for out in outputs]  # evaluation functions

    if model_multi_inputs_cond:
        list_inputs = []
        list_inputs.extend(model_inputs)
        list_inputs.append(0.)
    else:
        list_inputs = [model_inputs, 0.]

    # Learning phase. 0 = Test mode (no dropout or batch normalization)
    # layer_outputs = [func([model_inputs, 0.])[0] for func in funcs]
    layer_outputs = [func(list_inputs)[0] for func in funcs]
    for layer_activations in layer_outputs:
        activations.append(layer_activations)
        if print_shape_only:
            print(layer_activations.shape)
        else:
            print(layer_activations)
    return activations
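
The part of get_activations that builds list_inputs can be isolated as a small helper. This is a simplified sketch rather than the library's code: `build_feed` is a hypothetical name, and it decides between the single-input and multi-input case by looking at the data passed in, whereas the function above checks whether model.input is a list.

```python
def build_feed(model_inputs, learning_phase=0.0):
    """Return the values to feed: one entry per model input, then the learning phase."""
    if isinstance(model_inputs, (list, tuple)):
        feed = list(model_inputs)   # multi-input model: one array per input
    else:
        feed = [model_inputs]       # single-input model: wrap the one array
    feed.append(learning_phase)     # 0.0 = test mode, 1.0 = training mode
    return feed

print(build_feed("image_batch"))
print(build_feed(["image_batch", "metadata_batch"], learning_phase=1.0))
```

The strings stand in for input arrays; the order of the entries has to match the order of the input tensors given when the function was created.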

Answer 6

I wanted to add this as a comment on @indraforyou's answer (but I don't have enough reputation) to correct the issue mentioned in @mathtick's comment. To avoid the InvalidArgumentError: input_X:Y is both fed and fetched. exception, replace the line outputs = [layer.output for layer in model.layers] with outputs = [layer.output for layer in model.layers][1:].

Adapting indraforyou's minimal working example:

import numpy as np
from keras import backend as K

inp = model.input                                           # input placeholder
outputs = [layer.output for layer in model.layers][1:]        # all layer outputs except first (input) layer
functor = K.function([inp, K.learning_phase()], outputs )   # evaluation function

# Testing
test = np.random.random(input_shape)[np.newaxis,...]
layer_outs = functor([test, 1.])
print(layer_outs)

P.S. My attempts with variants such as outputs = [layer.output for layer in model.layers[1:]] did not work.


Answer 7

Assuming you have:

1- A pre-trained Keras model.

2- An input x, either a single image or a set of images. The image resolution should be compatible with the dimensions of the input layer, for example 80*80*3 for a 3-channel (RGB) image.

3- The name of the layer whose activations you want, for example the "flatten_2" layer. The name must appear in the layer_names variable, which holds the layer names of the given model.

4- batch_size, an optional argument.

Then you can use the get_activations function to get the activations of that layer for a given input x and pre-trained model:

import six
import numpy as np
import keras.backend as k
from numpy import float32
def get_activations(x, model, layer, batch_size=128):
    """
    Return the output of the specified layer for input `x`. `layer` is specified by layer index (between 0 and
    `nb_layers - 1`) or by name. The number of layers can be determined by counting the results returned by
    calling `layer_names`.
    :param x: Input for computing the activations.
    :type x: `np.ndarray`. Example: x.shape = (80, 80, 3)
    :param model: pre-trained Keras model. Including weights.
    :type model: keras.engine.sequential.Sequential. Example: model.input_shape = (None, 80, 80, 3)
    :param layer: Layer for computing the activations
    :type layer: `int` or `str`. Example: layer = 'flatten_2'
    :param batch_size: Size of batches.
    :type batch_size: `int`
    :return: The output of `layer`, where the first dimension is the batch size corresponding to `x`.
    :rtype: `np.ndarray`. Example: activations.shape = (1, 2000)
    """

    layer_names = [layer.name for layer in model.layers]
    if isinstance(layer, six.string_types):
        if layer not in layer_names:
            raise ValueError('Layer name %s is not part of the graph.' % layer)
        layer_name = layer
    elif isinstance(layer, int):
        if layer < 0 or layer >= len(layer_names):
            raise ValueError('Layer index %d is outside of range (0 to %d included).'
                             % (layer, len(layer_names) - 1))
        layer_name = layer_names[layer]
    else:
        raise TypeError('Layer must be of type `str` or `int`.')

    layer_output = model.get_layer(layer_name).output
    layer_input = model.input
    output_func = k.function([layer_input], [layer_output])

    # Apply preprocessing
    if x.shape == k.int_shape(model.input)[1:]:
        x_preproc = np.expand_dims(x, 0)
    else:
        x_preproc = x
    assert len(x_preproc.shape) == 4

    # Determine shape of expected output and prepare array
    output_shape = output_func([x_preproc[0][None, ...]])[0].shape
    activations = np.zeros((x_preproc.shape[0],) + output_shape[1:], dtype=float32)

    # Get activations with batching
    for batch_index in range(int(np.ceil(x_preproc.shape[0] / float(batch_size)))):
        begin, end = batch_index * batch_size, min((batch_index + 1) * batch_size, x_preproc.shape[0])
        activations[begin:end] = output_func([x_preproc[begin:end]])[0]

    return activations
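
The batching loop at the end of get_activations relies on a small piece of index arithmetic: the number of batches is the sample count divided by the batch size, rounded up, and the last batch is clipped to the end of the data. A minimal sketch of just that arithmetic, with `batch_bounds` as a hypothetical helper name:

```python
import math

def batch_bounds(n_samples, batch_size):
    """Return (begin, end) index pairs that cover n_samples in order."""
    n_batches = math.ceil(n_samples / batch_size)
    return [
        (i * batch_size, min((i + 1) * batch_size, n_samples))
        for i in range(n_batches)
    ]

for begin, end in batch_bounds(10, 4):
    print(begin, end)
```

Each pair can be used as a slice, x[begin:end], so the final batch is simply shorter when the sample count is not a multiple of the batch size.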

Answer 8

If you have one of the following cases:

  • error: InvalidArgumentError: input_X:Y is both fed and fetched
  • multiple inputs

you need to make the following changes:

  • filter input layers out of the outputs variable
  • make a minor change to the functors loop

Minimal example:

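from keras import backend as K
from keras.engine.input_layer import InputLayer

inp = model.input  # a list of input tensors for a multi-input model
outputs = [layer.output for layer in model.layers if not isinstance(layer, InputLayer)]
functors = [K.function(inp + [K.learning_phase()], [x]) for x in outputs]
# x1, x2, xn: one array per model input; the trailing 1 is the learning phase
layer_outputs = [fun([x1, x2, xn, 1]) for fun in functors]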

Answer 9

Well, the other answers are very complete, but there is a very basic way to "see" the shapes rather than "get" them.

Just call model.summary(). It will print all layers and their output shapes. "None" values indicate variable dimensions, and the first dimension is the batch size.