Tag archive: lstm

How to apply gradient clipping in TensorFlow?

Question: How to apply gradient clipping in TensorFlow?


Considering the example code.

I would like to know how to apply gradient clipping to this network, an RNN where there is a possibility of exploding gradients.

tf.clip_by_value(t, clip_value_min, clip_value_max, name=None)

This is an example that could be used, but where do I introduce it? In the definition of the RNN?

    lstm_cell = rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0)
    # Split data because rnn cell needs a list of inputs for the RNN inner loop
    _X = tf.split(0, n_steps, _X) # n_steps
    tf.clip_by_value(_X, -1, 1, name=None)

But this doesn't make sense, as the tensor _X is the input and not the gradient. What is it that should be clipped?

Do I have to define my own Optimizer for this or is there a simpler option?


Answer 0


Gradient clipping needs to happen after computing the gradients, but before applying them to update the model’s parameters. In your example, both of those things are handled by the AdamOptimizer.minimize() method.

In order to clip your gradients, you'll need to explicitly compute, clip, and apply them as described in this section of TensorFlow's API documentation. Specifically, you'll need to substitute the call to the minimize() method with something like the following:

optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
gvs = optimizer.compute_gradients(cost)
capped_gvs = [(tf.clip_by_value(grad, -1., 1.), var) for grad, var in gvs]
train_op = optimizer.apply_gradients(capped_gvs)
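
One caveat worth adding (not from the original answer): compute_gradients() returns None for variables that are not connected to the loss, and tf.clip_by_value() will fail on those entries. A defensive variant of the same TF 1.x sketch might look like this:

optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
gvs = optimizer.compute_gradients(cost)
# Clip only real gradients; skip variables whose gradient is None.
capped_gvs = [(tf.clip_by_value(grad, -1., 1.), var)
              for grad, var in gvs if grad is not None]
train_op = optimizer.apply_gradients(capped_gvs)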

Answer 1


Despite what seems to be popular, you probably want to clip the whole gradient by its global norm:

optimizer = tf.train.AdamOptimizer(1e-3)
gradients, variables = zip(*optimizer.compute_gradients(loss))
gradients, _ = tf.clip_by_global_norm(gradients, 5.0)
optimize = optimizer.apply_gradients(zip(gradients, variables))

Clipping each gradient matrix individually changes their relative scale but is also possible:

optimizer = tf.train.AdamOptimizer(1e-3)
gradients, variables = zip(*optimizer.compute_gradients(loss))
gradients = [
    None if gradient is None else tf.clip_by_norm(gradient, 5.0)
    for gradient in gradients]
optimize = optimizer.apply_gradients(zip(gradients, variables))

In TensorFlow 2, a tape computes the gradients, the optimizers come from Keras, and we don’t need to store the update op because it runs automatically without passing it to a session:

optimizer = tf.keras.optimizers.Adam(1e-3)
# ...
with tf.GradientTape() as tape:
  loss = ...
variables = ...
gradients = tape.gradient(loss, variables)
gradients, _ = tf.clip_by_global_norm(gradients, 5.0)
optimizer.apply_gradients(zip(gradients, variables))
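
A further convenience not mentioned in the answer: tf.keras optimizers also accept clipping arguments at construction time (clipnorm and clipvalue; newer releases add global_clipnorm), so the clipping happens inside apply_gradients() without any manual code. A sketch, assuming a recent TF 2 version:

optimizer = tf.keras.optimizers.Adam(1e-3, clipnorm=5.0)            # clip each gradient's norm
# optimizer = tf.keras.optimizers.Adam(1e-3, clipvalue=1.0)         # or clip element-wise values
# optimizer = tf.keras.optimizers.Adam(1e-3, global_clipnorm=5.0)   # or clip by global norm (newer TF)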

Answer 2


This is actually properly explained in the documentation:

Calling minimize() takes care of both computing the gradients and applying them to the variables. If you want to process the gradients before applying them you can instead use the optimizer in three steps:

  • Compute the gradients with compute_gradients().
  • Process the gradients as you wish.
  • Apply the processed gradients with apply_gradients().

And in the example they provide they use these 3 steps:

# Create an optimizer.
opt = GradientDescentOptimizer(learning_rate=0.1)

# Compute the gradients for a list of variables.
grads_and_vars = opt.compute_gradients(loss, <list of variables>)

# grads_and_vars is a list of tuples (gradient, variable).  Do whatever you
# need to the 'gradient' part, for example cap them, etc.
capped_grads_and_vars = [(MyCapper(gv[0]), gv[1]) for gv in grads_and_vars]

# Ask the optimizer to apply the capped gradients.
opt.apply_gradients(capped_grads_and_vars)

Here MyCapper is any function that caps your gradient. The list of useful functions (other than tf.clip_by_value()) is here.
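
For illustration only (my addition, not part of the documentation's example), MyCapper could be a thin wrapper around one of those clipping functions that also tolerates None gradients:

def MyCapper(grad, clip_norm=5.0):
    # Hypothetical capper: clip the gradient's L2 norm, pass None through untouched.
    if grad is None:
        return None
    return tf.clip_by_norm(grad, clip_norm)

capped_grads_and_vars = [(MyCapper(gv[0]), gv[1]) for gv in grads_and_vars]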


Answer 3


For those who would like to understand the idea of gradient clipping (by norm):

Whenever the gradient norm is greater than a particular threshold, we clip the gradient norm so that it stays within the threshold. This threshold is sometimes set to 5.

Let the gradient be g and the max_norm_threshold be j.

Now, if ||g|| > j, we do:

g = ( j * g ) / ||g||

This is the implementation used in tf.clip_by_norm.
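
To make the rule concrete, here is a tiny NumPy illustration of the same computation (my own sketch; tf.clip_by_norm behaves analogously):

import numpy as np

def clip_by_norm(g, j):
    # If ||g|| > j, rescale g so its norm becomes exactly j; otherwise return g unchanged.
    norm = np.linalg.norm(g)
    return g if norm <= j else (j * g) / norm

g = np.array([3.0, 4.0])        # ||g|| = 5
print(clip_by_norm(g, 2.5))     # [1.5 2. ]  -> norm is now 2.5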


Answer 4


IMO the best solution is wrapping your optimizer with TF’s estimator decorator tf.contrib.estimator.clip_gradients_by_norm:

original_optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
optimizer = tf.contrib.estimator.clip_gradients_by_norm(original_optimizer, clip_norm=5.0)
train_op = optimizer.minimize(loss)

This way you only have to define it once, and you don't have to run it after every gradient calculation.

Documentation: https://www.tensorflow.org/api_docs/python/tf/contrib/estimator/clip_gradients_by_norm


Answer 5


Gradient clipping basically helps in the case of exploding or vanishing gradients. Say your loss is too high, which results in exponentially large gradients flowing through the network and may produce NaN values. To overcome this, we clip the gradients to a specific range (-1 to 1, or any range the situation requires).

clipped_value = [(tf.clip_by_value(grad, -clip_range, clip_range), var) for grad, var in grads_and_vars]

where grads_and_vars are the pairs of gradients (which you compute via optimizer.compute_gradients()) and the variables they will be applied to.

After clipping, we simply apply the result using the optimizer: optimizer.apply_gradients(clipped_value)


Understanding Keras LSTM

Question: Understanding Keras LSTM


I am trying to reconcile my understanding of LSTMs, as laid out in this post by Christopher Olah, with the LSTM implemented in Keras. I am following the blog written by Jason Brownlee for the Keras tutorial. What I am mainly confused about is:

  1. The reshaping of the data series into [samples, time steps, features] and,
  2. The stateful LSTMs

Let's concentrate on the above two questions with reference to the code pasted below:

# reshape into X=t and Y=t+1
look_back = 3
trainX, trainY = create_dataset(train, look_back)
testX, testY = create_dataset(test, look_back)

# reshape input to be [samples, time steps, features]
trainX = numpy.reshape(trainX, (trainX.shape[0], look_back, 1))
testX = numpy.reshape(testX, (testX.shape[0], look_back, 1))
########################
# The IMPORTANT BIT
##########################
# create and fit the LSTM network
batch_size = 1
model = Sequential()
model.add(LSTM(4, batch_input_shape=(batch_size, look_back, 1), stateful=True))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
for i in range(100):
    model.fit(trainX, trainY, nb_epoch=1, batch_size=batch_size, verbose=2, shuffle=False)
    model.reset_states()

Note: create_dataset takes a sequence of length N and returns an array of length N - look_back, each element of which is a sequence of length look_back.

What are Time Steps and Features?

As can be seen, trainX is a 3-D array with time steps and features being the last two dimensions respectively (3 and 1 in this particular code). With respect to the image below, does this mean that we are considering the many to one case, where the number of pink boxes is 3? Or does it literally mean the chain length is 3 (i.e. only 3 green boxes are considered)?

Does the features argument become relevant when we consider multivariate series? e.g. modelling two financial stocks simultaneously?

Stateful LSTMs

Do stateful LSTMs mean that we save the cell memory values between runs of batches? If this is the case, batch_size is one, and the memory is reset between the training runs, so what was the point of saying that it was stateful? I'm guessing this is related to the fact that the training data is not shuffled, but I'm not sure how.

Any thoughts? Image reference: http://karpathy.github.io/2015/05/21/rnn-effectiveness/

Edit 1:

I am a bit confused about @van's comment about the red and green boxes being equal. So, just to confirm, do the following API calls correspond to the unrolled diagrams? Note especially the second diagram (batch_size was chosen arbitrarily):

Edit 2:

For people who have done Udacity’s deep learning course and still confused about the time_step argument, look at the following discussion: https://discussions.udacity.com/t/rnn-lstm-use-implementation/163169

Update:

It turns out model.add(TimeDistributed(Dense(vocab_len))) was what I was looking for. Here is an example: https://github.com/sachinruk/ShakespeareBot

Update2:

I have summarised most of my understanding of LSTMs here: https://www.youtube.com/watch?v=ywinX5wgdEU


Answer 0


First of all, you chose great tutorials (1, 2) to start with.

What time-step means: Time-steps==3 in X.shape (describing the data shape) means there are three pink boxes. Since in Keras each step requires an input, the number of green boxes should usually equal the number of red boxes, unless you hack the structure.

many to many vs. many to one: In Keras, there is a return_sequences parameter when you initialize LSTM, GRU, or SimpleRNN. When return_sequences is False (the default), it is many to one, as shown in the picture. Its return shape is (batch_size, hidden_unit_length), which represents the last state. When return_sequences is True, it is many to many, and its return shape is (batch_size, time_step, hidden_unit_length).

Does the features argument become relevant: The features argument means "how big is your red box", i.e. what the input dimension is at each step. If you want to predict from, say, 8 kinds of market information, then you can generate your data with feature==8.

Stateful: You can look up the source code. When initializing the state, if stateful==True, the state from the last training batch is used as the initial state; otherwise a new state is generated. I haven't turned on stateful yet. However, I disagree that batch_size can only be 1 when stateful==True.

Currently, you generate your data from already-collected data. Imagine your stock information is coming in as a stream: rather than waiting a day to collect everything sequentially, you would like to generate input data online while training/predicting with the network. If you have 400 stocks sharing the same network, then you can set batch_size==400.
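
To make the shape story concrete, here is a small sketch of the two return_sequences settings using tf.keras (my addition; the numbers are arbitrary):

import numpy as np
import tensorflow as tf

x = np.random.rand(32, 3, 1).astype("float32")      # (samples, time_steps, features)

many_to_one  = tf.keras.layers.LSTM(4)                          # return_sequences=False
many_to_many = tf.keras.layers.LSTM(4, return_sequences=True)

print(many_to_one(x).shape)     # (32, 4)     -> only the last state
print(many_to_many(x).shape)    # (32, 3, 4)  -> one output per time step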


Answer 1


As a complement to the accepted answer, this answer shows keras behaviors and how to achieve each picture.

General Keras behavior

The standard keras internal processing is always a many to many as in the following picture (where I used features=2, pressure and temperature, just as an example):

In this image, I increased the number of steps to 5, to avoid confusion with the other dimensions.

For this example:

  • We have N oil tanks
  • We spent 5 hours taking measures hourly (time steps)
  • We measured two features:
    • Pressure P
    • Temperature T

Our input array should then be something shaped as (N,5,2):

        [     Step1      Step2      Step3      Step4      Step5
Tank A:    [[Pa1,Ta1], [Pa2,Ta2], [Pa3,Ta3], [Pa4,Ta4], [Pa5,Ta5]],
Tank B:    [[Pb1,Tb1], [Pb2,Tb2], [Pb3,Tb3], [Pb4,Tb4], [Pb5,Tb5]],
  ....
Tank N:    [[Pn1,Tn1], [Pn2,Tn2], [Pn3,Tn3], [Pn4,Tn4], [Pn5,Tn5]],
        ]
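
If it helps, this is roughly how such an array could be assembled with NumPy (a sketch with made-up data, not part of the original answer):

import numpy as np

N = 10
pressures    = np.random.rand(N, 5)    # N tanks, 5 hourly readings each
temperatures = np.random.rand(N, 5)

X = np.stack([pressures, temperatures], axis=-1)
print(X.shape)   # (10, 5, 2)  ->  (N, steps, features)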

Inputs for sliding windows

Often, LSTM layers are supposed to process the entire sequences. Dividing them into windows may not be the best idea. The layer keeps internal states about how a sequence is evolving as it steps forward. Windows eliminate the possibility of learning long sequences, limiting all sequences to the window size.

With windows, each window is part of one long original sequence, but Keras will see each of them as an independent sequence:

        [     Step1    Step2    Step3    Step4    Step5
Window  A:  [[P1,T1], [P2,T2], [P3,T3], [P4,T4], [P5,T5]],
Window  B:  [[P2,T2], [P3,T3], [P4,T4], [P5,T5], [P6,T6]],
Window  C:  [[P3,T3], [P4,T4], [P5,T5], [P6,T6], [P7,T7]],
  ....
        ]
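
A quick sketch of building such windows from one long series with NumPy (my addition, with arbitrary sizes):

import numpy as np

series = np.random.rand(100, 2)      # one long sequence: (T, features)
window = 5
X = np.stack([series[i:i + window] for i in range(len(series) - window + 1)])
print(X.shape)   # (96, 5, 2) -> Keras treats each window as an independent sequence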

Notice that in this case you initially have only one sequence, but you're dividing it into many sequences to create windows.

The concept of “what is a sequence” is abstract. The important parts are:

  • you can have batches with many individual sequences
  • what makes the sequences be sequences is that they evolve in steps (usually time steps)

Achieving each case with “single layers”

Achieving standard many to many:

You can achieve many to many with a simple LSTM layer, using return_sequences=True:

outputs = LSTM(units, return_sequences=True)(inputs)

#output_shape -> (batch_size, steps, units)

Achieving many to one:

Using the exact same layer, keras will do the exact same internal preprocessing, but when you use return_sequences=False (or simply ignore this argument), keras will automatically discard the steps previous to the last:

outputs = LSTM(units)(inputs)

#output_shape -> (batch_size, units) --> steps were discarded, only the last was returned

Achieving one to many

Now, this is not supported by keras LSTM layers alone. You will have to create your own strategy to replicate the steps. There are two good approaches:

  • Create a constant multi-step input by repeating a tensor
  • Use a stateful=True to recurrently take the output of one step and serve it as the input of the next step (needs output_features == input_features)

One to many with repeat vector

In order to fit the keras standard behavior, we need inputs in steps, so we simply repeat the inputs for the length we want:

outputs = RepeatVector(steps)(inputs) #where inputs is (batch,features)
outputs = LSTM(units,return_sequences=True)(outputs)

#output_shape -> (batch_size, steps, units)

Understanding stateful = True

Now comes one of the possible usages of stateful=True (besides avoiding loading data that can’t fit your computer’s memory at once)

Stateful allows us to input “parts” of the sequences in stages. The difference is:

  • In stateful=False, the second batch contains whole new sequences, independent from the first batch
  • In stateful=True, the second batch continues the first batch, extending the same sequences.

It's like dividing the sequences into windows too, with these two main differences:

  • these windows do not overlap!!
  • stateful=True will see these windows connected as a single long sequence

In stateful=True, every new batch will be interpreted as continuing the previous batch (until you call model.reset_states()).

  • Sequence 1 in batch 2 will continue sequence 1 in batch 1.
  • Sequence 2 in batch 2 will continue sequence 2 in batch 1.
  • Sequence n in batch 2 will continue sequence n in batch 1.

Example of inputs, batch 1 contains steps 1 and 2, batch 2 contains steps 3 to 5:

                   BATCH 1                           BATCH 2
        [     Step1      Step2        |    [    Step3      Step4      Step5
Tank A:    [[Pa1,Ta1], [Pa2,Ta2],     |       [Pa3,Ta3], [Pa4,Ta4], [Pa5,Ta5]],
Tank B:    [[Pb1,Tb1], [Pb2,Tb2],     |       [Pb3,Tb3], [Pb4,Tb4], [Pb5,Tb5]],
  ....                                |
Tank N:    [[Pn1,Tn1], [Pn2,Tn2],     |       [Pn3,Tn3], [Pn4,Tn4], [Pn5,Tn5]],
        ]                                  ]

Notice the alignment of tanks in batch 1 and batch 2! That’s why we need shuffle=False (unless we are using only one sequence, of course).

You can have any number of batches, indefinitely. (For variable lengths in each batch, use input_shape=(None,features).)
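
A minimal runnable sketch of this two-batch pattern with tf.keras (my own example with made-up data and names; a stateful layer requires a fixed batch size, here N):

import numpy as np
import tensorflow as tf

N = 4                                                   # number of parallel sequences ("tanks")
batch1_x = np.random.rand(N, 2, 2).astype("float32")    # steps 1-2
batch2_x = np.random.rand(N, 3, 2).astype("float32")    # steps 3-5
y1 = np.random.rand(N, 1).astype("float32")
y2 = np.random.rand(N, 1).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(16, stateful=True, batch_input_shape=(N, None, 2)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

model.reset_states()                  # start of the sequences
model.train_on_batch(batch1_x, y1)    # the state is kept after this call...
model.train_on_batch(batch2_x, y2)    # ...so this batch continues the same sequences
model.reset_states()                  # done with these sequences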

One to many with stateful=True

For our case here, we are going to use only 1 step per batch, because we want to get one output step and make it be an input.

Please notice that the behavior in the picture is not “caused by” stateful=True. We will force that behavior in a manual loop below. In this example, stateful=True is what “allows” us to stop the sequence, manipulate what we want, and continue from where we stopped.

Honestly, the repeat approach is probably a better choice for this case. But since we’re looking into stateful=True, this is a good example. The best way to use this is the next “many to many” case.

Layer:

outputs = LSTM(units=features, 
               stateful=True, 
               return_sequences=True, #just to keep a nice output shape even with length 1
               input_shape=(None,features))(inputs) 
    #units = features because we want to use the outputs as inputs
    #None because we want variable length

#output_shape -> (batch_size, steps, units) 

Now, we’re going to need a manual loop for predictions:

input_data = someDataWithShape((batch, 1, features))

#important, we're starting new sequences, not continuing old ones:
model.reset_states()

output_sequence = []
last_step = input_data
for i in steps_to_predict:

    new_step = model.predict(last_step)
    output_sequence.append(new_step)
    last_step = new_step

 #end of the sequences
 model.reset_states()

Many to many with stateful=True

Now, here, we get a very nice application: given an input sequence, try to predict its future unknown steps.

We’re using the same method as in the “one to many” above, with the difference that:

  • we will use the sequence itself to be the target data, one step ahead
  • we know part of the sequence (so we discard this part of the results).

Layer (same as above):

outputs = LSTM(units=features, 
               stateful=True, 
               return_sequences=True, 
               input_shape=(None,features))(inputs) 
    #units = features because we want to use the outputs as inputs
    #None because we want variable length

#output_shape -> (batch_size, steps, units) 

Training:

We are going to train our model to predict the next step of the sequences:

totalSequences = someSequencesShaped((batch, steps, features))
    #batch size is usually 1 in these cases (often you have only one Tank in the example)

X = totalSequences[:,:-1] #the entire known sequence, except the last step
Y = totalSequences[:,1:] #one step ahead of X

#loop for resetting states at the start/end of the sequences:
for epoch in range(epochs):
    model.reset_states()
    model.train_on_batch(X,Y)

Predicting:

The first stage of our predicting involves "adjusting the states". That's why we're going to predict the entire sequence again, even if we already know this part of it:

model.reset_states() #starting a new sequence
predicted = model.predict(totalSequences)
firstNewStep = predicted[:,-1:] #the last step of the predictions is the first future step

Now we go into the same loop as in the one to many case. But don't reset states here! We want the model to know which step of the sequence it is at (and it knows it's at the first new step because of the prediction we just made above).

output_sequence = [firstNewStep]
last_step = firstNewStep
for i in steps_to_predict:

    new_step = model.predict(last_step)
    output_sequence.append(new_step)
    last_step = new_step

 #end of the sequences
 model.reset_states()

This approach was used in these answers and file:

Achieving complex configurations

In all examples above, I showed the behavior of “one layer”.

You can, of course, stack many layers on top of each other, not necessarily all following the same pattern, and create your own models.

One interesting example that has been appearing is the “autoencoder” that has a “many to one encoder” followed by a “one to many” decoder:

Encoder:

inputs = Input((steps,features))

#a few many to many layers:
outputs = LSTM(hidden1,return_sequences=True)(inputs)
outputs = LSTM(hidden2,return_sequences=True)(outputs)    

#many to one layer:
outputs = LSTM(hidden3)(outputs)

encoder = Model(inputs,outputs)

Decoder:

Using the “repeat” method;

inputs = Input((hidden3,))

#repeat to make one to many:
outputs = RepeatVector(steps)(inputs)

#a few many to many layers:
outputs = LSTM(hidden4,return_sequences=True)(outputs)

#last layer
outputs = LSTM(features,return_sequences=True)(outputs)

decoder = Model(inputs,outputs)

Autoencoder:

inputs = Input((steps,features))
outputs = encoder(inputs)
outputs = decoder(outputs)

autoencoder = Model(inputs,outputs)

Train with fit(X,X)
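
For completeness, a short (hypothetical) training call for the composed model above, assuming X is shaped (batch, steps, features) to match the Input layers; this is my addition, not part of the original answer:

autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X, X, epochs=10, batch_size=32)   # reconstruct the inputs

# The trained encoder alone then compresses (batch, steps, features) -> (batch, hidden3)
codes = encoder.predict(X)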

Additional explanations

If you want details about how steps are calculated in LSTMs, or details about the stateful=True cases above, you can read more in this answer: Doubts regarding `Understanding Keras LSTMs`


Answer 2


When you have return_sequences=True in the last RNN layer, you cannot use a simple Dense layer; use TimeDistributed instead.

Here is an example piece of code that might help others.

    words = keras.layers.Input(batch_shape=(None, self.maxSequenceLength), name = "input")

    # Build a matrix of size vocabularySize x EmbeddingDimension 
    # where each row corresponds to a "word embedding" vector.
    # This layer will replace each word-id with a word-vector of size EmbeddingDimension.
    embeddings = keras.layers.embeddings.Embedding(self.vocabularySize, self.EmbeddingDimension,
        name = "embeddings")(words)
    # Pass the word-vectors to the LSTM layer.
    # We are setting the hidden-state size to 512.
    # The output will be batchSize x maxSequenceLength x hiddenStateSize
    hiddenStates = keras.layers.GRU(512, return_sequences = True, 
                                        input_shape=(self.maxSequenceLength,
                                        self.EmbeddingDimension),
                                        name = "rnn")(embeddings)
    hiddenStates2 = keras.layers.GRU(128, return_sequences = True, 
                                        input_shape=(self.maxSequenceLength, self.EmbeddingDimension),
                                        name = "rnn2")(hiddenStates)

    denseOutput = TimeDistributed(keras.layers.Dense(self.vocabularySize), 
        name = "linear")(hiddenStates2)
    predictions = TimeDistributed(keras.layers.Activation("softmax"), 
        name = "softmax")(denseOutput)  

    # Build the computational graph by specifying the input, and output of the network.
    model = keras.models.Model(input = words, output = predictions)
    # model.compile(loss='kullback_leibler_divergence', \
    model.compile(loss='sparse_categorical_crossentropy', \
        optimizer = keras.optimizers.Adam(lr=0.009, \
            beta_1=0.9,\
            beta_2=0.999, \
            epsilon=None, \
            decay=0.01, \
            amsgrad=False))
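
Side note (my addition, not the answer's): in current tf.keras, a Dense layer already maps over the time axis of a 3-D tensor, so the two TimeDistributed wrappers above can usually be collapsed into a single Dense layer with a softmax activation:

# Equivalent in recent Keras versions: Dense is applied independently to every time step.
predictions = keras.layers.Dense(self.vocabularySize, activation="softmax",
                                 name="softmax")(hiddenStates2)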

AiLearning: Machine Learning (ML), Deep Learning (DL), Natural Language Processing (NLP)

Website

Download

Docker

docker pull apachecn0/ailearning
docker run -tid -p <port>:80 apachecn0/ailearning
# Visit http://localhost:{port} to view the docs

PYPI

pip install apachecn-ailearning
apachecn-ailearning <port>
# Visit http://localhost:{port} to view the docs

NPM

npm install -g ailearning
ailearning <port>
# Visit http://localhost:{port} to view the docs

About the organization

  • For cooperation or copyright issues, contact: apachecn@163.com
  • We are not an official Apache organization/institution/group, just enthusiasts of the Apache technology stack (and AI)!

Once a new technology starts rolling, you're either part of the steamroller or part of the road. – Stewart Brand

Roadmap

Supplement

1. Machine Learning - Basics

Supported versions

Version   Supported
3.6.x
2.7.x

Notes:

  • Machine Learning in Action: for learning only; please use Python 2.7.x (the 3.6.x version only modifies part of the code)

Basic introduction

Learning documents

Module | Chapter | Type | Owner (GitHub) | QQ
Machine Learning in Action | Chapter 1: Machine Learning Basics | Introduction | @毛红动 | 1306014226
Machine Learning in Action | Chapter 2: k-Nearest Neighbors (KNN) | Classification | @尤永江 | 279393323
Machine Learning in Action | Chapter 3: Decision Trees | Classification | @景涛 | 844300439
Machine Learning in Action | Chapter 4: Naive Bayes | Classification | @wnma3mz, @分析 | 1003324213, 244970749
Machine Learning in Action | Chapter 5: Logistic Regression | Classification | @微光同尘 | 529925688
Machine Learning in Action | Chapter 6: Support Vector Machines (SVM) | Classification | @王德红 | 934969547
Content compiled from the web | Chapter 7: Ensemble Methods (Random Forest and AdaBoost) | Classification | @片刻 | 529815144
Machine Learning in Action | Chapter 8: Regression | Regression | @微光同尘 | 529925688
Machine Learning in Action | Chapter 9: Tree-Based Regression | Regression | @微光同尘 | 529925688
Machine Learning in Action | Chapter 10: K-Means Clustering | Clustering | @徐昭清 | 827106588
Machine Learning in Action | Chapter 11: Association Analysis with the Apriori Algorithm | Frequent itemsets | @刘海飞 | 1049498972
Machine Learning in Action | Chapter 12: Efficiently Finding Frequent Itemsets with FP-growth | Frequent itemsets | @程威 | 842725815
Machine Learning in Action | Chapter 13: Simplifying Data with PCA | Tools | @廖立娟 | 835670618
Machine Learning in Action | Chapter 14: Simplifying Data with SVD | Tools | @张俊皓 | 714974242
Machine Learning in Action | Chapter 15: Big Data and MapReduce | Tools | @wnma3mz | 1003324213
ML project practice | Chapter 16: Recommender Systems (migrated) | Project | Recommender Systems (post-migration link)
Phase 1 summary | 2017-04-08: Phase 1 summary | Summary | Summary | 529815144

Website videos

Zhihu Q&A (it blew up): how should one get started with machine learning?

Of course I know the very first sentence will draw ridicule, because people with a formal academic background will sneer, call me an idiot, and point to Andrew Ng's videos.

I also know there are people who simply cannot follow Andrew Ng's videos - the mysterious mathematical derivations, the English-language teaching with its enigmatic smile. Haven't I walked the same road? My heart probably aches more than yours, because I have bookmarked more than ten "machine learning" video series online, plus home-grown tutorials in the local style (July, Xiaoxiang, and so on), and I could hardly follow any of them - until one day a senior algorithm analyst at Baidu recommended: "Machine Learning in Action" is pretty good and easy to understand, why not give it a try?

I gave it a try. Luckily my Python foundation and debugging skills are decent, and I basically stepped through all the code. Many lofty "theories + derivations" turned, in my eyes, into a few "additions, subtractions, multiplications, divisions + loops". Isn't that exactly the kind of introductory tutorial a programmer like me wants?

Many programmers say machine learning is damn hard to learn. Yes, it really is. I think the hardest part is this: there is hardly another author who, like the author of "Machine Learning in Action", is willing to explain it from a programmer's coding perspective!!

In the last few days the GitHub repo gained 300 stars and 200 people joined the group, and the numbers keep growing - I think many of you feel the same way!

Many would-be beginners are coaxed into bookmarking, bookmarking, and bookmarking again, yet in the end learn nothing - they become "resource collectors". Perhaps what beginners really need is a MachineLearning learning roadmap. Right, I can give you one, because we also recorded our learning process on video. Our level is of course limited, but for getting started it is absolutely fine - and if you still can't learn from it, then I lose!!

How to watch the videos?

  1. Formal theoretical background - go and study Andrew Ng's videos (Ng's videos are unquestionably authoritative)
  2. Strong coding skills - watch our "Machine Learning in Action - teaching edition"
  3. Weak coding skills - watch our "Machine Learning in Action - discussion edition"; however, for the theory watch the teaching edition's theory sections. The discussion edition has too much chatter, but it explains the code line by line, so combine them freely according to your needs.

[Free] Math teaching videos - Khan Academy introductory series

Probability | Statistics | Linear Algebra
Khan Academy (Probability) | Khan Academy (Statistics) | Khan Academy (Linear Algebra)

Machine learning videos - ApacheCN teaching edition

AcFun | Bilibili
Youku | NetEase Cloud Classroom

[Free] Machine/deep learning videos - Andrew Ng

Machine Learning | Deep Learning
Andrew Ng's Machine Learning | Neural Networks and Deep Learning

2. Deep Learning

Supported versions

Version   Supported
3.6.x
2.7.x

Introductory basics

  1. Backpropagation: https://www.cnblogs.com/charlotte77/p/5629865.html
  2. How CNNs work: http://www.cnblogs.com/charlotte77/p/7759802.html
  3. How RNNs work: https://blog.csdn.net/qq_39422642/article/details/78676567
  4. How LSTMs work: https://blog.csdn.net/weixin_42111770/article/details/80900575

PyTorch tutorials

– To be updated

TensorFlow 2.0 tutorials

– To be updated

Directory structure:

Segmentation (word tokenization)

Part-of-speech tagging

Named entity recognition

Syntactic parsing

WordNet can be viewed as a dictionary of synonyms (a thesaurus)

Stemming and lemmatization

TensorFlow 2.0 learning resources

3. Natural Language Processing

Supported versions

Version   Supported
3.6.x
2.7.x

Mixed feelings during the learning process!

Since I started learning NLP, I have noticed the typical differences between China and abroad:
1. The attitude toward resources is completely opposite:
  1) Domestic: conferences seem to be held for fame and show, with no real substance - all symbolic PPT introductions, not aimed at the practitioners in the room
  2) Abroad: as if to push NLP forward, people share all kinds of substantial material and concrete implementations (especially: Python natural language processing)
2. Paper implementations:
  1) All kinds of fancy paper implementations, yet I still have not seen a decent GitHub project! (Maybe my search skills are just poor and I never found one)
  2) I will not list foreign examples - I cannot read them!
3. Open-source frameworks
  1) Foreign open-source frameworks: tensorflow/pytorch, with documentation + tutorials + videos (officially provided)
  2) Domestic open-source frameworks: er... I honestly cannot name one! But the bragging is no weaker than abroad! (Although many Chinese developers contribute to MXNet, it cannot be counted as a domestic framework. The MXNet-based Chinese tutorial "Dive into Deep Learning" (http://zh.d2l.ai & https://discuss.gluon.ai/t/topic/753) has been taught and recorded by Mu Li and Aston Zhang and publicly released (documentation + season-one tutorials + videos).)
Every deep dive requires getting around the firewall; every deep dive requires Google; every time I hear how great HIT, iFLYTEK, USTC, Baidu, and Alibaba are, the material still has to be found abroad!
Sometimes I really resent it! I somewhat look down on the domestic technical environment!

Of course, thanks to the many domestic blog experts, especially for the introductory demos and basic concepts. [My ability to go deeper is limited; I did not fully understand the advanced material.]

1. Use cases (Baidu open course)

Part 1: Introduction

Part 2: Machine translation

Part 3: Discourse analysis

Part 4: Language understanding and interaction technology

Application areas

Chinese word segmentation:

  • Build a DAG (directed acyclic graph) of candidate words
  • Use dynamic programming to find the maximum-probability path through the DAG, combining forward and backward passes (forward weighting, backward output)
  • Train an HMM + Viterbi model on an SBME-tagged corpus to handle out-of-vocabulary words (see the sketch after this list)
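
These three steps describe the approach used by the popular jieba segmenter; a two-line sketch of using it (my addition, assuming the third-party jieba package is installed):

import jieba   # third-party package implementing the DAG + HMM/Viterbi approach above

print(list(jieba.cut("我来到北京清华大学")))   # e.g. ['我', '来到', '北京', '清华大学']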

1. Text classification

Text classification refers to labeling sentences or documents, for example email spam classification and sentiment analysis.

Below are some good beginner text classification datasets.

  1. Reuters newswire topic classification (Reuters-21578). A collection of news documents that appeared on the Reuters newswire in 1987, indexed by category. See also RCV1, RCV2, and TRC2.
  2. IMDB movie review sentiment classification (Stanford). A collection of movie reviews from imdb.com with their positive or negative sentiment.
  3. Newsgroup movie review sentiment classification (Cornell). A collection of movie reviews from imdb.com with their positive or negative sentiment.

For more information, see the post: Datasets for single-label text classification.

Sentiment analysis

Competition: https://www.kaggle.com/c/word2vec-nlp-tutorial

  • Approach 1 (0.86): word counts + naive Bayes (see the sketch after this list)
  • Approach 2 (0.94): LDA + a classifier (kNN / decision tree / logistic regression / SVM / XGBoost / random forest)
    • a) Decision trees did not work very well; they are not well suited to these continuous features
    • b) With about 200 topics after parameter tuning, the information content is preserved fairly well (topic computation)
  • Approach 3: word2vec + CNN
    • Honestly: without a good machine, you just cannot tune out a good result (runs away)
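
A minimal sketch of Approach 1 (word counts + naive Bayes) using scikit-learn with toy data - my addition, not the competition code:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts  = ["great movie, loved it", "terrible plot, a waste of time"]
labels = [1, 0]                      # 1 = positive, 0 = negative

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(texts, labels)
print(clf.predict(["loved the plot, great time"]))   # -> [1] on this toy data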

Model performance is evaluated with AUC.

2. Language modeling

Language modeling involves developing a statistical model for predicting the next word in a sentence or the next character in a word. It is a precursor task for tasks such as speech recognition and machine translation.

Below are some good beginner language modeling datasets.

  1. Project Gutenberg, a collection of free books that can be retrieved as plain text in a variety of languages.
  2. There are also more formal corpora that are well studied; for example: the Brown University Standard Corpus of Present-Day American English, a large sample of English words, and the Google 1 Billion Word corpus.

New word discovery

Sentence similarity detection

Text error correction

  • Double letters (bigrams) + double phonetics

3. Image captioning

Image captioning is the task of generating a textual description for a given image.

Below are some good beginner image captioning datasets.

  1. Common Objects in Context (COCO). A collection of more than 120 thousand images with descriptions.
  2. Flickr 8K. A collection of 8 thousand described images taken from Flickr.com.
  3. Flickr 30K. A collection of 30 thousand described images taken from Flickr.com. For more, see the post:

Exploring image captioning datasets, 2016

4. Machine translation

Machine translation is the task of translating text from one language into another.

Below are some good beginner machine translation datasets.

  1. Aligned Hansards of the 36th Parliament of Canada. Pairs of English and French sentences.
  2. European Parliament Proceedings Parallel Corpus 1996-2011. Sentence pairs across a set of European languages. There are many standard datasets used for the annual machine translation challenges; see:

Statistical machine translation

Machine translation

5. Question answering

Question answering is a task in which a sentence or text sample is provided, and questions asked about it must be answered.

Below are some good beginner question answering datasets.

  1. Stanford Question Answering Dataset (SQuAD). Answering questions about Wikipedia articles.
  2. DeepMind Question Answering Corpus. Answering questions about news articles from the Daily Mail.
  3. Amazon question/answer data. Answering questions about Amazon products. For more information, see the post:

Datasets: How can I get a corpus of a question-answering website such as Quora, Yahoo Answers, or Stack Overflow for analyzing answer quality?

6. Speech recognition

Speech recognition is the task of converting audio of spoken language into human-readable text.

Below are some good beginner speech recognition datasets.

  1. TIMIT Acoustic-Phonetic Continuous Speech Corpus. Not free, but listed here because of its widespread use. Spoken American English with associated transcriptions.
  2. VoxForge. A project to build an open-source database for speech recognition.
  3. LibriSpeech ASR corpus. A large collection of English audiobooks taken from LibriVox.

7. Automatic summarization (document summarization)

Document summarization is the task of creating a short, meaningful description of a larger document.

Below are some good beginner document summarization datasets.

  1. Legal Case Reports dataset. A collection of 4,000 legal cases and their summaries.
  2. TIPSTER Text Summarization Evaluation Conference corpus. A collection of nearly 200 documents and their summaries.
  3. AQUAINT Corpus of English News Text. Not free, but widely used. A corpus of news articles. For more information:

Document Understanding Conference (DUC) tasks. Where can I find good datasets for text summarization?

Named entity recognition

Text summarization

Graph computation [updated gradually]

  • Dataset: data/nlp/graph
  • Learning material: 电光图片X实战.pdf [the file is too large to provide here; search Baidu yourself]

Knowledge graphs

Further reading

If you wish to go deeper, this section provides additional lists of datasets.

  1. Text datasets used in Wikipedia research
  2. Datasets: What are the main text corpora used by computational linguists and natural language processing researchers?
  3. Stanford Statistical Natural Language Processing corpora
  4. An alphabetically ordered list of NLP datasets
  5. The NLTK corpora
  6. Open deep learning data on DL4J
  7. NLP datasets
  8. Domestic open datasets: https://bosonnlp.com/dev/resource

Contributor information

Contributors are welcome to keep adding more.

Disclaimer - [for learning and reference only]

  • ApacheCN translates this book purely for learning purposes and out of personal interest
  • ApacheCN retains attribution rights and other related rights to this translated version

License

  • The license of each individual project prevails.
  • Projects under the ApacheCN account without an explicit license are treated as CC BY-NC-SA 4.0.

Sources:

Acknowledgements

Recently a group member happened to share a link, and I found that the project has been highly recognized by experts and is being enthusiastically promoted.

Our thanks to:

Sponsor us