Dplll, 2019-02-10 18:54:07
Python

Why does tf.nn.sparse_softmax_cross_entropy_with_logits() return nan?

inp, tar = sess.run(el)
print(tar[:1])

Output:
[[912   0  53 145   0 155  45  50  15  48 924 225 912   0 235   0   0   0
    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0]]

dec_outputs = decoder(y, context_vector, hidden, batch_sz)[:1]

Output:
[]

That is, everything looks fine, but when I pass everything into
loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=dec_outputs))
loss_ = sess.run(loss, feed_dict={x: inp, y: tar})

Output: nan
tar.shape: (6, 82)
dec_outputs.shape: (6, 82, 512)
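
One quick way to narrow this down (reusing the sess, dec_outputs, x, y, inp and tar from the snippets above) is to evaluate the raw logits with the same feed and check them, and the labels, before the loss op runs at all. A rough diagnostic sketch:

import numpy as np

# Evaluate the raw logits with the same feed used for the loss
logits_val = sess.run(dec_outputs, feed_dict={x: inp, y: tar})
print("NaN in logits:", np.isnan(logits_val).any())
print("inf in logits:", np.isinf(logits_val).any())
# Labels must be valid class ids in [0, vocab_size)
print("labels in range:", (tar >= 0).all() and (tar < logits_val.shape[-1]).all())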

1 answer
ivodopyanov, 2019-02-11

Most likely the problem is not in the network architecture but in the data. For example, this error can occur if one of the samples turns out to have zero length: the gradients there become NaN, and the weights they flow through turn into NaN as well.
The zeros in the first target sample also look suspicious. Usually 0 is just padding used to pad a sequence out to the required length, while OOV words and the end of a phrase are marked with separate codes/tokens.
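If that is the case here, one common fix is to exclude the padded positions from the loss. A minimal sketch in TF 1.x style, assuming id 0 is used purely as padding (as the output above suggests):

import tensorflow as tf

# Per-token loss, shape (batch, seq_len)
per_token_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=y, logits=dec_outputs)

# Mask out padding (assumed id 0) so an all-padding sample cannot poison the mean
mask = tf.cast(tf.not_equal(y, 0), per_token_loss.dtype)
loss = tf.reduce_sum(per_token_loss * mask) / tf.maximum(tf.reduce_sum(mask), 1.0)

This averages only over the real tokens, so a sequence that is entirely padding contributes nothing instead of producing NaN.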
