Dplll, 2019-02-10 18:54:07
Python

Why does tf.nn.sparse_softmax_cross_entropy_with_logits() return nan?

inp, tar = sess.run(el)
print(tar[:1])

Output:
[[912   0  53 145   0 155  45  50  15  48 924 225 912   0 235   0   0   0
    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0]]

dec_outputs = decoder(y, context_vector, hidden, batch_sz)[:1]

Output:
[]

That is, everything looks fine, but when I pass everything into
loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=dec_outputs))
loss_ = sess.run(loss, feed_dict={x: inp, y: tar})

Output: nan
tar.shape: (6, 82)
dec_outputs.shape: (6, 82, 512)
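
One quick way to narrow this down (reusing the sess, dec_outputs, x, y, inp and tar from the snippets above) is to evaluate the raw logits with the same feed and check them, and the labels, before the loss op runs at all. A rough diagnostic sketch:

import numpy as np

# Evaluate the raw logits with the same feed used for the loss
logits_val = sess.run(dec_outputs, feed_dict={x: inp, y: tar})
print("NaN in logits:", np.isnan(logits_val).any())
print("inf in logits:", np.isinf(logits_val).any())
# Labels must be valid class ids in [0, vocab_size)
print("labels in range:", (tar >= 0).all() and (tar < logits_val.shape[-1]).all())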

1 answer
ivodopyanov, 2019-02-11

Most likely the problem is not in the network architecture but in the data. For example, this error can occur if one of the samples turns out to have zero length: the gradients there become NaN, and the weights they flow through turn into NaN as well.
The zeros in the first target sample also look suspicious. Usually 0 is just padding used to pad a sequence out to the required length, while OOV words and the end of a phrase are marked with separate codes/tokens.
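If that is the case here, one common fix is to exclude the padded positions from the loss. A minimal sketch in TF 1.x style, assuming id 0 is used purely as padding (as the output above suggests):

import tensorflow as tf

# Per-token loss, shape (batch, seq_len)
per_token_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=y, logits=dec_outputs)

# Mask out padding (assumed id 0) so an all-padding sample cannot poison the mean
mask = tf.cast(tf.not_equal(y, 0), per_token_loss.dtype)
loss = tf.reduce_sum(per_token_loss * mask) / tf.maximum(tf.reduce_sum(mask), 1.0)

This averages only over the real tokens, so a sequence that is entirely padding contributes nothing instead of producing NaN.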
