Vadim, 2021-01-09 18:06:31
JavaScript

How do I read what is happening in a specific implementation of the DQN algorithm in TensorFlow.js?

Hi,

I have found some examples of DQN implementations, but since I am not an expert in TensorFlow or machine learning, I am a bit confused. Here is one of them: https://dumpz.org/c77HNAA4XxGF

I understand that on line 73 we take a sample of the data, [{state, action, reward, newState, done}] to be precise. Then we get currentStates, and on line 75 we use the model to get currentQs, which, as far as I understand, are the actions, because our model is used to get the action from the state of the environment. The same thing happens with newCurrentStates and futureQs.
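The shapes involved in this step can be sketched without TensorFlow.js. This is only an illustration, not the linked code: sampleBatch, fakePredict, argMax and the toy transitions are hypothetical stand-ins, and fakePredict merely imitates a network that returns one Q-value per action for each state.

```javascript
// Hypothetical replay memory: each entry is one transition.
const replayMemory = [
  { state: [0, 1], action: 0, reward: 1, newState: [1, 1], done: false },
  { state: [1, 1], action: 1, reward: 0, newState: [1, 0], done: false },
  { state: [1, 0], action: 1, reward: 5, newState: [0, 0], done: true },
];

// Pick `size` transitions uniformly at random (with replacement, for brevity).
function sampleBatch(memory, size) {
  const batch = [];
  for (let i = 0; i < size; i++) {
    batch.push(memory[Math.floor(Math.random() * memory.length)]);
  }
  return batch;
}

// Stand-in for the network (assumption): maps each state to an array of
// Q-values, one per action (two actions here). A real model would be learned.
function fakePredict(states) {
  return states.map(([a, b]) => [a + 0.5 * b, b - 0.25 * a]);
}

// Index of the largest Q-value, i.e. the greedy action.
function argMax(values) {
  return values.indexOf(Math.max(...values));
}

const batch = sampleBatch(replayMemory, 2);
const currentStates = batch.map((t) => t.state);
const currentQs = fakePredict(currentStates); // one array of Q-values per state
const newCurrentStates = batch.map((t) => t.newState);
const futureQs = fakePredict(newCurrentStates); // same shape, for the next states

console.log(currentQs, futureQs);
```

Under this reading, currentQs is not the action itself. It is an array of per-action value estimates for each state, and the action would be the index of the largest entry, for example argMax(currentQs[0]).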

But then on line 88 we see: let maxFutureQ = Math.max(futureQs); What is going on here? Is futureQs an array of arrays with action probabilities for each future state? And then maxFutureQ should be the probability of an action, so why do we add it to the reward? This part confuses me.
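One possible reading, assuming the linked code follows the standard Q-learning update (not verified against it): the network outputs are estimates of future return per action rather than probabilities, which is why the largest one can be added to the reward. A minimal sketch with made-up numbers follows; DISCOUNT, targetQ and the values in futureQs are hypothetical and not taken from the linked code.

```javascript
// Assumed discount factor (gamma); the linked code may use another value.
const DISCOUNT = 0.99;

// Q-values a model might predict for the next state, one entry per action.
const futureQs = [0.5, 2.0, 1.25];

// Math.max takes separate arguments, so a plain array has to be spread.
const maxFutureQ = Math.max(...futureQs);

// Q-learning target for a single transition.
function targetQ(reward, done, maxFutureQ, discount) {
  // On a terminal step there is no future, so the target is just the reward.
  return done ? reward : reward + discount * maxFutureQ;
}

const newQ = targetQ(1, false, maxFutureQ, DISCOUNT);
const terminalQ = targetQ(1, true, maxFutureQ, DISCOUNT);
console.log(newQ, terminalQ);
```

Because the reward and the Q-values are measured in the same unit (expected return), adding them is meaningful. Adding a reward to a probability would not be.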

Also, I can't understand why we need to do currentQ[action] = newQ; on line 94. We end up losing that part anyway, no?
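A common reason for this kind of assignment in DQN code is to build the training target: the predicted Q-values are reused as the label, and only the entry for the action actually taken is replaced, so the error is zero for every other action. Whether the linked code uses the array this way is an assumption. The helpers buildTarget and squaredErrors and all numbers below are hypothetical.

```javascript
// Copy the predicted Q-values and overwrite only the taken action's entry.
// (The quoted line mutates currentQ in place; copying here keeps the
// original prediction available for comparison.)
function buildTarget(currentQ, action, newQ) {
  const target = currentQ.slice();
  target[action] = newQ;
  return target;
}

// Per-output squared error between prediction and target.
function squaredErrors(prediction, target) {
  return prediction.map((q, i) => (q - target[i]) ** 2);
}

const currentQ = [0.5, 2, 1.25]; // hypothetical network output for one state
const action = 1;                // action that was taken in this transition
const newQ = 3;                  // hypothetical updated estimate for that action

const target = buildTarget(currentQ, action, newQ);
const errors = squaredErrors(currentQ, target);
console.log(target); // [ 0.5, 3, 1.25 ]
console.log(errors); // [ 0, 1, 0 ]
```

In this sketch only the output for the taken action contributes to the loss, so a training step would move that single estimate toward newQ and leave the others alone.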

Can someone help me understand what's going on here and maybe post comments on the lines?

Thanks in advance.
