How do I read what is happening in this implementation of the DQN algorithm in TensorFlow.js?
Hi,
I have found some examples of DQN implementations, but since I am not an expert in TensorFlow or machine learning, I am a bit confused. Here is one of them: https://dumpz.org/c77HNAA4XxGF
I understand that at line 73 we take a batch of data ([{state, action, reward, newState, done}], to be precise) and get currentStates from it. Then at line 75 we use the model to get currentQs, which are, as far as I understand, the model's outputs for each state, because our model is used to get the action from the state of the environment. The same thing happens with newCurrentStates and futureQs.
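For reference, the steps described above can be sketched in plain JavaScript. This is only an illustration under stated assumptions, not the code from the link: predictQ is a hypothetical stand-in for the TensorFlow.js model, and the two-action setup and sample transitions are made up.

```javascript
// Hypothetical stand-in for the model: maps one state to one Q-value per action.
// A real implementation would call a TensorFlow.js model; this fixed rule is
// only here so the sketch runs on its own.
function predictQ(state) {
  const sum = state.reduce((a, b) => a + b, 0);
  return [sum, 2 * sum]; // assumed: two possible actions
}

// A made-up minibatch of transitions in the shape described in the question.
const minibatch = [
  { state: [1, 0], action: 0, reward: 1, newState: [0, 1], done: false },
  { state: [0, 1], action: 1, reward: 0, newState: [1, 1], done: true },
];

// One state per transition, then one row of Q-values per state.
const currentStates = minibatch.map((t) => t.state);
const currentQs = currentStates.map(predictQ);

// The same for the states that followed each action.
const newCurrentStates = minibatch.map((t) => t.newState);
const futureQs = newCurrentStates.map(predictQ);
```

In this sketch both currentQs and futureQs are arrays of arrays: one inner array per transition, with one value per action.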
But then at line 88 we see let maxFutureQ = Math.max(futureQs); What's going on here? Is futureQs an array of arrays with action probabilities for each future state? And if maxFutureQ is the probability of the action, why do we add it to the reward? This part confuses me.
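A minimal sketch of how this step usually looks in Q-learning may help here. In standard DQN the network outputs are estimated future returns for each action rather than probabilities, which is why the largest one is discounted and added to the reward. Whether the linked code does exactly this is an assumption; DISCOUNT and tdTarget are hypothetical names chosen for the illustration. Note also that Math.max expects separate number arguments, so passing an array with more than one element yields NaN; the sketch spreads a single row of Q-values instead.

```javascript
const DISCOUNT = 0.9; // hypothetical discount factor (often called gamma)

// Target value for the action that was taken in one transition.
// futureQRow is assumed to be the Q-values for the next state, one per action.
function tdTarget(reward, futureQRow, done) {
  // If the episode ended, there is no future return to add.
  if (done) return reward;
  // Best estimated return reachable from the next state.
  const maxFutureQ = Math.max(...futureQRow);
  return reward + DISCOUNT * maxFutureQ;
}

// Example call with made-up numbers.
const newQ = tdTarget(1, [0.5, 2], false);
```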
Also, I can't understand why we need to do currentQ[action] = newQ; at line 94. We end up losing that part anyway, no?
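One common reason for such an assignment, sketched below, is to build the training target: the predicted row is copied and only the entry for the action actually taken is overwritten, so every other action has zero error and only that one output is corrected. This is an assumption about the intent of the linked code; buildTarget and the numbers are hypothetical.

```javascript
// Copy the predicted Q-values and replace only the taken action's value.
function buildTarget(currentQRow, action, newQ) {
  const target = currentQRow.slice(); // avoid mutating the prediction
  target[action] = newQ;
  return target;
}

// Made-up prediction for one state with three actions.
const currentQ = [1.0, 2.0, 3.0];
const target = buildTarget(currentQ, 1, 5.0);

// Per-action difference between target and prediction:
// non-zero only at the index of the action that was taken.
const errors = target.map((t, i) => t - currentQ[i]);
```

The target row would then be paired with the corresponding state as a training example, so the update is kept in the model's weights rather than in the temporary array.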
Can someone help me understand what's going on here and maybe post comments on the lines?
Thanks in advance.