How do I read what is happening in a given implementation of the DQN algorithm in TensorFlow.js?

Vadim, 2021-01-09 18:06:31

JavaScript

Hi,

I have found some examples of DQN implementations, but since I am not an expert in TensorFlow or machine learning, I am a bit confused. Here is one of them: https://dumpz.org/c77HNAA4XxGF

I understand that on line 73 we take a batch of data, [{state, action, reward, newState, done}] to be precise, then we get currentStates, which is , then on line 75 we use the model to get currentQs, which are equal, as far as I understand, because our model is used to get the action from the state of the environment. The same thing happens with newCurrentStates and futureQs.

But then on line 88 we see . What's going on here? Is futureQs an array of arrays with action probabilities for each futureState? And then there is let maxFutureQ = Math.max(futureQs); here maxFutureQ should be the probability of the action, so why do we add it to the reward? This part confuses me.

Also, I can't understand why we need to do currentQ[action] = newQ; on line 94. We end up losing that part anyway, no?
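
To make the question concrete, here is a minimal sketch of how I read that update step, written with plain arrays instead of tensors. It is my own reconstruction, not the linked code: buildTargets, DISCOUNT, and all the sample numbers are assumptions, and I am treating each model output as one estimated value per action.

```javascript
// Hypothetical sketch of one replay-batch update (not the linked code).
// DISCOUNT and buildTargets are names made up for this illustration.
const DISCOUNT = 0.5;

// batch:     array of {state, action, reward, newState, done}
// currentQs: model output for each state, one number per action
// futureQs:  model output for each newState, one number per action
function buildTargets(batch, currentQs, futureQs) {
  return batch.map(({ action, reward, done }, i) => {
    // Largest predicted value among the actions available in newState.
    const maxFutureQ = Math.max(...futureQs[i]);
    // For a terminal step there is no future, so only the reward counts.
    const newQ = done ? reward : reward + DISCOUNT * maxFutureQ;
    // Copy the current prediction and overwrite only the taken action,
    // so the other actions contribute no error when fitting the model.
    const target = currentQs[i].slice();
    target[action] = newQ;
    return target;
  });
}

// Made-up sample data with two actions per state.
const batch = [
  { state: [0], action: 1, reward: 1, newState: [1], done: false },
  { state: [1], action: 0, reward: -1, newState: [2], done: true },
];
const currentQs = [[0.2, 0.5], [0.1, 0.3]];
const futureQs = [[0.4, 2.0], [0.0, 0.0]];

const targets = buildTargets(batch, currentQs, futureQs);
console.log(targets);
```

If this reading is right, the targets array is what would be passed to the model as the training labels for currentStates, which is the part I am unsure about.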

Can someone help me understand what's going on here and maybe post comments on the lines?

Thanks in advance.