How to implement reinforcement learning with these tools?

Prizm, 2021-04-07 22:44:12

Neural networks

I already have:

A feed-forward neural network class (with built-in gradient computation for an arbitrary model, etc.).
A data-pair class (supervised input/target pairs).
A tic-tac-toe board class with a method for a standard move (by cell number); a getNextStates() method, which returns a list of the possible next states (with the X and O marks swapped, so that each board is seen "on behalf of the opponent"); and a getAsVector() method, which returns the board state as a vector of 9 values, each +1, -1, or 0 depending on what is in the cell.
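
To make the board interface above concrete, here is a rough sketch in plain Python with no libraries. Only getNextStates() and getAsVector() are names from my code; the class name Board, the move() method, the 0..8 cell numbering, and the convention that +1 marks the player to move are assumptions made for illustration.

```python
class Board:
    """Tic-tac-toe board: +1 = player to move, -1 = opponent, 0 = empty.

    The class name, move(), and the sign convention are illustrative assumptions.
    """

    def __init__(self, cells=None):
        # Copy the cells so boards never share state.
        self.cells = list(cells) if cells is not None else [0] * 9

    def move(self, cell):
        """Standard move by cell number (assumed 0..8) for the player to move."""
        if self.cells[cell] != 0:
            raise ValueError("cell is occupied")
        self.cells[cell] = 1

    def getNextStates(self):
        """Return every board reachable in one move, seen from the opponent's side."""
        states = []
        for i, value in enumerate(self.cells):
            if value == 0:
                nxt = list(self.cells)
                nxt[i] = 1
                # Swap the marks so the new board is "on behalf of the opponent".
                states.append(Board([-v for v in nxt]))
        return states

    def getAsVector(self):
        """Return the board as a list of 9 values, each +1, -1, or 0."""
        return list(self.cells)
```

With this convention, the mark just placed shows up as -1 in each returned state, because the state is already flipped to the opponent's view.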

So, how do I properly train a tic-tac-toe bot with reinforcement learning? (Assume that the network's move consists of computing the V-function for every possible next board state and choosing the state that is "worst" from the opponent's point of view.)
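
The move rule described in the parentheses could be sketched roughly as follows, in plain Python with no libraries. The function names choose_next_state and toy_value are hypothetical, and toy_value is only a placeholder standing in for the network's forward pass; it is not a trained V-function.

```python
def toy_value(state_vector):
    """Placeholder for the network's V-function (an assumption, not a trained model).

    It just averages the cells, so boards with more of the viewer's marks score higher.
    """
    return sum(state_vector) / 9.0


def choose_next_state(next_states, value_fn):
    """Pick the successor state that is "worst" from the opponent's point of view.

    next_states: list of 9-element vectors, each already seen from the opponent's side.
    value_fn:    function mapping such a vector to an estimated value for the opponent.
    """
    if not next_states:
        raise ValueError("no legal moves")
    # The lowest opponent value is the best outcome for the player making the move.
    return min(next_states, key=value_fn)
```

For example, given the candidates `[0] * 9` and `[-1] + [0] * 8`, the placeholder scores the second one lower, so that is the state returned.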

P.S. I do not use any libraries, so please provide the algorithm in pseudocode.