How is the strategy saved in reinforcement learning?

Oleg Petrov, 2018-10-01 15:05:21

Python

I am working through the code at https://github.com/Smeilz/Tic-Tac-Toe-Reinforcemen...
Here is what I have understood so far.
The program has 2 modules:
Qlearning.py - trains the agents and saves the result of learning
Game.py - describes the course of the game
The question is: how exactly does Qlearning save the strategy?
1) Train.py contains the line
game.saveStates()
2) It calls a method defined in the game.py module

def saveStates(self):
    self.player1.saveQtable("player1states")
    self.player2.saveQtable("player2states")

3) That method in turn calls saveQtable, defined in the QLearning.py module, on the player1 and player2 objects

def saveQtable(self, file_name):  # save the Q-table
    with open(file_name, 'wb') as handle:
        pickle.dump(self.Q, handle, protocol=pickle.HIGHEST_PROTOCOL)

----------------------------------------------------
So, as I understand it, the program serializes the strategy obtained from training (the object in self.Q) into a stream of bytes with pickle, writes it to a file, and deserializes it back into the same object when loading.
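To check this understanding, here is a minimal sketch of that round trip outside the game. It assumes self.Q is an ordinary dict mapping a board-state string to a list of move values; that shape is only my guess and I have not confirmed it in the repository. The names q_table, save_qtable and load_qtable are made up for the example.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical Q-table: board state (9 characters, "-" = empty cell)
# mapped to one value per move. The real self.Q may look different.
q_table = {
    "---------": [0.0] * 9,
    "X--------": [0.0, 0.1, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.0],
}

def save_qtable(q, file_name):
    """Serialize the table to bytes, the same way saveQtable dumps self.Q."""
    with open(file_name, "wb") as handle:
        pickle.dump(q, handle, protocol=pickle.HIGHEST_PROTOCOL)

def load_qtable(file_name):
    """Read the bytes back and rebuild the original Python object."""
    with open(file_name, "rb") as handle:
        return pickle.load(handle)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "player1states"
    save_qtable(q_table, path)
    raw_bytes = path.read_bytes()   # the binary content that ends up on disk
    restored = load_qtable(path)

print(type(restored).__name__, len(restored))
print(restored == q_table)
```

If the guess about the dict is right, the file holds only the data in self.Q in pickle's binary format, and loading gives back an equal dict.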
Questions:
1) How exactly is the strategy stored? What is its structure? What does self hold in this case?
2) Can the code be changed so that the strategy is saved to a file in human-readable form, so that I can see the format?
3) How can the same data be saved as XML?
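To make questions 2 and 3 concrete, this is roughly the kind of output I have in mind. It is only a sketch under an assumption I have not verified: that self.Q is a plain dict from a board-state string to a list of move values. The functions qtable_to_json and qtable_to_xml, and the XML element and attribute names, are hypothetical and do not come from the repository.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical Q-table: board state (9 characters, "-" = empty cell)
# mapped to one value per move. The real self.Q may look different.
q_table = {
    "---------": [0.0] * 9,
    "X--------": [0.0, 0.1, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.0],
}

def qtable_to_json(q):
    """Return the table as indented, human-readable JSON text."""
    # JSON object keys must be strings, so every state is passed through str().
    return json.dumps({str(state): values for state, values in q.items()}, indent=2)

def qtable_to_xml(q):
    """Return the table as XML: one <state> per board, one <action> per move."""
    root = ET.Element("qtable")
    for state, values in q.items():
        state_el = ET.SubElement(root, "state", board=str(state))
        for move, value in enumerate(values):
            ET.SubElement(state_el, "action", move=str(move), value=repr(value))
    return ET.tostring(root, encoding="unicode")

json_text = qtable_to_json(q_table)
xml_text = qtable_to_xml(q_table)

print(json_text)
print(xml_text)
```

Either string could be written to a file opened in text mode ('w') instead of 'wb'. If the keys of self.Q turn out to be tuples or other objects rather than strings, str() would still make them printable, but loading them back would need extra conversion.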
Thanks in advance