I have a neural network playing tic-tac-toe. (I know there are other, better methods for this, but I want to learn about NNs.)

First, it should learn to make an allowed move, i.e. not choosing a field that is already occupied. It doesn't get very far with this, however. When the NN chooses an illegal move, I optimize the weights such that the distance to another, randomly chosen (legal) field is minimized. (There is one output, which should have values between 1 and 9.)

My problem is: in changing the weights, a formerly optimized outcome is now also changed. So I have this kind of overfitting: every time I backpropagate to optimize the weights for one particular situation, the decision for every other situation becomes worse! I know I should probably have 9 output neurons instead of 1, and should probably not use a random field as the target, as I assume this can mess things up. How can I improve the decision in one situation without forgetting every other situation?

One solution I came up with is to "remember" every game played and optimize simultaneously over all games played. However, after a while this becomes very demanding computationally. Also, it seems to go in the direction of a complete enumeration of all possible board situations. This might be possible for Tic Tac Toe, but if I move to another game, say Go, this becomes infeasible.

Where is my mistake? How do I generally tackle this problem? Or where could I read about it? Thanks a lot!

Answer: What you are trying to do is to learn the behaviour of an agent playing Tic Tac Toe. To tackle this problem efficiently, you should consider Reinforcement Learning methods instead of what you are currently doing. The agent gets a high reward when it wins a game, a high penalty when it loses, and an even higher penalty when it performs an illegal move.
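The reward scheme the answer describes maps directly onto tabular Q-learning. Below is a minimal sketch of the update rule under that scheme; the concrete reward values, learning rate, and discount factor are illustrative assumptions, not something the answer specifies.

```python
import random
from collections import defaultdict

# Illustrative values (assumptions): win reward, loss penalty,
# and an even larger penalty for illegal moves, as the answer suggests.
R_WIN, R_LOSS, R_ILLEGAL = 1.0, -1.0, -2.0
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1  # learning rate, discount, exploration

Q = defaultdict(float)  # Q[(state, action)] -> estimated value

def choose_action(state):
    """Epsilon-greedy over the 9 cells; illegal picks get punished, not masked."""
    if random.random() < EPS:
        return random.randrange(9)
    return max(range(9), key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, done):
    """One tabular Q-learning step: move Q toward reward + discounted max."""
    target = reward if done else reward + GAMMA * max(
        Q[(next_state, a)] for a in range(9)
    )
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])
```

States here can be any hashable board encoding (e.g. a tuple of 9 cell values); the game loop that produces `(state, action, reward, next_state)` transitions is left out.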
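The question's own hunch about 9 output neurons is sound: with one output per cell, illegal moves can simply be masked out before choosing, so the network never needs to be punished into legality. A minimal sketch of such a mask (the function name and uniform-logits example are hypothetical):

```python
import numpy as np

def masked_move_probs(logits, board):
    """Turn 9 raw network outputs into a probability distribution over
    legal moves only. `board` is length 9; 0 marks an empty cell.
    Assumes at least one cell is free."""
    logits = np.asarray(logits, dtype=float)
    free = np.asarray(board) == 0
    masked = np.where(free, logits, -np.inf)  # occupied cells get -inf
    exp = np.exp(masked - masked[free].max())  # stable softmax
    return exp / exp.sum()

# Example: cells 0 and 4 are occupied, so their probabilities are zero.
board = [1, 0, 0, 0, 2, 0, 0, 0, 0]
probs = masked_move_probs(np.random.randn(9), board)
```

Because `exp(-inf)` is exactly 0, occupied cells receive zero probability and the remaining mass is renormalized over the free cells.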
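The "remember every game and optimize over all of them" idea in the question is essentially experience replay, and it does not require full enumeration: a bounded buffer plus random mini-batches is enough to stop the latest update from overwriting earlier situations. A minimal sketch (buffer capacity and batch size are illustrative choices):

```python
import random
from collections import deque

# Bounded replay buffer: old positions eventually fall out,
# so memory stays fixed no matter how many games are played.
buffer = deque(maxlen=10000)

def remember(state, target):
    """Store one training example (board state, desired output)."""
    buffer.append((state, target))

def sample_batch(batch_size=32):
    """Random mini-batch mixing old and new positions; training on this
    instead of only the latest position counteracts the forgetting
    the question describes."""
    return random.sample(list(buffer), min(batch_size, len(buffer)))
```

Each weight update would then backpropagate over a sampled batch rather than a single board position.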