A complete implementation of the AlphaGo Zero algorithm applied to Tic-Tac-Toe, featuring a neural network with residual connections, Monte Carlo Tree Search (MCTS), and self-play training.
This implementation follows the AlphaGo Zero paper architecture adapted for Tic-Tac-Toe:
- Input: 5-channel 3×3 representation of the board (see the encoding sketch below)
  - Channel 0: Current player's pieces
  - Channel 1: Opponent's pieces
  - Channel 2: Empty squares
  - Channel 3: Current player indicator
  - Channel 4: History plane
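As a rough illustration, the encoding might look like the following (the `encode_board` helper, the ±1/0 board convention, and the history handling are assumptions for this sketch, not the exact code in game.py):

```python
import numpy as np

def encode_board(board: np.ndarray, prev_board: np.ndarray) -> np.ndarray:
    """Hypothetical 5-channel encoding of a 3x3 position.

    board / prev_board: 3x3 arrays using 1 for the current player's
    pieces, -1 for the opponent's, and 0 for empty squares.
    """
    planes = np.zeros((5, 3, 3), dtype=np.float32)
    planes[0] = (board == 1)        # channel 0: current player's pieces
    planes[1] = (board == -1)       # channel 1: opponent's pieces
    planes[2] = (board == 0)        # channel 2: empty squares
    planes[3] = 1.0                 # channel 3: constant current-player indicator
    planes[4] = (prev_board == 1)   # channel 4: one history plane (previous position)
    return planes
```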
- Architecture:
  - First convolutional layer (32 filters, 3×3)
  - Residual tower (10 residual blocks)
  - Policy head (outputs action probabilities)
  - Value head (outputs position evaluation)
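Put together in PyTorch, the architecture described above might look roughly like this (class and variable names are illustrative; the real modules live in neural_network_components/):

```python
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """3x3 conv -> BN -> ReLU -> 3x3 conv -> BN, plus a skip connection."""
    def __init__(self, channels: int = 32):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)  # residual skip connection

class AlphaTacToeNet(nn.Module):
    def __init__(self, channels: int = 32, blocks: int = 10):
        super().__init__()
        # First convolutional layer: 5 input planes -> 32 filters
        self.stem = nn.Sequential(
            nn.Conv2d(5, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
        )
        # Residual tower: 10 residual blocks
        self.tower = nn.Sequential(*[ResidualBlock(channels) for _ in range(blocks)])
        # Policy head: 9 action logits -> softmax probabilities
        self.policy = nn.Sequential(
            nn.Conv2d(channels, 2, 1), nn.BatchNorm2d(2), nn.ReLU(),
            nn.Flatten(), nn.Linear(2 * 3 * 3, 9),
        )
        # Value head: scalar evaluation squashed into [-1, 1] by tanh
        self.value = nn.Sequential(
            nn.Conv2d(channels, 1, 1), nn.BatchNorm2d(1), nn.ReLU(),
            nn.Flatten(), nn.Linear(3 * 3, 32), nn.ReLU(),
            nn.Linear(32, 1), nn.Tanh(),
        )

    def forward(self, x):
        h = self.tower(self.stem(x))
        return F.softmax(self.policy(h), dim=1), self.value(h)
```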
- UCB1 exploration with neural network priors (see the selection sketch after this list)
- Temperature-based action selection
- Proper handling of player perspectives
- Self-play data generation
- Neural network training on generated data
- Model evaluation and selection
- Iterative improvement
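The "UCB1 exploration with neural network priors" above is essentially the PUCT rule from the AlphaGo Zero paper. A minimal sketch, assuming each child node tracks its mean value, visit count, and prior (the dict layout and c_puct value are illustrative; mcts.py may organize this differently):

```python
import math

def puct_score(value: float, visits: int, prior: float,
               parent_visits: int, c_puct: float = 1.0) -> float:
    """UCB1-style score with a neural-network prior (the PUCT rule).

    value: mean backed-up evaluation Q(s, a) of the child
    prior: P(s, a), the policy head's probability for this move
    """
    exploration = c_puct * prior * math.sqrt(parent_visits) / (1 + visits)
    return value + exploration

def select_child(children: list[dict]) -> dict:
    """children: dicts with 'value', 'visits', 'prior' keys (illustrative layout)."""
    parent_visits = sum(c["visits"] for c in children)
    return max(children, key=lambda c: puct_score(
        c["value"], c["visits"], c["prior"], parent_visits))
```

With few visits the prior term dominates, so the search first explores the network's suggestions; as visit counts grow, the exploration bonus shrinks and the mean value Q takes over.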
```
alphaTacToe/
├── game.py                      # Tic-tac-toe game logic
├── mcts.py                      # Monte Carlo Tree Search implementation
├── train.py                     # Training loop
├── play.py                      # Command-line interface for playing
├── neural_network_components/   # Neural network components
│   ├── alphaTacToeNET.py        # Main network class
│   ├── first_covl_layer.py      # First convolution layer
│   ├── residual_tower.py        # Residual tower
│   ├── residual_block.py        # Individual residual blocks
│   ├── policy_head.py           # Policy head
│   └── value_head.py            # Value head
├── test_system.py               # System tests
├── test_cli.py                  # CLI tests
├── quick_train.py               # Quick training demo
├── create_demo_model.py         # Create demo model
└── checkpoints/                 # Saved models directory
```
Install the dependencies:

```bash
pip install torch numpy
```

```bash
# Run comprehensive system tests
PYTHONPATH=/home/{user}/alphaTacToe python test_system.py

# Test the neural network architecture
PYTHONPATH=/home/{user}/alphaTacToe python network_test.py
```

```bash
# Quick training (3 iterations for testing)
PYTHONPATH=/home/{user}/alphaTacToe python quick_train.py

# Full training (50 iterations)
PYTHONPATH=/home/{user}/alphaTacToe python train.py
```

```bash
# Create a demo model for testing
PYTHONPATH=/home/{user}/alphaTacToe python create_demo_model.py

# Launch the CLI game
PYTHONPATH=/home/{user}/alphaTacToe python play.py
```

The CLI offers several game modes:
- Play against AI - Challenge the trained model
- AI vs AI - Watch two AIs battle it out
- Human vs Human - Classic two-player mode
- Load different models - Try different training checkpoints
- Adjust settings - Change AI difficulty (MCTS simulations)
Positions are numbered 0-8 (the outer labels are row/column indices):

```
     0 1 2
  0  0 1 2
  1  3 4 5
  2  6 7 8
```
Edit the config in train.py:
- `num_iterations`: Number of training iterations
- `num_episodes`: Self-play games per iteration
- `num_mcts_sims`: MCTS simulations per move
- `batch_size`: Training batch size
- `learning_rate`: Neural network learning rate
- `eval_threshold`: Win rate needed to accept a new model
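For example, a configuration along these lines (the values shown are illustrative, not necessarily the defaults shipped in train.py):

```python
config = {
    "num_iterations": 50,     # full training run
    "num_episodes": 25,       # self-play games per iteration
    "num_mcts_sims": 50,      # MCTS simulations per move
    "batch_size": 32,         # training batch size
    "learning_rate": 1e-3,    # optimizer step size
    "eval_threshold": 0.55,   # new model must win >= 55% of evaluation games
}
```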
In the CLI, you can adjust AI strength:
- Easy: 10 MCTS simulations
- Medium: 25 MCTS simulations
- Hard: 50 MCTS simulations
- Expert: 100 MCTS simulations
- Custom: Any number you choose
Run tests to verify everything works:
```bash
# Full system test
PYTHONPATH=/home/{user}/alphaTacToe python test_system.py

# CLI components test
PYTHONPATH=/home/{user}/alphaTacToe python test_cli.py

# Neural network test
PYTHONPATH=/home/{user}/alphaTacToe python network_test.py
```

Neural network:
- Residual connections for better gradient flow
- Batch normalization for training stability
- Separate policy and value heads following AlphaGo architecture
- Softmax policy output for action probabilities
- Tanh value output for position evaluation
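The two heads are typically trained jointly with the combined loss from the AlphaGo Zero paper. A sketch, assuming the network outputs move probabilities p and values v (L2 regularization would usually come from the optimizer's weight_decay rather than the loss itself):

```python
import torch
import torch.nn.functional as F

def alphazero_loss(p: torch.Tensor, v: torch.Tensor,
                   target_pi: torch.Tensor, target_z: torch.Tensor) -> torch.Tensor:
    """Value MSE plus policy cross-entropy, as in the AlphaGo Zero paper.

    p:         predicted move probabilities, shape (batch, 9)
    v:         predicted values, shape (batch, 1)
    target_pi: MCTS visit-count distributions, shape (batch, 9)
    target_z:  final game outcomes in {-1, 0, +1}, shape (batch, 1)
    """
    value_loss = F.mse_loss(v, target_z)
    # Small epsilon guards against log(0) for moves the policy rules out
    policy_loss = -(target_pi * torch.log(p + 1e-8)).sum(dim=1).mean()
    return value_loss + policy_loss
```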
MCTS:
- UCB1 exploration with neural network priors
- Temperature control for exploration vs. exploitation (see the sketch below)
- Proper perspective handling for two-player games
- Efficient tree traversal and backpropagation
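Temperature control turns MCTS visit counts into a move distribution, π(a) ∝ N(a)^(1/τ). A small sketch (the function name is illustrative):

```python
import numpy as np

def select_action(visit_counts: np.ndarray, temperature: float) -> int:
    """Sample a move from MCTS visit counts: pi(a) proportional to N(a)^(1/tau).

    temperature -> 0 plays greedily (exploitation);
    temperature = 1 samples in proportion to visit counts (exploration).
    """
    if temperature == 0:
        return int(np.argmax(visit_counts))
    probs = visit_counts ** (1.0 / temperature)
    probs = probs / probs.sum()
    return int(np.random.choice(len(visit_counts), p=probs))
```

During self-play the temperature is typically held at 1 for the opening moves and then annealed toward 0, which is the temperature annealing mentioned under the training pipeline below.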
Training pipeline:
- Self-play data generation with temperature annealing
- Experience replay with a circular buffer (sketched below)
- Model evaluation against previous versions
- Checkpoint saving for model persistence
- GPU acceleration when available
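The circular buffer can be as simple as a bounded deque (the capacity and tuple layout here are illustrative):

```python
import random
from collections import deque

# Circular buffer: once full, the oldest self-play examples are
# evicted automatically. Capacity of 10,000 is an illustrative choice.
replay_buffer: deque = deque(maxlen=10_000)

def add_examples(examples) -> None:
    """examples: iterable of (state, mcts_policy, outcome) tuples."""
    replay_buffer.extend(examples)

def sample_batch(batch_size: int = 32):
    """Uniformly sample a training batch from the buffer."""
    return random.sample(replay_buffer, min(batch_size, len(replay_buffer)))
```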
With proper training (50+ iterations), the AI should:
- Beat random players consistently
- Learn basic tic-tac-toe strategy
- Avoid obvious mistakes
- Play optimally in many positions
Note: Tic-tac-toe is a solved game (a draw with perfect play), so a perfectly trained AI should never lose and should draw against an optimal opponent.
- Train longer: Run more iterations for better performance
- Experiment with architecture: Try different network sizes
- Tune hyperparameters: Optimize learning rates and MCTS parameters
- Scale to larger games: Apply to 4×4 tic-tac-toe or other games
- Add analysis tools: Visualize training progress and game analysis
Feel free to:
- Report bugs or issues
- Suggest improvements
- Add new features
- Optimize performance
- Improve documentation
This project is for educational purposes. The AlphaGo Zero algorithm is described in the DeepMind paper "Mastering the Game of Go without Human Knowledge".
Happy Gaming! 🎮
Built with PyTorch and lots of coffee ☕