AlphaTacToe 🎮

A complete implementation of the AlphaGo Zero algorithm applied to Tic-Tac-Toe, featuring a neural network with residual connections, Monte Carlo Tree Search (MCTS), and self-play training.

🏗️ Architecture

This implementation follows the AlphaGo Zero paper architecture adapted for Tic-Tac-Toe:

Neural Network

  • Input: 5-channel 3×3 representation of the board
    • Channel 0: Current player's pieces
    • Channel 1: Opponent's pieces
    • Channel 2: Empty squares
    • Channel 3: Current player indicator
    • Channel 4: History plane
  • Architecture:
    • First convolutional layer (32 filters, 3×3)
    • Residual tower (10 residual blocks)
    • Policy head (outputs action probabilities)
    • Value head (outputs position evaluation)
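
Put together, the network is roughly the following PyTorch module. This is a minimal sketch: the class and layer names (TicTacToeNet, ResidualBlock) are illustrative, and the actual implementation is split across the files in neural_network_components/.

import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    # Two 3x3 convolutions with batch norm and a skip connection.
    def __init__(self, channels=32):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)  # skip connection

class TicTacToeNet(nn.Module):
    # 5-channel 3x3 input -> conv stem -> residual tower -> two heads.
    def __init__(self, channels=32, blocks=10):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(5, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU())
        self.tower = nn.Sequential(
            *[ResidualBlock(channels) for _ in range(blocks)])
        self.policy_head = nn.Sequential(
            nn.Conv2d(channels, 2, 1), nn.BatchNorm2d(2), nn.ReLU(),
            nn.Flatten(), nn.Linear(2 * 9, 9))          # one logit per square
        self.value_head = nn.Sequential(
            nn.Conv2d(channels, 1, 1), nn.BatchNorm2d(1), nn.ReLU(),
            nn.Flatten(), nn.Linear(9, 32), nn.ReLU(),
            nn.Linear(32, 1), nn.Tanh())                # value in [-1, 1]

    def forward(self, x):
        x = self.tower(self.stem(x))
        return self.policy_head(x), self.value_head(x)  # policy logits, value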

Monte Carlo Tree Search (MCTS)

  • UCB-style (PUCT) exploration guided by neural network priors
  • Temperature-based action selection
  • Proper handling of player perspectives
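
A sketch of the first two points, assuming nodes carry children, visit_count, total_value, and prior attributes (mcts.py may organize its state differently). Selection uses the PUCT rule from the AlphaGo Zero paper; move choice applies a visit-count temperature.

import math
import numpy as np

def select_action(node, c_puct=1.0):
    # Pick the child maximizing Q + U, where U is the prior-weighted
    # exploration bonus from the AlphaGo Zero paper (the PUCT rule).
    total_visits = sum(c.visit_count for c in node.children.values())
    def score(child):
        q = child.total_value / child.visit_count if child.visit_count else 0.0
        u = c_puct * child.prior * math.sqrt(total_visits) / (1 + child.visit_count)
        return q + u
    return max(node.children, key=lambda a: score(node.children[a]))

def sample_action(visit_counts, temperature=1.0):
    # Temperature-based move choice: tau -> 0 plays the most-visited move,
    # tau = 1 samples in proportion to visit counts.
    counts = np.asarray(visit_counts, dtype=float)
    if temperature == 0:
        return int(counts.argmax())
    probs = counts ** (1.0 / temperature)
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))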

Training Loop

  • Self-play data generation
  • Neural network training on generated data
  • Model evaluation and selection
  • Iterative improvement
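
In pseudocode, the loop has the following shape. The helper names (play_self_play_game, train_network, evaluate) are illustrative stand-ins for what train.py actually does.

def training_loop(config, net, best_net, replay_buffer):
    for iteration in range(config["num_iterations"]):
        # 1. Self-play: the current best model generates examples of the
        #    form (state, MCTS visit distribution, final outcome).
        for _ in range(config["num_episodes"]):
            replay_buffer.extend(play_self_play_game(best_net, config))
        # 2. Train the candidate network on sampled minibatches.
        train_network(net, replay_buffer,
                      config["batch_size"], config["learning_rate"])
        # 3. Evaluate: the candidate replaces the best model only if its
        #    win rate clears the acceptance threshold.
        if evaluate(net, best_net) >= config["eval_threshold"]:
            best_net = net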

📁 Project Structure

alphaTacToe/
├── game.py                     # Tic-tac-toe game logic
├── mcts.py                     # Monte Carlo Tree Search implementation
├── train.py                    # Training loop
├── play.py                     # Command-line interface for playing
├── neural_network_components/  # Neural network components
│   ├── alphaTacToeNET.py       # Main network class
│   ├── first_covl_layer.py     # First convolutional layer
│   ├── residual_tower.py       # Residual tower
│   ├── residual_block.py       # Individual residual blocks
│   ├── policy_head.py          # Policy head
│   └── value_head.py           # Value head
├── test_system.py              # System tests
├── test_cli.py                 # CLI tests
├── network_test.py             # Neural network architecture test
├── quick_train.py              # Quick training demo
├── create_demo_model.py        # Create demo model
└── checkpoints/                # Saved models directory

🚀 Quick Start

1. Install Dependencies

pip install torch numpy

2. Test the System

# Run comprehensive system tests
PYTHONPATH=/home/{user}/alphaTacToe python test_system.py

# Test the neural network architecture
PYTHONPATH=/home/{user}/alphaTacToe python network_test.py

3. Train the Model

# Quick training (3 iterations for testing)
PYTHONPATH=/home/{user}/alphaTacToe python quick_train.py

# Full training (50 iterations)
PYTHONPATH=/home/{user}/alphaTacToe python train.py

4. Play Against the AI

# Create a demo model for testing
PYTHONPATH=/home/{user}/alphaTacToe python create_demo_model.py

# Launch the CLI game
PYTHONPATH=/home/{user}/alphaTacToe python play.py

🎮 How to Play

The CLI offers several game modes:

  1. Play against AI - Challenge the trained model
  2. AI vs AI - Watch two AIs battle it out
  3. Human vs Human - Classic two-player mode
  4. Load different models - Try different training checkpoints
  5. Adjust settings - Change AI difficulty (MCTS simulations)

Board Notation

Positions are numbered 0-8:

 0 | 1 | 2
-----------
 3 | 4 | 5
-----------
 6 | 7 | 8
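
In code, a move number converts to a (row, column) pair and back with divmod:

row, col = divmod(4, 3)   # move 4 -> (1, 1), the center square
move = row * 3 + col      # and back: (1, 1) -> 4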

⚙️ Configuration

Training Parameters

Edit the config in train.py:

  • num_iterations: Number of training iterations
  • num_episodes: Self-play games per iteration
  • num_mcts_sims: MCTS simulations per move
  • batch_size: Training batch size
  • learning_rate: Neural network learning rate
  • eval_threshold: Win rate needed to accept new model
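
For example (illustrative values only; see train.py for the actual defaults):

config = {
    "num_iterations": 50,     # full training run
    "num_episodes": 100,      # self-play games per iteration
    "num_mcts_sims": 25,      # simulations per move ("Medium" strength)
    "batch_size": 64,
    "learning_rate": 1e-3,
    "eval_threshold": 0.55,   # new model must win at least 55% to be accepted
}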

AI Difficulty

In the CLI, you can adjust AI strength:

  • Easy: 10 MCTS simulations
  • Medium: 25 MCTS simulations
  • Hard: 50 MCTS simulations
  • Expert: 100 MCTS simulations
  • Custom: Any number you choose

🧪 Testing

Run tests to verify everything works:

# Full system test
PYTHONPATH=/home/{user}/alphaTacToe python test_system.py

# CLI components test  
PYTHONPATH=/home/{user}/alphaTacToe python test_cli.py

# Neural network test
PYTHONPATH=/home/{user}/alphaTacToe python network_test.py

🔬 Technical Details

Neural Network Features

  • Residual connections for better gradient flow
  • Batch normalization for training stability
  • Separate policy and value heads following AlphaGo architecture
  • Softmax policy output for action probabilities
  • Tanh value output for position evaluation
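
Using the TicTacToeNet sketch from the Architecture section, a dummy forward pass shows the two heads' output shapes and ranges:

import torch
import torch.nn.functional as F

net = TicTacToeNet()
net.eval()                                 # use running batch-norm statistics
board = torch.zeros(1, 5, 3, 3)            # batch of one encoded position
policy_logits, value = net(board)
probs = F.softmax(policy_logits, dim=1)    # probabilities over the 9 squares
print(probs.shape, value.item())           # torch.Size([1, 9]); value in [-1, 1]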

MCTS Features

  • UCB-style (PUCT) exploration with neural network priors
  • Temperature control for exploration vs exploitation
  • Proper perspective handling for two-player games
  • Efficient tree traversal and backpropagation
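
Perspective handling amounts to negating the value at each step back up the tree, as in this sketch (node attributes assumed as in the selection sketch above):

def backpropagate(search_path, value):
    # Walk from the leaf back to the root; the sign flips at each level
    # because a good position for one player is a bad one for the other.
    for node in reversed(search_path):
        node.visit_count += 1
        node.total_value += value
        value = -value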

Training Features

  • Self-play data generation with temperature annealing
  • Experience replay with circular buffer
  • Model evaluation against previous versions
  • Checkpoint saving for model persistence
  • GPU acceleration when available
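
The circular buffer can be as simple as a bounded deque; the capacity and batch size here are illustrative:

from collections import deque
import random

replay_buffer = deque(maxlen=10_000)   # oldest examples fall off the back

def sample_batch(buffer, batch_size=64):
    # Draw a random minibatch of (state, policy, outcome) examples.
    return random.sample(list(buffer), min(batch_size, len(buffer)))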

📊 Expected Performance

With proper training (50+ iterations), the AI should:

  • Beat random players consistently
  • Learn basic tic-tac-toe strategy
  • Avoid obvious mistakes
  • Play optimally in many positions

Note: Tic-tac-toe is a solved game (a draw with perfect play), so a perfectly trained AI should never lose and should draw every game against an optimal opponent.

🚀 Next Steps

  1. Train longer: Run more iterations for better performance
  2. Experiment with architecture: Try different network sizes
  3. Tune hyperparameters: Optimize learning rates and MCTS parameters
  4. Scale to larger games: Apply to 4×4 tic-tac-toe or other games
  5. Add analysis tools: Visualize training progress and game analysis

🤝 Contributing

Feel free to:

  • Report bugs or issues
  • Suggest improvements
  • Add new features
  • Optimize performance
  • Improve documentation

📜 License

This project is for educational purposes. The AlphaGo Zero algorithm is described in the DeepMind paper "Mastering the Game of Go without Human Knowledge".


Happy Gaming! 🎮

Built with PyTorch and lots of coffee ☕
