A Python/PyTorch implementation of the classic Snake game with Q-Learning reinforcement learning agent.
Requirements:

- Python 3.13+
Install Python 3.13 for your platform.

macOS (Homebrew):

```bash
brew install python@3.13
```

Ubuntu/Debian (deadsnakes PPA):

```bash
sudo apt update
sudo apt install software-properties-common
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.13 python3.13-pip python3.13-venv
```

Windows:

```bash
winget install Python.Python.3.13
```

Verify the installation and install the project dependencies:

```bash
python3.13 --version
python3.13 -m pip install -r requirements.txt
```

Start training with the GUI, headless, or headless with a custom speed and game limit:

```bash
python3.13 train.py
python3.13 train.py --headless
python3.13 train.py --headless --speed 1000 --max-games 500
```

Training options:

- `--headless`: Run without GUI for faster training
- `--speed`: Game speed in FPS (default: 40)
- `--max-games`: Maximum number of games to train
- `--target-score`: Stop training when this score is reached
- `--lr`: Learning rate (default: 0.001)
- `--gamma`: Discount factor for Q-learning (default: 0.9)
- `--hidden-size`: Neural network hidden layer size (default: 256)
- `--memory-size`: Replay memory size (default: 100,000)
- `--batch-size`: Training batch size (default: 64)
- `--train-frequency`: Train every N steps (default: 4)
- `--min-samples`: Minimum samples before training starts (default: 100)
Example training runs:

```bash
python3.13 train.py --headless --speed 1000 --max-games 1000
python3.13 train.py --headless --target-score 50
python3.13 train.py --lr 0.0005 --gamma 0.95 --hidden-size 512
```

Watch the trained agent play, run several games in a row, or slow the game down:

```bash
python3.13 play.py
python3.13 play.py --num-games 10
python3.13 play.py --speed 10
```

Play options:

- `--speed`: Game speed in FPS (default: 20)
- `--num-games`: Number of games to play (default: 1)
- `--model`: Path to model file (default: model.pth)
- `--infinite`: Play infinitely (ignores --num-games)
- `--delay`: Delay in seconds between games (default: 1.0)
More play examples:

```bash
# Play 5 games
python3.13 play.py --num-games 5

# Play infinitely (Ctrl+C to stop)
python3.13 play.py --infinite

# Play fast with no delay
python3.13 play.py --infinite --delay 0 --speed 100

# Use a different model
python3.13 play.py --model my_model.pth
```

The agent uses Deep Q-Learning with experience replay to learn optimal snake behavior. The neural network takes a 12-dimensional state vector as input (a sketch of this encoding follows the list):
- 3 danger indicators (straight, right, left)
- 4 direction indicators (left, right, up, down)
- 4 food location indicators (left, right, up, down relative to head)
- 1 normalized taxicab distance to food
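The exact encoding is implemented in `snake.py` and `agent.py`; the sketch below only illustrates how such a 12-dimensional vector could be assembled. The helper names (`is_collision`, `head`, `food`, `direction`, `w`, `h`), the 20-pixel block size, and the normalization by board width plus height are assumptions, not the repository's actual API.

```python
import numpy as np

def get_state(game):
    """Illustrative sketch: pack the 12 features listed above into a vector."""
    head = game.head                      # (x, y) of the snake's head (assumed attribute)
    dir_l = game.direction == "LEFT"
    dir_r = game.direction == "RIGHT"
    dir_u = game.direction == "UP"
    dir_d = game.direction == "DOWN"

    # Cells one step away in each absolute direction (20 px block size assumed).
    point_l = (head[0] - 20, head[1])
    point_r = (head[0] + 20, head[1])
    point_u = (head[0], head[1] - 20)
    point_d = (head[0], head[1] + 20)

    state = [
        # 3 danger indicators, relative to the current heading
        (dir_r and game.is_collision(point_r)) or (dir_l and game.is_collision(point_l))
        or (dir_u and game.is_collision(point_u)) or (dir_d and game.is_collision(point_d)),  # straight
        (dir_u and game.is_collision(point_r)) or (dir_d and game.is_collision(point_l))
        or (dir_l and game.is_collision(point_u)) or (dir_r and game.is_collision(point_d)),  # right
        (dir_d and game.is_collision(point_r)) or (dir_u and game.is_collision(point_l))
        or (dir_r and game.is_collision(point_u)) or (dir_l and game.is_collision(point_d)),  # left
        # 4 direction indicators
        dir_l, dir_r, dir_u, dir_d,
        # 4 food location indicators relative to the head
        game.food[0] < head[0],   # food is to the left
        game.food[0] > head[0],   # food is to the right
        game.food[1] < head[1],   # food is above
        game.food[1] > head[1],   # food is below
        # 1 taxicab distance to food, normalized by board size (assumed normalization)
        (abs(game.food[0] - head[0]) + abs(game.food[1] - head[1])) / (game.w + game.h),
    ]
    return np.array(state, dtype=float)
```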
The network outputs Q-values for 3 possible actions (a minimal network sketch follows this list):
- Continue straight
- Turn right
- Turn left
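The actual architecture is defined in `agent.py`; a minimal PyTorch sketch matching the description above (12 inputs, one hidden layer of 256 units by default, 3 Q-value outputs) might look like this. The class name `QNet` is hypothetical.

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Sketch of a small feed-forward Q-network: 12 state features in, 3 action Q-values out."""
    def __init__(self, input_size=12, hidden_size=256, output_size=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, output_size),  # one Q-value per action
        )

    def forward(self, x):
        return self.net(x)

# Example: pick the greedy action for a random state
q_values = QNet()(torch.rand(1, 12))
action = int(torch.argmax(q_values, dim=1))  # index order assumed from the list above
```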
- The agent explores using epsilon-greedy strategy
- Experiences are stored in replay memory
- The network is trained on batches of past experiences (a rough training-step sketch follows this list)
- The best model (highest score) is automatically saved
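The real training loop lives in `train.py` and `agent.py`; the following is a rough, self-contained sketch of epsilon-greedy action selection and a single replay-memory training step. Defaults mirror the command-line options above; all names here are illustrative, not the repository's code.

```python
import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn

# Illustrative setup; the repository's actual agent lives in agent.py.
model = nn.Sequential(nn.Linear(12, 256), nn.ReLU(), nn.Linear(256, 3))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # --lr
criterion = nn.MSELoss()
memory = deque(maxlen=100_000)                              # --memory-size
gamma = 0.9                                                 # --gamma

def choose_action(state, epsilon):
    """Epsilon-greedy: explore with probability epsilon, otherwise act greedily."""
    if random.random() < epsilon:
        return random.randrange(3)
    with torch.no_grad():
        q = model(torch.tensor(state, dtype=torch.float32))
        return int(torch.argmax(q))

def train_step(batch_size=64):
    """Sample past experiences and move Q(s, a) toward the Bellman target."""
    if len(memory) < batch_size:
        return
    states, actions, rewards, next_states, dones = zip(*random.sample(memory, batch_size))
    states = torch.tensor(np.array(states), dtype=torch.float32)
    next_states = torch.tensor(np.array(next_states), dtype=torch.float32)
    actions = torch.tensor(actions, dtype=torch.int64)
    rewards = torch.tensor(rewards, dtype=torch.float32)
    dones = torch.tensor(dones, dtype=torch.float32)

    q_pred = model(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        q_next = model(next_states).max(dim=1).values
        q_target = rewards + gamma * q_next * (1 - dones)   # no bootstrap on terminal states

    loss = criterion(q_pred, q_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```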
- +10 for eating food
- -10 for collision (wall or self)
- +0.1 for moving closer to food (encourages goal-directed behavior)
- -0.1 for moving away from food (discourages loops and wandering); a sketch of this shaping follows the list
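A rough sketch of how this shaping might be computed each step, using the taxicab distance from the head to the food; the `game` attributes used here are assumptions, not the actual `snake.py` API.

```python
def compute_reward(game, prev_distance):
    """Sketch of the reward shaping listed above; returns (reward, new distance)."""
    distance = abs(game.food[0] - game.head[0]) + abs(game.food[1] - game.head[1])
    if game.collided:                 # hit a wall or the snake's own body
        reward = -10.0
    elif game.ate_food:               # head landed on the food square
        reward = 10.0
    elif distance < prev_distance:
        reward = 0.1                  # moved closer to the food
    else:
        reward = -0.1                 # moved away from (or stayed level with) the food
    return reward, distance
```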
- `snake.py`: Snake game environment implementation
- `agent.py`: Q-Learning neural network and agent
- `train.py`: Training script with customizable parameters
- `play.py`: Script to watch trained agent play
- `model.pth`: Saved model weights (created after training); see the loading sketch below
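Assuming `model.pth` holds a plain PyTorch `state_dict` (the saving format is not shown here, so this is a guess), a trained model could be reloaded for evaluation roughly like this:

```python
import torch
import torch.nn as nn

# Same shape as the network sketched earlier; hidden size must match training (--hidden-size).
model = nn.Sequential(nn.Linear(12, 256), nn.ReLU(), nn.Linear(256, 3))
model.load_state_dict(torch.load("model.pth"))   # assumes a bare state_dict was saved
model.eval()                                     # inference mode for playing
```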