
# PPO AutoDRV - Unity3D Game World

Unity3D simulation environment for training autonomous driving agents using Proximal Policy Optimization (PPO). This project communicates with a Python backend via ZeroMQ to enable reinforcement learning training.

## Overview

This Unity project provides the simulation environment where an autonomous driving agent learns to navigate. It sends sensor data to a Python training backend and receives steering commands in return.

Key capabilities:

- Physics-based 3D driving simulation
- 5-ray sensor system for obstacle detection
- Real-time ZeroMQ communication with the training backend
- Reward collection and collision detection
- Episode management and automatic resets

## Prerequisites

- Unity and Unity Hub (to open and run the simulation)
- Python 3 with pip (for the training backend)

## Installation

1. Clone this repository:

   ```bash
   git clone https://github.com/Spectrewolf8/PPO_AutoDRW_Unity3d_GameWorld.git
   ```

2. Open the project in Unity Hub.

3. Install the Python backend (required):

   ```bash
   git clone https://github.com/Spectrewolf8/PPO_RL_AutoDRV_Compute_Backend.git
   cd PPO_RL_AutoDRV_Compute_Backend
   pip install -r requirements.txt
   ```

## Quick Start

1. Start the Python backend server:

   ```bash
   cd PPO_RL_AutoDRV_Compute_Backend
   python app.py
   ```

2. Open the Unity project and press Play.

3. The simulation will connect to the backend at `127.0.0.1:65432`.

## Architecture

Unity sends sensor data to the Python backend via ZeroMQ and receives steering commands:

```
Unity (Client) <--ZeroMQ--> Python Backend (Server)
   |                              |
   |-- Sends: Ray distances       |-- PPO Model
   |-- Sends: Speed               |-- Training/Inference
   |-- Sends: Collisions          |-- Checkpoint System
   |                              |
   |-- Receives: Steering (-1/0/1)
```

Communication: REQ/REP pattern over `tcp://127.0.0.1:65432`
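To make the REQ/REP exchange concrete, here is a minimal sketch of one step of the server side of the loop in Python, assuming `pyzmq` is installed. The function name `serve_one_step` and the fixed `"steering": 0` reply are illustrative placeholders (the real backend would run the PPO policy here), but the address, socket pattern, and JSON framing mirror the protocol described in this README.

```python
import json
import zmq  # pyzmq

def serve_one_step(port=65432):
    """Handle a single REQ/REP round trip: receive one game state
    from Unity, reply with one action message."""
    ctx = zmq.Context()
    sock = ctx.socket(zmq.REP)
    sock.bind(f"tcp://127.0.0.1:{port}")
    try:
        state = json.loads(sock.recv().decode())     # game state from Unity
        action = {"type": "action", "steering": 0}   # placeholder policy output
        sock.send(json.dumps(action).encode())
    finally:
        sock.close()
        ctx.term()
    return state
```

Because REQ/REP is strictly alternating (one request, then one reply), Unity blocks until the backend answers each sensor message, which keeps the simulation and the learner in lockstep.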

## Project Structure

```
Assets/
├── Scenes/                      # Unity scenes
├── Scripts/                     # C# scripts
│   ├── CarController.cs
│   ├── CarRaycastsController.cs
│   ├── CommunicationController.cs
│   ├── GameController.cs
│   ├── RewardController.cs
│   ├── CarRespawnController.cs
│   └── OverviewCameraController.cs
├── Prefabs/                     # Prefabs
├── Materials/                   # Materials
└── models/                      # 3D models
```

## Core Components

- `CarController.cs` - Vehicle physics and steering
- `CarRaycastsController.cs` - 5-ray sensor system for obstacle detection
- `CommunicationController.cs` - ZeroMQ client for backend communication
- `GameController.cs` - Game state and episode management
- `RewardController.cs` - Collectible reward items
- `CarRespawnController.cs` - Episode reset logic

## Communication Protocol

### Connection

- Protocol: ZeroMQ REQ/REP
- Address: `127.0.0.1:65432`
- Format: JSON messages

### Unity to Python (Game State)

```json
{
  "rays": [7.0, 4.5, 4.5, 3.5, 3.5],
  "ray_hits": [0, 1, 0, 1, 0],
  "speed": 1.25,
  "collision": false,
  "reward_collected": 0,
  "done": false,
  "episode": 1
}
```
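On the receiving side, the backend has to trust that each message carries all of these fields before feeding it to the learner. The following hypothetical helper (not part of either repository) shows one way to validate a raw game-state message against the schema above; the field names and the five-ray count come straight from this protocol.

```python
import json

# Field names taken from the game-state message format above.
REQUIRED_FIELDS = {"rays", "ray_hits", "speed", "collision",
                   "reward_collected", "done", "episode"}

def parse_game_state(raw: bytes) -> dict:
    """Decode and sanity-check one game-state message from Unity."""
    state = json.loads(raw)
    missing = REQUIRED_FIELDS - state.keys()
    if missing:
        raise ValueError(f"game state missing fields: {sorted(missing)}")
    if len(state["rays"]) != 5 or len(state["ray_hits"]) != 5:
        raise ValueError("expected exactly 5 ray readings")
    return state
```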

### Python to Unity (Action)

```json
{
  "type": "action",
  "steering": 0
}
```

Steering values: `-1` (left), `0` (straight), `1` (right)
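As a sketch of how a client might interpret an action message, the hypothetical `apply_action` helper below (an illustration, not code from `CarController.cs`) maps the discrete steering value to a direction and rejects anything outside the `-1/0/1` range defined by the protocol.

```python
# Discrete steering values defined by the action message format above.
STEERING = {-1: "left", 0: "straight", 1: "right"}

def apply_action(msg: dict) -> str:
    """Validate an action message and return the steering direction."""
    if msg.get("type") != "action":
        raise ValueError("expected an action message")
    steering = msg.get("steering")
    if steering not in STEERING:
        raise ValueError("steering must be -1, 0, or 1")
    return STEERING[steering]
```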

## Ray Sensor Layout

- Ray 0: Forward (max 7.0 units)
- Ray 1: Forward-Left (max 4.5 units)
- Ray 2: Forward-Right (max 4.5 units)
- Ray 3: Right (max 3.5 units)
- Ray 4: Left (max 3.5 units)
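Because the rays have different maximum ranges, a learner would typically scale each reading by its own maximum so all five observations fall in `[0, 1]`. The max ranges below come from the layout above; the normalization step itself is an assumption about how the backend might preprocess observations, not documented behavior.

```python
# Per-ray maximum distances from the sensor layout above (units).
RAY_MAX = [7.0, 4.5, 4.5, 3.5, 3.5]

def normalize_rays(rays):
    """Scale each ray distance by its max range so values lie in [0, 1]."""
    if len(rays) != len(RAY_MAX):
        raise ValueError("expected 5 ray distances")
    return [d / m for d, m in zip(rays, RAY_MAX)]
```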

See `CommunicationDesign.md` for the complete protocol specification.

## Configuration

Environment parameters are synchronized from the Python backend during connection. To modify settings, edit `config.json` in the backend repository.

## Backend Repository

This project requires the Python training backend: [PPO_RL_AutoDRV_Compute_Backend](https://github.com/Spectrewolf8/PPO_RL_AutoDRV_Compute_Backend)

The backend provides:

- PPO reinforcement learning algorithm
- Training and inference modes
- Model checkpointing
- Gymnasium environment interface
- ZeroMQ communication server
