A Docker Compose application that lets you watch and interact with a browser controlled by an AI agent. It features hybrid control: you can chat with the agent and also interact with the browser directly.
    ┌─────────────────────────────────────────────────────────────────┐
    │                          User Browser                           │
    │  ┌──────────────────────┐  ┌─────────────────────────────────┐  │
    │  │     noVNC Viewer     │  │         Chat Interface          │  │
    │  │    (Browser View)    │  │      (Agent Communication)      │  │
    │  └──────────────────────┘  └─────────────────────────────────┘  │
    └─────────────────────────────────────────────────────────────────┘
                 │                          │
                 ▼                          ▼
    ┌─────────────────────────┐    ┌─────────────────────────────────────┐
    │   playwright-browser    │    │   nextjs-webapp                     │
    │   - Xvfb + x11vnc       │    │   - Next.js 14                      │
    │   - noVNC               │    │   - noVNC embed                     │
    │   - Chromium            │    │   - Chat UI                         │
    │   - Playwright MCP      │    │                                     │
    └─────────────────────────┘    └─────────────────────────────────────┘
                 │                          │
                 └──────────────────────────┘
                              │
             ┌─────────────────────────────────┐
             │   python-agent                  │
             │   - FastAPI WebSocket           │
             │   - Gemini 3 Pro                │
             │   - MCP Client                  │
             └─────────────────────────────────┘
Prerequisites:
- Docker and Docker Compose
- Google Gemini API key
- Clone the repository

      git clone <repo-url>
      cd watch-and-learn

- Set up environment variables

      cp .env.example .env
      # Edit .env and add your GEMINI_API_KEY

- Build and start the services

      docker-compose up --build

- Open the application
  - Navigate to http://localhost:3000
  - You'll see the browser view on the left and chat on the right
| Service | Port | Description |
|---|---|---|
| nextjs-webapp | 3000 | Main web interface |
| playwright-browser | 6080 | noVNC web interface (direct access) |
| python-agent | 8000 | WebSocket API for agent |
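
Once the stack is running, it can help to confirm that each port in the table accepts connections. The following is a minimal sketch, not part of the repository: it assumes the services are published on `localhost` and only checks TCP reachability, not application health. The names `SERVICES`, `is_port_open`, and `check_services` are illustrative.

```python
import socket

# Ports taken from the services table above.
# The host ("localhost") is an assumption for a local Docker Compose run.
SERVICES = {
    "nextjs-webapp": 3000,
    "playwright-browser": 6080,
    "python-agent": 8000,
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


def check_services(host: str = "localhost") -> dict:
    """Map each service name to whether its port accepts connections."""
    return {name: is_port_open(host, port) for name, port in SERVICES.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        status = "up" if up else "down"
        print(f"{name:20s} port {SERVICES[name]}: {status}")
```

A port reported as "down" usually means the container has not finished starting or the port mapping differs from the table.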
Type commands in the chat window to control the browser:
- "Go to google.com"
- "Search for weather in New York"
- "Click on the first link"
- "What's on the screen?"
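
The same commands could in principle be sent to the agent from a script instead of the chat window. The sketch below is illustrative only: the README states that python-agent exposes a WebSocket API on port 8000, but the endpoint path (`/ws`) and the JSON message shape used here are assumptions, so check `services/python-agent` for the real ones. It uses the third-party `websockets` package.

```python
import asyncio
import json

# ASSUMPTION: the path "/ws" is hypothetical; only port 8000 comes from the README.
AGENT_WS_URL = "ws://localhost:8000/ws"


def build_chat_message(text: str) -> str:
    """Serialize a chat command as JSON.

    ASSUMPTION: the {"type", "content"} schema is hypothetical and may not
    match what the agent actually expects.
    """
    text = text.strip()
    if not text:
        raise ValueError("chat message must not be empty")
    return json.dumps({"type": "message", "content": text})


async def send_command(text: str, url: str = AGENT_WS_URL):
    """Send one command and return the first reply frame from the agent."""
    import websockets  # third-party: pip install websockets

    async with websockets.connect(url) as ws:
        await ws.send(build_chat_message(text))
        return await ws.recv()


if __name__ == "__main__":
    print(asyncio.run(send_command("Go to google.com")))
```

Keeping message construction in its own function makes it easy to adjust the payload once the actual schema is known, without touching the connection code.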
Click and type directly in the browser view panel. The AI agent can observe and respond to your actions.
Playwright Browser:

    cd services/playwright-browser
    docker build -t playwright-browser .
    docker run -p 6080:6080 -p 3001:3001 playwright-browser

Next.js Webapp:

    cd services/nextjs-webapp
    npm install
    npm run dev

Python Agent:

    cd services/python-agent
    pip install -r requirements.txt
    GEMINI_API_KEY=your_key python main.py

| Variable | Description | Required |
|---|---|---|
| GEMINI_API_KEY | Google Gemini API key | Yes |
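
Because `GEMINI_API_KEY` is required, a startup check that fails early with a clear message can save debugging time. This is a minimal sketch under the assumption that the variable is read from the process environment; `require_env` is a hypothetical helper, not a function from this repository.

```python
import os


def require_env(name: str) -> str:
    """Return the value of a required environment variable or raise."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to .env or export it before starting"
        )
    return value


if __name__ == "__main__":
    key = require_env("GEMINI_API_KEY")
    # Print only the length so the key itself never reaches the logs.
    print(f"GEMINI_API_KEY loaded ({len(key)} characters)")
```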
License: MIT