A REST API wrapper compatible with the Browser-Use Cloud API v2 specification, enabling AI-powered browser automation through a standardized REST interface.

🚀 Browser-Use REST API - Watch Your AI Work in Real-Time


The Most Advanced Self-Hosted Browser Automation API

Drop-in replacement for Browser-Use Cloud with real-time VNC viewing, browser session management, and 15+ LLM models

🎯 Quick Start · 📖 Documentation · 🐳 Docker Hub · 💬 Discord


✨ What Makes This Special?

👁️ Watch AI Work in Real-Time

See exactly what your AI agent is doing through browser VNC streaming. Debug faster, understand better.

🔄 Persistent Browser Sessions

Create browser profiles, save login states, and reuse sessions across multiple tasks - just like a human would.

🎮 Full Task Control

Start, stop, pause, resume tasks. Get real-time step updates as your agent executes.

🌐 15+ LLM Models

Switch between GPT-4, Claude, Gemini, Llama and more. Get $10 free with browser-use-llm.

🔓 100% Self-Hosted

Complete control over your data. No cloud dependencies. Works offline.

Production Ready

Docker support, multi-arch builds, health checks, supervisor process management, and CI/CD ready.


🎬 See It In Action

# Start the server with one command
docker run -d -p 8080:8080 reqeique/browser-use-api:latest

# Create a task
curl -X POST "http://localhost:8080/tasks" \
  -H "Content-Type: application/json" \
  -d '{"task": "Find the top 3 AI news on Hacker News", "llm": "gemini-flash-lite-latest"}'

# Watch it work in real-time at http://localhost:8080/vnc.html

Output:

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "sessionId": "660e8400-e29b-41d4-a716-446655440001"
}

Navigate to http://localhost:8080/vnc.html and watch your AI agent work in real-time! 🎥


🚀 Quick Start (3 Commands)

Option 1: Docker (Recommended - 30 seconds)

# 1. Pull the image
docker pull reqeique/browser-use-api:latest

# 2. Run it
docker run -d \
  -p 8080:8080 \
  -e BROWSER_USE_API_KEY=optional \
  --name browser-automation \
  reqeique/browser-use-api:latest

# 3. Test it
curl http://localhost:8080/health

✅ Done! API is running at http://localhost:8080

👁️ VNC Viewer: http://localhost:8080/vnc.html
📚 API Docs: http://localhost:8080/docs

Option 2: Docker Compose (For Development)

git clone https://github.com/reqeique/browser-use-api.git
cd browser-use-api
docker-compose up -d

Option 3: Local Python (Advanced)

pip install -r requirements.txt
python -m playwright install chromium
uvicorn main:app --host 0.0.0.0 --port 8080

🎯 Real-World Use Cases

🔍 Research Automation

"Find the latest research papers on quantum computing from arxiv.org and summarize the top 5"

Perfect for researchers, analysts

🛒 E-commerce Monitoring

"Check Amazon prices for iPhone 15 and notify if under $800"

Price tracking, competitor analysis

📊 Data Collection

"Extract all job listings for Python developers in San Francisco from LinkedIn"

Web scraping, lead generation

🎫 Booking & Reservations

"Book a table for 2 at 7pm on Friday at The French Laundry"

Restaurant bookings, event tickets

🧪 QA Testing

"Test the checkout flow on mystore.com and verify payment processing"

E2E testing, regression testing

📱 Social Media

"Post this update to Twitter, LinkedIn, and Facebook with the attached image"

Social media automation


🌟 Key Features

🎥 Real-Time Browser Viewing (VNC)

NEW! Watch your AI agent work through a browser-based VNC viewer. No more blind automation.

# Launch a browser session
curl -X POST "http://localhost:8080/browser/launch" \
  -H "Content-Type: application/json" \
  -d '{"startUrl": "https://google.com", "profileDirectory": "my-profile"}'

# Get VNC viewer URL
# Visit: http://localhost:8080/vnc.html?autoconnect=true

Features:

  • 🖥️ Full HD 1920x1080 display
  • 🔄 Real-time screen updates
  • 🖱️ Interactive mode (click, type, scroll)
  • 📹 Watch task execution live
  • 🐛 Debug issues instantly
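
If you prefer to script this, here is a minimal Python sketch that fetches the viewer URL for a task via the /tasks/{id}/vnc endpoint and opens it in your default browser. The 'url' response field is taken from Example 1 below; treat the exact shape as an assumption if your version differs.

import webbrowser
import requests

API_URL = "http://localhost:8080"
task_id = "550e8400-e29b-41d4-a716-446655440000"  # replace with a real task id

# Ask the API for the VNC viewer path of this task ('url' field, as in Example 1 below)
vnc = requests.get(f"{API_URL}/tasks/{task_id}/vnc").json()

# Open the live browser view locally
webbrowser.open(f"{API_URL}{vnc['url']}")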

🔄 Browser Session Management

Create persistent browser sessions with saved cookies, localStorage, and authentication:

# Create a session with profile
curl -X POST "http://localhost:8080/browser/launch" \
  -H "Content-Type: application/json" \
  -d '{
    "profileDirectory": "twitter-bot",
    "startUrl": "https://twitter.com/login",
    "storageStateUrl": "https://mycdn.com/twitter-cookies.json"
  }'

# Navigate in the session
curl -X POST "http://localhost:8080/browser/{sessionId}/navigate?url=https://twitter.com/home"

# Reuse session for tasks
curl -X POST "http://localhost:8080/tasks" \
  -d '{"task": "Tweet about AI", "sessionId": "660e8400-..."}'

Features:

  • 💾 Save/load browser profiles
  • 🍪 Persistent cookies and sessions
  • 🔐 Keep login states across tasks
  • 📁 Share profiles between tasks
  • ♻️ Reuse authenticated sessions

📊 Advanced Task Tracking

Get real-time step-by-step updates as your agent executes:

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "started",
  "steps": [
    {
      "number": 1,
      "memory": "User wants to find AI news on Hacker News",
      "evaluationPreviousGoal": "N/A",
      "nextGoal": "Navigate to news.ycombinator.com",
      "url": "https://news.ycombinator.com",
      "actions": ["navigate", "wait_for_load"]
    },
    {
      "number": 2,
      "memory": "On Hacker News homepage, need to find AI-related stories",
      "evaluationPreviousGoal": "✅ Successfully navigated to HN",
      "nextGoal": "Scroll and identify top AI news stories",
      "url": "https://news.ycombinator.com",
      "actions": ["scroll_down", "extract_text"]
    }
  ],
  "output": "Top 3 AI news: 1) GPT-5 rumors... 2) Claude beats GPT-4... 3) AI safety concerns..."
}
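
For example, a small polling loop (a sketch, assuming the step fields shown above) can stream these updates to your terminal as the agent works:

import time
import requests

API_URL = "http://localhost:8080"
task_id = "550e8400-e29b-41d4-a716-446655440000"  # replace with a real task id

seen = 0
while True:
    task = requests.get(f"{API_URL}/tasks/{task_id}").json()

    # Print only the steps we have not displayed yet
    for step in task.get("steps", [])[seen:]:
        print(f"Step {step['number']}: {step['nextGoal']}  ({step['url']})")
    seen = len(task.get("steps", []))

    if task["status"] in ("finished", "stopped"):
        print(task.get("output"))
        break
    time.sleep(2)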

🤖 15+ LLM Models Supported

| Provider | Models | API Key Required |
|---|---|---|
| Browser-Use | browser-use-llm (Get $10 free) | Get Key |
| OpenAI | gpt-4.1, gpt-4.1-mini, gpt-4o, gpt-4o-mini, o4-mini, o3 | |
| Google | gemini-2.5-flash, gemini-2.5-pro, gemini-flash-latest, gemini-flash-lite-latest | |
| Anthropic | claude-sonnet-4, claude-sonnet-4-5, claude-3-7-sonnet | |
| Meta | llama-4-maverick-17b-128e-instruct | |

💡 Tip: Start with gemini-flash-lite-latest (fast & cheap) or browser-use-llm ($10 free credit).
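
One pattern this makes easy is trying a cheap model first and escalating only when the task fails. A sketch, assuming the isSuccess field from the response schema documented further below:

import time
import requests

API_URL = "http://localhost:8080"

def run(task: str, llm: str) -> dict:
    # Create the task with the chosen model and poll until it ends
    task_id = requests.post(f"{API_URL}/tasks", json={"task": task, "llm": llm}).json()["id"]
    while True:
        status = requests.get(f"{API_URL}/tasks/{task_id}").json()
        if status["status"] in ("finished", "stopped"):
            return status
        time.sleep(2)

# Try the cheap model first, retry with a stronger one only if it did not succeed
result = run("Find the top 3 AI news on Hacker News", "gemini-flash-lite-latest")
if not result.get("isSuccess"):
    result = run("Find the top 3 AI news on Hacker News", "gpt-4.1")
print(result.get("output"))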


📖 API Endpoints

🎯 Task Management

| Endpoint | Method | Description |
|---|---|---|
| /tasks | POST | Create a new browser automation task |
| /tasks | GET | List all tasks (with pagination & filters) |
| /tasks/{id} | GET | Get detailed task info with steps |
| /tasks/{id} | PATCH | Control task (stop, pause, resume) |
| /tasks/{id}/vnc | GET | Get VNC viewer URL for task |
| /tasks/{id}/logs | GET | Download task execution logs |
| /tasks/{id}/debug | GET | Debug endpoint for raw data |
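
As a quick illustration of task control, here is a minimal Python sketch using the PATCH endpoint above. The request body (an action field with values like pause, resume, stop) is an assumption based on the description in the table; check /docs for the authoritative schema.

import requests

API_URL = "http://localhost:8080"
task_id = "550e8400-e29b-41d4-a716-446655440000"  # replace with a real task id

# Pause the task, then resume it later (body shape assumed; see /docs)
requests.patch(f"{API_URL}/tasks/{task_id}", json={"action": "pause"})
# ... inspect the browser via the VNC viewer ...
requests.patch(f"{API_URL}/tasks/{task_id}", json={"action": "resume"})

# Stop it for good
requests.patch(f"{API_URL}/tasks/{task_id}", json={"action": "stop"})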

🌐 Browser Session Management (NEW!)

| Endpoint | Method | Description |
|---|---|---|
| /browser/launch | POST | Launch a browser with profile |
| /browser/sessions | GET | List all active browser sessions |
| /browser/{id} | DELETE | Close a browser session |
| /browser/{id}/navigate | POST | Navigate browser to URL |
| /browser/profiles | GET | List saved browser profiles |
| /browser/profiles/{name} | DELETE | Delete a browser profile |
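
For housekeeping, a minimal sketch of session and profile cleanup built on these endpoints. The shape of the list responses (a list of objects with a sessionId field) is an assumption; adjust to what /docs reports.

import requests

API_URL = "http://localhost:8080"

# List active browser sessions and close them all (response shape assumed)
sessions = requests.get(f"{API_URL}/browser/sessions").json()
for session in sessions:
    requests.delete(f"{API_URL}/browser/{session['sessionId']}")

# List saved profiles and delete one we no longer need
profiles = requests.get(f"{API_URL}/browser/profiles").json()
print(profiles)
requests.delete(f"{API_URL}/browser/profiles/twitter-bot")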

🔧 System & Health

| Endpoint | Method | Description |
|---|---|---|
| /health | GET | API health check |
| /vnc/health | GET | VNC services status |
| / | GET | API info & version |
| /docs | GET | Interactive Swagger UI |

🔥 Advanced Examples

Example 1: Multi-Step Research with VNC Viewing

import requests
import time

API_URL = "http://localhost:8080"

# Create a task
response = requests.post(f"{API_URL}/tasks", json={
    "task": "Research the top 5 AI companies in 2024 and create a comparison table",
    "llm": "gpt-4.1",
    "maxSteps": 100,
    "highlightElements": True
})

task_id = response.json()["id"]
print(f"✅ Task created: {task_id}")

# Get VNC viewer URL
vnc_response = requests.get(f"{API_URL}/tasks/{task_id}/vnc")
print(f"👁️ Watch at: http://localhost:8080{vnc_response.json()['url']}")

# Poll for completion
while True:
    status = requests.get(f"{API_URL}/tasks/{task_id}").json()
    
    print(f"📊 Status: {status['status']}")
    print(f"📍 Current step: {len(status.get('steps', []))}")
    
    if status['status'] in ['finished', 'stopped']:
        print(f"\n✅ Result:\n{status['output']}")
        break
    
    time.sleep(2)

Example 2: Persistent Login Session

# Step 1: Create a browser profile and log in manually via VNC
response = requests.post(f"{API_URL}/browser/launch", json={
    "profileDirectory": "github-bot",
    "startUrl": "https://github.com/login",
    "keepAlive": True
})

session_id = response.json()["sessionId"]
print(f"👁️ Log in manually at: http://localhost:8080/vnc.html")
input("Press Enter after logging in...")

# Step 2: Reuse the authenticated session for automation tasks
response = requests.post(f"{API_URL}/tasks", json={
    "task": "Star the browser-use repository and create an issue titled 'Great project!'",
    "sessionId": session_id,  # Reuse the logged-in session
    "llm": "gpt-4o"
})

print(f"✅ Task running with your logged-in GitHub session!")

Example 3: E-commerce Price Monitoring

# Monitor prices every hour
import json
import schedule

def check_prices():
    response = requests.post(f"{API_URL}/tasks", json={
        "task": "Check Amazon price for 'Sony WH-1000XM5' and notify if under $300",
        "llm": "gemini-flash-lite-latest",
        "maxSteps": 20,
        "structuredOutput": '{"product": "string", "price": "number", "inStock": "boolean"}'
    })

    task_id = response.json()["id"]

    # Wait for result
    while True:
        status = requests.get(f"{API_URL}/tasks/{task_id}").json()
        if status['status'] == 'finished':
            # structuredOutput returns JSON text, so parse it safely
            result = json.loads(status['output'])
            if result['price'] < 300:
                print(f"🚨 PRICE ALERT: ${result['price']} - Buy now!")
            break
        if status['status'] == 'stopped':
            break
        time.sleep(3)

schedule.every().hour.do(check_prices)

while True:
    schedule.run_pending()
    time.sleep(60)

Example 4: Parallel Task Execution

import asyncio
import aiohttp

async def run_task(session, task_description):
    async with session.post(f"{API_URL}/tasks", json={
        "task": task_description,
        "llm": "gemini-flash-lite-latest"
    }) as response:
        data = await response.json()
        task_id = data["id"]
        
        # Poll for completion
        while True:
            async with session.get(f"{API_URL}/tasks/{task_id}") as status_response:
                status = await status_response.json()
                if status['status'] in ('finished', 'stopped'):
                    return status['output']
            await asyncio.sleep(2)

async def main():
    async with aiohttp.ClientSession() as session:
        tasks = [
            run_task(session, "Find top Python jobs on LinkedIn"),
            run_task(session, "Find top Python jobs on Indeed"),
            run_task(session, "Find top Python jobs on Stack Overflow Jobs")
        ]
        
        results = await asyncio.gather(*tasks)
        
        for i, result in enumerate(results, 1):
            print(f"\n📊 Source {i}:\n{result}")

asyncio.run(main())

🐳 Production Deployment

Docker Compose (Recommended)

version: '3.8'

services:
  browser-api:
    image: reqeique/browser-use-api:latest
    ports:
      - "8080:8080"
    environment:
      - BROWSER_USE_API_KEY=${BROWSER_USE_API_KEY}
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - GOOGLE_API_KEY=${GOOGLE_API_KEY}
    volumes:
      - browser_profiles:/app/browser_profiles
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 8G
        reservations:
          cpus: '2'
          memory: 4G
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3

volumes:
  browser_profiles:

Environment Variables

# Required for browser-use-llm (get $10 free)
BROWSER_USE_API_KEY=your-key-here

# Optional: For specific LLM models
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_API_KEY=AIza...

# Optional: Customize behavior
DISPLAY=:99
PYTHONUNBUFFERED=1

Scaling with Kubernetes

apiVersion: apps/v1
kind: Deployment
metadata:
  name: browser-use-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: browser-use-api
  template:
    metadata:
      labels:
        app: browser-use-api
    spec:
      containers:
      - name: api
        image: reqeique/browser-use-api:latest
        ports:
        - containerPort: 8080
        env:
        - name: BROWSER_USE_API_KEY
          valueFrom:
            secretKeyRef:
              name: api-secrets
              key: browser-use-key
        resources:
          limits:
            memory: "8Gi"
            cpu: "4"
          requests:
            memory: "4Gi"
            cpu: "2"

🆚 Self-Hosted vs Cloud

| Feature | Self-Hosted (This) | Browser-Use Cloud |
|---|---|---|
| Cost | 🟢 Free (your hardware) | 🟡 Pay per task |
| Privacy | 🟢 100% private | 🟡 Data sent to cloud |
| VNC Viewing | 🟢 Real-time browser view | 🔴 Not available |
| LLM Models | 🟢 15+ models | 🟢 15+ models |
| API Compatibility | 🟢 v2 compatible | 🟢 v2 native |
| Offline Mode | 🟢 Works offline | 🔴 Requires internet |
| Setup Time | 🟡 5 minutes | 🟢 Instant |
| Scalability | 🟡 Manual scaling | 🟢 Auto-scaling |
| Support | 🟡 Community | 🟢 Official support |

💡 Best of both worlds: Use self-hosted for development/testing, cloud for production at scale!


🔧 Configuration & Customization

Request Schema

{
  "task": "Your automation task description",
  "llm": "gpt-4.1",                          // LLM model to use
  "startUrl": "https://example.com",          // Starting URL
  "maxSteps": 100,                            // Max steps before stopping
  "sessionId": "uuid-here",                   // Reuse existing session
  "profileDirectory": "my-profile",           // Browser profile to use
  "storageStateUrl": "https://cdn.com/state.json",  // Load cookies/auth
  "highlightElements": true,                  // Highlight clickable elements
  "flashMode": false,                         // Faster execution mode
  "thinking": false,                          // Enable reasoning mode
  "vision": "auto",                           // Enable vision (true/false/auto)
  "allowedDomains": ["example.com"],          // Restrict browsing
  "structuredOutput": "{\"name\": \"string\", \"price\": \"number\"}",  // JSON schema
  "metadata": {"userId": "123"}               // Custom metadata
}
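
Putting a few of these fields together, here is a sketch of a request that restricts browsing to one domain and asks for structured output, then parses the result with json.loads. Field names are as defined above; the task text and schema are placeholders.

import json
import time
import requests

API_URL = "http://localhost:8080"

task_id = requests.post(f"{API_URL}/tasks", json={
    "task": "Get the name and price of the first product on https://example.com",
    "llm": "gpt-4o-mini",
    "maxSteps": 30,
    "allowedDomains": ["example.com"],
    "structuredOutput": '{"name": "string", "price": "number"}'
}).json()["id"]

while True:
    status = requests.get(f"{API_URL}/tasks/{task_id}").json()
    if status["status"] in ("finished", "stopped"):
        break
    time.sleep(2)

if status["status"] == "finished":
    # With structuredOutput set, the output should be JSON text we can parse
    product = json.loads(status["output"])
    print(product.get("name"), product.get("price"))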

Response Schema

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "sessionId": "660e8400-e29b-41d4-a716-446655440001",
  "status": "finished",                       // started, paused, finished, stopped
  "task": "Your task description",
  "llm": "gpt-4.1",
  "startedAt": "2025-01-10T10:00:00Z",
  "finishedAt": "2025-01-10T10:05:30Z",
  "isSuccess": true,
  "error": null,
  "output": "Task result here...",
  "steps": [
    {
      "number": 1,
      "memory": "What the agent remembers",
      "evaluationPreviousGoal": "How it did on last step",
      "nextGoal": "What it plans to do next",
      "url": "https://current-page.com",
      "actions": ["navigate", "click", "type"]
    }
  ],
  "browserUseVersion": "0.1.17"
}

🐛 Debugging & Troubleshooting

Check Service Health

# API health
curl http://localhost:8080/health

# VNC services health
curl http://localhost:8080/vnc/health

# Response:
{
  "healthy": true,
  "services": {
    "xvfb": true,
    "x11vnc": true,
    "websockify": true,
    "nginx": true
  },
  "vnc_url": "/vnc.html?autoconnect=true&path=websockify"
}
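
In deployment scripts it can help to block until both endpoints report healthy before submitting tasks. A sketch, assuming the 'healthy' field shown above:

import time
import requests

API_URL = "http://localhost:8080"

def wait_until_healthy(timeout: float = 60.0) -> None:
    """Poll /health and /vnc/health until both respond healthy, or raise."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            api_ok = requests.get(f"{API_URL}/health", timeout=5).ok
            vnc_ok = requests.get(f"{API_URL}/vnc/health", timeout=5).json().get("healthy", False)
            if api_ok and vnc_ok:
                return
        except requests.RequestException:
            pass  # container may still be starting
        time.sleep(2)
    raise RuntimeError("browser-use-api did not become healthy in time")

wait_until_healthy()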

View Container Logs

# Real-time logs
docker logs -f browser-use-api

# Last 100 lines
docker logs --tail 100 browser-use-api

# Search for errors
docker logs browser-use-api 2>&1 | grep ERROR

Debug Task Execution

# Get raw task data
curl http://localhost:8080/tasks/{task-id}/debug

# Download task logs
curl http://localhost:8080/tasks/{task-id}/logs
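
Or from Python, a small sketch that saves both endpoints' responses to local files for a task you are investigating (the raw debug payload is whatever the endpoint returns):

import requests

API_URL = "http://localhost:8080"
task_id = "550e8400-e29b-41d4-a716-446655440000"  # replace with the task id you are debugging

# Save the execution logs next to the debug dump for offline inspection
with open(f"{task_id}.log", "wb") as f:
    f.write(requests.get(f"{API_URL}/tasks/{task_id}/logs").content)

with open(f"{task_id}-debug.json", "wb") as f:
    f.write(requests.get(f"{API_URL}/tasks/{task_id}/debug").content)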

Common Issues

VNC not connecting

# Check if services are running
docker exec browser-use-api ps aux | grep -E "(Xvfb|x11vnc|websockify)"

# Restart container
docker restart browser-use-api

Browser crashes/hangs

# Increase Docker memory limit
docker update --memory 8g browser-use-api

# Or in docker-compose.yml:
deploy:
  resources:
    limits:
      memory: 8G

LLM API key errors

# Check environment variables
docker exec browser-use-api env | grep API_KEY

# Set API key
docker run -e OPENAI_API_KEY=sk-... reqeique/browser-use-api:latest

📊 Performance Benchmarks

| Task Type | Avg Time | Success Rate | Best LLM |
|---|---|---|---|
| Simple navigation | 5-10s | 99% | gemini-flash-lite |
| Form filling | 15-30s | 95% | gpt-4o-mini |
| Data extraction | 30-60s | 92% | gpt-4.1 |
| Complex workflows | 2-5min | 85% | claude-sonnet-4 |
| Multi-step research | 5-10min | 80% | o3 |

Tested on: 4-core CPU, 8GB RAM, 50Mbps internet


🎓 Learning Resources


🤝 Contributing

We welcome contributions! Here's how you can help:

  1. 🐛 Report bugs - Open an issue with detailed reproduction steps
  2. 💡 Suggest features - Share your ideas in discussions
  3. 🔧 Submit PRs - Fix bugs or add new features
  4. 📖 Improve docs - Help others understand the project
  5. ⭐ Star the repo - Show your support!

# Development setup
git clone https://github.com/reqeique/browser-use-api.git
cd browser-use-api
pip install -r requirements.txt
python -m playwright install chromium
uvicorn main:app --reload

📜 License

MIT License - see LICENSE file for details.

This project is a REST API wrapper around browser-use, which is developed by the Browser-Use team.


🎁 Bonus: Pre-Built Automation Templates

📊 Data Extraction Template

def extract_data(url: str, css_selector: str):
    return requests.post(f"{API_URL}/tasks", json={
        "task": f"Go to {url} and extract all data matching '{css_selector}' into a JSON array",
        "llm": "gpt-4o",
        "structuredOutput": '{"items": [{"text": "string", "href": "string"}]}'
    })

🔐 Login Automation Template

def login_and_save_session(site: str, username: str, password: str, profile: str):
    # Create browser with profile
    session_response = requests.post(f"{API_URL}/browser/launch", json={
        "profileDirectory": profile,
        "startUrl": site
    })
    
    session_id = session_response.json()["sessionId"]
    
    # Automate login
    requests.post(f"{API_URL}/tasks", json={
        "task": f"Log in with username '{username}' and password '{password}'",
        "sessionId": session_id,
        "llm": "gpt-4o-mini"
    })
    
    return session_id  # Reuse this for future tasks

🔄 Periodic Monitoring Template

import schedule
import time

def monitor_website(url: str, check_description: str):
    response = requests.post(f"{API_URL}/tasks", json={
        "task": f"Go to {url} and {check_description}",
        "llm": "gemini-flash-lite-latest",
        "structuredOutput": '{"status": "string", "changed": "boolean", "details": "string"}'
    })
    
    # ... wait for result and send notification if changed

schedule.every(1).hours.do(lambda: monitor_website(
    "https://news.ycombinator.com",
    "check if there are any new posts about AI and notify me"
))

while True:
    schedule.run_pending()
    time.sleep(60)

Questions? Ideas? Feedback?

Open an issue or join our Discord - we'd love to hear from you!

⭐ Star this repo · 📖 Improve this README
