
GestureCode

   .-~~~-.
  /       \
 | GESTURE |
 |  CODE   |
  \       /
   `-----'
     ||
     ||  🤖 AI Assistant
     ||
  👋 Wave • Pinch • Point 👋
     Create • Scale • Select

A revolutionary AI-powered development environment where you build 3D visualizations and write code through natural hand gestures and voice commands. Combining real-time computer vision, MediaPipe hand tracking, and Gemini AI, it lets you create code by moving your hands in space.

What You Can Do

  • Gesture-Based Coding: Move your hands to create 3D objects and visualizations instantly
  • Voice Collaboration: Talk to your AI sidekick for help and complex operations
  • Real-Time Feedback: See your hand movements tracked live on camera
  • Sandbox Execution: Code runs safely in an isolated environment
  • Permission-Based Building: AI asks before completing complex tasks

Key Features

Hand Tracking & Gestures

  • Real-time hand detection with 21 tracked landmarks per hand (see the sketch after this list)
  • Visual feedback showing hand positions and connections
  • Gesture interpretation for code generation
  • Support for both hands simultaneously
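
For orientation, here is a minimal sketch of 21-landmark tracking with MediaPipe and OpenCV. The project wraps equivalent logic in its MediaPipeHandsProcessor, so treat this as an illustration rather than the actual implementation:

import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands
mp_draw = mp.solutions.drawing_utils

cap = cv2.VideoCapture(0)
with mp_hands.Hands(max_num_hands=2, min_detection_confidence=0.5) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB input; OpenCV captures BGR
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            for hand in results.multi_hand_landmarks:
                # Draw all 21 landmarks plus their connections on the frame
                mp_draw.draw_landmarks(frame, hand, mp_hands.HAND_CONNECTIONS)
        cv2.imshow("hands", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
cap.release()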

AI-Powered Development

  • Gemini Realtime for voice interaction and vision analysis
  • Code execution in sandboxed Python environment
  • 3D visualization with matplotlib (example after this list)
  • Collaborative building with explicit user permission
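
As an illustrative example of the kind of output the sandbox renders, here is a standalone matplotlib snippet that draws a 3D sphere; the shape and styling are arbitrary, not necessarily what the AI generates:

import numpy as np
import matplotlib.pyplot as plt

# Parametrize a unit sphere on a (longitude, latitude) grid
u, v = np.meshgrid(np.linspace(0, 2 * np.pi, 40), np.linspace(0, np.pi, 20))
x = np.cos(u) * np.sin(v)
y = np.sin(u) * np.sin(v)
z = np.cos(v)

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.plot_surface(x, y, z, color="steelblue")
plt.show()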

Multiple Applications

  • Code Builder: Main hand-controlled coding interface
  • Focus Tracker: Presence monitoring with gentle productivity coaching
  • Posture Analyzer: Ergonomic monitoring for desk work

Quick Start

Prerequisites

  • uv for dependency management
  • A webcam and a microphone
  • API keys (e.g. Gemini) for your .env file

Installation

  1. Clone and install dependencies:
git clone <repository-url>
cd gesturecode
uv sync
  2. Set up environment variables:
cp .env.example .env
# Edit .env with your API keys
  3. Run any example:
# Main gesture-controlled code builder
uv run python -m gesturecode.hand_code_builder

# Lightweight focus tracker
uv run python -m gesturecode.focus_tracker

# Desk posture analyzer
uv run python -m gesturecode.desk_posture_analyzer

How to Use

    Gesture Controls:
     _____
    /     \
   |  PINCH |     Move hands close → CREATE
    \_____/
       ||
       ||  Move apart → SCALE
       ||
    /\_/\  Point → SELECT
   ( o.o )
    > ^ <   Drag to 🗑️ → DELETE

Basic Usage

  1. Start the application - Your browser will open with the video interface
  2. Grant camera/microphone permissions when prompted
  3. Move your hands - You'll see real-time tracking with landmarks and connections
  4. Gesture to build - Pinch with both hands close together to create objects
  5. Talk to AI - Ask "Can you help me build a sphere?" for complex operations

Gesture Controls

  • Pinch both hands together → Create new object instantly (see the detection sketch below)
  • Move hands apart → Scale object bigger/smaller
  • Point with index finger → Select objects
  • Drag to red bin → Delete objects
  • Voice commands → Complex operations and AI assistance
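
As a rough illustration of how a pinch could be derived from MediaPipe's normalized landmarks (the threshold and helper name here are assumptions, not the project's actual gesture logic):

import math

THUMB_TIP, INDEX_TIP = 4, 8  # MediaPipe hand landmark indices

def is_pinching(hand_landmarks, threshold=0.05):
    # Treat thumb tip and index tip closer than `threshold`
    # (in normalized image coordinates) as a pinch
    thumb = hand_landmarks.landmark[THUMB_TIP]
    index = hand_landmarks.landmark[INDEX_TIP]
    return math.dist((thumb.x, thumb.y), (index.x, index.y)) < threshold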

AI Collaboration

The AI sidekick will:

  • Ask permission before completing complex tasks (see the sketch after this list)
  • Understand context of what you're building
  • Generate code based on your gestures and requests
  • Execute code safely in sandboxed environment
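
Conceptually, every generated snippet is gated behind explicit confirmation before it reaches the sandbox. A minimal, hypothetical sketch of that flow (the function names are illustrative, not the project's API):

def run_with_permission(code: str, execute) -> str:
    # Show the user what the AI wants to run and wait for consent
    print("AI proposes to run:\n" + code)
    if input("Execute? [y/N] ").strip().lower() == "y":
        return execute(code)  # e.g. hand off to a sandboxed session
    return "Skipped: user declined."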

Architecture

Core Components

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Hand Tracking │    │   AI Processing  │    │  Code Execution │
│   MediaPipe     │───▶│   Gemini Realtime│───▶│   CodeBox       │
│   (21 landmarks)│    │   + Vision       │    │   Sandbox       │
└─────────────────┘    └──────────────────┘    └─────────────────┘
         │                       │                       │
         ▼                       ▼                       ▼
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│ Video Overlay   │    │ Voice Commands   │    │ 3D Visualization│
│ Stream Edge     │    │ Permission Flow  │    │ Matplotlib      │
└─────────────────┘    └──────────────────┘    └─────────────────┘

Key Technologies

  • Vision Agents: Real-time AI framework with video processing
  • MediaPipe Hands: 21-landmark hand tracking at 30 FPS
  • Gemini Realtime: Multimodal AI with vision and voice
  • Stream Edge: Low-latency video infrastructure
  • CodeBox API: Sandboxed Python code execution
  • Matplotlib 3D: Real-time visualization rendering

⚙️ Customization

Adjusting Performance

Change processing FPS (higher = more responsive, more expensive):

# In src/hand_code_builder.py
llm = gemini.Realtime(fps=5)  # Default: 5 FPS for responsive gesture tracking
# or
llm = gemini.Realtime(fps=10)  # Higher FPS for very fast interactions

Adjust hand tracking FPS:

# In src/hand_code_builder.py
MediaPipeHandsProcessor(fps=30)  # Default: 30 FPS tracking

Using Different AI Models

Switch to OpenAI instead of Gemini:

from vision_agents.plugins import openai

llm = openai.Realtime(fps=5)

Modifying AI Behavior

Edit the AI instructions in docs/ai/builder_sidekick.md to change:

  • Personality and communication style
  • Permission requirements and safety boundaries
  • Code generation patterns and preferences
  • Gesture interpretation rules

Alternative Applications

Included Examples

Desk Posture Analyzer

Monitor ergonomics and posture with voice feedback:

uv run python -m gesturecode.desk_posture_analyzer

Features:

  • Real-time posture analysis with YOLO pose detection (see the sketch after this list)
  • Ergonomic coaching and feedback
  • Workstation setup recommendations
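
For orientation, a minimal sketch of YOLO pose detection with the ultralytics package; the model file and keypoint handling are assumptions about how such an analyzer might be built, not the module's actual code:

import cv2
from ultralytics import YOLO

model = YOLO("yolov8n-pose.pt")  # small pretrained pose model
cap = cv2.VideoCapture(0)
ok, frame = cap.read()
if ok:
    results = model(frame)
    # Each detection carries 17 COCO keypoints (nose, shoulders, hips, ...)
    keypoints = results[0].keypoints
    print(keypoints.xy)  # pixel coordinates, one row per keypoint
cap.release()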

Focus Tracker

Monitor presence and work patterns with gentle productivity coaching:

uv run python -m gesturecode.focus_tracker

Features:

  • Pose detection for presence monitoring at your desk
  • Work session tracking and break pattern analysis (toy sketch after this list)
  • Gentle productivity reminders and insights
  • Supportive coaching focused on healthy work habits
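
A toy sketch of the kind of session bookkeeping such a tracker might do; the class, threshold, and message are entirely illustrative and may differ from the real module:

import time

class SessionTracker:
    # Accumulate work time while a person is detected at the desk
    def __init__(self, break_after=50 * 60):
        self.break_after = break_after  # suggest a break after 50 minutes
        self.session_start = None

    def update(self, person_present):
        now = time.monotonic()
        if person_present:
            if self.session_start is None:
                self.session_start = now  # a new work session begins
            elif now - self.session_start > self.break_after:
                return "You've been at it a while. Time for a short break?"
        else:
            self.session_start = None  # presence lost, reset the session
        return None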

Contributing

We welcome contributions! Please see our Contributing Guide for details.

License

This project is open source and available under the MIT License.
