DeepUbuntu

Real-world autonomous vehicle datasets from challenging African and Asian terrains

DeepUbuntu is a data collection, cleaning, and labeling platform for autonomous vehicle (AV) development in challenging real-world conditions. Most AV datasets focus on well-maintained roads in developed regions. DeepUbuntu instead captures driving conditions in Africa and Asia that current autonomous vehicle systems struggle to handle: unpaved roads, extreme weather, complex traffic patterns, and variable infrastructure.

Architecture Overview

DeepUbuntu is built on a microservices architecture designed for scalability, reliability, and real-time processing:

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Web Portal    │    │   Mobile Apps   │    │   Dashcam API   │
│   (Next.js)     │    │   (React Native)│    │   (REST/GraphQL)│
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                       │                       │
         └───────────────────────┼───────────────────────┘
                                 │
                    ┌─────────────────┐
                    │   API Gateway   │
                    │   (Kong/Nginx)  │
                    └─────────────────┘
                                 │
         ┌───────────────────────┼───────────────────────┐
         │                       │                       │
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Upload        │    │   User          │    │   Metadata      │
│   Service       │    │   Service       │    │   Service       │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                       │                       │
         └───────────────────────┼───────────────────────┘
                                 │
                    ┌─────────────────┐
                    │   Processing    │
                    │   Pipeline      │
                    └─────────────────┘
                                 │
         ┌───────────────────────┼───────────────────────┐
         │                       │                       │
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Anonymization │    │   Quality       │    │   Segmentation  │
│   Service       │    │   Analysis      │    │   Service       │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                       │                       │
         └───────────────────────┼───────────────────────┘
                                 │
                    ┌─────────────────┐
                    │   Storage       │
                    │   Layer         │
                    └─────────────────┘
                                 │
         ┌───────────────────────┼───────────────────────┐
         │                       │                       │
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   PostgreSQL    │    │   MongoDB       │    │   Redis         │
│   (Metadata)    │    │   (Analytics)   │    │   (Cache)       │
└─────────────────┘    └─────────────────┘    └─────────────────┘

Technology Stack

Backend Services

  • Languages: Go, Python, Node.js
  • Frameworks: Gin (Go), FastAPI (Python), Express (Node.js)
  • Databases: PostgreSQL, MongoDB, Redis
  • Message Queue: Apache Kafka
  • Containerization: Docker, Kubernetes

Core Services

  • Upload Service: Handles video uploads and chunking
  • User Service: Authentication, authorization, and user management
  • Metadata Service: GPS, temporal, and environmental data extraction
  • Anonymization Service: AI-powered face and license plate blurring
  • Quality Analysis Service: Video quality assessment and stability metrics
  • Segmentation Service: Video categorization and scene analysis

Web Portal

  • Frontend: React/Next.js on port 3000
  • Styling: Tailwind CSS with custom design system
  • State Management: Zustand for client-side state
  • Real-time: WebSocket connections for live updates

Repository Structure

deepubuntu/
├── app/                    # Next.js app directory
│   ├── about/             # About page
│   ├── api/               # API routes
│   ├── blog/              # Blog pages
│   ├── careers/           # Careers page
│   ├── contact/           # Contact page
│   ├── enterprise/        # Enterprise solutions
│   ├── government/        # Government solutions
│   ├── legal/             # Legal pages
│   ├── products/          # Product pages
│   ├── research/          # Research pages
│   └── resources/         # Resource pages
├── components/            # React components
│   ├── canvas/           # 3D/WebGL components
│   ├── providers/        # Context providers
│   ├── sections/         # Page sections
│   └── ui/              # UI components
├── content/              # MDX content
│   ├── blog/            # Blog posts
│   └── products/        # Product documentation
├── docs/                # Documentation
├── lib/                 # Utility libraries
│   ├── hooks/           # Custom React hooks
│   ├── stores/          # State management
│   └── utils/           # Utility functions
├── public/              # Static assets
│   └── models/          # 3D models
├── scripts/             # Build and deployment scripts
├── tests/               # Test files
└── README.md           # This file

Technology Stack

Frontend

  • Framework: Next.js 14 with App Router
  • Language: TypeScript
  • Styling: Tailwind CSS with custom design system
  • 3D Graphics: Three.js for 3D models and animations
  • State Management: Zustand for client-side state
  • Real-time: WebSocket connections for live updates

Backend

  • API: RESTful APIs with GraphQL support
  • Authentication: OAuth 2.0 with JWT tokens
  • Database: PostgreSQL for metadata, MongoDB for analytics
  • Cache: Redis for session management and caching
  • File Storage: S3-compatible object storage
  • Message Queue: Apache Kafka for event processing

Infrastructure

  • Containerization: Docker with multi-stage builds
  • Orchestration: Kubernetes for scaling and management
  • Monitoring: Prometheus, Grafana, and ELK stack
  • CI/CD: GitHub Actions with automated testing
  • Deployment: Vercel for frontend, cloud-native backend

Data Processing Pipeline

The DeepUbuntu platform processes data through a comprehensive pipeline:

  1. Upload: Users upload driving videos via web portal or dashcam API
  2. Privacy: Automatic face blurring and license plate anonymization
  3. Validation: Format verification and metadata extraction
  4. Quality Analysis: Video quality assessment and stability metrics
  5. Segmentation: Split videos into categorized segments
  6. Metadata: Extract GPS, temporal, and environmental data
  7. Storage: Organize data into commercial-ready datasets
  8. Analytics: Generate insights for contributors and customers
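The eight stages above can be read as an ordered chain of functions over a single upload record. The following is a minimal sketch under that assumption, not the platform's implementation: the stage names, the `Record` type, and the `completed` field are hypothetical, and since the stack lists Apache Kafka for event processing, the real stages are presumably decoupled services rather than in-process calls.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Stage = Callable[[Record], Record]

# Stage order copied from the numbered list above; the identifiers are hypothetical.
STAGE_NAMES: List[str] = [
    "upload", "privacy", "validation", "quality_analysis",
    "segmentation", "metadata", "storage", "analytics",
]


def make_stage(name: str) -> Stage:
    """Build a placeholder stage that only records that it ran."""
    def stage(record: Record) -> Record:
        done = list(record.get("completed", []))
        done.append(name)
        # Return a new record so each stage leaves its input untouched.
        return {**record, "completed": done}
    return stage


def run_pipeline(record: Record, stages: List[Stage]) -> Record:
    """Apply each stage in order, passing the updated record along."""
    for stage in stages:
        record = stage(record)
    return record


if __name__ == "__main__":
    stages = [make_stage(name) for name in STAGE_NAMES]
    result = run_pipeline({"video_id": "example-001"}, stages)
    print(result["completed"])
```

Each placeholder would be swapped for a call to the corresponding service. Keeping the record immutable between stages makes it easier to retry a single failed stage.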

Key Features

For Data Contributors

  • Drag & Drop Upload: Easy video upload with progress tracking
  • Coverage Mapping: Interactive map showing your contributions
  • Reward System: Points and achievements for quality contributions
  • Mobile Support: Upload directly from dashcam apps
  • Resumable Uploads: Continue interrupted uploads
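Resumable uploads are commonly built on fixed-size chunks: the client plans the chunks once, asks the server which ones already arrived, and sends only the rest. A minimal sketch of that idea follows. The `Chunk` type, the function names, and the use of SHA-256 for per-chunk integrity are assumptions for illustration, not the Upload Service's actual protocol.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Chunk:
    index: int   # position of the chunk in the file
    offset: int  # byte offset where the chunk starts
    size: int    # number of bytes in the chunk


def plan_chunks(total_size: int, chunk_size: int) -> List[Chunk]:
    """Split a file of total_size bytes into consecutive chunks."""
    if total_size < 0 or chunk_size <= 0:
        raise ValueError("total_size must be >= 0 and chunk_size > 0")
    chunks: List[Chunk] = []
    offset = 0
    while offset < total_size:
        size = min(chunk_size, total_size - offset)
        chunks.append(Chunk(len(chunks), offset, size))
        offset += size
    return chunks


def remaining_chunks(plan: List[Chunk], uploaded: Set[int]) -> List[Chunk]:
    """Return the chunks that still need to be sent after an interruption."""
    return [chunk for chunk in plan if chunk.index not in uploaded]


def chunk_digest(data: bytes, chunk: Chunk) -> str:
    """Hex SHA-256 of one chunk, so the receiver can verify it on arrival."""
    return hashlib.sha256(data[chunk.offset:chunk.offset + chunk.size]).hexdigest()


if __name__ == "__main__":
    plan = plan_chunks(total_size=10, chunk_size=4)
    # Pretend chunks 0 and 2 were acknowledged before the connection dropped.
    print(remaining_chunks(plan, uploaded={0, 2}))
```

A 10-byte file with 4-byte chunks yields three chunks of 4, 4, and 2 bytes. If chunks 0 and 2 were acknowledged, only chunk 1 is sent again.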

For Data Consumers

  • Advanced Search: Find specific scenarios, locations, weather conditions
  • Quality Guarantees: 99.5% annotation accuracy, sub-meter GPS precision
  • Real-time Access: Live data feeds and streaming APIs
  • API Access: REST and GraphQL APIs
  • Privacy Compliant: GDPR compliant with built-in anonymization
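As an illustration of what searching by scenario, location, and weather could look like, here is a minimal in-memory sketch. The `Segment` fields and the `search` function are hypothetical and only mirror the filter dimensions named above. They are not the platform's REST or GraphQL schema.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Segment:
    segment_id: str
    scenario: str  # e.g. "unpaved_road"
    country: str   # e.g. "KE"
    weather: str   # e.g. "heavy_rain"


def search(
    segments: Iterable[Segment],
    scenario: Optional[str] = None,
    country: Optional[str] = None,
    weather: Optional[str] = None,
) -> List[Segment]:
    """Return segments matching every filter that is not None."""
    def matches(seg: Segment) -> bool:
        return (
            (scenario is None or seg.scenario == scenario)
            and (country is None or seg.country == country)
            and (weather is None or seg.weather == weather)
        )
    return [seg for seg in segments if matches(seg)]


if __name__ == "__main__":
    catalog = [
        Segment("s1", "unpaved_road", "KE", "heavy_rain"),
        Segment("s2", "urban_traffic", "IN", "clear"),
        Segment("s3", "unpaved_road", "IN", "clear"),
    ]
    print([seg.segment_id for seg in search(catalog, scenario="unpaved_road")])
```

Omitted filters match everything, so a call with no filters returns the whole catalog.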

Privacy & Security

Data Protection

  • Automatic Anonymization: AI-powered face and license plate blurring
  • End-to-End Encryption: All data encrypted in transit and at rest
  • Access Control: Role-based permissions and audit logging
  • Authentication: OAuth 2.0 with JWT tokens
  • Rate Limiting: Protection against abuse and DDoS attacks
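Rate limiting is often implemented with a token bucket, which allows short bursts while capping the sustained request rate. The sketch below shows that technique in general terms. The README does not say which algorithm the platform uses, so the `TokenBucket` class, its parameters, and the injectable clock are all assumptions.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `rate` tokens per second."""

    def __init__(
        self,
        rate: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        if rate <= 0 or capacity <= 0:
            raise ValueError("rate and capacity must be positive")
        self.rate = rate
        self.capacity = capacity
        self._clock = clock          # injectable so tests can control time
        self._tokens = capacity      # start with a full bucket
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and spend `cost` tokens if the request may proceed."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill for the elapsed time, never exceeding the bucket capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate=5.0, capacity=10.0)
    decisions = [bucket.allow() for _ in range(12)]
    print(decisions.count(True), "requests allowed in the initial burst")
```

In a multi-instance deployment the bucket state would have to live in a shared store such as the Redis cache listed in the stack. This sketch keeps it in process memory for clarity.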

Compliance

  • GDPR Compliant: Full data protection regulation compliance
  • Data Residency: Configurable data storage locations
  • Audit Trails: Complete data access and modification logs
  • Privacy by Design: Built-in privacy controls and defaults
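Role-based permissions and audit trails fit together naturally: every authorization decision, allowed or denied, is appended to a log. A minimal sketch under assumed names follows. The roles, actions, and the `AuditEntry` fields are hypothetical examples rather than the platform's actual permission model.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, FrozenSet, List

# Hypothetical roles and actions, for illustration only.
ROLE_PERMISSIONS: Dict[str, FrozenSet[str]] = {
    "contributor": frozenset({"upload"}),
    "consumer": frozenset({"search", "download"}),
    "admin": frozenset({"upload", "search", "download", "delete"}),
}


@dataclass(frozen=True)
class AuditEntry:
    timestamp: str  # ISO 8601, UTC
    user: str
    role: str
    action: str
    allowed: bool


def authorize(user: str, role: str, action: str, audit_log: List[AuditEntry]) -> bool:
    """Check the role's permissions and record the decision either way."""
    allowed = action in ROLE_PERMISSIONS.get(role, frozenset())
    audit_log.append(
        AuditEntry(
            timestamp=datetime.now(timezone.utc).isoformat(),
            user=user,
            role=role,
            action=action,
            allowed=allowed,
        )
    )
    return allowed


if __name__ == "__main__":
    log: List[AuditEntry] = []
    authorize("user-1", "contributor", "upload", log)
    authorize("user-1", "contributor", "delete", log)
    for entry in log:
        print(entry.user, entry.action, "allowed" if entry.allowed else "denied")
```

Logging denied attempts as well as granted ones is what makes the trail useful for compliance review. Unknown roles fall through to an empty permission set and are denied.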

Performance Metrics

  • Upload Speed: 100 MB/s sustained upload rate
  • Processing Time: under 5 minutes per 1 GB video file
  • API Response: under 100 ms average response time
  • Uptime: 99.9% availability SLA
  • Scalability: Auto-scaling to handle 10,000+ concurrent users
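To make these figures concrete, the short sketch below converts them into a downtime budget and an implied throughput. The helper names are hypothetical, and the arithmetic assumes decimal units (1 GB = 1000 MB) and a 30-day month.

```python
def downtime_budget_minutes(availability_pct: float, period_days: float) -> float:
    """Minutes of downtime permitted by an availability target over a period."""
    return (1.0 - availability_pct / 100.0) * period_days * 24 * 60


def min_throughput_mb_per_s(size_mb: float, max_seconds: float) -> float:
    """Slowest average rate that still finishes size_mb within max_seconds."""
    return size_mb / max_seconds


def transfer_seconds(size_mb: float, rate_mb_per_s: float) -> float:
    """Time to move size_mb at a sustained rate."""
    return size_mb / rate_mb_per_s


if __name__ == "__main__":
    # 99.9% availability over a 30-day month and a 365-day year.
    print(round(downtime_budget_minutes(99.9, 30), 1), "minutes per month")
    print(round(downtime_budget_minutes(99.9, 365) / 60, 2), "hours per year")
    # Processing 1 GB in under 5 minutes implies at least this average rate.
    print(round(min_throughput_mb_per_s(1000, 5 * 60), 2), "MB/s processing")
    # Uploading 1 GB at the stated 100 MB/s.
    print(transfer_seconds(1000, 100), "seconds to upload 1 GB")
```

Under these assumptions, a 99.9% SLA leaves roughly 43 minutes of downtime per month, or just under 9 hours per year. The processing target corresponds to an average of about 3.3 MB/s per file, and a 1 GB upload at 100 MB/s takes 10 seconds.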

Contributing

We welcome contributions from the community! Here are the main areas where you can help:

Development Roles

  • Developers: Core platform development
  • Researchers: ML model improvements
  • Designers: UI/UX improvements and new features
  • Community: Documentation, testing, feedback

Getting Started

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests for new functionality
  5. Submit a pull request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contact
