Skip to content

imgshape is a dataset intelligence toolkit for computer vision that generates deterministic dataset fingerprints, explainable decisions, and production-ready artifacts for reproducible ML workflows.

License

Notifications You must be signed in to change notification settings

STiFLeR7/imgshape

Repository files navigation

🖼️ imgshape

The Data-Centric AI Toolkit for Vision Engineers

PyPI Version Python Version Downloads Streamlit


"Automatically analyze any image dataset and get model-ready preprocessing recommendations in one command."


🚀 Live Demo (Web)📖 Documentation💬 Report Bug / Discuss


⚡ 30-Second Start

Don't guess your dataset's health. Audit it immediately with the Atlas engine.

pip install imgshape

from imgshape import Atlas

# 1. Initialize the Atlas Orchestrator
atlas = Atlas()

# 2. Extract deterministic fingerprint
result = atlas.extract_fingerprint("./my_dataset")

# 3. View the verdict
print(result.summary())

System Output:

{
  "fingerprint_id": "fp_8a7d9f2",
  "total_images": 4502,
  "corrupt_files": 12,
  "metrics": {
    "avg_resolution": "1024x768",
    "diversity_score": 0.89,
    "channel_consistency": "FAIL"
  },
  "issues": ["Found 14 grayscale images in RGB dataset"]
}

🔍 The Visual Dashboard (Atlas UI)

Experience imgshape's capabilities visually. The dashboard provides a real-time interface for dataset fingerprinting, augmentation previews, and pipeline configuration.

imgshape Dashboard

Dashboard v4.0.0 showing Atlas API Version, Task Type selection, and System Logs.


🚀 Why imgshape?

Most vision models fail because of garbage data—corrupt files, mixed channels (RGBA vs RGB), or weird aspect ratios. imgshape catches these before you train using a deterministic rule engine.

Module Technical Function
🔍 Instant Audit Multi-threaded scan for corruption, outliers, and duplicates using high-performance IO.
🧠 Decision Engine Heuristic-based suggestion engine (Atlas DecisionLayer) for Resize, Normalize, and Augment.
🛠️ Pipeline Export Generates serialization-safe code for PyTorch, TensorFlow, and Albumentations.
🎨 Visual Studio Local Streamlit instance for interactive augmentation testing and hypothesis verification.

📦 Installation Matrix

Choose your deployment flavor.

Command Use Case Size
pip install imgshape Core / CI/CD ~12MB
pip install "imgshape[full]" Research / Power User ~45MB
pip install "imgshape[ui]" Interactive / Dashboard ~30MB

💡 Practical Use Cases

1. The "Sanity Check" (CI/CD Integration)

Block bad data from entering your training bucket. Ideal for GitHub Actions or Jenkins.

# Returns exit code 1 if corrupt files or schema violations are found
imgshape --check ./new_batch_v2 --strict-schema

2. The "Pipeline Builder"

Don't guess augmentation parameters. Let the entropy statistics decide.

# analyze -> recommend -> export PyTorch snippet
imgshape --path ./train_data --analyze --recommend --out transforms.py

3. The "Visual Explorer"

Verify RandomCrop or ColorJitter intensity manually before training.

# Launches local studio with auto-reload
imgshape --web --reload

🏗️ Architecture & Internal Mechanics

imgshape (Aurora Engine) operates on a Fingerprint-Analyze-Decide loop, acting as a middleware between raw storage and compute.

graph TD
    subgraph "Data Layer"
    A[Raw Images]
    end

    subgraph "imgshape Core (Atlas)"
    B[Fingerprint Extractor] -->|Hash & Meta| C{Decision Engine}
    C -->|Rules v4.0| D[Recommendation]
    end

    subgraph "Integration Layer"
    D --> E[PyTorch/TF Code]
    D --> F[JSON Artifacts]
    D --> G[HTML/PDF Reports]
    end

    A --> B
Loading

Core Components

  • Atlas Orchestrator: The central intent-driven API that manages the lifecycle of an analysis session.
  • Fingerprint Extractor: A stateless module that computes immutable signatures for datasets (distributions, channel counts, hashes).
  • Decision Engine: A rule-based system that maps dataset signatures + User Intent (e.g., "Speed" vs "Accuracy") to concrete preprocessing steps.

🤝 Community & Support

Built by Stifler for the AI Engineering community.

Star on GitHub — it helps more people find clean data.

About

imgshape is a dataset intelligence toolkit for computer vision that generates deterministic dataset fingerprints, explainable decisions, and production-ready artifacts for reproducible ML workflows.

Topics

Resources

License

Contributing

Stars

Watchers

Forks

Packages

No packages published