"Automatically analyze any image dataset and get model-ready preprocessing recommendations in one command."
Don't guess your dataset's health. Audit it immediately with the Atlas engine.
pip install imgshape
from imgshape import Atlas
# 1. Initialize the Atlas Orchestrator
atlas = Atlas()
# 2. Extract deterministic fingerprint
result = atlas.extract_fingerprint("./my_dataset")
# 3. View the verdict
print(result.summary())

System Output:
{
"fingerprint_id": "fp_8a7d9f2",
"total_images": 4502,
"corrupt_files": 12,
"metrics": {
"avg_resolution": "1024x768",
"diversity_score": 0.89,
"channel_consistency": "FAIL"
},
"issues": ["Found 14 grayscale images in RGB dataset"]
}

Experience imgshape's capabilities visually. The dashboard provides a real-time interface for dataset fingerprinting, augmentation previews, and pipeline configuration.
Dashboard v4.0.0 showing Atlas API Version, Task Type selection, and System Logs.
Many vision training runs fail because of bad data: corrupt files, mixed channel layouts (RGBA vs RGB), or unusual aspect ratios. imgshape uses a deterministic rule engine to catch these problems before you train.
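
To make these failure modes concrete, here is a minimal sketch of a deterministic rule check over per-image metadata. It is not imgshape's implementation: the `ImageMeta` record, the `audit` function, and the aspect-ratio threshold are hypothetical names and values chosen for illustration, and the metadata is assumed to have been read already (for example with an image library).

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageMeta:
    """Hypothetical per-image record; not an imgshape type."""
    path: str
    mode: str      # channel layout, e.g. "RGB", "RGBA", "L"
    width: int
    height: int


def audit(records, max_aspect_ratio=3.0):
    """Return a sorted list of issues; the same input always gives the same output."""
    issues = []
    if not records:
        return issues

    # Rule 1: every image should share the dataset's dominant channel layout.
    # Ties are broken by mode name so the result stays deterministic.
    modes = Counter(r.mode for r in records)
    dominant, _ = max(modes.items(), key=lambda kv: (kv[1], kv[0]))
    for mode, count in modes.items():
        if mode != dominant:
            issues.append(f"Found {count} {mode} images in {dominant} dataset")

    # Rule 2: flag extreme aspect ratios (the threshold is an illustrative assumption).
    for r in records:
        if r.width <= 0 or r.height <= 0:
            issues.append(f"Invalid dimensions {r.width}x{r.height} in {r.path}")
            continue
        ratio = max(r.width, r.height) / min(r.width, r.height)
        if ratio > max_aspect_ratio:
            issues.append(f"Unusual aspect ratio {ratio:.1f}:1 in {r.path}")

    return sorted(issues)
```

Because the rules are plain comparisons with no randomness, rerunning the check on an unchanged dataset yields an identical issue list, which is the property that makes a rule engine usable as a CI gate.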
| Module | Technical Function |
|---|---|
| 🔍 Instant Audit | Multi-threaded scan for corruption, outliers, and duplicates using high-performance IO. |
| 🧠 Decision Engine | Heuristic-based suggestion engine (Atlas DecisionLayer) for Resize, Normalize, and Augment. |
| 🛠️ Pipeline Export | Generates serialization-safe code for PyTorch, TensorFlow, and Albumentations. |
| 🎨 Visual Studio | Local Streamlit instance for interactive augmentation testing and hypothesis verification. |
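
As an illustration of the kind of multi-threaded scan the Instant Audit row describes, the sketch below groups byte-identical files by SHA-256 using a thread pool from the standard library. It is an assumption about the general technique, not imgshape's scanner: `find_exact_duplicates` is a hypothetical helper, and it only detects exact copies, not resized or re-encoded near-duplicates.

```python
import hashlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def _sha256(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large images are never loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root, workers=8):
    """Return groups of files under `root` whose bytes are identical."""
    files = sorted(p for p in Path(root).rglob("*") if p.is_file())

    # Threads suit this workload because file reads release the GIL.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        hashes = list(pool.map(_sha256, files))  # map preserves input order

    groups = defaultdict(list)
    for path, file_hash in zip(files, hashes):
        groups[file_hash].append(path)

    # Keep only hashes shared by more than one file.
    return sorted(group for group in groups.values() if len(group) > 1)
```

Calling `find_exact_duplicates("./my_dataset")` returns a list of path groups; an empty list means no exact duplicates were found.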
Choose your deployment flavor.
| Command | Use Case | Size |
|---|---|---|
| `pip install imgshape` | Core / CI/CD | ~12MB |
| `pip install "imgshape[full]"` | Research / Power User | ~45MB |
| `pip install "imgshape[ui]"` | Interactive / Dashboard | ~30MB |
Block bad data from entering your training bucket. Ideal for GitHub Actions or Jenkins.
# Returns exit code 1 if corrupt files or schema violations are found
imgshape --check ./new_batch_v2 --strict-schema

Don't guess augmentation parameters. Let the entropy statistics decide.
# analyze -> recommend -> export PyTorch snippet
imgshape --path ./train_data --analyze --recommend --out transforms.py

Verify RandomCrop or ColorJitter intensity manually before training.
# Launches local studio with auto-reload
imgshape --web --reload

imgshape (Aurora Engine) operates on a Fingerprint-Analyze-Decide loop, acting as a middleware between raw storage and compute.
graph TD
subgraph "Data Layer"
A[Raw Images]
end
subgraph "imgshape Core (Atlas)"
B[Fingerprint Extractor] -->|Hash & Meta| C{Decision Engine}
C -->|Rules v4.0| D[Recommendation]
end
subgraph "Integration Layer"
D --> E[PyTorch/TF Code]
D --> F[JSON Artifacts]
D --> G[HTML/PDF Reports]
end
A --> B
- Atlas Orchestrator: The central intent-driven API that manages the lifecycle of an analysis session.
- Fingerprint Extractor: A stateless module that computes immutable signatures for datasets (distributions, channel counts, hashes).
- Decision Engine: A rule-based system that maps dataset signatures + User Intent (e.g., "Speed" vs "Accuracy") to concrete preprocessing steps.
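
The Decision Engine's role, mapping a dataset signature plus user intent to preprocessing steps, can be sketched as a small rule function. This is a hedged illustration only: the `Fingerprint` fields, the `decide` function, the step names, and the 224/512 resolution caps are all hypothetical and do not reflect imgshape's actual rule set or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen mirrors the "immutable signature" idea
class Fingerprint:
    """Hypothetical dataset signature; field names are illustrative."""
    median_width: int
    median_height: int
    channel_modes: frozenset  # e.g. frozenset({"RGB", "L"})


def decide(fp: Fingerprint, intent: str) -> list:
    """Map a fingerprint plus user intent to an ordered list of step names."""
    if intent not in ("speed", "accuracy"):
        raise ValueError(f"unknown intent: {intent!r}")

    steps = []

    # Rule: mixed channel layouts are unified before any other step.
    if len(fp.channel_modes) > 1:
        steps.append("convert_to_rgb")

    # Rule: intent caps the target resolution (cap values are assumptions),
    # and images are never upscaled beyond the dataset's median size.
    cap = 224 if intent == "speed" else 512
    side = min(cap, fp.median_width, fp.median_height)
    steps.append(f"resize_{side}x{side}")

    steps.append("normalize")
    return steps
```

For example, `decide(Fingerprint(1024, 768, frozenset({"RGB", "L"})), "speed")` yields a channel conversion, a 224x224 resize, and a normalize step, while the same fingerprint with `"accuracy"` keeps more resolution. Keeping the rules as pure functions of the fingerprint is what makes the recommendations reproducible.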
- Issues: Found a bug? Open an issue.
- Discussions: Feature requests? Join the discussion.
Built by Stifler for the AI Engineering community.
Star on GitHub ⭐ — it helps more people find clean data.
