Summary
Add real end-to-end testing by reconstructing objects with known ground truth meshes and textures, then comparing results via quantitative metrics.
Motivation
Current tests validate CLI/script behavior but not reconstruction quality. We need regression detection for actual output quality when the pipeline or models change.
Proposed Design
Trigger & Execution
- Manual kick-off (e.g., `scripts/validate.sh`)
- Runs overnight; the full pipeline is too expensive for CI
Test Assets (stored in R2)
- Location: `r2:hummat-assets/mini-mesh/validation/`
- Initial objects:
  - `016_mokka`: geometry only (from the automatica dataset)
  - `018_mustard_bottle`: geometry only (from the automatica dataset)
  - YCB mustard bottle: geometry + texture (for texture validation)
- Format: GT mesh (`.ply`/`.obj`), test video, metadata (expected scale, etc.)
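A possible shape for the per-object metadata file, sketched as a Python dict; every field name and value here is hypothetical until the R2 structure is set up:

```python
# Hypothetical metadata for one validation asset; the actual schema is TBD.
metadata = {
    "name": "018_mustard_bottle",
    "gt_mesh": "018_mustard_bottle.ply",  # ground-truth geometry
    "video": "capture.mp4",               # test capture to reconstruct from (illustrative name)
    "expected_scale_m": 0.19,             # rough object extent in meters (placeholder value)
    "has_texture": False,                 # geometry-only object
}
```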
Geometry Metrics
| Metric | Purpose |
|---|---|
| Chamfer distance | Overall surface accuracy |
| Hausdorff distance | Worst-case error (catches outliers, thin structures) |
| F-score @ threshold | % of surface within tolerance (e.g., 1mm, 5mm) |
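As a reference, a minimal sketch of these three metrics using Open3D's nearest-neighbor distances; the function is illustrative, not part of the pipeline yet, and `tau` is the F-score tolerance in meters:

```python
import numpy as np
import open3d as o3d

def geometry_metrics(pred: o3d.geometry.PointCloud,
                     gt: o3d.geometry.PointCloud,
                     tau: float = 0.005) -> dict:
    """Chamfer, Hausdorff, and F-score between two aligned point clouds (meters)."""
    # Per-point distance to the nearest neighbor in the other cloud.
    d_pred = np.asarray(pred.compute_point_cloud_distance(gt))
    d_gt = np.asarray(gt.compute_point_cloud_distance(pred))

    chamfer = d_pred.mean() + d_gt.mean()        # symmetric surface accuracy
    hausdorff = max(d_pred.max(), d_gt.max())    # worst-case error
    precision = (d_pred < tau).mean()            # fraction of prediction near GT
    recall = (d_gt < tau).mean()                 # fraction of GT that is covered
    fscore = 2 * precision * recall / max(precision + recall, 1e-12)
    return {"chamfer": chamfer, "hausdorff": hausdorff, f"fscore@{tau}m": fscore}
```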
Alignment Pipeline
- Fast Global Registration (FGR): coarse alignment using FPFH features (handles unknown correspondences)
- Scaled ICP: refinement with scale estimation (`TransformationEstimationPointToPoint(with_scaling=True)`)
- Fallback: if FGR fitness < threshold, use a PCA-based init (centroid + principal axes), then scaled ICP
- Outlier filtering: remove predicted points far from the GT (handles unmasked runs with extra geometry such as the table surface)
- Compute metrics on the filtered, aligned point sets (see the sketch after this list)
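A sketch of the alignment steps with Open3D; the voxel size, search radii, and fitness threshold are illustrative defaults, and `pca_init` is one possible implementation of the fallback described above:

```python
import numpy as np
import open3d as o3d

VOXEL = 0.005  # assumed working resolution in meters; tune per object

def preprocess(pcd):
    """Downsample and compute FPFH features for FGR."""
    down = pcd.voxel_down_sample(VOXEL)
    down.estimate_normals(o3d.geometry.KDTreeSearchParamHybrid(radius=2 * VOXEL, max_nn=30))
    fpfh = o3d.pipelines.registration.compute_fpfh_feature(
        down, o3d.geometry.KDTreeSearchParamHybrid(radius=5 * VOXEL, max_nn=100))
    return down, fpfh

def pca_init(src, tgt):
    """Fallback init: match centroids and principal axes (axis-sign ambiguity ignored here)."""
    a, b = np.asarray(src.points), np.asarray(tgt.points)
    ca, cb = a.mean(axis=0), b.mean(axis=0)
    Ra = np.linalg.svd(a - ca, full_matrices=False)[2].T  # columns = principal axes
    Rb = np.linalg.svd(b - cb, full_matrices=False)[2].T
    T = np.eye(4)
    T[:3, :3] = Rb @ Ra.T
    T[:3, 3] = cb - T[:3, :3] @ ca
    return T

def align(pred, gt, fgr_fitness_min=0.3):
    src, src_fpfh = preprocess(pred)
    tgt, tgt_fpfh = preprocess(gt)
    # 1) FGR: coarse alignment from FPFH feature matches.
    fgr = o3d.pipelines.registration.registration_fgr_based_on_feature_matching(
        src, tgt, src_fpfh, tgt_fpfh,
        o3d.pipelines.registration.FastGlobalRegistrationOption(
            maximum_correspondence_distance=1.5 * VOXEL))
    init = fgr.transformation if fgr.fitness >= fgr_fitness_min else pca_init(src, tgt)
    # 2) Scaled ICP: refine pose and estimate scale as one similarity transform.
    icp = o3d.pipelines.registration.registration_icp(
        src, tgt, 3 * VOXEL, init,
        o3d.pipelines.registration.TransformationEstimationPointToPoint(with_scaling=True))
    return icp.transformation
```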
Texture Metrics
- Method: render-based comparison
  - Render the GT and reconstructed meshes from N viewpoints
  - Compare the rendered images via PSNR, SSIM, LPIPS (sketched below)
- Captures: albedo accuracy + sharpness
- Renderer: trimesh + pyrender (default), nvdiffrast for speed
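Given two renders of the same viewpoint (the rendering step itself is omitted here), the per-image comparison could look like this, assuming a recent `scikit-image` and `torch` are available alongside the `lpips` package:

```python
import numpy as np
import torch
import lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

lpips_model = lpips.LPIPS(net="alex")  # downloads pretrained weights on first use

def image_metrics(gt: np.ndarray, pred: np.ndarray) -> dict:
    """Compare two HxWx3 uint8 renders of the same viewpoint."""
    psnr = peak_signal_noise_ratio(gt, pred)
    ssim = structural_similarity(gt, pred, channel_axis=-1)
    # LPIPS expects NCHW float tensors scaled to [-1, 1].
    to_tensor = lambda im: torch.from_numpy(im).permute(2, 0, 1)[None].float() / 127.5 - 1.0
    with torch.no_grad():
        lp = lpips_model(to_tensor(gt), to_tensor(pred)).item()
    return {"psnr": psnr, "ssim": ssim, "lpips": lp}
```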
Output
- JSON report: Machine-readable metrics for tracking
- Markdown summary: Human-readable results
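The report could be as simple as one JSON file per object; the fields and values below are purely illustrative:

```python
import json

report = {
    "object": "016_mokka",            # validation asset id
    "pipeline_version": "<git sha>",  # whatever identifies the run under test
    "geometry": {"chamfer": 0.0021, "hausdorff": 0.014, "fscore@0.005m": 0.93},
    "texture": None,                  # geometry-only object
}
with open("report.json", "w") as f:
    json.dump(report, f, indent=2)
```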
Regression Detection
- Initially: Manual inspection, no automatic pass/fail
- Once baselines are established: relative thresholds with an absolute floor (see the sketch below)
  - Flag if metrics degrade > X% from the stored baseline
  - OR if metrics fall below an absolute minimum acceptable quality
  - Catches both sudden regressions and slow drift over time
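A minimal sketch of that rule; the tolerance and floor values are placeholders until baselines exist:

```python
def regressed(current: float, baseline: float, floor: float,
              rel_tol: float = 0.10, higher_is_better: bool = True) -> bool:
    """Flag a metric that degrades > rel_tol vs. baseline OR crosses an absolute floor.

    Assumes a nonzero stored baseline.
    """
    if higher_is_better:   # e.g., F-score, PSNR, SSIM
        relative_drop = (baseline - current) / abs(baseline)
        return relative_drop > rel_tol or current < floor
    else:                  # e.g., Chamfer, Hausdorff, LPIPS: lower is better, floor is a maximum
        relative_rise = (current - baseline) / abs(baseline)
        return relative_rise > rel_tol or current > floor
```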
Alternatives Considered
- Manual visual inspection only: compare reconstructed meshes by eye in Blender/MeshLab. Rejected because: subjective, not reproducible, doesn't scale, can't detect subtle regressions.
- Synthetic data with perfect GT: render synthetic scenes and reconstruct them. Considered but deprioritized because: doesn't test real-world capture challenges (lighting variation, motion blur, reflections), though useful as a complementary test later.
- Unit tests on pipeline stages only: test individual components (SfM accuracy, mesh extraction) in isolation. Rejected as the primary approach because: integration bugs and quality drift across stages would go undetected. Current CLI tests already cover this partially.
- CI integration with smaller assets: run validation on every PR with tiny test objects. Rejected because: meaningful reconstruction takes hours on GPU, and tiny objects don't exercise the full pipeline realistically. A manual/nightly trigger is more practical.
Tasks
- Set up R2 storage structure for validation assets
- Record/prepare test videos for initial objects
- Upload GT meshes and videos to R2
- Implement alignment pipeline (FGR + scaled ICP + PCA fallback + filtering)
- Implement geometry metrics (Chamfer, Hausdorff, F-score)
- Implement render-based texture comparison
- Create `scripts/validate.sh` entry point
- Generate JSON + Markdown reports
- Document usage in `docs/`
- Add baseline storage and regression detection (after initial baselines are captured)
Dependencies
- `trimesh`: mesh loading, point sampling
- `pyrender`: rendering
- `lpips`: perceptual similarity
- `open3d`: FGR, scaled ICP, point cloud operations
Future Considerations
- Add more objects incrementally (challenging cases: reflective, thin structures)
- Integration with scheduled runs (nightly/weekly)
Related Issues
- feat(export): PBR-ish texture extraction and BRDF-aware training (#11): texture validation will be useful for PBR outputs