Two‑stage deep learning pipeline for predicting an image’s approximate geographic location: (1) Region classification (Director) → (2) Region‑conditioned coordinate regression (Experts).
IMPORTANT NOTE: We built this project together in PyCharm using its collaboration feature, so all commits come from gulis-dev.
| No | Notebook | Link | Purpose (Summary) |
|---|---|---|---|
| 01 | Data Collecting | notebooks/01_Data_Collecting.ipynb | Notes / scaffold for sourcing raw imagery & metadata |
| 02 | Data Preprocessing | notebooks/02_Data_Preprocessing.ipynb | Cleaning & preparing tabular metadata before modeling |
| 03 | Data Analysis & Imbalance Check | notebooks/03_data_analysis_and_Imbalance_check.ipynb | Exploratory analysis: class distribution, imbalance, visual stats (large outputs) |
| 04 | Director Training (earlier version) | notebooks/04_train_director.ipynb | Initial Director model training experiments |
| 05 | Experts Training | notebooks/05_train_experts.ipynb | Per‑region regression (lat/lon normalization & training loops) |
| 06 | Inference Overview | notebooks/06_inference_overview.ipynb | End‑to‑end inference documentation |
| 07 | Director v3 Training | notebooks/07_director_v3_training.ipynb | Improved Director: augmentation, class balancing, early stopping |
If you are browsing on GitHub and large notebooks (e.g. #03) load slowly, consider cloning and opening locally, or clearing heavy cell outputs before new commits.
- Project Overview
- Core Motivation
- Architecture
- Regions & Coordinate Normalization
- Data Pipeline
- Modeling
- Training Setup
- Results (Director v3)
- Inference Workflow
- Repository Structure
- License
- Citation (Optional)
GeoMind localizes an image approximately on the globe. Instead of regressing latitude & longitude directly (highly multimodal / discontinuous), it divides the task:
- Director: classify the image into one of 13 macro geographic regions.
- Region Expert: a dedicated regression model (one per region) outputs normalized (lat, lon) which are denormalized using that region’s bounding box.
This cascaded approach reduces ambiguity and improves stability vs. monolithic global regression.
- Global coordinate regression has a multi-peaked target: visually similar scenes (coastal roads, inland desert) occur in widely separated places, so a single regressor can end up averaging between them.
- Restricting the regression domain (after region classification) allows Experts to specialize.
- Enables modular improvement: swap backbone for Director or retrain a weak Expert independently.
Image
│
▼
┌────────────────┐
│ Director │ EfficientNet-B0 (13-way softmax)
└───────┬────────┘
│ region_id (0..12)
▼
┌────────────────────────┐
│ Region Expert (id=k) │ EfficientNet-B0 head → 2 sigmoid outputs
└──────────┬─────────────┘
│ (norm_lat, norm_lon) ∈ [0,1]^2
▼
Denormalize via region bounding box → (latitude, longitude)
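
The routing step in the diagram can be sketched in plain Python. This is a minimal illustration, not the notebook code: `route`, `fake_director` and `fake_experts` are hypothetical names, and the two "models" are toy callables standing in for the trained networks.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical type aliases: any callables with these shapes will do.
Director = Callable[[object], Sequence[float]]    # image -> 13 class scores
Expert = Callable[[object], Tuple[float, float]]  # image -> (norm_lat, norm_lon)


def route(image: object, director: Director, experts: Dict[int, Expert]):
    """Pick the highest-scoring region, then query that region's Expert."""
    scores = director(image)
    region_id = max(range(len(scores)), key=lambda k: scores[k])
    norm_lat, norm_lon = experts[region_id](image)
    return region_id, norm_lat, norm_lon


# Toy stand-ins for the trained models (illustration only).
def fake_director(image):
    return [0.1 if k == 5 else 0.075 for k in range(13)]


fake_experts = {k: (lambda image: (0.5, 0.25)) for k in range(13)}

region_id, norm_lat, norm_lon = route("photo.jpg", fake_director, fake_experts)
print(region_id, norm_lat, norm_lon)
```

Keeping the Experts in a dictionary keyed by `region_id` mirrors the modularity goal: a single Expert can be replaced without touching the Director.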
Each region k defines:
- lat_min, lat_max
- lon_min, lon_max
Expert outputs y_lat, y_lon in [0,1]; convert:
lat = y_lat * (lat_max - lat_min) + lat_min
lon = y_lon * (lon_max - lon_min) + lon_min
Advantages:
- Uniform learning scale across regions.
- Avoids exploding coordinate variance across the globe.
- Facilitates per-region error analysis.
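
A small sketch of the conversion formulas above, together with the inverse mapping that would be needed to build training targets. `RegionBox`, `normalize`, `denormalize` and the example bounding box are illustrative assumptions, not values from the project; the sketch also assumes a box does not cross the antimeridian.

```python
from typing import NamedTuple, Tuple


class RegionBox(NamedTuple):
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float


def denormalize(y_lat: float, y_lon: float, box: RegionBox) -> Tuple[float, float]:
    """Map Expert outputs in [0, 1] back to degrees inside the region box."""
    lat = y_lat * (box.lat_max - box.lat_min) + box.lat_min
    lon = y_lon * (box.lon_max - box.lon_min) + box.lon_min
    return lat, lon


def normalize(lat: float, lon: float, box: RegionBox) -> Tuple[float, float]:
    """Inverse mapping: turn ground-truth degrees into [0, 1] targets."""
    y_lat = (lat - box.lat_min) / (box.lat_max - box.lat_min)
    y_lon = (lon - box.lon_min) / (box.lon_max - box.lon_min)
    return y_lat, y_lon


# Made-up bounding box, not one of the project's 13 regions.
example_box = RegionBox(lat_min=40.0, lat_max=60.0, lon_min=-10.0, lon_max=30.0)
lat, lon = denormalize(0.5, 0.25, example_box)
print(lat, lon)  # 50.0 0.0
```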
(Ref. Notebooks 01–03)
- Collection: Acquire raw images + metadata (filenames, region IDs, optional ground-truth coordinates).
- Validation:
- Check file existence.
- Validate label range (0–12 for 13 regions).
- Detect missing or malformed rows.
- Split:
- Train / Validation (90/10 stratified or random with fixed seed).
- Analysis:
- Class imbalance quantification.
- Spatial dispersion per region.
- Visual sanity checks (edge cases, ambiguous scenery).
- (Optional) Future additions:
- Hard example tagging (misclassified by Director).
- Geo-visual clustering for region refinement.
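
The validation and split steps could look roughly like the sketch below. The row layout (`filename`, `region_id` keys), the helper names and the seed are assumptions made for illustration; the file-existence check is left out so the example runs without any data on disk.

```python
import random
from collections import defaultdict

NUM_REGIONS = 13  # labels are expected in 0..12


def validate_rows(rows):
    """Keep rows that have a filename and an integer label in range."""
    valid = []
    for row in rows:
        filename, label = row.get("filename"), row.get("region_id")
        if filename and isinstance(label, int) and 0 <= label < NUM_REGIONS:
            valid.append(row)
    return valid


def stratified_split(rows, val_fraction=0.1, seed=42):
    """Split region by region so each class keeps roughly the same ratio."""
    by_region = defaultdict(list)
    for row in rows:
        by_region[row["region_id"]].append(row)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, val = [], []
    for region_id in sorted(by_region):
        group = by_region[region_id]
        rng.shuffle(group)
        n_val = max(1, round(len(group) * val_fraction))
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, val


# Synthetic metadata: 20 rows per region plus two malformed rows.
rows = [{"filename": f"img_{i}.jpg", "region_id": i % NUM_REGIONS} for i in range(260)]
rows.append({"filename": "bad.jpg", "region_id": 13})  # out-of-range label
rows.append({"filename": "", "region_id": 3})          # missing filename

clean = validate_rows(rows)
train_rows, val_rows = stratified_split(clean)
print(len(clean), len(train_rows), len(val_rows))
```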
- Backbone: EfficientNet-B0 (can be swapped later).
- Input: 224×224 RGB (scaled & normalized).
- Output: 13 logits → softmax.
- Class imbalance mitigation:
- WeightedRandomSampler
- Per-class weight vector in CrossEntropyLoss
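
One common way to derive the weights for both mitigation options is inverse class frequency. The notebooks may use a different formula, so treat this as an assumption; `inverse_frequency_weights` is a hypothetical helper and the label list is made up.

```python
from collections import Counter


def inverse_frequency_weights(labels, num_classes=13):
    """Weight class k by total / (num_classes * count_k); rare classes weigh more."""
    counts = Counter(labels)
    total = len(labels)
    weights = []
    for k in range(num_classes):
        count = counts.get(k, 0)
        # Classes that never occur get weight 0.0 to avoid division by zero.
        weights.append(total / (num_classes * count) if count else 0.0)
    return weights


# Made-up, heavily imbalanced labels for three classes.
labels = [0] * 6 + [1] * 3 + [2] * 1
class_weights = inverse_frequency_weights(labels, num_classes=3)

# One weight per sample, looked up from its class.
sample_weights = [class_weights[label] for label in labels]
print(class_weights)
```

In PyTorch, `class_weights` would typically be converted to a tensor and passed as the `weight` argument of `torch.nn.CrossEntropyLoss`, while `sample_weights` would feed `torch.utils.data.WeightedRandomSampler`.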
- One model per region (same backbone, lightweight head).
- Head: Global pooling → Linear → 2 units → Sigmoid.
- Loss: MSE / SmoothL1 on normalized coordinates.
- Denormalization only at evaluation / inference.
Key configuration (Director v3 – Notebook 07):
| Component | Choice |
|---|---|
| Resolution | 224×224 |
| Optimizer | AdamW (lr=5e-4, weight_decay=1e-4) |
| Scheduler | ReduceLROnPlateau (factor 0.5, patience 3) |
| Early Stopping | Patience 6 (val_loss) |
| Loss | CrossEntropy (class weights) |
| Augmentations | RandomResizedCrop, flips (H/V), Rotation (~±20°), ColorJitter, RandomGrayscale |
| Normalization | ImageNet mean/std |
| Checkpoint | Best val_loss |
Potential improvements (see Roadmap): multi-metric checkpointing (accuracy + loss), label smoothing, MixUp / CutMix, partial freezing for speed, advanced backbones.
Best (by val_loss): 1.3318
Peak validation accuracy reached later: 0.6397 (not saved under loss-only strategy).
Classification Report (summary):
- Strong classes: 5, 10, 11 (high precision & recall).
- Weaker / confused: 3, 12, partial confusion among adjacent region pairs (geographic similarity).
- Weighted F1 ≈ 0.64.
(Full matrix & report reproduced in notebook 07.)
Observations:
- Some systematic confusions might justify region boundary refinement or hierarchical sub-classes.
- Dual checkpointing (best_loss, best_accuracy) recommended to capture later accuracy gains.
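
A minimal sketch of the dual checkpointing idea, combined with early stopping on val_loss. `DualCheckpointTracker` is a hypothetical helper, and the metric history is invented for illustration (it is not the run reported above). The tracker only decides which checkpoints to write; the actual saving is left to the training loop.

```python
class DualCheckpointTracker:
    """Track best val_loss and best val_acc separately; stop early on val_loss."""

    def __init__(self, patience=6):
        self.patience = patience
        self.best_loss = float("inf")
        self.best_acc = 0.0
        self.bad_epochs = 0  # consecutive epochs without a val_loss improvement

    def update(self, val_loss, val_acc):
        """Return the checkpoint names that should be saved for this epoch."""
        to_save = []
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.bad_epochs = 0
            to_save.append("best_loss")
        else:
            self.bad_epochs += 1
        if val_acc > self.best_acc:
            self.best_acc = val_acc
            to_save.append("best_accuracy")
        return to_save

    @property
    def should_stop(self):
        return self.bad_epochs >= self.patience


tracker = DualCheckpointTracker(patience=6)
# Made-up (val_loss, val_acc) pairs: accuracy keeps rising after loss bottoms out.
history = [(1.50, 0.55), (1.35, 0.60), (1.40, 0.63)]
saved = [tracker.update(loss, acc) for loss, acc in history]
print(saved)
```

In the third epoch only `best_accuracy` is saved, which is the case a loss-only strategy would miss.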
(Ref. Notebook 06)
- Preprocess image (resize/crop → 224×224 → normalize).
- Director predicts region_id.
- Load corresponding Expert weights.
- Expert outputs normalized (lat, lon).
- Denormalize to real coordinates.
- (Optional) Compute Haversine distance if ground-truth available.
- Generate quick review link: https://www.google.com/maps?q=<lat>,<lon>
- Aggregate metrics across a batch (mean distance, percentile thresholds).
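
The last three steps (distance, review link, batch metrics) can be sketched with the standard library alone. The function names, the distance thresholds and the sample coordinates are assumptions chosen for illustration; the project's own thresholds are not stated here.

```python
import math
import statistics

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Clamp to 1.0 to guard against rounding error near antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def maps_link(lat, lon):
    """Build the quick review link for a predicted coordinate."""
    return f"https://www.google.com/maps?q={lat:.6f},{lon:.6f}"


def summarize(distances_km, thresholds_km=(25, 200, 750)):
    """Mean/median error plus the share of predictions within each threshold."""
    report = {
        "mean_km": statistics.mean(distances_km),
        "median_km": statistics.median(distances_km),
    }
    for t in thresholds_km:
        hits = sum(d <= t for d in distances_km)
        report[f"within_{t}_km"] = hits / len(distances_km)
    return report


# Made-up (prediction, ground truth) pairs.
pairs = [((52.23, 21.01), (52.23, 21.01)), ((48.85, 2.35), (51.51, -0.13))]
distances = [haversine_km(*pred, *truth) for pred, truth in pairs]
report = summarize(distances)
print(maps_link(*pairs[1][0]), report)
```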
(Representative – adjust as code evolves.)
.
├── notebooks/
│ ├── 01_Data_Collecting.ipynb
│ ├── 02_Data_Preprocessing.ipynb
│ ├── 03_data_analysis_and_Imbalance_check.ipynb
│ ├── 04_train_director.ipynb
│ ├── 05_train_experts.ipynb
│ ├── 06_inference_overview.ipynb
│ └── 07_director_v3_training.ipynb
├── src/
│ ├── data/ # (planned) dataset & transforms modules
│ ├── models/ # (planned) director & expert definitions
│ ├── training/ # training scripts / loops
│ ├── inference/ # inference utilities / CLI
│ └── utils/ # metrics, logging, geo utils
├── saved_models/
│ ├── director/
│ └── experts/
├── requirements.txt
├── README.md
└── LICENSE (pending choice: MIT / Apache-2.0)
Suggested next refactors:
- Move repeated notebook code → src/.
- Provide a CLI entry (e.g. python -m geomind.inference ...).
(Choose one—placeholder below.)
Example (MIT):
This project is licensed under the MIT License.
# Install
git clone https://github.com/gulis-dev/GeoMind.git
cd GeoMind
pip install -r requirements.txt
# (Optional) Train director
python src/training/train_director.py --config configs/director_v3.yaml
# Inference (example)
python src/inference/run_inference.py \
--images_dir data/raw/images \
--director_path saved_models/director/director_v3_best.pt \
--experts_dir saved_models/experts \
--output predictions.csv

This system produces approximate geolocations and may be wrong or biased. Do not use for safety‑critical, surveillance, or privacy‑sensitive applications without thorough validation and ethical review.
Feel free to open issues / PRs for improvements, benchmarking results, or integration ideas. Contributions welcome!