A modular pipeline for converting OpenStreetMap data into cloud-native GeoParquet files using DuckDB, PostGIS, and Docker.

FadilSmajilbasic/cadence-maps

🗺️ CadenceMaps

Open, Reproducible OpenStreetMap Data Pipeline
Transforming OSM data into analysis-ready GeoParquet format

🚀 Quick Start

Access ready-to-use OSM data with a simple DuckDB query:

INSTALL httpfs;
LOAD httpfs;
INSTALL spatial;
LOAD spatial;
SET s3_endpoint='api.cadencemaps.infs.ch';
SET s3_url_style='path';


SELECT names.primary
FROM read_parquet(
  's3://cadencemaps/release/2025-05-13/theme=places/type=place/country=CH/*',
  filename=true,
  hive_partitioning=1
)
WHERE categories.primary = 'restaurant';
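
The same query can be run from Python. The snippet below is a minimal sketch, not part of this repository: it assumes the `duckdb` Python package is installed, and the names `SETUP_STATEMENTS`, `QUERY`, `fetch_restaurant_names`, and `main` are illustrative. The SQL statements are copied from the Quick Start above.

```python
# Statements copied from the Quick Start section.
SETUP_STATEMENTS = [
    "INSTALL httpfs",
    "LOAD httpfs",
    "INSTALL spatial",
    "LOAD spatial",
    "SET s3_endpoint='api.cadencemaps.infs.ch'",
    "SET s3_url_style='path'",
]

QUERY = """
SELECT names.primary
FROM read_parquet(
  's3://cadencemaps/release/2025-05-13/theme=places/type=place/country=CH/*',
  filename=true,
  hive_partitioning=1
)
WHERE categories.primary = 'restaurant'
"""


def fetch_restaurant_names(con):
    """Configure `con`, run the query, and return the names as a list.

    `con` only needs an `execute(sql)` method whose result offers
    `fetchall()`, which is what a DuckDB connection provides.
    """
    for statement in SETUP_STATEMENTS:
        con.execute(statement)
    return [row[0] for row in con.execute(QUERY).fetchall()]


def main():
    # Imported lazily so the helper above can be imported without DuckDB.
    import duckdb  # assumption: installed with `pip install duckdb`

    con = duckdb.connect()  # in-memory database
    for name in fetch_restaurant_names(con)[:10]:
        print(name)
```

Calling `main()` prints the first ten names; it needs network access to the S3 endpoint.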

[Figure: QR code. Scan to access the CadenceMaps website.]

📖 About

A modular pipeline for converting OpenStreetMap data into cloud-native GeoParquet files using DuckDB, PostGIS, and Docker. Built as part of a bachelor thesis, it addresses the challenge of making OSM's rich dataset (more than 5 million points of interest in the D-A-CH-LI region) accessible for modern cloud-native analytics.

🎯 Key Features

  • 🔄 Automated Pipeline: Weekly or daily updates via CI/CD, following DataOps principles
  • 📊 Analysis-Ready: Direct SQL querying without server infrastructure
  • 🌐 Standardized Schema: Overture Maps-compatible format for interoperability
  • ⚡ High Performance: Optimized spatial partitioning
  • 🔍 Quality Assured: Automated validation and vandalism detection
  • 🏗️ Scalable Architecture: Country-scale processing in under 24 hours

๐Ÿ›๏ธ Architecture

Data Processing Pipeline

CadenceMaps employs a fully automated CI/CD pipeline that orchestrates the conversion of OpenStreetMap data into analysis-ready GeoParquet files. The pipeline is triggered weekly and processes the entire D-A-CH-LI region in parallel across multiple countries.

Pipeline Stages

The GitLab CI/CD pipeline consists of seven distinct stages that ensure reliable, reproducible data processing:

  1. 🔧 Publish Stage: Builds and publishes Docker images for all services (converter, importer, database, validator, web, reverse-proxy, cert-extractor) to the GitLab container registry.

  2. ⚙️ Setup Stage: Initializes the production environment by deploying the core infrastructure services: MinIO S3 storage, the PostgreSQL database, the reverse proxy, and the web interface.

  3. 🚀 Deploy Stage: Executes parallel data processing for each country in the D-A-CH-LI region:

    • Downloads the latest OSM PBF files from Geofabrik (with fallback to OSM France)
    • Imports the data into PostgreSQL using osm2pgsql with custom Lua transformations
    • Converts the PostgreSQL data to GeoParquet format using DuckDB
    • Uploads the results to a staging area in MinIO

  4. 🧹 Cleanup Stage: Performs system maintenance by removing unused Docker images and volumes to free up server resources.

  5. ✅ Validate Stage: Runs automated validation, including vandalism detection algorithms.

  6. 👍 Approve Stage: Manual approval gate where validated data is promoted from staging to the production release bucket, making it publicly accessible.

  7. 🛑 Stop Proxy Stage: Optional manual stage to stop the reverse proxy for maintenance purposes.
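
The download fallback in the Deploy stage (Geofabrik first, then OSM France) can be sketched as follows. This is an illustration only, not the project's importer code: the function names `download_with_fallback` and `http_fetch` are hypothetical, the caller supplies the URLs, and a real importer would stream large PBF files to disk instead of reading them into memory.

```python
from typing import Callable, Sequence


def download_with_fallback(
    urls: Sequence[str],
    fetch: Callable[[str], bytes],
) -> tuple[str, bytes]:
    """Try each URL in order and return (url, data) for the first success."""
    errors = []
    for url in urls:
        try:
            return url, fetch(url)
        except OSError as exc:  # urllib's URLError and HTTPError are OSErrors
            errors.append(f"{url}: {exc}")
    raise RuntimeError("all sources failed: " + "; ".join(errors))


def http_fetch(url: str) -> bytes:
    """Download a URL with the standard library (whole body in memory)."""
    from urllib.request import urlopen

    with urlopen(url, timeout=60) as response:
        return response.read()
```

With the primary source listed first, `download_with_fallback([primary_url, mirror_url], http_fetch)` only contacts the mirror when the primary download raises an error.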

Detailed Pipeline Execution Flow

[Figure: Pipeline Execution Flow]

Automation Features

  • โฐ Scheduled Execution: Pipeline runs weekly or daily automatically, ensuring fresh data availability
  • ๐Ÿ”„ Fault Tolerance: Automatic fallback to alternative OSM data sources (OSM France) if Geofabrik fails
  • ๐Ÿ“Š Parallel Processing: All four countries (DE, AT, CH, LI) process simultaneously for optimal performance
  • ๐Ÿ’พ Resource Management: Aggressive cleanup of Docker images and volumes to maintain server health
  • ๐Ÿ“ˆ Monitoring: Artifact collection of validation reports
  • ๐Ÿšฆ Quality Gates: Manual approval step prevents bad data from reaching production

The entire pipeline typically completes in under 24 hours for the full D-A-CH-LI region, processing 5+ million points of interest.
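
The per-country fan-out can be pictured with a small in-process sketch. In the project this parallelism comes from the GitLab CI/CD pipeline, whose job definitions are not shown here; the snippet only mimics the idea with a thread pool, and `process_country` is a hypothetical placeholder for the download, import, convert, and upload steps.

```python
from concurrent.futures import ThreadPoolExecutor

COUNTRIES = ["DE", "AT", "CH", "LI"]


def process_country(country: str) -> str:
    """Placeholder for the per-country download/import/convert/upload work."""
    return f"{country}: staged"


def process_all(countries=COUNTRIES, worker=process_country) -> dict:
    """Run `worker` for every country concurrently, keyed by country code."""
    with ThreadPoolExecutor(max_workers=len(countries)) as pool:
        results = pool.map(worker, countries)  # map() preserves input order
        return dict(zip(countries, results))
```

`process_all()` returns one entry per country; an exception in any worker is re-raised when its result is collected, so a single failed country fails the whole run.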

Technology Stack

| Component          | Technology            | Purpose                         |
|--------------------|-----------------------|---------------------------------|
| Data Ingestion     | osm2pgsql             | OSM to PostgreSQL conversion    |
| Spatial Processing | PostgreSQL/PostGIS    | Geographic data preprocessing   |
| Format Conversion  | Python + DuckDB       | Tabular data transformation     |
| Storage Format     | GeoParquet            | Cloud-optimized geospatial data |
| Hosting            | MinIO (S3-compatible) | Scalable object storage         |
| Orchestration      | GitLab CI/CD + Docker | Automated pipeline execution    |

๐Ÿ“ Repository Structure

cadencemaps/
โ”œโ”€โ”€ ๐Ÿณ .docker/            # Docker container configurations for production
โ”‚   โ”œโ”€โ”€ cert-extractor/    # SSL certificate extraction service for minio
โ”‚   โ”œโ”€โ”€ converter/         # Converter service Docker setup
โ”‚   โ”œโ”€โ”€ importer/          # Importer service Docker setup
โ”‚   โ”œโ”€โ”€ reverse-proxy/     # Caddy reverse proxy configuration
โ”‚   โ”œโ”€โ”€ spatial-database/  # PostGIS database Docker setup
โ”‚   โ”œโ”€โ”€ validator/         # Validator service Docker setup
โ”‚   โ”œโ”€โ”€ web/               # Web service Docker setup
โ”‚   โ””โ”€โ”€ docker-compose.yml # Production Docker compose configuration
โ”œโ”€โ”€ ๐Ÿ”„ converter/          # Data conversion logic (PostgreSQL to GeoParquet)
โ”œโ”€โ”€ ๐Ÿ“ฅ importer/           # osm2pgsql data import tools and mapping scripts
โ”‚   โ””โ”€โ”€ mappers/           # Lua scripts for data transformation
โ”‚       โ”œโ”€โ”€ divisions/     # Administrative boundary mapping
โ”‚       โ”œโ”€โ”€ places/        # Points of interest mapping
โ”‚       โ””โ”€โ”€ utils/         # Shared utility functions
โ”œโ”€โ”€ โœ… validator/          # Data quality validation and testing
โ”œโ”€โ”€ ๐ŸŒ web/                # Public website and documentation
โ”‚   โ””โ”€โ”€ release-page/      # Next.js frontend application
โ”œโ”€โ”€ ๐Ÿณ docker-compose.yml  # Local development environment setup

🗺️ Coverage

Current Region: D-A-CH-LI

  • 🇩🇪 Germany (Deutschland)
  • 🇦🇹 Austria (Österreich)
  • 🇨🇭 Switzerland (Schweiz)
  • 🇱🇮 Liechtenstein

Data Themes

  • places:
    • place (POIs like restaurants, parks, etc. as points)
  • divisions:
    • division (logical definition of administrative boundaries as points)
    • division_area (definition of administrative boundaries as polygons or multipolygons)
    • division_boundary (definition of administrative boundaries as linestrings or multilinestrings)
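
The themes and types above appear as Hive-style partitions in the release paths used by the queries in this README. The helper below is a hedged sketch for building such paths: `THEMES`, `BUCKET`, and `release_glob` are illustrative names, and it assumes the divisions theme follows the same `theme=/type=/country=` layout that the README shows for places.

```python
from typing import Optional

# Theme -> feature types, as listed under "Data Themes".
THEMES = {
    "places": ("place",),
    "divisions": ("division", "division_area", "division_boundary"),
}

BUCKET = "s3://cadencemaps/release"


def release_glob(
    release: str,
    theme: str,
    feature_type: str,
    country: Optional[str] = None,
) -> str:
    """Build the read_parquet glob for one release, theme, and type.

    Without `country`, the glob spans all country partitions.
    """
    if feature_type not in THEMES.get(theme, ()):
        raise ValueError(f"unknown theme/type: {theme}/{feature_type}")
    country_part = f"country={country}" if country else "*"
    return f"{BUCKET}/{release}/theme={theme}/type={feature_type}/{country_part}/*"
```

For example, `release_glob("2025-05-13", "places", "place", "CH")` reproduces the path from the Quick Start query.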

🛠️ Usage

Local Development

  1. Clone the repository
git clone <repository-url>
cd cadencemaps
  2. Start the Docker services
docker-compose up -d

Accessing Data

Access examples can be found in the Usage section of our website.

📊 Performance

Benchmarks (D-A-CH-LI Region)

  • Processing Time: under 24 hours for a full update
  • Data Volume: ~5M POIs processed

In our tests, a typical client query such as "all restaurants in the D-A-CH-LI region" took around 20 seconds to execute:

SELECT * FROM read_parquet('s3://cadencemaps/release/2025-06-04/theme=places/type=place/*/*', hive_partitioning=1)
    WHERE
      country IN ('DE', 'CH', 'LI', 'AT')
      AND (categories.primary = 'restaurant'
      OR 'restaurant' IN categories.alternate);

Query Performance

| Run   | Execution Time |
|-------|----------------|
| Run 1 | 20.17 seconds  |
| Run 2 | 19.17 seconds  |
| Run 3 | 19.40 seconds  |
| Run 4 | 19.68 seconds  |
| Run 5 | 19.05 seconds  |
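
The five runs above can be summarized with the standard library. The snippet is a small sketch that uses only the timings from the table; the variable names are illustrative. The mean comes out at about 19.49 seconds, consistent with the "around 20 seconds" figure.

```python
from statistics import mean, stdev

# Execution times in seconds, copied from the table above.
run_times = [20.17, 19.17, 19.40, 19.68, 19.05]

summary = {
    "mean": round(mean(run_times), 2),    # arithmetic mean of the five runs
    "stdev": round(stdev(run_times), 2),  # sample standard deviation
    "min": min(run_times),
    "max": max(run_times),
}

for key, value in summary.items():
    print(f"{key}: {value:.2f} s")
```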

๐Ÿค Contributing

We welcome contributions! Please see our contributing guidelines:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Development Areas

  • ๐ŸŒ Geographic Expansion: Add new countries/regions
  • โšก Performance: Optimize processing algorithms
  • ๐Ÿงฎ Analytics: New schema mappings and transformations
  • ๐Ÿ“– Documentation: Improve guides and examples

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

๐Ÿ™ Acknowledgments

  • OpenStreetMap Community - For the incredible open geospatial data
  • Geofabrik - For reliable OSM extracts
  • Overture Maps Foundation - For schema standardization efforts

Built by Matthias Hersche, Fadil Smajilbasic and Nils Robin-Grob as part of the FS25 Bachelor Thesis at the OST
