Terraform templates to deploy an open data platform to the cloud.

BigDataRepublic/open-data-platform

BigData Republic Open Data Platform

Introduction

The BigData Republic Open Data Platform is a repository of Terraform templates that help deploy a modern, open-source data platform on European cloud providers. The platform follows Open Data Stack principles to promote interoperability and avoid vendor lock‑in.

The Open Data Stack is a collection of open-source tools and open standards that together support the full data engineering lifecycle — enabling scalable, flexible, and cost‑effective data platforms without proprietary constraints.

Goal

This project helps organizations experiment with running a data platform on European cloud infrastructure and achieve a first working deployment within a single day.

It is an opinionated starting point. Many production concerns are intentionally out of scope for now, including:

  • Network security hardening
  • Authentication & authorization (IAM / SSO)
  • Streaming / real‑time ingestion
  • Backup, disaster recovery, and lifecycle policies
  • Cost governance & observability

Integrations

Currently supported deployment targets include Scaleway and Cyso Cloud. Local Kubernetes deployment is also supported for development and experimentation.

Solution Overview

The solution consists of two layers:

  1. Infrastructure provisioning (cloud + local) via Terraform
  2. Data platform deployment via Terraform + Helm on the provisioned Kubernetes cluster
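The ordering of the two layers can be sketched as a small Python helper that builds the Terraform commands to run. This is an illustration under stated assumptions, not the repository's own tooling: the infrastructure directory follows the infra/<provider> layout described in this README, while the `platform` directory name and the `build_commands` function are hypothetical.

```python
from typing import List

# Hypothetical directory for the platform layer; the repository's real path may differ.
PLATFORM_DIR = "platform"


def build_commands(provider: str, platform_dir: str = PLATFORM_DIR) -> List[List[str]]:
    """Return the Terraform commands for both layers, in the order they must run."""
    infra_dir = f"infra/{provider}"  # layout described in this README
    commands: List[List[str]] = []
    # Layer 1 runs first: layer 2 deploys onto the Kubernetes cluster it creates.
    for directory in (infra_dir, platform_dir):
        commands.append(["terraform", f"-chdir={directory}", "init"])
        commands.append(["terraform", f"-chdir={directory}", "apply"])
    return commands


if __name__ == "__main__":
    # Print the commands instead of executing them, so the sketch is safe to run.
    for command in build_commands("scaleway"):
        print(" ".join(command))
```

The helper only prints the commands. Running them for real requires provider credentials and any variables the templates expect, which the provider documentation covers.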

Deployment

Infrastructure Deployment

The infrastructure layer provisions the core building blocks:

  • Object storage
  • Kubernetes cluster

Each cloud integration lives under infra/<provider> and implements the provider‑specific provisioning logic. Depending on the provider, resources are created either through direct Terraform providers (e.g. Scaleway) or via OpenStack APIs plus kubectl (e.g. Cyso Cloud). A local option is also available for testing.

Refer to the provider documentation for setup instructions.

Data Platform Deployment

The proof‑of‑concept data platform assembles a lean but capable open-source stack.

Trino is configured to persist datasets as Apache Iceberg tables in object storage via the Nessie catalog.
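To make the Trino, Iceberg, and Nessie wiring concrete, the sketch below renders what a Trino catalog properties file for such a setup might look like. It is not taken from this repository's Helm values. The property names follow the Trino Iceberg connector documentation as far as I know it and should be checked against the deployed Trino version. The function names and all URLs and bucket names are placeholders.

```python
from typing import Dict


def nessie_iceberg_catalog(nessie_uri: str, warehouse: str, s3_endpoint: str) -> Dict[str, str]:
    """Build Trino catalog properties for Iceberg tables tracked by a Nessie catalog."""
    return {
        "connector.name": "iceberg",
        "iceberg.catalog.type": "nessie",
        "iceberg.nessie-catalog.uri": nessie_uri,
        "iceberg.nessie-catalog.default-warehouse-dir": warehouse,
        # Object storage access through Trino's native S3 file system support.
        "fs.native-s3.enabled": "true",
        "s3.endpoint": s3_endpoint,
    }


def render_properties(properties: Dict[str, str]) -> str:
    """Render key=value lines in the format of a Trino catalog .properties file."""
    return "\n".join(f"{key}={value}" for key, value in properties.items()) + "\n"


if __name__ == "__main__":
    # All three values are placeholders, not endpoints from this repository.
    catalog = nessie_iceberg_catalog(
        nessie_uri="http://nessie:19120/api/v2",
        warehouse="s3://example-bucket/warehouse",
        s3_endpoint="https://s3.example.com",
    )
    print(render_properties(catalog), end="")
```

With a catalog like this in place, tables created through Trino are written as Iceberg data and metadata files under the warehouse path, while Nessie tracks the current table state.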

Deployment steps are provider‑agnostic; see the platform deployment guide.

License

Released under the MIT License.
