karthigaiselvanm/README.md

Hi, I'm Karthigai Selvan — Data Platform & DE Lead

10+ yrs in data engineering • AWS • Databricks & Spark • PySpark • Trino • Apache Ranger • Terraform • Kubernetes
k2ddna.com · LinkedIn · Email


🚀 What I’m focused on

  • Building reliable data platform foundations (governance, scalability, cost).
  • Modern DE stack: Spark, Delta/Parquet, Trino, dbt, Airflow, Kafka, Terraform, AWS.
  • Secure data access with Apache Ranger and policy-driven controls.

🧰 Toolbox

Python · PySpark · Spark SQL · Databricks · Delta · Kafka · Airflow · Trino · Apache Ranger · dbt · Postgres · S3 · Glue · Terraform · Docker · Kubernetes · Great Expectations


⭐ Featured Projects

Trino + Apache Ranger on Kubernetes (Helm)

  • Custom images, init-containers, and StatefulSet deployment for Trino with Ranger plugin.
  • Solves real issues: missing cred.jceks, JVM opts updates, ranger.service.name validation, and config mounts.
  • Built for repeatable, secure platform setups.

Spark Learning Lab

  • Practical notebooks on DataFrame APIs, Structured Streaming, optimization.
  • Companion to my “Spark Deep Dive” study plan.

PySpark Samples

  • Hands-on patterns for joins, windowing, UDFs/UDAFs, and incremental data processing.


📚 Writing & Speaking

  • Apache Ranger × Trino deep-dive (design, gotchas, Helm) — coming soon on k2ddna.com.
  • Data platform patterns for secure multi-tenant analytics.

🏅 Highlights

  • Databricks Certified Data Engineer Associate (Dec 2024)
  • Built OCSF-aligned feed mappings and pipelines; working on Benthos ingestion and policy-guarded changes.

🤝 Let’s collaborate

If you’re building secure, scalable data platforms (Ranger, Trino, Spark, Terraform on AWS/K8s), I’m always up for pairing on design reviews, POCs, and OSS improvements.


Popular repositories

  1. pyspark-sample-projects Public

    Sample projects in Jupyter notebooks using the Spark DataFrame API and Spark SQL

    Jupyter Notebook 3 2

  2. udacity-data-engineering-projects Public

    A few sample projects from the Udacity Data Engineer program, including data modeling in Postgres and Apache Cassandra, setting up a cloud data warehouse, creating a data lake using Spark, and data pipel…

    Jupyter Notebook 1

  3. trino-ranger-k8s Public

    Secure Trino deployments on Kubernetes with Apache Ranger — includes Helm charts, init containers, and configs for policy-driven access control.

    Dockerfile 1 1

  4. great_expectations Public

    Forked from great-expectations/great_expectations

    Always know what to expect from your data.

    Python 1

  5. sample_secrets_api Public

    Forked from GitGuardian/sample_secrets

    Jupyter Notebook

  6. sql-injection-example Public

    Forked from kodtodya/sql-injection-example

    This example demonstrates code that is vulnerable to SQL injection

    Java