diff --git a/_posts/2025-12-23-gnec-hackathon-win.html b/_posts/2025-12-23-gnec-hackathon-win.html new file mode 100644 index 00000000..feb6efd0 --- /dev/null +++ b/_posts/2025-12-23-gnec-hackathon-win.html @@ -0,0 +1,66 @@ +--- +layout: post +title: Gnec Hackathon Win +date: 2025-12-23 +author_name: Silke Nodwell +author_role: Lead at Women Coding Community +image: /assets/images/blog/2025-12-23-gnec-hackathon-win.png +description: Our Women Coding Community Team Takes 3rd Place at the GNEC Fall Hackathon +category: tech-career +--- + +
+

Our Women Coding Community Team Takes 3rd Place at the GNEC Fall Hackathon

+

“Nourish Together” team: Silke Nodwell, Ainan Ihsan, Tammy Sisodiya, Nino Godoradze

+

What does it take to win a hackathon? For our team of four from Women Coding Community, the answer is a combination of persistence, experimentation, and the resilience to tackle each new challenge together.

+

This fall, we competed in the Global NGO Executive Committee (GNEC) Hackathon, which focused on the UN Sustainable Development Goals of No Poverty and Zero Hunger. Our team, Nourish Together, placed 3rd out of more than 150 teams. Over eight weeks of evening meetings, trial-and-error, and bursts of productivity, we built a working prototype we’re really proud of. Here’s how it came together.

+

How It All Started

+

In August, Tammy put a message in the WCC Slack about the GNEC hackathon, asking whether anyone was interested in joining a team. Within a couple of hours, we had a full team. We met virtually a few days later and quickly realised how international our group was: four people across three countries and time zones, spanning the US, Sweden and the UK.

+

Choosing a Direction

+

At first, we struggled. “Poverty” and “hunger” are enormous, complex issues, and none of us worked directly in those domains. Coming up with a focused idea felt daunting.

+

Eventually, we realised that instead of trying to design for communities we didn’t know well, we could design for a group whose needs we did understand: donors. Donors have both the resources and the technology to use an app, and, crucially, we felt equipped to empathise with their decision-making process.

+

So we framed our problem simply:

+

How might we help donors confidently identify charities aligned with their values?

+

Designing our Website

+

Our initial concept was intentionally broad: a website that could help donors explore both financial and non-financial ways to contribute. As part of this, we imagined an “impact tracker” that visualised how each donated dollar translated into tangible outcomes.

+

Around this time, I had seen an impressive prototype built with Lovable at work, so I suggested we try it for our UI. Lovable turned out to be ideal: it let us quickly build a polished mock impact tracker that traced the journey of each donation and showed how many families that contribution could help.

+

After the impact tracker was in place, Tammy added an embedded food bank map. It connected directly to Google Maps so users could instantly find their nearest food bank if they preferred donating food rather than money.

+

Adding an LLM-Inspired Recommender

+

With the core website established, we began exploring how to make the platform more intelligent. That was when the idea of a charity recommender emerged. At first, we approached it as a conventional machine learning problem. At the same time, I was reading Prompt Engineering with LLMs with the WCC book club, which sparked the question:

+

Instead of building a traditional machine-learning recommender, what if we built an LLM-powered one?

+

In true hackathon fashion, we opted for the simplest approach that could work. Rather than running a full large language model, we implemented a recommender using a Sentence Transformer: a smaller, more efficient model that converts text into vector embeddings, numeric representations that capture meaning.

+

We generated embeddings for each charity’s description and stored them in a FAISS index, which is optimised for fast similarity search across large collections of vectors. This setup resembles the retrieval step of a Retrieval-Augmented Generation (RAG) system, where relevant items are retrieved based on semantic similarity rather than exact keyword matches.

+

When a user entered a query, it was embedded in the same way and compared against the stored charity embeddings using cosine similarity, a common metric for measuring how close two vectors are in high-dimensional space. The nearest matches became our recommendations. The result was an intuitive, natural-language experience for our ‘Find Your Perfect Charity’ feature.
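To make the retrieval step concrete, here is a minimal, dependency-free sketch of the same idea in Python. It is not our project code. The `embed` function is a hypothetical toy stand-in for the Sentence Transformer (a normalised bag-of-words vector), and `CharityIndex` is a hypothetical brute-force stand-in for the FAISS index. Because every vector is scaled to unit length, the dot product equals cosine similarity. One common way to get the same behaviour in FAISS is to L2-normalise the embeddings and use an inner-product index such as `IndexFlatIP`.

```python
import math
import re
from collections import Counter


def embed(text: str) -> dict[str, float]:
    """Hypothetical stand-in for a Sentence Transformer: a unit-length
    bag-of-words vector. A real model captures meaning, not just shared words."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two unit-length sparse vectors (just their dot product)."""
    return sum(weight * b.get(word, 0.0) for word, weight in a.items())


class CharityIndex:
    """Hypothetical brute-force stand-in for a FAISS index over charity descriptions."""

    def __init__(self) -> None:
        self.names: list[str] = []
        self.vectors: list[dict[str, float]] = []

    def add(self, name: str, description: str) -> None:
        # Embed each description once, up front, like building the index.
        self.names.append(name)
        self.vectors.append(embed(description))

    def search(self, query: str, k: int = 3) -> list[tuple[str, float]]:
        # Embed the query the same way, then rank charities by similarity.
        q = embed(query)
        scored = [(name, cosine(q, vec)) for name, vec in zip(self.names, self.vectors)]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


index = CharityIndex()
index.add("City Food Bank", "Emergency food parcels for families facing hunger")
index.add("Warm Homes Trust", "Heating grants for older people in fuel poverty")
index.add("School Meals Fund", "Free breakfast and lunch for children at low-income schools")

print(index.search("I want to help hungry families get food", k=2))
```

The toy embedding only matches shared words, so it treats "hungry" and "hunger" as unrelated. A real Sentence Transformer places paraphrases close together in vector space, and closing that gap is what justifies the extra dependency.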

+

The Reality of Online Collaboration

+

In the beginning, we tried assigning fixed roles. This approach collapsed almost immediately because volunteer schedules are unpredictable. Some weeks one of us had time to take on multiple tasks, while other weeks we were barely available. We switched to leaving tasks unassigned; anyone could pick up an item from the list as long as they kept the group updated on their progress.

+

Working fully online added its own challenges. During one evening sync call, we realised that we needed uninterrupted time together if we were going to deliver something cohesive, so we all agreed to take a full day off work to focus on the project. That day became a turning point: it aligned our codebase, resolved our repository issues and built crucial momentum.

+

What followed was a very determined weekend sprint.

+

Two Days, a Cloud Deployment and Little Sleep

+

Integrating the FAISS index into our Lovable website was more complicated than expected. After several failed attempts, we changed plans: we deployed the recommender on Railway as a standalone cloud API and connected our Lovable front end to it.
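For readers curious what a standalone API in front of a recommender can look like, here is a minimal sketch that uses only Python's standard library. It is not the code we deployed. The `/recommend` route, the `q` and `k` parameters and the `recommend` stub are all hypothetical, and a production service would more likely use a web framework. Keeping the request handling in a plain `build_response` function means it can be tested without starting a server.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def recommend(query: str, k: int) -> list[str]:
    """Hypothetical stub: the real service would embed `query` and search the index."""
    charities = ["City Food Bank", "Warm Homes Trust", "School Meals Fund"]
    return charities[:k]


def build_response(path: str) -> tuple[int, dict]:
    """Map a request path such as /recommend?q=food&k=2 to (HTTP status, JSON body)."""
    url = urlparse(path)
    if url.path != "/recommend":
        return 404, {"error": "not found"}
    params = parse_qs(url.query)
    query = params.get("q", [""])[0].strip()
    if not query:
        return 400, {"error": "missing query parameter 'q'"}
    try:
        k = int(params.get("k", ["3"])[0])
    except ValueError:
        return 400, {"error": "'k' must be an integer"}
    if k < 1:
        return 400, {"error": "'k' must be at least 1"}
    return 200, {"query": query, "results": recommend(query, k)}


class RecommenderHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status, body = build_response(self.path)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        # The front end is served from a different origin, so the browser needs CORS headers.
        self.send_header("Access-Control-Allow-Origin", "*")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve() -> None:
    """Start the API; hosting platforms commonly pass the port in a PORT environment variable."""
    port = int(os.environ.get("PORT", "8000"))
    HTTPServer(("0.0.0.0", port), RecommenderHandler).serve_forever()
```

Calling `serve()` starts the server. A browser front end could then request `/recommend?q=hungry+families&k=3` and render the returned JSON. The wildcard `Access-Control-Allow-Origin` header is the simplest CORS setting for a prototype. A deployed service would normally restrict it to the front end's own origin.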

+

This approach worked, but not without challenges. On Saturday night, Ainan stayed up late resolving dependency issues. Early on Sunday morning, I picked up where she left off and finally got the API communicating with the UI.

+

Meanwhile, Nino, who lives in the United States and has the most video-editing experience, logged in at 1 pm UK time. She edited a beautifully polished demo video, only to discover that it was thirty seconds too short for the submission requirements. To fix this, we recorded a quick Teams call introducing ourselves and added it to the final cut. It ended up fitting perfectly.

+

We had eight weeks to complete the project, yet we still submitted with only thirty minutes to spare. Hackathon law, surely!

+

Tools That Made a Difference

+ +

What We Learned

+

This project was not only about building an intelligent website for donors. It was also about learning how to collaborate flexibly, make practical technical decisions and sustain momentum even when the path forward was unclear.

+

If there is one lesson we are taking forward, it is this:

+

A hackathon team succeeds when it stays adaptable.

+

Not when roles are perfectly assigned, or when the plan unfolds neatly, but when everyone leans in however and whenever they can.

+

We are proud of our 3rd-place finish, proud of the product we built and even prouder of how we worked together. Above all, we are grateful for Women Coding Community, which brought this team together and made this experience possible.

+
+

Hackathon website

+

Devpost project

+

GitHub (recommender API)

+

GitHub (donation project)

+
\ No newline at end of file diff --git a/_posts/2025-12-25-data-engineering-portfolio-projects.html b/_posts/2025-12-25-data-engineering-portfolio-projects.html new file mode 100644 index 00000000..eece8be3 --- /dev/null +++ b/_posts/2025-12-25-data-engineering-portfolio-projects.html @@ -0,0 +1,235 @@ +--- +layout: post +title: Data Engineering Portfolio Projects +date: 2025-12-25 +author_name: Sowmiya Ravikumar +author_role: Data Engineer +image: /assets/images/blog/2025-12-25-data-engineering-portfolio-projects.jpg +description: Data Engineering Portfolio Projects +category: tech-career +--- + +
+

Building portfolio projects for Data Engineering can be challenging outside enterprise environments due to limited access to realistic data, missing business context, and cloud costs.

+

Below are four practical portfolio projects that aspiring data engineers can build to showcase real-world skills. Each project focuses on a commonly used data engineering pattern and can be implemented using open-source tools or managed cloud services.

+
+

1. Daily Sales Batch ETL

+

A Finance team requires an audit-ready daily sales report delivered every morning by 8:00 AM, based on the previous day’s completed orders. This is a classic batch data engineering scenario where data must be processed on a fixed schedule with strong guarantees around correctness, reproducibility, and scalability.

+

Architecture Overview

+
    +
  1. Extract daily order data from the raw storage layer
  +
  2. Transform sales data into clean, analytics-ready models
  +
  3. Load curated tables for reporting and audits
  +
  4. Schedule the pipeline to meet a strict daily SLA
  +
+
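The first three steps can be sketched end to end in a few lines of Python. Here the standard library's `sqlite3` stands in for both the raw storage layer and the curated reporting tables. This is a minimal illustration under stated assumptions, not a reference implementation. The `raw_orders` and `daily_sales` tables and their columns are hypothetical, timestamps are assumed to be UTC ISO strings, and scheduling (for example Airflow, Glue Triggers or Cloud Scheduler) is left out. The key idea is an idempotent load: each run replaces the report day's partition inside a single transaction, so a retried or re-run job produces the same audit-ready numbers instead of double counting.

```python
import sqlite3
from datetime import date, timedelta


def run_daily_sales(conn: sqlite3.Connection, run_date: date) -> str:
    """Build the sales report for the day before `run_date` and return that day."""
    report_day = (run_date - timedelta(days=1)).isoformat()
    with conn:  # one transaction: commit on success, roll back on error
        # Idempotent load: clear the partition first so a rerun never double-counts.
        conn.execute("DELETE FROM daily_sales WHERE sales_date = ?", (report_day,))
        conn.execute(
            """
            INSERT INTO daily_sales (sales_date, region, order_count, revenue_cents)
            SELECT ?, region, COUNT(*), SUM(amount_cents)
            FROM (
                -- Extract + transform: completed orders for the day, duplicates removed
                SELECT DISTINCT order_id, region, amount_cents
                FROM raw_orders
                WHERE status = 'completed' AND DATE(created_at) = ?
            )
            GROUP BY region
            """,
            (report_day, report_day),
        )
    return report_day


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE raw_orders (order_id TEXT, region TEXT, status TEXT,
                             amount_cents INTEGER, created_at TEXT);
    CREATE TABLE daily_sales (sales_date TEXT, region TEXT, order_count INTEGER,
                              revenue_cents INTEGER, PRIMARY KEY (sales_date, region));
    """
)
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?, ?, ?)",
    [
        ("o1", "EU", "completed", 1250, "2025-12-22T09:00:00"),
        ("o1", "EU", "completed", 1250, "2025-12-22T09:00:00"),  # duplicate delivery
        ("o2", "EU", "completed", 800, "2025-12-22T17:30:00"),
        ("o3", "US", "cancelled", 500, "2025-12-22T12:00:00"),   # not completed
        ("o4", "US", "completed", 2000, "2025-12-23T01:00:00"),  # belongs to the next day
    ],
)
conn.commit()

run_daily_sales(conn, date(2025, 12, 23))
run_daily_sales(conn, date(2025, 12, 23))  # rerun: same result, no double counting
print(conn.execute("SELECT * FROM daily_sales").fetchall())
# [('2025-12-22', 'EU', 2, 2050)]
```

The same delete-then-insert (or overwrite-partition) pattern carries over to Spark or Glue jobs writing date-partitioned tables. The composite primary key would make a non-idempotent rerun fail loudly instead of silently duplicating rows.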

Technology Stack

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Environment | Storage | Processing | Scheduling
Local / Open-Source | MinIO | Spark (Docker) | Airflow
AWS | Amazon S3 | AWS Glue | Glue Triggers
GCP | GCS | Cloud Dataflow | Cloud Scheduler
+

Key points to consider

+ +

2. Real-Time Order Monitoring

+

A Customer Support team needs immediate visibility into stuck orders (e.g., payment complete but products not shipped) to intervene before customers churn. This is a classic real-time operational use case, where events must be processed as they occur with guarantees for correctness and timeliness.

+

Architecture Overview

+
    +
  1. Capture Events: Track order updates in near real-time from transactional systems using Change Data Capture (CDC)
  +
  2. Process Stream: Transform, deduplicate, and aggregate events as they arrive
  +
  3. Persist & Query: Store curated streams or aggregates for dashboards and alerts
  +
  4. Alert / Monitor: Trigger notifications for stuck orders or SLA violations
  +
+
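The capture, processing and alerting steps can be illustrated with a small in-memory Python sketch. The assumptions here are ours, not those of any specific tool. Events are simplified dictionaries loosely modelled on Debezium change events, which carry an `op` code such as `c`, `u` or `d` plus `before`/`after` row images. The `event_id` field and the two-hour `STUCK_AFTER` threshold are hypothetical. A real deployment would run this logic inside a stream processor such as Spark Structured Streaming rather than in a Python loop.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

STUCK_AFTER = timedelta(hours=2)  # hypothetical SLA: paid orders should ship within 2 hours


@dataclass
class StuckOrderMonitor:
    seen_event_ids: set[str] = field(default_factory=set)
    paid_at: dict[str, datetime] = field(default_factory=dict)  # paid but not yet shipped

    def process(self, event: dict) -> None:
        """Apply one simplified, Debezium-style change event to the monitor's state."""
        if event["event_id"] in self.seen_event_ids:
            return  # deduplicate: CDC pipelines usually deliver at-least-once
        self.seen_event_ids.add(event["event_id"])

        if event["op"] == "d":  # deleted row: only the 'before' image is present
            self.paid_at.pop(event["before"]["order_id"], None)
            return

        row = event["after"]
        if row["status"] == "paid":
            # Keep the first time the order was seen as paid.
            self.paid_at.setdefault(row["order_id"], datetime.fromisoformat(row["updated_at"]))
        elif row["status"] in ("shipped", "cancelled"):
            self.paid_at.pop(row["order_id"], None)

    def stuck_orders(self, now: datetime) -> list[str]:
        """Orders paid more than STUCK_AFTER ago that have still not shipped."""
        return sorted(oid for oid, paid in self.paid_at.items() if now - paid > STUCK_AFTER)


def change(event_id: str, op: str, order_id: str, status: str, ts: str) -> dict:
    # Hypothetical helper to keep the sample events short.
    return {"event_id": event_id, "op": op,
            "after": {"order_id": order_id, "status": status, "updated_at": ts}}


monitor = StuckOrderMonitor()
for event in [
    change("e1", "u", "A", "paid", "2025-12-25T09:00:00"),
    change("e2", "u", "B", "paid", "2025-12-25T09:30:00"),
    change("e2", "u", "B", "paid", "2025-12-25T09:30:00"),  # redelivered duplicate
    change("e3", "u", "B", "shipped", "2025-12-25T10:00:00"),
    change("e4", "c", "C", "paid", "2025-12-25T11:30:00"),
]:
    monitor.process(event)

print(monitor.stuck_orders(now=datetime(2025, 12, 25, 12, 0)))
# ['A']
```

Two simplifications are worth calling out. First, the set of seen event IDs grows forever here, whereas a streaming engine would bound deduplication state with a time window or watermark. Second, the sketch assumes that events for the same order arrive in order. Kafka provides that ordering within a partition when events are keyed by order ID.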

Technology Stack

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Environment | Event Capture / CDC | Stream Processing | Storage / Query | Alerting / Monitoring
Local / Open-Source | Debezium + Kafka | Spark Structured Streaming | MinIO + DuckDB | Python / Spark triggers, Prometheus + Grafana
AWS | DMS + Kinesis | AWS Glue Streaming | S3 + Athena | CloudWatch / SNS
GCP | Datastream + Pub/Sub (CDC) | Cloud Dataflow | GCS + BigQuery | Cloud Monitoring + Pub/Sub alerts
+

Key points to consider

+ +
+

3. Campaign Performance Data Analytics

+

A Marketing team runs campaigns across Google Ads, Meta, and Email. They need a single source of truth to consistently analyse total spend, conversions, and campaign performance across channels. This is a classic analytics engineering use case, where raw ingested data is transformed into curated, analysis-ready models.

+

Architecture Overview

+
    +
  1. Storage (Bronze): Capture unprocessed campaign and conversion data.
  +
  2. Transformation (Silver): Clean, standardize, enrich, and apply business logic.
  +
  3. Data Warehouse (Gold): Aggregate metrics at the campaign and channel level for reporting and product analytics.
  +
  4. Orchestration & Consumption: Automate daily ETL runs and query from BI tools.
  +
+
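A toy version of the Bronze, Silver and Gold layers helps show why the Silver step matters. In this minimal Python sketch, each channel delivers its own field names (all hypothetical, chosen only for illustration). Silver maps them onto one common schema with spend in integer cents, and Gold aggregates by campaign or by channel. In practice, the same logic would typically run in Spark, Glue or Dataflow jobs that write to one of the warehouses listed below.

```python
from collections import defaultdict
from decimal import Decimal

# Bronze: raw records as each source might deliver them (hypothetical field names).
bronze = [
    {"source": "google_ads", "campaign_name": "Winter Sale", "cost_micros": 12_500_000, "conversions": 5},
    {"source": "meta", "campaign": "winter sale ", "spend": "7.50", "results": 3},
    {"source": "email", "campaign": "WINTER SALE", "cost": "0.00", "signups": 4},
]


def to_silver(record: dict) -> dict:
    """Silver: clean and standardise one raw record into a shared schema."""
    source = record["source"]
    if source == "google_ads":
        campaign = record["campaign_name"]
        spend_cents = record["cost_micros"] // 10_000  # 1,000,000 micros = 1 currency unit
        conversions = record["conversions"]
    elif source == "meta":
        campaign = record["campaign"]
        spend_cents = int(Decimal(record["spend"]) * 100)
        conversions = record["results"]
    elif source == "email":
        campaign = record["campaign"]
        spend_cents = int(Decimal(record["cost"]) * 100)
        conversions = record["signups"]
    else:
        raise ValueError(f"unknown source: {source}")
    return {
        "channel": source,
        "campaign": campaign.strip().title(),  # 'winter sale ' -> 'Winter Sale'
        "spend_cents": spend_cents,
        "conversions": int(conversions),
    }


def to_gold(silver: list[dict], key: str) -> dict[str, dict]:
    """Gold: aggregate spend and conversions by 'campaign' or 'channel'."""
    totals: dict[str, dict] = defaultdict(lambda: {"spend_cents": 0, "conversions": 0})
    for row in silver:
        bucket = totals[row[key]]
        bucket["spend_cents"] += row["spend_cents"]
        bucket["conversions"] += row["conversions"]
    for bucket in totals.values():
        conv = bucket["conversions"]
        bucket["cost_per_conversion"] = round(bucket["spend_cents"] / 100 / conv, 2) if conv else None
    return dict(totals)


silver = [to_silver(record) for record in bronze]
print(to_gold(silver, "campaign"))
# {'Winter Sale': {'spend_cents': 2000, 'conversions': 12, 'cost_per_conversion': 1.67}}
print(to_gold(silver, "channel")["meta"])
# {'spend_cents': 750, 'conversions': 3, 'cost_per_conversion': 2.5}
```

Storing money as integer cents in Silver avoids floating-point drift when Gold sums many rows. Only the final ratio is converted to a decimal figure for display.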

Technology Stack

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Environment | Storage | Processing | Data Warehouse / Product Layer
Local / Open-Source | DuckDB / MinIO | Spark (Docker) | DuckDB
AWS | S3 | AWS Glue | Redshift
GCP | GCS | Cloud Dataflow | BigQuery
+

Key points to consider

+ +
+

4. Real-Time IoT Sensor Analytics

+

A factory floor needs to monitor high-frequency IoT sensor data to detect overheating machines or abnormal energy usage before equipment fails. This is a stateful streaming use case, where it is critical to compute averages, trends, and anomalies in real time.

+

Architecture Overview

+
    +
  1. Ingestion: Capture sensor readings continuously from IoT devices or message streams
  +
  2. Stream Processing: Maintain state, compute rolling averages and windowed aggregations, and detect anomalies
  +
  3. Storage: Persist aggregated or processed sensor data for operational and historical use
  +
  4. Monitoring & Alerting: Visualize metrics and trigger alerts on abnormal conditions
  +
+
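The stream-processing step is where the "state" in stateful streaming lives. This minimal Python sketch keeps the last few readings for each sensor, reports a rolling average, and raises two kinds of alert: a fixed overheating threshold and a z-score check against the sensor's recent history. The `SensorMonitor` class, the 90.0 temperature limit and the z-score cut-off of 3 are hypothetical choices for illustration. In Flink, per-sensor history would normally be keyed state, and windows would usually be defined on event time with watermarks rather than on a count of readings.

```python
from collections import defaultdict, deque
from statistics import mean, pstdev
from typing import Optional


class SensorMonitor:
    """Keeps a count-based sliding window of recent readings for each sensor."""

    def __init__(self, window: int = 5, max_temp: float = 90.0, z_limit: float = 3.0) -> None:
        # Per-sensor state: a bounded deque drops the oldest reading automatically.
        self.history: dict[str, deque] = defaultdict(lambda: deque(maxlen=window))
        self.max_temp = max_temp
        self.z_limit = z_limit

    def process(self, sensor_id: str, temp: float) -> list[str]:
        """Check one reading against the threshold and the sensor's recent history."""
        recent = self.history[sensor_id]
        alerts = []
        if temp > self.max_temp:
            alerts.append(f"{sensor_id}: {temp:.1f} exceeds limit {self.max_temp:.1f}")
        if len(recent) == recent.maxlen:  # only score once the window is full
            mu, sigma = mean(recent), pstdev(recent)
            if sigma > 0 and abs(temp - mu) / sigma > self.z_limit:
                alerts.append(f"{sensor_id}: {temp:.1f} deviates from rolling mean {mu:.1f}")
        recent.append(temp)
        return alerts

    def rolling_average(self, sensor_id: str) -> Optional[float]:
        recent = self.history.get(sensor_id)
        return mean(recent) if recent else None


monitor = SensorMonitor(window=5, max_temp=90.0, z_limit=3.0)
alerts = []
for temp in [70.0, 71.0, 70.5, 69.5, 70.0, 85.0]:
    alerts += monitor.process("press-1", temp)
alerts += monitor.process("press-2", 95.0)

print(alerts)
# ['press-1: 85.0 deviates from rolling mean 70.2', 'press-2: 95.0 exceeds limit 90.0']
```

The 85.0 reading stays under the fixed limit but is flagged because it is far outside the sensor's recent behaviour. That kind of drift is exactly what a static threshold alone would miss. The z-score check only runs once a sensor's window is full and its readings vary at all (`sigma > 0`), which avoids noisy alerts from a newly connected device.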

Technology Stack

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Environment | Ingestion | Stream Processing | Storage / Query | Monitoring & Alerting
Local / Open-Source | Kafka | Apache Flink | InfluxDB | Grafana / Python triggers
AWS | Kinesis | Amazon Managed Service for Apache Flink | Timestream | CloudWatch / SNS
GCP | Pub/Sub | Dataproc (Apache Flink) | Cloud Bigtable | Cloud Monitoring / Pub/Sub alerts
+

Key points to consider

+ +
+

Useful tips

+ +

Resources

+

Apache Spark Docker

+

DuckDB Docs Python API

+

Debezium Tutorial

+

Introduction to MinIO | Baeldung

+

Future Data Systems Article

+

Best practices for optimizing Apache Iceberg workloads

+
\ No newline at end of file diff --git a/_sass/custom/_blog.scss b/_sass/custom/_blog.scss index 53d4adc7..59deac52 100644 --- a/_sass/custom/_blog.scss +++ b/_sass/custom/_blog.scss @@ -34,7 +34,7 @@ .article-header-img { width: 100%; - height: 500px; + height: 550px; margin-bottom: map-get($spacers, 5); img { @@ -106,6 +106,24 @@ } } + .bordered-table { + border-collapse: collapse; + width: 100%; + border: 1px solid black; + } + + .bordered-table th, + .bordered-table td { + border: 1px solid black; + padding: 8px; + text-align: left; + } + + .bordered-table td { + white-space: normal; + word-spacing: normal; + } + .blockquote-footer { font-size: $font-title-m; } @@ -193,6 +211,7 @@ width: fit-content; margin: map-get($spacers, 2) auto 0; } + .img-notes { font-size: 12px; font-style: italic; diff --git a/assets/images/blog/2025-12-23-gnec-hackathon-win.png b/assets/images/blog/2025-12-23-gnec-hackathon-win.png new file mode 100644 index 00000000..2b13cdde Binary files /dev/null and b/assets/images/blog/2025-12-23-gnec-hackathon-win.png differ diff --git a/assets/images/blog/2025-12-25-data-engineering-portfolio-projects.jpg b/assets/images/blog/2025-12-25-data-engineering-portfolio-projects.jpg new file mode 100644 index 00000000..611a5580 Binary files /dev/null and b/assets/images/blog/2025-12-25-data-engineering-portfolio-projects.jpg differ