# Event Analytics Project

An event-driven analytics system designed to model user activity, compute engagement metrics, and classify churn using a clean, append-only architecture.
This project focuses on correct event modeling, SQL-first analytics, and analytical integrity, not dashboards or infrastructure automation.
## Contents

- Overview
- Architecture
- Core Data Model
- Ingestion Layer
- Analytics Layer
- Churn Classification
- How to Run
- Design Decisions & Tradeoffs
- Scope Boundaries
- Project Structure
- Future Extensions
## Overview

Modern analytics systems are built around immutable event streams.
This project implements a simplified but production-aligned event analytics pipeline:
- Append-only event storage
- Idempotent ingestion
- SQL-defined metrics
- Retention and churn modeling
- Full metric recomputability
The system is intentionally local-first and infrastructure-light to focus on analytical depth and data modeling correctness.
## Architecture
```
Client / Script
      ↓
Ingestion Layer (Python)
      ↓
PostgreSQL (events table)
      ↓
SQL Analytics Layer
      ↓
Insights
```
## Core Data Model

The system is built around a single immutable table:
events
- One row = one atomic action
- Append-only
- No derived state stored
- All metrics recomputable
Key columns:
- `event_id` (UUID, primary key)
- `event_name`
- `event_time`
- `user_id`
- `properties` (JSONB)
- `ingested_at`
See full schema documentation in:
Schema
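The key columns above can be sketched as a typed record. This is an illustrative sketch only: the Python types chosen here (strings for UUIDs and timestamps, a `dict` for the JSONB payload) are assumptions, not the project's actual schema definitions.

```python
import uuid
from datetime import datetime, timezone
from typing import Any, TypedDict


class Event(TypedDict):
    """One immutable row in the events table (illustrative types)."""
    event_id: str              # UUID primary key — also the deduplication key
    event_name: str
    event_time: str            # when the action happened
    user_id: str
    properties: dict[str, Any] # free-form JSONB payload
    ingested_at: str           # when the row was persisted


evt: Event = {
    "event_id": str(uuid.uuid4()),
    "event_name": "signup",
    "event_time": datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc).isoformat(),
    "user_id": "user-1",
    "properties": {"plan": "free"},
    "ingested_at": datetime(2024, 1, 1, 12, 0, 1, tzinfo=timezone.utc).isoformat(),
}
```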
## Ingestion Layer

Implemented in Python.
Responsibilities:
- Validate incoming events
- Enforce event contract
- Guarantee idempotency
- Persist immutable records
Key characteristics:
- No business logic
- No metric computation
- Strict input validation
- Safe retry handling
See: Ingestion Layer
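The idempotency guarantee can be sketched with an `ON CONFLICT DO NOTHING` insert keyed on `event_id`, so a retried event is a no-op rather than a duplicate row. The project targets PostgreSQL; this minimal sketch uses in-memory SQLite (which accepts the same conflict clause) purely so it runs standalone, and the column set is trimmed for brevity.

```python
import sqlite3

# Stand-in for the PostgreSQL events table; same append-only idea.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE events (
        event_id   TEXT PRIMARY KEY,
        event_name TEXT NOT NULL,
        event_time TEXT NOT NULL,
        user_id    TEXT NOT NULL
    )
""")


def ingest(event: dict) -> None:
    """Append an event; a retry with the same event_id is a no-op."""
    conn.execute(
        "INSERT INTO events (event_id, event_name, event_time, user_id) "
        "VALUES (:event_id, :event_name, :event_time, :user_id) "
        "ON CONFLICT (event_id) DO NOTHING",
        event,
    )
    conn.commit()


evt = {"event_id": "e1", "event_name": "login",
       "event_time": "2024-01-01T12:00:00", "user_id": "u1"}
ingest(evt)
ingest(evt)  # safe retry: no duplicate row is written

count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

The same pattern in PostgreSQL makes retries safe without the ingestion layer tracking any state of its own.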
## Analytics Layer

All analytical intelligence lives in SQL.
Metrics implemented:
- Daily Active Users (DAU)
- Signup counts
- Basic retention
- Day-1 retention
- Cohort retention
- Signup → Activity
- Signup → Day-1 Activity
- Multi-window churn classification
All metrics are derived exclusively from raw events.
See: Analytics Layer
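As a flavor of the SQL-first approach, DAU reduces to a `COUNT(DISTINCT user_id)` grouped by day over the raw events. The sketch below runs that query against in-memory SQLite so it is self-contained; the project's actual queries live in `sql/analytics/` and target PostgreSQL, and the sample rows here are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (event_id TEXT PRIMARY KEY, "
    "event_name TEXT, event_time TEXT, user_id TEXT)"
)
rows = [
    ("e1", "login", "2024-01-01T09:00:00", "u1"),
    ("e2", "click", "2024-01-01T10:00:00", "u1"),  # same user, same day
    ("e3", "login", "2024-01-01T11:00:00", "u2"),
    ("e4", "login", "2024-01-02T09:00:00", "u1"),
]
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)

# DAU derived exclusively from raw events — no derived state stored.
dau = conn.execute("""
    SELECT date(event_time) AS day, COUNT(DISTINCT user_id) AS dau
    FROM events
    GROUP BY day
    ORDER BY day
""").fetchall()
# [('2024-01-01', 2), ('2024-01-02', 1)]
```

Because the metric is a pure function of the events table, re-running the query recomputes it from scratch at any time.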
## Churn Classification

Users are categorized based on inactivity duration:
- Active: < 7 days inactive
- Soft churn: 7–13 days
- Churned: 14–29 days
- Hard churn: ≥ 30 days
Churn is inferred from inactivity — not explicit events.
This mirrors real-world product analytics practices.
Example output:

| churn_status | users |
|--------------|-------|
| active       | 1     |
| soft_churn   | 1     |
| churned      | 1     |
| hard_churn   | 1     |
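The bucketing itself is a simple threshold function over days of inactivity. A minimal Python sketch, using the thresholds listed above (the real classification is done in SQL, and the function name here is illustrative):

```python
def churn_status(days_inactive: int) -> str:
    """Map a user's inactivity duration to a churn bucket."""
    if days_inactive < 7:
        return "active"       # active within the last week
    if days_inactive < 14:
        return "soft_churn"   # 7–13 days inactive
    if days_inactive < 30:
        return "churned"      # 14–29 days inactive
    return "hard_churn"       # 30+ days inactive


statuses = [churn_status(d) for d in (3, 10, 20, 45)]
# ['active', 'soft_churn', 'churned', 'hard_churn']
```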
## How to Run

1. Install dependencies

   ```bash
   pip install -r requirements.txt
   ```

2. Configure the environment

   ```bash
   cp env.example .env
   ```

   Update `.env` with your PostgreSQL credentials.

3. Create the database

   ```bash
   createdb event_analytics
   ```

4. Apply the schema

   ```bash
   psql event_analytics -f sql/schema/events.sql
   ```

5. Ingest test events

   ```bash
   python -m ingestion.test_ingest
   ```

6. Run the analytics

   ```bash
   psql event_analytics -f sql/analytics/churn_classification.sql
   ```
## Design Decisions & Tradeoffs

### Why a Single Events Table?

- Simplifies modeling
- Preserves historical truth
- Avoids premature normalization
- Enables recomputability

### Why SQL-First Analytics?

- Transparent metric definitions
- Declarative logic
- Easy reasoning about time
- No hidden transformations

### Why No Dashboard?

The focus is metric correctness, not visualization.
## Scope Boundaries

This project intentionally excludes:

- Cloud deployment
- Streaming systems
- Scheduling/orchestration
- Dashboard tools
- Machine learning
- Derived state tables

The goal is depth in event modeling and analytics logic.
## Project Structure

```
Event-Analytics-Project/
│
├── ingestion/
├── sql/
│   ├── schema/
│   └── analytics/
├── docs/
├── assets/
├── requirements.txt
├── env.example
├── CONTRIBUTING.md
└── README.md
```
## Future Extensions

Possible next steps:

- Materialized views for heavy queries
- Automated metric refresh
- Spark-based scaling
- Cloud data warehouse migration
- ML-based churn prediction
These are intentionally separate layers.



