Upload CSV files, ask questions in plain English. 3 NVIDIA AI agents collaborate live to merge your data and find answers. Built with Next.js 15, Neon PostgreSQL, and Nemotron 253B.

lubobali/mergeAI

MergeAI — Your AI Data Analyst

Upload spreadsheets. Ask in plain English. Watch 3 AI agents find the answer — live.

Try Live  |  Full Demo  |  Vibes Demo  |  How It Works  |  Tech Stack  |  Architecture

Built solo in 48 hours for the Vibe Coding Hackathon 2026. Powered by NVIDIA NIM.


Most analytics tools make you drag and drop, write formulas, or learn SQL. MergeAI doesn't. You upload your CSV files, type a question like "which department spends the most on training?", and three AI agents collaborate in real time to find the answer. No setup. No mapping. No SQL.

MergeAI Landing Page


HOW IT WORKS

You upload two spreadsheets that have never seen each other. One has employee data, the other has training records. You type: "Compare training cost by department."

Here's what happens — and you watch it happen live:

Schema Map, showing the connections between your tables that the agents use to build join queries:

MergeAI Schema Map — auto-detected file joins

┌──────────────┐       ┌──────────────┐       ┌─────────────────┐
│ Schema Agent │  ──→  │  SQL Agent   │  ──→  │    Validator    │
│  (Nano 8B)   │       │ (253B Ultra) │       │ (Deterministic) │
│              │  ←──  │              │  ←──  │                 │
│ Finds joins  │ retry │ Writes SQL   │ retry │ Checks results  │
└──────────────┘       └──────────────┘       └─────────────────┘
  1. Schema Agent reads both files, understands the columns, spots that EmpID in one file matches Employee ID in the other
  2. SQL Agent writes a real PostgreSQL query — CTEs, JSONB extraction, proper JOINs — to merge the data across files
  3. Validator executes the query and checks the results. Zero rows? Case mismatch? Wrong column? It sends feedback and the agents retry. Up to 3 rounds of self-correction.
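
As a rough illustration of what a deterministic Validator step could look like, here is a minimal TypeScript sketch. The `QueryOutcome` and `Verdict` shapes, the function name, and the feedback wording are all assumptions made for this example; they are not taken from the MergeAI source.

```typescript
// Hypothetical shapes for illustration only.
interface QueryOutcome {
  rows: Record<string, unknown>[];
  error?: string; // database error message, if execution failed
}

interface Verdict {
  ok: boolean;
  feedback: string[]; // hints handed back to the agents for the next round
}

// Rule-based checks over an executed query: no LLM involved.
function validate(outcome: QueryOutcome, expectedColumns: string[]): Verdict {
  const feedback: string[] = [];
  if (outcome.error) {
    feedback.push(`Query failed: ${outcome.error}`);
  } else if (outcome.rows.length === 0) {
    feedback.push("Query returned 0 rows; check join keys and value casing.");
  } else {
    // Compare the columns of the first row against what the question needs.
    const present = Object.keys(outcome.rows[0] ?? {});
    for (const col of expectedColumns) {
      if (!present.includes(col)) feedback.push(`Missing column: ${col}`);
    }
  }
  return { ok: feedback.length === 0, feedback };
}
```

A non-empty `feedback` array is what would trigger a retry in this sketch; an empty one means the result can be shown to the user.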

Results appear in a clean table with a plain English summary:

"The data shows a distribution of training outcomes, with the most frequent being 'Incomplete' at 775 instances, followed by 'Completed' at 770, 'Passed' at 739, and the least frequent being 'Failed' at 716."

The whole thing takes about 8 seconds.

Pie Chart — Training Outcome Distribution:

MergeAI Dashboard — pie chart with 3 agents done


WHY THIS EXISTS

| Tool     | What You Need To Do                        |
|----------|--------------------------------------------|
| Tableau  | Manually drag-and-drop join configuration  |
| Power BI | Create composite data models               |
| Looker   | Write LookML definitions                   |
| ChatGPT  | Hope that in-memory pandas doesn't crash   |
| MergeAI  | Type one sentence                          |

Your data lives in a real PostgreSQL database. The queries are real SQL. The joins are real joins. Click "View SQL" to see exactly what the AI wrote — full transparency.

Table Preview — click any file name to browse your data before querying:

Data Preview — browse your CSV data before querying


TECHNICAL INNOVATION

3-Agent Pipeline with Self-Correction

Not a single LLM call that hopes for the best. Three specialized agents with a feedback loop:

Round 1: Schema Agent analyzes → SQL Agent generates → Validator checks
         ↓ (if 0 rows or errors)
Round 2: Schema Agent re-analyzes with feedback → SQL Agent regenerates → Validator re-checks
         ↓ (if still failing)
Round 3: Final attempt with accumulated context → Best-effort result
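
The three rounds above can be sketched as a small orchestration loop. This is only a sketch under stated assumptions: the `Agents` interface, its method names, and the `runPipeline` function are hypothetical and stand in for whatever the real pipeline calls.

```typescript
type Feedback = string[];

interface Attempt {
  sql: string;
  ok: boolean;
  feedback: Feedback;
}

// Hypothetical agent signatures: each one receives feedback from earlier rounds.
interface Agents {
  analyzeSchema(question: string, feedback: Feedback): Promise<string>;
  generateSql(question: string, schemaNotes: string, feedback: Feedback): Promise<string>;
  validate(sql: string): Promise<{ ok: boolean; feedback: Feedback }>;
}

const MAX_ROUNDS = 3;

async function runPipeline(
  question: string,
  agents: Agents,
): Promise<Attempt & { rounds: number }> {
  const accumulated: Feedback = [];
  let last: Attempt = { sql: "", ok: false, feedback: [] };
  for (let round = 1; round <= MAX_ROUNDS; round++) {
    const notes = await agents.analyzeSchema(question, accumulated);
    const sql = await agents.generateSql(question, notes, accumulated);
    const verdict = await agents.validate(sql);
    last = { sql, ...verdict };
    if (verdict.ok) return { ...last, rounds: round };
    accumulated.push(...verdict.feedback); // carry context into the next round
  }
  return { ...last, rounds: MAX_ROUNDS }; // best-effort result after the final round
}
```

Passing the agents in as an interface keeps the loop testable with stubs, without calling any model.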

NVIDIA NIM — Two Models Collaborating

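| Agent         | Model               | Why                                                |
|---------------|---------------------|----------------------------------------------------|
| Schema Agent  | Nemotron Nano 8B    | Fast schema analysis, JSON output, ~200ms          |
| SQL Agent     | Nemotron Ultra 253B | Most accurate SQL generation, handles complex CTEs |
| Summary Agent | Nemotron Nano 8B    | Quick NL summary of results                        |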

Per NVIDIA docs, a "detailed thinking off" system prompt disables reasoning traces, which keeps the SQL output from the 253B model clean.
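
A minimal sketch of how such a request might be assembled against an OpenAI-compatible chat endpoint is shown below. The endpoint URL, the model ID, the user-prompt wording, the `temperature` value, and both function names are assumptions for illustration; check them against your own NVIDIA NIM account before use.

```typescript
// Assumed values: verify the endpoint and model ID in your NIM dashboard.
const NIM_URL = "https://integrate.api.nvidia.com/v1/chat/completions";
const SQL_MODEL = "nvidia/llama-3.1-nemotron-ultra-253b-v1";

interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Build the request body; the system prompt switches reasoning traces off.
function buildSqlRequest(schemaNotes: string, question: string) {
  const messages: ChatMessage[] = [
    { role: "system", content: "detailed thinking off" },
    {
      role: "user",
      content: `Schema notes:\n${schemaNotes}\n\nWrite one PostgreSQL query answering: ${question}`,
    },
  ];
  return { model: SQL_MODEL, messages, temperature: 0 };
}

// Send the request and return the raw model text (expected to be SQL).
async function askSqlAgent(apiKey: string, schemaNotes: string, question: string): Promise<string> {
  const res = await fetch(NIM_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
    body: JSON.stringify(buildSqlRequest(schemaNotes, question)),
  });
  if (!res.ok) throw new Error(`NIM request failed: ${res.status}`);
  const data = (await res.json()) as { choices: { message: { content: string } }[] };
  return data.choices[0]?.message.content ?? "";
}
```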

Universal JSONB Storage

Every CSV file — any schema, any columns — gets stored the same way:

-- One table handles ALL CSV files
uploaded_rows (
  file_id   UUID,        -- which file
  row_data  JSONB        -- {"Name": "Alice", "Salary": "85000", "Dept": "Engineering"}
)

-- Agent-generated query (real example):
WITH employees AS (
  SELECT row_data->>'EmpID' AS emp_id,
         row_data->>'DepartmentType' AS dept
  FROM uploaded_rows WHERE file_id = 'abc-123'
),
training AS (
  SELECT row_data->>'Employee ID' AS emp_id,
         (row_data->>'Training Cost')::NUMERIC AS cost
  FROM uploaded_rows WHERE file_id = 'def-456'
)
SELECT e.dept, AVG(t.cost) AS avg_training_cost
FROM employees e
JOIN training t ON LOWER(e.emp_id) = LOWER(t.emp_id)  -- qualify both sides; a bare emp_id is ambiguous
GROUP BY e.dept ORDER BY avg_training_cost DESC;
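
Because every file lives in the same `uploaded_rows` table, each CTE in a query like the one above has the same shape: project a few JSONB keys and filter by `file_id`. The TypeScript sketch below shows one way such a CTE could be generated. The helper names `sqlLiteral` and `buildFileCte` are hypothetical, and in MergeAI the SQL is written by the SQL Agent rather than by a template, so treat this purely as an illustration of the storage pattern.

```typescript
// Escape a string for use inside a single-quoted SQL literal.
function sqlLiteral(value: string): string {
  return `'${value.replace(/'/g, "''")}'`;
}

// Only allow plain identifiers for CTE names and column aliases.
function assertIdentifier(name: string): void {
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(name)) {
    throw new Error(`Unsafe identifier: ${name}`);
  }
}

// Build one CTE that projects JSONB keys into named text columns.
// `columns` maps output alias -> CSV header (the JSONB key).
function buildFileCte(name: string, fileId: string, columns: Record<string, string>): string {
  assertIdentifier(name);
  const projections = Object.entries(columns)
    .map(([alias, key]) => {
      assertIdentifier(alias);
      return `row_data->>${sqlLiteral(key)} AS ${alias}`;
    })
    .join(",\n         ");
  return (
    `${name} AS (\n  SELECT ${projections}\n` +
    `  FROM uploaded_rows WHERE file_id = ${sqlLiteral(fileId)}\n)`
  );
}

const employeesCte = buildFileCte("employees", "abc-123", {
  emp_id: "EmpID",
  dept: "DepartmentType",
});
```

String-built SQL is shown here only to make the pattern visible; for values that come from users, bound query parameters are the safer choice.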

Interactive Plotly Charts

Five chart types are generated automatically based on your query: bar, pie, line, scatter, and heatmap.

Heatmap — Average Training Cost by Department and Training Type:

Heatmap — Average Training Cost by Department and Training Type
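
As a rough illustration of how result rows could be turned into a Plotly trace, the sketch below builds a pie trace from the training-outcome counts quoted in the summary earlier. The `toPieTrace` helper and the `outcome` and `count` column names are assumptions for this example; only the `type`, `labels`, and `values` fields follow Plotly's pie trace format.

```typescript
interface PieTrace {
  type: "pie";
  labels: string[];
  values: number[];
}

// Map query result rows onto the fields a Plotly pie trace expects.
function toPieTrace(
  rows: Record<string, string | number>[],
  labelKey: string,
  valueKey: string,
): PieTrace {
  return {
    type: "pie",
    labels: rows.map((r) => String(r[labelKey])),
    values: rows.map((r) => Number(r[valueKey])),
  };
}

// Counts taken from the example summary; column names are hypothetical.
const outcomes = [
  { outcome: "Incomplete", count: 775 },
  { outcome: "Completed", count: 770 },
  { outcome: "Passed", count: 739 },
  { outcome: "Failed", count: 716 },
];

const pieTrace = toPieTrace(outcomes, "outcome", "count");
// In the browser, a trace like this could be passed to Plotly.newPlot(element, [pieTrace]).
```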

Real-Time Agent Visualization (SSE + Framer Motion)

Server-Sent Events stream agent status to the browser. Framer Motion animates each agent card through states:

idle → active (pulsing blue) → done (green), or retry (orange) → back to active

AG-UI Protocol event naming: agent_start, agent_complete, round_retry, query_complete.
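
The sketch below illustrates the two halves of this flow: serializing events in SSE wire format on the server, and mapping incoming events to card states on the client. The event names come from the list above; the `AgentEvent` payload shape, the function names, and the exact state mapping are assumptions for illustration.

```typescript
type AgentEventName = "agent_start" | "agent_complete" | "round_retry" | "query_complete";
type CardState = "idle" | "active" | "done" | "retry";

// Hypothetical payload shape.
interface AgentEvent {
  event: AgentEventName;
  agent?: string;
  round?: number;
}

// SSE wire format: a "data:" line followed by a blank line.
function formatSse(e: AgentEvent): string {
  return `data: ${JSON.stringify(e)}\n\n`;
}

// Server side: wrap events in a ReadableStream that a route handler could
// return with the "text/event-stream" content type.
function eventStream(events: AgentEvent[]): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  return new ReadableStream<Uint8Array>({
    start(controller) {
      for (const e of events) controller.enqueue(encoder.encode(formatSse(e)));
      controller.close();
    },
  });
}

// Client side: compute the next card state from an incoming event.
function nextState(current: CardState, e: AgentEvent): CardState {
  switch (e.event) {
    case "agent_start":
      return "active";
    case "agent_complete":
      return "done";
    case "round_retry":
      return "retry";
    case "query_complete":
      return current; // the run is over; cards keep their final state
  }
}
```

Keeping the state transition a pure function makes it easy to drive an animation library from it, since each card only needs its current state.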

Line Chart — Average Training Cost by Month:

Line Chart — Average Training Cost by Month


THE TECH

| Layer       | Tech                    | Why                                                |
|-------------|-------------------------|----------------------------------------------------|
| Vibe Coding | AdaL CLI                | AI-assisted development, hackathon workflow        |
| Framework   | Next.js 15 (App Router) | React 19, server components, API routes            |
| AI Models   | NVIDIA NIM API          | Nemotron 253B + Nano 8B via OpenAI-compatible SDK  |
| Database    | Neon PostgreSQL         | Serverless HTTP mode, zero connection overhead     |
| ORM         | Drizzle v0.45.1         | Typed JSONB, lightest ORM, SQL-first               |
| Streaming   | SSE (ReadableStream)    | Native, zero deps, real-time agent updates         |
| Animation   | Framer Motion           | Agent card state transitions                       |
| Auth        | Clerk                   | Sign-up/sign-in in 10 minutes, free tier           |
| CSV Parsing | Papa Parse              | Client-side, fast, handles any format              |
| Deploy      | Vercel                  | Auto-deploy from GitHub                            |

TRY IT

Demo data is pre-loaded — 3,000 employees + 3,000 training records. Click any example query and watch the agents work.

Then upload your own CSV files. Any schema. Any columns. Any data. The agents figure it out.


LOCAL SETUP

git clone https://github.com/lubobali/mergeAI.git
cd mergeAI
npm install

Create .env.local:

DATABASE_URL=your_neon_connection_string
NVIDIA_API_KEY=your_nvidia_nim_key
NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY=your_clerk_key
CLERK_SECRET_KEY=your_clerk_secret

Then create the tables and start the dev server:

npx drizzle-kit push    # create tables
npm run dev             # start dev server
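
If the dev server fails on startup, a missing environment variable is a likely cause. The small check below is a hypothetical helper, not part of the repository; it lists which of the four variables named above are absent.

```typescript
// Variable names taken from the .env.local listing above.
const REQUIRED_ENV = [
  "DATABASE_URL",
  "NVIDIA_API_KEY",
  "NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY",
  "CLERK_SECRET_KEY",
] as const;

// Return the names of required variables that are unset or empty.
function missingEnv(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV.filter((name) => !env[name]);
}

// Example: in Node this could be called as missingEnv(process.env).
const missing = missingEnv({ DATABASE_URL: "postgres://example", NVIDIA_API_KEY: "" });
```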

Built for Vibe Coding Hackathon 2026 with AdaL CLI
