GitHub - lubobali/mergeAI: Upload CSV files, ask questions in plain English. 3 NVIDIA AI agents collaborate live to merge your data and find answers. Built with Next.js 15, Neon PostgreSQL, and Nemotron 253B.

MergeAI — Your AI Data Analyst

Upload spreadsheets. Ask in plain English. Watch 3 AI agents find the answer — live.

Built solo in 48 hours for the Vibe Coding Hackathon 2026. Powered by NVIDIA NIM.

Most analytics tools make you drag and drop, write formulas, or learn SQL. MergeAI doesn't. You upload your CSV files, type a question like "which department spends the most on training?", and three AI agents collaborate in real time to find the answer. No setup. No mapping. No SQL.

HOW IT WORKS

You upload two spreadsheets that have never seen each other. One has employee data, the other has training records. You type: "Compare training cost by department."

Here's what happens — and you watch it happen live:

Screenshot (Schema Map): see your table connections to make join queries.

┌─────────────────┐       ┌─────────────────┐       ┌─────────────────┐
│  Schema Agent   │  ──→  │    SQL Agent    │  ──→  │    Validator    │
│   (Nano 8B)     │       │  (253B Ultra)   │       │ (Deterministic) │
│                 │  ←──  │                 │  ←──  │                 │
│  Finds joins    │ retry │   Writes SQL    │ retry │ Checks results  │
└─────────────────┘       └─────────────────┘       └─────────────────┘

1. Schema Agent reads both files, interprets the columns, and spots that EmpID in one file matches Employee ID in the other.
2. SQL Agent writes a real PostgreSQL query (CTEs, JSONB extraction, proper JOINs) to merge the data across files.
3. Validator executes the query and checks the results. Zero rows? Case mismatch? Wrong column? It sends feedback and the agents retry, for up to 3 rounds of self-correction.
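The Validator step is deterministic, so its checks can be pictured as plain code rather than a model call. The sketch below is only an illustration of the three failure cases named above. The function name `validateResult`, its parameters, and the feedback wording are assumptions, not the repository's actual interface.

```typescript
// Hypothetical types: the README does not show the real validator's interface.
type Row = Record<string, unknown>;

interface Validation {
  ok: boolean;
  feedback: string[]; // messages handed back to the agents on retry
}

// Deterministic checks only: no model call is involved.
function validateResult(
  rows: Row[],
  expectedColumns: string[],
  error?: string,
): Validation {
  const feedback: string[] = [];

  if (error) {
    // The database rejected the query (syntax error, unknown column, ...).
    feedback.push(`Query failed: ${error}`);
  } else if (rows.length === 0) {
    // An empty join often points at a case or whitespace mismatch in the keys.
    feedback.push(
      "Query returned 0 rows. Check the join keys for case or whitespace mismatches.",
    );
  } else {
    // Compare the columns of the first row with what the question needs.
    const first = rows[0] ?? {};
    const present = new Set(Object.keys(first));
    for (const col of expectedColumns) {
      if (!present.has(col)) feedback.push(`Missing expected column: ${col}`);
    }
  }

  return { ok: feedback.length === 0, feedback };
}
```

An empty `feedback` array means the result can be shown to the user. Anything else would go back to the Schema and SQL agents as context for the next round.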

Results appear in a clean table with a plain English summary:

"The data shows a distribution of training outcomes, with the most frequent being 'Incomplete' at 775 instances, followed by 'Completed' at 770, 'Passed' at 739, and the least frequent being 'Failed' at 716."

The whole thing takes about 8 seconds.

Screenshot (Pie Chart): Training Outcome Distribution.

WHY THIS EXISTS

Tool	What You Need To Do
Tableau	Configure joins manually by drag and drop
Power BI	Create composite data models
Looker	Write LookML definitions
ChatGPT	Hope that in-memory pandas doesn't crash
MergeAI	Type one sentence

Your data lives in a real PostgreSQL database. The queries are real SQL. The joins are real joins. Click "View SQL" to see exactly what the AI wrote — full transparency.

Screenshot (Table Preview): click any file name to browse your data before querying.

TECHNICAL INNOVATION

3-Agent Pipeline with Self-Correction

Not a single LLM call that hopes for the best. Three specialized agents with a feedback loop:

Round 1: Schema Agent analyzes → SQL Agent generates → Validator checks
         ↓ (if 0 rows or errors)
Round 2: Schema Agent re-analyzes with feedback → SQL Agent regenerates → Validator re-checks
         ↓ (if still failing)
Round 3: Final attempt with accumulated context → Best-effort result
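The round structure above can be sketched as a bounded loop that carries feedback forward. This is a minimal illustration under stated assumptions: the `Agents` interface, `runPipeline`, and `MAX_ROUNDS` are hypothetical names, and the real agents would call NVIDIA NIM and PostgreSQL instead of the injected functions used here.

```typescript
// All names are illustrative; the repository's real interfaces may differ.
interface Attempt {
  sql: string;
  ok: boolean;
  feedback: string[];
}

interface Agents {
  analyzeSchema(question: string, feedback: string[]): Promise<string>;
  generateSql(schemaNotes: string, feedback: string[]): Promise<string>;
  validate(sql: string): Promise<{ ok: boolean; feedback: string[] }>;
}

const MAX_ROUNDS = 3;

async function runPipeline(
  question: string,
  agents: Agents,
): Promise<{ rounds: number; result: Attempt }> {
  const feedback: string[] = []; // accumulates across rounds
  let last: Attempt = { sql: "", ok: false, feedback: [] };

  for (let round = 1; round <= MAX_ROUNDS; round++) {
    // Each agent sees every piece of feedback collected so far.
    const notes = await agents.analyzeSchema(question, [...feedback]);
    const sql = await agents.generateSql(notes, [...feedback]);
    const verdict = await agents.validate(sql);

    last = { sql, ok: verdict.ok, feedback: verdict.feedback };
    if (verdict.ok) return { rounds: round, result: last };

    feedback.push(...verdict.feedback);
  }

  // Best effort: return the final attempt even though validation failed.
  return { rounds: MAX_ROUNDS, result: last };
}
```

Passing the agents in as functions keeps the loop testable without network access, since stubs can stand in for the model calls.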

NVIDIA NIM — Two Models Collaborating

Agent	Model	Why
Schema Agent	Nemotron Nano 8B	Fast schema analysis, JSON output, ~200ms
SQL Agent	Nemotron Ultra 253B	Most accurate SQL generation, handles complex CTEs
Summary Agent	Nemotron Nano 8B	Quick NL summary of results

Per NVIDIA docs, setting the system prompt to "detailed thinking off" disables reasoning traces, which keeps the SQL output from the 253B model clean.
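As a rough sketch of how that system prompt might be placed in an OpenAI-compatible chat request, the function below builds the request body without sending it. The model ID, the user prompt wording, the `temperature` value, and the name `buildSqlRequest` are assumptions for illustration and are not taken from the repository.

```typescript
// Assumed model ID for Nemotron Ultra 253B on NVIDIA NIM; check the NIM catalog.
const SQL_MODEL = "nvidia/llama-3.1-nemotron-ultra-253b-v1";

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  model: string;
  messages: ChatMessage[];
  temperature: number;
}

// Builds the JSON body for a chat completions call (hypothetical helper).
function buildSqlRequest(schemaNotes: string, question: string): ChatRequest {
  return {
    model: SQL_MODEL,
    messages: [
      // The system prompt carries only the reasoning toggle.
      { role: "system", content: "detailed thinking off" },
      {
        role: "user",
        content:
          `Schema notes:\n${schemaNotes}\n\n` +
          `Write one PostgreSQL query that answers: ${question}\n` +
          "Return only SQL.",
      },
    ],
    temperature: 0, // assumption: low temperature for repeatable SQL
  };
}
```

The returned object could be serialized with `JSON.stringify` and posted to an OpenAI-compatible `/chat/completions` endpoint, or passed to an OpenAI-compatible SDK client.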

Universal JSONB Storage

Every CSV file — any schema, any columns — gets stored the same way:

-- One table handles ALL CSV files
CREATE TABLE uploaded_rows (
  file_id   UUID,        -- which file
  row_data  JSONB        -- {"Name": "Alice", "Salary": "85000", "Dept": "Engineering"}
);

-- Agent-generated query (real example):
WITH employees AS (
  SELECT row_data->>'EmpID' AS emp_id,
         row_data->>'DepartmentType' AS dept
  FROM uploaded_rows WHERE file_id = 'abc-123'
),
training AS (
  SELECT row_data->>'Employee ID' AS emp_id,
         (row_data->>'Training Cost')::NUMERIC AS cost
  FROM uploaded_rows WHERE file_id = 'def-456'
)
-- Case-insensitive join on the employee ID taken from each file
SELECT e.dept, AVG(t.cost) AS avg_training_cost
FROM employees e JOIN training t ON LOWER(e.emp_id) = LOWER(t.emp_id)
GROUP BY e.dept ORDER BY avg_training_cost DESC;
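To make the storage shape concrete, here is a small sketch of how parsed CSV rows could be turned into records for `uploaded_rows`. The helper `toUploadedRows` and the `UploadedRow` type are hypothetical. Values stay as strings, matching the `"Salary": "85000"` example above, which is why the query casts with `::NUMERIC`.

```typescript
// Mirrors the two columns shown in the table definition above.
interface UploadedRow {
  file_id: string;
  row_data: Record<string, string>;
}

// Hypothetical helper: pairs each header name with the cell in the same position.
function toUploadedRows(
  fileId: string,
  header: string[],
  rows: string[][],
): UploadedRow[] {
  return rows.map((cells) => {
    const row_data: Record<string, string> = {};
    header.forEach((name, i) => {
      // Short rows are padded with empty strings so every key is present.
      row_data[name] = cells[i] ?? "";
    });
    return { file_id: fileId, row_data };
  });
}
```

Because every file reduces to the same `(file_id, row_data)` pair, no per-file table or column mapping is needed at upload time.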

Interactive Plotly Charts

Five chart types generated automatically based on your query — bar, pie, line, scatter, and heatmap:

Screenshot (Heatmap): Average Training Cost by Department and Training Type.
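As an illustration of how query results might be shaped for Plotly, the sketch below converts result rows into trace objects. The helper names and the idea of choosing keys by hand are assumptions; the README does not describe how MergeAI picks a chart type or builds its traces. The trace fields (`x`, `y`, `z`, `labels`, `values`) follow Plotly's standard bar, pie, and heatmap trace shapes.

```typescript
type ResultRow = Record<string, string | number>;

// Bar and pie traces from a two-column result (label, value).
function toCategoryTrace(
  kind: "bar" | "pie",
  rows: ResultRow[],
  labelKey: string,
  valueKey: string,
) {
  const labels = rows.map((r) => String(r[labelKey]));
  const values = rows.map((r) => Number(r[valueKey]));
  return kind === "pie"
    ? { type: "pie" as const, labels, values }
    : { type: "bar" as const, x: labels, y: values };
}

// Heatmap trace: pivot long-format rows into a z matrix indexed [y][x].
function toHeatmapTrace(
  rows: ResultRow[],
  xKey: string,
  yKey: string,
  zKey: string,
) {
  const x = Array.from(new Set(rows.map((r) => String(r[xKey]))));
  const y = Array.from(new Set(rows.map((r) => String(r[yKey]))));

  // Index each cell by its (y, x) pair so the pivot is a simple lookup.
  const cells = new Map<string, number>();
  for (const r of rows) {
    cells.set(JSON.stringify([String(r[yKey]), String(r[xKey])]), Number(r[zKey]));
  }

  // Missing combinations become null, which Plotly renders as gaps.
  const z = y.map((yv) => x.map((xv) => cells.get(JSON.stringify([yv, xv])) ?? null));
  return { type: "heatmap" as const, x, y, z };
}
```

For example, the training outcome counts quoted earlier (775, 770, 739, 716) would become one pie trace with four labels and four values.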

Real-Time Agent Visualization (SSE + Framer Motion)

Server-Sent Events stream agent status to the browser. Framer Motion animates each agent card through states:

idle → active (pulsing blue) → done (green), or → retry (orange) → back to active

AG-UI Protocol event naming: agent_start, agent_complete, round_retry, query_complete.
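The event flow above can be sketched in two small pieces: a function that frames one event in the `text/event-stream` wire format, and a reducer that maps each event to the next card state. The names `formatSse` and `nextState` and the payload shape are assumptions; only the four event names and the card states come from this README.

```typescript
type AgentEvent =
  | "agent_start"
  | "agent_complete"
  | "round_retry"
  | "query_complete";

type CardState = "idle" | "active" | "done" | "retry";

// Serialize one event: an "event:" line, a "data:" line, then a blank line.
function formatSse(event: AgentEvent, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// Map an incoming event to the next state of a single agent card.
function nextState(current: CardState, event: AgentEvent): CardState {
  switch (event) {
    case "agent_start":
      return "active"; // pulsing blue
    case "agent_complete":
      return "done"; // green
    case "round_retry":
      return "retry"; // orange, until the next agent_start
    case "query_complete":
      return current; // the run is over; cards keep their final state
  }
}
```

On the server, each formatted string would be encoded and enqueued into the response stream. On the client, an `EventSource` listener for each event name could feed `nextState` and let Framer Motion animate the change.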

Screenshot (Line Chart): Average Training Cost by Month.

THE TECH

Layer	Tech	Why
Vibe Coding	AdaL CLI	AI-assisted development, hackathon workflow
Framework	Next.js 15 (App Router)	React 19, server components, API routes
AI Models	NVIDIA NIM API	Nemotron 253B + Nano 8B via OpenAI-compatible SDK
Database	Neon PostgreSQL	Serverless HTTP mode, zero connection overhead
ORM	Drizzle v0.45.1	Typed JSONB, lightest ORM, SQL-first
Streaming	SSE (ReadableStream)	Native, zero deps, real-time agent updates
Animation	Framer Motion	Agent card state transitions
Auth	Clerk	Sign-up/sign-in in 10 minutes, free tier
CSV Parsing	Papa Parse	Client-side, fast, handles any format
Deploy	Vercel	Auto-deploy from GitHub

TRY IT

Demo data is pre-loaded — 3,000 employees + 3,000 training records. Click any example query and watch the agents work.

Then upload your own CSV files. Any schema. Any columns. Any data. The agents figure it out.

LOCAL SETUP

git clone https://github.com/lubobali/mergeAI.git
cd mergeAI
npm install

Create .env.local:

DATABASE_URL=your_neon_connection_string
NVIDIA_API_KEY=your_nvidia_nim_key
NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY=your_clerk_key
CLERK_SECRET_KEY=your_clerk_secret

npx drizzle-kit push    # create tables
npm run dev             # start dev server
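If the dev server fails on startup, an unset or placeholder environment variable is a likely cause. The check below is a hypothetical helper, not part of the repository. It lists which of the four variables from .env.local are missing, blank, or still set to a `your_...` placeholder.

```typescript
// The four variables named in the .env.local template above.
const REQUIRED_ENV = [
  "DATABASE_URL",
  "NVIDIA_API_KEY",
  "NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY",
  "CLERK_SECRET_KEY",
] as const;

// Returns the names that are unset, blank, or left as a template placeholder.
function missingEnv(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV.filter((name) => {
    const value = (env[name] ?? "").trim();
    return value === "" || value.startsWith("your_");
  });
}
```

In a Node script it could be called as `missingEnv(process.env)`, and an empty array means all four values are set.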

Built for Vibe Coding Hackathon 2026 with AdaL CLI

