feat: Implement conversation filtering to prevent saving trivial and repeated messages to AI training data. #382

codeunia-dev · 2025-11-26T09:45:53Z

This PR adds filtering logic to the AI route to prevent low-value or redundant conversations from being saved in the ai_training_data table.
This ensures cleaner, more meaningful training data for analytics and future AI improvements.

✅ Changes Included

1. Added `shouldSaveConversation` Function (after line 83)

A new utility function determines whether a conversation should be saved based on:

Simple Greetings

Filtered if message matches (case-insensitive):
hi, hello, hey, hii, hiii, sup, yo, hai, helo, hllo
Also matches versions with punctuation.
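
A minimal sketch of how the greeting check could look. The greeting list is copied from the description above; the function name `isSimpleGreeting`, the regex shape, and the set of punctuation characters it tolerates are assumptions for illustration, not the code merged in `route.ts`.

```typescript
// Greeting list taken from the PR description.
const GREETINGS = ["hi", "hello", "hey", "hii", "hiii", "sup", "yo", "hai", "helo", "hllo"];

// Assumption: "versions with punctuation" means trailing whitespace or . , ! ? only.
const GREETING_PATTERN = new RegExp(`^(?:${GREETINGS.join("|")})[\\s!?.,]*$`, "i");

// Hypothetical helper: true when the whole message is just a greeting.
function isSimpleGreeting(message: string): boolean {
  return GREETING_PATTERN.test(message.trim());
}
```

Anchoring the pattern at both ends means a message that merely starts with a greeting, such as "hi, what is Supabase?", is not treated as one.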

Casual Chit-Chat

Filtered if:

Message shorter than 10 characters
Contains common low-value phrases:
thanks, ok, okay, cool, nice, lol, haha
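
A possible shape for the chit-chat check, as a sketch only. The description does not say how the two conditions combine, so this version assumes either one is sufficient; the name `isCasualChitChat` and the word-boundary matching are also assumptions.

```typescript
// Phrase list and length threshold taken from the PR description.
const LOW_VALUE_PHRASES = ["thanks", "ok", "okay", "cool", "nice", "lol", "haha"];
const MIN_MESSAGE_LENGTH = 10;

// Assumption: match whole words only, so "ok" does not match inside "token".
const LOW_VALUE_PATTERN = new RegExp(`\\b(?:${LOW_VALUE_PHRASES.join("|")})\\b`, "i");

// Hypothetical helper: true when the message is short OR contains a low-value phrase.
function isCasualChitChat(message: string): boolean {
  const text = message.trim();
  return text.length < MIN_MESSAGE_LENGTH || LOW_VALUE_PATTERN.test(text);
}
```

Under this reading a longer message that happens to contain "nice" or "thanks" is also filtered, so the phrase rule may need to be limited to short messages if that turns out to over-filter.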

Repeated Questions

Filtered if:

Same user_id
Same query_text
Exists in DB within last 24 hours
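
The duplicate check can be sketched as below. To keep the example self-contained, the database lookup is injected as a function; `RecentQueryLookup`, `isRepeatedQuestion`, and the `created_at` column mentioned in the comment are hypothetical names, and only `user_id`, `query_text`, and `ai_training_data` come from the description.

```typescript
// Hypothetical lookup: resolves to true if a matching row exists at or after sinceIso.
// With a Supabase client this might be a select on ai_training_data filtered by
// user_id, query_text, and a timestamp column (assumed here to be created_at).
type RecentQueryLookup = (userId: string, queryText: string, sinceIso: string) => Promise<boolean>;

const DUPLICATE_WINDOW_MS = 24 * 60 * 60 * 1000; // 24 hours

// True when the same user asked the same question within the last 24 hours.
async function isRepeatedQuestion(
  userId: string,
  queryText: string,
  lookup: RecentQueryLookup,
  now: Date = new Date(),
): Promise<boolean> {
  const sinceIso = new Date(now.getTime() - DUPLICATE_WINDOW_MS).toISOString();
  return lookup(userId, queryText, sinceIso);
}
```

Passing `now` explicitly makes the 24-hour window easy to exercise in tests without waiting or mocking the clock.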

Function Behavior

| Condition | Save? |
| --- | --- |
| Greeting / Chit-chat | ❌ No |
| Duplicate (24h) | ❌ No |
| Meaningful message | ✅ Yes |
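
The decision table above could be composed roughly as follows. This is a sketch under assumptions: `shouldSaveSketch` and `SkipCheck` are illustrative names, and the individual checks are passed in rather than hard-coded, which is not necessarily how the merged function is structured.

```typescript
// A check returns true when the message should NOT be saved.
type SkipCheck = (message: string, userId: string) => boolean | Promise<boolean>;

// Runs the checks in order and stops at the first one that says "skip".
// Putting cheap string checks before the database check avoids a query
// for messages that are already filtered as greetings or chit-chat.
async function shouldSaveSketch(
  message: string,
  userId: string,
  skipChecks: SkipCheck[],
): Promise<boolean> {
  for (const check of skipChecks) {
    if (await check(message, userId)) {
      return false; // greeting, chit-chat, or duplicate
    }
  }
  return true; // meaningful message
}
```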

2. Streaming Mode Update (lines 1089–1097)

Wrapped DB save logic in:

if (await shouldSaveConversation(...)) { ... }

Error handling unchanged.
Low-value messages are now skipped in streaming mode.

3. Non-Streaming Mode Update (lines 1170–1178)

Same conditional filter applied.
Keeps current error handling.
Ensures consistency between streaming + non-streaming behaviors.
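
One way to keep the two paths consistent is to route both through a single gate. The sketch below is illustrative only: `saveIfWorthKeeping`, `TrainingRecord`, the `response_text` field, and the error handling shown are assumptions, since the PR only states that the existing error handling is kept.

```typescript
// Hypothetical record shape; only user_id and query_text are named in the PR.
interface TrainingRecord {
  user_id: string;
  query_text: string;
  response_text: string;
}

type SaveOutcome = "saved" | "filtered" | "failed";

// Shared gate for streaming and non-streaming paths. The filter and the
// insert are injected so the same helper can wrap either save call.
async function saveIfWorthKeeping(
  record: TrainingRecord,
  shouldSave: (message: string, userId: string) => Promise<boolean>,
  insert: (record: TrainingRecord) => Promise<void>,
): Promise<SaveOutcome> {
  if (!(await shouldSave(record.query_text, record.user_id))) {
    return "filtered";
  }
  try {
    await insert(record);
    return "saved";
  } catch (err) {
    // A failed write is logged and reported, never thrown to the caller,
    // so the API response is unaffected.
    console.error("Failed to save training data:", err);
    return "failed";
  }
}
```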

🧪 Verification Plan

Manual Testing

Simple Greetings

Send messages like:

hi
hello
hey

Expected: Not stored

Casual Chit-Chat

Send:

thanks
ok
cool

Expected: Not stored

Repeated Questions

Send the same question twice within 24 hours:

Message 1: "What is Supabase?" → stored
Message 2: "What is Supabase?" → not stored

Valuable Conversation

Send a multi-sentence or technical question.
Expected: stored

DB Verification

Check that only appropriate entries appear in ai_training_data
Confirm duplicate filtering respects 24-hour window

🔍 Notes

No changes to API output
Only database write behavior modified
Intended to improve the quality of AI training data by excluding greetings, chit-chat, and 24-hour duplicates

Authored by: @akshay0611

Summary by CodeRabbit

New Features
- Added intelligent conversation filtering for AI training data that excludes simple greetings, very brief messages, and duplicate questions from the same user within 24 hours, improving training data quality for better AI responses.



vercel · 2025-11-26T09:45:57Z

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| codeunia | Ready | Preview | Comment | Nov 26, 2025 9:45am |

coderabbitai · 2025-11-26T09:46:12Z

Caution

Review failed

The pull request is closed.

Walkthrough

Introduced conversation-filtering logic to prevent saving trivial messages, greetings, and duplicate questions to the training database in the AI route handler. The filter checks message content, length, and applies duplicate detection via database query before persisting conversations.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Conversation Filtering Logic `app/api/ai/route.ts` | Added `shouldSaveConversation(message, userId)` function that filters greetings, short/casual messages, and duplicate questions from the last 24 hours. Integrated conditional save logic into both streaming and non-streaming response paths, with fallback to allow saving on query errors. Added filtering outcome logging. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Handler as AI Route Handler
    participant AI as AI Service
    participant DB as Database
    
    Client->>Handler: POST /api/ai
    Handler->>AI: Process message
    AI-->>Handler: Response
    Handler->>Handler: shouldSaveConversation(message, userId)
    
    rect rgb(200, 220, 255)
    Note over Handler: Filter Checks
    Handler->>Handler: 1. Check for greetings
    Handler->>Handler: 2. Check for short/casual
    Handler->>DB: 3. Query duplicates (24h)
    DB-->>Handler: Duplicate check result
    end
    
    alt Filter Passes
        Handler->>DB: Save to ai_training_data
        DB-->>Handler: Success
        Handler->>Handler: Log: Saved
    else Filter Fails
        Handler->>Handler: Log: Filtered out
    end
    
    Handler-->>Client: Return response

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Database query performance: The 24-hour duplicate detection query should be reviewed for indexing and potential performance implications under load.
Filter logic correctness: Verify that the greeting list and casual-phrase detection are comprehensive enough and don't over-filter valid conversations.
Error handling edge cases: Confirm the conservative "allow-on-error" strategy is appropriate and won't mask problematic database states.
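
The "allow-on-error" strategy mentioned above can be illustrated with a small wrapper. This is a sketch, not the merged code: `allowOnError` and its `report` parameter are hypothetical, and the real route may log or handle the failure differently.

```typescript
// Runs a "should we save?" decision and fails open: if the decision itself
// throws (for example, the duplicate query errors), the conversation is saved.
async function allowOnError(
  decide: () => Promise<boolean>,
  report: (err: unknown) => void = (err) => console.warn("Filter check failed, allowing save:", err),
): Promise<boolean> {
  try {
    return await decide();
  } catch (err) {
    // Reporting the error keeps a persistent database problem visible
    // instead of silently turning the filter off.
    report(err);
    return true;
  }
}
```

Failing open trades a few unfiltered rows for never losing a meaningful conversation, which is why the reported errors are worth monitoring.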

Possibly related PRs

feat: Add ChatGPT-style streaming responses using SSE for Unio AI assistant #380: Implements streaming response handling and initial save logic for accumulated streamed responses in the same AI route file—this PR adds the filtering gate on top of that save mechanism.

Poem

🐰 A rabbit hops through chatter bold,
Sifting hellos, tales retold,
"Just the meaningful," she declares with cheer,
Filtering whispers, keeping wisdom dear! ✨


📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f01dec4 and 453f3c1.

📒 Files selected for processing (1)

app/api/ai/route.ts (3 hunks)


feat: Implement conversation filtering to prevent saving trivial and …

453f3c1

…repeated messages to AI training data.

codeunia-dev merged commit b7bac7f into main Nov 26, 2025
3 of 4 checks passed
