@codeunia-dev codeunia-dev commented Nov 26, 2025

This PR adds filtering logic to the AI route to prevent low-value or redundant conversations from being saved in the ai_training_data table.
This ensures cleaner, more meaningful training data for analytics and future AI improvements.


✅ Changes Included

1. Added shouldSaveConversation Function (after line 83)

A new utility function determines whether a conversation should be saved based on:

Simple Greetings

Filtered if message matches (case-insensitive):
hi, hello, hey, hii, hiii, sup, yo, hai, helo, hllo
Also matches versions with punctuation.
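The greeting rule above could be sketched as a single anchored, case-insensitive pattern. This is a minimal illustration, not the code from the PR: the names `GREETINGS`, `GREETING_PATTERN`, and `isSimpleGreeting` are hypothetical, and the set of punctuation characters accepted is an assumption.

```typescript
// Greeting list copied from the PR description.
const GREETINGS = ["hi", "hello", "hey", "hii", "hiii", "sup", "yo", "hai", "helo", "hllo"];

// Anchored at both ends so only messages that are *just* a greeting match.
// Assumption: "versions with punctuation" means trailing ! ? . , and whitespace.
const GREETING_PATTERN = new RegExp(`^\\s*(?:${GREETINGS.join("|")})[\\s!?.,]*$`, "i");

// Hypothetical helper: true when the whole message is a simple greeting.
function isSimpleGreeting(message: string): boolean {
  return GREETING_PATTERN.test(message);
}
```

Because the pattern is anchored, a message such as "hi, how do I deploy?" is not treated as a greeting.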

Casual Chit-Chat

Filtered if:

  • Message shorter than 10 characters

  • Contains common low-value phrases:
    thanks, ok, okay, cool, nice, lol, haha
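A rough sketch of the chit-chat rule follows. The description does not say whether the two conditions are combined with AND or OR, nor whether phrases match as substrings or whole words. This sketch assumes either condition filters the message and uses whole-word matching. All names here are hypothetical.

```typescript
// Threshold and phrase list copied from the PR description.
const MIN_MESSAGE_LENGTH = 10;
const LOW_VALUE_PHRASES = ["thanks", "ok", "okay", "cool", "nice", "lol", "haha"];

// Assumption: whole-word matching, so "ok" does not fire inside "token" or "book".
const LOW_VALUE_PATTERN = new RegExp(`\\b(?:${LOW_VALUE_PHRASES.join("|")})\\b`, "i");

// Hypothetical helper: true when the message looks like casual chit-chat.
function isCasualChitChat(message: string): boolean {
  const trimmed = message.trim();
  if (trimmed.length < MIN_MESSAGE_LENGTH) {
    return true; // too short to be useful
  }
  return LOW_VALUE_PATTERN.test(trimmed);
}
```

Under this either-condition reading, a long message that merely contains "nice" or "thanks" is also filtered, so the phrase check may be worth restricting to short messages if over-filtering is a concern.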

Repeated Questions

Filtered if:

  • Same user_id

  • Same query_text

  • Exists in DB within last 24 hours
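The three duplicate conditions can be illustrated with a pure function over already-fetched rows. This is only a sketch of the matching rule: the real check presumably runs as a database query, and the `created_at` column name, the `TrainingRow` shape, and the function name are assumptions.

```typescript
// Hypothetical row shape; user_id and query_text come from the PR description,
// created_at is an assumed timestamp column.
interface TrainingRow {
  user_id: string;
  query_text: string;
  created_at: string; // ISO 8601 timestamp
}

const DUPLICATE_WINDOW_MS = 24 * 60 * 60 * 1000;

// True when the same user already asked the exact same question within the window.
function isRepeatedQuestion(
  rows: TrainingRow[],
  userId: string,
  queryText: string,
  now: Date = new Date(),
): boolean {
  const cutoff = now.getTime() - DUPLICATE_WINDOW_MS;
  return rows.some(
    (row) =>
      row.user_id === userId &&
      row.query_text === queryText &&
      Date.parse(row.created_at) >= cutoff,
  );
}
```

Note that the comparison is an exact string match, so "What is Supabase?" and "what is supabase" would count as different questions in this sketch.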

Function Behavior

| Condition | Save? |
| --- | --- |
| Greeting / Chit-chat | ❌ No |
| Duplicate (24h) | ❌ No |
| Meaningful message | ✅ Yes |

2. Streaming Mode Update (lines 1089–1097)

  • Wrapped DB save logic in:

    if (await shouldSaveConversation(...)) { ... }
    
  • Error handling unchanged.

  • Low-value messages are now skipped in streaming mode.


3. Non-Streaming Mode Update (lines 1170–1178)

  • Same conditional filter applied.

  • Keeps current error handling.

  • Ensures consistency between streaming + non-streaming behaviors.
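Since both paths apply the same guard, one way to keep them consistent is a shared helper that each path calls. The sketch below is an assumption about structure, not the code in `app/api/ai/route.ts`. The record shape, `saveIfWorthKeeping`, and the try/catch around the save are hypothetical, and the filter and save functions are injected so the example runs without a database.

```typescript
// Hypothetical record shape; the real ai_training_data columns may differ.
interface ConversationRecord {
  userId: string;
  message: string;
  response: string;
}

type SaveOutcome = "saved" | "filtered" | "failed";

// Runs the filter first and only then attempts the database write.
async function saveIfWorthKeeping(
  record: ConversationRecord,
  shouldSave: (message: string, userId: string) => Promise<boolean>,
  save: (record: ConversationRecord) => Promise<void>,
): Promise<SaveOutcome> {
  if (!(await shouldSave(record.message, record.userId))) {
    console.log("Filtered out low-value conversation");
    return "filtered";
  }
  try {
    await save(record);
    return "saved";
  } catch (err) {
    // Assumed error handling: log and continue so the API response is unaffected.
    console.error("Failed to save training data:", err);
    return "failed";
  }
}
```

In the route, the streaming and non-streaming branches would each pass `shouldSaveConversation` as `shouldSave`, so a change to the filter applies to both.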


🧪 Verification Plan

Manual Testing

Simple Greetings

Send messages like:

  • hi

  • hello

  • hey

Expected: Not stored

Casual Chit-Chat

Send:

  • thanks

  • ok

  • cool

Expected: Not stored

Repeated Questions

Send the same question twice within 24 hours:

Message 1: "What is Supabase?" → stored
Message 2: "What is Supabase?" → not stored

Valuable Conversation

Send a multi-sentence or technical question.
Expected: stored

DB Verification

  • Check that only appropriate entries appear in ai_training_data

  • Confirm duplicate filtering respects 24-hour window
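For the DB verification step, a small script over exported rows can flag any pair that breaks the 24-hour rule. This is a hedged sketch: it assumes rows have been exported with `id`, `user_id`, `query_text`, and `created_at` fields, and the function name is hypothetical.

```typescript
// Hypothetical shape of a row exported from ai_training_data.
interface ExportedRow {
  id: number;
  user_id: string;
  query_text: string;
  created_at: string; // ISO 8601 timestamp (assumed column)
}

const VERIFY_WINDOW_MS = 24 * 60 * 60 * 1000;

// Returns pairs of rows with the same user and question stored within 24 hours of each other.
function findDuplicateViolations(rows: ExportedRow[]): Array<[ExportedRow, ExportedRow]> {
  const sorted = [...rows].sort(
    (a, b) => Date.parse(a.created_at) - Date.parse(b.created_at),
  );
  const lastSeen = new Map<string, ExportedRow>();
  const violations: Array<[ExportedRow, ExportedRow]> = [];
  for (const row of sorted) {
    const key = JSON.stringify([row.user_id, row.query_text]);
    const previous = lastSeen.get(key);
    if (
      previous !== undefined &&
      Date.parse(row.created_at) - Date.parse(previous.created_at) <= VERIFY_WINDOW_MS
    ) {
      violations.push([previous, row]);
    }
    lastSeen.set(key, row);
  }
  return violations;
}
```

An empty result on rows written after the merge would be consistent with the duplicate filter working. Rows saved before the merge may legitimately appear as violations.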


🔍 Notes

  • No changes to API output

  • Only database write behavior modified

  • Intended to improve the quality of AI training data by keeping greetings, chit-chat, and repeated questions out of ai_training_data


Authored by: @akshay0611

Summary by CodeRabbit

  • New Features
    • Added intelligent conversation filtering for AI training data that excludes simple greetings, very brief messages, and duplicate questions from the same user within 24 hours, improving training data quality for better AI responses.


vercel bot commented Nov 26, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| codeunia | Ready | Preview | Comment | Nov 26, 2025 9:45am |

@codeunia-dev codeunia-dev merged commit b7bac7f into main Nov 26, 2025
3 of 4 checks passed

coderabbitai bot commented Nov 26, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

Introduced conversation-filtering logic to prevent saving trivial messages, greetings, and duplicate questions to the training database in the AI route handler. The filter checks message content, length, and applies duplicate detection via database query before persisting conversations.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Conversation Filtering Logic: app/api/ai/route.ts | Added shouldSaveConversation(message, userId) function that filters greetings, short/casual messages, and duplicate questions from the last 24 hours. Integrated conditional save logic into both streaming and non-streaming response paths, with fallback to allow saving on query errors. Added filtering outcome logging. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Handler as AI Route Handler
    participant AI as AI Service
    participant DB as Database
    
    Client->>Handler: POST /api/ai
    Handler->>AI: Process message
    AI-->>Handler: Response
    Handler->>Handler: shouldSaveConversation(message, userId)
    
    rect rgb(200, 220, 255)
    Note over Handler: Filter Checks
    Handler->>Handler: 1. Check for greetings
    Handler->>Handler: 2. Check for short/casual
    Handler->>DB: 3. Query duplicates (24h)
    DB-->>Handler: Duplicate check result
    end
    
    alt Filter Passes
        Handler->>DB: Save to ai_training_data
        DB-->>Handler: Success
        Handler->>Handler: Log: Saved
    else Filter Fails
        Handler->>Handler: Log: Filtered out
    end
    
    Handler-->>Client: Return response

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Database query performance: The 24-hour duplicate detection query should be reviewed for indexing and potential performance implications under load.
  • Filter logic correctness: Verify that the greeting list and casual-phrase detection are comprehensive enough and don't over-filter valid conversations.
  • Error handling edge cases: Confirm the conservative "allow-on-error" strategy is appropriate and won't mask problematic database states.
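The "allow-on-error" strategy mentioned in the last point can be sketched as follows. This is an illustration of the pattern only, not the merged code: `DuplicateLookup` and `passesDuplicateCheck` are hypothetical names, and the lookup is injected so the example does not depend on any particular database client.

```typescript
// Hypothetical lookup: resolves true when a matching row exists in the last 24 hours.
type DuplicateLookup = (userId: string, queryText: string) => Promise<boolean>;

// Returns true when the conversation may be saved.
// If the lookup itself fails, the conversation is allowed through rather than dropped.
async function passesDuplicateCheck(
  userId: string,
  queryText: string,
  lookup: DuplicateLookup,
): Promise<boolean> {
  try {
    const isDuplicate = await lookup(userId, queryText);
    return !isDuplicate;
  } catch (err) {
    console.warn("Duplicate check failed; allowing save:", err);
    return true;
  }
}
```

The trade-off is that a persistently failing query silently disables duplicate filtering, so the warning log is the only signal that something is wrong.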

Poem

🐰 A rabbit hops through chatter bold,
Sifting hellos, tales retold,
"Just the meaningful," she declares with cheer,
Filtering whispers, keeping wisdom dear! ✨

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f01dec4 and 453f3c1.

📒 Files selected for processing (1)
  • app/api/ai/route.ts (3 hunks)
