feat: Implement conversation filtering to prevent saving trivial and repeated messages to AI training data. #382
This PR adds filtering logic to the AI route so that low-value or redundant conversations are not saved to the `ai_training_data` table. The goal is cleaner, more meaningful training data for analytics and future AI improvements.
✅ Changes Included
1. Added `shouldSaveConversation` function (after line 83)
A new utility function that determines whether a conversation should be saved, based on:
Simple Greetings
Filtered if message matches (case-insensitive):
hi, hello, hey, hii, hiii, sup, yo, hai, helo, hllo
Also matches variants with punctuation.
Casual Chit-Chat
Filtered if:
Message shorter than 10 characters
Contains common low-value phrases:
thanks, ok, okay, cool, nice, lol, haha
Repeated Questions
Filtered if an entry with the same `user_id` and the same `query_text` already exists in the DB from the last 24 hours.
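The 24-hour duplicate rule reduces to a timestamp comparison. The minimal sketch below checks it against an in-memory list of rows instead of the database; the `TrainingRow` shape, its `created_at` column, and the `isRecentDuplicate` name are assumptions, not the actual `ai_training_data` schema or the PR's code.

```typescript
// Hypothetical row shape; the real ai_training_data columns may differ
// (created_at in particular is an assumption).
interface TrainingRow {
  user_id: string;
  query_text: string;
  created_at: Date;
}

const DUPLICATE_WINDOW_MS = 24 * 60 * 60 * 1000; // 24 hours in milliseconds

// True if the same user already sent exactly the same query_text
// at or after (now - 24h).
function isRecentDuplicate(
  rows: TrainingRow[],
  userId: string,
  queryText: string,
  now: Date = new Date(),
): boolean {
  const cutoff = now.getTime() - DUPLICATE_WINDOW_MS;
  return rows.some(
    (row) =>
      row.user_id === userId &&
      row.query_text === queryText && // exact string match, case-sensitive
      row.created_at.getTime() >= cutoff,
  );
}
```

As written, the comparison is an exact string match, so "What is Supabase?" and "what is supabase" count as different questions. Against Supabase, the same condition would presumably be expressed as a query with `.eq()` filters on `user_id` and `query_text` plus a `.gte()` filter on the timestamp column, so only an existence check crosses the network.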
Function Behavior
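The three rules above can be combined into one predicate that returns `false` when a conversation should be skipped. This is a minimal sketch, not the PR's implementation: the normalization, the `RecentDuplicateLookup` callback, and the reading of the two chit-chat conditions as both required (the PR text does not say whether they combine with AND or OR) are all assumptions.

```typescript
// Hypothetical lookup: resolves true when the same user sent the same
// query_text within the last 24 hours (in the real route, a DB query).
type RecentDuplicateLookup = (userId: string, queryText: string) => Promise<boolean>;

// Word lists taken from this PR description.
const GREETINGS = new Set(["hi", "hello", "hey", "hii", "hiii", "sup", "yo", "hai", "helo", "hllo"]);
const LOW_VALUE_WORDS = new Set(["thanks", "ok", "okay", "cool", "nice", "lol", "haha"]);

function normalize(message: string): string {
  // Lowercase and drop punctuation so "Hi!" and "ok." match the lists.
  return message.trim().toLowerCase().replace(/[^a-z0-9\s]/g, "").trim();
}

async function shouldSaveConversation(
  userId: string,
  queryText: string,
  hasRecentDuplicate: RecentDuplicateLookup,
): Promise<boolean> {
  const normalized = normalize(queryText);

  // Rule 1: a bare greeting is never saved.
  if (GREETINGS.has(normalized)) return false;

  // Rule 2 (assumed AND): a short message that contains a low-value word.
  const isShort = queryText.trim().length < 10;
  const hasLowValueWord = normalized.split(/\s+/).some((word) => LOW_VALUE_WORDS.has(word));
  if (isShort && hasLowValueWord) return false;

  // Rule 3: only reached for non-trivial messages, so greetings and
  // chit-chat never trigger the duplicate lookup.
  if (await hasRecentDuplicate(userId, queryText)) return false;

  return true;
}
```

Running the cheap string checks before the duplicate lookup means trivial messages never cost a database round trip; whether the actual function orders its checks this way is not stated in the PR.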
2. Streaming Mode Update (lines 1089–1097)
Wrapped the DB save logic in a conditional check using `shouldSaveConversation`.
Error handling is unchanged.
Low-value messages are now skipped in streaming mode.
3. Non-Streaming Mode Update (lines 1170–1178)
Same conditional filter applied.
Keeps current error handling.
Keeps streaming and non-streaming behavior consistent.
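Because both modes apply the same filter, one option is to route both call sites through a single guarded helper. This is a sketch under stated assumptions: `saveTrainingDataIfValuable`, the injected `shouldSave` and `save` functions, and the `response_text` column are hypothetical, and logging the error stands in for whatever handling the route already has.

```typescript
// Hypothetical signatures standing in for the route's real helpers.
type ShouldSave = (userId: string, queryText: string) => Promise<boolean>;
type SaveRow = (row: { user_id: string; query_text: string; response_text: string }) => Promise<void>;

// Returns true only if a row was actually written.
async function saveTrainingDataIfValuable(
  userId: string,
  queryText: string,
  responseText: string,
  shouldSave: ShouldSave,
  save: SaveRow,
): Promise<boolean> {
  // Skip the write entirely for greetings, chit-chat, and recent duplicates.
  if (!(await shouldSave(userId, queryText))) return false;
  try {
    await save({ user_id: userId, query_text: queryText, response_text: responseText });
    return true;
  } catch (err) {
    // A failed training-data write is logged rather than rethrown,
    // so the API response to the user is unaffected.
    console.error("Failed to save training data:", err);
    return false;
  }
}
```

With one helper, the streaming and non-streaming paths cannot drift apart; the streaming path would presumably call it only once the full response text is available.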
🧪 Verification Plan
Manual Testing
Simple Greetings
Send messages like:
hi, hello, hey
Expected: not stored
Casual Chit-Chat
Send:
thanks, ok, cool
Expected: not stored
Repeated Questions
Send the same question twice within 24 hours:
Message 1: "What is Supabase?" → stored
Message 2: "What is Supabase?" → not stored
Valuable Conversation
Send a multi-sentence or technical question.
Expected: stored
DB Verification
Check that only appropriate entries appear in `ai_training_data`.
Confirm that duplicate filtering respects the 24-hour window.
🔍 Notes
No changes to API output
Only database write behavior is modified
Should improve the quality of AI training data by excluding greetings, chit-chat, and recent duplicates
Authored by: @akshay0611