
Prevent Redundant API Calls via Query/URL Deduplication #12

Open
thanay-sisir wants to merge 5 commits into Pokee-AI:main from thanay-sisir:URL_deduplication

Conversation

@thanay-sisir

🚀 Query/URL Deduplication Feature

🎯 Executive Summary

I implemented a session-level deduplication mechanism to prevent redundant web searches and URL reads within the BaseDeepResearchAgent. This change significantly improves research performance, reduces unnecessary API costs, and speeds up response times.


⚠️ The Problem (Why)

  • Redundant Tool Calls: In multi-turn research, the agent frequently made the same search or read the same URL multiple times.
  • High Operational Cost: This resulted in wasted processing time, increased latency, and unnecessarily high usage costs for search and web scraping APIs.
  • Poor User Experience: The inefficiency slowed down the overall research loop, leading to longer wait times for the user.

🛠️ The Solution (How)

I introduced session-scoped state tracking using two Python set() objects: _seen_queries and _seen_urls.

  • Tracking: These sets are initialized at the start of a research session and automatically cleaned up afterward.
  • Execution Filter: Before executing any web_search or web_read tool call, I filter the input list (query_list or url_list) to remove any items already present in the respective tracking set.
  • Cache Mimicry: If the input list is entirely composed of duplicates, the tool call is skipped entirely, and a mock response is returned immediately to maintain consistent response format without incurring an API cost or latency penalty.
  • Proactive Update: The sets are updated with the remaining unique items before the tool executes, ensuring that future requests recognize them as already seen.
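A minimal sketch of the filtering logic described above. The set names `_seen_queries` and `_seen_urls` come from this PR description; the class and method names, and the `None` return used to signal "all duplicates, return a mock response", are illustrative assumptions, not the actual diff:

```python
class QueryUrlDeduper:
    """Sketch of session-scoped dedup for web_search / web_read tool calls."""

    def start_session(self) -> None:
        # Fresh tracking sets at the start of each research session.
        self._seen_queries: set = set()
        self._seen_urls: set = set()

    def filter_new(self, items: list, seen: set):
        # Keep only items not yet seen this session (O(1) set membership).
        unique = [x for x in items if x not in seen]
        if not unique:
            # Input is entirely duplicates: caller should skip the API call
            # and return a mock response in the usual format instead.
            return None
        # Proactive update: mark items as seen before the tool actually runs.
        seen.update(unique)
        return unique
```

For example, after filtering `["a", "b"]`, a later call with `["a", "c"]` yields only `["c"]`, and a call with `["a"]` alone returns `None`, signaling the mock-response path.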

✅ Key Benefits

  1. ⚡ Performance & Cost Savings: Eliminates redundant network and API calls, leading to reduced latency and significantly lower operational costs.
  2. 🚀 Faster Responses: The agent focuses on processing new information, making the overall research loops much faster for the end-user.
  3. 🔬 Clean Implementation: The change is non-invasive, adding only about 45 lines of focused code with no new dependencies, ensuring full backward compatibility.
  4. 🔒 Reliability: The session-scoped tracking prevents state leakage between different research sessions and uses efficient $O(1)$ set lookups for optimal performance.
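The session-scoped lifecycle in point 4 might look like the sketch below. The wrapper name `run_session` and the try/finally teardown are assumptions for illustration; the PR only states that the sets are initialized at session start and cleaned up afterward:

```python
class SessionScopedAgent:
    """Illustrates session-scoped state: dedup sets live for one session only."""

    def run_session(self, do_research):
        # Initialize fresh tracking sets for this session.
        self._seen_queries = set()
        self._seen_urls = set()
        try:
            return do_research(self)
        finally:
            # Tear down tracking state so nothing leaks into the next session.
            del self._seen_queries
            del self._seen_urls
```

Deleting the attributes in `finally` guarantees cleanup even if the research loop raises, which is what prevents state leakage between consecutive sessions.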
