DeepSeekDeepResearch is a Python-based AI research assistant that continuously searches for and extracts relevant information from the web based on a user query. It leverages a local LLM (DeepSeek-R1:7b via Ollama), the Google Custom Search API for search queries, and asynchronous webpage extraction (Newspaper3k or Jina) to generate a comprehensive, detailed report on a given topic.
The final report is saved as final_report.txt in the repository directory upon completion.
- Iterative Research Loop - Continuously refines search queries and gathers context until no additional queries are needed.
- Asynchronous Processing - Performs searches, webpage fetching, and context extraction concurrently for improved speed.
- Local LLM-Powered Decision Making - Uses DeepSeek-R1:7b via Ollama to:
  - Generate precise search queries
  - Evaluate webpage usefulness
  - Extract relevant context from webpages
  - Produce a final comprehensive report
- Google Custom Search Integration - Uses the Google Custom Search JSON API to retrieve relevant links for each generated query.
- Webpage Extraction - Extracts webpage content using Newspaper3k (with Jina as a fallback if needed).
- Duplicate Filtering - Aggregates and deduplicates links across search rounds to ensure efficiency.
- Final Report Generation and Saving - Compiles all extracted information into a detailed final report and saves it as final_report.txt.
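Every LLM-powered step listed above comes down to sending a prompt to the local model. The following is a minimal sketch, assuming Ollama's default local HTTP endpoint (http://localhost:11434/api/generate) with streaming disabled. It is not necessarily how main.py talks to Ollama, and the helper names build_payload, strip_think and call_llm are hypothetical. Depending on the Ollama version, DeepSeek-R1 may emit its reasoning inside <think>...</think> tags in the response text, so the sketch strips that block before returning the answer.

```python
import json
import re
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "deepseek-r1:7b"


def build_payload(prompt: str, model: str = MODEL) -> bytes:
    """Encode a non-streaming /api/generate request body."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")


def strip_think(text: str) -> str:
    """Drop a <think>...</think> reasoning block, keeping only the final answer."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()


def call_llm(prompt: str, timeout: float = 120.0) -> str:
    """Send one prompt to the local model and return its cleaned answer (hypothetical helper)."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_payload(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    # With "stream": false, Ollama returns one JSON object whose
    # "response" field holds the generated text.
    return strip_think(body["response"])
```

The standard library is used here only to keep the sketch dependency-free. The project itself lists aiohttp, which would let these calls run concurrently with the web fetches.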
git clone https://github.com/yourusername/local-deep-researcher.git
cd local-deep-researcher
- Open the main.py file.
- Replace the placeholders for GOOGLE_API_KEY and GOOGLE_CX with your actual Google API Key and Custom Search Engine ID.
- If using Jina for webpage extraction, update JINA_API_KEY and JINA_BASE_URL as needed.
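Instead of editing the placeholders in main.py, you could read the same names from environment variables so the keys never end up in version control. This is a minimal sketch, assuming you adapt main.py to call a hypothetical load_settings() helper. The variable names mirror the placeholders above, and the Jina settings stay optional because Jina is only the fallback extractor.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    google_api_key: str
    google_cx: str
    jina_api_key: Optional[str] = None
    jina_base_url: Optional[str] = None


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read credentials from the environment; fail fast if the Google ones are missing."""
    missing = [name for name in ("GOOGLE_API_KEY", "GOOGLE_CX") if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    return Settings(
        google_api_key=env["GOOGLE_API_KEY"],
        google_cx=env["GOOGLE_CX"],
        # Jina is only used as a fallback extractor, so these may be absent.
        jina_api_key=env.get("JINA_API_KEY"),
        jina_base_url=env.get("JINA_BASE_URL"),
    )
```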
Verify that Ollama is installed on your system and that the following command runs without errors:
ollama run deepseek-r1:7b
Ensure you have Python 3.8+ and install the required dependencies:
pip install nest_asyncio aiohttp tenacity google-api-python-client newspaper3k
Execute the script from your terminal:
python main.py
When prompted, provide:
- Research Query/Topic: Enter a research query or topic (e.g., "What is UAP?").
- Maximum Number of Iterations: Optionally, specify the max iterations (default is 10).
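Here is one way to sketch the two prompts, including the default of 10 iterations. The exact prompt wording in main.py may differ, and parse_max_iterations and prompt_user are hypothetical helpers. The reader function is injectable so the logic can be exercised without a terminal.

```python
from typing import Callable, Tuple

DEFAULT_MAX_ITERATIONS = 10  # default stated in this README


def parse_max_iterations(raw: str, default: int = DEFAULT_MAX_ITERATIONS) -> int:
    """Turn the optional iteration-limit answer into a positive int, else use the default."""
    raw = raw.strip()
    if not raw:
        return default
    try:
        value = int(raw)
    except ValueError:
        return default
    return value if value > 0 else default


def prompt_user(read: Callable[[str], str] = input) -> Tuple[str, int]:
    """Ask for the research topic and an optional iteration limit."""
    query = read("Enter your research query/topic: ").strip()
    limit = parse_max_iterations(read(f"Max iterations (default {DEFAULT_MAX_ITERATIONS}): "))
    return query, limit
```

Pressing Enter at the second prompt, or typing something that is not a positive integer, falls back to 10.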
- Initial Query & Search Generation:
  - The local LLM (DeepSeek-R1:7b via Ollama) generates up to four distinct search queries based on your input.
- Concurrent Search & Extraction:
  - Each query is sent to the Google Custom Search API.
  - The program aggregates and deduplicates the links, then fetches webpage content asynchronously.
- Evaluation & Context Extraction:
  - The LLM evaluates webpage relevance and extracts useful context.
- Iterative Refinement:
  - The LLM determines whether additional search queries are needed.
  - The process repeats until:
    - The iteration limit is reached, or
    - No new queries are generated.
- Final Report Generation & Saving:
  - A detailed final report is compiled, printed to the console, and saved as final_report.txt.
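The overall loop described above can be sketched as follows. This is a minimal illustration, not the code in main.py. The three stage functions are hypothetical async callables passed in as parameters, and one "iteration" is assumed to mean one search-and-extract round.

```python
import asyncio
from typing import Awaitable, Callable, List

# Hypothetical stage signatures; main.py's real helpers may be shaped differently.
GenerateQueries = Callable[[str, List[str]], Awaitable[List[str]]]
GatherContext = Callable[[List[str]], Awaitable[List[str]]]
WriteReport = Callable[[str, List[str]], Awaitable[str]]


async def research(
    user_query: str,
    generate_queries: GenerateQueries,
    gather_context: GatherContext,
    write_report: WriteReport,
    max_iterations: int = 10,
) -> str:
    contexts: List[str] = []
    # Initial round: at most four queries, as described above.
    queries = (await generate_queries(user_query, contexts))[:4]
    for _ in range(max_iterations):
        if not queries:  # the LLM decided no further searching is needed
            break
        contexts.extend(await gather_context(queries))
        # Refinement: ask the LLM for follow-up queries given everything gathered so far.
        queries = await generate_queries(user_query, contexts)
    return await write_report(user_query, contexts)
```

Passing the stages in as parameters keeps the control flow testable with stubs, which is how the two stopping conditions (iteration limit, no new queries) can be checked in isolation.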
- The LLM processes the user’s query and generates up to four distinct search queries.
- Each query is sent concurrently to the Google Custom Search API.
- The returned links are aggregated and deduplicated.
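The concurrent search and deduplication steps can be sketched with asyncio.gather, which runs all queries at once and returns their results in query order. The search function is a hypothetical async callable (for example, a wrapper around the Google call). The shared seen set is an assumption that makes deduplication work across search rounds as well as within one.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, Set


def dedupe_links(link_lists: Iterable[List[str]], seen: Set[str]) -> List[str]:
    """Flatten per-query results, keeping first-seen order and skipping links from earlier rounds."""
    unique: List[str] = []
    for links in link_lists:
        for link in links:
            if link not in seen:
                seen.add(link)
                unique.append(link)
    return unique


async def search_all(
    queries: List[str],
    search: Callable[[str], Awaitable[List[str]]],
    seen: Set[str],
) -> List[str]:
    # Run every query concurrently; gather preserves the order of `queries`.
    results = await asyncio.gather(*(search(q) for q in queries))
    return dedupe_links(results, seen)
```

As written, one failing query fails the whole round. Passing return_exceptions=True to asyncio.gather and filtering out exceptions would make the round tolerant of individual failures.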
For each unique link:
- Content Extraction → Extracted using Newspaper3k (or Jina as a fallback).
- Usefulness Evaluation → The LLM checks if the content is relevant.
- Context Extraction → Extracts key information from relevant pages.
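The per-link steps can be sketched as a small pipeline. Newspaper3k's Article download/parse calls are synchronous, so the sketch runs them in a worker thread to avoid blocking the event loop. The fallback, usefulness check, and context extractor are hypothetical async callables; a Jina-based fetcher would be plugged in as the fallback. Treating empty text as a failure is an assumption of this sketch.

```python
import asyncio
from typing import Awaitable, Callable, Optional

Extractor = Callable[[str], Awaitable[str]]


async def newspaper_extract(url: str) -> str:
    """Primary extractor: run synchronous Newspaper3k in a worker thread."""
    from newspaper import Article  # imported lazily so the sketch loads without newspaper3k

    def _work() -> str:
        article = Article(url)
        article.download()
        article.parse()
        return article.text

    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, _work)


async def extract_with_fallback(url: str, primary: Extractor, fallback: Optional[Extractor] = None) -> str:
    """Try the primary extractor; use the fallback if it raises or returns empty text."""
    try:
        text = await primary(url)
        if text and text.strip():
            return text
    except Exception:
        pass  # fall through to the fallback extractor
    if fallback is None:
        return ""
    try:
        return await fallback(url)
    except Exception:
        return ""


async def process_link(
    url: str,
    primary: Extractor,
    fallback: Optional[Extractor],
    is_useful: Callable[[str, str], Awaitable[bool]],
    extract_context: Callable[[str, str], Awaitable[str]],
) -> Optional[str]:
    """Extract, then evaluate usefulness, then pull out context (None if the page is skipped)."""
    text = await extract_with_fallback(url, primary, fallback)
    if not text or not await is_useful(url, text):
        return None
    return await extract_context(url, text)
```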
- The LLM reviews gathered context and decides if additional searches are needed.
- If required, new search queries are generated; otherwise, the loop terminates.
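The README does not specify how the model's decision is encoded. As a hypothetical convention, assume the model is asked to reply with a Python-style list of new queries, and with an empty list when it is done. Under that assumption, a parser could look like the sketch below. It discards anything before a closing </think> tag, and the cap of four queries per round is an assumption carried over from the initial step.

```python
import ast
import re
from typing import List


def parse_new_queries(llm_output: str, limit: int = 4) -> List[str]:
    """Return follow-up queries from the model's reply; an empty list ends the loop."""
    # DeepSeek-R1 may put its reasoning before a closing </think> tag; keep only the answer.
    answer = llm_output.split("</think>")[-1]
    match = re.search(r"\[.*\]", answer, flags=re.DOTALL)
    if not match:
        return []
    try:
        parsed = ast.literal_eval(match.group(0))
    except (ValueError, SyntaxError, TypeError):
        return []
    if not isinstance(parsed, list):
        return []
    queries = [q.strip() for q in parsed if isinstance(q, str) and q.strip()]
    return queries[:limit]
```

ast.literal_eval only accepts literals, so a malformed or malicious reply cannot execute code. It simply parses to nothing, which the loop treats as "no more searches".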
- All relevant information is compiled and passed to the LLM.
- The LLM generates a final comprehensive research report.
- The report is printed to the console and saved as final_report.txt.
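Saving the report can be as simple as the sketch below (save_report is a hypothetical helper). The encoding is stated explicitly because, without it, Python writes text files in the platform's locale encoding. On Windows this may be a legacy code page that cannot represent every character an LLM produces.

```python
from pathlib import Path
from typing import Union


def save_report(report: str, path: Union[str, Path] = "final_report.txt") -> Path:
    """Print the report and write it as UTF-8; return the absolute path of the file."""
    print(report)
    out = Path(path)
    out.write_text(report, encoding="utf-8")
    return out.resolve()
```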
- If you see a PSReadline warning on Windows, you can safely ignore it.
- If you encounter:
RuntimeError: asyncio.run() cannot be called from a running event loop
  - Ensure nest_asyncio is applied at the start of the script:
import nest_asyncio
nest_asyncio.apply()
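This error appears when asyncio.run() is called while an event loop is already running, as in Jupyter or some IDE consoles. As an alternative to applying nest_asyncio unconditionally, a hypothetical run() wrapper can apply it only in that situation and use plain asyncio.run() otherwise.

```python
import asyncio
from typing import Any, Coroutine


def run(coro: Coroutine[Any, Any, Any]) -> Any:
    """Run a coroutine from a plain script or from an already-running loop (e.g. Jupyter)."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running (normal `python main.py`): asyncio.run is enough.
        return asyncio.run(coro)
    # A loop is already running, which is what makes asyncio.run raise;
    # nest_asyncio patches asyncio so the call can be re-entered.
    import nest_asyncio

    nest_asyncio.apply()
    return asyncio.run(coro)
```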
- Google API Errors: Verify that your Google API Key and Custom Search Engine ID are correct.
- Quota Limits: Check if you've exceeded your Google API quota.
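When diagnosing these errors, it helps to surface the HTTP status. The sketch below uses google-api-python-client's customsearch v1 service, which returns at most 10 results per request. The status-code mapping is an assumption based on commonly seen responses (400 for an invalid key or engine ID, 403 or 429 for quota or API-access problems), so check the full error body when in doubt. google_search and extract_links are hypothetical helpers.

```python
from typing import Any, Dict, List


def extract_links(response: Dict[str, Any]) -> List[str]:
    """Pull result URLs from a Custom Search JSON response ("items" is absent when there are no hits)."""
    return [item["link"] for item in response.get("items", []) if "link" in item]


def google_search(query: str, api_key: str, cx: str, num: int = 10) -> List[str]:
    """Query the Custom Search JSON API and turn common failures into readable messages."""
    from googleapiclient.discovery import build
    from googleapiclient.errors import HttpError

    service = build("customsearch", "v1", developerKey=api_key, cache_discovery=False)
    try:
        response = service.cse().list(q=query, cx=cx, num=num).execute()
    except HttpError as err:
        status = err.resp.status
        if status == 400:
            raise RuntimeError("Invalid request: check GOOGLE_API_KEY and GOOGLE_CX") from err
        if status in (403, 429):
            raise RuntimeError(
                "Request rejected: check your quota and that the Custom Search API is enabled"
            ) from err
        raise
    return extract_links(response)
```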
- If the local LLM call fails:
  - Ensure Ollama is installed correctly.
  - Check that running the following command works without errors in your terminal:
ollama run deepseek-r1:7b
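To check from Python that the Ollama server is up and the model has been pulled, the sketch below queries Ollama's /api/tags endpoint, which lists locally available models. It assumes the default port 11434. check_ollama and model_available are hypothetical helpers, while ollama serve and ollama pull are standard Ollama CLI commands.

```python
import json
import urllib.request
from typing import Any, Dict


def model_available(tags: Dict[str, Any], model: str = "deepseek-r1:7b") -> bool:
    """True if the /api/tags payload lists the model by name (e.g. 'deepseek-r1:7b')."""
    return any(m.get("name") == model for m in tags.get("models", []))


def check_ollama(base_url: str = "http://localhost:11434", model: str = "deepseek-r1:7b") -> str:
    """Return a human-readable diagnosis of the local Ollama setup."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
            tags = json.loads(resp.read().decode("utf-8"))
    except OSError:  # URLError, refused connections and timeouts are all OSError subclasses
        return "Ollama server not reachable; start it (e.g. `ollama serve`) and retry."
    if not model_available(tags, model):
        return f"Model {model} not found; pull it with `ollama pull {model}`."
    return "Ollama is running and the model is available."
```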
This project is licensed under the MIT License. See LICENSE for details.
We welcome contributions! Feel free to fork, submit issues, or contribute to the project.