Implement comprehensive Bedrock documentation scraper #245

@anchapin

Summary

The Bedrock documentation scraper still lacks several critical features: rate limiting, response caching, robots.txt checks, and content type handling. Each of these currently exists only as a TODO comment in the source.

Location

ai-engine/utils/bedrock_docs_scraper.py (multiple TODOs)

Context

Multiple TODO items:

  • Line 23: Consider adding other documentation sources
  • Line 34: Implement actual rate limiting using asyncio.sleep
  • Line 36: Implement caching mechanism with TTL
  • Line 47: Implement robots.txt check before fetching
  • Line 48: Implement actual rate limiting
  • Line 54: Add response to cache
  • Line 129: Handle different content types (JSON, Markdown, etc.)
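For the rate limiting and caching TODOs (lines 34, 36, 48, and 54), a minimal sketch might look like the following. The class names `RateLimiter` and `TTLCache`, their parameters, and the injectable `clock` argument are assumptions made for illustration; they are not taken from bedrock_docs_scraper.py.

```python
import asyncio
import time
from typing import Callable, Dict, Optional, Tuple


class RateLimiter:
    """Hypothetical helper: spaces calls at least `min_interval` seconds apart."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._next_slot = float("-inf")  # earliest time the next request may start

    async def wait(self) -> None:
        now = time.monotonic()
        # Reserve a slot before awaiting, so concurrent callers queue up
        # behind each other instead of all waking at the same moment.
        slot = max(now, self._next_slot)
        self._next_slot = slot + self.min_interval
        if slot > now:
            await asyncio.sleep(slot - now)


class TTLCache:
    """Hypothetical in-memory cache whose entries expire after `ttl` seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._entries[key]
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._entries[key] = (self._clock() + self.ttl, value)
```

In a fetch routine, the cache would presumably be consulted first; on a miss, the scraper would `await limiter.wait()`, perform the request, and store the body with `cache.set(url, body)`. The cache is in-memory only, so a persistent store would need a different design.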

Requirements

  • Add proper rate limiting with asyncio.sleep
  • Implement response caching with TTL
  • Add robots.txt compliance checking
  • Handle multiple content types (HTML, JSON, Markdown)
  • Add additional documentation sources as specified
  • Improve error handling and retry logic
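Three of the requirements above (robots.txt compliance, content type handling, and retry logic) could be sketched with the standard library alone. The function names `is_allowed`, `parse_content`, and `fetch_with_retry` are hypothetical, as is the `fetch` callable passed in; which exceptions count as transient depends on the HTTP client the scraper actually uses, so the `OSError` and `asyncio.TimeoutError` choice here is an assumption.

```python
import asyncio
import json
from html.parser import HTMLParser
from typing import Any, Awaitable, Callable, List
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against already-downloaded robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> bodies."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def parse_content(content_type: str, body: str) -> Any:
    """Dispatch on the media type of a Content-Type header value."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "application/json":
        return json.loads(body)
    if media_type in ("text/markdown", "text/x-markdown", "text/plain"):
        return body  # Markdown is kept as-is
    if media_type in ("text/html", "application/xhtml+xml"):
        extractor = _TextExtractor()
        extractor.feed(body)
        extractor.close()
        return " ".join(extractor.chunks)
    raise ValueError(f"Unsupported content type: {media_type!r}")


async def fetch_with_retry(
    fetch: Callable[[str], Awaitable[str]],
    url: str,
    attempts: int = 3,
    base_delay: float = 0.5,
) -> str:
    """Retry `fetch(url)` with exponential backoff on assumed-transient errors."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return await fetch(url)
        except (OSError, asyncio.TimeoutError):
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            await asyncio.sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

Passing the robots.txt text in as a string keeps the check independent of how the file is downloaded, so the same rate limiting and caching path can be used to fetch it.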

Priority

High - Critical for knowledge base functionality and documentation retrieval

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests