Content Generation Revamp#135

Open
rasulkireev wants to merge 3 commits into main from content-generation

Conversation

@rasulkireev
Owner

@rasulkireev rasulkireev commented Dec 27, 2025

Summary by CodeRabbit

Release Notes

  • New Features

    • Added comprehensive blog post content generation pipeline with automated outline and research workflows.
    • Introduced research functionality including question generation, link discovery, and content analysis.
    • Added research process interface for viewing generated content and research artifacts.
    • Integrated Exa search capability for discovering relevant research sources.
  • Documentation

    • Updated README with technical architecture overview and content generation workflow details.
    • Updated environment configuration to reflect new API requirements.
  • Chores

    • Added Exa API library dependency.
    • Enhanced database schema to support content generation tracking and research data storage.

@coderabbitai
Contributor

coderabbitai bot commented Dec 27, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

This PR introduces a complete end-to-end blog post content generation pipeline. It adds AI agents for outline and research question generation, a centralized orchestration pipeline with Django models for tracking generated content, async task handlers for queue-based execution, integration with Exa search API, and frontend views to visualize the research and generation process.

Changes

Cohort / File(s) Summary
AI Agent Modules
core/agents/blog_post_outline_agent.py, core/agents/research_link_summary_agent.py, core/agents/generate_blog_post_section_content_agent.py, core/agents/generate_blog_post_intro_conclusion_agent.py
Introduces four new factory functions to create specialized AI agents: outline generation (section titles), research link summarization (general/contextual/analysis), section content synthesis, and intro/conclusion generation. Each agent is configured with system prompts, retries, temperature, and context handlers for rich prompt composition.
Schemas & Data Models
core/agents/schemas.py, core/models.py, content_generation/models.py
Adds comprehensive Pydantic models for structured blog generation: TextSummary, ResearchLinkAnalysis, ResearchLinkContextualSummaryContext, BlogPostSectionContentGenerationContext, GeneratedBlogPostIntroConclusionSchema. Introduces Django ORM models for GeneratedBlogPost, GeneratedBlogPostSection, GeneratedBlogPostResearchQuestion, GeneratedBlogPostResearchLink, and Backlink tracking.
Pipeline & Task Orchestration
core/content_generator/pipeline.py, core/content_generator/tasks.py, core/content_generator/utils.py, core/content_generator/__init__.py
Core orchestration: pipeline.py implements full blog generation workflow (outline creation, research discovery via Exa, link scraping/analysis, section content synthesis, intro/conclusion generation, markdown assembly). tasks.py wraps pipeline functions as seven async Django-Q tasks. utils.py provides date range calculation for Exa queries.
Frontend Views & Routes
core/views.py, core/urls.py
Adds BlogPostResearchProcessView (DetailView) to fetch and display generated blog posts with sections, research questions, and links. Registers new URL route at /project/<int>/title-suggestion/<int>/research/. Computes internal/external links and keywords for context rendering.
Frontend Templates
frontend/templates/blog/blog_post_research_process.html, frontend/templates/blog/generated_blog_post_detail.html, frontend/templates/components/blog_post_suggestion_card.html
Adds comprehensive research process template with nested panels for title suggestion, derived inputs, and generated blog posts with collapsible sections/questions/links. Updates post detail and suggestion card templates to link to research process view.
Configuration & App Setup
tuxseo/settings.py, content_generation/apps.py, content_generation/admin.py, content_generation/views.py, content_generation/tests.py
Registers new content_generation Django app, adds EXA_API_KEY environment variable, and scaffolds app structure with placeholder files.
Dependencies & Documentation
pyproject.toml, requirements.txt, README.md
Adds exa-py dependency (^2.0.2) for Exa search integration. Updates all dependency versions in requirements.txt. Adds Technical Details section to README with Content Generation Pipeline diagram.
Utilities & Inspection
snippets/inspect_blog_post_title_suggestion.py
Adds interactive script to inspect BlogPostTitleSuggestion and traverse related generated blog posts, sections, research questions, and links with formatted console output.
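
The pipeline summary above ends with a markdown-assembly step (populate_generated_blog_post_content in the diagram below). The PR's implementation is not shown on this page, so the following is only a minimal sketch under stated assumptions: `SectionDraft` and `assemble_blog_post_markdown` are hypothetical names, middle sections are assumed to carry an integer `order`, the intro is assumed to have no heading of its own, and the conclusion is assumed to get a literal `## Conclusion` heading.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SectionDraft:
    """Hypothetical stand-in for a generated middle section (not the PR's model)."""

    order: int
    title: str
    content: str


def assemble_blog_post_markdown(
    *, title: str, intro: str, middle_sections: list[SectionDraft], conclusion: str
) -> str:
    """Join intro, ordered middle sections, and conclusion into one markdown document."""
    markdown_parts = [f"# {title}", intro.strip()]
    # Sort by the assumed `order` field so task completion order does not affect layout.
    for section in sorted(middle_sections, key=lambda draft: draft.order):
        markdown_parts.append(f"## {section.title}\n\n{section.content.strip()}")
    markdown_parts.append(f"## Conclusion\n\n{conclusion.strip()}")
    # Drop empty parts (e.g. a blank intro) so no stray empty blocks appear.
    return "\n\n".join(part for part in markdown_parts if part) + "\n"
```

Sorting at assembly time means the section tasks can finish in any order; only the stored order value decides where each section lands.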

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Frontend as Frontend<br/>View
    participant Pipeline as Content Gen<br/>Pipeline
    participant Agents as AI Agents
    participant ExaAPI as Exa API
    participant DB as Database
    participant LinkScraper as Link Scraper
    participant Tasks as Django-Q<br/>Tasks

    User->>Frontend: Request blog generation
    Frontend->>Pipeline: init_blog_post_content_generation(title)
    activate Pipeline
    Pipeline->>Agents: create_blog_post_outline_agent()
    Agents-->>Pipeline: outline (section titles)
    Pipeline->>DB: Create GeneratedBlogPost + Sections
    Pipeline->>DB: Create GeneratedBlogPostIntroConcluding scaffolds
    Pipeline->>Tasks: queue_research_question_generation_for_sections()
    deactivate Pipeline

    Tasks->>Agents: create_blog_post_section_research_questions_agent()
    Agents-->>Tasks: research questions per section
    Tasks->>DB: Create GeneratedBlogPostResearchQuestion records
    Tasks->>Tasks: queue populate_research_links_task for each question

    par Research Link Discovery
        Tasks->>ExaAPI: Search with question + keywords
        ExaAPI-->>Tasks: Search results (URLs, titles, metadata)
        Tasks->>DB: Create GeneratedBlogPostResearchLink records
        Tasks->>Tasks: queue scrape_research_link_content_task
    and Section Content Synthesis (when ready)
        Tasks->>LinkScraper: Fetch + scrape HTML content for each link
        LinkScraper-->>DB: Update GeneratedBlogPostResearchLink.content
        Tasks->>Tasks: queue analyze_research_link_content_task
    end

    Tasks->>Agents: create_research_link_analysis_agent()
    Agents-->>Tasks: general_summary, contextual_summary, answer_to_question
    Tasks->>DB: Update GeneratedBlogPostResearchLink analyses

    opt When all questions answered
        Tasks->>Agents: create_generate_blog_post_section_content_agent()
        Agents->>DB: Fetch prior sections + research answers
        Agents-->>Tasks: Section content (formatted markdown)
        Tasks->>DB: Update GeneratedBlogPostSection.content
    end

    opt When all middle sections complete
        Tasks->>Agents: create_generate_blog_post_intro_conclusion_agent()
        Agents->>DB: Fetch all sections + outline
        Agents-->>Tasks: Intro + Conclusion content
        Tasks->>DB: Update GeneratedBlogPostSection (intro/conclusion)
    end

    Tasks->>Pipeline: populate_generated_blog_post_content()
    Pipeline->>DB: Assemble markdown from all sections
    Pipeline->>DB: Update GeneratedBlogPost.content (finalize)
    Tasks-->>Frontend: Workflow complete

    User->>Frontend: View research process
    Frontend->>DB: Fetch GeneratedBlogPost + all relations
    DB-->>Frontend: Blog post data with sections/questions/links
    Frontend-->>User: Render Research Process view
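
The two `opt` blocks in this diagram gate later stages on earlier ones: section content waits until all of a section's research questions are answered, and the intro/conclusion wait until every middle section has content. The sketch below expresses those two checks over plain dataclasses. It is not the PR's code: `ResearchQuestionState`, `SectionState`, and both check functions are hypothetical, and treating a question as "answered" once it has at least one analyzed link is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ResearchQuestionState:
    """Hypothetical snapshot of one research question."""

    question_text: str
    analyzed_link_count: int = 0  # links whose analysis has been stored


@dataclass
class SectionState:
    """Hypothetical snapshot of one outline section."""

    title: str
    is_intro_or_conclusion: bool = False
    content: str = ""
    research_questions: list[ResearchQuestionState] = field(default_factory=list)


def is_section_ready_for_content(
    section: SectionState, *, min_analyzed_links_per_question: int = 1
) -> bool:
    """True once every research question has enough analyzed links (assumed threshold)."""
    if not section.research_questions:
        return False
    return all(
        question.analyzed_link_count >= min_analyzed_links_per_question
        for question in section.research_questions
    )


def is_ready_for_intro_and_conclusion(sections: list[SectionState]) -> bool:
    """True once every section other than intro/conclusion has non-blank content."""
    middle_sections = [section for section in sections if not section.is_intro_or_conclusion]
    if not middle_sections:
        return False
    return all(section.content.strip() for section in middle_sections)
```

Deriving readiness from stored state means the check gives the same answer no matter which task runs it or how often. In a multi-worker queue you would still want a guard (for example a row lock or a status flag) so that two workers finishing at the same moment do not both trigger the next stage.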

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

  • Fix blog post content mismatch #94 — Introduces shared system-prompt helpers and BlogPostGenerationContext that this PR's agents build upon for rich prompt composition and context handling.
  • Generate Post via Task #120 — Implements task-based blog post generation and task-status polling; this PR complements it with the full pipeline orchestration and task handler implementations.
  • Refactor Agents #116 — Refactors agents into factory modules and __init__ exports; this PR adds multiple new agent factory modules that integrate into that architecture.

Poem

🐰 A pipeline springs to life with agents bright,
Searching through Exa's results into the night,
Outlines bloom, research flows, sections align,
With intro and conclusion, the blog posts shine! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00%, which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
| Title check | ❓ Inconclusive | The title 'Content Generation Revamp' is vague and generic, using non-descriptive language that doesn't convey the specific nature of the changes implemented. | Consider using a more specific title that highlights the primary change, such as 'Add AI-powered blog post content generation pipeline with research integration' or 'Implement end-to-end blog post generation with Exa research integration'. |
✅ Passed checks (1 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

@greptile-apps
Contributor

greptile-apps bot commented Dec 27, 2025

Greptile Summary

Replaced GPTResearcher-based content generation with a new multi-stage pipeline that uses Exa for research, Jina Reader for content scraping, and pydantic-ai agents for summarization and outline generation.

Key changes:

  • Added new core/content_generator/ package with modular pipeline architecture
  • Replaced single GPTResearcher call with multi-step process: generate outline → create sections → generate research questions → fetch Exa links → scrape content → analyze with AI
  • Added 4 new database models: GeneratedBlogPostSection, GeneratedBlogPostResearchQuestion, GeneratedBlogPostResearchLink, and Backlink
  • Created specialized pydantic-ai agents for blog post outlines, research questions, and link summarization (general + contextual)
  • Integrated Exa API (replaced Tavily/OpenAI) for research link discovery
  • Implemented task-based async processing using django-q2 for each pipeline stage
  • Turned BlogPostTitleSuggestion.generate_content() into a wrapper that calls the new pipeline

Issues found:

  • Typo in Backlink model field name: linkning_to_project_page should be linking_to_project_page
  • Migration 0051 has redundant auto_now_add=True with default=django.utils.timezone.now
  • Date calculation uses 30-day approximation for months instead of accurate month arithmetic

Confidence Score: 3/5

  • This PR requires careful testing due to major architectural changes and minor issues that need correction
  • The PR introduces a significant architectural refactor replacing GPTResearcher with a custom multi-stage pipeline. While the code is well-structured and follows good patterns (modular design, proper use of async tasks, pydantic schemas), there are several concerns: (1) typo in model field name will cause runtime errors, (2) redundant migration syntax, (3) incomplete pipeline implementation (content generation stops after research phase), and (4) this is a breaking change from the old system that needs thorough testing of the entire flow
  • Pay close attention to core/models.py (typo in Backlink field), core/migrations/0051_*.py (redundant migration syntax), and verify the end-to-end pipeline works correctly since the old GPTResearcher flow is completely replaced

Important Files Changed

| Filename | Overview |
| --- | --- |
| core/agents/blog_post_outline_agent.py | Added new pydantic-ai agents for blog post outline and section research question generation |
| core/content_generator/pipeline.py | Implemented new multi-step content generation pipeline with Exa research, Jina scraping, and AI summarization; replaces GPTResearcher with modular approach |
| core/content_generator/tasks.py | Added django-q2 task wrappers for pipeline functions with proper task chaining |
| core/content_generator/utils.py | Added utility for generating Exa date range strings; approximates months as 30 days each |
| core/migrations/0051_rename_markdown_content_generatedblogpostresearchlink_content_and_more.py | Renamed fields and added new columns to research link model; has redundant migration syntax with auto_now_add |
| core/models.py | Replaced GPTResearcher integration with new content generation pipeline; added new models for sections, questions, and research links; has typo in Backlink model field name |

Sequence Diagram

sequenceDiagram
    participant User
    participant API
    participant Pipeline as pipeline.py
    participant Task as django-q2 Tasks
    participant Agent as AI Agents
    participant Exa as Exa API
    participant Jina as Jina Reader
    participant DB as Database

    User->>API: Generate blog post from title suggestion
    API->>Pipeline: init_blog_post_content_generation()
    
    Pipeline->>Agent: create_blog_post_outline_agent()
    Agent-->>Pipeline: sections list (4-8 sections)
    
    Pipeline->>DB: Create GeneratedBlogPost + Sections
    
    loop For each research section
        Pipeline->>Task: Queue generate_research_questions_for_section_task
        Task->>Agent: create_blog_post_section_research_questions_agent()
        Agent-->>Task: 3-6 research questions
        Task->>DB: Create GeneratedBlogPostResearchQuestion records
        
        loop For each question
            Task->>Task: Queue populate_research_links_for_question_from_exa_task
            Task->>Exa: Search for research question
            Exa-->>Task: 2 research results
            Task->>DB: Create GeneratedBlogPostResearchLink records
            
            loop For each link
                Task->>Task: Queue scrape_research_link_content_task
                Task->>Jina: Fetch markdown content
                Jina-->>Task: Page content
                Task->>DB: Update research link with content
                
                Task->>Task: Queue analyze_research_link_content_task
                Task->>Agent: create_general_research_link_summary_agent()
                Agent-->>Task: General summary
                Task->>Agent: create_contextual_research_link_summary_agent()
                Agent-->>Task: Contextual summary
                Task->>DB: Update research link with summaries
            end
        end
    end
    
    Note over Pipeline,DB: Content generation continues in future updates
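
The nested loops in this diagram fan out quickly. The sketch below mimics the chaining with an in-memory FIFO queue instead of django-q2, reusing the task names from the diagram, and counts how many times each stage would run. It is an illustration only: `QueuedTask` and `simulate_research_fan_out` are hypothetical, every task is assumed to succeed, and treating every outline section as a research section is an assumption. The 4-8 sections, 3-6 questions, and 2 results figures come from the diagram.

```python
from __future__ import annotations

from collections import Counter, deque
from dataclasses import dataclass

# Stage names mirror the diagram; the deque is a stand-in for the django-q2 queue.
GENERATE_QUESTIONS_TASK = "generate_research_questions_for_section_task"
POPULATE_LINKS_TASK = "populate_research_links_for_question_from_exa_task"
SCRAPE_LINK_TASK = "scrape_research_link_content_task"
ANALYZE_LINK_TASK = "analyze_research_link_content_task"


@dataclass(frozen=True)
class QueuedTask:
    task_name: str
    object_key: str  # hypothetical identifier, e.g. "s0-q1-l0"


def simulate_research_fan_out(
    *, section_count: int, questions_per_section: int, links_per_question: int
) -> Counter[str]:
    """Drain a FIFO queue in which each stage enqueues its children; return runs per stage."""
    task_queue = deque(
        QueuedTask(GENERATE_QUESTIONS_TASK, f"s{section_index}")
        for section_index in range(section_count)
    )
    executed_task_counts: Counter[str] = Counter()

    while task_queue:
        current_task = task_queue.popleft()
        executed_task_counts[current_task.task_name] += 1

        if current_task.task_name == GENERATE_QUESTIONS_TASK:
            # Each generated question queues one Exa search.
            for question_index in range(questions_per_section):
                question_key = f"{current_task.object_key}-q{question_index}"
                task_queue.append(QueuedTask(POPULATE_LINKS_TASK, question_key))
        elif current_task.task_name == POPULATE_LINKS_TASK:
            # Each stored search result queues one scrape.
            for link_index in range(links_per_question):
                link_key = f"{current_task.object_key}-l{link_index}"
                task_queue.append(QueuedTask(SCRAPE_LINK_TASK, link_key))
        elif current_task.task_name == SCRAPE_LINK_TASK:
            # Each scrape queues the analysis of that same link.
            task_queue.append(QueuedTask(ANALYZE_LINK_TASK, current_task.object_key))
        # ANALYZE_LINK_TASK is a leaf stage in this sketch.

    return executed_task_counts
```

With the upper bounds from the diagram (8 sections, 6 questions, 2 results) this gives 8 + 48 + 96 + 96 = 248 task runs per post, with every scrape hitting Jina Reader and every analysis calling the summary agents, which is worth keeping in mind for rate limits and API cost.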

Contributor

@greptile-apps greptile-apps bot left a comment

16 files reviewed, 3 comments

Comment on lines 40 to 43
            name='date_scraped',
            field=models.DateTimeField(auto_now_add=True, default=django.utils.timezone.now),
            preserve_default=False,
        ),
Contributor

syntax: auto_now_add=True with default=django.utils.timezone.now is redundant - auto_now_add already automatically sets the timestamp on creation

Suggested change
-            name='date_scraped',
-            field=models.DateTimeField(auto_now_add=True, default=django.utils.timezone.now),
-            preserve_default=False,
-        ),
+            field=models.DateTimeField(auto_now_add=True),
Prompt To Fix With AI
This is a comment left during a code review.
Path: core/migrations/0051_rename_markdown_content_generatedblogpostresearchlink_content_and_more.py
Line: 40:43

Comment:
**syntax:** `auto_now_add=True` with `default=django.utils.timezone.now` is redundant - `auto_now_add` already automatically sets the timestamp on creation

```suggestion
            field=models.DateTimeField(auto_now_add=True),
```

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +2015 to +2021
    )
    linking_from_blog_post = models.ForeignKey(
        GeneratedBlogPost, null=True, blank=True, on_delete=models.CASCADE, related_name="backlinks"
    )

    def __str__(self):
        return f"{self.linking_from_blog_post.title} -> {self.linked_to_project_page.url}"
Contributor

syntax: Typo in field name linkning_to_project_page - should be linking_to_project_page

Suggested change
-    )
-    linking_from_blog_post = models.ForeignKey(
-        GeneratedBlogPost, null=True, blank=True, on_delete=models.CASCADE, related_name="backlinks"
-    )
-    def __str__(self):
-        return f"{self.linking_from_blog_post.title} -> {self.linked_to_project_page.url}"
+    linking_to_project_page = models.ForeignKey(
+        ProjectPage, null=True, blank=True, on_delete=models.CASCADE, related_name="backlinks"
+    )
Prompt To Fix With AI
This is a comment left during a code review.
Path: core/models.py
Line: 2015:2021

Comment:
**syntax:** Typo in field name `linkning_to_project_page` - should be `linking_to_project_page`

```suggestion
    linking_to_project_page = models.ForeignKey(
        ProjectPage, null=True, blank=True, on_delete=models.CASCADE, related_name="backlinks"
    )
```

How can I resolve this? If you propose a fix, please make it concise.

"""
current_datetime = timezone.now()
end_date_iso_format = current_datetime.date().isoformat()
start_date_iso_format = (current_datetime - timedelta(days=months_back * 30)).date().isoformat()
Contributor

style: The date calculation assumes 30 days per month, which may not match user expectations for date ranges (e.g., 6 months back could be off by several days)

Consider using relativedelta from dateutil for accurate month arithmetic:

from dateutil.relativedelta import relativedelta
start_date_iso_format = (current_datetime - relativedelta(months=months_back)).date().isoformat()
Prompt To Fix With AI
This is a comment left during a code review.
Path: core/content_generator/utils.py
Line: 14:14

Comment:
**style:** The date calculation assumes 30 days per month, which may not match user expectations for date ranges (e.g., 6 months back could be off by several days)

Consider using `relativedelta` from `dateutil` for accurate month arithmetic:
```python
from dateutil.relativedelta import relativedelta
start_date_iso_format = (current_datetime - relativedelta(months=months_back)).date().isoformat()
```

How can I resolve this? If you propose a fix, please make it concise.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (6)
core/content_generator/utils.py (1)

8-15: Consider using more accurate month arithmetic.

The current implementation approximates each month as 30 days (line 14), which doesn't account for the actual varying lengths of months (28-31 days). For date range filters in search queries, this approximation may be acceptable, but consider using dateutil.relativedelta for more precise month subtraction if accuracy matters for your use case.

🔎 Alternative implementation with dateutil
 from __future__ import annotations
 
-from datetime import timedelta
+from dateutil.relativedelta import relativedelta
 
 from django.utils import timezone
 
 
 def get_exa_date_range_iso_strings(*, months_back: int) -> tuple[str, str]:
     """
     Exa expects date filters as strings (YYYY-MM-DD).
     """
     current_datetime = timezone.now()
     end_date_iso_format = current_datetime.date().isoformat()
-    start_date_iso_format = (current_datetime - timedelta(days=months_back * 30)).date().isoformat()
+    start_date_iso_format = (current_datetime - relativedelta(months=months_back)).date().isoformat()
     return start_date_iso_format, end_date_iso_format

Note: This would require adding python-dateutil as a dependency if it's not already included (though it's likely already present as a transitive dependency of other packages).
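
If adding python-dateutil is undesirable, calendar-month subtraction can also be done with the standard library alone by clamping the day to the length of the target month. This is a minimal sketch, not the PR's code: `subtract_months_from_date` and `get_exa_date_range_iso_strings_for_date` are hypothetical names, and the end date is passed in rather than read from `timezone.now()` so the function has no Django dependency.

```python
from __future__ import annotations

import calendar
from datetime import date


def subtract_months_from_date(start_date: date, months_back: int) -> date:
    """Step back `months_back` calendar months, clamping the day to the target month's length."""
    # Work with a zero-based month index so year rollover is plain integer arithmetic.
    total_month_index = start_date.year * 12 + (start_date.month - 1) - months_back
    target_year, target_month_zero_based = divmod(total_month_index, 12)
    target_month = target_month_zero_based + 1
    # March 31 minus one month must become the last day of February, not an invalid date.
    last_day_of_target_month = calendar.monthrange(target_year, target_month)[1]
    target_day = min(start_date.day, last_day_of_target_month)
    return date(target_year, target_month, target_day)


def get_exa_date_range_iso_strings_for_date(*, end_date: date, months_back: int) -> tuple[str, str]:
    """Return (start, end) as YYYY-MM-DD strings, matching the format the utils docstring describes."""
    start_date = subtract_months_from_date(end_date, months_back)
    return start_date.isoformat(), end_date.isoformat()
```

For example, six months back from 2025-12-27 gives 2025-06-27, while the 30-day approximation (`timedelta(days=180)`) lands on 2025-06-30. The clamping matches how `relativedelta` handles month-end dates.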

core/tasks.py (1)

1785-1794: Delegation pattern is appropriate, but consider renaming the alias for clarity.

The local import avoids circular dependencies and the delegation is clean. However, importing generate_research_questions_for_section_task as delegated_task uses a very generic alias. Consider a more descriptive name like _inner_task or just call the function directly without aliasing.

🔎 Optional: More explicit delegation
 def generate_research_questions_for_section_task(section_id: int):
     """
     Generate research questions for one blog post section, then queue Exa research link tasks for
     each created question.
     """
     from core.content_generator.tasks import (
-        generate_research_questions_for_section_task as delegated_task,
+        generate_research_questions_for_section_task as _content_generator_task,
     )
 
-    return delegated_task(section_id=section_id)
+    return _content_generator_task(section_id=section_id)

snippets/inspect_blog_post_title_suggestion.py (2)

19-29: Add error handling for invalid input in the developer tool.

While this is a developer tool, wrapping the input parsing in try-except would provide a better experience:

🔎 Proposed improvement
 blog_post_title_suggestion_id_raw = input("BlogPostTitleSuggestion id: ").strip()
-blog_post_title_suggestion_id = int(blog_post_title_suggestion_id_raw)
+try:
+    blog_post_title_suggestion_id = int(blog_post_title_suggestion_id_raw)
+except ValueError:
+    raise SystemExit(f"Invalid ID: {blog_post_title_suggestion_id_raw!r}")

78-78: Remove unused noqa directives.

Static analysis correctly identifies these # noqa: E501 comments as unused. Remove them to keep the code clean.

Also applies to: 127-127, 154-156

core/content_generator/tasks.py (1)

72-91: Task implementation is correct; remove unused noqa directive.

The task correctly generates research questions and queues downstream tasks for each. The # noqa: E501 on line 91 is flagged as unused and should be removed.

core/agents/research_link_summary_agent.py (1)

71-90: Contextual summary agent implementation is good.

The multiple system prompts provide rich context for the agent. Remove the unused noqa directives on lines 78-79.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1256411 and 11e071d.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (16)
  • README.md
  • core/agents/blog_post_outline_agent.py
  • core/agents/research_link_summary_agent.py
  • core/agents/schemas.py
  • core/content_generator/__init__.py
  • core/content_generator/pipeline.py
  • core/content_generator/tasks.py
  • core/content_generator/utils.py
  • core/migrations/0050_backlink_generatedblogpostresearchquestion_and_more.py
  • core/migrations/0051_rename_markdown_content_generatedblogpostresearchlink_content_and_more.py
  • core/models.py
  • core/tasks.py
  • pyproject.toml
  • requirements.txt
  • snippets/inspect_blog_post_title_suggestion.py
  • tuxseo/settings.py
🧰 Additional context used
📓 Path-based instructions (8)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Follow PEP 8 for Python code style
Use descriptive variable names with underscores (snake_case) in Python

Use try-except blocks for specific errors, avoid excepting Exception

**/*.py: Use Python's logging module (structlog) extensively
Include context in log messages with relevant fields (error, exc_info, resource IDs, etc.) to aid debugging and traceability

**/*.py: Follow PEP 8 style guide and prioritize readability and maintainability
Use descriptive variable and function names with underscores (snake_case)
Leverage Django's ORM and avoid raw SQL when possible
Use Pydantic models for schema validation in django-ninja APIs

**/*.py: Use descriptive, full-word variable names that clearly communicate purpose and context; avoid abbreviations and single-letter variables
Provide context in variable names, especially when format or type matters to the implementation (e.g., 'current_date_iso_format')
Extract unchanging values into constants using UPPER_CASE naming (e.g., MAX_LOGIN_ATTEMPTS, DEFAULT_TIMEOUT_MS)
Break down complex operations with descriptive intermediate variables instead of accessing array indices directly
Use 'is_', 'has_', 'can_' prefixes for boolean variables
Include 'date' in variable names that represent dates
Use snake_case for variables and functions in Python
Use PascalCase for class names in Python
Keep variable lifespan short by defining variables close to where they're used to reduce cognitive load
Name functions after what they do, not how they're used; ask 'Will I understand this without my current context?'
Avoid generic function/variable names like 'data', 'info', 'manager'; be specific about purpose (e.g., 'calculate_customer_lifetime_value')
Include necessary context in function names without being verbose (e.g., 'add_month_to_date' not 'add_to_date' or 'add_number_of_months_to_date')
If a function cannot be named clearly, split it into smaller, focused functions with better-defined responsibilities
Use the same verbs ...

Files:

  • core/agents/schemas.py
  • tuxseo/settings.py
  • snippets/inspect_blog_post_title_suggestion.py
  • core/tasks.py
  • core/content_generator/utils.py
  • core/content_generator/__init__.py
  • core/agents/research_link_summary_agent.py
  • core/content_generator/tasks.py
  • core/agents/blog_post_outline_agent.py
  • core/content_generator/pipeline.py
  • core/migrations/0050_backlink_generatedblogpostresearchquestion_and_more.py
  • core/migrations/0051_rename_markdown_content_generatedblogpostresearchlink_content_and_more.py
  • core/models.py
core/agents/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use pydantic-ai for AI agent functionality with implementations in core/agents/

Files:

  • core/agents/schemas.py
  • core/agents/research_link_summary_agent.py
  • core/agents/blog_post_outline_agent.py
**/*.{js,ts,tsx,jsx,py,java,cs,php,rb,go,rs,swift,kt,scala,groovy}

📄 CodeRabbit inference engine (.cursor/rules/code-style.mdc)

Use double quotes instead of single quotes

Files:

  • core/agents/schemas.py
  • tuxseo/settings.py
  • snippets/inspect_blog_post_title_suggestion.py
  • core/tasks.py
  • core/content_generator/utils.py
  • core/content_generator/__init__.py
  • core/agents/research_link_summary_agent.py
  • core/content_generator/tasks.py
  • core/agents/blog_post_outline_agent.py
  • core/content_generator/pipeline.py
  • core/migrations/0050_backlink_generatedblogpostresearchquestion_and_more.py
  • core/migrations/0051_rename_markdown_content_generatedblogpostresearchlink_content_and_more.py
  • core/models.py
pyproject.toml

📄 CodeRabbit inference engine (CLAUDE.md)

Use Poetry for Python dependency management with pyproject.toml

Files:

  • pyproject.toml
core/tasks.py

📄 CodeRabbit inference engine (CLAUDE.md)

Define background tasks using django-q2 in core/tasks.py

Files:

  • core/tasks.py
**/tasks.py

📄 CodeRabbit inference engine (.cursor/rules/backend.mdc)

Use django-q2 syntax for implementing background workers and tasks

Files:

  • core/tasks.py
  • core/content_generator/tasks.py
core/{views,models}.py

📄 CodeRabbit inference engine (CLAUDE.md)

Apply fat models, skinny views pattern: keep business logic primarily in Django models and mixins, while views handle request/response only

Files:

  • core/models.py
core/models.py

📄 CodeRabbit inference engine (CLAUDE.md)

Validate simple constraints in database, complex logic in Django models

Files:

  • core/models.py
🧠 Learnings (6)
📚 Learning: 2025-11-28T10:30:00.003Z
Learnt from: CR
Repo: rasulkireev/TuxSEO PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-28T10:30:00.003Z
Learning: Applies to core/tasks.py : Define background tasks using django-q2 in core/tasks.py

Applied to files:

  • tuxseo/settings.py
  • core/content_generator/tasks.py
  • core/models.py
📚 Learning: 2025-11-28T10:30:23.438Z
Learnt from: CR
Repo: rasulkireev/TuxSEO PR: 0
File: .cursor/rules/backend.mdc:0-0
Timestamp: 2025-11-28T10:30:23.438Z
Learning: Applies to **/tasks.py : Use django-q2 syntax for implementing background workers and tasks

Applied to files:

  • tuxseo/settings.py
  • core/content_generator/tasks.py
📚 Learning: 2025-11-28T10:30:00.003Z
Learnt from: CR
Repo: rasulkireev/TuxSEO PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-28T10:30:00.003Z
Learning: Applies to pyproject.toml : Use Poetry for Python dependency management with pyproject.toml

Applied to files:

  • pyproject.toml
📚 Learning: 2025-11-28T10:31:29.426Z
Learnt from: CR
Repo: rasulkireev/TuxSEO PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-28T10:31:29.426Z
Learning: Applies to **/*.py : Include necessary context in function names without being verbose (e.g., 'add_month_to_date' not 'add_to_date' or 'add_number_of_months_to_date')

Applied to files:

  • core/content_generator/utils.py
📚 Learning: 2025-11-28T10:30:04.521Z
Learnt from: CR
Repo: rasulkireev/TuxSEO PR: 0
File: .cursor/rules/agent-rules.mdc:0-0
Timestamp: 2025-11-28T10:30:04.521Z
Learning: Always add AGENTS.md into AI context

Applied to files:

  • core/agents/research_link_summary_agent.py
  • core/agents/blog_post_outline_agent.py
📚 Learning: 2025-11-28T10:30:00.003Z
Learnt from: CR
Repo: rasulkireev/TuxSEO PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-28T10:30:00.003Z
Learning: Add AGENTS.md into AI context when working with the repository

Applied to files:

  • core/agents/blog_post_outline_agent.py
🧬 Code graph analysis (6)
core/agents/schemas.py (2)
core/base_models.py (1)
  • BaseModel (7-19)
core/models.py (1)
  • web_page_content (1332-1337)
core/tasks.py (1)
core/content_generator/tasks.py (1)
  • generate_research_questions_for_section_task (72-91)
core/agents/research_link_summary_agent.py (3)
core/agents/schemas.py (3)
  • ResearchLinkContextualSummaryContext (196-202)
  • TextSummary (17-18)
  • WebPageContent (11-14)
core/choices.py (1)
  • get_default_ai_model (140-142)
core/models.py (2)
  • web_page_content (1332-1337)
  • project_details (449-464)
core/content_generator/tasks.py (2)
core/content_generator/pipeline.py (2)
  • analyze_research_link_content (401-518)
  • generate_research_questions_for_section (521-598)
tuxseo/utils.py (1)
  • get_tuxseo_logger (4-10)
core/agents/blog_post_outline_agent.py (3)
core/agents/schemas.py (1)
  • BlogPostGenerationContext (186-193)
core/agents/system_prompts.py (2)
  • add_target_keywords (140-153)
  • add_title_details (116-129)
core/choices.py (1)
  • get_default_ai_model (140-142)
core/models.py (1)
core/content_generator/pipeline.py (1)
  • init_blog_post_content_generation (171-189)
🪛 Ruff (0.14.10)
snippets/inspect_blog_post_title_suggestion.py

29-29: Avoid specifying long messages outside the exception class

(TRY003)


71-71: Do not catch blind exception: Exception

(BLE001)


78-78: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)


89-89: Do not catch blind exception: Exception

(BLE001)


127-127: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)


154-154: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)


155-155: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)


156-156: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

core/agents/research_link_summary_agent.py

50-50: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)


61-61: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)


78-78: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)


79-79: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

core/content_generator/tasks.py

91-91: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

core/agents/blog_post_outline_agent.py

24-24: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)


79-79: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

core/content_generator/pipeline.py

62-62: Avoid specifying long messages outside the exception class

(TRY003)


65-65: Avoid specifying long messages outside the exception class

(TRY003)


145-145: Avoid specifying long messages outside the exception class

(TRY003)


206-206: Avoid specifying long messages outside the exception class

(TRY003)


210-210: Avoid specifying long messages outside the exception class

(TRY003)


321-321: Avoid specifying long messages outside the exception class

(TRY003)


335-335: Avoid specifying long messages outside the exception class

(TRY003)


421-421: Avoid specifying long messages outside the exception class

(TRY003)


426-426: Avoid specifying long messages outside the exception class

(TRY003)


472-472: Avoid specifying long messages outside the exception class

(TRY003)


494-494: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)


537-537: Avoid specifying long messages outside the exception class

(TRY003)


551-551: Avoid specifying long messages outside the exception class

(TRY003)

core/migrations/0050_backlink_generatedblogpostresearchquestion_and_more.py

11-13: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


15-95: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

core/migrations/0051_rename_markdown_content_generatedblogpostresearchlink_content_and_more.py

9-11: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


13-44: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Greptile Review
🔇 Additional comments (25)
core/agents/schemas.py (2)

17-19: LGTM! Well-defined schema for text summaries.

The TextSummary model is simple, focused, and includes a clear field description that will help with AI agent prompting.


196-203: LGTM! Comprehensive context model for research link summaries.

The ResearchLinkContextualSummaryContext model aggregates all necessary contextual information for generating research-linked blog post summaries. The field names are descriptive and follow the project's naming conventions.

core/content_generator/__init__.py (1)

1-7: LGTM! Clear package documentation.

The docstring provides a concise overview of the package structure and its modules.

tuxseo/settings.py (2)

552-552: LGTM! Appropriate shell import for development convenience.

Adding the Exa import to SHELL_PLUS_IMPORTS follows the existing pattern and will make the Exa client available in the Django shell for manual testing and debugging.


555-555: LGTM! Consistent API key configuration.

The EXA_API_KEY environment variable follows the same pattern as other API keys in the codebase, with an appropriate default empty string for optional configuration.

README.md (1)

43-43: LGTM! Documentation updated for new API key requirement.

The README correctly documents the new EXA_API_KEY environment variable as required for deployment, consistent with the integration of the Exa API introduced in this PR.

requirements.txt (1)

1-297: LGTM! Auto-generated dependency file.

This file is automatically generated by Poetry from pyproject.toml. The addition of exa-py==2.0.2 (line 67) and other dependency version updates are expected as part of the dependency resolution process.

core/migrations/0051_rename_markdown_content_generatedblogpostresearchlink_content_and_more.py (2)

9-11: Static analysis false positive - ignore Ruff hints.

The Ruff hints about ClassVar annotations are false positives. Django migration classes use class attributes in a specific way that doesn't require ClassVar annotations. These hints can be safely ignored.


7-44: Field removal is safe—the question field was redundant.

The question field on GeneratedBlogPostResearchLink (removed at lines 24-27) does not pose a data loss risk. The codebase never accessed this field directly; all question data is retrieved through the research_question ForeignKey relationship, which points to GeneratedBlogPostResearchQuestion where the actual question text is stored. This removal is a safe cleanup of redundant data that existed in parallel to the relationship.

The preserve_default=False on the date_scraped field alteration (line 42) is correct.

pyproject.toml (1)

45-45: exa-py version 2.0.2 exists and has no known vulnerabilities.

Version 2.0.2 of the exa-py package is available on PyPI (released December 19, 2025) and has no recorded CVE advisories or known security vulnerabilities according to PyPI, Safety DB, and the official GitHub repository.

core/content_generator/tasks.py (2)

1-14: LGTM! Clean task module structure.

The imports are well-organized, and the module follows the django-q2 patterns correctly as per learnings. Using a dedicated logger for this module provides good traceability.


37-54: Good task chaining pattern.

The conditional queuing of analyze_research_link_content_task only when content was successfully fetched prevents unnecessary work and follows a clean pipeline pattern.
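A minimal sketch of that guard, assuming the enqueue function is injected so the logic can be exercised without a worker. `queue_analysis_if_content_fetched` and the dotted task path are illustrative assumptions, not the code in core/content_generator/tasks.py. In the real task, django-q2's `async_task` (which accepts a dotted function path plus positional arguments) would fill the `enqueue_task` role.

```python
from typing import Callable, Optional

# Hypothetical dotted path, inferred from the module and task names in this review
ANALYZE_TASK_PATH = "core.content_generator.tasks.analyze_research_link_content_task"


def queue_analysis_if_content_fetched(
    research_link_id: int,
    fetched_content: Optional[str],
    enqueue_task: Callable[..., object],
) -> bool:
    """Queue the analysis step only when the scrape produced non-empty content."""
    has_fetched_content = bool(fetched_content and fetched_content.strip())
    if not has_fetched_content:
        # Nothing to analyze, so skip queuing follow-up work
        return False
    enqueue_task(ANALYZE_TASK_PATH, research_link_id)
    return True
```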

core/agents/research_link_summary_agent.py (2)

12-31: Clean helper functions for formatting agent context.

The helper functions _add_webpage_content_from_web_page_content and _add_webpage_content_from_contextual_deps provide consistent formatting for the agent prompts.


54-68: General summary agent is well-configured.

The agent uses appropriate settings with retries and a reasonable temperature for summarization. The system prompt is clear and focused.

core/agents/blog_post_outline_agent.py (3)

17-27: Clean Pydantic models for agent output.

The schema definitions are clear and include helpful field descriptions. The Field descriptions document the expected output format.


44-60: Outline agent is well-implemented.

The agent reuses existing system prompt helpers for consistency. The temperature of 0.7 is appropriate for creative outline generation.


82-98: Research questions agent follows same pattern.

Good consistency with the outline agent. Both agents share the same system prompt helpers, ensuring consistent context.

core/models.py (3)

803-815: Clean backward-compatible wrapper for content generation.

The refactor preserves the existing API while delegating to the new pipeline. The docstring clearly explains the change.


1189-1218: New models for sections and research questions are well-structured.

The models have appropriate fields and relationships. The on_delete=CASCADE is correct for these child records.


1221-1253: GeneratedBlogPostResearchLink model is comprehensive.

Good separation between initial data, Jina augmentation, and AI augmentation fields. The VectorField for embeddings is correctly configured with 1024 dimensions.

core/content_generator/pipeline.py (4)

38-54: Well-defined constants and context creation.

The constants for section titles and the helper function for creating generation context are clean and reusable.


95-132: Correct use of transaction.atomic for data integrity.

The blog post and sections are created atomically, ensuring consistency. The logging provides good traceability.


241-277: Robust Exa result parsing with validation.

Good handling of both object and dict response formats. URL validation (length check, protocol check) prevents invalid data from being stored.
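A minimal sketch of object-or-dict parsing with the same two URL checks, assuming each Exa result exposes its link as a `url` attribute or key. `extract_result_url` and `MAX_RESEARCH_LINK_URL_LENGTH` are hypothetical names, and the length limit is a placeholder, since the actual value used in pipeline.py is not shown here.

```python
from collections.abc import Mapping
from typing import Any, Optional

# Placeholder limit; the real max_length in pipeline.py is not shown in this review
MAX_RESEARCH_LINK_URL_LENGTH = 200


def extract_result_url(exa_result: Any) -> Optional[str]:
    """Read a URL from a search result given either as an object or as a dict."""
    if isinstance(exa_result, Mapping):
        raw_url = exa_result.get("url")
    else:
        raw_url = getattr(exa_result, "url", None)
    if not isinstance(raw_url, str):
        return None
    url = raw_url.strip()
    # Mirror the two validations praised above: protocol check and length check
    is_http_url = url.startswith(("http://", "https://"))
    is_within_length_limit = len(url) <= MAX_RESEARCH_LINK_URL_LENGTH
    return url if is_http_url and is_within_length_limit else None
```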


573-589: Bulk creation of research questions is efficient.

Using bulk_create for the questions is appropriate. The question text is correctly truncated to fit the model's max_length.

core/migrations/0050_backlink_generatedblogpostresearchquestion_and_more.py (1)

48-95: Migration structure is correct for the new models.

The models are created with appropriate fields and relationships. The subsequent migration (0051) correctly handles field renames and additions.

Comment on lines +323 to +331
    url = (research_link.url or "").strip()
    if not url.startswith(("http://", "https://")):
        logger.info(
            "[ContentGenerator] Skipping scrape/summarize for invalid research link url",
            research_link_id=research_link.id,
            url=url,
        )
        return 0

⚠️ Potential issue | 🟡 Minor

Return type inconsistency: returns 0 instead of False.

The function signature indicates it returns bool, but line 330 returns 0 for invalid URLs. This should return False for consistency.

🔎 Proposed fix
     if not url.startswith(("http://", "https://")):
         logger.info(
             "[ContentGenerator] Skipping scrape/summarize for invalid research link url",
             research_link_id=research_link.id,
             url=url,
         )
-        return 0
+        return False
🤖 Prompt for AI Agents
In core/content_generator/pipeline.py around lines 323 to 331, the function is
declared to return a bool but currently returns the integer 0 for invalid
research link URLs; change the return value from 0 to False to match the
declared return type and maintain consistency across callers and type checks.
Ensure any related tests or callers that may check truthiness still work, and
run linters/type-checkers after the change.
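A small sketch of why the literal matters, assuming a standalone helper named `should_scrape_research_link` (hypothetical; the enclosing function in pipeline.py is not shown). Because `0 == False` in Python, truthiness checks behave the same either way, but `isinstance(0, bool)` is false, so identity checks such as `result is False` and static type checkers treat the two values differently.

```python
from typing import Optional


def should_scrape_research_link(raw_url: Optional[str]) -> bool:
    """Return True only for http(s) URLs, mirroring the guard under review."""
    url = (raw_url or "").strip()
    if not url.startswith(("http://", "https://")):
        # A real bool, not 0, so `is False` checks and type checkers agree
        return False
    return True
```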

Comment on lines 16 to 32
    migrations.CreateModel(
        name='Backlink',
        fields=[
            ('id', models.BigAutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')),
            ('uuid', models.UUIDField(default=uuid.uuid4, editable=False)),
            ('created_at', models.DateTimeField(auto_now_add=True)),
            ('updated_at', models.DateTimeField(auto_now=True)),
            ('deleted_at', models.DateTimeField(blank=True, null=True)),
            ('linked_from_project_page', models.ForeignKey(blank=True, null=True, on_delete=django.db.models.deletion.CASCADE, related_name='backlinks_from', to='core.project')),
            ('linked_to_project_page', models.ForeignKey(blank=True, null=True, on_delete=django.db.models.deletion.CASCADE, related_name='backlinks_to', to='core.project')),
            ('linking_from_blog_post', models.ForeignKey(blank=True, null=True, on_delete=django.db.models.deletion.CASCADE, related_name='backlinks', to='core.generatedblogpost')),
            ('linkning_to_project_page', models.ForeignKey(blank=True, null=True, on_delete=django.db.models.deletion.CASCADE, related_name='backlinks', to='core.projectpage')),
        ],
        options={
            'abstract': False,
        },
    ),

⚠️ Potential issue | 🟡 Minor

Backlink migration has the same field name typo.

The field linkning_to_project_page (line 27) has the same typo as in core/models.py. If you fix the model, you'll need to add a RenameField operation in a new migration.

Note: The static analysis hints about ClassVar are false positives for Django migration files and can be safely ignored.

🤖 Prompt for AI Agents
In core/migrations/0050_backlink_generatedblogpostresearchquestion_and_more.py
around lines 16 to 32, the Backlink model migration contains a typo in the field
name "linkning_to_project_page"; fix the field name in the Django model to
"linking_to_project_page" and then create a new migration that uses
migrations.RenameField on the Backlink model to rename
"linkning_to_project_page" -> "linking_to_project_page" so the database column
is preserved and migrations remain consistent.

Comment on lines +2005 to +2021
class Backlink(BaseModel):
    linked_to_project_page = models.ForeignKey(
        Project, null=True, blank=True, on_delete=models.CASCADE, related_name="backlinks_to"
    )
    linkning_to_project_page = models.ForeignKey(
        ProjectPage, null=True, blank=True, on_delete=models.CASCADE, related_name="backlinks"
    )

    linked_from_project_page = models.ForeignKey(
        Project, null=True, blank=True, on_delete=models.CASCADE, related_name="backlinks_from"
    )
    linking_from_blog_post = models.ForeignKey(
        GeneratedBlogPost, null=True, blank=True, on_delete=models.CASCADE, related_name="backlinks"
    )

    def __str__(self):
        return f"{self.linking_from_blog_post.title} -> {self.linked_to_project_page.url}"

⚠️ Potential issue | 🟠 Major

Typo in field name and potential AttributeError in __str__.

  1. Field name linkning_to_project_page has a typo - should be linking_to_project_page.
  2. The __str__ method will raise AttributeError if linking_from_blog_post or linked_to_project_page is None (both are nullable FKs).
🔎 Proposed fixes
 class Backlink(BaseModel):
     linked_to_project_page = models.ForeignKey(
         Project, null=True, blank=True, on_delete=models.CASCADE, related_name="backlinks_to"
     )
-    linkning_to_project_page = models.ForeignKey(
+    linking_to_project_page = models.ForeignKey(
         ProjectPage, null=True, blank=True, on_delete=models.CASCADE, related_name="backlinks"
     )
 
     linked_from_project_page = models.ForeignKey(
         Project, null=True, blank=True, on_delete=models.CASCADE, related_name="backlinks_from"
     )
     linking_from_blog_post = models.ForeignKey(
         GeneratedBlogPost, null=True, blank=True, on_delete=models.CASCADE, related_name="backlinks"
     )
 
     def __str__(self):
-        return f"{self.linking_from_blog_post.title} -> {self.linked_to_project_page.url}"
+        from_title = self.linking_from_blog_post.title if self.linking_from_blog_post else "Unknown"
+        to_url = self.linked_to_project_page.url if self.linked_to_project_page else "Unknown"
+        return f"{from_title} -> {to_url}"

Note: Fixing the field name typo requires a new migration. Consider whether you want to fix it now or defer.

🤖 Prompt for AI Agents
In core/models.py around lines 2005 to 2021, there's a typo in the field name
`linkning_to_project_page` (should be `linking_to_project_page`) and the __str__
method can raise AttributeError because nullable FKs `linking_from_blog_post` or
`linked_to_project_page` may be None; rename the field to
`linking_to_project_page` (updated related_name if needed) and create a Django
migration to apply the rename (or add a db_column/AlterField migration if
preserving existing DB name is required), and update __str__ to safely handle
None by returning a fallback string (e.g., use getattr(..., "title",
"<no-post>") and getattr(..., "url", "<no-url>")).
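The None-safe `__str__` can be tried outside Django with plain dataclasses standing in for the nullable foreign keys. `BacklinkStub`, `BlogPostStub`, and `ProjectStub` are hypothetical stand-ins, and the placeholder strings follow the `getattr` suggestion in the AI prompt; in Django, a null=True foreign key that is unset reads back as None, which is what the defaults model here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlogPostStub:
    title: str


@dataclass
class ProjectStub:
    url: str


@dataclass
class BacklinkStub:
    linking_from_blog_post: Optional[BlogPostStub] = None
    linked_to_project_page: Optional[ProjectStub] = None

    def __str__(self) -> str:
        # getattr with a default tolerates a nullable relation that is None
        from_title = getattr(self.linking_from_blog_post, "title", "<no-post>")
        to_url = getattr(self.linked_to_project_page, "url", "<no-url>")
        return f"{from_title} -> {to_url}"
```

One trade-off: `getattr` with a default also hides a misspelled attribute name on a non-None object, so the explicit `if ... else` form in the proposed fix is stricter.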

@code-review-doctor code-review-doctor bot left a comment

Some food for thought.

)
)

if settings.DEBUG:

Using settings.DEBUG here makes this code harder to unit test or manually test in isolation, because settings.DEBUG also controls other Django behavior: rendering the technical error page for uncaught exceptions, whether admin error emails are sent, recording every executed SQL query, and more.

Instead, consider adding a dedicated feature flag for this behavior rather than reusing settings.DEBUG.
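One way to apply that advice is a dedicated flag that the code reads explicitly, sketched here with a hypothetical environment variable `CONTENT_GENERATOR_LIMIT_RESEARCH` (not an existing TuxSEO setting). Passing the mapping in lets a test toggle the behavior without touching `settings.DEBUG`; in Django the same idea could be a dedicated setting overridden with `override_settings` in tests.

```python
import os
from typing import Mapping, Optional

# Hypothetical flag name, used only for illustration
LIMIT_FLAG_ENV_VAR = "CONTENT_GENERATOR_LIMIT_RESEARCH"
TRUTHY_FLAG_VALUES = {"1", "true", "yes"}


def is_research_limit_enabled(environment: Optional[Mapping[str, str]] = None) -> bool:
    """Read a dedicated on/off flag instead of reusing settings.DEBUG."""
    flag_source = os.environ if environment is None else environment
    raw_flag_value = flag_source.get(LIMIT_FLAG_ENV_VAR, "")
    return raw_flag_value.strip().lower() in TRUTHY_FLAG_VALUES
```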

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
core/models.py (1)

1033-1034: Validate URL scheme before opening with urlopen.

Same security concern as in content_generation/models.py: urlopen also accepts non-HTTP schemes such as file:, so validate the URL scheme before opening it.

🔎 Proposed fix
+            from urllib.parse import urlparse
+
+            parsed_url = urlparse(image_url)
+            if parsed_url.scheme not in ("http", "https"):
+                logger.error(
+                    "[GenerateOGImage] Invalid URL scheme from Replicate",
+                    blog_post_id=self.id,
+                    project_id=self.project_id,
+                    image_url=image_url,
+                    scheme=parsed_url.scheme,
+                )
+                return False, f"Invalid URL scheme: {parsed_url.scheme}"
+
             image_response = urlopen(image_url)
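A minimal sketch of the same validation as a reusable predicate; `has_allowed_image_url_scheme` is a hypothetical name, and the non-empty host check goes beyond the proposed fix. `urlparse` lowercases the scheme, so an uppercase `HTTPS://` URL is still accepted.

```python
from urllib.parse import urlparse

ALLOWED_IMAGE_URL_SCHEMES = ("http", "https")


def has_allowed_image_url_scheme(image_url: str) -> bool:
    """Accept only http(s) URLs that also name a host."""
    parsed_url = urlparse(image_url)
    # Rejects file:, ftp:, data: and similar schemes that urlopen would otherwise open
    is_allowed_scheme = parsed_url.scheme in ALLOWED_IMAGE_URL_SCHEMES
    has_host = bool(parsed_url.netloc)
    return is_allowed_scheme and has_host
```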
♻️ Duplicate comments (2)
core/models.py (1)

2006-2022: Typo and potential AttributeError in Backlink model.

This issue was flagged in past review comments:

  1. Field name linkning_to_project_page has a typo - should be linking_to_project_page
  2. The __str__ method will raise AttributeError if nullable FKs are None
core/content_generator/pipeline.py (1)

343-350: Return type inconsistency: returns 0 instead of False.

This was flagged in a past review. The function signature indicates -> bool but line 350 returns 0 for invalid URLs.

🧹 Nitpick comments (10)
frontend/templates/blog/generated_blog_post_detail.html (1)

34-46: LGTM!

The conditional rendering correctly guards the button display. Consider using generated_post.title_suggestion_id directly in the URL for consistency with the condition check, though the current approach works correctly given Django's FK integrity.

href="{% url 'blog_post_research_process' project_pk=generated_post.project.id pk=generated_post.title_suggestion_id %}"
frontend/templates/blog/blog_post_research_process.html (1)

204-223: Consider adding accessible labels for interactive summary elements.

The <summary> elements in the <details> sections function as buttons. For better screen reader support, consider adding role="button" or ensuring the summary text clearly indicates the expandable nature.

However, this is a minor accessibility enhancement - the current implementation using semantic <details>/<summary> elements already provides good baseline accessibility per coding guidelines.

core/views.py (1)

1004-1082: Consider extracting the repeated sorting pattern.

The _build_generated_blog_posts_data method works correctly, but there's repeated logic for building question data with sorted links. Consider extracting a helper:

🔎 Optional refactor for reduced duplication
def _build_question_data(self, question):
    """Build question dict with sorted research links."""
    research_links = sorted(
        question.research_links.all(),
        key=lambda link: link.id,
    )
    return {
        "id": question.id,
        "question": question.question,
        "links": research_links,
    }

Then use it in both section questions and blog-level questions loops.
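A runnable sketch of the suggested helper, using dataclass stand-ins for the Django models (all names here are hypothetical). The stub stores links in a plain list, whereas the real `question.research_links` is a related manager that needs `.all()`. If the view prefetches `research_links`, sorting in Python as the suggestion does keeps the prefetched rows in use, since calling `order_by()` on a prefetched relation issues a new query.

```python
from dataclasses import dataclass, field
from operator import attrgetter
from typing import Dict, List


@dataclass
class ResearchLinkStub:
    id: int


@dataclass
class ResearchQuestionStub:
    id: int
    question: str
    research_links: List[ResearchLinkStub] = field(default_factory=list)


def build_question_data(question: ResearchQuestionStub) -> Dict[str, object]:
    """Build the question dict with its research links sorted by id."""
    research_links_sorted_by_id = sorted(question.research_links, key=attrgetter("id"))
    return {
        "id": question.id,
        "question": question.question,
        "links": research_links_sorted_by_id,
    }
```

Both loops then reduce to a comprehension such as `[build_question_data(question) for question in section_questions]`.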

core/agents/research_link_summary_agent.py (2)

51-51: Remove unused noqa directive.

Static analysis indicates this noqa: E501 directive is unnecessary as the line length rule is not enabled.

🔎 Proposed fix
-        "You must tailor the summary to help the writer answer the research question for that section.\n"  # noqa: E501
+        "You must tailor the summary to help the writer answer the research question for that section.\n"

60-66: Remove unused noqa directives.

The noqa: E501 directives on lines 62, 79, and 80 are flagged as unused by static analysis.

🔎 Proposed fix
         system_prompt=(
             "You are an expert content summarizer. Summarize the web page content provided.\n"
-            "Return a concise 2-3 sentence summary that captures the main purpose and key information.\n"  # noqa: E501
+            "Return a concise 2-3 sentence summary that captures the main purpose and key information.\n"
             "Focus on what the page is about and its main value proposition.\n"
         ),
core/content_generator/tasks.py (1)

130-149: Remove unused noqa directive on line 149.

Static analysis indicates the noqa: E501 directive is unnecessary.

🔎 Proposed fix
-    return f"Generated {len(created_research_question_ids)} research questions for section {section_id}"  # noqa: E501
+    return f"Generated {len(created_research_question_ids)} research questions for section {section_id}"
content_generation/models.py (2)

240-248: Avoid catching bare Exception.

Per coding guidelines, use try-except blocks for specific errors. The broad Exception catch here obscures the root cause. Consider catching more specific exceptions or at minimum re-raising after logging.

🔎 Proposed fix
-        except Exception as error:
+        except (OSError, ValueError, TypeError) as error:
             logger.error(
                 "[GenerateOGImage] Unexpected error during image generation",
                 error=str(error),

Alternatively, if you need to catch all exceptions to prevent task failures, consider adding a comment explaining why.
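A sketch of the narrower handler, with the image generator injected as a callable so the failure path is easy to exercise. `generate_image_safely` is hypothetical and uses stdlib `logging` in place of the project's structlog logger. `urllib.error.URLError` subclasses `OSError`, so network failures from `urlopen` are still caught, while unexpected bugs such as a `KeyError` propagate.

```python
import logging
from typing import Callable, Tuple

logger = logging.getLogger(__name__)


def generate_image_safely(generate_image: Callable[[], bytes]) -> Tuple[bool, str]:
    """Catch only expected I/O and data errors; let programming bugs surface."""
    try:
        image_bytes = generate_image()
    except (OSError, ValueError, TypeError) as error:
        logger.error("[GenerateOGImage] Image generation failed: %s", error)
        return False, str(error)
    return True, f"Generated {len(image_bytes)} bytes"
```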


79-79: Remove unused noqa directives.

Static analysis indicates the noqa: E501 directives on lines 79, 261, and 334 are unnecessary.

core/content_generator/pipeline.py (2)

615-616: Consider parameterizing the DEBUG limit.

The LOCAL_MAX_RESEARCH_QUESTIONS_PER_SECTION is used to limit questions in DEBUG mode. While useful for development, consider making this configurable via settings rather than a module constant for more flexibility.
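A sketch of the cap as a plain function whose limit is passed in, so the value could come from a setting instead of a module constant. The function name is hypothetical. `None` disables the cap, and negative values are clamped to zero.

```python
from typing import List, Optional, Sequence, TypeVar

QuestionType = TypeVar("QuestionType")


def limit_research_questions(
    research_questions: Sequence[QuestionType],
    max_questions_per_section: Optional[int],
) -> List[QuestionType]:
    """Return at most max_questions_per_section questions; None means no limit."""
    if max_questions_per_section is None:
        return list(research_questions)
    return list(research_questions[: max(0, max_questions_per_section)])
```

In Django, a caller might pass `getattr(settings, "CONTENT_GENERATOR_MAX_RESEARCH_QUESTIONS_PER_SECTION", None)`, where the setting name is likewise hypothetical.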


407-407: Consider simplifying list(dict.fromkeys(update_fields)).

While this works to deduplicate the list, duplicates shouldn't occur because you append distinct field names sequentially. You could pass the plain list or, if deduplication is truly needed for safety, collect the names in a set from the start (the order of update_fields does not matter to save()).

🔎 Proposed simplification
-    research_link.save(update_fields=list(dict.fromkeys(update_fields)))
+    research_link.save(update_fields=update_fields)

The dict.fromkeys pattern preserves order while deduplicating, but since you're appending unique field names, this is unnecessary complexity.
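For reference, the deduplication behavior under discussion as a tiny standalone sketch (the helper name is hypothetical). `dict.fromkeys` keeps the first occurrence of each name in order, whereas a `set` drops order; either is acceptable here because Django's `Model.save()` converts `update_fields` to a frozenset internally.

```python
from typing import Iterable, List


def deduplicate_update_fields(update_fields: Iterable[str]) -> List[str]:
    """Drop repeated field names while keeping first-seen order."""
    # dict keys are unique and preserve insertion order (guaranteed since Python 3.7)
    return list(dict.fromkeys(update_fields))
```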

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 11e071d and d01bb1c.

📒 Files selected for processing (21)
  • README.md
  • content_generation/__init__.py
  • content_generation/admin.py
  • content_generation/apps.py
  • content_generation/migrations/__init__.py
  • content_generation/models.py
  • content_generation/tests.py
  • content_generation/views.py
  • core/agents/generate_blog_post_intro_conclusion_agent.py
  • core/agents/generate_blog_post_section_content_agent.py
  • core/agents/research_link_summary_agent.py
  • core/agents/schemas.py
  • core/content_generator/pipeline.py
  • core/content_generator/tasks.py
  • core/models.py
  • core/urls.py
  • core/views.py
  • frontend/templates/blog/blog_post_research_process.html
  • frontend/templates/blog/generated_blog_post_detail.html
  • frontend/templates/components/blog_post_suggestion_card.html
  • tuxseo/settings.py
✅ Files skipped from review due to trivial changes (3)
  • content_generation/views.py
  • content_generation/admin.py
  • content_generation/tests.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • README.md
  • tuxseo/settings.py
🧰 Additional context used
📓 Path-based instructions (12)
frontend/templates/**/*.html

📄 CodeRabbit inference engine (CLAUDE.md)

frontend/templates/**/*.html: Use semantic HTML elements (dialog, details/summary, etc.) in Django templates
Use Stimulus controllers for interactive frontend behavior, connecting via data attributes

Files:

  • frontend/templates/components/blog_post_suggestion_card.html
  • frontend/templates/blog/generated_blog_post_detail.html
  • frontend/templates/blog/blog_post_research_process.html
**/*.html

📄 CodeRabbit inference engine (.cursor/rules/backend.mdc)

Use Django templates for HTML rendering

Files:

  • frontend/templates/components/blog_post_suggestion_card.html
  • frontend/templates/blog/generated_blog_post_detail.html
  • frontend/templates/blog/blog_post_research_process.html
**/*.{js,html}

📄 CodeRabbit inference engine (.cursor/rules/frontend.mdc)

**/*.{js,html}: Prefer Stimulus JS for adding interactivity to Django templates instead of raw script elements
Leverage Stimulus data attributes to connect HTML elements with JavaScript functionality
Employ Stimulus actions to handle user interactions and events

Files:

  • frontend/templates/components/blog_post_suggestion_card.html
  • frontend/templates/blog/generated_blog_post_detail.html
  • frontend/templates/blog/blog_post_research_process.html
**/*.{html,erb,stimulus.{js,ts}}

📄 CodeRabbit inference engine (.cursor/rules/ui-ux-design-guidelines.mdc)

**/*.{html,erb,stimulus.{js,ts}}: Always generate semantic HTML when writing HTML, CSS, or styles in Stimulus controllers
Always favor the 'utility first' Tailwind approach when using TailwindCSS v3.x; reusable style classes should not be created often; code should be reused primarily through template components

Files:

  • frontend/templates/components/blog_post_suggestion_card.html
  • frontend/templates/blog/generated_blog_post_detail.html
  • frontend/templates/blog/blog_post_research_process.html
**/*.{css,scss,html,erb,stimulus.{js,ts}}

📄 CodeRabbit inference engine (.cursor/rules/ui-ux-design-guidelines.mdc)

Never create new styles without explicitly receiving permission to do so in HTML, CSS, or Stimulus controllers that use D3.js

Files:

  • frontend/templates/components/blog_post_suggestion_card.html
  • frontend/templates/blog/generated_blog_post_detail.html
  • frontend/templates/blog/blog_post_research_process.html
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Follow PEP 8 for Python code style
Use descriptive variable names with underscores (snake_case) in Python

Use try-except blocks for specific errors, avoid excepting Exception

**/*.py: Use Python's logging module (structlog) extensively
Include context in log messages with relevant fields (error, exc_info, resource IDs, etc.) to aid debugging and traceability

**/*.py: Follow PEP 8 style guide and prioritize readability and maintainability
Use descriptive variable and function names with underscores (snake_case)
Leverage Django's ORM and avoid raw SQL when possible
Use Pydantic models for schema validation in django-ninja APIs

**/*.py: Use descriptive, full-word variable names that clearly communicate purpose and context; avoid abbreviations and single-letter variables
Provide context in variable names, especially when format or type matters to the implementation (e.g., 'current_date_iso_format')
Extract unchanging values into constants using UPPER_CASE naming (e.g., MAX_LOGIN_ATTEMPTS, DEFAULT_TIMEOUT_MS)
Break down complex operations with descriptive intermediate variables instead of accessing array indices directly
Use 'is_', 'has_', 'can_' prefixes for boolean variables
Include 'date' in variable names that represent dates
Use snake_case for variables and functions in Python
Use PascalCase for class names in Python
Keep variable lifespan short by defining variables close to where they're used to reduce cognitive load
Name functions after what they do, not how they're used; ask 'Will I understand this without my current context?'
Avoid generic function/variable names like 'data', 'info', 'manager'; be specific about purpose (e.g., 'calculate_customer_lifetime_value')
Include necessary context in function names without being verbose (e.g., 'add_month_to_date' not 'add_to_date' or 'add_number_of_months_to_date')
If a function cannot be named clearly, split it into smaller, focused functions with better-defined responsibilities
Use the same verbs ...

Files:

  • core/agents/generate_blog_post_section_content_agent.py
  • core/agents/generate_blog_post_intro_conclusion_agent.py
  • core/content_generator/tasks.py
  • core/content_generator/pipeline.py
  • content_generation/models.py
  • core/views.py
  • content_generation/apps.py
  • core/models.py
  • core/agents/schemas.py
  • core/agents/research_link_summary_agent.py
  • core/urls.py
core/agents/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use pydantic-ai for AI agent functionality with implementations in core/agents/

Files:

  • core/agents/generate_blog_post_section_content_agent.py
  • core/agents/generate_blog_post_intro_conclusion_agent.py
  • core/agents/schemas.py
  • core/agents/research_link_summary_agent.py
**/*.{js,ts,tsx,jsx,py,java,cs,php,rb,go,rs,swift,kt,scala,groovy}

📄 CodeRabbit inference engine (.cursor/rules/code-style.mdc)

Use double quotes instead of single quotes

Files:

  • core/agents/generate_blog_post_section_content_agent.py
  • core/agents/generate_blog_post_intro_conclusion_agent.py
  • core/content_generator/tasks.py
  • core/content_generator/pipeline.py
  • content_generation/models.py
  • core/views.py
  • content_generation/apps.py
  • core/models.py
  • core/agents/schemas.py
  • core/agents/research_link_summary_agent.py
  • core/urls.py
**/tasks.py

📄 CodeRabbit inference engine (.cursor/rules/backend.mdc)

Use django-q2 syntax for implementing background workers and tasks

Files:

  • core/content_generator/tasks.py
core/{views,models}.py

📄 CodeRabbit inference engine (CLAUDE.md)

Apply fat models, skinny views pattern: keep business logic primarily in Django models and mixins, while views handle request/response only

Files:

  • core/views.py
  • core/models.py
**/views.py

📄 CodeRabbit inference engine (.cursor/rules/backend.mdc)

Use Class-Based Views (CBVs) for complex views and Function-Based Views (FBVs) for simpler logic

Files:

  • core/views.py
core/models.py

📄 CodeRabbit inference engine (CLAUDE.md)

Validate simple constraints in database, complex logic in Django models

Files:

  • core/models.py
🧠 Learnings (6)
📚 Learning: 2025-11-28T10:30:04.521Z
Learnt from: CR
Repo: rasulkireev/TuxSEO PR: 0
File: .cursor/rules/agent-rules.mdc:0-0
Timestamp: 2025-11-28T10:30:04.521Z
Learning: Always add AGENTS.md into AI context

Applied to files:

  • core/agents/generate_blog_post_section_content_agent.py
  • core/agents/generate_blog_post_intro_conclusion_agent.py
  • core/agents/research_link_summary_agent.py
📚 Learning: 2025-11-28T10:30:00.003Z
Learnt from: CR
Repo: rasulkireev/TuxSEO PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-28T10:30:00.003Z
Learning: Applies to core/agents/**/*.py : Use pydantic-ai for AI agent functionality with implementations in core/agents/

Applied to files:

  • core/agents/generate_blog_post_section_content_agent.py
📚 Learning: 2025-11-28T10:30:00.003Z
Learnt from: CR
Repo: rasulkireev/TuxSEO PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-28T10:30:00.003Z
Learning: Add AGENTS.md into AI context when working with the repository

Applied to files:

  • core/agents/generate_blog_post_section_content_agent.py
  • core/agents/generate_blog_post_intro_conclusion_agent.py
📚 Learning: 2025-11-28T10:30:23.438Z
Learnt from: CR
Repo: rasulkireev/TuxSEO PR: 0
File: .cursor/rules/backend.mdc:0-0
Timestamp: 2025-11-28T10:30:23.438Z
Learning: Applies to **/tasks.py : Use django-q2 syntax for implementing background workers and tasks

Applied to files:

  • core/content_generator/tasks.py
📚 Learning: 2025-11-28T10:30:00.003Z
Learnt from: CR
Repo: rasulkireev/TuxSEO PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-28T10:30:00.003Z
Learning: Applies to core/tasks.py : Define background tasks using django-q2 in core/tasks.py

Applied to files:

  • core/content_generator/tasks.py
  • core/models.py
📚 Learning: 2025-11-28T10:30:00.003Z
Learnt from: CR
Repo: rasulkireev/TuxSEO PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-28T10:30:00.003Z
Learning: Applies to core/schemas.py : Use Pydantic schemas in core/schemas.py for structured AI outputs

Applied to files:

  • core/agents/schemas.py
🧬 Code graph analysis (7)
core/agents/generate_blog_post_section_content_agent.py (3)
core/agents/schemas.py (2)
  • BlogPostSectionContentGenerationContext (247-270)
  • GeneratedBlogPostSectionContentSchema (273-278)
core/choices.py (2)
  • ContentType (4-6)
  • get_default_ai_model (140-142)
core/models.py (1)
  • project_details (449-464)
core/agents/generate_blog_post_intro_conclusion_agent.py (3)
core/agents/schemas.py (2)
  • BlogPostIntroConclusionGenerationContext (281-291)
  • GeneratedBlogPostIntroConclusionSchema (294-300)
core/choices.py (2)
  • ContentType (4-6)
  • get_default_ai_model (140-142)
core/models.py (1)
  • project_details (449-464)
core/content_generator/pipeline.py (8)
core/agents/blog_post_outline_agent.py (1)
  • create_blog_post_outline_agent (44-60)
core/agents/generate_blog_post_intro_conclusion_agent.py (1)
  • create_generate_blog_post_intro_conclusion_agent (26-100)
core/agents/generate_blog_post_section_content_agent.py (1)
  • create_generate_blog_post_section_content_agent (27-123)
core/agents/research_link_summary_agent.py (1)
  • create_research_link_analysis_agent (94-123)
core/agents/schemas.py (8)
  • BlogPostGenerationContext (207-214)
  • BlogPostIntroConclusionGenerationContext (281-291)
  • BlogPostSectionContentGenerationContext (247-270)
  • GeneratedBlogPostIntroConclusionSchema (294-300)
  • GeneratedBlogPostSectionContentSchema (273-278)
  • PriorSectionContext (242-244)
  • ResearchLinkContextualSummaryContext (217-223)
  • ResearchQuestionWithAnsweredLinks (234-239)
core/content_generator/utils.py (1)
  • get_exa_date_range_iso_strings (8-15)
core/models.py (5)
  • GeneratedBlogPost (848-1186)
  • GeneratedBlogPostResearchLink (1221-1253)
  • GeneratedBlogPostResearchQuestion (1203-1218)
  • GeneratedBlogPostSection (1189-1200)
  • project_details (449-464)
core/utils.py (2)
  • get_markdown_content (219-251)
  • run_agent_synchronously (254-309)
content_generation/models.py (6)
core/agents/insert_links_agent.py (1)
  • create_insert_links_agent (7-127)
core/agents/schemas.py (3)
  • GeneratedBlogPostSchema (303-313)
  • LinkInsertionContext (391-397)
  • ProjectPageContext (196-204)
core/base_models.py (1)
  • BaseModel (7-19)
core/choices.py (1)
  • OGImageStyle (87-95)
core/models.py (11)
  • AutoSubmissionSetting (818-845)
  • BlogPostTitleSuggestion (725-815)
  • Project (382-722)
  • GeneratedBlogPost (848-1186)
  • blog_post_structure_rules (884-896)
  • generated_blog_post_schema (899-905)
  • submit_blog_post_to_endpoint (907-956)
  • generate_og_image (958-1065)
  • save (1305-1308)
  • save (1899-1926)
  • insert_links_into_post (1067-1186)
core/utils.py (3)
  • get_og_image_prompt (69-96)
  • get_relevant_external_pages_for_blog_post (599-684)
  • get_relevant_pages_for_blog_post (517-596)
core/views.py (2)
core/models.py (5)
  • BlogPostTitleSuggestion (725-815)
  • generated_blog_posts (483-484)
  • get_internal_links (783-791)
  • get_keywords (702-722)
  • get_blog_post_keywords (793-801)
core/utils.py (1)
  • get_relevant_external_pages_for_blog_post (599-684)
core/agents/schemas.py (2)
core/base_models.py (1)
  • BaseModel (7-19)
core/models.py (1)
  • web_page_content (1333-1338)
core/urls.py (1)
core/views.py (1)
  • BlogPostResearchProcessView (983-1187)
🪛 Ruff (0.14.10)
core/content_generator/tasks.py

149-149: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

core/content_generator/pipeline.py

77-77, 80-80, 160-160, 221-221, 225-225, 341-341, 355-355, 442-442, 447-447, 489-489, 566-566, 580-580, 721-721, 725-725, 1008-1008, 1012-1012, 1044-1044, 1199-1199: Avoid specifying long messages outside the exception class

(TRY003)
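TRY003 asks for the error message to be built inside the exception class instead of at every raise site. A minimal sketch of that pattern; SectionNotFoundError and get_section_title are hypothetical names, not code from pipeline.py:

```python
class SectionNotFoundError(LookupError):
    """Raised when a blog post section cannot be found (hypothetical example)."""

    def __init__(self, section_id: int) -> None:
        # The message is defined once here, so raise sites stay short (TRY003).
        super().__init__(f"Blog post section {section_id} does not exist")
        self.section_id = section_id


def get_section_title(sections: dict[int, str], section_id: int) -> str:
    """Look up a section title, raising the dedicated exception if it is missing."""
    if section_id not in sections:
        raise SectionNotFoundError(section_id)
    return sections[section_id]
```

Raise sites then read `raise SectionNotFoundError(section.id)`, and callers can still read `exc.section_id` for structured logging.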

content_generation/models.py

79-79: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)


129-129: Consider moving this statement to an else block

(TRY300)


216-216: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.

(S310)


229-229: Consider moving this statement to an else block

(TRY300)


239-239: Use explicit conversion flag

Replace with conversion flag

(RUF010)


248-248: Use explicit conversion flag

Replace with conversion flag

(RUF010)


261-261: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)


334-334: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)
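TRY300 suggests moving the success-path return into an else block so the try body only wraps the call that can fail, and RUF010 prefers an f-string conversion flag such as {exc!s} over calling str() inside the braces. A minimal sketch of both fixes together; parse_payload and its (ok, message) return shape are hypothetical:

```python
import json


def parse_payload(raw: str) -> tuple[bool, str]:
    """Parse a JSON payload and report success or a readable error (hypothetical helper)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        # RUF010: use the !s conversion flag instead of f"{str(exc)}".
        return False, f"Invalid payload: {exc!s}"
    else:
        # TRY300: the success path lives in else, so only json.loads is guarded.
        return True, f"Parsed value of type {type(data).__name__}"
```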

core/agents/research_link_summary_agent.py

51-51, 62-62, 79-79, 80-80: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

🔇 Additional comments (30)
frontend/templates/components/blog_post_suggestion_card.html (1)

167-177: LGTM!

The new Research link is well-integrated with consistent styling, appropriate icon, and correct URL routing using named parameters that match the new blog_post_research_process URL pattern. The flex container properly groups the Research and Archive actions together.

content_generation/apps.py (1)

1-6: LGTM!

Standard Django AppConfig following best practices with BigAutoField for scalable primary keys.

core/urls.py (1)

59-63: LGTM!

The new URL pattern follows the established conventions for project-scoped routes and uses consistent parameter naming that aligns with the view's kwargs expectations.

core/agents/schemas.py (2)

17-39: LGTM!

Well-structured Pydantic schemas with clear Field descriptions. The ResearchLinkAnalysis schema properly captures the multi-faceted analysis output needed for research link processing.


217-300: LGTM!

The new context and output schemas are well-designed for the content generation pipeline:

  • ResearchLinkContextualSummaryContext provides complete context for link analysis
  • ResearchQuestionWithAnsweredLinks correctly filters to only include answered research
  • Section and intro/conclusion schemas properly separate generation context from output
  • Appropriate use of default_factory=list for optional collections

Based on learnings, these schemas align with the project's pattern of using Pydantic schemas in core/agents/schemas.py for structured AI outputs.
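The default_factory=list point can be illustrated with the standard library: dataclasses rejects a bare mutable [] default outright, and default_factory gives every instance its own list. Pydantic's Field(default_factory=list) serves the same purpose. ResearchQuestionSketch is a hypothetical stand-in, not the actual schema in core/agents/schemas.py:

```python
from dataclasses import dataclass, field


@dataclass
class ResearchQuestionSketch:
    """Hypothetical stand-in for a schema with an optional list field."""

    question: str
    # Each instance gets a fresh list; a plain "= []" default raises ValueError here.
    answered_link_urls: list[str] = field(default_factory=list)


first = ResearchQuestionSketch("What is Exa?")
second = ResearchQuestionSketch("What is Jina?")
first.answered_link_urls.append("https://exa.ai")
```

Appending to first leaves second untouched, which is the behaviour the schemas rely on when a question has no answered links yet.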

core/agents/generate_blog_post_intro_conclusion_agent.py (2)

26-43: LGTM!

The agent factory function follows the pydantic-ai pattern correctly:

  • Proper output and deps types
  • Combined system prompts for flexibility
  • Reasonable model settings (retries=2 for resilience, temperature=0.7 for creative content)

45-98: LGTM!

The context builder comprehensively assembles the prompt with:

  • Current date for temporal awareness
  • Full project and title suggestion context
  • Properly handles empty/None cases for sections with fallback text
  • Clear structure for the AI to understand the content requirements

As per coding guidelines, this follows the pattern of using pydantic-ai for AI agent functionality in core/agents/.

frontend/templates/blog/blog_post_research_process.html (2)

1-51: LGTM!

Good template structure with:

  • Proper breadcrumb navigation
  • Conditional button logic for compute links toggle based on API key availability
  • Clear user messaging when Jina API key is not configured

136-161: LGTM!

Proper semantic table structure with <thead>, <tbody>, and appropriate column headers. External links open in new tabs with correct security attributes (rel="noopener noreferrer").

core/views.py (4)

30-31: LGTM!

Appropriate imports for the new view functionality.

Also applies to: 42-42


983-1002: LGTM!

The view correctly:

  • Uses LoginRequiredMixin for authentication
  • Filters queryset by user's profile for authorization
  • Optimizes queries with select_related and prefetch_related to avoid N+1 issues

1143-1176: Exception handling catches specific exceptions appropriately.

The try-except blocks follow coding guidelines by catching specific exceptions (AttributeError, TypeError, ValueError) rather than bare Exception. The logging includes relevant context fields (title_suggestion_id, project_id, exc_info=True) which aids debugging.

One consideration: if get_blog_post_keywords(), _get_internal_links(), or _get_external_links() raise unexpected exceptions (e.g., network errors for external API calls), they won't be caught. This may be intentional to surface unexpected failures rather than silently degrading. Verify this is the desired behavior.
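The trade-off in this comment (degrade on a known set of exceptions, let everything else surface) can be sketched as a small wrapper. call_with_fallback and EXPECTED_ERRORS are hypothetical; the view itself uses inline try/except blocks:

```python
import logging
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)

# Only these are treated as "expected" failures, mirroring the view's except clauses.
EXPECTED_ERRORS = (AttributeError, TypeError, ValueError)


def call_with_fallback(func: Callable[[], T], fallback: T) -> T:
    """Return func(), or fallback if it raises one of EXPECTED_ERRORS.

    Anything else (for example a network OSError) propagates to the caller.
    """
    try:
        result = func()
    except EXPECTED_ERRORS:
        logger.warning("Falling back after expected error", exc_info=True)
        return fallback
    return result
```

Whether network errors should also fall back to an empty list is a product decision; adding OSError to the tuple would make the page render with missing links instead of failing.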


1178-1187: LGTM!

Context data is properly assembled with defensive defaults (or [] for links). The view exposes appropriate flags for template conditional rendering (jina_api_key_configured, should_compute_links, has_pro_subscription).

core/agents/generate_blog_post_section_content_agent.py (3)

1-10: LGTM!

Imports are well-organized and correctly reference the necessary schemas, choices, and prompts modules. The use of timezone from Django for date formatting is appropriate.


27-44: LGTM!

The agent factory function follows the established pydantic-ai pattern. Good use of:

  • Default fallback to get_default_ai_model()
  • Appropriate retries=2 for resilience
  • Reasonable temperature=0.7 for creative content generation
  • High max_tokens=16000 suitable for section content

46-121: Well-structured dynamic system prompt.

The context assembly is thorough and provides the AI with comprehensive information for coherent section generation. The handling of empty lists with fallback text "- (none)" and "\n(none)\n" ensures graceful degradation.
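The fallback behaviour can be sketched as a tiny formatting helper. format_bullets is a hypothetical name; only the "- (none)" fallback string comes from this comment:

```python
from __future__ import annotations

from collections.abc import Iterable


def format_bullets(items: Iterable[str] | None) -> str:
    """Render items as a Markdown bullet list, or "- (none)" when there is nothing to show."""
    # Drop empty or whitespace-only entries so the prompt never contains blank bullets.
    cleaned = [item.strip() for item in (items or []) if item and item.strip()]
    if not cleaned:
        return "- (none)"
    return "\n".join(f"- {item}" for item in cleaned)
```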

core/agents/research_link_summary_agent.py (2)

55-91: LGTM!

Both create_general_research_link_summary_agent and create_contextual_research_link_summary_agent are well-structured with appropriate configurations. The contextual agent correctly attaches two system prompts for research context and webpage content.


94-123: LGTM!

The create_research_link_analysis_agent consolidates three outputs (general summary, contextual summary, answer to question) into a single model call, which is efficient. The docstring clearly documents the expected outputs.

core/content_generator/tasks.py (2)

22-45: LGTM!

The task correctly overrides num_results_per_question in DEBUG mode for faster local development. Good structured logging with relevant context fields.


48-69: Good resilience pattern for scraping.

The docstring clearly explains the rationale for always queuing the analysis task regardless of scrape outcome—this prevents the pipeline from stalling on bad links. The task chaining with async_task follows django-q2 patterns correctly.
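The "always queue the analysis step" rule maps naturally onto try/finally. In the real task the enqueue call is django-q2's async_task; in this sketch a plain list stands in for the queue, and enqueue, scrape_then_analyze, and the task name are hypothetical. The caught exception types are an assumption about how scraping fails:

```python
from typing import Callable

# Stand-in for the django-q2 queue: records (task_name, link_id) pairs.
queued: list[tuple[str, int]] = []


def enqueue(task_name: str, link_id: int) -> None:
    """Hypothetical substitute for async_task(task_name, link_id)."""
    queued.append((task_name, link_id))


def scrape_then_analyze(link_id: int, scrape: Callable[[int], None]) -> bool:
    """Try to scrape a link and queue the analysis step whatever the outcome.

    Returns True if scraping succeeded. Unexpected exceptions still propagate,
    but only after the analysis task has been queued.
    """
    try:
        scrape(link_id)
    except (OSError, ValueError):  # assumed failure modes for a bad or unreachable link
        return False
    else:
        return True
    finally:
        # Runs on success, handled failure, and unhandled failure alike.
        enqueue("analyze_research_link", link_id)
```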

core/models.py (4)

803-815: Clean delegation to the content generation pipeline.

The generate_content method now serves as a backward-compatible wrapper that delegates to the centralized pipeline. This is a good pattern for gradual migration while preserving existing call sites.


1189-1200: LGTM!

GeneratedBlogPostSection model is well-structured with appropriate fields and foreign key relationships. The blank=True, default="" on content allows sections to be created before content is generated.


1203-1218: LGTM!

GeneratedBlogPostResearchQuestion correctly links to both GeneratedBlogPost and GeneratedBlogPostSection, enabling section-level research tracking.


1221-1254: LGTM!

GeneratedBlogPostResearchLink has comprehensive fields for tracking the full lifecycle: initial data from Exa, Jina scraping augmentation, and AI analysis results. The VectorField for embedding enables semantic search capabilities.

core/content_generator/pipeline.py (6)

1-10: LGTM!

Imports are well-organized, covering Django utilities, external APIs (Exa), and internal agents/schemas. The use of __future__ annotations enables forward references.


51-57: Good use of constants for pipeline configuration.

The constants for section titles, character limits, and retry settings are appropriately defined at module level, making them easy to tune and test.


110-147: Good transactional integrity for blog post creation.

The use of transaction.atomic() ensures that both GeneratedBlogPost and all GeneratedBlogPostSection records are created atomically. This prevents orphaned sections if creation fails midway.
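Django's transaction.atomic() needs a configured project, so this sketch shows the same all-or-nothing behaviour with the standard library's sqlite3, whose connection context manager commits on success and rolls back when the block raises. The table layout and the empty-title failure trigger are hypothetical:

```python
import sqlite3


def create_post_with_sections(
    conn: sqlite3.Connection, title: str, sections: list[str]
) -> int:
    """Insert a post and its sections in one transaction; nothing is kept on failure."""
    with conn:  # commits if the block succeeds, rolls back if it raises
        cur = conn.execute("INSERT INTO post (title) VALUES (?)", (title,))
        post_id = cur.lastrowid
        for position, section_title in enumerate(sections):
            if not section_title:
                raise ValueError("Section title must not be empty")
            conn.execute(
                "INSERT INTO section (post_id, position, title) VALUES (?, ?, ?)",
                (post_id, position, section_title),
            )
    return post_id


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("CREATE TABLE section (post_id INTEGER, position INTEGER, title TEXT)")
```

If the second section fails, the post row inserted earlier in the same block is rolled back too, so no orphaned post or sections remain.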


256-303: Good validation and error handling for Exa results.

The code properly handles both object and dictionary response formats from Exa, validates URL schemes and lengths, and correctly parses/localizes datetime values. The conditional queuing of scrape tasks only for links without content is efficient.
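A rough sketch of that normalization step: read fields from either an attribute-style object or a dict, accept only http(s) URLs under a length limit, and turn an ISO 8601 string into an aware datetime. The field names, the 200-character limit (Django's default URLField max_length), and treating naive timestamps as UTC are assumptions, not necessarily what pipeline.py does:

```python
from __future__ import annotations

from datetime import datetime, timezone
from typing import Any
from urllib.parse import urlparse

MAX_URL_LENGTH = 200  # assumption: Django's default URLField max_length


def read_field(result: Any, name: str) -> Any:
    """Get a field from a dict key or an object attribute, whichever the result uses."""
    if isinstance(result, dict):
        return result.get(name)
    return getattr(result, name, None)


def normalize_result(result: Any) -> dict[str, Any] | None:
    """Return a cleaned {url, title, published_at} dict, or None if the URL is unusable."""
    url = read_field(result, "url")
    if not isinstance(url, str) or len(url) > MAX_URL_LENGTH:
        return None
    parsed_url = urlparse(url)
    if parsed_url.scheme not in ("http", "https") or not parsed_url.netloc:
        return None

    published_at = None
    raw_date = read_field(result, "published_date")
    if isinstance(raw_date, str) and raw_date:
        if raw_date.endswith("Z"):
            # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
            raw_date = raw_date[:-1] + "+00:00"
        try:
            published_at = datetime.fromisoformat(raw_date)
        except ValueError:
            published_at = None
        if published_at is not None and published_at.tzinfo is None:
            published_at = published_at.replace(tzinfo=timezone.utc)  # assumption: naive means UTC

    return {"url": url, "title": read_field(result, "title") or "", "published_at": published_at}
```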


833-896: Well-designed retry mechanism with bounded retries.

The retry logic for section synthesis:

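  • Uses the cache to track retry counts across task executions (whether the count also survives worker restarts depends on the cache backend: a shared backend such as Redis keeps it, a per-process LocMemCache does not)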
  • Has configurable max retries (different for DEBUG vs production)
  • Only triggers when link processing is complete but sections still missing
  • Logs warnings when max retries reached

This is a robust pattern for handling transient failures.
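The bounded-retry idea can be sketched with a dict standing in for Django's cache (cache.get/cache.set in the real code). The key format and both limits are hypothetical; the comment only says the limit differs between DEBUG and production:

```python
from collections.abc import MutableMapping

MAX_RETRIES_DEBUG = 2  # hypothetical values
MAX_RETRIES_PRODUCTION = 5


def should_retry_synthesis(
    cache: MutableMapping[str, int], blog_post_id: int, debug: bool
) -> bool:
    """Increment the retry counter for a post and report whether another attempt is allowed."""
    key = f"section-synthesis-retries:{blog_post_id}"
    max_retries = MAX_RETRIES_DEBUG if debug else MAX_RETRIES_PRODUCTION
    attempts = cache.get(key, 0)
    if attempts >= max_retries:
        return False  # the real pipeline logs a warning at this point
    cache[key] = attempts + 1
    return True
```

A dict is not shared between workers; with Django's cache, incr() is worth considering if several workers could race on the same counter.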


1162-1182: Clean markdown assembly logic.

The _build_full_blog_post_markdown function correctly assembles the final markdown with proper heading levels (H1 for title, H2 for sections) and handles empty sections gracefully.
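A minimal sketch of that assembly step, assuming sections arrive as (title, content) pairs in display order and that a section with blank content is omitted entirely. build_markdown is a hypothetical name; the real _build_full_blog_post_markdown signature and empty-section handling may differ:

```python
def build_markdown(title: str, sections: list[tuple[str, str]]) -> str:
    """Join a title (H1) and sections (H2) into one Markdown document."""
    parts = [f"# {title.strip()}"]
    for section_title, content in sections:
        if not content or not content.strip():
            continue  # assumption: sections without content are skipped
        parts.append(f"## {section_title.strip()}\n\n{content.strip()}")
    return "\n\n".join(parts) + "\n"
```

For example, build_markdown("My Post", [("Intro", "Hello."), ("Empty", "  ")]) returns "# My Post\n\n## Intro\n\nHello.\n".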

Comment on lines +216 to +220
image_response = urlopen(image_url)
image_content = ContentFile(image_response.read())

filename = f"og-image-{self.id}.png"
self.image.save(filename, image_content, save=True)

⚠️ Potential issue | 🟠 Major

Validate URL scheme before opening with urlopen.

The urlopen call is flagged by static analysis (S310) because it can open arbitrary URL schemes including file://. Since image_url comes from an external API response, validate that it uses only allowed schemes.

🔎 Proposed fix
+            from urllib.parse import urlparse
+
+            parsed_url = urlparse(image_url)
+            if parsed_url.scheme not in ("http", "https"):
+                logger.error(
+                    "[GenerateOGImage] Invalid URL scheme from Replicate",
+                    blog_post_id=self.id,
+                    project_id=self.project_id,
+                    image_url=image_url,
+                    scheme=parsed_url.scheme,
+                )
+                return False, f"Invalid URL scheme: {parsed_url.scheme}"
+
             image_response = urlopen(image_url)
             image_content = ContentFile(image_response.read())

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In content_generation/models.py around lines 216 to 220, the code calls
urlopen(image_url) on an external URL without validating the scheme; update the
code to parse the image_url (e.g., with urllib.parse.urlparse), allow only
'http' and 'https' schemes, and raise/log and skip or fail fast if the scheme is
not allowed before calling urlopen; ensure any new parsing/util helpers are
imported and that invalid URLs do not reach urlopen so only safe HTTP(S)
requests are performed.
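The same check can be pulled into a small helper so it can be unit-tested without Replicate or urlopen. check_image_url is a hypothetical name; it follows the (ok, message) return convention of the proposed fix and additionally rejects URLs with no host, which goes one step beyond the suggested diff:

```python
from urllib.parse import urlparse

ALLOWED_IMAGE_URL_SCHEMES = frozenset({"http", "https"})


def check_image_url(image_url: str) -> tuple[bool, str]:
    """Return (True, "") if image_url is safe to pass to urlopen, else (False, reason)."""
    parsed = urlparse(image_url)
    # urlparse lowercases the scheme, so "HTTPS://..." is accepted as well.
    if parsed.scheme not in ALLOWED_IMAGE_URL_SCHEMES:
        return False, f"Invalid URL scheme: {parsed.scheme!r}"
    if not parsed.netloc:
        return False, "URL has no host"
    return True, ""
```

Calling it right before urlopen(image_url) and returning early when the first element is False keeps file: and custom schemes from ever reaching urlopen.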
