Conversation
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

This PR introduces a complete end-to-end blog post content generation pipeline. It adds AI agents for outline and research question generation, a centralized orchestration pipeline with Django models for tracking generated content, async task handlers for queue-based execution, integration with the Exa search API, and frontend views to visualize the research and generation process.
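Most of the pipeline stages hand off to one another through the django-q2 queue. A minimal sketch of that hand-off, assuming the task is queued by dotted path (the section_id value is a placeholder; the exact call site is not shown in this thread):

```python
from django_q.tasks import async_task

# Queue research-question generation for one section without blocking the request;
# the worker later queues the Exa search and link-scraping tasks itself.
async_task(
    "core.content_generator.tasks.generate_research_questions_for_section_task",
    section_id=42,  # placeholder primary key
)
```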
Sequence Diagram(s)

```mermaid
sequenceDiagram
participant User
participant Frontend as Frontend<br/>View
participant Pipeline as Content Gen<br/>Pipeline
participant Agents as AI Agents
participant ExaAPI as Exa API
participant DB as Database
participant LinkScraper as Link Scraper
participant Tasks as Django-Q<br/>Tasks
User->>Frontend: Request blog generation
Frontend->>Pipeline: init_blog_post_content_generation(title)
activate Pipeline
Pipeline->>Agents: create_blog_post_outline_agent()
Agents-->>Pipeline: outline (section titles)
Pipeline->>DB: Create GeneratedBlogPost + Sections
Pipeline->>DB: Create GeneratedBlogPostIntroConcluding scaffolds
Pipeline->>Tasks: queue_research_question_generation_for_sections()
deactivate Pipeline
Tasks->>Agents: create_blog_post_section_research_questions_agent()
Agents-->>Tasks: research questions per section
Tasks->>DB: Create GeneratedBlogPostResearchQuestion records
Tasks->>Tasks: queue populate_research_links_task for each question
par Research Link Discovery
Tasks->>ExaAPI: Search with question + keywords
ExaAPI-->>Tasks: Search results (URLs, titles, metadata)
Tasks->>DB: Create GeneratedBlogPostResearchLink records
Tasks->>Tasks: queue scrape_research_link_content_task
and Section Content Synthesis (when ready)
Tasks->>LinkScraper: Fetch + scrape HTML content for each link
LinkScraper-->>DB: Update GeneratedBlogPostResearchLink.content
Tasks->>Tasks: queue analyze_research_link_content_task
end
Tasks->>Agents: create_research_link_analysis_agent()
Agents-->>Tasks: general_summary, contextual_summary, answer_to_question
Tasks->>DB: Update GeneratedBlogPostResearchLink analyses
opt When all questions answered
Tasks->>Agents: create_generate_blog_post_section_content_agent()
Agents->>DB: Fetch prior sections + research answers
Agents-->>Tasks: Section content (formatted markdown)
Tasks->>DB: Update GeneratedBlogPostSection.content
end
opt When all middle sections complete
Tasks->>Agents: create_generate_blog_post_intro_conclusion_agent()
Agents->>DB: Fetch all sections + outline
Agents-->>Tasks: Intro + Conclusion content
Tasks->>DB: Update GeneratedBlogPostSection (intro/conclusion)
end
Tasks->>Pipeline: populate_generated_blog_post_content()
Pipeline->>DB: Assemble markdown from all sections
Pipeline->>DB: Update GeneratedBlogPost.content (finalize)
Tasks-->>Frontend: Workflow complete
User->>Frontend: View research process
Frontend->>DB: Fetch GeneratedBlogPost + all relations
DB-->>Frontend: Blog post data with sections/questions/links
Frontend-->>User: Render Research Process view
```

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
Greptile Summary

Replaced GPTResearcher-based content generation with a new multi-stage pipeline that uses Exa for research, Jina Reader for content scraping, and pydantic-ai agents for summarization and outline generation. Key changes:
Issues found:
Confidence Score: 3/5
Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
participant User
participant API
participant Pipeline as pipeline.py
participant Task as django-q2 Tasks
participant Agent as AI Agents
participant Exa as Exa API
participant Jina as Jina Reader
participant DB as Database
User->>API: Generate blog post from title suggestion
API->>Pipeline: init_blog_post_content_generation()
Pipeline->>Agent: create_blog_post_outline_agent()
Agent-->>Pipeline: sections list (4-8 sections)
Pipeline->>DB: Create GeneratedBlogPost + Sections
loop For each research section
Pipeline->>Task: Queue generate_research_questions_for_section_task
Task->>Agent: create_blog_post_section_research_questions_agent()
Agent-->>Task: 3-6 research questions
Task->>DB: Create GeneratedBlogPostResearchQuestion records
loop For each question
Task->>Task: Queue populate_research_links_for_question_from_exa_task
Task->>Exa: Search for research question
Exa-->>Task: 2 research results
Task->>DB: Create GeneratedBlogPostResearchLink records
loop For each link
Task->>Task: Queue scrape_research_link_content_task
Task->>Jina: Fetch markdown content
Jina-->>Task: Page content
Task->>DB: Update research link with content
Task->>Task: Queue analyze_research_link_content_task
Task->>Agent: create_general_research_link_summary_agent()
Agent-->>Task: General summary
Task->>Agent: create_contextual_research_link_summary_agent()
Agent-->>Task: Contextual summary
Task->>DB: Update research link with summaries
end
end
end
Note over Pipeline,DB: Content generation continues in future updates
```
    name='date_scraped',
    field=models.DateTimeField(auto_now_add=True, default=django.utils.timezone.now),
    preserve_default=False,
),
syntax: auto_now_add=True with default=django.utils.timezone.now is redundant - auto_now_add already automatically sets the timestamp on creation
Suggested change:
    field=models.DateTimeField(auto_now_add=True),
Prompt To Fix With AI
This is a comment left during a code review.
Path: core/migrations/0051_rename_markdown_content_generatedblogpostresearchlink_content_and_more.py
Line: 40:43
Comment:
**syntax:** `auto_now_add=True` with `default=django.utils.timezone.now` is redundant - `auto_now_add` already automatically sets the timestamp on creation
```suggestion
field=models.DateTimeField(auto_now_add=True),
```
How can I resolve this? If you propose a fix, please make it concise.

)
linking_from_blog_post = models.ForeignKey(
    GeneratedBlogPost, null=True, blank=True, on_delete=models.CASCADE, related_name="backlinks"
)

def __str__(self):
    return f"{self.linking_from_blog_post.title} -> {self.linked_to_project_page.url}"
syntax: Typo in field name linkning_to_project_page - should be linking_to_project_page
Suggested change:
linking_to_project_page = models.ForeignKey(
    ProjectPage, null=True, blank=True, on_delete=models.CASCADE, related_name="backlinks"
)
Prompt To Fix With AI
This is a comment left during a code review.
Path: core/models.py
Line: 2015:2021
Comment:
**syntax:** Typo in field name `linkning_to_project_page` - should be `linking_to_project_page`
```suggestion
linking_to_project_page = models.ForeignKey(
ProjectPage, null=True, blank=True, on_delete=models.CASCADE, related_name="backlinks"
)
```
How can I resolve this? If you propose a fix, please make it concise.

    """
    current_datetime = timezone.now()
    end_date_iso_format = current_datetime.date().isoformat()
    start_date_iso_format = (current_datetime - timedelta(days=months_back * 30)).date().isoformat()
style: The date calculation assumes 30 days per month, which may not match user expectations for date ranges (e.g., 6 months back could be off by several days)
Consider using relativedelta from dateutil for accurate month arithmetic:
from dateutil.relativedelta import relativedelta
start_date_iso_format = (current_datetime - relativedelta(months=months_back)).date().isoformat()

Prompt To Fix With AI
This is a comment left during a code review.
Path: core/content_generator/utils.py
Line: 14:14
Comment:
**style:** The date calculation assumes 30 days per month, which may not match user expectations for date ranges (e.g., 6 months back could be off by several days)
Consider using `relativedelta` from `dateutil` for accurate month arithmetic:
```python
from dateutil.relativedelta import relativedelta
start_date_iso_format = (current_datetime - relativedelta(months=months_back)).date().isoformat()
```
How can I resolve this? If you propose a fix, please make it concise.
Actionable comments posted: 3
🧹 Nitpick comments (6)
core/content_generator/utils.py (1)
8-15: Consider using more accurate month arithmetic. The current implementation approximates each month as 30 days (line 14), which doesn't account for the actual varying lengths of months (28-31 days). For date range filters in search queries, this approximation may be acceptable, but consider using dateutil.relativedelta for more precise month subtraction if accuracy matters for your use case.

🔎 Alternative implementation with dateutil:

 from __future__ import annotations

-from datetime import timedelta
+from dateutil.relativedelta import relativedelta

 from django.utils import timezone


 def get_exa_date_range_iso_strings(*, months_back: int) -> tuple[str, str]:
     """
     Exa expects date filters as strings (YYYY-MM-DD).
     """
     current_datetime = timezone.now()
     end_date_iso_format = current_datetime.date().isoformat()
-    start_date_iso_format = (current_datetime - timedelta(days=months_back * 30)).date().isoformat()
+    start_date_iso_format = (current_datetime - relativedelta(months=months_back)).date().isoformat()
     return start_date_iso_format, end_date_iso_format

Note: This would require adding python-dateutil as a dependency if it's not already included (though it's likely already present as a transitive dependency of other packages).

core/tasks.py (1)
1785-1794: Delegation pattern is appropriate, but consider renaming the alias for clarity. The local import avoids circular dependencies and the delegation is clean. However, importing generate_research_questions_for_section_task as delegated_task uses a very generic alias. Consider a more descriptive name like _inner_task, or just call the function directly without aliasing.

🔎 Optional: More explicit delegation:

 def generate_research_questions_for_section_task(section_id: int):
     """
     Generate research questions for one blog post section, then queue
     Exa research link tasks for each created question.
     """
     from core.content_generator.tasks import (
-        generate_research_questions_for_section_task as delegated_task,
+        generate_research_questions_for_section_task as _content_generator_task,
     )
-    return delegated_task(section_id=section_id)
+    return _content_generator_task(section_id=section_id)

snippets/inspect_blog_post_title_suggestion.py (2)
19-29: Add error handling for invalid input in the developer tool. While this is a developer tool, wrapping the input parsing in try-except would provide a better experience:

🔎 Proposed improvement:

 blog_post_title_suggestion_id_raw = input("BlogPostTitleSuggestion id: ").strip()
-blog_post_title_suggestion_id = int(blog_post_title_suggestion_id_raw)
+try:
+    blog_post_title_suggestion_id = int(blog_post_title_suggestion_id_raw)
+except ValueError:
+    raise SystemExit(f"Invalid ID: {blog_post_title_suggestion_id_raw!r}")
78-78: Remove unused noqa directives. Static analysis correctly identifies these # noqa: E501 comments as unused. Remove them to keep the code clean.

Also applies to: 127-127, 154-156
core/content_generator/tasks.py (1)
72-91: Task implementation is correct; remove unused noqa directive. The task correctly generates research questions and queues downstream tasks for each. The # noqa: E501 on line 91 is flagged as unused and should be removed.

core/agents/research_link_summary_agent.py (1)
71-90: Contextual summary agent implementation is good. The multiple system prompts provide rich context for the agent. Remove the unused noqa directives on lines 78-79.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (16)
- README.md
- core/agents/blog_post_outline_agent.py
- core/agents/research_link_summary_agent.py
- core/agents/schemas.py
- core/content_generator/__init__.py
- core/content_generator/pipeline.py
- core/content_generator/tasks.py
- core/content_generator/utils.py
- core/migrations/0050_backlink_generatedblogpostresearchquestion_and_more.py
- core/migrations/0051_rename_markdown_content_generatedblogpostresearchlink_content_and_more.py
- core/models.py
- core/tasks.py
- pyproject.toml
- requirements.txt
- snippets/inspect_blog_post_title_suggestion.py
- tuxseo/settings.py
🧰 Additional context used
📓 Path-based instructions (8)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Follow PEP 8 for Python code style
Use descriptive variable names with underscores (snake_case) in Python
Use try-except blocks for specific errors, avoid excepting Exception
**/*.py: Use Python's logging module (structlog) extensively
Include context in log messages with relevant fields (error, exc_info, resource IDs, etc.) to aid debugging and traceability
**/*.py: Follow PEP 8 style guide and prioritize readability and maintainability
Use descriptive variable and function names with underscores (snake_case)
Leverage Django's ORM and avoid raw SQL when possible
Use Pydantic models for schema validation in django-ninja APIs
**/*.py: Use descriptive, full-word variable names that clearly communicate purpose and context; avoid abbreviations and single-letter variables
Provide context in variable names, especially when format or type matters to the implementation (e.g., 'current_date_iso_format')
Extract unchanging values into constants using UPPER_CASE naming (e.g., MAX_LOGIN_ATTEMPTS, DEFAULT_TIMEOUT_MS)
Break down complex operations with descriptive intermediate variables instead of accessing array indices directly
Use 'is_', 'has_', 'can_' prefixes for boolean variables
Include 'date' in variable names that represent dates
Use snake_case for variables and functions in Python
Use PascalCase for class names in Python
Keep variable lifespan short by defining variables close to where they're used to reduce cognitive load
Name functions after what they do, not how they're used; ask 'Will I understand this without my current context?'
Avoid generic function/variable names like 'data', 'info', 'manager'; be specific about purpose (e.g., 'calculate_customer_lifetime_value')
Include necessary context in function names without being verbose (e.g., 'add_month_to_date' not 'add_to_date' or 'add_number_of_months_to_date')
If a function cannot be named clearly, split it into smaller, focused functions with better-defined responsibilities
Use the same verbs ...
Files:
core/agents/schemas.py, tuxseo/settings.py, snippets/inspect_blog_post_title_suggestion.py, core/tasks.py, core/content_generator/utils.py, core/content_generator/__init__.py, core/agents/research_link_summary_agent.py, core/content_generator/tasks.py, core/agents/blog_post_outline_agent.py, core/content_generator/pipeline.py, core/migrations/0050_backlink_generatedblogpostresearchquestion_and_more.py, core/migrations/0051_rename_markdown_content_generatedblogpostresearchlink_content_and_more.py, core/models.py
core/agents/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use pydantic-ai for AI agent functionality with implementations in core/agents/
Files:
core/agents/schemas.py, core/agents/research_link_summary_agent.py, core/agents/blog_post_outline_agent.py
**/*.{js,ts,tsx,jsx,py,java,cs,php,rb,go,rs,swift,kt,scala,groovy}
📄 CodeRabbit inference engine (.cursor/rules/code-style.mdc)
Use double quotes instead of single quotes
Files:
core/agents/schemas.py, tuxseo/settings.py, snippets/inspect_blog_post_title_suggestion.py, core/tasks.py, core/content_generator/utils.py, core/content_generator/__init__.py, core/agents/research_link_summary_agent.py, core/content_generator/tasks.py, core/agents/blog_post_outline_agent.py, core/content_generator/pipeline.py, core/migrations/0050_backlink_generatedblogpostresearchquestion_and_more.py, core/migrations/0051_rename_markdown_content_generatedblogpostresearchlink_content_and_more.py, core/models.py
pyproject.toml
📄 CodeRabbit inference engine (CLAUDE.md)
Use Poetry for Python dependency management with pyproject.toml
Files:
pyproject.toml
core/tasks.py
📄 CodeRabbit inference engine (CLAUDE.md)
Define background tasks using django-q2 in core/tasks.py
Files:
core/tasks.py
**/tasks.py
📄 CodeRabbit inference engine (.cursor/rules/backend.mdc)
Use django-q2 syntax for implementing background workers and tasks
Files:
core/tasks.py, core/content_generator/tasks.py
core/{views,models}.py
📄 CodeRabbit inference engine (CLAUDE.md)
Apply fat models, skinny views pattern: keep business logic primarily in Django models and mixins, while views handle request/response only
Files:
core/models.py
core/models.py
📄 CodeRabbit inference engine (CLAUDE.md)
Validate simple constraints in database, complex logic in Django models
Files:
core/models.py
🧠 Learnings (6)
📚 Learning: 2025-11-28T10:30:00.003Z
Learnt from: CR
Repo: rasulkireev/TuxSEO PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-28T10:30:00.003Z
Learning: Applies to core/tasks.py : Define background tasks using django-q2 in core/tasks.py
Applied to files:
tuxseo/settings.py, core/content_generator/tasks.py, core/models.py
📚 Learning: 2025-11-28T10:30:23.438Z
Learnt from: CR
Repo: rasulkireev/TuxSEO PR: 0
File: .cursor/rules/backend.mdc:0-0
Timestamp: 2025-11-28T10:30:23.438Z
Learning: Applies to **/tasks.py : Use django-q2 syntax for implementing background workers and tasks
Applied to files:
tuxseo/settings.py, core/content_generator/tasks.py
📚 Learning: 2025-11-28T10:30:00.003Z
Learnt from: CR
Repo: rasulkireev/TuxSEO PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-28T10:30:00.003Z
Learning: Applies to pyproject.toml : Use Poetry for Python dependency management with pyproject.toml
Applied to files:
pyproject.toml
📚 Learning: 2025-11-28T10:31:29.426Z
Learnt from: CR
Repo: rasulkireev/TuxSEO PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-28T10:31:29.426Z
Learning: Applies to **/*.py : Include necessary context in function names without being verbose (e.g., 'add_month_to_date' not 'add_to_date' or 'add_number_of_months_to_date')
Applied to files:
core/content_generator/utils.py
📚 Learning: 2025-11-28T10:30:04.521Z
Learnt from: CR
Repo: rasulkireev/TuxSEO PR: 0
File: .cursor/rules/agent-rules.mdc:0-0
Timestamp: 2025-11-28T10:30:04.521Z
Learning: Always add AGENTS.md into AI context
Applied to files:
core/agents/research_link_summary_agent.py, core/agents/blog_post_outline_agent.py
📚 Learning: 2025-11-28T10:30:00.003Z
Learnt from: CR
Repo: rasulkireev/TuxSEO PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-28T10:30:00.003Z
Learning: Add AGENTS.md into AI context when working with the repository
Applied to files:
core/agents/blog_post_outline_agent.py
🧬 Code graph analysis (6)
core/agents/schemas.py (2)
core/base_models.py (1)
BaseModel (7-19)
core/models.py (1)
web_page_content(1332-1337)
core/tasks.py (1)
core/content_generator/tasks.py (1)
generate_research_questions_for_section_task(72-91)
core/agents/research_link_summary_agent.py (3)
core/agents/schemas.py (3)
ResearchLinkContextualSummaryContext (196-202), TextSummary (17-18), WebPageContent (11-14)
core/choices.py (1)
get_default_ai_model (140-142)
core/models.py (2)
web_page_content (1332-1337), project_details (449-464)
core/content_generator/tasks.py (2)
core/content_generator/pipeline.py (2)
analyze_research_link_content (401-518), generate_research_questions_for_section (521-598)
tuxseo/utils.py (1)
get_tuxseo_logger(4-10)
core/agents/blog_post_outline_agent.py (3)
core/agents/schemas.py (1)
BlogPostGenerationContext (186-193)
core/agents/system_prompts.py (2)
add_target_keywords (140-153), add_title_details (116-129)
core/choices.py (1)
get_default_ai_model(140-142)
core/models.py (1)
core/content_generator/pipeline.py (1)
init_blog_post_content_generation(171-189)
🪛 Ruff (0.14.10)
snippets/inspect_blog_post_title_suggestion.py
29-29: Avoid specifying long messages outside the exception class
(TRY003)
71-71: Do not catch blind exception: Exception
(BLE001)
78-78: Unused noqa directive (non-enabled: E501)
Remove unused noqa directive
(RUF100)
89-89: Do not catch blind exception: Exception
(BLE001)
127-127: Unused noqa directive (non-enabled: E501)
Remove unused noqa directive
(RUF100)
154-154: Unused noqa directive (non-enabled: E501)
Remove unused noqa directive
(RUF100)
155-155: Unused noqa directive (non-enabled: E501)
Remove unused noqa directive
(RUF100)
156-156: Unused noqa directive (non-enabled: E501)
Remove unused noqa directive
(RUF100)
core/agents/research_link_summary_agent.py
50-50: Unused noqa directive (non-enabled: E501)
Remove unused noqa directive
(RUF100)
61-61: Unused noqa directive (non-enabled: E501)
Remove unused noqa directive
(RUF100)
78-78: Unused noqa directive (non-enabled: E501)
Remove unused noqa directive
(RUF100)
79-79: Unused noqa directive (non-enabled: E501)
Remove unused noqa directive
(RUF100)
core/content_generator/tasks.py
91-91: Unused noqa directive (non-enabled: E501)
Remove unused noqa directive
(RUF100)
core/agents/blog_post_outline_agent.py
24-24: Unused noqa directive (non-enabled: E501)
Remove unused noqa directive
(RUF100)
79-79: Unused noqa directive (non-enabled: E501)
Remove unused noqa directive
(RUF100)
core/content_generator/pipeline.py
62-62: Avoid specifying long messages outside the exception class
(TRY003)
65-65: Avoid specifying long messages outside the exception class
(TRY003)
145-145: Avoid specifying long messages outside the exception class
(TRY003)
206-206: Avoid specifying long messages outside the exception class
(TRY003)
210-210: Avoid specifying long messages outside the exception class
(TRY003)
321-321: Avoid specifying long messages outside the exception class
(TRY003)
335-335: Avoid specifying long messages outside the exception class
(TRY003)
421-421: Avoid specifying long messages outside the exception class
(TRY003)
426-426: Avoid specifying long messages outside the exception class
(TRY003)
472-472: Avoid specifying long messages outside the exception class
(TRY003)
494-494: Unused noqa directive (non-enabled: E501)
Remove unused noqa directive
(RUF100)
537-537: Avoid specifying long messages outside the exception class
(TRY003)
551-551: Avoid specifying long messages outside the exception class
(TRY003)
core/migrations/0050_backlink_generatedblogpostresearchquestion_and_more.py
11-13: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
15-95: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
core/migrations/0051_rename_markdown_content_generatedblogpostresearchlink_content_and_more.py
9-11: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
13-44: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Greptile Review
🔇 Additional comments (25)
core/agents/schemas.py (2)
17-19: LGTM! Well-defined schema for text summaries. The TextSummary model is simple, focused, and includes a clear field description that will help with AI agent prompting.
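The schema body itself is not quoted in this thread; a minimal sketch of what a one-field summary schema of this kind usually looks like (the field name and wording are assumptions, the real definition lives in core/agents/schemas.py):

```python
from pydantic import BaseModel, Field


class TextSummary(BaseModel):
    # Illustrative only; the actual field name and description may differ.
    summary: str = Field(description="Concise 2-3 sentence summary of the analyzed text.")
```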
196-203: LGTM! Comprehensive context model for research link summaries. The ResearchLinkContextualSummaryContext model aggregates all necessary contextual information for generating research-linked blog post summaries. The field names are descriptive and follow the project's naming conventions.

core/content_generator/__init__.py (1)
1-7: LGTM! Clear package documentation. The docstring provides a concise overview of the package structure and its modules.
tuxseo/settings.py (2)
552-552: LGTM! Appropriate shell import for development convenience. Adding the Exa import to SHELL_PLUS_IMPORTS follows the existing pattern and will make the Exa client available in the Django shell for manual testing and debugging.
555-555: LGTM! Consistent API key configuration. The EXA_API_KEY environment variable follows the same pattern as other API keys in the codebase, with an appropriate default empty string for optional configuration.

README.md (1)
43-43: LGTM! Documentation updated for new API key requirement. The README correctly documents the new EXA_API_KEY environment variable as required for deployment, consistent with the integration of the Exa API introduced in this PR.

requirements.txt (1)
1-297: LGTM! Auto-generated dependency file. This file is automatically generated by Poetry from pyproject.toml. The addition of exa-py==2.0.2 (line 67) and other dependency version updates are expected as part of the dependency resolution process.

core/migrations/0051_rename_markdown_content_generatedblogpostresearchlink_content_and_more.py (2)
9-11: Static analysis false positive - ignore Ruff hints. The Ruff hints about ClassVar annotations are false positives. Django migration classes use class attributes in a specific way that doesn't require ClassVar annotations. These hints can be safely ignored.
7-44: Field removal is safe — the question field was redundant. The question field on GeneratedBlogPostResearchLink (removed at lines 24-27) does not pose a data loss risk. The codebase never accessed this field directly; all question data is retrieved through the research_question ForeignKey relationship, which points to GeneratedBlogPostResearchQuestion where the actual question text is stored. This removal is a safe cleanup of redundant data that existed in parallel to the relationship. The preserve_default=False on the date_scraped field alteration (line 42) is correct.

pyproject.toml (1)
45-45: exa-py version 2.0.2 exists and has no known vulnerabilities. Version 2.0.2 of the exa-py package is available on PyPI (released December 19, 2025) and has no recorded CVE advisories or known security vulnerabilities according to PyPI, Safety DB, and the official GitHub repository.
core/content_generator/tasks.py (2)
1-14: LGTM! Clean task module structure. The imports are well-organized, and the module follows the django-q2 patterns correctly as per learnings. Using a dedicated logger for this module provides good traceability.
37-54: Good task chaining pattern. The conditional queuing of analyze_research_link_content_task only when content was successfully fetched prevents unnecessary work and follows a clean pipeline pattern.

core/agents/research_link_summary_agent.py (2)
12-31: Clean helper functions for formatting agent context. The helper functions _add_webpage_content_from_web_page_content and _add_webpage_content_from_contextual_deps provide consistent formatting for the agent prompts.
54-68: General summary agent is well-configured. The agent uses appropriate settings with retries and a reasonable temperature for summarization. The system prompt is clear and focused.
core/agents/blog_post_outline_agent.py (3)
17-27: Clean Pydantic models for agent output. The schema definitions are clear and include helpful field descriptions. The Field descriptions document the expected output format.
44-60: Outline agent is well-implemented. The agent reuses existing system prompt helpers for consistency. The temperature of 0.7 is appropriate for creative outline generation.
82-98: Research questions agent follows same pattern. Good consistency with the outline agent. Both agents share the same system prompt helpers, ensuring consistent context.
core/models.py (3)
803-815: Clean backward-compatible wrapper for content generation.The refactor preserves the existing API while delegating to the new pipeline. The docstring clearly explains the change.
1189-1218: New models for sections and research questions are well-structured.The models have appropriate fields and relationships. The
on_delete=CASCADEis correct for these child records.
1221-1253: GeneratedBlogPostResearchLink model is comprehensive.Good separation between initial data, Jina augmentation, and AI augmentation fields. The VectorField for embeddings is correctly configured with 1024 dimensions.
core/content_generator/pipeline.py (4)
38-54: Well-defined constants and context creation.The constants for section titles and the helper function for creating generation context are clean and reusable.
95-132: Correct use of transaction.atomic for data integrity.The blog post and sections are created atomically, ensuring consistency. The logging provides good traceability.
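For reference, a hedged sketch of the atomic create pattern this comment points at (field names and the outline variable are assumptions; the real logic lives in init_blog_post_content_generation):

```python
from django.db import transaction

with transaction.atomic():
    # Parent post and all child sections commit together or not at all.
    generated_blog_post = GeneratedBlogPost.objects.create(title=title, project=project)
    GeneratedBlogPostSection.objects.bulk_create(
        [
            GeneratedBlogPostSection(blog_post=generated_blog_post, title=section_title, order=index)
            for index, section_title in enumerate(outline_section_titles)
        ]
    )
```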
241-277: Robust Exa result parsing with validation.Good handling of both object and dict response formats. URL validation (length check, protocol check) prevents invalid data from being stored.
573-589: Bulk creation of research questions is efficient.Using
bulk_createfor the questions is appropriate. The question text is correctly truncated to fit the model's max_length.core/migrations/0050_backlink_generatedblogpostresearchquestion_and_more.py (1)
48-95: Migration structure is correct for the new models.The models are created with appropriate fields and relationships. The subsequent migration (0051) correctly handles field renames and additions.
url = (research_link.url or "").strip()
if not url.startswith(("http://", "https://")):
    logger.info(
        "[ContentGenerator] Skipping scrape/summarize for invalid research link url",
        research_link_id=research_link.id,
        url=url,
    )
    return 0
Return type inconsistency: returns 0 instead of False.
The function signature indicates it returns bool, but line 330 returns 0 for invalid URLs. This should return False for consistency.
🔎 Proposed fix
 if not url.startswith(("http://", "https://")):
     logger.info(
         "[ContentGenerator] Skipping scrape/summarize for invalid research link url",
         research_link_id=research_link.id,
         url=url,
     )
-    return 0
+    return False

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
url = (research_link.url or "").strip()
if not url.startswith(("http://", "https://")):
    logger.info(
        "[ContentGenerator] Skipping scrape/summarize for invalid research link url",
        research_link_id=research_link.id,
        url=url,
    )
    return False
🤖 Prompt for AI Agents
In core/content_generator/pipeline.py around lines 323 to 331, the function is
declared to return a bool but currently returns the integer 0 for invalid
research link URLs; change the return value from 0 to False to match the
declared return type and maintain consistency across callers and type checks.
Ensure any related tests or callers that may check truthiness still work, and
run linters/type-checkers after the change.
migrations.CreateModel(
    name='Backlink',
    fields=[
        ('id', models.BigAutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')),
        ('uuid', models.UUIDField(default=uuid.uuid4, editable=False)),
        ('created_at', models.DateTimeField(auto_now_add=True)),
        ('updated_at', models.DateTimeField(auto_now=True)),
        ('deleted_at', models.DateTimeField(blank=True, null=True)),
        ('linked_from_project_page', models.ForeignKey(blank=True, null=True, on_delete=django.db.models.deletion.CASCADE, related_name='backlinks_from', to='core.project')),
        ('linked_to_project_page', models.ForeignKey(blank=True, null=True, on_delete=django.db.models.deletion.CASCADE, related_name='backlinks_to', to='core.project')),
        ('linking_from_blog_post', models.ForeignKey(blank=True, null=True, on_delete=django.db.models.deletion.CASCADE, related_name='backlinks', to='core.generatedblogpost')),
        ('linkning_to_project_page', models.ForeignKey(blank=True, null=True, on_delete=django.db.models.deletion.CASCADE, related_name='backlinks', to='core.projectpage')),
    ],
    options={
        'abstract': False,
    },
),
Backlink migration has the same field name typo.
The field linkning_to_project_page (line 27) has the same typo as in core/models.py. If you fix the model, you'll need to add a RenameField operation in a new migration.
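If the model field is renamed, the follow-up migration is small; a sketch (the dependency entry and migration numbering are placeholders and must match the project's actual migration history):

```python
from django.db import migrations


class Migration(migrations.Migration):
    dependencies = [
        ("core", "0051_rename_markdown_content_generatedblogpostresearchlink_content_and_more"),
    ]

    operations = [
        # Renames the column in place so existing rows are preserved.
        migrations.RenameField(
            model_name="backlink",
            old_name="linkning_to_project_page",
            new_name="linking_to_project_page",
        ),
    ]
```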
Note: The static analysis hints about ClassVar are false positives for Django migration files and can be safely ignored.
🤖 Prompt for AI Agents
In core/migrations/0050_backlink_generatedblogpostresearchquestion_and_more.py
around lines 16 to 32, the Backlink model migration contains a typo in the field
name "linkning_to_project_page"; fix the field name in the Django model to
"linking_to_project_page" and then create a new migration that uses
migrations.RenameField on the Backlink model to rename
"linkning_to_project_page" -> "linking_to_project_page" so the database column
is preserved and migrations remain consistent.
class Backlink(BaseModel):
    linked_to_project_page = models.ForeignKey(
        Project, null=True, blank=True, on_delete=models.CASCADE, related_name="backlinks_to"
    )
    linkning_to_project_page = models.ForeignKey(
        ProjectPage, null=True, blank=True, on_delete=models.CASCADE, related_name="backlinks"
    )

    linked_from_project_page = models.ForeignKey(
        Project, null=True, blank=True, on_delete=models.CASCADE, related_name="backlinks_from"
    )
    linking_from_blog_post = models.ForeignKey(
        GeneratedBlogPost, null=True, blank=True, on_delete=models.CASCADE, related_name="backlinks"
    )

    def __str__(self):
        return f"{self.linking_from_blog_post.title} -> {self.linked_to_project_page.url}"
Typo in field name and potential AttributeError in __str__.
- Field name linkning_to_project_page has a typo - should be linking_to_project_page.
- The __str__ method will raise AttributeError if linking_from_blog_post or linked_to_project_page is None (both are nullable FKs).
🔎 Proposed fixes
 class Backlink(BaseModel):
     linked_to_project_page = models.ForeignKey(
         Project, null=True, blank=True, on_delete=models.CASCADE, related_name="backlinks_to"
     )
-    linkning_to_project_page = models.ForeignKey(
+    linking_to_project_page = models.ForeignKey(
         ProjectPage, null=True, blank=True, on_delete=models.CASCADE, related_name="backlinks"
     )
     linked_from_project_page = models.ForeignKey(
         Project, null=True, blank=True, on_delete=models.CASCADE, related_name="backlinks_from"
     )
     linking_from_blog_post = models.ForeignKey(
         GeneratedBlogPost, null=True, blank=True, on_delete=models.CASCADE, related_name="backlinks"
     )
     def __str__(self):
-        return f"{self.linking_from_blog_post.title} -> {self.linked_to_project_page.url}"
+        from_title = self.linking_from_blog_post.title if self.linking_from_blog_post else "Unknown"
+        to_url = self.linked_to_project_page.url if self.linked_to_project_page else "Unknown"
+        return f"{from_title} -> {to_url}"

Note: Fixing the field name typo requires a new migration. Consider whether you want to fix it now or defer.
🤖 Prompt for AI Agents
In core/models.py around lines 2005 to 2021, there's a typo in the field name
`linkning_to_project_page` (should be `linking_to_project_page`) and the __str__
method can raise AttributeError because nullable FKs `linking_from_blog_post` or
`linked_to_project_page` may be None; rename the field to
`linking_to_project_page` (updated related_name if needed) and create a Django
migration to apply the rename (or add a db_column/AlterField migration if
preserving existing DB name is required), and update __str__ to safely handle
None by returning a fallback string (e.g., use getattr(..., "title",
"<no-post>") and getattr(..., "url", "<no-url>")).
)
)

if settings.DEBUG:
Using settings.DEBUG here makes it harder to unit test or manually test this code in isolation because the value of settings.DEBUG affects other Django behaviors such as showing technical error log when uncaught exceptions occur, sending admin emails, recording every SQL query executed, and many more.
Instead, consider adding a feature flag specifically for this rather than using settings.DEBUG. More details.
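A sketch of what such a dedicated flag could look like, assuming plain os.environ handling (the setting name is hypothetical, not an existing project setting):

```python
# tuxseo/settings.py -- hypothetical flag, decoupled from DEBUG
import os

LIMIT_RESEARCH_QUESTIONS_LOCALLY = os.environ.get("LIMIT_RESEARCH_QUESTIONS_LOCALLY", "false").lower() == "true"
```

```python
# core/content_generator/pipeline.py -- gate the behavior on the flag instead of settings.DEBUG
from django.conf import settings

if getattr(settings, "LIMIT_RESEARCH_QUESTIONS_LOCALLY", False):
    # research_questions is an illustrative variable name for the generated questions list
    research_questions = research_questions[:LOCAL_MAX_RESEARCH_QUESTIONS_PER_SECTION]
```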
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
core/models.py (1)
1033-1034: Validate URL scheme before opening with urlopen. Same security concern as in content_generation/models.py. The urlopen call should validate the URL scheme before opening.

🔎 Proposed fix:

+    from urllib.parse import urlparse
+
+    parsed_url = urlparse(image_url)
+    if parsed_url.scheme not in ("http", "https"):
+        logger.error(
+            "[GenerateOGImage] Invalid URL scheme from Replicate",
+            blog_post_id=self.id,
+            project_id=self.project_id,
+            image_url=image_url,
+            scheme=parsed_url.scheme,
+        )
+        return False, f"Invalid URL scheme: {parsed_url.scheme}"
+
     image_response = urlopen(image_url)
♻️ Duplicate comments (2)
core/models.py (1)
2006-2022: Typo and potential AttributeError in Backlink model. This issue was flagged in past review comments:
- Field name linkning_to_project_page has a typo - should be linking_to_project_page
- The __str__ method will raise AttributeError if nullable FKs are None

core/content_generator/pipeline.py (1)
343-350: Return type inconsistency: returns 0 instead of False. This was flagged in a past review. The function signature indicates -> bool but line 350 returns 0 for invalid URLs.
🧹 Nitpick comments (10)
frontend/templates/blog/generated_blog_post_detail.html (1)
34-46: LGTM! The conditional rendering correctly guards the button display. Consider using generated_post.title_suggestion_id directly in the URL for consistency with the condition check, though the current approach works correctly given Django's FK integrity.
generated_post.title_suggestion_iddirectly in the URL for consistency with the condition check, though the current approach works correctly given Django's FK integrity.href="{% url 'blog_post_research_process' project_pk=generated_post.project.id pk=generated_post.title_suggestion_id %}"frontend/templates/blog/blog_post_research_process.html (1)
204-223: Consider adding accessible labels for interactive summary elements. The <summary> elements in the <details> sections function as buttons. For better screen reader support, consider adding role="button" or ensuring the summary text clearly indicates the expandable nature.
However, this is a minor accessibility enhancement - the current implementation using semantic <details>/<summary> elements already provides good baseline accessibility per coding guidelines.

core/views.py (1)
1004-1082: Consider extracting the repeated sorting pattern. The _build_generated_blog_posts_data method works correctly, but there's repeated logic for building question data with sorted links. Consider extracting a helper:

🔎 Optional refactor for reduced duplication:

def _build_question_data(self, question):
    """Build question dict with sorted research links."""
    research_links = sorted(
        list(question.research_links.all()),
        key=lambda link: link.id,
    )
    return {
        "id": question.id,
        "question": question.question,
        "links": research_links,
    }

Then use it in both section questions and blog-level questions loops.
core/agents/research_link_summary_agent.py (2)
51-51: Remove unused noqa directive. Static analysis indicates this noqa: E501 directive is unnecessary as the line length rule is not enabled.

🔎 Proposed fix:

-    "You must tailor the summary to help the writer answer the research question for that section.\n"  # noqa: E501
+    "You must tailor the summary to help the writer answer the research question for that section.\n"
60-66: Remove unused noqa directives. The noqa: E501 directives on lines 62, 79, and 80 are flagged as unused by static analysis.

🔎 Proposed fix:

 system_prompt=(
     "You are an expert content summarizer. Summarize the web page content provided.\n"
-    "Return a concise 2-3 sentence summary that captures the main purpose and key information.\n"  # noqa: E501
+    "Return a concise 2-3 sentence summary that captures the main purpose and key information.\n"
     "Focus on what the page is about and its main value proposition.\n"
 ),

core/content_generator/tasks.py (1)
130-149: Remove unused noqa directive on line 149. Static analysis indicates the noqa: E501 directive is unnecessary.

🔎 Proposed fix:

-    return f"Generated {len(created_research_question_ids)} research questions for section {section_id}"  # noqa: E501
+    return f"Generated {len(created_research_question_ids)} research questions for section {section_id}"

content_generation/models.py (2)
240-248: Avoid catching bare Exception. Per coding guidelines, use try-except blocks for specific errors. The broad Exception catch here obscures the root cause. Consider catching more specific exceptions or at minimum re-raising after logging.

🔎 Proposed fix:

-    except Exception as error:
+    except (OSError, ValueError, TypeError) as error:
         logger.error(
             "[GenerateOGImage] Unexpected error during image generation",
             error=str(error),
79-79: Remove unusednoqadirectives.Static analysis indicates the
noqa: E501directives on lines 79, 261, and 334 are unnecessary.core/content_generator/pipeline.py (2)
615-616: Consider parameterizing the DEBUG limit.The
LOCAL_MAX_RESEARCH_QUESTIONS_PER_SECTIONis used to limit questions in DEBUG mode. While useful for development, consider making this configurable via settings rather than a module constant for more flexibility.
407-407: Consider simplifyinglist(dict.fromkeys(update_fields)).While this works to deduplicate the list, since you're appending distinct field names sequentially, duplicates shouldn't occur. You could use a plain list or, if deduplication is truly needed for safety, consider using a set earlier.
🔎 Proposed simplification
- research_link.save(update_fields=list(dict.fromkeys(update_fields))) + research_link.save(update_fields=update_fields)The
dict.fromkeyspattern preserves order while deduplicating, but since you're appending unique field names, this is unnecessary complexity.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (21)
README.mdcontent_generation/__init__.pycontent_generation/admin.pycontent_generation/apps.pycontent_generation/migrations/__init__.pycontent_generation/models.pycontent_generation/tests.pycontent_generation/views.pycore/agents/generate_blog_post_intro_conclusion_agent.pycore/agents/generate_blog_post_section_content_agent.pycore/agents/research_link_summary_agent.pycore/agents/schemas.pycore/content_generator/pipeline.pycore/content_generator/tasks.pycore/models.pycore/urls.pycore/views.pyfrontend/templates/blog/blog_post_research_process.htmlfrontend/templates/blog/generated_blog_post_detail.htmlfrontend/templates/components/blog_post_suggestion_card.htmltuxseo/settings.py
✅ Files skipped from review due to trivial changes (3)
- content_generation/views.py
- content_generation/admin.py
- content_generation/tests.py
🚧 Files skipped from review as they are similar to previous changes (2)
- README.md
- tuxseo/settings.py
🧰 Additional context used
📓 Path-based instructions (12)
frontend/templates/**/*.html
📄 CodeRabbit inference engine (CLAUDE.md)
frontend/templates/**/*.html: Use semantic HTML elements (dialog, details/summary, etc.) in Django templates
Use Stimulus controllers for interactive frontend behavior, connecting via data attributes
Files:
frontend/templates/components/blog_post_suggestion_card.htmlfrontend/templates/blog/generated_blog_post_detail.htmlfrontend/templates/blog/blog_post_research_process.html
**/*.html
📄 CodeRabbit inference engine (.cursor/rules/backend.mdc)
Use Django templates for HTML rendering
Files:
frontend/templates/components/blog_post_suggestion_card.htmlfrontend/templates/blog/generated_blog_post_detail.htmlfrontend/templates/blog/blog_post_research_process.html
**/*.{js,html}
📄 CodeRabbit inference engine (.cursor/rules/frontend.mdc)
**/*.{js,html}: Prefer Stimulus JS for adding interactivity to Django templates instead of raw script elements
Leverage Stimulus data attributes to connect HTML elements with JavaScript functionality
Employ Stimulus actions to handle user interactions and events
Files:
frontend/templates/components/blog_post_suggestion_card.htmlfrontend/templates/blog/generated_blog_post_detail.htmlfrontend/templates/blog/blog_post_research_process.html
**/*.{html,erb,stimulus.{js,ts}}
📄 CodeRabbit inference engine (.cursor/rules/ui-ux-design-guidelines.mdc)
**/*.{html,erb,stimulus.{js,ts}}: Always generate semantic HTML when writing HTML, CSS, or styles in Stimulus controllers
Always favor the 'utility first' Tailwind approach when using TailwindCSS v3.x; reusable style classes should not be created often; code should be reused primarily through template components
Files:
frontend/templates/components/blog_post_suggestion_card.htmlfrontend/templates/blog/generated_blog_post_detail.htmlfrontend/templates/blog/blog_post_research_process.html
**/*.{css,scss,html,erb,stimulus.{js,ts}}
📄 CodeRabbit inference engine (.cursor/rules/ui-ux-design-guidelines.mdc)
Never create new styles without explicitly receiving permission to do so in HTML, CSS, or Stimulus controllers that use D3.js
Files:
frontend/templates/components/blog_post_suggestion_card.htmlfrontend/templates/blog/generated_blog_post_detail.htmlfrontend/templates/blog/blog_post_research_process.html
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Follow PEP 8 for Python code style
Use descriptive variable names with underscores (snake_case) in PythonUse try-except blocks for specific errors, avoid excepting Exception
**/*.py: Use Python's logging module (structlog) extensively
Include context in log messages with relevant fields (error, exc_info, resource IDs, etc.) to aid debugging and traceability
**/*.py: Follow PEP 8 style guide and prioritize readability and maintainability
Use descriptive variable and function names with underscores (snake_case)
Leverage Django's ORM and avoid raw SQL when possible
Use Pydantic models for schema validation in django-ninja APIs
**/*.py: Use descriptive, full-word variable names that clearly communicate purpose and context; avoid abbreviations and single-letter variables
Provide context in variable names, especially when format or type matters to the implementation (e.g., 'current_date_iso_format')
Extract unchanging values into constants using UPPER_CASE naming (e.g., MAX_LOGIN_ATTEMPTS, DEFAULT_TIMEOUT_MS)
Break down complex operations with descriptive intermediate variables instead of accessing array indices directly
Use 'is_', 'has_', 'can_' prefixes for boolean variables
Include 'date' in variable names that represent dates
Use snake_case for variables and functions in Python
Use PascalCase for class names in Python
Keep variable lifespan short by defining variables close to where they're used to reduce cognitive load
Name functions after what they do, not how they're used; ask 'Will I understand this without my current context?'
Avoid generic function/variable names like 'data', 'info', 'manager'; be specific about purpose (e.g., 'calculate_customer_lifetime_value')
Include necessary context in function names without being verbose (e.g., 'add_month_to_date' not 'add_to_date' or 'add_number_of_months_to_date')
If a function cannot be named clearly, split it into smaller, focused functions with better-defined responsibilities
Use the same verbs ...
Files:
core/agents/generate_blog_post_section_content_agent.pycore/agents/generate_blog_post_intro_conclusion_agent.pycore/content_generator/tasks.pycore/content_generator/pipeline.pycontent_generation/models.pycore/views.pycontent_generation/apps.pycore/models.pycore/agents/schemas.pycore/agents/research_link_summary_agent.pycore/urls.py
core/agents/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use pydantic-ai for AI agent functionality with implementations in core/agents/
Files:
core/agents/generate_blog_post_section_content_agent.pycore/agents/generate_blog_post_intro_conclusion_agent.pycore/agents/schemas.pycore/agents/research_link_summary_agent.py
**/*.{js,ts,tsx,jsx,py,java,cs,php,rb,go,rs,swift,kt,scala,groovy}
📄 CodeRabbit inference engine (.cursor/rules/code-style.mdc)
Use double quotes instead of single quotes
Files:
core/agents/generate_blog_post_section_content_agent.pycore/agents/generate_blog_post_intro_conclusion_agent.pycore/content_generator/tasks.pycore/content_generator/pipeline.pycontent_generation/models.pycore/views.pycontent_generation/apps.pycore/models.pycore/agents/schemas.pycore/agents/research_link_summary_agent.pycore/urls.py
**/tasks.py
📄 CodeRabbit inference engine (.cursor/rules/backend.mdc)
Use django-q2 syntax for implementing background workers and tasks
Files:
core/content_generator/tasks.py
core/{views,models}.py
📄 CodeRabbit inference engine (CLAUDE.md)
Apply fat models, skinny views pattern: keep business logic primarily in Django models and mixins, while views handle request/response only
Files:
core/views.pycore/models.py
**/views.py
📄 CodeRabbit inference engine (.cursor/rules/backend.mdc)
Use Class-Based Views (CBVs) for complex views and Function-Based Views (FBVs) for simpler logic
Files:
core/views.py
core/models.py
📄 CodeRabbit inference engine (CLAUDE.md)
Validate simple constraints in database, complex logic in Django models
Files:
core/models.py
🧠 Learnings (6)
📚 Learning: 2025-11-28T10:30:04.521Z
Learnt from: CR
Repo: rasulkireev/TuxSEO PR: 0
File: .cursor/rules/agent-rules.mdc:0-0
Timestamp: 2025-11-28T10:30:04.521Z
Learning: Always add AGENTS.md into AI context
Applied to files:
core/agents/generate_blog_post_section_content_agent.pycore/agents/generate_blog_post_intro_conclusion_agent.pycore/agents/research_link_summary_agent.py
📚 Learning: 2025-11-28T10:30:00.003Z
Learnt from: CR
Repo: rasulkireev/TuxSEO PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-28T10:30:00.003Z
Learning: Applies to core/agents/**/*.py : Use pydantic-ai for AI agent functionality with implementations in core/agents/
Applied to files:
core/agents/generate_blog_post_section_content_agent.py
📚 Learning: 2025-11-28T10:30:00.003Z
Learnt from: CR
Repo: rasulkireev/TuxSEO PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-28T10:30:00.003Z
Learning: Add AGENTS.md into AI context when working with the repository
Applied to files:
core/agents/generate_blog_post_section_content_agent.pycore/agents/generate_blog_post_intro_conclusion_agent.py
📚 Learning: 2025-11-28T10:30:23.438Z
Learnt from: CR
Repo: rasulkireev/TuxSEO PR: 0
File: .cursor/rules/backend.mdc:0-0
Timestamp: 2025-11-28T10:30:23.438Z
Learning: Applies to **/tasks.py : Use django-q2 syntax for implementing background workers and tasks
Applied to files:
core/content_generator/tasks.py
📚 Learning: 2025-11-28T10:30:00.003Z
Learnt from: CR
Repo: rasulkireev/TuxSEO PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-28T10:30:00.003Z
Learning: Applies to core/tasks.py : Define background tasks using django-q2 in core/tasks.py
Applied to files:
core/content_generator/tasks.pycore/models.py
📚 Learning: 2025-11-28T10:30:00.003Z
Learnt from: CR
Repo: rasulkireev/TuxSEO PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-28T10:30:00.003Z
Learning: Applies to core/schemas.py : Use Pydantic schemas in core/schemas.py for structured AI outputs
Applied to files:
core/agents/schemas.py
🧬 Code graph analysis (7)
core/agents/generate_blog_post_section_content_agent.py (3)
core/agents/schemas.py (2)
BlogPostSectionContentGenerationContext(247-270)GeneratedBlogPostSectionContentSchema(273-278)core/choices.py (2)
ContentType(4-6)get_default_ai_model(140-142)core/models.py (1)
project_details(449-464)
core/agents/generate_blog_post_intro_conclusion_agent.py (3)
core/agents/schemas.py (2)
BlogPostIntroConclusionGenerationContext(281-291)GeneratedBlogPostIntroConclusionSchema(294-300)core/choices.py (2)
ContentType(4-6)get_default_ai_model(140-142)core/models.py (1)
project_details(449-464)
core/content_generator/pipeline.py (8)
core/agents/blog_post_outline_agent.py (1)
create_blog_post_outline_agent(44-60)core/agents/generate_blog_post_intro_conclusion_agent.py (1)
create_generate_blog_post_intro_conclusion_agent(26-100)core/agents/generate_blog_post_section_content_agent.py (1)
create_generate_blog_post_section_content_agent(27-123)core/agents/research_link_summary_agent.py (1)
create_research_link_analysis_agent(94-123)core/agents/schemas.py (8)
BlogPostGenerationContext(207-214)BlogPostIntroConclusionGenerationContext(281-291)BlogPostSectionContentGenerationContext(247-270)GeneratedBlogPostIntroConclusionSchema(294-300)GeneratedBlogPostSectionContentSchema(273-278)PriorSectionContext(242-244)ResearchLinkContextualSummaryContext(217-223)ResearchQuestionWithAnsweredLinks(234-239)core/content_generator/utils.py (1)
get_exa_date_range_iso_strings(8-15)core/models.py (5)
GeneratedBlogPost(848-1186)GeneratedBlogPostResearchLink(1221-1253)GeneratedBlogPostResearchQuestion(1203-1218)GeneratedBlogPostSection(1189-1200)project_details(449-464)core/utils.py (2)
get_markdown_content(219-251)run_agent_synchronously(254-309)
content_generation/models.py (6)
core/agents/insert_links_agent.py (1)
create_insert_links_agent(7-127)core/agents/schemas.py (3)
GeneratedBlogPostSchema(303-313)LinkInsertionContext(391-397)ProjectPageContext(196-204)core/base_models.py (1)
BaseModel(7-19)core/choices.py (1)
OGImageStyle(87-95)core/models.py (11)
AutoSubmissionSetting(818-845)BlogPostTitleSuggestion(725-815)Project(382-722)GeneratedBlogPost(848-1186)blog_post_structure_rules(884-896)generated_blog_post_schema(899-905)submit_blog_post_to_endpoint(907-956)generate_og_image(958-1065)save(1305-1308)save(1899-1926)insert_links_into_post(1067-1186)core/utils.py (3)
get_og_image_prompt(69-96)get_relevant_external_pages_for_blog_post(599-684)get_relevant_pages_for_blog_post(517-596)
core/views.py (2)
core/models.py (5)
BlogPostTitleSuggestion(725-815)generated_blog_posts(483-484)get_internal_links(783-791)get_keywords(702-722)get_blog_post_keywords(793-801)core/utils.py (1)
get_relevant_external_pages_for_blog_post(599-684)
core/agents/schemas.py (2)
core/base_models.py (1)
BaseModel(7-19)core/models.py (1)
web_page_content(1333-1338)
core/urls.py (1)
core/views.py (1)
BlogPostResearchProcessView(983-1187)
🪛 Ruff (0.14.10)
core/content_generator/tasks.py
149-149: Unused noqa directive (non-enabled: E501)
Remove unused noqa directive
(RUF100)
core/content_generator/pipeline.py
77-77: Avoid specifying long messages outside the exception class
(TRY003)
80-80: Avoid specifying long messages outside the exception class
(TRY003)
160-160: Avoid specifying long messages outside the exception class
(TRY003)
221-221: Avoid specifying long messages outside the exception class
(TRY003)
225-225: Avoid specifying long messages outside the exception class
(TRY003)
341-341: Avoid specifying long messages outside the exception class
(TRY003)
355-355: Avoid specifying long messages outside the exception class
(TRY003)
442-442: Avoid specifying long messages outside the exception class
(TRY003)
447-447: Avoid specifying long messages outside the exception class
(TRY003)
489-489: Avoid specifying long messages outside the exception class
(TRY003)
566-566: Avoid specifying long messages outside the exception class
(TRY003)
580-580: Avoid specifying long messages outside the exception class
(TRY003)
721-721: Avoid specifying long messages outside the exception class
(TRY003)
725-725: Avoid specifying long messages outside the exception class
(TRY003)
1008-1008: Avoid specifying long messages outside the exception class
(TRY003)
1012-1012: Avoid specifying long messages outside the exception class
(TRY003)
1044-1044: Avoid specifying long messages outside the exception class
(TRY003)
1199-1199: Avoid specifying long messages outside the exception class
(TRY003)
content_generation/models.py
79-79: Unused noqa directive (non-enabled: E501)
Remove unused noqa directive
(RUF100)
129-129: Consider moving this statement to an else block
(TRY300)
216-216: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
229-229: Consider moving this statement to an else block
(TRY300)
239-239: Use explicit conversion flag
Replace with conversion flag
(RUF010)
248-248: Use explicit conversion flag
Replace with conversion flag
(RUF010)
261-261: Unused noqa directive (non-enabled: E501)
Remove unused noqa directive
(RUF100)
334-334: Unused noqa directive (non-enabled: E501)
Remove unused noqa directive
(RUF100)
core/agents/research_link_summary_agent.py
51-51: Unused noqa directive (non-enabled: E501)
Remove unused noqa directive
(RUF100)
62-62: Unused noqa directive (non-enabled: E501)
Remove unused noqa directive
(RUF100)
79-79: Unused noqa directive (non-enabled: E501)
Remove unused noqa directive
(RUF100)
80-80: Unused noqa directive (non-enabled: E501)
Remove unused noqa directive
(RUF100)
🔇 Additional comments (30)
frontend/templates/components/blog_post_suggestion_card.html (1)
167-177: LGTM! The new Research link is well-integrated, with consistent styling, an appropriate icon, and correct URL routing using named parameters that match the new blog_post_research_process URL pattern. The flex container properly groups the Research and Archive actions together.

content_generation/apps.py (1)
1-6: LGTM! Standard Django AppConfig following best practices, with BigAutoField for scalable primary keys.
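For reference, the pattern being praised is roughly the following; only the app name comes from this PR, the config class name is an assumption:

```python
# content_generation/apps.py -- minimal AppConfig sketch.
from django.apps import AppConfig


class ContentGenerationConfig(AppConfig):
    # BigAutoField uses 64-bit integer primary keys, so large tables
    # (e.g. research links) will not exhaust the id space.
    default_auto_field = "django.db.models.BigAutoField"
    name = "content_generation"
```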
core/urls.py (1)
59-63: LGTM! The new URL pattern follows the established conventions for project-scoped routes and uses consistent parameter naming that aligns with the view's kwargs expectations.

core/agents/schemas.py (2)
17-39: LGTM! Well-structured Pydantic schemas with clear Field descriptions. The ResearchLinkAnalysis schema properly captures the multi-faceted analysis output needed for research link processing.
217-300: LGTM! The new context and output schemas are well-designed for the content generation pipeline:

- ResearchLinkContextualSummaryContext provides complete context for link analysis
- ResearchQuestionWithAnsweredLinks correctly filters to only include answered research
- Section and intro/conclusion schemas properly separate generation context from output
- Appropriate use of default_factory=list for optional collections (see the sketch below)

Based on learnings, these schemas align with the project's pattern of using Pydantic schemas in core/agents/schemas.py for structured AI outputs.
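To illustrate the default_factory point, a minimal sketch; the class name comes from the PR, but the fields shown here are hypothetical:

```python
from pydantic import BaseModel, Field


class ResearchQuestionWithAnsweredLinks(BaseModel):
    question: str = Field(description="Research question for a section")
    # default_factory=list gives each instance its own empty list rather than a
    # shared mutable default, and lets callers omit the field entirely.
    answered_links: list[str] = Field(default_factory=list)
```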
core/agents/generate_blog_post_intro_conclusion_agent.py (2)
26-43: LGTM! The agent factory function follows the pydantic-ai pattern correctly (a rough sketch follows this list):

- Proper output and deps types
- Combined system prompts for flexibility
- Reasonable model settings (retries=2 for resilience, temperature=0.7 for creative content)
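A rough sketch of that factory shape, assuming a recent pydantic-ai release (keyword names vary between versions, and the schema/model names here are placeholders, not the project's actual definitions):

```python
from pydantic import BaseModel
from pydantic_ai import Agent


class IntroConclusionOutput(BaseModel):
    # Placeholder for the real output schema in core/agents/schemas.py.
    introduction: str
    conclusion: str


def create_intro_conclusion_agent(model: str = "openai:gpt-4o") -> Agent:
    return Agent(
        model,
        output_type=IntroConclusionOutput,  # structured output validated by pydantic
        deps_type=dict,  # placeholder for the real generation-context type
        system_prompt=(
            "You write blog post introductions and conclusions.",
            "Match the tone of the already-generated middle sections.",
        ),
        retries=2,  # retry output-validation failures
        model_settings={"temperature": 0.7},  # looser sampling for creative prose
    )
```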
45-98: LGTM! The context builder comprehensively assembles the prompt with:
- Current date for temporal awareness
- Full project and title suggestion context
- Properly handles empty/None cases for sections with fallback text
- Clear structure for the AI to understand the content requirements
As per coding guidelines, this follows the pattern of using pydantic-ai for AI agent functionality in core/agents/.

frontend/templates/blog/blog_post_research_process.html (2)
1-51: LGTM! Good template structure with:
- Proper breadcrumb navigation
- Conditional button logic for compute links toggle based on API key availability
- Clear user messaging when Jina API key is not configured
136-161: LGTM! Proper semantic table structure with <thead>, <tbody>, and appropriate column headers. External links open in new tabs with correct security attributes (rel="noopener noreferrer").

core/views.py (4)
30-31: LGTM! Appropriate imports for the new view functionality.
Also applies to: 42-42
983-1002: LGTM! The view correctly (see the sketch below):

- Uses LoginRequiredMixin for authentication
- Filters the queryset by the user's profile for authorization
- Optimizes queries with select_related and prefetch_related to avoid N+1 issues
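Roughly the shape being described; the model and relation names below are assumptions based on the PR summary, not the actual view code:

```python
from django.contrib.auth.mixins import LoginRequiredMixin
from django.views.generic import DetailView

from core.models import GeneratedBlogPost  # import path assumed from this PR


class BlogPostResearchProcessView(LoginRequiredMixin, DetailView):
    template_name = "blog/blog_post_research_process.html"

    def get_queryset(self):
        # Scope to the requesting user's data, then pull the related rows in bulk
        # so rendering sections/questions/links does not issue one query per row.
        return (
            GeneratedBlogPost.objects.filter(project__profile__user=self.request.user)
            .select_related("project", "title")
            .prefetch_related("sections__research_questions__research_links")
        )
```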
1143-1176: Exception handling catches specific exceptions appropriately. The try-except blocks follow coding guidelines by catching specific exceptions (AttributeError, TypeError, ValueError) rather than a bare Exception. The logging includes relevant context fields (title_suggestion_id, project_id, exc_info=True), which aids debugging.

One consideration: if get_blog_post_keywords(), _get_internal_links(), or _get_external_links() raise unexpected exceptions (e.g., network errors for external API calls), they won't be caught. This may be intentional, to surface unexpected failures rather than silently degrading. Verify this is the desired behavior.
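For illustration, the shape of that handling; the helper and field names follow the review text, while the structlog-style logger is an assumption about the project's logging setup:

```python
import structlog

logger = structlog.get_logger(__name__)


def _safe_keywords(title_suggestion, project_id):
    try:
        return title_suggestion.get_blog_post_keywords()
    except (AttributeError, TypeError, ValueError):
        # Only the expected failure modes are swallowed; anything else (e.g. a
        # network error from an external call) propagates and surfaces loudly.
        logger.warning(
            "Failed to compute blog post keywords",
            title_suggestion_id=title_suggestion.id,
            project_id=project_id,
            exc_info=True,
        )
        return []
```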
1178-1187: LGTM! Context data is properly assembled with defensive defaults (or [] for links). The view exposes appropriate flags for template conditional rendering (jina_api_key_configured, should_compute_links, has_pro_subscription).

core/agents/generate_blog_post_section_content_agent.py (3)
1-10: LGTM! Imports are well-organized and correctly reference the necessary schemas, choices, and prompts modules. The use of timezone from Django for date formatting is appropriate.
27-44: LGTM! The agent factory function follows the established pydantic-ai pattern. Good use of:

- Default fallback to get_default_ai_model()
- Appropriate retries=2 for resilience
- Reasonable temperature=0.7 for creative content generation
- High max_tokens=16000, suitable for section content
46-121: Well-structured dynamic system prompt. The context assembly is thorough and provides the AI with comprehensive information for coherent section generation. The handling of empty lists with fallback text "- (none)" and "\n(none)\n" ensures graceful degradation.

core/agents/research_link_summary_agent.py (2)
55-91: LGTM! Both create_general_research_link_summary_agent and create_contextual_research_link_summary_agent are well-structured with appropriate configurations. The contextual agent correctly attaches two system prompts, for research context and webpage content.
94-123: LGTM! The create_research_link_analysis_agent consolidates three outputs (general summary, contextual summary, answer to question) into a single model call, which is efficient. The docstring clearly documents the expected outputs.

core/content_generator/tasks.py (2)
22-45: LGTM! The task correctly overrides num_results_per_question in DEBUG mode for faster local development. Good structured logging with relevant context fields.
48-69: Good resilience pattern for scraping. The docstring clearly explains the rationale for always queuing the analysis task regardless of scrape outcome: this prevents the pipeline from stalling on bad links. The task chaining with async_task follows django-q2 patterns correctly.
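A minimal sketch of that chaining, assuming django-q2's async_task; the dotted task path and the scraping helper are assumptions based on the PR summary:

```python
import logging

from django_q.tasks import async_task

logger = logging.getLogger(__name__)


def _scrape_and_store_content(research_link_id: int) -> None:
    """Placeholder for the real Jina-based scraping helper."""
    raise NotImplementedError


def scrape_research_link_content_task(research_link_id: int) -> None:
    try:
        _scrape_and_store_content(research_link_id)
    except Exception:  # the real task may narrow this to specific errors
        logger.warning("Scrape failed for link %s", research_link_id, exc_info=True)
    # Queue the analysis step regardless of the scrape outcome, so a single bad
    # link cannot stall the rest of the pipeline.
    async_task(
        "core.content_generator.tasks.analyze_research_link_content_task",
        research_link_id,
    )
```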
core/models.py (4)
803-815: Clean delegation to the content generation pipeline. The generate_content method now serves as a backward-compatible wrapper that delegates to the centralized pipeline. This is a good pattern for gradual migration while preserving existing call sites.
1189-1200: LGTM! The GeneratedBlogPostSection model is well-structured, with appropriate fields and foreign key relationships. The blank=True, default="" on content allows sections to be created before content is generated.

1203-1218: LGTM! GeneratedBlogPostResearchQuestion correctly links to both GeneratedBlogPost and GeneratedBlogPostSection, enabling section-level research tracking.

1221-1254: LGTM! GeneratedBlogPostResearchLink has comprehensive fields for tracking the full lifecycle: initial data from Exa, Jina scraping augmentation, and AI analysis results. The VectorField for embedding enables semantic search capabilities.
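To make the lifecycle concrete, a sketch of such a model; the field names follow the review text, while the types, sizes, FK target, and embedding dimension are assumptions:

```python
from django.db import models
from pgvector.django import VectorField


class GeneratedBlogPostResearchLink(models.Model):
    research_question = models.ForeignKey(
        "core.GeneratedBlogPostResearchQuestion", on_delete=models.CASCADE
    )
    # Initial data from the Exa search result.
    url = models.URLField(max_length=2000)
    title = models.CharField(max_length=500, blank=True, default="")
    # Filled in later by the scraping task (Jina).
    content = models.TextField(blank=True, default="")
    # AI analysis outputs.
    general_summary = models.TextField(blank=True, default="")
    contextual_summary = models.TextField(blank=True, default="")
    answer_to_question = models.TextField(blank=True, default="")
    # Embedding for semantic search; the dimension here is an assumption.
    embedding = VectorField(dimensions=1536, null=True, blank=True)
```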
core/content_generator/pipeline.py (6)
1-10: LGTM! Imports are well-organized, covering Django utilities, external APIs (Exa), and internal agents/schemas. The use of __future__ annotations enables forward references.
51-57: Good use of constants for pipeline configuration. The constants for section titles, character limits, and retry settings are appropriately defined at module level, making them easy to tune and test.
110-147: Good transactional integrity for blog post creation. The use of transaction.atomic() ensures that both the GeneratedBlogPost and all GeneratedBlogPostSection records are created atomically. This prevents orphaned sections if creation fails midway.
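The pattern, roughly; the field names and helper signature below are assumptions, not the pipeline's actual code:

```python
from django.db import transaction

from core.models import GeneratedBlogPost, GeneratedBlogPostSection  # assumed paths


def create_blog_post_with_sections(title_suggestion, section_titles):
    with transaction.atomic():
        blog_post = GeneratedBlogPost.objects.create(title=title_suggestion)
        sections = [
            GeneratedBlogPostSection(blog_post=blog_post, title=section_title, order=index)
            for index, section_title in enumerate(section_titles)
        ]
        # One INSERT for all sections; if anything in this block raises, the whole
        # transaction rolls back and no orphaned sections are left behind.
        GeneratedBlogPostSection.objects.bulk_create(sections)
    return blog_post
```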
256-303: Good validation and error handling for Exa results. The code properly handles both object and dictionary response formats from Exa, validates URL schemes and lengths, and correctly parses/localizes datetime values. The conditional queuing of scrape tasks only for links without content is efficient.
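A simplified sketch of that normalization and validation step; the attribute names and the length limit are assumptions:

```python
from urllib.parse import urlparse

MAX_URL_LENGTH = 2000


def normalize_exa_result(result):
    # Exa results can arrive as attribute objects or as plain dicts.
    if isinstance(result, dict):
        url, title = result.get("url"), result.get("title")
    else:
        url, title = getattr(result, "url", None), getattr(result, "title", None)

    if not url or len(url) > MAX_URL_LENGTH:
        return None
    if urlparse(url).scheme not in ("http", "https"):
        return None
    return {"url": url, "title": title or ""}
```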
833-896: Well-designed retry mechanism with bounded retries. The retry logic for section synthesis (sketched below):
- Uses cache to track retry counts (preventing unbounded retries across restarts)
- Has configurable max retries (different for DEBUG vs production)
- Only triggers when link processing is complete but sections still missing
- Logs warnings when max retries reached
This is a robust pattern for handling transient failures.
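A minimal version of that cache-backed counter; the key format, TTL, and limits here are assumptions, not the pipeline's actual values:

```python
from django.conf import settings
from django.core.cache import cache

MAX_SECTION_SYNTHESIS_RETRIES = 2 if settings.DEBUG else 5


def should_retry_section_synthesis(blog_post_id: int) -> bool:
    key = f"section-synthesis-retries:{blog_post_id}"
    attempts = cache.get(key, 0)
    if attempts >= MAX_SECTION_SYNTHESIS_RETRIES:
        return False  # caller logs a warning and stops re-queuing the task
    # Persist the incremented count with a TTL so stale counters eventually expire.
    cache.set(key, attempts + 1, timeout=60 * 60)
    return True
```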
1162-1182: Clean markdown assembly logic. The _build_full_blog_post_markdown function correctly assembles the final markdown with proper heading levels (H1 for title, H2 for sections) and handles empty sections gracefully.
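The assembly step amounts to something like this; the function name comes from the review, but the section interface is assumed:

```python
def _build_full_blog_post_markdown(title, sections):
    parts = [f"# {title}"]
    for section in sections:
        content = (section.content or "").strip()
        if not content:
            continue  # skip sections that never received generated content
        parts.append(f"## {section.title}\n\n{content}")
    return "\n\n".join(parts)
```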
image_response = urlopen(image_url)
image_content = ContentFile(image_response.read())

filename = f"og-image-{self.id}.png"
self.image.save(filename, image_content, save=True)
Validate URL scheme before opening with urlopen.
The urlopen call is flagged by static analysis (S310) because it can open arbitrary URL schemes including file://. Since image_url comes from an external API response, validate that it uses only allowed schemes.
🔎 Proposed fix
+ from urllib.parse import urlparse
+
+ parsed_url = urlparse(image_url)
+ if parsed_url.scheme not in ("http", "https"):
+     logger.error(
+         "[GenerateOGImage] Invalid URL scheme from Replicate",
+         blog_post_id=self.id,
+         project_id=self.project_id,
+         image_url=image_url,
+         scheme=parsed_url.scheme,
+     )
+     return False, f"Invalid URL scheme: {parsed_url.scheme}"
+
  image_response = urlopen(image_url)
  image_content = ContentFile(image_response.read())

Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 Ruff (0.14.10)
216-216: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
🤖 Prompt for AI Agents
In content_generation/models.py around lines 216 to 220, the code calls
urlopen(image_url) on an external URL without validating the scheme; update the
code to parse the image_url (e.g., with urllib.parse.urlparse), allow only
'http' and 'https' schemes, and raise/log and skip or fail fast if the scheme is
not allowed before calling urlopen; ensure any new parsing/util helpers are
imported and that invalid URLs do not reach urlopen so only safe HTTP(S)
requests are performed.
Summary by CodeRabbit
Release Notes
New Features
Documentation
Chores