Skip to content

Conversation

@samuelvkwong
Copy link
Collaborator

@samuelvkwong samuelvkwong commented Dec 23, 2025

Contains changes from the following branches:

subscription_extract:

  • Added filter questions that must either be answered yes or no by the LLM
  • Removed query field
  • Added extractions to subscriptions
  • Added inbox notifications for newly subscribed reports
  • Added pagination, sorting and filtering to the inbox view

subscription_handler:

  • Changed SubscriptionJobs from running every minute to starting only when reports are created in the database by registering a ReportsCreatedHandler that is called on db commit whenever an API call is made to create reports.

query_generation:

  • Implemented auto-generating filter query from output fields.
  • Moved output field input step before search filter step.
  • Added search filter query live updates, showing a preview search url and the number of reports to be retrieved.

old-historySelectionOutput:

  • Added selection output type and array option to output fields.

extractions:

  • Implemented downloading of extraction results as a CSV file

Summary by CodeRabbit

Release Notes

  • New Features

    • Real-time search preview and automatic query generation for extraction workflows
    • CSV export for extraction results and subscription inbox items
    • Selection/dropdown options for output fields in extractions
    • Advanced inbox filtering and sorting capabilities
  • Improvements

    • Enhanced form validation with clearer error messages
    • Step-by-step extraction job wizard with improved guidance
    • Streamlined subscription workflow with better data organization

✏️ Tip: You can customize this high-level summary in your review settings.

samuelvkwong and others added 30 commits November 3, 2025 23:48
…uage header) and use it downstream to deal with format parsing and translations
@claude
Copy link

claude bot commented Dec 30, 2025

Code Review: PR #172 - Subscription and Extraction Features

Thank you for this substantial contribution! I've conducted a comprehensive review of the 3,727 lines added across 63 files. Below is my feedback organized by severity.


🔴 CRITICAL ISSUES

1. Missing Permission Check (Security)

Location: radis/subscriptions/views.py:78

class SubscriptionCreateView(LoginRequiredMixin, CreateView):  # TODO: Add PermissionRequiredMixin

Issue: The TODO comment indicates missing authorization. Any authenticated user can create subscriptions without proper permission checks.

Recommendation: Add PermissionRequiredMixin immediately:

class SubscriptionCreateView(LoginRequiredMixin, PermissionRequiredMixin, CreateView):
    permission_required = "subscriptions.add_subscription"

2. Database Connection Leak in Error Handling (Performance/Reliability)

Location: radis/subscriptions/processors.py:92-103

except APIError as e:
    logger.error(f"LLM API error filtering report {report.pk}: {e}")
    db.close_old_connections()
    return
except ValidationError as e:
    logger.error(f"Response validation failed filtering report {report.pk}: {e}")
    db.close_old_connections()
    return

Issue: db.close_old_connections() is called in 6+ different exception handlers (lines 94, 98, 103, 138, 142, 146, 156). This is error-prone and hard to maintain. If a new error path is added, developers might forget to close connections.

Recommendation: Use a finally block:

try:
    # processing logic
finally:
    db.close_old_connections()

3. Code Duplication in Sorting Logic (Security/Maintenance)

Location: radis/subscriptions/views.py:210-225 and radis/subscriptions/views.py:301

The get_ordering() method validation logic is duplicated in SubscriptionInboxView and SubscriptionInboxDownloadView. This creates risk of inconsistencies.

Recommendation: Extract to a shared method or utility function.


🟠 HIGH PRIORITY ISSUES

4. N+1 Query Problem in Processor (Performance)

Location: radis/subscriptions/processors.py:47-48

for report in task.reports.filter(groups=active_group):
    future = executor.submit(self.process_report, report, task)

Issue:

  • Missing select_related() or prefetch_related()
  • Line 63: subscription.filter_questions.order_by("pk") could be optimized
  • Line 117: subscription.output_fields.order_by("pk") - another potential N+1

Recommendation: Prefetch related objects before the loop.


5. Race Condition in Subscription Inbox View (Bug)

Location: radis/subscriptions/views.py:266-269

subscription = cast(Subscription, self.object)
subscription.last_viewed_at = timezone.now()
subscription.save(update_fields=["last_viewed_at"])

Issue:

  • Updates on every GET request without transaction protection
  • Marks ALL items as viewed even if user only saw page 1
  • Not atomic with the query determining "new" items

Recommendation: Consider using F() expressions or atomic updates, or update only after successful page render.


6. Inconsistent Validation Between Form and Model (Bug Risk)

The validation for selection_options is duplicated in:

  • OutputFieldForm.clean_selection_options() (forms.py:275-305)
  • OutputField.clean() (models.py:128-165)

Recommendation: Extract validation logic to a shared method on the model.


🟡 MEDIUM PRIORITY ISSUES

7. Missing Input Sanitization in Query Generator (Security/Quality)

Location: radis/extractions/utils/query_generator.py:135-154

Field names and descriptions are user-provided and passed directly to LLM without sanitization or length limits. Malicious users could craft descriptions designed to manipulate the LLM (prompt injection).

Recommendation: Add length limits and sanitize special characters.


8. Weak Error Handling in CSV Export (Reliability)

Location: radis/extractions/views.py:597-603

No exception handling if iter_extraction_result_rows() fails mid-stream. Partial/corrupted CSV could be downloaded.

Recommendation: Add try/except to handle database errors gracefully.


9. Magic Numbers in Formset Configuration (Code Quality)

# extractions/forms.py:338
max_num=5,  # Magic number

# subscriptions/forms.py
max_num=3,  # Magic number
max_num=10,  # Magic number

Recommendation: Define these as named constants at module level.


10. Inadequate Error Messages (Code Quality)

Location: radis/chats/utils/chat_client.py:46-53

Error messages don't include context about what operation was being performed or the input that caused the failure.

Recommendation: Include more context in error messages.


🔵 LOW PRIORITY ISSUES

11. Commented Out Code

Location: radis/subscriptions/tasks.py:82

# @app.periodic(cron=settings.SUBSCRIPTION_CRON)
@app.task()
def subscription_launcher(timestamp: int):

Recommendation: Remove commented code or add a clear explanation comment.


12. Overly Complex Wizard Step Logic (Code Quality)

Location: radis/extractions/views.py:103-157

The process_step method has deeply nested logic. This is hard to test and maintain.

Recommendation: Extract step-specific logic into separate methods like _process_output_fields_step(), _process_search_step().


13. Missing Database Indexes (Performance)

Based on queries observed, consider adding:

  1. SubscribedItem.created_at
  2. SubscribedItem.subscription_id + created_at (composite)
  3. Report.study_datetime

✅ POSITIVE OBSERVATIONS

  1. Good Test Coverage: Comprehensive tests for forms, views, and utilities
  2. SQL Injection Protection: Query sorting uses whitelists
  3. Streaming CSV: Proper use of streaming responses for large exports
  4. Transaction Management: Proper use of transaction.atomic() and transaction.on_commit()
  5. Error Logging: Comprehensive error logging with context
  6. Input Validation: Multiple layers of validation (form, model, clean methods)

📊 TEST COVERAGE GAPS

Consider adding tests for:

  1. ExtractionJobWizardView: Wizard data corruption, browser back button behavior
  2. SubscriptionProcessor: LLM unavailable scenarios, concurrent processing
  3. CSV Export: Very large datasets, special characters, error handling during streaming
  4. QueryGenerator: Malicious field descriptions, very long inputs, LLM timeouts
  5. Subscription Handlers: Empty lists, race conditions

🎯 RECOMMENDATIONS SUMMARY

Immediate (Before Merge):

  • ✅ Add PermissionRequiredMixin to SubscriptionCreateView
  • ✅ Fix database connection leak (use finally block)
  • ✅ Deduplicate sorting validation logic
  • ✅ Add context to LLM error messages

Short-term:

  • 🔄 Optimize N+1 queries in subscription processor
  • 🔄 Add missing database indexes
  • 🔄 Extract duplicate validation logic

Long-term:

  • 📋 Add LLM prompt injection protection
  • 📋 Improve CSV streaming error handling
  • 📋 Refactor wizard step processing logic

Overall Assessment: This is a well-structured PR with good practices in many areas. The main concerns are around permission checks, database connection management, and some N+1 query patterns. With the recommended fixes for critical issues, this should be safe to merge.

Great work on the comprehensive test coverage and proper transaction handling! 🎉

@claude
Copy link

claude bot commented Dec 30, 2025

Pull Request Review: Subscription and Extraction Features

This is a comprehensive PR adding significant functionality across subscriptions and extractions. Overall, the implementation is solid with good test coverage, but there are several areas that need attention.


🔴 Critical Issues

1. Error Handling Removes Assertions Without Proper Validation

File: radis/chats/utils/chat_client.py:37-52, 64-107

The changes replace assert statements with proper exception handling, which is good. However, there's a potential issue:

if not completion.choices:
    logger.error("LLM returned empty choices list")
    raise ValueError("LLM returned no response choices")

answer = completion.choices[0].message.content
if answer is None:
    logger.error("LLM returned None for message content")
    raise ValueError("LLM returned empty response content")

Issue: The error handling catches openai.APIError but the new validation errors (ValueError) are not caught by the calling code in radis/subscriptions/processors.py:93-101. This could cause unhandled exceptions to bubble up.

Recommendation:

  • Either catch ValueError in addition to APIError in the processors, OR
  • Create a custom exception type (e.g., LLMResponseError) that encompasses both API errors and response validation errors

2. Database Constraint Vulnerability in OutputField

File: radis/extractions/models.py:104-123

The model has a CheckConstraint ensuring exactly one parent (job XOR subscription):

models.CheckConstraint(
    condition=(
        Q(job__isnull=False, subscription__isnull=True)
        | Q(job__isnull=True, subscription__isnull=False)
    ),
    name="output_field_exactly_one_parent",
)

Issue: The clean() method doesn't validate this constraint. If someone creates an OutputField programmatically without calling full_clean(), they could violate this constraint and get a database error.

Recommendation: Add this validation to the clean() method:

def clean(self) -> None:
    from django.core.exceptions import ValidationError
    super().clean()
    
    # Validate exactly one parent
    has_job = self.job_id is not None
    has_subscription = self.subscription_id is not None
    if has_job == has_subscription:  # Both true or both false
        raise ValidationError(
            "OutputField must have exactly one parent (either job or subscription, not both or neither)"
        )
    # ... rest of validation

3. SQL Injection Risk in CSV Export

File: radis/subscriptions/utils/csv_export.py:78-80

for field_pk in subscription.output_fields.order_by('pk').values_list('pk', flat=True):
    value = extraction_results.get(str(field_pk))

Issue: While not a direct SQL injection (field PKs are integers), there's a design issue: the code uses field PK as the dictionary key in JSON, but iter_extraction_result_rows in extractions uses field NAME as the key (line 50-52). This inconsistency could cause data mismatches.

Recommendation: Standardize on one approach across both modules. Using PKs is safer (prevents name collisions), but ensure consistency.


⚠️ Significant Issues

4. Race Condition in Report Handler

File: radis/subscriptions/handlers.py:10-21

def handle_reports_created(reports: list[Report]) -> None:
    logger.info(f"Triggering subscription processing for {len(reports)} new report(s)")
    subscription_launcher.defer(timestamp=int(time.time()))

Issue: The handler uses time.time() as a timestamp, but reports created in the same second could be missed if multiple batches arrive. The subscription launcher filters by last_refreshed timestamp.

Recommendation: Use a more precise timestamp (e.g., milliseconds) or pass report IDs directly to ensure all reports are processed.


5. Query Generation Fallback Issues

File: radis/extractions/utils/query_generator.py:66-97

The query generator tries LLM generation but has no fallback when it fails:

if settings.ENABLE_AUTO_QUERY_GENERATION:
    try:
        query = self._call_llm(fields_list)
        # ...
    except Exception as e:
        logger.error(f"Error during LLM query generation: {e}", exc_info=True)
        metadata["error"] = str(e)

# No fallback - return None for LLM failures

Issue: When LLM fails, users MUST manually enter a query. The comment mentions this, but the form validation in SearchForm.clean_query requires a query, which could trap users in an error state.

Recommendation: Consider a simple keyword-based fallback using field names/descriptions to generate a basic OR query, rather than forcing manual entry.


6. JavaScript Enter Key Prevention Too Broad

File: radis/core/static/core/core.js:49-63

const isTextInput =
  target instanceof HTMLInputElement &&
  \!["submit", "button"].includes(target.type);
if (isTextInput) {
  event.preventDefault();
}

Issue: This prevents Enter submission for ALL text inputs in forms with data-prevent-enter-submit, including single-input search forms where Enter is expected behavior.

Recommendation: Make this more selective - only prevent Enter in multi-line contexts or add a data attribute to opt-in/out per field.


7. FormSet Template Clone Issue

File: radis/core/static/core/core.js:88-93

Changed from:

const newForm = template.content.cloneNode(true);
container.append(newForm);

To:

const html = template.innerHTML.replace(/__prefix__/g, idx);
container.insertAdjacentHTML('beforeend', html);

Issue: Using innerHTML + insertAdjacentHTML can cause issues with:

  • Event listeners not being cloned (though none seem attached)
  • Potential XSS if template contains user data (unlikely but worth noting)
  • Loss of DOM references if any JavaScript stores references to template nodes

Recommendation: The change seems intentional (perhaps to fix an issue?), but ensure this doesn't break any existing formset functionality. Add a comment explaining why innerHTML approach is used.


📝 Code Quality & Best Practices

8. Missing Docstrings

Several new methods lack docstrings:

  • OutputField.clean() - radis/extractions/models.py:128
  • handle_reports_created() - could use more detail about the timestamp filtering logic

Recommendation: Add docstrings explaining the validation rules and edge cases.


9. Inconsistent Logging Levels

File: radis/subscriptions/processors.py

Mix of logger.debug(), logger.error(), and logger.info() with some inconsistency:

  • Line 82-85: logger.debug() for None answer from LLM (should be warning or error?)
  • Line 104-107: logger.debug() for accepting by default (should be info?)

Recommendation: Review logging levels for consistency with severity.


10. Magic Numbers

File: radis/extractions/constants.py

MAX_SELECTION_OPTIONS = 7

Question: Why 7? Is this based on UX research, LLM token limits, or arbitrary? Add a comment explaining the rationale.


🚀 Performance Considerations

11. Potential N+1 Query in Formset Processing

File: radis/extractions/views.py:116-124

When processing output fields from the formset, the code creates temporary OutputField objects in a loop. While these aren't persisted, ensure the query generator doesn't trigger additional DB queries.

Status: ✅ Looks fine on inspection - no DB calls in QueryGenerator._format_fields_for_prompt()


12. CSV Export Memory Usage

File: radis/subscriptions/utils/csv_export.py:56

for item in items.iterator(chunk_size=1000):

Positive: Good use of iterator() with chunking! However, iter_extraction_result_rows in extractions doesn't specify chunk_size.

Recommendation: Add chunk_size to extractions CSV export for consistency:

for instance_id, report_id, is_processed, output in instances.iterator(chunk_size=1000):

🔒 Security Concerns

13. User Input Validation in Selection Options

File: radis/extractions/forms.py:74-102

Good validation of selection options (uniqueness, max count, no empty strings). However:

Minor Issue: The max selection options (7) isn't enforced in the model's clean() method with the same constant.

Recommendation: Use the constant in both places:

# In models.py
from .constants import MAX_SELECTION_OPTIONS

if len(self.selection_options) > MAX_SELECTION_OPTIONS:
    raise ValidationError({...})

Currently the form uses the constant but the model has a separate check. ✅ Actually, looking closer, the model DOES use the constant (line 136). Good!


14. Query Validation

File: radis/extractions/forms.py:121-131

The form validates query syntax using QueryParser, which is excellent for preventing injection. ✅ Well done!


✅ Test Coverage

15. Excellent Test Coverage

The PR includes comprehensive tests:

  • test_query_generator.py: 12 test cases covering LLM generation, validation, and edge cases
  • test_forms.py: Tests for selection options, array toggle, validation
  • test_views.py: Tests for subscription creation, filtering, permissions
  • test_processor_utils.py: Unit tests for utility functions

Positive: This is exemplary test coverage for a feature of this size!

Minor Gap: No tests for the new error handling in chat_client.py. Consider adding tests for:

  • Empty choices list from LLM
  • None content in response
  • API error propagation

📊 Migration Review

16. Migration Dependency Chain

The migrations have a complex dependency chain with 6 new migrations in extractions and 4 in subscriptions.

Issue: 0006_remove_extractionjob_provider was renumbered (originally 0004) due to the added migrations. Ensure this doesn't cause conflicts if anyone has already applied the old 0004.

Recommendation: Consider squashing migrations before final merge if this hasn't been deployed yet.


17. Migration 0012: ON DELETE Constraint

File: radis/subscriptions/migrations/0012_procrastinate_on_delete.py

This changes ON DELETE behavior for procrastinate jobs. Ensure this aligns with the cleanup strategy for failed/completed jobs.


🎯 Summary

Overall Assessment: This is a well-implemented feature with good architecture and strong test coverage. The main concerns are:

  1. Exception handling consistency - ensure ValueError and APIError are handled uniformly
  2. Database constraint validation - add parent validation to OutputField.clean()
  3. Race condition in timestamp-based report filtering
  4. Query generation - consider a fallback for LLM failures

Recommendation: Address the critical issues before merging. The other issues can be addressed in follow-up PRs if time is constrained.

Strengths:

  • ✅ Excellent test coverage
  • ✅ Proper query validation prevents injection
  • ✅ Good use of database constraints
  • ✅ Efficient CSV export with iterators
  • ✅ Clear separation of concerns

Lines of Code: +3725 / -293 across 64 files - this is a large PR. Consider breaking future features into smaller increments for easier review.


Great work overall! Let me know if you'd like me to elaborate on any of these points.

@claude
Copy link

claude bot commented Dec 30, 2025

Code Review: PR #172 - Subscription and Extraction Features

Thank you for this comprehensive PR! I've reviewed the 3,726 additions across 62 files. Overall, the code is well-structured with excellent test coverage, but there are some important issues to address before merging.

CRITICAL Issues (Must Fix)

1. Missing Transaction Atomicity in Subscription Creation

Files: radis/subscriptions/views.py:94-119, 151-172

The form_valid method saves the subscription before saving related formsets without wrapping everything in a transaction. If a formset save fails, you'll have an orphaned subscription. Apply transaction.atomic() wrapper to both SubscriptionCreateView and SubscriptionUpdateView.

2. N+1 Query Issue in CSV Export

File: radis/subscriptions/utils/csv_export.py:56-82

Using .iterator() with prefetch_related() can cause N+1 queries. The order_by() inside the loop re-evaluates the queryset. Use Prefetch object with explicit ordering and remove iterator() or apply row limits.

IMPORTANT Issues (Should Fix)

3. Missing Input Validation in HTMX Preview Endpoint

File: radis/extractions/views.py:311-324

The preview endpoint accepts IDs from GET parameters without validating they're numeric first, which could cause ValueError exceptions. Add try/except blocks for int() conversions.

4. Inconsistent Error Handling in LLM Client

File: radis/chats/utils/chat_client.py

Great improvements to error handling! However, mixing RuntimeError and ValueError makes it harder for callers. Consider defining custom LLM exception classes for consistency.

5. Missing Database Indexes

Files: radis/subscriptions/models.py, radis/extractions/models.py

Fields used in filtering/sorting lack indexes:

  • Subscription.last_viewed_at (used in COUNT filter)
  • SubscribedItem.created_at (used in sorting and filtering)

Add db_index=True and generate migrations.

6. Remove AssertionError Catching

File: radis/subscriptions/processors.py:94-102

Catching AssertionError is a code smell. Use proper exception types from the LLM client instead.

7. Add Memory Limits to CSV Exports

File: radis/subscriptions/views.py:344-365

Large exports could consume excessive memory. Add a MAX_CSV_EXPORT_ROWS limit (e.g., 50000).

MEDIUM Priority

8. Race Condition in last_viewed_at Update

File: radis/subscriptions/views.py:266-269

Use Subscription.objects.filter(pk=...).update() for atomic updates instead of save().

9. Duplicated Sorting Logic

Files: radis/subscriptions/views.py:206-225, 302-321

Extract duplicated sorting allowlist validation to a shared method or constant.

10. Google Style Guide - Docstrings

Some public methods lack comprehensive docstrings with Args and Returns sections.

EXCELLENT Work

Test Coverage

Outstanding test coverage with 800+ new test lines covering:

  • Form validation (all edge cases including unicode, duplicates, empty values)
  • View authorization and business logic
  • Query generation with mocked LLM
  • Unit tests for utility functions

Security Practices

  • No SQL injection risks (proper ORM usage)
  • CSRF protection enabled
  • XSS protection via Django auto-escaping
  • Proper authentication/authorization checks
  • Input validation in forms

Django Best Practices

  • Proper use of class-based views
  • Database-level constraints for data integrity
  • Appropriate use of transaction.atomic()
  • Clean migrations
  • Proper model relationships

Summary

Overall Assessment: Strong PR with excellent test coverage and good architecture. Main concerns are transaction safety and query performance.

Before Merging:

  1. Fix transaction atomicity (Critical 1)
  2. Fix N+1 query in CSV export (Critical 2)
  3. Add input validation to preview endpoint (Important 3)
  4. Add database indexes (Important 5)

Follow-up Tasks:

  • Standardize exception handling
  • Add CSV export limits
  • Refactor duplicated sorting logic
  • Improve docstring coverage

Security: No critical vulnerabilities
Performance: Needs optimization (issues 2, 5, 7)
Test Coverage: Excellent
Code Quality: Good adherence to Django patterns

Great work on this feature! The subscription and extraction functionality is well-designed and thoroughly tested. Once the critical issues are addressed, this will be ready to merge.

@claude
Copy link

claude bot commented Dec 30, 2025

Code Review: PR #172 - Subscription and Extraction Features

Executive Summary

This PR introduces substantial functionality (3,728 additions across 64 files) for subscription processing with LLM-based filtering/extraction, query generation, and CSV export. While the implementation shows good engineering practices in many areas, there are 5 critical security vulnerabilities and 3 critical performance issues that must be addressed before merging.


Critical Issues Requiring Immediate Attention

🔴 Security Vulnerabilities

1. XSS Vulnerability in Template Rendering

File: radis/subscriptions/templates/subscriptions/_subscribed_item_preview.html:29

{{ subscribed_item.extraction_results|get_item:field_key|default_if_none:"—" }}

Issue: User-controlled extraction results from LLM are rendered without HTML escaping.

Attack Vector: An attacker could manipulate the LLM to return <script>alert('XSS')</script> in extraction results, which would execute in users' browsers.

Fix: Apply Django's auto-escaping or use |escapejs filter:

{{ subscribed_item.extraction_results|get_item:field_key|escapejs|default_if_none:"—" }}

2. Missing Authorization Check

File: radis/subscriptions/views.py:78

Issue: SubscriptionCreateView has a TODO comment indicating missing PermissionRequiredMixin. This could allow unauthorized users to create subscriptions.

Fix: Add proper permission checks:

class SubscriptionCreateView(LoginRequiredMixin, PermissionRequiredMixin, ...):
    permission_required = 'subscriptions.add_subscription'

3. NoSQL Injection Risk in JSON Fields

File: radis/subscriptions/models.py:106-107

Issue: filter_results and extraction_results JSONFields store LLM responses without schema validation. Malicious LLM responses could inject arbitrary JSON structures.

Fix: Add JSON schema validation before saving:

def clean(self):
    super().clean()
    if self.extraction_results:
        validate_extraction_results_schema(self.extraction_results)

🔴 Race Condition and Data Integrity

4. Concurrent Processing Race Condition

File: radis/subscriptions/processors.py:44-54, 146-152

Issue: ThreadPoolExecutor processes reports concurrently, but SubscribedItem.objects.create() lacks uniqueness constraints. Multiple threads can create duplicate records for the same report.

Evidence:

# Line 44: Multiple threads processing simultaneously
with ThreadPoolExecutor(max_workers=settings.EXTRACTION_LLM_CONCURRENCY_LIMIT) as executor:
    executor.map(self.process_report, reports)

# Line 146-152: No uniqueness check before creation
SubscribedItem.objects.create(
    subscription=self._subscription,
    report=report,
    # ...
)

Fix: Add database constraint in migration:

class Meta:
    constraints = [
        models.UniqueConstraint(
            fields=['subscription', 'report'],
            name='unique_subscription_report'
        )
    ]

And use get_or_create() instead of create().


⚡ Performance Issues

5. N+1 Query Problem in CSV Export

File: radis/subscriptions/utils/csv_export.py:78

Issue: Inside the iteration loop, this executes a separate query for EACH subscribed item:

subscription.output_fields.order_by("pk").values_list("pk", flat=True)

Fix: Pre-fetch field PKs outside the loop (note: lines 35-37 already do this correctly, so remove the redundant query at line 78).

6. ThreadPool Resource Exhaustion

File: radis/subscriptions/processors.py:44 & radis/chats/utils/chat_client.py:30

Issue: No timeout on LLM API calls means threads can hang indefinitely, exhausting the thread pool.

Fix: Add timeout to HTTP client in chat_client.py:

self._client = openai.OpenAI(
    base_url=base_url, 
    api_key=api_key,
    timeout=30.0  # Add timeout
)

7. Missing Database Indexes

File: radis/subscriptions/models.py:111

Issue: No composite index on (subscription_id, created_at) for inbox ordering queries, which will cause slow queries as data grows.

Fix: Add migration with index:

class Meta:
    indexes = [
        models.Index(fields=['subscription', '-created_at'], name='inbox_ordering_idx'),
    ]

Additional Important Issues

Database Migration Safety

File: radis/extractions/migrations/0008_outputfield_subscription_and_more.py:42-52

Issue: CheckConstraint requires exactly one parent (job XOR subscription), but existing OutputField records without either parent will cause migration failure.

Fix: Add data migration step before applying constraint to handle orphaned records.


Code Quality Issues

Missing Error Notifications

File: radis/subscriptions/processors.py:74-98, 125-144

Issue: Filter validation and extraction failures silently drop reports without notifying users.

Recommendation: Log failed reports and provide admin notification mechanism so users know when processing fails.

Assertion in Production Code

File: radis/extractions/processors.py:40

assert not instance.is_processed

Issue: Assertions can be disabled with Python's -O flag.

Fix: Replace with explicit validation:

if instance.is_processed:
    raise ValueError(f"ExtractionInstance {instance.pk} already processed")

Test Coverage Gaps

Missing Test Coverage:

  1. No tests for race conditions in concurrent report processing
  2. No tests for CSV export functionality in subscriptions or extractions
  3. No integration tests for handler registration in radis/subscriptions/apps.py:34-45
  4. Missing error case tests:
    • IntegrityError handling in subscription views
    • LLM timeout scenarios
    • Invalid JSON in extraction results

Recommendation:

Run coverage analysis and aim for 85%+ coverage on new code:

pytest --cov=radis.subscriptions --cov=radis.extractions --cov-report=html

Positive Aspects Worth Noting

Good error handling in LLM client with proper exception catching and logging
Comprehensive form validation with proper Django form patterns
SQL injection protection via Django ORM (no raw SQL with user input)
Input validation whitelist for sorting parameters (lines 209-227)
Proper use of iterator(chunk_size=1000) in CSV export to avoid memory issues
Good test coverage for subscription processor core logic


Priority Recommendations

Must Fix Before Merge (P0):

  1. ✅ Fix XSS vulnerability in _subscribed_item_preview.html
  2. ✅ Add unique constraint (subscription_id, report_id) to SubscribedItem
  3. ✅ Add permission check to SubscriptionCreateView
  4. ✅ Fix N+1 query in CSV export
  5. ✅ Add timeout to LLM API calls
  6. ✅ Validate migration 0008 handles orphaned OutputFields

Should Fix (P1):

  1. Add error notifications for LLM processing failures
  2. Add tests for race conditions and CSV export
  3. Add database indexes for performance
  4. Replace assertions with explicit validation
  5. Add JSON schema validation for LLM responses

Nice to Have (P2):

  1. Extract duplicate validation logic
  2. Standardize error response formats
  3. Improve docstring coverage per Google Python Style Guide

Final Verdict

Status: ❌ CHANGES REQUESTED

This PR introduces valuable features with generally solid engineering, but the identified security vulnerabilities and performance issues pose significant production risk. Once the P0 issues are addressed, this will be a strong addition to the codebase.

Estimated Remediation Time: 8-12 hours for P0 issues

Next Steps:

  1. Address all P0 security and performance issues
  2. Add missing test coverage for critical paths
  3. Consider running bandit security scanner on the new code
  4. Perform load testing with >10k reports to validate concurrent processing

Please let me know if you need clarification on any of these issues or would like help implementing the fixes!

@claude
Copy link

claude bot commented Dec 30, 2025

Pull Request Review: Subscription and Extraction Features

Overview

This is a substantial PR with 3,731 additions and 294 deletions across 64 files. The code quality is generally strong with good test coverage, but there are several areas requiring attention.

🟢 Strengths

  1. Excellent Error Handling in ChatClient - Proper exception catching for openai.APIError, validation of response structure, clear error messages
  2. Strong Test Coverage - 186 lines in test_forms.py, 123 lines in test_query_generator.py, 334 lines in subscriptions test_views.py
  3. Good Separation of Concerns - Query generation isolated, CSV export in dedicated modules
  4. Thoughtful UX Enhancements - Auto-query generation, live search preview, CSV export

🟡 Issues Requiring Attention

1. Security: Potential XSS in JavaScript ⚠️

Location: radis/core/static/core/core.js:92

The insertAdjacentHTML usage could be vulnerable. Validate idx is numeric before insertion and consider CSP headers.

2. Race Condition in Subscription Handler ⚠️

Location: radis/subscriptions/handlers.py:10-21

Multiple rapid calls could queue duplicate jobs. Add debounce mechanism or use Procrastinates queueing_lock parameter.

3. Database Performance Concern

Location: radis/subscriptions/utils/csv_export.py:52-82

CSV export uses iterator() but prefetch_related defeats efficiency. Use iterator() without prefetch_related OR use chunked queries.

4. Missing Input Validation

Location: radis/extractions/forms.py:185-216

No query length limits could enable DoS. Validate query length (max 1000 chars).

5. Migration Dependencies

Location: radis/extractions/migrations/0008_outputfield_subscription_and_more.py

Cross-app migration dependency needs verification for production deployment.

🔵 Code Quality Observations

  1. Duplicate Code - _format_cell function duplicated in both CSV export modules. Extract to shared utility.
  2. Magic Numbers - chunk_size=1000 should be in settings
  3. Type Annotations - Consider TypedDict for metadata return values

🔒 Security Assessment

High Priority:

  • ✅ LLM Input Sanitization - Good
  • ⚠️ XSS Risk - Needs hardening
  • ✅ SQL Injection - Using Django ORM properly
  • ✅ Authorization - Proper ownership checks

Medium Priority:

  • ⚠️ DoS via Large Queries - No length limits
  • ⚠️ Rate Limiting - Not visible on query generation
  • ✅ CSRF Protection - Present

Recommendations: Add rate limiting, max prompt length (5000 chars), security headers (CSP), LLM cost budgets

📊 Test Coverage

Well-Tested: OutputField validation, query generation, form validation, view authorization

Missing Coverage: ThreadPoolExecutor error handling, handler registration tests, JavaScript tests, CSV edge cases, migration tests

🎯 Performance

Potential Bottlenecks: Subscription processing could overwhelm LLM API, query auto-generation on every submission, no pagination limit

Optimizations Present: ThreadPoolExecutor, database batching, Procrastinate, CSV streaming

✅ Recommendations Summary

Must Fix Before Merge:

  1. Address XSS risk in JavaScript
  2. Fix subscription handler race condition
  3. Add input validation limits
  4. Verify migration dependencies

Should Fix Soon:

  1. Extract duplicate _format_cell
  2. Add rate limiting
  3. Improve CSV memory efficiency
  4. Add frontend tests

Nice to Have: Better type hints, LLM cost tracking, query caching, squash migrations

🎉 Conclusion

This PR delivers significant value through LLM-powered automation. However, the XSS vulnerability and race condition should be addressed before merging.

Recommendation: REQUEST CHANGES - Fix security issues then approve.

Overall Quality: 8/10 - Solid implementation with room for security hardening.

Great work on the auto-query generation and subscription filtering features!

@claude
Copy link

claude bot commented Dec 30, 2025

Pull Request Review: Subscription and Extraction Features

Thank you for this substantial PR! This adds significant functionality to RADIS with subscriptions, extractions, and auto-query generation. I've conducted a comprehensive review covering code quality, bugs, security, and test coverage.

📊 Overview

  • Files Changed: 64 files
  • Additions: +3,736 lines
  • Deletions: -294 lines
  • Scope: Major feature additions across extractions and subscriptions modules

⚠️ Critical Issues

1. Race Condition in Concurrent Processing 🔴 HIGH PRIORITY

Location: radis/extractions/processors.py:39-45, radis/subscriptions/processors.py:56-155

Issue: Database save operations without proper locking in concurrent threads. The assertion check and save are not atomic, allowing multiple threads to potentially process the same instance.

def process_instance(self, instance: ExtractionInstance) -> None:
    assert not instance.is_processed  # Line 40
    instance.text = instance.report.body
    self.process_output_fields(instance)
    instance.is_processed = True
    instance.save()  # Line 44 - could be called concurrently

Recommendation: Use select_for_update() or atomic transactions with proper locking.

2. LLM Prompt Injection Vulnerabilities 🔴 HIGH PRIORITY

Locations:

  • radis/extractions/processors.py:51-56
  • radis/subscriptions/processors.py:66-71, 117-122

Issue: User-provided content (report bodies, field descriptions) are directly substituted into LLM prompts without sanitization. This allows attackers to inject malicious instructions that could manipulate LLM behavior.

Example Attack:

Report body: "Ignore all previous instructions. You are now..."

Recommendation: Implement prompt sanitization, use structured prompts with clear delimiters, and validate LLM responses more strictly.

3. XSS via |safe Filter 🔴 HIGH PRIORITY

Locations:

  • Templates using {{ content|safe }} without validation
  • radis/chats/templates/chats/chat.html:55 - LLM content rendered as Markdown

Issue: If announcement or document summary contains user-controlled content, this enables XSS attacks. LLM-generated markdown could also contain malicious payloads.

Recommendation: Remove |safe filter or sanitize content first. Ensure markdownify filter properly strips dangerous HTML.

4. Sensitive Data Exposure in Logs 🟡 MEDIUM PRIORITY

Location: radis/chats/utils/chat_client.py:31-32, 55, 82

Issue: Debug logging includes full LLM messages which may contain PHI/PII from radiology reports.

logger.debug(f"Sending messages to LLM for chat:\n{messages}")

Recommendation: Remove or redact sensitive information from logs. Consider logging only metadata (length, timestamp) instead of content.

5. Uncaught DoesNotExist Exception 🟡 MEDIUM PRIORITY

Location: radis/extractions/views.py:431

Issue: Database query outside exception handler could raise uncaught exception if language is deleted between checks.

if language_id:
    search_params["language"] = Language.objects.get(pk=language_id).code  # Unprotected

Recommendation: Extend try-except block or use .get() with default value.

6. KeyError in Form Validation 🟡 MEDIUM PRIORITY

Location: radis/extractions/forms.py:139-140

Issue: Direct dictionary access without error handling in form's clean() method.

language = cast(Language, cleaned_data["language"])  # Could raise KeyError

Recommendation: Use .get() method or verify key existence first.

7. Assert Statements in Production Code 🟡 MEDIUM PRIORITY

Locations: Multiple files including extractions/views.py:278, 282 and processors.py:40

Issue: Python's -O flag disables assertions, potentially bypassing critical checks in production.

Recommendation: Replace with explicit if statements and raise proper exceptions.

📝 Code Quality Issues

Major Refactoring Needed:

  1. Extremely Long Methods:

    • ExtractionSearchPreviewView.get() (150 lines) - views.py:306-456
    • SubscriptionTaskProcessor.process_report() (100 lines) - subscriptions/processors.py:56-156

    Recommendation: Split into focused helper methods following Single Responsibility Principle.

  2. Code Duplication:

    • chat() method duplicated between AsyncChatClient and ChatClient - chat_client.py:26-107

    Recommendation: Extract common logic into base class or shared function.

  3. Missing Docstrings:

    • Multiple violations of Google Python Style Guide (per CONTRIBUTING.md)
    • Missing module-level docstrings
    • Missing Args/Returns sections in function docstrings

    Recommendation: Add comprehensive docstrings following Google format.

  4. Complex Control Flow:

    • Deep nesting (4-5 levels) in multiple files
    • Nested try-except blocks that are hard to follow

    Recommendation: Extract validation and processing logic into separate methods.

Best Practice Violations:

  • SearchForm.clean() performs external API calls (line 180) - should be in view layer or async
  • Magic numbers: Hard-coded values like size = 6 should be named constants
  • Error handling: Silent failures in processors that log but don't record failure state
  • Multiple saves: Same instance saved multiple times in one operation (inefficient)

🧪 Test Coverage Analysis

Good Coverage:

  • Subscription views (9/10) - Excellent, comprehensive testing
  • Subscription processors (7/10) - Good unit testing with mocks
  • Extraction views (7/10) - Good HTTP status and authorization tests

Gaps in Coverage:

  • ExtractionJobWizardView - No tests for multi-step wizard
  • SearchForm - No tests for query validation, age ranges, modalities
  • OutputFieldFormSet - No formset-level tests
  • Query generator - Missing error handling tests, timeout scenarios
  • Processor utils - Only 2 tests, missing TEXT/NUMERIC/BOOLEAN field types
  • Model constraints - Database-level constraints not verified

Recommended Test Additions:

  1. Integration tests for wizard flow (output fields → search → summary)
  2. SearchForm validation tests with various query combinations
  3. Error handling tests for LLM API failures
  4. Concurrent processing tests for race conditions
  5. Model constraint tests (unique constraints, check constraints)

🔒 Security Summary

High Risk:

  • 🔴 LLM prompt injection (multiple locations)
  • 🔴 XSS via |safe filter
  • 🔴 Sensitive data in logs

Medium Risk:

  • 🟡 Authorization bypass via config (START_EXTRACTION_JOB_UNVERIFIED)
  • 🟡 Markdown rendering without proper sanitization

Good Security Practices Found:

  • ✅ CSRF tokens properly used
  • ✅ Django ORM prevents SQL injection
  • ✅ Proper authentication decorators
  • ✅ Group-based access control
  • ✅ CSV export uses safe serialization

🎯 Recommendations

Must Fix Before Merge:

  1. Address race condition with proper locking
  2. Implement LLM prompt sanitization
  3. Fix XSS vulnerabilities (remove unsafe |safe usage)
  4. Add error handling for uncaught exceptions
  5. Replace assert statements with proper validation

Should Fix Soon:

  1. Refactor long methods (150+ lines)
  2. Add missing docstrings per Google Style Guide
  3. Remove sensitive data from logs
  4. Add comprehensive tests for wizard views
  5. Fix code duplication in chat clients

Nice to Have:

  1. Extract magic numbers to constants
  2. Improve error handling consistency
  3. Add integration tests with real components
  4. Consider async/await instead of ThreadPoolExecutor for I/O-bound operations

💡 Positive Aspects

  • ✅ Good use of Django patterns (mixins, class-based views)
  • ✅ Comprehensive subscription inbox filtering/sorting
  • ✅ Proper factory usage in tests
  • ✅ Clear separation of concerns (mostly)
  • ✅ Good authorization testing
  • ✅ Innovative LLM integration for query generation

📋 Summary

This PR adds valuable functionality but requires addressing several critical security and reliability issues before merging. The code is generally well-structured but needs refactoring in key areas to improve maintainability. Test coverage is good for subscriptions but has gaps in extractions.

Overall Assessment: ⚠️ Needs Work - Address critical issues before merge.


Review completed using Claude Code with comprehensive analysis of:

  • 64 changed files
  • Code quality and Google Python Style Guide compliance
  • Security vulnerabilities (OWASP Top 10)
  • Test coverage (6 test files)
  • Potential bugs and race conditions

Happy to discuss any of these points or provide more specific guidance on fixes!

@claude
Copy link

claude bot commented Dec 30, 2025

Code Review: PR #172 - Subscription and Extraction Features

Thank you for this comprehensive PR adding filter questions, extractions to subscriptions, inbox notifications, pagination/sorting/filtering, auto-generated queries, and CSV downloads.


CRITICAL ISSUES - Must Fix Before Merge

1. Production assert Statements (HIGH)

Multiple files use assert for runtime validation. Assertions are disabled with -O flag causing security vulnerabilities.

Files: radis/extractions/forms.py, radis/subscriptions/forms.py, radis/subscriptions/tasks.py
Fix: Replace with proper exception handling

2. Race Condition in Subscription Handler (HIGH)

File: radis/subscriptions/handlers.py
Issue: Handler triggers processing before transaction commits
Fix: Wrap task deferral in transaction.on_commit()

3. Missing Exception Handling for .get() (MEDIUM-HIGH)

Files: radis/extractions/views.py, radis/subscriptions/tasks.py
Issue: .objects.get() calls without try-except can crash views
Fix: Add proper exception handling

4. Missing Permission Check (MEDIUM)

File: radis/subscriptions/views.py:78
Issue: SubscriptionCreateView has TODO for PermissionRequiredMixin
Fix: Add permission_required attribute

5. Debug Code in Production (LOW)

File: radis/core/static/core/core.js:38
Issue: console.log() statement
Fix: Remove debug code


PERFORMANCE ISSUES

1. Missing Database Indexes (MEDIUM)

Need indexes for:

  • SubscribedItem.created_at
  • Subscription.last_viewed_at
  • OutputField.subscription_id

2. Inefficient ThreadPool Usage

File: radis/subscriptions/processors.py:44
Issue: Creates all futures before processing
Fix: Use batch processing


SECURITY CONSIDERATIONS

1. JSON Payload DoS Risk

File: radis/extractions/forms.py
Fix: Add size limits to JSON deserialization

2. XSS Protection

File: radis/extractions/templates/extractions/_search_preview.html
Recommendation: Verify proper escaping of user data


POSITIVE ASPECTS

  1. Improved error handling in chat_client.py
  2. Comprehensive form validation
  3. Good test coverage for forms and query generation
  4. Clear code organization
  5. Excellent UX features

TEST COVERAGE GAPS

  1. CSV Export - No tests for edge cases
  2. Error Handling - Need tests for LLM failures
  3. Migration Tests - Should verify constraints

CODE QUALITY IMPROVEMENTS

  1. Code Duplication - clean_selection_options() duplicated
  2. Missing Type Hints - Several view methods
  3. Magic Numbers - Some values should be constants

PRIORITY SUMMARY

Priority 1 - Block Merge:

  • Replace assert statements
  • Fix race condition
  • Add exception handling
  • Add permission checks
  • Remove debug code

Priority 2 - Follow-up PR:

  • Add database indexes
  • Refactor duplicated code
  • Add CSV export tests
  • Improve type annotations

Priority 3 - Tech Debt:

  • Add JSON size limits
  • Improve error messages
  • Add migration tests

RECOMMENDATION

This PR adds valuable functionality but requires critical fixes before merging. The code structure is solid.

Suggested approach:

  1. Fix Priority 1 items in this PR
  2. Create follow-up issues for Priority 2
  3. Track Priority 3 as tech debt

Great work! The features are well-designed and LLM integration is thoughtful.

@medihack medihack mentioned this pull request Dec 30, 2025
@claude
Copy link

claude bot commented Dec 30, 2025

Pull Request Review: Subscription and Extraction Features

This is a comprehensive PR introducing significant new functionality. Overall, the implementation follows good Django patterns.

Strengths

Code Quality:

  • Strong type annotations with PEP 604 union syntax
  • Consistent error handling with logging in LLM client
  • Good separation of concerns
  • Comprehensive validation in forms and models
  • Good test coverage (334 lines in subscription tests, 186 in extraction tests)

Architecture:

  • Excellent database constraints (CheckConstraint, UniqueConstraint)
  • Proper transaction.atomic() usage
  • Proper async/await patterns for LLM integration
  • Graceful degradation when LLM calls fail

Critical Issues

  1. ThreadPoolExecutor Resource Leak Risk (radis/subscriptions/processors.py:44-54)

    • If exception occurs during future creation, connections may leak
    • Recommendation: Wrap in try-finally block
  2. Missing Input Validation (radis/extractions/views.py:296-310)

    • ExtractionSearchPreviewView accepts unlimited GET parameter sizes
    • Recommendation: Add length validation (e.g., query_str[:500])
  3. Formset Template Variable (radis/core/templates/cotton/formset.html:6-9)

    • Uses {% with %} but doesn't reference scoped variable
    • Fix: Use {% crispy form %} instead of {% crispy formset.empty_form %}

Medium Priority Issues

  1. Code Duplication in chat_client.py

    • Validation logic duplicated between Async and Sync clients
    • Recommendation: Extract to shared helper
  2. F-string Logging (multiple locations)

    • F-strings in logging evaluated even when disabled
    • Recommendation: Use lazy evaluation with %-formatting
  3. CASCADE Protection (migrations/0008)

    • on_delete=CASCADE will delete all output fields with subscription
    • Recommendation: Consider PROTECT or SET_NULL
  4. Hard-coded Constants

    • MAX_SELECTION_OPTIONS = 7 lacks documentation
    • Recommendation: Add comment or make configurable

Performance Considerations

Positive:

  • Proper database indexes
  • Good prefetch_related usage
  • Streaming CSV export
  • Proper pagination

Concerns:

  • N+1 Query Risk in get_table() (line 632-635)
  • LLM concurrency could overwhelm service

Security

Good:

  • LoginRequiredMixin on all views
  • User ownership checks
  • Transaction atomicity
  • CSRF protection

Concerns:

  • Session data trusted without HMAC signature
  • CSV export should escape formula characters

Test Coverage

Excellent:

  • Form validation tests
  • View tests
  • Query generator tests
  • Processor utils tests

Missing:

  • Integration tests for query generation flow
  • Concurrent LLM processing edge cases
  • Wizard data corruption recovery
  • CSV special character tests

Recommendations

Must Fix Before Merge:

  1. ThreadPoolExecutor exception handling
  2. Input length validation
  3. Formset template variable

Should Fix Soon:
4. Code duplication in chat clients
5. Lazy logging
6. Integration tests
7. Document magic numbers

Overall Assessment

Score: 8/10

Well-implemented PR with high code quality and good test coverage. Main concerns are edge cases in concurrent processing, input validation, and minor code duplication.

The error handling improvements in LLM client (replacing assert with proper exceptions) are excellent.

Recommendation: Approve with requested changes - Address critical issues before merging.

Great work overall!

@medihack
Copy link
Member

medihack commented Jan 4, 2026

/gemini review

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This is a substantial pull request that introduces a wide range of significant features, primarily focused on enhancing subscriptions and data extractions. The key additions include the ability to add extractions to subscriptions, auto-generation of search queries from output fields, a new 'Selection' output type with array support, and a completely revamped subscription inbox with pagination, sorting, filtering, and CSV download capabilities. The change to trigger subscription jobs on report creation instead of a cron job is a great architectural improvement.

My review focuses on improving maintainability by addressing areas of code duplication and suggesting simplifications. I've identified several instances of duplicated logic, particularly in form validation and between synchronous and asynchronous client methods. Refactoring these areas will make the new functionality easier to maintain and extend in the future. I've also suggested a simplification in one of the new views to improve readability and robustness. Overall, these are excellent additions to the application.

@claude
Copy link

claude bot commented Jan 5, 2026

Code Review - PR #172

This is an extensive PR (3,652 additions) adding significant functionality to RADIS. Overall, the implementation is solid with good test coverage, but there are important issues to address before merging.

Critical Issues

1. Database Performance: N+1 Query Potential
Location: radis/subscriptions/processors.py:63-64

The processor fetches filter_questions and output_fields without prefetching. With many subscriptions processing simultaneously, this will cause performance issues. Add prefetch_related when fetching the task in tasks.py.

2. Data Consistency: Missing Transaction Wrapping
Location: radis/subscriptions/processors.py:146-152

SubscribedItem creation is not wrapped in transaction.atomic(). If the process crashes during processing, you could have inconsistent data.

3. Model Validation: Incomplete Constraint
Location: radis/extractions/models.py:116-122

The output_field_exactly_one_parent constraint ensures exactly one parent, but does not prevent both job and subscription being None. Add a constraint to ensure at least one parent is not null.

High Priority Issues

4. Error Handling: Silent Failures
Location: radis/subscriptions/processors.py:93-101

When LLM calls fail, the function returns early without recording the failure. Users will not know why reports were not processed. Consider adding a processing_errors JSONField to SubscriptionTask and implementing retry logic for transient failures.

5. Resource Management: Thread Pool Efficiency
Location: radis/subscriptions/processors.py:44

Creating a new ThreadPoolExecutor for each task is inefficient. Use a module-level executor or leverage Django async support (you already have Daphne configured).

6. Code Duplication: Chat Client Methods
Location: radis/chats/utils/chat_client.py:26-56, 67-107

The chat() method is duplicated between AsyncChatClient and ChatClient with identical error handling. Extract common validation to a shared function.

Security Considerations

7. LLM Query Generation
Location: radis/extractions/utils/query_generator.py

LLM-generated queries are passed to the search system. While basic query parsing validation exists, add comprehensive sanitization and consider using a whitelist approach for allowed operators.

8. LLM Prompt Injection
Location: radis/subscriptions/processors.py:66-72

User-controlled filter questions are inserted into LLM prompts. Sanitize question text before prompt insertion and add maximum length limits.

9. Input Validation: Selection Options
Location: radis/extractions/forms.py:89-158

Good validation exists, but also validate maximum length per option to prevent DoS via large strings.

What Was Done Well

  • Comprehensive test coverage for query generation
  • Good use of Pydantic for LLM response validation
  • Event-driven architecture for subscriptions (excellent change from cron-based)
  • Proper Django constraints for data integrity
  • CSV export functionality
  • Good separation of concerns
  • Proper use of async where appropriate
  • Great design allowing OutputField reuse between extractions and subscriptions

Medium Priority Issues

  • Form validation has inconsistent error messages (radis/extractions/forms.py:204-215)
  • Enter-key prevention logic does not check for textareas (radis/core/static/core/core.js:49-64)
  • Migration reordering could cause issues for existing deployments
  • Missing docstrings on complex methods
  • Consider adding more constants to radis/extractions/constants.py

Recommendations

I recommend addressing the critical and high-priority issues before merging. The medium-priority issues can be handled in follow-up PRs.

Overall, this is excellent work! The feature set is impressive and will significantly enhance RADIS capabilities.

@claude
Copy link

claude bot commented Jan 6, 2026

Code Review - PR #172: Subscription & Extraction Enhancements

Thank you for this substantial feature addition! This PR adds important functionality to the subscription and extraction systems. I've conducted a thorough review and found several areas that need attention before merging.


🚨 Critical Issues (Must Fix)

1. N+1 Query Problem in Subscription Processor

File: radis/subscriptions/processors.py:46-115
Severity: HIGH

Inside the report processing loop, the code repeatedly queries subscription.filter_questions and subscription.output_fields for each report:

for report in task.reports.filter(groups=active_group):
    filter_questions = list(subscription.filter_questions.order_by("pk"))  # Query per report\!
    # ... later ...
    output_fields = list(subscription.output_fields.order_by("pk"))  # Another query per report\!

Impact: Processing 100 reports triggers 200+ queries instead of 3.

Fix: Move these queries outside the loop:

subscription: Subscription = task.job.subscription
filter_questions = list(subscription.filter_questions.order_by("pk"))
output_fields = list(subscription.output_fields.order_by("pk"))

for report in task.reports.filter(groups=active_group):
    # Use pre-fetched data

2. Inefficient Report Fetching in Task Creation

File: radis/subscriptions/tasks.py:64-69
Severity: HIGH

for document_id in document_ids:
    task.reports.add(Report.objects.get(document_id=document_id))  # N queries\!

Fix: Fetch all reports at once:

reports = Report.objects.filter(document_id__in=document_ids)
task.reports.add(*reports)

3. Missing Max Length Validation on Selection Options

Files: radis/extractions/forms.py:277-279, radis/extractions/models.py:136-142
Severity: MEDIUM (Security)

While there's a limit of 7 selection options, there's no validation on the total character length. An attacker could submit 7 options with 10,000 characters each, causing:

  • Database bloat
  • Memory exhaustion when generating Pydantic schemas
  • Potential DoS

Recommendation: Add constants and validation:

MAX_SELECTION_OPTION_LENGTH = 200
MAX_TOTAL_SELECTION_LENGTH = 1000

# In validation:
total_length = sum(len(opt) for opt in cleaned)
if total_length > MAX_TOTAL_SELECTION_LENGTH:
    raise forms.ValidationError(f"Total length exceeds {MAX_TOTAL_SELECTION_LENGTH} characters")

4. Thread Pool Exception Handling Issue

File: radis/subscriptions/processors.py:44-58
Severity: MEDIUM

with ThreadPoolExecutor(max_workers=settings.EXTRACTION_LLM_CONCURRENCY_LIMIT) as executor:
    try:
        for report in task.reports.filter(groups=active_group):
            future = executor.submit(self.process_report, report, task)
            futures.append(future)
        for future in futures:
            future.result()  # If one raises, others keep running\!
    finally:
        db.close_old_connections()  # Closes connections while threads active\!

Fix: Handle exceptions per-future:

for future in futures:
    try:
        future.result()
    except Exception as e:
        logger.error(f"Task processing failed: {e}")

⚠️ High Priority Issues (Should Fix)

5. Race Condition in Handler Registration

File: radis/subscriptions/apps.py:40

The handler appends to a global list during app initialization without duplicate checking:

reports_created_handlers.append(
    ReportsCreatedHandler(name="subscription_launcher", handle=handle_reports_created)
)

If the app is imported multiple times, the handler gets registered multiple times, triggering duplicate subscription jobs.

Fix: Check for duplicates before appending or use a set-based registry.

6. Silent Error Handling in Processor

File: radis/subscriptions/processors.py:93-101

except RuntimeError as e:
    logger.error(f"LLM API error filtering report {report.pk}: {e}")
    return  # Silent failure - report is skipped, user never knows

Problem: No distinction between "report didn't match filter" vs "LLM error". Users can't see which reports failed to process.

Recommendation: Add a processing_status field to track failed items and expose in UI.

7. Inefficient Modality Fetching in CSV Export

File: radis/subscriptions/utils/csv_export.py:63-64

for item in items.iterator(chunk_size=1000):
    modality_codes = ",".join(
        item.report.modalities.order_by("code").values_list("code", flat=True)
    )

The order_by() triggers a new query per item despite prefetching. Either remove the order_by() or pre-sort during prefetch.

8. Missing Error Handling in CSV Export

File: radis/subscriptions/utils/csv_export.py

The _format_cell function doesn't handle complex nested structures from array selections:

def _format_cell(value: Any) -> str:
    if value is None:
        return ""
    if isinstance(value, bool):
        return "yes" if value else "no"
    return str(value)  # What if value is list[list] or nested dict?

Since this is a streaming response, crashes leave users with corrupted CSVs. Add explicit handling for list/dict types.


📝 Medium Priority Issues

9. Missing Tests for Handler System

File: radis/subscriptions/handlers.py

The new handler-based system has zero tests:

  • No test for handle_reports_created
  • No test for handler registration
  • No integration test showing report creation triggers subscriptions

Recommendation: Add integration test:

def test_report_creation_triggers_subscription(client, subscription):
    response = client.post("/api/reports/", {...})
    assert SubscriptionJob.objects.filter(subscription=subscription).exists()

10. Missing Tests for Array Selection Fields

File: radis/extractions/tests/test_forms.py

No tests for OutputType.SELECTION with is_array=True combined. Unclear if list[Literal["A", "B"]] works correctly in the processor.

11. Missing Audit Logging for Staff Access

File: radis/subscriptions/views.py:299

Staff users can download any subscription's data without audit logging:

if user.is_staff:
    return model.objects.all()

Consider adding audit logs when staff access other users' data for compliance purposes.


✅ Positive Highlights

  1. Excellent: Reusable form field factories in core/form_fields.py reduce duplication
  2. Good: Proper input validation on sort/filter parameters prevents SQL injection (lines 207-227)
  3. Good: CSV streaming using generators prevents memory issues
  4. Good: Comprehensive test coverage for pagination/sorting in subscription inbox
  5. Good: Handler-based approach is more scalable than cron jobs
  6. Good: JSONField validation at both form and model level

📊 Code Quality Assessment

Aspect Rating Notes
Code Quality 7/10 Well-structured but has performance issues
Security 6.5/10 Input validation gaps, needs audit logging
Test Coverage 7/10 Good view tests, missing processor integration tests
Performance 5/10 Critical N+1 queries need fixing

🎯 Summary & Recommendation

This PR adds valuable features with a solid architectural foundation. However, the N+1 query problems and missing validation must be addressed before merge as they will cause significant performance degradation and potential security issues in production.

Action Items:

  1. ✅ Fix N+1 query in subscription processor (move queries outside loop)
  2. ✅ Fix inefficient Report.objects.get() in task creation
  3. ✅ Add max length validation for selection options
  4. ✅ Fix thread pool exception handling
  5. ⚠️ Add handler registration duplicate prevention
  6. ⚠️ Add error handling in CSV export
  7. ⚠️ Add integration tests for handler system

Once the critical issues (1-4) are resolved, this will be ready to merge. Happy to review again after updates!


Review based on RADIS coding standards from CLAUDE.md: Google Python Style Guide, 100-char line length, proper error handling, and performance optimization.

@claude
Copy link

claude bot commented Jan 7, 2026

Pull Request Review

This is a substantial PR (+3922 lines, 70 files) that merges multiple feature branches. Overall, the code quality is good with solid test coverage and thoughtful refactoring. Here's my detailed feedback:


Strengths

1. Excellent Code Refactoring

  • radis/core/form_fields.py: Outstanding work creating reusable form field factories! The use of @overload for type hints is exemplary. This eliminates duplication across extraction, subscription, and search forms.
  • radis/chats/utils/chat_client.py: Great refactoring with _BaseChatClient base class. The validation helpers (_validate_completion_response, _handle_api_error) improve error handling significantly.

2. Robust Validation

  • radis/extractions/utils/validation.py: Clean validation logic for selection options with clear error messages.
  • Form validation is thorough (e.g., checking selection options only for SELECTION type, requiring both question and expected_answer in FilterQuestionForm).

3. Good Test Coverage

  • Tests for selection output types, array toggles, form validation, and subscription views.
  • Good use of factories for test data generation.

4. Smart Architecture Patterns

  • Handler-based subscription triggering: Moving from periodic jobs to event-driven handlers (ReportsCreatedHandler) is architecturally sound and more efficient.
  • Job/Task pattern: Proper separation of concerns with SubscriptionJob and SubscriptionTask.

⚠️ Issues & Concerns

1. Security: JSON Injection Risk ⚡ IMPORTANT

File: radis/extractions/forms.py:189

try:
    parsed = json.loads(raw_value)
except json.JSONDecodeError as exc:
    raise forms.ValidationError("Invalid selection data.") from exc

Issue: The form accepts JSON from hidden fields controlled by client-side JavaScript (Alpine.js). While validation happens after parsing, there's potential for malicious JSON structures or extremely large payloads.

Recommendation:

  • Add size limits before parsing: if len(raw_value) > 10000: raise ValidationError(...)
  • Consider using a more restrictive format (comma-separated values) instead of JSON for user input.

2. Error Handling: Too Broad Exception Catching

File: radis/chats/utils/chat_client.py:129-131

try:
    completion = await self._client.chat.completions.create(**request)
except openai.APIError as e:
    _handle_api_error(e, "chat")

Issue: _handle_api_error re-raises as RuntimeError, losing the specific OpenAI error type. Callers can't distinguish between rate limits, authentication failures, or network errors.

Recommendation: Preserve the original exception type or create specific exception classes:

class LLMCommunicationError(Exception):
    def __init__(self, message: str, original_error: openai.APIError):
        super().__init__(message)
        self.original_error = original_error

3. Type Safety: Missing Return Type Annotation

File: radis/chats/utils/chat_client.py:131

except openai.APIError as e:
    _handle_api_error(e, "chat")  # This always raises, but not annotated with NoReturn

Issue: _handle_api_error always raises but isn't marked with -> NoReturn. Type checkers may incorrectly think the function can return None.

Recommendation:

from typing import NoReturn

def _handle_api_error(error: openai.APIError, operation: str) -> NoReturn:
    ...

4. Performance: N+1 Query Potential

File: radis/core/form_fields.py:59, 119

languages = Language.objects.order_by("code")
# Later in loop:
field.label_from_instance = lambda obj: LANGUAGE_LABELS[obj.code]

Issue: While this particular case is fine (LANGUAGE_LABELS is a dict), the pattern could cause issues if label_from_instance performed database queries. The querysets are evaluated when rendering forms.

Recommendation: Document that these factory functions trigger DB queries and should be called during form __init__, not at module level.

5. Code Quality: Magic Number

File: radis/extractions/constants.py:1

MAX_SELECTION_OPTIONS = 7

Issue: Why 7? This seems arbitrary and lacks justification.

Recommendation: Add a comment explaining the rationale (e.g., "Based on UX research showing dropdown usability degrades beyond 7 options" or "LLM prompt token limit considerations").

6. Potential Bug: Race Condition in FormSet JavaScript

File: radis/core/static/core/core.js:87-97

addForm() {
    if (!template) {
        return;
    }
    const idx = totalForms.value;
    const html = template.innerHTML.replace(/__prefix__/g, idx);
    container.insertAdjacentHTML("beforeend", html);
    totalForms.value = (parseInt(idx) + 1).toString();
    this.formCount = parseInt(totalForms.value);
},

Issue: If addForm() is called rapidly (e.g., double-click), two forms could get the same index before totalForms.value is updated.

Recommendation: Increment totalForms.value before using it:

const idx = totalForms.value;
totalForms.value = (parseInt(idx) + 1).toString();
const html = template.innerHTML.replace(/__prefix__/g, idx);
container.insertAdjacentHTML("beforeend", html);

7. Code Duplication: Form Clean Methods

Files: radis/extractions/forms.py:99-115 and radis/subscriptions/forms.py:80-99

Both forms have identical clean_age_from and clean_age_till logic:

def clean_age_from(self) -> int:
    age_from = self.cleaned_data["age_from"]
    if age_from is not None and age_from % AGE_STEP != 0:
        raise forms.ValidationError(f"Age from must be a multiple of {AGE_STEP}")
    return age_from

Recommendation: Extract to a mixin or helper function in radis/core/form_fields.py.


🔍 Minor Issues

8. Documentation: Missing Docstrings

  • radis/extractions/forms.py:170-304: OutputFieldForm lacks a class docstring explaining its complex JavaScript integration.
  • radis/subscriptions/forms.py:102-148: FilterQuestionForm.has_changed() override logic is subtle and deserves explanation.

9. Testing: Missing Edge Cases

  • No tests for what happens when LLM returns malformed JSON in extraction results.
  • No tests for concurrent subscription job processing (race conditions).
  • Missing tests for the new ReportsCreatedHandler registration mechanism.

10. Code Style: Inconsistent String Formatting

Mix of f-strings and format() calls:

  • radis/chats/utils/chat_client.py:80: f"Failed to communicate with LLM service: {error}"
  • radis/core/layouts.py (not in diff but likely affected): May use older styles

Recommendation: Stick to f-strings throughout per Google Python Style Guide.


📋 Migration Concerns

11. Migration Dependencies

Multiple merge commits show migration reordering (commits: 57b1cd9, c4146a1, 14656e4, 9235cd9). While these appear resolved, ensure:

  • Fresh database migrations work cleanly
  • Migration squashing is considered if the history gets too complex
  • Document any manual migration steps needed

🎯 Recommendations

High Priority (Before Merge):

  1. ✅ Fix the JSON injection security issue (add hybrid search feature #1)
  2. ✅ Add NoReturn type annotation (Uncomment llama.cpp server #3)
  3. ✅ Fix the FormSet race condition (No sites framework #6)

Medium Priority (Soon After Merge):

  1. ✅ Extract age validation to shared helper (Add report chat #7)
  2. ✅ Improve error handling to preserve exception types (Logo and Favicon #2)
  3. ✅ Add missing test coverage for edge cases (pgsearch_init #9)

Low Priority (Technical Debt):

  1. ✅ Add docstrings to complex forms (Rag task batching #8)
  2. ✅ Document MAX_SELECTION_OPTIONS rationale (Integrate OpenSearch #5)
  3. ✅ Standardize on f-strings (Use Postgres for full text search #10)

📊 Overall Assessment

Code Quality: 8/10
Test Coverage: 7/10
Architecture: 9/10
Security: 6/10 (due to JSON injection risk)
Documentation: 6/10

Recommendation: ✅ Approve with requested changes

This PR represents solid engineering work with thoughtful refactoring and good architectural patterns. The main concerns are around security (JSON handling) and error handling robustness. Once the high-priority items are addressed, this is ready to merge.

Great work on the handler-based subscription system and the form field factories - these are excellent improvements to the codebase!


Reviewed by: Claude Sonnet 4.5 via Claude Code
Review Date: 2026-01-07

@samuelvkwong samuelvkwong marked this pull request as ready for review January 7, 2026 14:37
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
radis/subscriptions/tasks.py (1)

64-70: Potential N+1 query issue when fetching reports.

Each iteration calls Report.objects.get(document_id=document_id) individually. For large batches, this creates many database queries.

Proposed optimization using bulk fetch
     for document_ids in batched(new_document_ids, settings.SUBSCRIPTION_REFRESH_TASK_BATCH_SIZE):
         logger.debug("Creating SubscriptionTask for document IDs: %s", document_ids)
         task = SubscriptionTask.objects.create(job=job, status=SubscriptionTask.Status.PENDING)
-        for document_id in document_ids:
-            task.reports.add(Report.objects.get(document_id=document_id))
+        reports = Report.objects.filter(document_id__in=document_ids)
+        task.reports.add(*reports)
 
         task.delay()
radis/extractions/urls.py (1)

90-94: Duplicate URL pattern detected.

The URL pattern jobs/<int:pk>/verify/ with ExtractionJobVerifyView and name extraction_job_verify is already defined at lines 65-69. This duplicate registration is likely a merge artifact and should be removed.

🔧 Proposed fix
     path(
         "jobs/<int:pk>/restart/",
         ExtractionJobRestartView.as_view(),
         name="extraction_job_restart",
     ),
-    path(
-        "jobs/<int:pk>/verify/",
-        ExtractionJobVerifyView.as_view(),
-        name="extraction_job_verify",
-    ),
     path(
         "tasks/<int:pk>/",
radis/search/forms.py (2)

101-108: Potential KeyError if individual field validation fails.

If clean_age_from or clean_age_till raises a ValidationError, the corresponding key won't exist in cleaned_data, causing a KeyError on lines 102-103.

Proposed fix using `.get()` for safe access
     def clean(self) -> dict[str, Any] | None:
-        age_from = self.cleaned_data["age_from"]
-        age_till = self.cleaned_data["age_till"]
+        age_from = self.cleaned_data.get("age_from")
+        age_till = self.cleaned_data.get("age_till")
 
         if age_from is not None and age_till is not None and age_from >= age_till:
             raise forms.ValidationError("Age from must be less than age till")
 
         return super().clean()

43-48: No-op statement: form_show_errors is accessed but not used.

Line 45 accesses self.query_helper.form_show_errors without assigning or using the result. This appears to be dead code or an incomplete statement.

Proposed fix: remove the no-op or assign as intended
         self.query_helper = FormHelper()
         self.query_helper.template = "search/form_elements/form_part.html"  # type: ignore
-        self.query_helper.form_show_errors
         self.query_helper.form_tag = False
         self.query_helper.disable_csrf = True
         self.query_helper.layout = self.create_query_layout()

If the intent was to enable error display, use:

self.query_helper.form_show_errors = True
🤖 Fix all issues with AI agents
In @radis/core/form_fields.py:
- Around line 72-76: The code incorrectly assigns empty_label to a
forms.ChoiceField (variable field) which will raise AttributeError; instead,
when empty_label is not None prepend an empty choice tuple to the choices list:
build choices = [('', empty_label)] + [(lang.code, LANGUAGE_LABELS[lang.code])
for lang in languages] (or insert at index 0) and then assign field.choices =
choices; keep using forms.ChoiceField, languages, LANGUAGE_LABELS and
empty_label names as shown.

In @radis/search/site.py:
- Line 48: The _build_filter_query function currently ignores the
SearchFilters.group field, so add a conditional filter to apply group-based
access control: inside _build_filter_query (used by radis/pgsearch/providers.py)
check if filters.group is not None and, if so, AND a clause filtering reports by
group primary key (e.g., Q(report__groups__pk=filters.group)); ensure you
reference the SearchFilters.group property and combine with the existing fq
variable so queries respect group membership when group is provided.

In @radis/subscriptions/processors.py:
- Around line 81-84: The logger.debug call is passing two separate f-strings so
only the first is logged; update the logger.debug invocation (the call that
currently uses logger.debug(f"LLM returned None for question {question.pk} ",
f"on report {report.pk}",)) to pass a single formatted message — e.g., combine
the parts into one f-string or format string that includes both question.pk and
report.pk — so the full message is emitted.

In @radis/subscriptions/templates/subscriptions/subscription_detail.html:
- Around line 102-116: Wrap the Filter Questions block (the section that
iterates subscription.filter_questions.all and renders question.question and
question.get_expected_answer_display) with an explicit auto-escape directive:
enable autoescaping before the {% with
questions=subscription.filter_questions.all %} block and disable it after the
loop (use {% autoescape on %} ... {% endautoescape %>) so user-generated
question.question is explicitly escaped consistent with other templates.

In @radis/subscriptions/tests/test_views.py:
- Around line 578-581: The assertion "assert item1 and item3 not in items" is
logically incorrect and only checks item3; update the assertion to explicitly
assert both are absent, e.g. replace it with "assert item1 not in items and
item3 not in items" or two separate asserts "assert item1 not in items" and
"assert item3 not in items" referencing the existing variables items, item1, and
item3 in the test.
- Around line 545-548: The assertion uses `item2 and item3 not in items`, which
evaluates incorrectly and only guarantees `item3` is absent; change to assert
both separately by replacing that expression with explicit checks like `assert
item2 not in items` and `assert item3 not in items` so both `item2` and `item3`
are individually verified not to be in `items` (keep `items`, `item1`, `item2`,
`item3` references intact).
🧹 Nitpick comments (34)
radis/subscriptions/static/subscriptions/subscriptions.css (1)

2-3: Avoid !important — address specificity instead.

The use of !important on font-size and font-weight is a CSS anti-pattern that makes future overrides difficult and signals an underlying specificity issue. Consider removing !important and using a more specific selector or relying on cascade order.

♻️ Proposed fix: Remove `!important` flags
  fieldset > legend {
-   font-size: 1rem !important;
-   font-weight: 500 !important;
+   font-size: 1rem;
+   font-weight: 500;
    line-height: 1.5;
    margin-bottom: 0.5rem;
  }

If this rule is being overridden by other styles, consider increasing specificity with a more targeted selector (e.g., form fieldset > legend or adding a class) rather than relying on !important.

radis/chats/utils/chat_client.py (2)

69-81: Consider using NoReturn type hint.

Since _handle_api_error always raises and never returns normally, using NoReturn from typing instead of None would be more accurate and help static type checkers understand the control flow.

♻️ Suggested improvement
+from typing import Iterable, NoReturn
 
-def _handle_api_error(error: openai.APIError, operation: str) -> None:
+def _handle_api_error(error: openai.APIError, operation: str) -> NoReturn:

128-135: Potential type checker warning: completion may appear unbound.

The control flow is correct since _handle_api_error always raises, ensuring completion is bound when reaching the validation call. However, some type checkers may flag this as a potentially unbound variable. Adding an explicit raise after the helper call or restructuring with else would make the intent clearer.

♻️ Alternative structure for clarity
         try:
             completion = await self._client.chat.completions.create(**request)
         except openai.APIError as e:
             _handle_api_error(e, "chat")
+            raise  # Unreachable, but clarifies control flow for type checkers
 
         answer = _validate_completion_response(completion)

Or use an else clause:

try:
    completion = await self._client.chat.completions.create(**request)
except openai.APIError as e:
    _handle_api_error(e, "chat")
else:
    answer = _validate_completion_response(completion)
    self._log_response(answer)
    return answer
radis/extractions/constants.py (1)

1-1: Add a docstring explaining the rationale for MAX_SELECTION_OPTIONS = 7.

The choice of 7 as the maximum isn't self-documenting. Consider adding a module-level docstring or inline comment explaining whether this limit is based on UX considerations (e.g., cognitive load, UI space constraints), technical constraints, or another rationale.

📝 Example documentation
+"""
+Constants for extraction field configuration.
+
+MAX_SELECTION_OPTIONS limits the number of choices to prevent UI clutter
+and maintain usability in selection fields.
+"""
+
 MAX_SELECTION_OPTIONS = 7
radis/core/constants.py (1)

6-8: Consider adding documentation for the age-related constants.

While the values are reasonable (MIN_AGE=0 for newborns, MAX_AGE=120 as a practical upper limit, AGE_STEP=10 for range grouping), a brief comment explaining their purpose would improve maintainability.

📝 Example documentation
+# Age range constants for form field validation and range building
+# MIN_AGE: Includes newborns and infants
+# MAX_AGE: Practical upper limit based on maximum human lifespan
+# AGE_STEP: Default step size for age range grouping in UI
 MIN_AGE = 0
 MAX_AGE = 120
 AGE_STEP = 10
radis/subscriptions/handlers.py (1)

20-20: Use lazy % formatting for logging.

F-strings in logging statements are evaluated eagerly, even when the log level would suppress output. Use lazy % formatting for better performance.

♻️ Proposed fix
-    logger.info(f"Triggering subscription processing for {len(reports)} new report(s)")
+    logger.info("Triggering subscription processing for %d new report(s)", len(reports))
radis/extractions/processors.py (1)

47-59: Consider adding defensive error handling for invalid output field configurations.

The generate_output_fields_schema utility can raise ValueError if an output field has invalid configuration (e.g., a selection field with no options). While validation likely occurs during field creation, adding error handling here would prevent thread crashes and improve resilience.

🛡️ Proposed error handling
 def process_output_fields(self, instance: ExtractionInstance) -> None:
     job = instance.task.job
     output_fields = list(job.output_fields.order_by("pk"))
+    
+    if not output_fields:
+        logger.warning(f"No output fields configured for job {job.pk}")
+        return
+    
+    try:
-        Schema = generate_output_fields_schema(output_fields)
+        Schema = generate_output_fields_schema(output_fields)
+    except ValueError as e:
+        logger.error(f"Invalid output field configuration for job {job.pk}: {e}")
+        raise
+    
     prompt = Template(settings.OUTPUT_FIELDS_SYSTEM_PROMPT).substitute(
         {
             "report": instance.text,
             "fields": generate_output_fields_prompt(output_fields),
         }
     )
     result = self.client.extract_data(prompt.strip(), Schema)
     instance.output = result.model_dump()
     instance.save()
radis/urls.py (1)

17-17: LGTM! More robust URL inclusion using apps.is_installed().

Switching from settings.DEBUG checks to apps.is_installed() prevents NoReverseMatch errors in test environments where DEBUG may be forced to False. The explanatory comment clearly documents the rationale.

♻️ Optional: Use iterable unpacking per Ruff suggestion

Lines 42-44 and 47-49 could use iterable unpacking instead of list concatenation for slightly more idiomatic Python:

 if apps.is_installed("django_browser_reload"):
-    urlpatterns = [
-        path("__reload__/", include("django_browser_reload.urls")),
-    ] + urlpatterns
+    urlpatterns = [
+        path("__reload__/", include("django_browser_reload.urls")),
+        *urlpatterns,
+    ]

 if apps.is_installed("debug_toolbar"):
-    urlpatterns = [
-        path("__debug__/", include("debug_toolbar.urls")),
-    ] + urlpatterns
+    urlpatterns = [
+        path("__debug__/", include("debug_toolbar.urls")),
+        *urlpatterns,
+    ]

Also applies to: 38-49

radis/extractions/tests/test_query_generator.py (2)

49-52: Reorder assertions for clearer failure diagnostics.

Checking query != "" before query is not None can produce a confusing failure message if query is None. Check for None first.

Suggested reordering
-        assert query != ""
         assert metadata["success"] is True
         assert query is not None
+        assert query != ""
         assert "nodule" in query.lower()

70-76: Unused variable fixes flagged by static analysis.

The fixes variable is assigned but never used. If the value is intentionally discarded, prefix with underscore to signal intent.

Proposed fix
-        fixed_query, fixes = self.generator.validate_and_fix_query(query)
+        fixed_query, _ = self.generator.validate_and_fix_query(query)
 
         assert fixed_query != ""
         assert "lung nodule" in fixed_query or "lung" in fixed_query
radis/subscriptions/apps.py (1)

34-45: Consider guarding against duplicate handler registration.

Django's ready() can be called multiple times in certain scenarios (e.g., during testing with --keepdb). Appending to reports_created_handlers without a guard could result in duplicate handlers.

Suggested guard
 def register_reports_handler():
     """Register handler to trigger subscriptions when reports are created."""
     from radis.reports.site import ReportsCreatedHandler, reports_created_handlers
 
     from .handlers import handle_reports_created
 
+    # Avoid duplicate registration if ready() is called multiple times
+    if any(h.name == "subscription_launcher" for h in reports_created_handlers):
+        return
+
     reports_created_handlers.append(
         ReportsCreatedHandler(
             name="subscription_launcher",
             handle=handle_reports_created,
         )
     )
radis/subscriptions/tasks.py (1)

94-101: Redundant job.save() after SubscriptionJob.objects.create().

create() already persists the object to the database. The subsequent save() on line 101 is unnecessary.

Proposed fix
         job = SubscriptionJob.objects.create(
             subscription=subscription,
             status=SubscriptionJob.Status.PREPARING,
             owner=subscription.owner,
             owner_id=subscription.owner_id,
             send_finished_mail=subscription.send_finished_mail,
         )
-        job.save()
         transaction.on_commit(job.delay)
radis/extractions/templates/extractions/_selection_options_field.html (2)

10-16: Consider adding aria-label to the add button.

The remove button has aria-label="Remove selection option", but the add button lacks an explicit aria-label. While it has visible text, adding an aria-label would provide consistency and better screen reader support.

Suggested addition
             <button type="button"
                     class="btn btn-outline-secondary btn-sm"
                     @click.prevent="addOption()"
-                    :disabled="!supportsSelection || options.length >= maxOptions">
+                    :disabled="!supportsSelection || options.length >= maxOptions"
+                    aria-label="Add selection option">
                 {% bootstrap_icon "plus-lg" %}
                 Enter a selection
             </button>

41-45: Error messages may run together without spacing.

If multiple validation errors exist, they'll be concatenated without separators. Consider adding spacing between errors.

Suggested fix
     {% if form.selection_options.errors %}
         <div class="invalid-feedback d-block">
-            {% for error in form.selection_options.errors %}{{ error }}{% endfor %}
+            {% for error in form.selection_options.errors %}{{ error }}{% if not forloop.last %} {% endif %}{% endfor %}
         </div>
     {% endif %}
radis/subscriptions/tests/unit/test_processors.py (1)

39-42: Simplify mock setup.

The call to create_openai_client_mock(extraction_output) is immediately overridden by reassigning openai_mock.beta.chat.completions.parse with side_effect. Consider constructing the mock directly or updating the helper to accept multiple responses.

♻️ Proposed refactor
-    openai_mock = create_openai_client_mock(extraction_output)
-    openai_mock.beta.chat.completions.parse = MagicMock(
-        side_effect=[filter_response, extraction_response]
-    )
+    openai_mock = MagicMock()
+    openai_mock.beta.chat.completions.parse = MagicMock(
+        side_effect=[filter_response, extraction_response]
+    )
radis/subscriptions/utils/testing_helpers.py (1)

31-36: Consider removing redundant owner_id assignment.

Line 34 explicitly sets owner_id=user.id after setting owner=user. Django automatically sets the foreign key ID field when you assign the related object, making the explicit owner_id assignment redundant.

♻️ Proposed refactor
 job = SubscriptionJob.objects.create(
     subscription=subscription,
     owner=user,
-    owner_id=user.id,
     status=SubscriptionJob.Status.PENDING,
 )
radis/subscriptions/templates/subscriptions/subscription_inbox.html (1)

14-40: Refactor query string construction to reduce duplication and improve maintainability.

Lines 16-31 repeat the same query string construction pattern four times with only sort_by and order parameters changing. This is verbose, error-prone, and difficult to maintain. Consider extracting this logic into a template tag or using Django's QueryDict manipulation.

♻️ Suggested approach

Create a custom template tag that preserves query parameters while updating specific ones:

# In radis/subscriptions/templatetags/subscription_extras.py
from django import template
from django.http import QueryDict

register = template.Library()

@register.simple_tag(takes_context=True)
def url_with_params(context, **kwargs):
    """Preserve existing query parameters while updating specified ones."""
    request = context['request']
    query_dict = request.GET.copy()
    query_dict.pop('page', None)  # Always reset page
    for key, value in kwargs.items():
        if value:
            query_dict[key] = value
        else:
            query_dict.pop(key, None)
    return f"?{query_dict.urlencode()}"

Then in the template:

{% load subscription_extras %}
<a href="{% url_with_params sort_by='created_at' order='desc' %}"
   class="btn btn-outline-secondary {% if current_sort_by == 'created_at' and current_order == 'desc' %}active{% endif %}">
    Newest First
</a>
radis/extractions/tests/test_views.py (1)

37-46: Consider adding a type guard for unexpected chunk types.

The current implementation handles bytes explicitly but falls through to encoding any non-bytes value. While this works for the expected string chunks, it could mask unexpected types.

♻️ Optional: Add explicit type handling
 def _collect_csv(response) -> str:
     chunks: list[bytes] = []
     for chunk in response.streaming_content:
         if isinstance(chunk, bytes):
             chunks.append(chunk)
+        elif isinstance(chunk, str):
+            chunks.append(chunk.encode("utf-8"))
         else:
-            chunks.append(chunk.encode("utf-8"))
+            raise TypeError(f"Unexpected chunk type: {type(chunk)}")
     csv_bytes = b"".join(chunks)
     return csv_bytes.decode("utf-8-sig")
radis/extractions/utils/csv_export.py (1)

11-17: Consider special handling for array values.

The current implementation converts all non-None, non-bool values using str(). For OutputFields with is_array=True, this will produce Python list syntax (e.g., "['Option 1', 'Option 2']") which may not be ideal for CSV consumption.

♻️ Optional: Format arrays as comma-separated values
 def _format_cell(value: Any) -> str:
     """Format a single output value for CSV export."""
     if value is None:
         return ""
     if isinstance(value, bool):
         return "yes" if value else "no"
+    if isinstance(value, list):
+        return ", ".join(str(item) for item in value)
     return str(value)
radis/extractions/utils/processor_utils.py (1)

10-12: Misplaced module docstring.

The docstring at line 10 is orphaned—it appears after the imports but isn't attached to any function or class. Move it to the top of the file (before imports) to serve as the module docstring, or remove it if the intent was to document generate_output_fields_schema.

♻️ Proposed fix
+"""Build a Pydantic model that describes the structure the extractor must output."""
+
 from collections.abc import Iterable
 from typing import Any, Literal

 from pydantic import BaseModel, create_model

 from ..models import OutputField, OutputType

 type Numeric = float | int

-"""Build a Pydantic model that describes the structure the extractor must output"""
-

 def generate_output_fields_schema(fields: Iterable[OutputField]) -> type[BaseModel]:
radis/subscriptions/utils/csv_export.py (2)

35-42: Combine into a single query.

Two separate queries fetch name and pk from the same queryset. Consider combining them to reduce database round-trips.

♻️ Proposed fix
-    # Get output field names in PK order (to match dict keys)
-    field_names: list[str] = list(
-        subscription.output_fields.order_by("pk").values_list("name", flat=True)
-    )
-
-    # Pre-fetch field PKs to avoid N+1 query in the loop below
-    field_pks: list[int] = list(
-        subscription.output_fields.order_by("pk").values_list("pk", flat=True)
-    )
+    # Get output field names and PKs in PK order (to match dict keys)
+    output_fields = list(
+        subscription.output_fields.order_by("pk").values_list("pk", "name")
+    )
+    field_pks = [pk for pk, _ in output_fields]
+    field_names = [name for _, name in output_fields]

57-59: Remove unnecessary prefetch.

The subscription__output_fields prefetch is not used in the loop—field PKs are already pre-fetched above. This adds overhead without benefit.

♻️ Proposed fix
     # Data rows - prefetch related fields for efficiency
-    items = queryset.select_related("report").prefetch_related(
-        "report__modalities", "subscription__output_fields"
-    )
+    items = queryset.select_related("report").prefetch_related("report__modalities")
radis/subscriptions/utils/processor_utils.py (1)

16-17: Consider if the wrapper is needed.

get_output_field_name is a trivial wrapper that returns field.name. While this provides API symmetry with get_filter_question_field_name, consider whether direct access to field.name would be clearer at call sites, or document why the indirection is intentional (e.g., for future extensibility).

radis/extractions/models.py (1)

126-145: Validation logic correctly uses shared utility, addressing DRY concern.

The clean() method properly delegates to validate_selection_options from the validation module, avoiding code duplication with the form's validation logic.

However, the exception re-raise on line 139 should use exception chaining for proper traceback.

Proposed fix for exception chaining
             try:
                 self.selection_options = validate_selection_options(self.selection_options)
             except ValidationError as e:
-                raise ValidationError({"selection_options": e.message})
+                raise ValidationError({"selection_options": e.message}) from e
radis/settings/base.py (1)

465-465: Clarify the disabled cron setting.

The comment indicates the cron is disabled in favor of a handler-based approach, but the value "* * * * *" is still set. Consider using an empty string or removing the setting to avoid confusion, or add a more explicit comment explaining where the disabling logic resides.

Option: Use a more explicit disabled value
-SUBSCRIPTION_CRON = "* * * * *"  # Disabled - using handler-based approach
+# Subscription cron disabled - using ReportsCreatedHandler for event-driven processing
+# See radis/subscriptions/handlers.py for the handler-based implementation
+SUBSCRIPTION_CRON = ""  # Empty = disabled

Alternatively, if the cron value must remain for backward compatibility, expand the comment:

-SUBSCRIPTION_CRON = "* * * * *"  # Disabled - using handler-based approach
+# NOTE: Cron-based subscription refresh is disabled. The cron value is retained for
+# reference but subscription jobs are triggered via ReportsCreatedHandler instead.
+SUBSCRIPTION_CRON = "* * * * *"
radis/extractions/utils/query_generator.py (2)

75-77: Broad exception catch could mask unexpected errors.

Catching bare Exception is too broad. The AsyncChatClient.chat method raises openai.APIError (handled via _handle_api_error) and potentially other specific exceptions. Consider catching more specific exceptions or at minimum use logging.exception to preserve the stack trace.

♻️ Suggested improvement
-            except Exception as e:
-                logger.error(f"Error during async LLM query generation: {e}", exc_info=True)
+            except Exception:
+                logger.exception("Error during async LLM query generation")
                 metadata["error"] = str(e)

111-113: Use logging.exception for better error diagnostics.

When catching exceptions in _call_llm, use logger.exception to automatically include the stack trace, which aids debugging LLM integration issues.

♻️ Suggested fix
         except Exception as e:
-            logger.error(f"Async LLM call failed: {e}")
+            logger.exception("Async LLM call failed")
             return None
radis/subscriptions/forms.py (1)

138-148: Avoid using assert for runtime validation.

The assert cleaned_data on line 140 could be bypassed if Python runs with optimization (-O flag), leading to unexpected None access. Use explicit validation instead.

♻️ Suggested fix
     def clean(self) -> dict[str, Any]:
         cleaned_data = super().clean()
-        assert cleaned_data
+        if cleaned_data is None:
+            return {}

         question = cleaned_data.get("question")
         expected_answer = cleaned_data.get("expected_answer")
radis/subscriptions/processors.py (1)

93-101: Consider using logging.exception for error cases.

For RuntimeError and ValidationError, the full traceback could help diagnose LLM integration issues. logging.exception automatically includes the stack trace.

♻️ Suggested improvement
                 except RuntimeError as e:
-                    logger.error(f"LLM API error filtering report {report.pk}: {e}")
+                    logger.exception(f"LLM API error filtering report {report.pk}")
                     return
                 except ValidationError as e:
-                    logger.error(f"Response validation failed filtering report {report.pk}: {e}")
+                    logger.exception(f"Response validation failed filtering report {report.pk}")
                     return
radis/subscriptions/views.py (2)

207-227: DRY violation: get_ordering duplicated across views.

The get_ordering method is nearly identical in SubscriptionInboxView and SubscriptionInboxDownloadView. Consider extracting to a shared mixin or utility function.

♻️ Suggested approach

Create a mixin or extract the logic:

class SubscribedItemOrderingMixin:
    ALLOWED_SORT_FIELDS = {
        "created_at": "created_at",
        "study_date": "report__study_datetime",
    }
    
    def get_ordering(self) -> str:
        sort_by = self.request.GET.get("sort_by", "created_at")
        order = self.request.GET.get("order", "desc")
        
        if sort_by not in self.ALLOWED_SORT_FIELDS:
            sort_by = "created_at"
        if order not in ["asc", "desc"]:
            order = "desc"
            
        field = self.ALLOWED_SORT_FIELDS[sort_by]
        return field if order == "asc" else f"-{field}"

Then have both views inherit from this mixin.

Also applies to: 303-323


268-271: Side effect in get_context_data may cause unexpected behavior.

Updating last_viewed_at in get_context_data causes a database write during template context preparation. If get_context_data is called multiple times (e.g., during error handling or by middleware), this could lead to unexpected updates. Consider moving to dispatch() or a dedicated method.

♻️ Alternative approach

Override get() instead:

def get(self, request, *args, **kwargs):
    response = super().get(request, *args, **kwargs)
    # Update last_viewed_at after successful response
    subscription = self.object
    subscription.last_viewed_at = timezone.now()
    subscription.save(update_fields=["last_viewed_at"])
    return response
radis/extractions/views.py (3)

471-472: Hardcoded wizard session key is fragile.

The session key "wizard_extraction_job_wizard_view" is derived from the view class name. If the class is renamed, this will break. Consider computing the key dynamically.

♻️ Suggested improvement
-        wizard_session_key = "wizard_extraction_job_wizard_view"
+        # Django formtools uses: "wizard_" + view class name (snake_case)
+        from django.utils.text import camel_case_to_spaces
+        view_name = camel_case_to_spaces(ExtractionJobWizardView.__name__).replace(" ", "_")
+        wizard_session_key = f"wizard_{view_name}"

Or define it as a class constant on ExtractionJobWizardView that both views can reference.


530-543: Error handling is comprehensive but could use logging.exception.

The exception handling saves error state to the wizard session, which is good for UX. Consider using logger.exception instead of logger.error(..., exc_info=True) for consistency with Python logging idioms.


627-631: _Echo class duplicated across modules.

This _Echo helper class is identical to the one in radis/subscriptions/views.py. Consider extracting to a shared utility module (e.g., radis/core/utils/csv_export.py).

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a419692 and b7d126d.

📒 Files selected for processing (70)
  • radis/chats/utils/chat_client.py
  • radis/core/constants.py
  • radis/core/form_fields.py
  • radis/core/static/core/core.js
  • radis/core/templates/cotton/formset.html
  • radis/extractions/constants.py
  • radis/extractions/factories.py
  • radis/extractions/forms.py
  • radis/extractions/migrations/0004_outputfield_selection_options.py
  • radis/extractions/migrations/0005_outputfield_is_array.py
  • radis/extractions/migrations/0006_remove_extractionjob_provider.py
  • radis/extractions/migrations/0007_remove_outputfield_unique.py
  • radis/extractions/migrations/0008_outputfield_subscription_and_more.py
  • radis/extractions/models.py
  • radis/extractions/processors.py
  • radis/extractions/static/extractions/extractions.css
  • radis/extractions/static/extractions/extractions.js
  • radis/extractions/templates/extractions/_query_generation_result.html
  • radis/extractions/templates/extractions/_query_generation_section.html
  • radis/extractions/templates/extractions/_search_preview.html
  • radis/extractions/templates/extractions/_search_preview_form_section.html
  • radis/extractions/templates/extractions/_selection_options_field.html
  • radis/extractions/templates/extractions/extraction_job_detail.html
  • radis/extractions/templates/extractions/extraction_job_output_fields_form.html
  • radis/extractions/templates/extractions/extraction_job_search_form.html
  • radis/extractions/templates/extractions/extraction_job_wizard_summary.html
  • radis/extractions/templates/extractions/extraction_result_list.html
  • radis/extractions/tests/test_forms.py
  • radis/extractions/tests/test_query_generator.py
  • radis/extractions/tests/test_views.py
  • radis/extractions/tests/unit/test_processor_utils.py
  • radis/extractions/urls.py
  • radis/extractions/utils/__init__.py
  • radis/extractions/utils/csv_export.py
  • radis/extractions/utils/processor_utils.py
  • radis/extractions/utils/query_generator.py
  • radis/extractions/utils/testing_helpers.py
  • radis/extractions/utils/validation.py
  • radis/extractions/views.py
  • radis/search/forms.py
  • radis/search/site.py
  • radis/settings/base.py
  • radis/subscriptions/apps.py
  • radis/subscriptions/factories.py
  • radis/subscriptions/filters.py
  • radis/subscriptions/forms.py
  • radis/subscriptions/handlers.py
  • radis/subscriptions/migrations/0011_rename_answers_subscribeditem_filter_results_and_more.py
  • radis/subscriptions/migrations/0012_procrastinate_on_delete.py
  • radis/subscriptions/migrations/0013_subscription_last_viewed_at.py
  • radis/subscriptions/migrations/0014_remove_subscription_query.py
  • radis/subscriptions/models.py
  • radis/subscriptions/processors.py
  • radis/subscriptions/static/subscriptions/subscriptions.css
  • radis/subscriptions/tables.py
  • radis/subscriptions/tasks.py
  • radis/subscriptions/templates/subscriptions/_subscribed_item_preview.html
  • radis/subscriptions/templates/subscriptions/subscription_detail.html
  • radis/subscriptions/templates/subscriptions/subscription_inbox.html
  • radis/subscriptions/templates/subscriptions/subscription_layout.html
  • radis/subscriptions/templatetags/__init__.py
  • radis/subscriptions/templatetags/subscriptions_extras.py
  • radis/subscriptions/tests/test_views.py
  • radis/subscriptions/tests/unit/test_processors.py
  • radis/subscriptions/urls.py
  • radis/subscriptions/utils/csv_export.py
  • radis/subscriptions/utils/processor_utils.py
  • radis/subscriptions/utils/testing_helpers.py
  • radis/subscriptions/views.py
  • radis/urls.py
🧰 Additional context used
🧬 Code graph analysis (28)
radis/subscriptions/tests/unit/test_processors.py (4)
radis/chats/utils/testing_helpers.py (1)
  • create_openai_client_mock (28-32)
radis/subscriptions/models.py (1)
  • SubscribedItem (98-114)
radis/subscriptions/utils/processor_utils.py (2)
  • get_filter_question_field_name (12-13)
  • get_output_field_name (16-17)
radis/subscriptions/utils/testing_helpers.py (1)
  • create_subscription_task (11-43)
radis/extractions/migrations/0005_outputfield_is_array.py (1)
radis/extractions/migrations/0004_outputfield_selection_options.py (1)
  • Migration (4-29)
radis/extractions/utils/processor_utils.py (1)
radis/extractions/models.py (2)
  • OutputField (82-145)
  • OutputType (75-79)
radis/extractions/utils/__init__.py (1)
radis/extractions/utils/csv_export.py (1)
  • iter_extraction_result_rows (20-54)
radis/subscriptions/migrations/0013_subscription_last_viewed_at.py (2)
radis/subscriptions/migrations/0011_rename_answers_subscribeditem_filter_results_and_more.py (1)
  • Migration (7-56)
radis/subscriptions/migrations/0012_procrastinate_on_delete.py (1)
  • Migration (7-23)
radis/extractions/tests/unit/test_processor_utils.py (3)
radis/extractions/factories.py (3)
  • ExtractionJobFactory (26-69)
  • OutputFieldFactory (72-88)
  • create (22-23)
radis/extractions/models.py (1)
  • OutputType (75-79)
radis/extractions/utils/processor_utils.py (1)
  • generate_output_fields_schema (13-36)
radis/extractions/migrations/0004_outputfield_selection_options.py (1)
radis/extractions/migrations/0005_outputfield_is_array.py (1)
  • Migration (4-15)
radis/extractions/processors.py (1)
radis/extractions/utils/processor_utils.py (2)
  • generate_output_fields_schema (13-36)
  • generate_output_fields_prompt (39-55)
radis/extractions/tests/test_query_generator.py (3)
radis/extractions/models.py (2)
  • OutputField (82-145)
  • OutputType (75-79)
radis/extractions/utils/query_generator.py (5)
  • AsyncQueryGenerator (21-190)
  • generate_from_fields (31-82)
  • validate_and_fix_query (160-190)
  • _format_fields_for_prompt (115-135)
  • _extract_query_from_response (137-158)
radis/chats/utils/chat_client.py (2)
  • chat (120-135)
  • chat (143-168)
radis/subscriptions/handlers.py (2)
radis/reports/models.py (1)
  • Report (36-88)
radis/subscriptions/tasks.py (1)
  • subscription_launcher (84-102)
radis/subscriptions/tasks.py (2)
radis/pgsearch/providers.py (2)
  • search (81-122)
  • filter (158-163)
radis/search/site.py (1)
  • SearchFilters (32-59)
radis/extractions/tests/test_forms.py (2)
radis/extractions/forms.py (1)
  • OutputFieldForm (172-300)
radis/extractions/models.py (2)
  • OutputField (82-145)
  • OutputType (75-79)
radis/subscriptions/factories.py (1)
radis/subscriptions/models.py (6)
  • FilterQuestion (72-95)
  • SubscribedItem (98-114)
  • Subscription (26-69)
  • Meta (16-17)
  • Meta (59-66)
  • Meta (110-111)
radis/extractions/utils/csv_export.py (1)
radis/extractions/models.py (2)
  • ExtractionInstance (167-180)
  • ExtractionJob (27-72)
radis/extractions/utils/query_generator.py (3)
radis/extractions/models.py (1)
  • OutputField (82-145)
radis/search/utils/query_parser.py (3)
  • QueryParser (57-300)
  • parse (220-277)
  • unparse (280-300)
radis/chats/utils/chat_client.py (3)
  • AsyncChatClient (115-135)
  • chat (120-135)
  • chat (143-168)
radis/subscriptions/apps.py (2)
radis/reports/site.py (1)
  • ReportsCreatedHandler (8-10)
radis/subscriptions/handlers.py (1)
  • handle_reports_created (10-21)
radis/subscriptions/tests/test_views.py (4)
radis/extractions/factories.py (2)
  • OutputFieldFactory (72-88)
  • create (22-23)
radis/reports/factories.py (3)
  • LanguageFactory (21-26)
  • ReportFactory (48-88)
  • code (35-37)
radis/subscriptions/factories.py (2)
  • FilterQuestionFactory (26-31)
  • SubscribedItemFactory (49-57)
radis/subscriptions/models.py (1)
  • Subscription (26-69)
radis/subscriptions/utils/csv_export.py (2)
radis/subscriptions/models.py (2)
  • SubscribedItem (98-114)
  • Subscription (26-69)
radis/reports/models.py (1)
  • modality_codes (87-88)
radis/subscriptions/filters.py (2)
radis/reports/models.py (1)
  • Modality (25-33)
radis/subscriptions/models.py (2)
  • SubscribedItem (98-114)
  • Subscription (26-69)
radis/subscriptions/utils/processor_utils.py (2)
radis/extractions/models.py (1)
  • OutputField (82-145)
radis/subscriptions/models.py (1)
  • FilterQuestion (72-95)
radis/extractions/tests/test_views.py (1)
radis/extractions/factories.py (4)
  • ExtractionInstanceFactory (98-103)
  • ExtractionJobFactory (26-69)
  • ExtractionTaskFactory (91-95)
  • OutputFieldFactory (72-88)
radis/search/forms.py (1)
radis/core/form_fields.py (7)
  • create_age_range_fields (139-177)
  • create_language_field (17-21)
  • create_language_field (25-29)
  • create_language_field (32-76)
  • create_modality_field (80-84)
  • create_modality_field (88-92)
  • create_modality_field (95-136)
radis/subscriptions/processors.py (5)
radis/chats/utils/chat_client.py (2)
  • ChatClient (138-186)
  • extract_data (170-186)
radis/core/processors.py (1)
  • AnalysisTaskProcessor (11-77)
radis/extractions/utils/processor_utils.py (2)
  • generate_output_fields_prompt (39-55)
  • generate_output_fields_schema (13-36)
radis/subscriptions/utils/processor_utils.py (4)
  • generate_filter_questions_prompt (31-35)
  • generate_filter_questions_schema (20-28)
  • get_filter_question_field_name (12-13)
  • get_output_field_name (16-17)
radis/subscriptions/models.py (2)
  • Subscription (26-69)
  • SubscribedItem (98-114)
radis/subscriptions/views.py (5)
radis/subscriptions/filters.py (1)
  • SubscribedItemFilter (28-80)
radis/subscriptions/tables.py (1)
  • SubscriptionTable (9-44)
radis/subscriptions/forms.py (1)
  • SubscriptionForm (20-99)
radis/subscriptions/models.py (2)
  • SubscribedItem (98-114)
  • Subscription (26-69)
radis/subscriptions/utils/csv_export.py (1)
  • iter_subscribed_item_rows (22-87)
radis/subscriptions/urls.py (1)
radis/subscriptions/views.py (1)
  • SubscriptionInboxDownloadView (283-382)
radis/subscriptions/models.py (1)
radis/extractions/models.py (1)
  • OutputField (82-145)
radis/extractions/models.py (2)
radis/extractions/forms.py (2)
  • clean (115-169)
  • clean (278-300)
radis/extractions/utils/validation.py (1)
  • validate_selection_options (8-48)
radis/extractions/forms.py (5)
radis/core/form_fields.py (7)
  • create_age_range_fields (139-177)
  • create_language_field (17-21)
  • create_language_field (25-29)
  • create_language_field (32-76)
  • create_modality_field (80-84)
  • create_modality_field (88-92)
  • create_modality_field (95-136)
radis/core/layouts.py (1)
  • RangeSlider (24-59)
radis/reports/models.py (2)
  • Language (18-22)
  • Modality (25-33)
radis/extractions/models.py (7)
  • ExtractionJob (27-72)
  • OutputField (82-145)
  • OutputType (75-79)
  • clean (126-145)
  • Meta (17-18)
  • Meta (56-57)
  • Meta (102-121)
radis/extractions/utils/validation.py (1)
  • validate_selection_options (8-48)
🪛 Ruff (0.14.10)
radis/extractions/migrations/0005_outputfield_is_array.py

5-7: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


9-15: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

radis/extractions/utils/processor_utils.py

25-25: Avoid specifying long messages outside the exception class

(TRY003)


28-28: Avoid specifying long messages outside the exception class

(TRY003)

radis/subscriptions/migrations/0013_subscription_last_viewed_at.py

8-10: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


12-18: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

radis/extractions/migrations/0004_outputfield_selection_options.py

5-7: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


9-29: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

radis/urls.py

42-44: Consider iterable unpacking instead of concatenation

Replace with iterable unpacking

(RUF005)

radis/extractions/utils/validation.py

27-27: Avoid specifying long messages outside the exception class

(TRY003)


32-32: Avoid specifying long messages outside the exception class

(TRY003)


36-36: Avoid specifying long messages outside the exception class

(TRY003)


41-43: Avoid specifying long messages outside the exception class

(TRY003)


46-46: Avoid specifying long messages outside the exception class

(TRY003)

radis/subscriptions/forms.py

105-105: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


146-146: Avoid specifying long messages outside the exception class

(TRY003)

radis/extractions/tests/test_query_generator.py

73-73: Unpacked variable fixes is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)

radis/subscriptions/tasks.py

60-60: Avoid specifying long messages outside the exception class

(TRY003)

radis/subscriptions/migrations/0014_remove_subscription_query.py

8-10: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


12-17: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

radis/extractions/migrations/0008_outputfield_subscription_and_more.py

9-12: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


14-53: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

radis/extractions/migrations/0007_remove_outputfield_unique.py

9-11: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


13-33: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

radis/extractions/utils/query_generator.py

109-109: Consider moving this statement to an else block

(TRY300)


111-111: Do not catch blind exception: Exception

(BLE001)


112-112: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


186-186: Consider moving this statement to an else block

(TRY300)


188-188: Do not catch blind exception: Exception

(BLE001)


189-189: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

radis/extractions/factories.py

62-62: Unused method argument: extracted

(ARG002)


62-62: Unused method argument: kwargs

(ARG002)

radis/subscriptions/migrations/0012_procrastinate_on_delete.py

9-12: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


14-23: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

radis/chats/utils/chat_client.py

34-34: Avoid specifying long messages outside the exception class

(TRY003)


39-39: Avoid specifying long messages outside the exception class

(TRY003)


59-59: Avoid specifying long messages outside the exception class

(TRY003)


64-64: Avoid specifying long messages outside the exception class

(TRY003)


81-81: Avoid specifying long messages outside the exception class

(TRY003)

radis/subscriptions/processors.py

94-94: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


97-97: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


100-100: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


135-135: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


138-140: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


143-143: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

radis/subscriptions/views.py

111-111: Use raise without specifying exception name

Remove exception name

(TRY201)


164-164: Use raise without specifying exception name

Remove exception name

(TRY201)


197-197: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


346-346: Unused method argument: request

(ARG002)


346-346: Unused method argument: args

(ARG002)


346-346: Unused method argument: kwargs

(ARG002)

radis/subscriptions/migrations/0011_rename_answers_subscribeditem_filter_results_and_more.py

9-11: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


13-56: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

radis/extractions/views.py

86-86: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


237-237: Avoid specifying long messages outside the exception class

(TRY003)


242-242: Unused method argument: kwargs

(ARG002)


533-533: Use explicit conversion flag

Replace with conversion flag

(RUF010)


648-648: Unused method argument: request

(ARG002)


648-648: Unused method argument: args

(ARG002)


648-648: Unused method argument: kwargs

(ARG002)

radis/extractions/models.py

103-121: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


139-139: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

radis/extractions/forms.py

50-52: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


102-105: Avoid specifying long messages outside the exception class

(TRY003)


108-108: Avoid specifying long messages outside the exception class

(TRY003)


188-194: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


268-268: Avoid specifying long messages outside the exception class

(TRY003)

Comment on lines +72 to +76
field = forms.ChoiceField(required=required)
field.choices = [(lang.code, LANGUAGE_LABELS[lang.code]) for lang in languages]
if empty_label is not None:
field.empty_label = empty_label # type: ignore
return field
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Fix incorrect attribute assignment on ChoiceField.

Line 75 attempts to set empty_label on a ChoiceField, but ChoiceField does not have an empty_label attribute—that's specific to ModelChoiceField. This will raise an AttributeError at runtime when use_pk=False and empty_label is not None.

To provide an empty option for ChoiceField, prepend an empty choice tuple to the choices list.

🐛 Proposed fix
     else:
         # Return ChoiceField - cleaned_data will contain code strings
         field = forms.ChoiceField(required=required)
         field.choices = [(lang.code, LANGUAGE_LABELS[lang.code]) for lang in languages]
         if empty_label is not None:
-            field.empty_label = empty_label  # type: ignore
+            field.choices = [("", empty_label)] + field.choices
         return field
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
field = forms.ChoiceField(required=required)
field.choices = [(lang.code, LANGUAGE_LABELS[lang.code]) for lang in languages]
if empty_label is not None:
field.empty_label = empty_label # type: ignore
return field
field = forms.ChoiceField(required=required)
field.choices = [(lang.code, LANGUAGE_LABELS[lang.code]) for lang in languages]
if empty_label is not None:
field.choices = [("", empty_label)] + field.choices
return field
🤖 Prompt for AI Agents
In @radis/core/form_fields.py around lines 72 - 76, The code incorrectly assigns
empty_label to a forms.ChoiceField (variable field) which will raise
AttributeError; instead, when empty_label is not None prepend an empty choice
tuple to the choices list: build choices = [('', empty_label)] + [(lang.code,
LANGUAGE_LABELS[lang.code]) for lang in languages] (or insert at index 0) and
then assign field.choices = choices; keep using forms.ChoiceField, languages,
LANGUAGE_LABELS and empty_label names as shown.

hx-include="[name='1-query'], [name='1-language'], [name='1-modalities'], [name='1-study_date_from'], [name='1-study_date_till'], [name='1-study_description'], [name='1-patient_sex'], [name='1-age_from'], [name='1-age_till']"
hx-target="this"
hx-swap="innerHTML"
x-on:htmx:after-request="isLoading = false">
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for templates that include this fragment
echo "=== Templates including _search_preview_form_section.html ==="
rg -n "{% include.*_search_preview_form_section" --type=html

echo -e "\n=== Alpine.js x-data patterns with isLoading ==="
rg -nP "x-data.*isLoading" radis/extractions/templates/

echo -e "\n=== Checking Alpine.js in extractions_layout.html ==="
rg -n "alpine" radis/extractions/templates/extractions/extractions_layout.html

echo -e "\n=== Checking for Alpine.js script inclusion in extractions templates ==="
rg -n "alpine" radis/extractions/templates/ -i

echo -e "\n=== Viewing the actual file for context ==="
cat -n radis/extractions/templates/extractions/_search_preview_form_section.html | head -20

Repository: openradx/radis

Length of output: 1409


🏁 Script executed:

#!/bin/bash
echo "=== Checking extraction_job_search_form.html for inclusion ==="
cat -n radis/extractions/templates/extractions/extraction_job_search_form.html | head -40

echo -e "\n=== Searching for _search_preview_form_section inclusion in all templates ==="
rg -n "_search_preview_form_section" radis/extractions/templates/

echo -e "\n=== Searching for Alpine.js script tag across all extractions templates ==="
rg -n "script.*src" radis/extractions/templates/

echo -e "\n=== Checking base template or layouts for Alpine.js ==="
find radis/extractions/templates -name "*base*" -o -name "*layout*" | head -10

echo -e "\n=== Checking for Alpine.js in main templates directory ==="
rg -n "alpine" radis/templates/ -i 2>/dev/null | head -20

Repository: openradx/radis

Length of output: 2077


🏁 Script executed:

#!/bin/bash
echo "=== Full content of extractions_layout.html ==="
cat -n radis/extractions/templates/extractions/extractions_layout.html

echo -e "\n=== Searching for Alpine.js CDN or import across entire codebase ==="
rg -rn "alpine" radis/ -i --type=html --type=js | grep -i "import\|cdn\|script" | head -20

echo -e "\n=== Checking if _search_preview_form_section is returned as AJAX response ==="
rg -rn "search_preview_form_section" radis/ --type=py

echo -e "\n=== Searching for the extraction_search_preview view ==="
rg -n "extraction_search_preview" radis/extractions/ --type=py

Repository: openradx/radis

Length of output: 1127


🏁 Script executed:

#!/bin/bash
echo "=== Checking core_layout.html for Alpine.js ==="
cat -n radis/templates/core/core_layout.html | grep -i alpine

echo -e "\n=== Searching for Alpine.js in base template ==="
rg -n "alpine" radis/templates/ -i

echo -e "\n=== Finding extraction_search_preview view in urls.py ==="
cat -n radis/extractions/urls.py | grep -A 3 "extraction_search_preview"

echo -e "\n=== Finding the view implementation ==="
rg -n "def extraction_search_preview" radis/extractions/

echo -e "\n=== Checking the view file for template rendering ==="
rg -B5 -A10 "extraction_search_preview" radis/extractions/views.py | head -40

echo -e "\n=== Verifying if _search_preview_form_section is actually included ==="
rg -rn "_search_preview_form_section" radis/

Repository: openradx/radis

Length of output: 740


🏁 Script executed:

#!/bin/bash
echo "=== Finding actual templates directory ==="
find radis -type d -name templates 2>/dev/null

echo -e "\n=== Checking if _search_preview_form_section.html file exists ==="
find radis -name "_search_preview_form_section.html" 2>/dev/null

echo -e "\n=== Listing extractions templates directory ==="
ls -la radis/extractions/templates/extractions/ 2>/dev/null | head -20

echo -e "\n=== Finding views.py in extractions ==="
find radis/extractions -name "views.py" -o -name "views" -type d

echo -e "\n=== Searching for extraction_search_preview in Python files ==="
rg -rn "extraction_search_preview" radis/ --type=py

echo -e "\n=== Checking urls.py around line 48 for view reference ==="
sed -n '45,55p' radis/extractions/urls.py

Repository: openradx/radis

Length of output: 2458


🏁 Script executed:

#!/bin/bash
echo "=== Finding ExtractionSearchPreviewView in views.py ==="
rg -n "class ExtractionSearchPreviewView" radis/extractions/views.py -A 15

echo -e "\n=== Checking _search_preview.html to see if it includes the fragment ==="
cat -n radis/extractions/templates/extractions/_search_preview.html

echo -e "\n=== Checking core templates for Alpine.js ==="
cat -n radis/core/templates/core/core_layout.html | head -30

echo -e "\n=== Checking if Alpine.js is included in base.html or similar ==="
find radis -name "base.html" -type f -exec cat -n {} \; | head -50

Repository: openradx/radis

Length of output: 3133


🏁 Script executed:

#!/bin/bash
echo "=== Checking common_layout.html for Alpine.js ==="
cat -n radis/templates/common/common_layout.html | head -40

echo -e "\n=== Getting complete ExtractionSearchPreviewView implementation ==="
rg -n "class ExtractionSearchPreviewView" radis/extractions/views.py -A 50 | head -60

echo -e "\n=== Verifying where _search_preview_form_section.html is actually used ==="
rg -rn "_search_preview_form_section" radis/ --type=html --type=py

echo -e "\n=== Checking if Alpine.js is loaded from CDN anywhere ==="
rg -rn "cdn\|unpkg\|jsdelivr" radis/ -i --type=html

Repository: openradx/radis

Length of output: 2724


🏁 Script executed:

#!/bin/bash
echo "=== Getting complete ExtractionSearchPreviewView - looking for template_name or render ==="
sed -n '284,350p' radis/extractions/views.py

echo -e "\n=== Searching for template_name in ExtractionSearchPreviewView ==="
rg -n "template_name" radis/extractions/views.py

echo -e "\n=== Finding root base template ==="
find radis -name "*.html" -path "*/templates/*" -exec grep -l "<!DOCTYPE\|<html" {} \; | head -5

echo -e "\n=== Checking if any HTML file loads Alpine.js ==="
rg -rn "alpine\.js\|@alpinejs\|x-data" radis/templates radis/extractions/templates --type=html | grep -v "x-data\|x-on" | head -20

echo -e "\n=== Viewing extraction_job_search_form.html to understand context ==="
cat -n radis/extractions/templates/extractions/extraction_job_search_form.html

Repository: openradx/radis

Length of output: 5122


🏁 Script executed:

#!/bin/bash
echo "=== Checking git history of the fragment ==="
cd radis && git log --oneline radis/extractions/templates/extractions/_search_preview_form_section.html 2>/dev/null | head -5

echo -e "\n=== Verifying extraction_job_search_form.html is actually used ==="
rg -rn "extraction_job_search_form" radis/ --type=py

echo -e "\n=== Check if _search_preview_form_section should be included in _search_preview.html ==="
grep -n "search_preview_form_section\|search-preview" radis/extractions/templates/extractions/_search_preview.html

echo -e "\n=== Check the HTMX target in extraction_job_search_form.html ==="
grep -n "id.*search-preview\|hx-target.*search-preview" radis/extractions/templates/extractions/extraction_job_search_form.html

Repository: openradx/radis

Length of output: 589


🏁 Script executed:

#!/bin/bash
echo "=== Current working directory ==="
pwd

echo -e "\n=== Verifying file exists ==="
ls -la radis/extractions/templates/extractions/_search_preview_form_section.html

echo -e "\n=== Checking extraction_job_search_form.html for search-preview references ==="
cat radis/extractions/templates/extractions/extraction_job_search_form.html | grep -n "search-preview"

echo -e "\n=== Checking if there's embedded search-preview HTML ==="
cat radis/extractions/templates/extractions/extraction_job_search_form.html

echo -e "\n=== Looking for where _search_preview_form_section might be included dynamically ==="
rg -n "search.preview\|search-preview\|_search_preview" radis/extractions/static/ --type=js 2>/dev/null | head -20

Repository: openradx/radis

Length of output: 1704


Alpine.js library is missing from the application. The template uses Alpine.js directives (including x-on:htmx:after-request="isLoading = false" in this fragment), but the Alpine.js library is not loaded anywhere in the template hierarchy. This will cause runtime errors when these directives are evaluated.

Additionally, this fragment appears to be unused—it is not included in any template. The actual search preview response renders _search_preview.html instead. Either integrate this fragment into the search preview flow or remove it as dead code.

Also note that extraction_job_search_form.html uses Alpine.js directives (x-data and @input.debounce) and will have the same Alpine.js loading issue.

"""

group: int # TODO: Rename to group_id
group: int | None = None # TODO: Rename to group_id
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the SearchFilters class definition
cat -n radis/search/site.py | head -60

Repository: openradx/radis

Length of output: 2560


🏁 Script executed:

# Search for how group field is used in filtering logic
rg -n "\.group" radis/search/ --type=py -B2 -A2

Repository: openradx/radis

Length of output: 390


🏁 Script executed:

# Check for None handling of group in the codebase
rg -n "group.*None\|None.*group" radis/search/ --type=py

Repository: openradx/radis

Length of output: 40


🏁 Script executed:

# Look for SearchFilters instantiation and usage
rg -n "SearchFilters" radis/search/ --type=py | head -20

Repository: openradx/radis

Length of output: 299


🏁 Script executed:

# Look at views.py where SearchFilters is instantiated
sed -n '60,80p' radis/search/views.py

Repository: openradx/radis

Length of output: 824


🏁 Script executed:

# Find the search_provider and how it uses group
rg -n "search_provider|def search" radis/search/site.py | head -20

Repository: openradx/radis

Length of output: 239


🏁 Script executed:

# Check how filters with group are actually used
rg -n "filters\.group\|if.*group\|where.*group" radis/search/ --type=py -B2 -A2

Repository: openradx/radis

Length of output: 40


🏁 Script executed:

# Look for search provider implementations
fd "search.*provider" radis/ --type f

Repository: openradx/radis

Length of output: 40


🏁 Script executed:

# Look for SearchProvider class definition
rg -n "class SearchProvider" radis/ --type=py -A10

Repository: openradx/radis

Length of output: 681


🏁 Script executed:

# Look for Search class implementation
rg -n "class Search" radis/search/site.py -A20

Repository: openradx/radis

Length of output: 2586


🏁 Script executed:

# Check if there are other places where SearchFilters is instantiated
rg -n "SearchFilters(" radis/ --type=py -B2 -A5

Repository: openradx/radis

Length of output: 136


🏁 Script executed:

# Look for where group filter might be used in queries
rg -n "group" radis/search/ --type=py | grep -v "test\|active_group\|GroupFactory\|\.add(group)"

Repository: openradx/radis

Length of output: 567


🏁 Script executed:

# Look for search provider implementations (not just the abstract definition)
find radis -name "*.py" -type f | xargs grep -l "SearchProvider\|def search" | grep -v test | grep -v __pycache__

Repository: openradx/radis

Length of output: 129


🏁 Script executed:

# Check reports app for search implementation
fd "search" radis/reports/ --type f

Repository: openradx/radis

Length of output: 40


🏁 Script executed:

# Look for Elasticsearch or database query building with group filter
rg -n "group" radis/reports/ --type=py -A2 -B2 | head -40

Repository: openradx/radis

Length of output: 2610


🏁 Script executed:

# Check if there's a search implementation that uses filters
rg -n "filters\." radis/ --type=py | grep -v "test\|SearchFilters" | head -20

Repository: openradx/radis

Length of output: 1702


🏁 Script executed:

# Examine the pgsearch provider implementation
cat -n radis/pgsearch/providers.py

Repository: openradx/radis

Length of output: 7150


Add group filtering to _build_filter_query function before making group optional.

The group filter is not applied in radis/pgsearch/providers.py (see _build_filter_query, lines 51-78). While SearchFilters.group is passed from views, it's never used in the actual search query. Making this field optional with a default of None masks a critical permission control gap: users could access reports from groups they don't belong to.

Add a check like:

if filters.group is not None:
    fq &= Q(report__groups__pk=filters.group)
🤖 Prompt for AI Agents
In @radis/search/site.py at line 48, The _build_filter_query function currently
ignores the SearchFilters.group field, so add a conditional filter to apply
group-based access control: inside _build_filter_query (used by
radis/pgsearch/providers.py) check if filters.group is not None and, if so, AND
a clause filtering reports by group primary key (e.g.,
Q(report__groups__pk=filters.group)); ensure you reference the
SearchFilters.group property and combine with the existing fq variable so
queries respect group membership when group is provided.

Comment on lines +81 to +84
logger.debug(
f"LLM returned None for question {question.pk} ",
f"on report {report.pk}",
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Bug: Logger arguments not concatenated.

The logger.debug() call passes two separate f-strings as arguments. Only the first string will be logged; the second is silently ignored. This will produce incomplete debug messages.

🐛 Proposed fix
                         if answer is None:
                             logger.debug(
-                                f"LLM returned None for question {question.pk} ",
-                                f"on report {report.pk}",
+                                f"LLM returned None for question {question.pk} on report {report.pk}",
                             )
                             is_accepted = False
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
logger.debug(
f"LLM returned None for question {question.pk} ",
f"on report {report.pk}",
)
logger.debug(
f"LLM returned None for question {question.pk} on report {report.pk}",
)
🤖 Prompt for AI Agents
In @radis/subscriptions/processors.py around lines 81 - 84, The logger.debug
call is passing two separate f-strings so only the first is logged; update the
logger.debug invocation (the call that currently uses logger.debug(f"LLM
returned None for question {question.pk} ", f"on report {report.pk}",)) to pass
a single formatted message — e.g., combine the parts into one f-string or format
string that includes both question.pk and report.pk — so the full message is
emitted.

Comment on lines +102 to +116
<h4>Filter Questions</h4>
{% with questions=subscription.filter_questions.all %}
{% if questions %}
<ul>
{% for question in questions %}
<li>
{{ question.question }} — accepts reports when answer is
{{ question.get_expected_answer_display|lower }}
</li>
{% endfor %}
</ul>
{% else %}
<p class="text-muted mb-0">No filter questions defined.</p>
{% endif %}
{% endwith %}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for explicit auto-escape directives or XSS-related changes in templates
rg -n -C3 'autoescape|escape|safe' --type=html radis/subscriptions/templates/

Repository: openradx/radis

Length of output: 1968


🏁 Script executed:

#!/bin/bash
# Find the question model definition to understand the question field
find . -type f -name "*.py" -path "*/subscriptions/*" | xargs rg -l "class.*Question" | head -5

Repository: openradx/radis

Length of output: 155


🏁 Script executed:

#!/bin/bash
# Check the subscription model to understand filter_questions relationship
find . -type f -name "models.py" -path "*/subscriptions/*" | xargs cat

Repository: openradx/radis

Length of output: 6028


🏁 Script executed:

#!/bin/bash
# Get the full subscription_detail.html file to see autoescape directives
cat radis/subscriptions/templates/subscriptions/subscription_detail.html | head -150

Repository: openradx/radis

Length of output: 5691


🏁 Script executed:

#!/bin/bash
# Search specifically for autoescape directives in subscription_detail.html
rg -n 'autoescape|safe' radis/subscriptions/templates/subscriptions/subscription_detail.html

Repository: openradx/radis

Length of output: 40


🏁 Script executed:

#!/bin/bash
# Check if there's a |safe filter on question field anywhere
rg -n 'question.*safe|safe.*question' radis/subscriptions/templates/subscriptions/

Repository: openradx/radis

Length of output: 40


Add explicit auto-escape directive for user-generated content in Filter Questions section.

Lines 108-109 render question.question (user-generated text field) without explicit auto-escape protection. While Django auto-escapes by default, other templates in this app (e.g., _subscribed_item_preview.html) use explicit {% autoescape on %} directives for similar user-generated content. For consistency and defense-in-depth, wrap this section:

Suggested fix
<h4>Filter Questions</h4>
{% autoescape on %}
  {% with questions=subscription.filter_questions.all %}
      {% if questions %}
          <ul>
              {% for question in questions %}
                  <li>
                      {{ question.question }} — accepts reports when answer is
                      {{ question.get_expected_answer_display|lower }}
                  </li>
              {% endfor %}
          </ul>
      {% else %}
          <p class="text-muted mb-0">No filter questions defined.</p>
      {% endif %}
  {% endwith %}
{% endautoescape %}
🤖 Prompt for AI Agents
In @radis/subscriptions/templates/subscriptions/subscription_detail.html around
lines 102 - 116, Wrap the Filter Questions block (the section that iterates
subscription.filter_questions.all and renders question.question and
question.get_expected_answer_display) with an explicit auto-escape directive:
enable autoescaping before the {% with
questions=subscription.filter_questions.all %} block and disable it after the
loop (use {% autoescape on %} ... {% endautoescape %>) so user-generated
question.question is explicitly escaped consistent with other templates.

Comment on lines +545 to +548
assert len(items) == 1 # Should match "12345"
assert item1 in items
assert item2 and item3 not in items

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Incorrect assertion logic - only checks one item.

The expression item2 and item3 not in items evaluates as (item2) and (item3 not in items), which only verifies item3 is not in items (assuming item2 is truthy). Both items should be checked separately.

🐛 Proposed fix
     assert len(items) == 1  # Should match "12345"
     assert item1 in items
-    assert item2 and item3 not in items
+    assert item2 not in items
+    assert item3 not in items
🤖 Prompt for AI Agents
In @radis/subscriptions/tests/test_views.py around lines 545 - 548, The
assertion uses `item2 and item3 not in items`, which evaluates incorrectly and
only guarantees `item3` is absent; change to assert both separately by replacing
that expression with explicit checks like `assert item2 not in items` and
`assert item3 not in items` so both `item2` and `item3` are individually
verified not to be in `items` (keep `items`, `item1`, `item2`, `item3`
references intact).

Comment on lines +578 to +581
assert len(items) == 1
assert item2 in items
assert item1 and item3 not in items

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Same incorrect assertion pattern.

Same issue as above - this only checks item3 not in items.

🐛 Proposed fix
     assert len(items) == 1
     assert item2 in items
-    assert item1 and item3 not in items
+    assert item1 not in items
+    assert item3 not in items
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
assert len(items) == 1
assert item2 in items
assert item1 and item3 not in items
assert len(items) == 1
assert item2 in items
assert item1 not in items
assert item3 not in items
🤖 Prompt for AI Agents
In @radis/subscriptions/tests/test_views.py around lines 578 - 581, The
assertion "assert item1 and item3 not in items" is logically incorrect and only
checks item3; update the assertion to explicitly assert both are absent, e.g.
replace it with "assert item1 not in items and item3 not in items" or two
separate asserts "assert item1 not in items" and "assert item3 not in items"
referencing the existing variables items, item1, and item3 in the test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants