Skip to content

Conversation

@yodem
Copy link
Collaborator

@yodem yodem commented Jan 29, 2026

Description

Add Django database health check to Elasticsearch reindexing cronjob and clean up excessive debug logging.

Code Changes

scripts/scheduled/reindex_elasticsearch_cronjob.py:

  • Added check_django_database_connection() preflight check to prevent sheet indexing with broken user lookups (would produce "User {id}" instead of real names)
  • Removed truncation limits on failure/skip reporting (was capping at 50/20 items)
  • Removed excessive debug logging throughout the script

sefaria/search.py:

  • Removed excessive debug logging (countdown messages, index state checks, progress logs)
  • Cleaned up redundant log statements in clear_index(), index_all_of_type(), and index_all_of_type_by_index_name()

Notes

The Django database check is critical for Kubernetes deployments - if local-settings-secret is misconfigured, sheet indexing silently produces garbage data. The cronjob now fails fast with a clear error message instead.

dcschreiber and others added 11 commits January 19, 2026 14:19
When the Django database is unreachable, UserProfile silently falls back
to "User {id}" names, which corrupts Elasticsearch indexes during reindexing.

Changes:
- Add logging to UserProfile.__init__ exception handler
- Add _django_user_found flag and has_django_user property
- Update public_user_data() to return None when Django User not found
- Fix: is_staff is a property, not a method (user.is_staff not user.is_staff())

This enables detection of broken database connections and prevents indexing
sheets with fallback names like "User 93082".

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add a pre-flight check that verifies Django database connectivity before
starting Elasticsearch reindexing. This prevents indexing sheets with
"User {id}" names when the database is unreachable.

The check:
- Verifies DATABASES_USER, DATABASES_PASSWORD, DATABASES_HOST, DATABASES_PORT
  environment variables are set
- Tests actual database connectivity with a query
- Provides clear error messages pointing to local-settings-secret K8s secret
- Aborts the cronjob early if the check fails

This addresses the root cause of search results showing "User 93082" instead
of actual author names when the indexing cronjob runs with broken DB credentials.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Log owner_id, owner_name, and title when indexing sheets in debug mode.
This helps verify the correct data is being indexed to Elasticsearch.

Run with --debug flag to see the indexed data:
  python scripts/scheduled/reindex_elasticsearch_cronjob.py --debug

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
… reindexing scripts

- Removed excessive debug logging statements to clean up the code.
- Enhanced error handling in various functions to ensure smoother execution.
- Adjusted the way failed and skipped text versions are reported for better clarity.
- Eliminated logging for failed Django User lookups to streamline the code.
- Removed the _django_user_found flag and has_django_user property, simplifying user profile handling.
- Updated public_user_data function to focus on returning user data without signaling Django User presence.
- Adjusted error handling to avoid unnecessary complexity during user profile initialization.
…nto bug/sc-40665/-voices-search-results-display-user-id-instead
@yodem yodem requested a review from stevekaplan123 January 29, 2026 10:13
@mergify
Copy link

mergify bot commented Jan 29, 2026

🧪 CI Insights

Here's what we observed from your CI run for 7ebdfd8.

❌ Job Failures

Pipeline Job Health on master Retries 🔍 CI Insights 📄 Logs
Continuous Continuous Testing: PyTest Broken 0 View View

@yodem yodem requested a review from YishaiGlasner February 1, 2026 08:40
…nto bug/sc-40665/-voices-search-results-display-user-id-instead
f"name: {first_user.first_name} {first_user.last_name}")

return True
except Exception as e:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if instead of catching a general Exception, we should try...catch User.objects.count separately from User.objects.first.

lines.append(f"\nFailed Text Versions: {len(self.failed_text_versions)}")
lines.append("-" * 40)
for i, failure in enumerate(self.failed_text_versions[:50], 1):
for i, failure in enumerate(self.failed_text_versions, 1):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this change or what the original code intended. Why were we stopping at 50 failed_text_versions before and why are we no longer doing this? Same question about only looking at 20 skipped_text_versions below.

@yodem yodem enabled auto-merge February 1, 2026 09:16
except NotFoundError:
# Index doesn't exist - handle race condition where index is deleted between exists() check and delete() call
logger.debug(f"Index not found when attempting to delete - index_name: {index_name}")
pass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious why we are removing so many logger statements. Couldn't they be useful? Especially, here... it seems bad to have a "pass" in an exception

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants