-
-
Notifications
You must be signed in to change notification settings - Fork 311
Bug/sc 40665/ voices search results display user id instead #3063
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Bug/sc 40665/ voices search results display user id instead #3063
Conversation
When the Django database is unreachable, UserProfile silently falls back
to "User {id}" names, which corrupts Elasticsearch indexes during reindexing.
Changes:
- Add logging to UserProfile.__init__ exception handler
- Add _django_user_found flag and has_django_user property
- Update public_user_data() to return None when Django User not found
- Fix: is_staff is a property, not a method (user.is_staff not user.is_staff())
This enables detection of broken database connections and prevents indexing
sheets with fallback names like "User 93082".
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add a pre-flight check that verifies Django database connectivity before
starting Elasticsearch reindexing. This prevents indexing sheets with
"User {id}" names when the database is unreachable.
The check:
- Verifies DATABASES_USER, DATABASES_PASSWORD, DATABASES_HOST, DATABASES_PORT
environment variables are set
- Tests actual database connectivity with a query
- Provides clear error messages pointing to local-settings-secret K8s secret
- Aborts the cronjob early if the check fails
This addresses the root cause of search results showing "User 93082" instead
of actual author names when the indexing cronjob runs with broken DB credentials.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Log owner_id, owner_name, and title when indexing sheets in debug mode. This helps verify the correct data is being indexed to Elasticsearch. Run with --debug flag to see the indexed data: python scripts/scheduled/reindex_elasticsearch_cronjob.py --debug Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
… reindexing scripts - Removed excessive debug logging statements to clean up the code. - Enhanced error handling in various functions to ensure smoother execution. - Adjusted the way failed and skipped text versions are reported for better clarity.
…job configuration
- Eliminated logging for failed Django User lookups to streamline the code. - Removed the _django_user_found flag and has_django_user property, simplifying user profile handling. - Updated public_user_data function to focus on returning user data without signaling Django User presence. - Adjusted error handling to avoid unnecessary complexity during user profile initialization.
…nto bug/sc-40665/-voices-search-results-display-user-id-instead
…nto bug/sc-40665/-voices-search-results-display-user-id-instead
| f"name: {first_user.first_name} {first_user.last_name}") | ||
|
|
||
| return True | ||
| except Exception as e: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if instead of catching a general Exception, we should try...catch User.objects.count separately from User.objects.first.
| lines.append(f"\nFailed Text Versions: {len(self.failed_text_versions)}") | ||
| lines.append("-" * 40) | ||
| for i, failure in enumerate(self.failed_text_versions[:50], 1): | ||
| for i, failure in enumerate(self.failed_text_versions, 1): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this change or what the original code intended. Why were we stopping at 50 failed_text_versions before and why are we no longer doing this? Same question about only looking at 20 skipped_text_versions below.
| except NotFoundError: | ||
| # Index doesn't exist - handle race condition where index is deleted between exists() check and delete() call | ||
| logger.debug(f"Index not found when attempting to delete - index_name: {index_name}") | ||
| pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious why we are removing so many logger statements. Couldn't they be useful? Especially, here... it seems bad to have a "pass" in an exception
Description
Add Django database health check to Elasticsearch reindexing cronjob and clean up excessive debug logging.
Code Changes
scripts/scheduled/reindex_elasticsearch_cronjob.py:check_django_database_connection()preflight check to prevent sheet indexing with broken user lookups (would produce "User {id}" instead of real names)sefaria/search.py:clear_index(),index_all_of_type(), andindex_all_of_type_by_index_name()Notes
The Django database check is critical for Kubernetes deployments - if
local-settings-secretis misconfigured, sheet indexing silently produces garbage data. The cronjob now fails fast with a clear error message instead.