Improvements to Disambiguator Logic, Additional LinkerOutput Fields, Tests #3049
Open: yonadavGit wants to merge 35 commits into master from chore/sc-40225/create-task-for-running-disambiguator
+866 −106
🧪 CI Insights: all jobs passed for the CI run on 9795f99 🟢
- … responses
  - Introduce `DictaAPIError` for non-200 Dicta API responses
  - Add error handling in disambiguation functions to raise and propagate `DictaAPIError`
  - Implement recording of Dicta API failures to a dedicated collection in `tasks.py`
  - Log relevant request and payload details for failed Dicta API calls
- …h with start-from offsets
- …gment disambiguation tasks
- … of matches and adjust slop parameter
- …n and enhance confirmation function
- …Result and implement phrase extraction function
- …hrase and update linker output fields
- …segment references
- …ion functionality
- … commentary ambiguity
- …task dispatch order
- …ambiguator (Conflicts: `sefaria/helper/marked_up_text_chunk_generator.py`)
- …unning-disambiguator' into chore/sc-40225/create-task-for-running-disambiguator
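The first commit above describes raising `DictaAPIError` on non-200 Dicta API responses, logging the request and payload, and recording the failure to a dedicated collection. A minimal Python sketch of that pattern follows. Only the `DictaAPIError` name comes from this PR; `call_dicta`, `save_failure_record`, `disambiguate_task`, the failure-document field names, and the injected `post` callable are hypothetical, and `InMemoryCollection` stands in for a MongoDB collection that exposes `insert_one`.

```python
import logging
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List, Tuple

logger = logging.getLogger(__name__)

# Hypothetical transport: takes (url, payload) and returns (status_code, body_text).
PostFn = Callable[[str, Dict[str, Any]], Tuple[int, str]]


class DictaAPIError(Exception):
    """Raised when the Dicta API answers with a non-200 status (sketch)."""

    def __init__(self, status_code: int, url: str, payload: Dict[str, Any], body: str):
        super().__init__(f"Dicta API returned {status_code} for {url}")
        self.status_code = status_code
        self.url = url
        self.payload = payload
        self.body = body


def call_dicta(post: PostFn, url: str, payload: Dict[str, Any]) -> str:
    """Send one request; log and raise DictaAPIError on any non-200 status."""
    status_code, body = post(url, payload)
    if status_code != 200:
        logger.error("Dicta API failure: url=%s status=%s payload=%r", url, status_code, payload)
        raise DictaAPIError(status_code, url, payload, body)
    return body


def save_failure_record(collection: Any, err: DictaAPIError, context: Dict[str, Any]) -> None:
    """Persist a standardized failure document (hypothetical field names)."""
    collection.insert_one({
        "status_code": err.status_code,
        "url": err.url,
        "payload": err.payload,
        "response_body": err.body[:1000],  # truncate large error pages
        "context": context,
        "created_at": datetime.now(timezone.utc),
    })


class InMemoryCollection:
    """Stand-in for a MongoDB collection; only insert_one is implemented."""

    def __init__(self) -> None:
        self.docs: List[Dict[str, Any]] = []

    def insert_one(self, doc: Dict[str, Any]) -> None:
        self.docs.append(doc)


def disambiguate_task(post: PostFn, url: str, payload: Dict[str, Any],
                      failures: Any, context: Dict[str, Any]) -> str:
    """Task wrapper: record the failure, then re-raise so the caller still sees it."""
    try:
        return call_dicta(post, url, payload)
    except DictaAPIError as err:
        save_failure_record(failures, err, context)
        raise
```

Re-raising after recording keeps the failure visible to the task runner while leaving a queryable trail in the failures collection.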
This pull request introduces several improvements and new features to the library links disambiguation pipeline, focusing on enhanced handling and tracking of ambiguous and non-segment-level resolutions, improved failure logging, and expanded integration testing. The changes add more robust metadata tracking, error handling, and developer tools for debugging and quality assurance.
Enhancements to Disambiguation Resolution Tracking:
- Added new metadata fields (such as `llm_resolved_ref_ambiguous`, `llm_resolved_method_ambiguous`, `llm_resolved_phrase_ambiguous`, and `llm_ambiguous_option_valid`) to linker output spans to better track the details and validity of ambiguous and non-segment-level resolutions. These fields are updated in both the ambiguous and the non-segment resolution flows and are persisted to the database. [1] [2] [3] [4]
- Improved handling of ambiguous spans: ambiguous spans with a valid LLM option are now processed, and if a matched segment is found, additional links and MUTC (marked-up text chunk) spans are created or updated accordingly. [1] [2] [3] [4]
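The ambiguous-span flow described above can be sketched roughly as follows. Only the four field names come from this PR; the `LinkerSpan` dataclass, the validity rule (the LLM's pick must be one of the span's candidate refs), the `find_segment` callback, and the link tuples are hypothetical simplifications. MUTC span updates and database persistence are omitted.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class LinkerSpan:
    """Simplified linker output span carrying the new ambiguity metadata."""
    text: str
    candidate_refs: List[str]
    llm_resolved_ref_ambiguous: Optional[str] = None
    llm_resolved_method_ambiguous: Optional[str] = None
    llm_resolved_phrase_ambiguous: Optional[str] = None
    llm_ambiguous_option_valid: Optional[bool] = None


def resolve_ambiguous_span(
    span: LinkerSpan,
    source_ref: str,
    llm_choice: str,
    method: str,
    phrase: Optional[str],
    find_segment: Callable[[str, Optional[str]], Optional[str]],
    links: List[Tuple[str, str]],
) -> Optional[str]:
    """Record the LLM's resolution on the span; link only a valid, matched segment."""
    # Always record what the LLM chose and how, so invalid picks can be audited later.
    span.llm_resolved_ref_ambiguous = llm_choice
    span.llm_resolved_method_ambiguous = method
    span.llm_resolved_phrase_ambiguous = phrase
    span.llm_ambiguous_option_valid = llm_choice in span.candidate_refs
    if not span.llm_ambiguous_option_valid:
        return None
    # Narrow the chosen ref to a segment (e.g. by searching for the quoted phrase).
    segment = find_segment(llm_choice, phrase)
    if segment is not None:
        links.append((source_ref, segment))
    return segment
```

Writing the metadata before checking validity means rejected LLM choices are still stored on the span, which is what makes the `llm_ambiguous_option_valid` flag useful for later review.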
Failure Logging and Error Handling:
- Added a `_record_dicta_failure` function and a standardized error payload structure to record non-200 responses from the Dicta API, improving traceability of external API issues. [1] [2]

Developer Tools and Usability Improvements:
- Updated the `dispatch_library_links_disambiguation_tasks.py` script with command-line arguments for skipping the ambiguous and non-segment resolutions or starting each at a specific index, making it easier to debug or resume large dispatch jobs. Debug mode now limits runs to 10 examples by default for quicker testing. [1] [2] [3] [4]

Testing Improvements:
- Added `ambiguous_disambiguator_test.py`, which parametrizes ambiguous disambiguation cases and skips gracefully when the required API keys are missing, improving test coverage and reliability for ambiguous reference resolution.

Minor Logging and Output Improvements:
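The command-line options for `dispatch_library_links_disambiguation_tasks.py`, described under Developer Tools and Usability Improvements, might be wired up along these lines with `argparse`. The flag names (`--skip-ambiguous`, `--skip-non-segment`, `--ambiguous-start`, `--non-segment-start`, `--debug`, `--debug-limit`) and the `plan_dispatch` helper are hypothetical; only the skip/start-index behavior and the default debug limit of 10 come from the PR description.

```python
import argparse
from typing import Dict, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical flag names; the real script's options may differ.
    parser = argparse.ArgumentParser(description="Dispatch library-links disambiguation tasks (sketch).")
    parser.add_argument("--skip-ambiguous", action="store_true",
                        help="Do not dispatch ambiguous-resolution tasks.")
    parser.add_argument("--skip-non-segment", action="store_true",
                        help="Do not dispatch non-segment-resolution tasks.")
    parser.add_argument("--ambiguous-start", type=int, default=0,
                        help="Index at which to resume ambiguous dispatch.")
    parser.add_argument("--non-segment-start", type=int, default=0,
                        help="Index at which to resume non-segment dispatch.")
    parser.add_argument("--debug", action="store_true",
                        help="Dispatch only a small sample of each kind.")
    parser.add_argument("--debug-limit", type=int, default=10,
                        help="Sample size per kind in debug mode (default: 10).")
    return parser


def select(items: Sequence[T], start: int, debug: bool, limit: int) -> List[T]:
    """Slice from the start offset, then cap the result in debug mode."""
    chosen = list(items[start:])
    return chosen[:limit] if debug else chosen


def plan_dispatch(argv: Optional[Sequence[str]], ambiguous: Sequence[T],
                  non_segment: Sequence[T]) -> Dict[str, List[T]]:
    """Return which items of each kind would be dispatched for the given arguments."""
    args = build_parser().parse_args(argv)
    plan: Dict[str, List[T]] = {}
    if not args.skip_ambiguous:
        plan["ambiguous"] = select(ambiguous, args.ambiguous_start, args.debug, args.debug_limit)
    if not args.skip_non_segment:
        plan["non_segment"] = select(non_segment, args.non_segment_start, args.debug, args.debug_limit)
    return plan
```

For example, `plan_dispatch(["--ambiguous-start", "1200", "--skip-non-segment"], ambiguous, non_segment)` would resume the ambiguous pass at index 1200 and skip the non-segment pass entirely. Applying the debug cap after the start offset means a debug run samples from the resume point rather than from the beginning.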