Contributor

@yonadavGit yonadavGit commented Jan 26, 2026

This pull request introduces several improvements and new features to the library links disambiguation pipeline, focusing on enhanced handling and tracking of ambiguous and non-segment-level resolutions, improved failure logging, and expanded integration testing. The changes add more robust metadata tracking, error handling, and developer tools for debugging and quality assurance.

Enhancements to Disambiguation Resolution Tracking:

  • Added new metadata fields (such as llm_resolved_ref_ambiguous, llm_resolved_method_ambiguous, llm_resolved_phrase_ambiguous, and llm_ambiguous_option_valid) to linker output spans to better track the details and validity of ambiguous and non-segment-level resolutions. This includes updating these fields in both ambiguous and non-segment resolution flows and ensuring persistence in the database. [1] [2] [3] [4]

  • Improved logic for handling ambiguous spans: now, ambiguous spans with a valid LLM option are processed, and if a matched segment is found, additional links and MUTC spans are created or updated accordingly. [1] [2] [3] [4]
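As a rough illustration of the metadata tracking described above, the sketch below annotates a span with the four field names listed in the PR description. The field types, the helper name `apply_ambiguous_resolution`, and the rule that an option is "valid" when the LLM's choice is one of the offered candidates are all assumptions made for this example, not details confirmed by the PR.

```python
from typing import List, Optional


def apply_ambiguous_resolution(
    span: dict,
    candidate_refs: List[str],
    resolved_ref: Optional[str],
    method: Optional[str] = None,
    phrase: Optional[str] = None,
) -> dict:
    """Return a copy of `span` annotated with the ambiguous-resolution fields.

    Hypothetical helper: the real pipeline may store these values differently.
    """
    updated = dict(span)  # leave the caller's span untouched
    updated["llm_resolved_ref_ambiguous"] = resolved_ref
    updated["llm_resolved_method_ambiguous"] = method
    updated["llm_resolved_phrase_ambiguous"] = phrase
    # Assumption: an option is valid only if it is one of the offered candidates.
    updated["llm_ambiguous_option_valid"] = resolved_ref in candidate_refs
    return updated


def should_process(span: dict) -> bool:
    """Only spans with a valid LLM option continue to link creation."""
    return bool(span.get("llm_ambiguous_option_valid"))
```

Under these assumptions, a span whose LLM choice is not among the candidates is still annotated, but `should_process` returns `False` for it, so no links would be created from it.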

Failure Logging and Error Handling:

  • Introduced detailed failure logging for Dicta API errors, including a new _record_dicta_failure function and a standardized error payload structure to record non-200 responses from the Dicta API, improving traceability of external API issues. [1] [2]
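A standardized error payload of this kind might look like the sketch below. The payload keys, the truncation limit, and the helper names `build_dicta_failure_payload` and `record_failure` are hypothetical; the PR's actual `_record_dicta_failure` function and its storage layer are not shown here, so a plain list stands in for the database collection.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List


def build_dicta_failure_payload(
    status_code: int,
    url: str,
    request_payload: Dict[str, Any],
    response_text: str,
    max_body_chars: int = 500,
) -> Dict[str, Any]:
    """Build one uniform record for a non-200 response (hypothetical shape)."""
    return {
        "source": "dicta",
        "status_code": status_code,
        "url": url,
        "request_payload": request_payload,
        # Truncate the body so one large error page cannot bloat the log.
        "response_text": response_text[:max_body_chars],
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


def record_failure(sink: List[Dict[str, Any]], payload: Dict[str, Any]) -> None:
    """Append the payload to `sink`, a stand-in for a database collection."""
    sink.append(payload)
```

Keeping every failure in the same shape means later analysis can group by `status_code` or `url` without special-casing individual call sites.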

Developer Tools and Usability Improvements:

  • Enhanced the dispatch_library_links_disambiguation_tasks.py script with command-line arguments for skipping ambiguous or non-segment resolutions, or for starting each at a specific index, making it easier to debug or resume large dispatch jobs. Debug mode now limits a run to 10 examples by default for quicker testing. [1] [2] [3] [4]
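The skip and start-index options could be wired up roughly as follows. This is a sketch only: the flag names (`--skip-ambiguous`, `--ambiguous-start-index`, and so on) and the helper `select_items` are hypothetical, and the choice to apply the 10-item debug limit after the start index is an assumption.

```python
import argparse
from typing import List, Sequence

DEBUG_LIMIT = 10  # debug mode processes at most this many examples


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with hypothetical flag names for the dispatch script."""
    parser = argparse.ArgumentParser(description="Dispatch disambiguation tasks")
    parser.add_argument("--skip-ambiguous", action="store_true")
    parser.add_argument("--ambiguous-start-index", type=int, default=0)
    parser.add_argument("--skip-non-segment", action="store_true")
    parser.add_argument("--non-segment-start-index", type=int, default=0)
    parser.add_argument("--debug", action="store_true")
    return parser


def select_items(items: Sequence, start_index: int, skip: bool, debug: bool) -> List:
    """Pick the items to dispatch for one resolution type."""
    if skip:
        return []
    selected = list(items[start_index:])
    # Assumption: the debug limit applies after the start index.
    return selected[:DEBUG_LIMIT] if debug else selected
```

Resuming a job that stopped partway through would then amount to passing the index of the first undispatched item, for example `--ambiguous-start-index 5000 --skip-non-segment`.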

Testing Improvements:

  • Added a new integration test file ambiguous_disambiguator_test.py that parametrizes ambiguous disambiguation cases and skips tests gracefully if required API keys are missing, supporting better test coverage and reliability for ambiguous reference resolution.
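The pattern of running a table of cases and skipping gracefully when credentials are absent can be sketched with the standard library's `unittest`. Note that this is a substitute for illustration: the PR's test file may well use a different framework, and the environment variable name, the sample case, and the `fake_disambiguate` stand-in are all hypothetical.

```python
import os
import unittest

# Hypothetical variable name; the real test may require different keys.
REQUIRED_ENV_VARS = ("EXAMPLE_LLM_API_KEY",)

# Each case: (case id, candidate refs, expected resolution). Illustrative data only.
CASES = [
    ("first-candidate", ["Ref A", "Ref B"], "Ref A"),
]


def missing_keys(env=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


def fake_disambiguate(candidates):
    """Stand-in for the real LLM-backed disambiguator."""
    return candidates[0]


class AmbiguousDisambiguatorTest(unittest.TestCase):
    def setUp(self):
        missing = missing_keys()
        if missing:
            # Skip rather than fail when credentials are absent.
            self.skipTest("missing API keys: " + ", ".join(missing))

    def test_cases(self):
        for case_id, candidates, expected in CASES:
            with self.subTest(case=case_id):
                self.assertEqual(fake_disambiguate(candidates), expected)
```

Skipping rather than failing keeps CI green on machines without credentials while still reporting that the integration cases were not exercised.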

Minor Logging and Output Improvements:

  • Improved developer-facing print statements to display resolved phrases when available, aiding in debugging and analysis of LLM-driven resolutions. [1] [2]

@yonadavGit yonadavGit requested a review from nsantacruz January 26, 2026 14:30

mergify bot commented Jan 26, 2026

🧪 CI Insights

Here's what we observed from your CI run for 9795f99.

🟢 All jobs passed!

But CI Insights is watching 👀

… responses

- Introduce DictaAPIError for non-200 Dicta API responses
- Add error handling in disambiguation functions to raise and propagate DictaAPIError
- Implement recording of Dicta API failures to a dedicated collection in tasks.py
- Log relevant request and payload details for failed Dicta API calls
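The raise-and-propagate flow in this commit message might be sketched as below. Only the exception name `DictaAPIError` comes from the commit; its constructor arguments, the helpers `ensure_ok` and `run_with_failure_recording`, and the use of a list in place of the dedicated collection are assumptions for illustration.

```python
from typing import Any, Callable, Dict, List


class DictaAPIError(Exception):
    """Raised for a non-200 Dicta API response (attributes are hypothetical)."""

    def __init__(self, status_code: int, url: str, payload: Dict[str, Any],
                 response_text: str = "") -> None:
        super().__init__(f"Dicta API returned HTTP {status_code} for {url}")
        self.status_code = status_code
        self.url = url
        self.payload = payload
        self.response_text = response_text


def ensure_ok(status_code: int, url: str, payload: Dict[str, Any],
              response_text: str = "") -> None:
    """Raise DictaAPIError unless the response status is 200."""
    if status_code != 200:
        raise DictaAPIError(status_code, url, payload, response_text)


def run_with_failure_recording(call: Callable[[], Any],
                               failures: List[Dict[str, Any]]) -> Any:
    """Run `call`; on DictaAPIError, record request details and re-raise."""
    try:
        return call()
    except DictaAPIError as err:
        failures.append({
            "status_code": err.status_code,
            "url": err.url,
            "request_payload": err.payload,
        })
        raise  # propagate so the caller still sees the failure
```

Re-raising after recording keeps the failure visible to the task runner while ensuring the request details are captured first.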
…Result and implement phrase extraction function
@nsantacruz nsantacruz marked this pull request as ready for review February 9, 2026 10:00
@yonadavGit yonadavGit changed the title from "dummy push" to "Improvements to Disambiguator Logic, Additional LinkerOutput Fields, Tests" Feb 10, 2026