Improvements to Disambiguator Logic, Additional LinkerOutput Fields, Tests #3049

yonadavGit · 2026-01-26T14:30:12Z

This pull request introduces several improvements and new features to the library links disambiguation pipeline, focusing on enhanced handling and tracking of ambiguous and non-segment-level resolutions, improved failure logging, and expanded integration testing. The changes add more robust metadata tracking, error handling, and developer tools for debugging and quality assurance.

Enhancements to Disambiguation Resolution Tracking:

Added new metadata fields (llm_resolved_ref_ambiguous, llm_resolved_method_ambiguous, llm_resolved_phrase_ambiguous, and llm_ambiguous_option_valid) to linker output spans to track the details and validity of ambiguous and non-segment-level resolutions. These fields are updated in both the ambiguous and non-segment resolution flows and are persisted to the database.
Improved logic for handling ambiguous spans: ambiguous spans with a valid LLM option are now processed, and if a matched segment is found, additional links and MUTC spans are created or updated accordingly.
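To make the new span metadata concrete, here is a minimal sketch of how the four fields could be carried and merged onto a span. Only the field names come from this PR; the `AmbiguousResolutionMeta` class, the `apply_resolution` helper, the field types, and the dict-shaped span are assumptions made for illustration and are not the actual implementation.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class AmbiguousResolutionMeta:
    """Hypothetical container; only the field names are taken from the PR."""
    llm_resolved_ref_ambiguous: Optional[str] = None
    llm_resolved_method_ambiguous: Optional[str] = None
    llm_resolved_phrase_ambiguous: Optional[str] = None
    llm_ambiguous_option_valid: Optional[bool] = None


def apply_resolution(span: dict, meta: AmbiguousResolutionMeta) -> dict:
    """Return a copy of the span with every populated metadata field merged in."""
    updated = dict(span)  # leave the caller's span untouched
    updated.update({k: v for k, v in asdict(meta).items() if v is not None})
    return updated
```

Fields left as `None` are omitted from the merged span, so a span that never went through the ambiguous flow keeps its original shape.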

Failure Logging and Error Handling:

Introduced detailed failure logging for Dicta API errors, including a new _record_dicta_failure function and a standardized error payload structure that records non-200 responses from the Dicta API, improving traceability of external API issues.

Developer Tools and Usability Improvements:

Enhanced the dispatch_library_links_disambiguation_tasks.py script with command-line arguments for skipping, or starting at a specific index of, the ambiguous and non-segment resolutions, making it easier to debug or resume large dispatch jobs. Debug mode now limits the run to 10 examples by default for quicker testing.
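As a rough illustration of the skip and resume options, the sketch below uses `argparse`. The flag names, the `select_payloads` helper, and the choice to apply the debug limit after the start offset are all hypothetical; the PR states only that such arguments exist and that debug mode is capped at 10 examples.

```python
import argparse

DEBUG_LIMIT = 10  # debug cap stated in the PR description


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with illustrative flag names (not the script's real ones)."""
    parser = argparse.ArgumentParser(description="Dispatch disambiguation tasks")
    parser.add_argument("--skip-ambiguous", action="store_true")
    parser.add_argument("--ambiguous-start-from", type=int, default=0)
    parser.add_argument("--skip-non-segment", action="store_true")
    parser.add_argument("--non-segment-start-from", type=int, default=0)
    parser.add_argument("--debug", action="store_true")
    return parser


def select_payloads(payloads: list, skip: bool, start_from: int, debug: bool) -> list:
    """Drop everything if skipped, else resume at start_from and cap in debug mode."""
    if skip:
        return []
    remaining = payloads[start_from:]
    return remaining[:DEBUG_LIMIT] if debug else remaining
```

For example, parsing `["--ambiguous-start-from", "5", "--debug"]` and passing the result to `select_payloads` over 100 payloads would dispatch items 5 through 14.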

Testing Improvements:

Added a new integration test file ambiguous_disambiguator_test.py that parametrizes ambiguous disambiguation cases and skips tests gracefully if required API keys are missing, supporting better test coverage and reliability for ambiguous reference resolution.
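A parametrized test that skips when credentials are absent could be structured along these lines with pytest. The environment variable name, the placeholder case, and the `resolve_ambiguous` stand-in are assumptions; the real test file calls the actual disambiguator and uses its own cases.

```python
import os

import pytest

# Assumed variable name; the real test may check different or additional keys.
REQUIRED_ENV_VARS = ("ANTHROPIC_API_KEY",)


def missing_env_vars(env=None) -> list:
    """List the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


# Skip (rather than fail) every decorated test when a key is missing.
requires_api_keys = pytest.mark.skipif(
    bool(missing_env_vars()),
    reason="missing API keys: " + ", ".join(missing_env_vars()),
)

# Placeholder case; real cases would hold actual refs and expected resolutions.
AMBIGUOUS_CASES = [
    pytest.param({"options": ["<option A>", "<option B>"]}, "<option A>", id="placeholder"),
]


def resolve_ambiguous(payload: dict) -> str:
    """Stand-in for the real disambiguator call, which would hit external APIs."""
    return payload["options"][0]


@requires_api_keys
@pytest.mark.parametrize("payload, expected_ref", AMBIGUOUS_CASES)
def test_ambiguous_resolution(payload, expected_ref):
    assert resolve_ambiguous(payload) == expected_ref
```

Computing the skip condition at import time keeps collection cheap and reports the missing keys in the skip reason.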

Minor Logging and Output Improvements:

Improved developer-facing print statements to display resolved phrases when available, aiding in debugging and analysis of LLM-driven resolutions.

mergify · 2026-01-26T14:38:45Z

🧪 CI Insights

Here's what we observed from your CI run for 9795f99.

🟢 All jobs passed!

But CI Insights is watching 👀

… responses
- Introduce DictaAPIError for non-200 Dicta API responses
- Add error handling in disambiguation functions to raise and propagate DictaAPIError
- Implement recording of Dicta API failures to a dedicated collection in tasks.py
- Log relevant request and payload details for failed Dicta API calls

dummy push

9f7e49a

yonadavGit requested a review from nsantacruz January 26, 2026 14:30

yonadavGit added 27 commits January 26, 2026 17:21

chore(tasks): add tqdm progress bars to bulk disambiguation task disp…

0c0b601

…atch

chore(disambiguator): fix SEFARIA_SEARCH_URL to remove redundant /api…

95bffcd

… segment

dummy commit

0d5d8dd

chore(tasks): add resume support for bulk disambiguation task dispatc…

e609e25

…h with start-from offsets

chore(tasks): update ambiguous payload resume point for bulk disambig…

c78bf0e

…uation

chore(tasks): add CLI args for skipping/resuming ambiguous and non-se…

6148a77

…gment disambiguation tasks

chore(tests): add integration tests for non-segment disambiguator

9c7d7cc

chore(disambiguator): add LLM prior formation and confirmation functi…

68430ff

…onality

chore(disambiguator): update Sefaria search functions to return lists…

7a56a13

… of matches and adjust slop parameter

chore(disambiguator): update default LLM model to claude-sonnet-4-5-2…

6f1f4f3

…0250929

chore(disambiguator): add function to strip cantillation and vowels f…

6c5263e

…rom Hebrew text

chore(tests): add additional test cases for non-segment disambiguator

f1adc84

chore(disambiguator): refine LLM prompt for verbatim phrase extractio…

739899c

…n and enhance confirmation function

chore(tests): add test case for ownerless property reference resolution

1394c1a

chore(tests): comment out outdated test case for Hebrew reference res…

967070f

…olution

chore(disambiguator): add llm_resolved_phrase to NonSegmentResolution…

ab3aa29

…Result and implement phrase extraction function

chore(disambiguator): enhance resolution metadata with llm_resolved_p…

5ab5c57

…hrase and update linker output fields

merge master

5db1942

chore(disambiguator): update resolution fields for ambiguous and non-…

5ce06cb

…segment references

chore(disambiguator): reduce debug limit and enhance non-segment reso…

446b407

…lution handling

chore(disambiguator): add integration tests for ambiguous disambiguat…

1f9f626

…ion functionality

chore(disambiguator): implement LLM-based resolution for base text vs…

f81cd5d

… commentary ambiguity

chore(disambiguator): update resolution result fields to use optional…

773005f

… types

chore(disambiguator): enhance handling of ambiguous references and up…

1a91b41

…date debug mode

chore(disambiguator): improve logging for resolution data and adjust …

f2f431c

…task dispatch order

dummy push

0d5e642

fix(disambiguator): add line break

07a7117

nsantacruz marked this pull request as ready for review February 9, 2026 10:00

nsantacruz and others added 6 commits February 9, 2026 12:01

Merge branch 'master' into chore/sc-40225/create-task-for-running-dis…

c39ce12

…ambiguator # Conflicts: # sefaria/helper/marked_up_text_chunk_generator.py

chore: update gunicorn version to 25.0.3

5094ca5

chore: downgrade gunicorn version to 23.0.0

5bd6af6

chore: downgrade gunicorn version to 23.0.0

b1a7e16

merge master

d634a97

Merge remote-tracking branch 'origin/chore/sc-40225/create-task-for-r…

9795f99

…unning-disambiguator' into chore/sc-40225/create-task-for-running-disambiguator

yonadavGit changed the title ~~dummy push~~ Improvements to Disambiguator Logic, Additional LinkerOutput Fields, Tests Feb 10, 2026
