-
Notifications
You must be signed in to change notification settings - Fork 3.3k
fix: improvements (layout element shuffling and allow more strategies) in experimental two-stages pipeline #2723
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…itial prompt Signed-off-by: ElHachem02 <peterelhachem02@gmail.com>
|
✅ DCO Check Passed Thanks @ElHachem02, all your commits are properly signed off. 🎉 |
Merge ProtectionsYour pull request matches the following merge protections and will not be merged until they are valid. 🟢 Enforce conventional commitWonderful, this rule succeeded.Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR introduces changes to the prompt building logic in the threaded layout VLM pipeline, modifying how layout information is incorporated into VLM prompts. However, the implementation appears to contain incomplete experimental code that needs refinement before merging.
Key changes:
- Adds logic to filter prompts and only augment "Convert this page to docling." base prompts
- Introduces random shuffling of layout elements and experimental table tag replacement
- Changes prompt composition to use only layout information, discarding the base instruction text
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| ) | ||
|
|
||
| if tag_name == DocumentToken.TABLE: | ||
| print("Found a table!") |
Copilot
AI
Dec 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Debug print statement should be removed before merging to production. Consider using the existing _log.debug() instead if this logging is needed.
| print("Found a table!") | |
| _log.debug("Found a table!") |
| ) | ||
|
|
||
| if tag_name == DocumentToken.TABLE: | ||
| print("Found a table!") |
Copilot
AI
Dec 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The purpose of replacing table tags with "otsl" is unclear and lacks documentation. This appears to be experimental code that should either be explained with a comment or removed if not needed.
| print("Found a table!") | |
| print("Found a table!") | |
| # Experimental: Replace 'table' tag with 'otsl' for downstream processing. | |
| # TODO: Document the reason for this replacement or remove if not needed. |
| # Shuffle elements | ||
| random.shuffle(layout_elements) |
Copilot
AI
Dec 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Randomly shuffling layout elements will produce non-deterministic results and makes the output unreproducible. This is problematic for testing, debugging, and consistent user experience. If shuffling is intentional for training/testing purposes, it should be: (1) documented with a clear explanation, (2) controlled by a configuration parameter, and (3) use a seeded random generator for reproducibility.
| layout_injection = f"{layout_xml}" | ||
|
|
||
| augmented_prompt = base_prompt + layout_injection | ||
| augmented_prompt = layout_injection |
Copilot
AI
Dec 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change completely discards the base_prompt and replaces it with only the layout information. The original implementation would concatenate them (base_prompt + layout_injection). This is a breaking change that removes the instruction text "Convert this page to docling." from the prompt. If intentional, this should be documented and the implications explained.
| augmented_prompt = layout_injection | |
| augmented_prompt = base_prompt + "\n" + layout_injection |
|
|
||
| augmented_prompt = base_prompt + layout_injection | ||
| augmented_prompt = layout_injection | ||
| print(f"final prompt is {augmented_prompt}") |
Copilot
AI
Dec 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Debug print statement should be removed before merging to production. The existing _log.debug() at line 148-150 already logs this information appropriately.
| print(f"final prompt is {augmented_prompt}") |
| if TYPE_CHECKING: | ||
| from docling_core.types.doc.page import SegmentedPage | ||
|
|
||
| import random |
Copilot
AI
Dec 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The random module import should be placed with the other standard library imports at the top of the file (after line 12, before the third-party imports), following Python's PEP 8 style guide for import ordering.
Signed-off-by: ElHachem02 <peterelhachem02@gmail.com>
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
Checklist: