-
Notifications
You must be signed in to change notification settings - Fork 770
fix: Support plain text input and reuse the processor in trtllm multimodal mode #5271
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix: Support plain text input and reuse the processor in trtllm multimodal mode #5271
Conversation
|
👋 Hi shpgy-shpgy! Thank you for contributing to ai-dynamo/dynamo. Just a reminder: The 🚀 |
WalkthroughThe changes introduce comprehensive multimodal processing support by adding three new functions ( Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
components/src/dynamo/trtllm/multimodal_processor.py (1)
1-490: Fix pipeline failures: formatting issues block CI.The pre-commit hooks for
isortandblackhave failed, indicating that the code does not meet the project's formatting standards. These must be resolved before the PR can be merged.Run the following commands to fix the formatting:
#!/bin/bash # Fix import sorting and code formatting pre-commit run isort --files components/src/dynamo/trtllm/multimodal_processor.py pre-commit run black --files components/src/dynamo/trtllm/multimodal_processor.py
🧹 Nitpick comments (3)
components/src/dynamo/trtllm/multimodal_processor.py (3)
343-359: Add logging for exception handling in image_audio modality detection.The silent exception handling in the
image_audiomodality detection logic makes debugging difficult when media loading fails. Consider logging the exceptions before passing.♻️ Proposed fix to add logging
if _modal is None: try: data = load_image(m, format=image_data_format, device=device) _modal = "image" - except Exception: + except Exception as e: + logging.debug(f"Failed to load as image: {e}") pass if _modal is None: try: data = load_audio(m, device=device) _modal = "audio" - except Exception: + except Exception as e: + logging.debug(f"Failed to load as audio: {e}") pass
396-397: Address static analysis warnings for loop variable and zip.The loop variable
prompt_idxis unused, andzip()should include an explicitstrict=parameter for better error detection.♻️ Proposed fix
- for prompt_idx, (prompt, - media) in enumerate(zip(prompts, media_or_embeddings)): + for prompt, media in zip(prompts, media_or_embeddings, strict=True):
271-436: Consider refactoring this complex function.The
default_multimodal_input_loaderfunction is 165 lines long and handles multiple modalities with significant complexity. Consider extracting the modality-specific conversion logic (lines 286-371) into separate helper functions for better maintainability.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
components/src/dynamo/trtllm/multimodal_processor.pylib/llm/src/preprocessor.rs
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: tanmayv25
Repo: ai-dynamo/dynamo PR: 5143
File: examples/multimodal/components/processor.py:207-215
Timestamp: 2026-01-03T01:27:16.084Z
Learning: In examples/multimodal/components/processor.py, the Dynamo multimodal processor intentionally supports flexible content ordering in multimodal requests, allowing audio, image, or video content to appear before text content. This design aligns with the OpenAI multimodal API specification which permits content items in any order, treating the text-first recommendation as advisory rather than mandatory.
Learnt from: KrishnanPrash
Repo: ai-dynamo/dynamo PR: 3067
File: lib/llm/src/preprocessor/prompt/template/oai.rs:87-134
Timestamp: 2025-09-16T19:47:30.312Z
Learning: In Dynamo, multimodal requests (containing image_url or other non-text content) are processed through a completely different workflow than text-only requests, so the may_be_fix_msg_content function in lib/llm/src/preprocessor/prompt/template/oai.rs will only encounter text-only content arrays.
Learnt from: tanmayv25
Repo: ai-dynamo/dynamo PR: 5143
File: examples/multimodal/components/processor.py:207-214
Timestamp: 2026-01-03T01:29:50.237Z
Learning: In examples/multimodal/components/processor.py, the processor example is designed to support only a single text field across all messages in a request. Supporting multiple text fields is out of scope for the current implementation.
📚 Learning: 2026-01-03T01:27:16.084Z
Learnt from: tanmayv25
Repo: ai-dynamo/dynamo PR: 5143
File: examples/multimodal/components/processor.py:207-215
Timestamp: 2026-01-03T01:27:16.084Z
Learning: In examples/multimodal/components/processor.py, the Dynamo multimodal processor intentionally supports flexible content ordering in multimodal requests, allowing audio, image, or video content to appear before text content. This design aligns with the OpenAI multimodal API specification which permits content items in any order, treating the text-first recommendation as advisory rather than mandatory.
Applied to files:
components/src/dynamo/trtllm/multimodal_processor.py
📚 Learning: 2026-01-03T01:29:50.237Z
Learnt from: tanmayv25
Repo: ai-dynamo/dynamo PR: 5143
File: examples/multimodal/components/processor.py:207-214
Timestamp: 2026-01-03T01:29:50.237Z
Learning: In examples/multimodal/components/processor.py, the processor example is designed to support only a single text field across all messages in a request. Supporting multiple text fields is out of scope for the current implementation.
Applied to files:
components/src/dynamo/trtllm/multimodal_processor.py
📚 Learning: 2025-10-28T04:09:48.264Z
Learnt from: ayushag-nv
Repo: ai-dynamo/dynamo PR: 3634
File: components/src/dynamo/vllm/multimodal_handlers/processor_handler.py:66-72
Timestamp: 2025-10-28T04:09:48.264Z
Learning: In components/src/dynamo/vllm/multimodal_handlers/processor_handler.py, the AutoTokenizer.from_pretrained call with trust_remote_code=True is intentional and expected for the vLLM multimodal handler implementation.
Applied to files:
components/src/dynamo/trtllm/multimodal_processor.py
📚 Learning: 2025-09-16T19:47:30.312Z
Learnt from: KrishnanPrash
Repo: ai-dynamo/dynamo PR: 3067
File: lib/llm/src/preprocessor/prompt/template/oai.rs:87-134
Timestamp: 2025-09-16T19:47:30.312Z
Learning: In Dynamo, multimodal requests (containing image_url or other non-text content) are processed through a completely different workflow than text-only requests, so the may_be_fix_msg_content function in lib/llm/src/preprocessor/prompt/template/oai.rs will only encounter text-only content arrays.
Applied to files:
components/src/dynamo/trtllm/multimodal_processor.pylib/llm/src/preprocessor.rs
🧬 Code graph analysis (2)
components/src/dynamo/trtllm/multimodal_processor.py (1)
examples/multimodal/utils/audio_loader.py (1)
load_audio(61-80)
lib/llm/src/preprocessor.rs (2)
lib/llm/src/preprocessor/prompt/template/oai.rs (2)
messages(211-214)messages(268-279)lib/llm/src/preprocessor/prompt.rs (1)
messages(54-54)
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/5271/merge) by shpgy-shpgy.
components/src/dynamo/trtllm/multimodal_processor.py
[error] pre-commit: isort hook failed; files were modified by this hook. The hook will fail CI until changes are committed.
[error] pre-commit: black hook failed; files were reformatted by this hook. The hook will fail CI until changes are committed.
[error] pre-commit hooks made changes. Re-run pre-commit with 'pre-commit run --all-files'.
🪛 Ruff (0.14.10)
components/src/dynamo/trtllm/multimodal_processor.py
314-316: Avoid specifying long messages outside the exception class
(TRY003)
326-328: Avoid specifying long messages outside the exception class
(TRY003)
335-337: Avoid specifying long messages outside the exception class
(TRY003)
349-350: try-except-pass detected, consider logging the exception
(S110)
349-349: Do not catch blind exception: Exception
(BLE001)
355-356: try-except-pass detected, consider logging the exception
(S110)
355-355: Do not catch blind exception: Exception
(BLE001)
358-358: Avoid specifying long messages outside the exception class
(TRY003)
370-370: Avoid specifying long messages outside the exception class
(TRY003)
396-396: Loop control variable prompt_idx not used within loop body
Rename unused prompt_idx to _prompt_idx
(B007)
397-397: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: Build and Test - dynamo
- GitHub Check: clippy (lib/runtime/examples)
- GitHub Check: tests (launch/dynamo-run)
- GitHub Check: tests (.)
- GitHub Check: clippy (launch/dynamo-run)
- GitHub Check: clippy (.)
- GitHub Check: tests (lib/runtime/examples)
- GitHub Check: clippy (lib/bindings/python)
- GitHub Check: tests (lib/bindings/python)
🔇 Additional comments (6)
lib/llm/src/preprocessor.rs (1)
373-379: LGTM! Consistent handling of original messages.Moving the
extra_argspreparation outside the multimodal-only conditional ensures that downstream processors (like TRT-LLM multimodal) can always access the original messages, regardless of whether multimodal data was detected. This aligns well with the PR objective to support plain text inputs in multimodal scenarios.components/src/dynamo/trtllm/multimodal_processor.py (5)
20-33: LGTM! Necessary imports for multimodal support.The new imports from
transformersandtensorrt_llmare required for the multimodal processing functionality being added.
40-50: LGTM! Good use of Protocol for type safety.The
TokenizerProtocolprovides clean type hinting for tokenizers, resolving mypy errors related to the decode method.
76-80: LGTM! Processor initialization supports request reuse.The processor is now initialized once at startup and reused across requests, eliminating repeated initialization overhead as intended by the PR objectives. The use of
trust_remote_code=Trueis consistent with established patterns in the Dynamo multimodal implementation.Based on learnings,
trust_remote_code=Trueis intentional for multimodal handlers.
191-228: LGTM! Refactored to support both pure-text and multimodal inputs.The branching logic correctly handles pure-text requests (using
get_multimodal_inputs) and multimodal requests (usingdefault_multimodal_input_loader), while reusing the pre-initializedprocessorandtokenizerto eliminate per-request overhead.
439-462: LGTM! Clean async implementation for multimodal input assembly.The function properly orchestrates the parsing of chat messages, application of chat templates, and asynchronous retrieval of multimodal data.
ee5c597 to
5cf3f29
Compare
|
Hi @shpgy-shpgy, thanks for raising this - @indrajit96 @KrishnanPrash please review |
|
/ok to test 5cf3f29 |
Signed-off-by: shpgy-shpgy <875664365@qq.com>
Signed-off-by: shpgy-shpgy <875664365@qq.com>
5cf3f29 to
e7a8b75
Compare
Overview:
The fix modifies the initialization logic to create and reuse a single processor across multiple requests. This eliminates the repeated initialization cost.
The fix also enables the router to handle plain text inputs in multi-modal scenarios.
Details:
Created a new default_multimodal_input_loader function in Dynamo that mirrors the necessary steps originally handled by the external trtllm module. Modified the function signature to accept a pre-initialized processor object as a parameter. This allows the caller to instantiate the processor once and pass the same processor for all subsequent requests.
The handling of plain text inputs is aligned with the method used in the tensorrt_llm.
Where should the reviewer start?
components/src/dynamo/trtllm/multimodal_processor.py
lib/llm/src/preprocessor.rs
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit
Release Notes
✏️ Tip: You can customize this high-level summary in your review settings.