Added multimodal query support for VLM Embed #362

Open
smasurekar wants to merge 1 commit into develop from dev/smasurekar/vlm-embed-multimodal-query
Conversation

@smasurekar
Collaborator

@smasurekar smasurekar commented Feb 17, 2026

Concatenate multimodal content for VLM Embed

Multimodal retriever queries (text + image) are now concatenated into a single string for VLM embedding. Previously only the image URL was returned; with this change the embed model receives both text and image in one query. Text parts are joined with \n\n, and the image URL is then appended (only one image is supported). Unit tests are updated for the new concatenated format.
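
For reviewers, the concatenation described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper name `build_vlm_embed_query`, the OpenAI-style content-part layout (`{"type": "text"}` / `{"type": "image_url"}`), and the separator placed between the text and the image URL are all assumptions. Only the `\n\n` text separator and the one-image limit come from the description.

```python
from typing import Any, Dict, List, Optional, Union

TEXT_SEPARATOR = "\n\n"   # stated in the PR description
IMAGE_SEPARATOR = "\n\n"  # assumption: the description does not say how the URL is appended


def build_vlm_embed_query(content: Union[str, List[Dict[str, Any]]]) -> str:
    """Hypothetical helper: flatten a multimodal query into one string."""
    # Plain-text queries pass through unchanged.
    if isinstance(content, str):
        return content

    texts: List[str] = []
    image_url: Optional[str] = None

    for part in content:
        kind = part.get("type")
        if kind == "text":
            text = part.get("text", "")
            if text:
                texts.append(text)
        elif kind == "image_url" and image_url is None:
            # Only the first image is kept (one image supported).
            image = part.get("image_url", {})
            image_url = image.get("url") if isinstance(image, dict) else image

    query = TEXT_SEPARATOR.join(texts)
    if image_url:
        # Append the image URL after the text, or return it alone if there is no text.
        query = f"{query}{IMAGE_SEPARATOR}{image_url}" if query else image_url
    return query
```

Under these assumptions, a query with two text parts and one image yields the two texts joined by a blank line, followed by the image URL; a text-only or image-only query is returned as is.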

@smasurekar smasurekar added the enhancement New feature or request label Feb 17, 2026
@smasurekar smasurekar closed this Feb 17, 2026
@smasurekar smasurekar reopened this Feb 17, 2026
@smasurekar smasurekar force-pushed the dev/smasurekar/vlm-embed-multimodal-query branch from 5bda440 to 4564039 Compare February 17, 2026 09:19
@smasurekar smasurekar force-pushed the dev/smasurekar/vlm-embed-multimodal-query branch from 4564039 to 7487c01 Compare February 19, 2026 05:20
@smasurekar smasurekar force-pushed the dev/smasurekar/vlm-embed-multimodal-query branch from 7487c01 to 95f7dcd Compare February 27, 2026 06:33
Signed-off-by: Swapnil Masurekar <smasurekar@nvidia.com>
@smasurekar smasurekar force-pushed the dev/smasurekar/vlm-embed-multimodal-query branch from 95f7dcd to 4a78d0d Compare February 27, 2026 12:22
Labels

enhancement New feature or request release-26.03

3 participants