Description
import dashscope
import json
import os
from http import HTTPStatus
# Multimodal fused embedding: combine text, image, and video into a single fused vector
# Suitable for cross-modal retrieval, image search, and similar scenarios
text = "A test sentence used to generate a multimodal fused embedding"
image = "https://dashscope.oss-cn-beijing.aliyuncs.com/images/256_1.png"
video = "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250107/lbcemt/new+video.mp4"
# The input contains text, an image, and a video; the model fuses them into a single embedding
input_data = [
{
"text": text,
"image": image,
"video": video
}
]
# Generate the fused embedding with qwen3-vl-embedding
resp = dashscope.MultiModalEmbedding.call(
# If the environment variable is not set, replace the next line with your Model Studio (Bailian) API key: api_key="sk-xxx",
api_key=os.getenv("DASHSCOPE_API_KEY"),
model="qwen3-vl-embedding",
input=input_data,
# Optional parameter: specify the embedding dimension (supports 2560, 2048, 1536, 1024, 768, 512, 256; default 2560)
# parameters={"dimension": 1024}
)
if resp.status_code == HTTPStatus.OK:
    print(json.dumps(resp.output, indent=4))
else:
    print(f"Request failed: {resp.code} - {resp.message}")

The example in the official documentation only includes a single image and a single video. Is multi-image and multi-video input supported?
The underlying qwen3-vl-embedding model does support multiple images and multiple videos (QwenLM/Qwen3-VL-Embedding#10), so it should presumably also be supported when calling through dashscope.
Expected usage:
images = ["https://dashscope.oss-cn-beijing.aliyuncs.com/images/256_1.png", "https://dashscope.oss-cn-beijing.aliyuncs.com/images/256_2.png"]
videos = ["https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250107/lbcemt/new+video.mp4", "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250108/lbcemt/new+video.mp4"]
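A hypothetical sketch of how these lists might be wired into the existing call, assuming the "image" and "video" fields accepted list values (this is the requested behavior and is not confirmed by the current documentation):

# Hypothetical: pass the URL lists directly, reusing text/images/videos from above
input_data = [
    {
        "text": text,
        "image": images,   # assumed: list of image URLs
        "video": videos    # assumed: list of video URLs
    }
]
resp = dashscope.MultiModalEmbedding.call(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model="qwen3-vl-embedding",
    input=input_data
)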