[5 Minutes] Setting Up the Multimodal Large Model Qwen2.5-VL Locally

程序员笑武 · Published 2025-04-14 20:07:52

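Scene understanding and recognition in traffic scenarios has long been a challenge. Understanding, the way a human does, how the ego vehicle and other traffic participants interact and negotiate with each other calls for a multimodal large model. One of the better-performing open-source multimodal models at the moment is Qwen2.5-VL, so I decided to give it a quick try and see how well it works.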

I. Environment Setup

# Create the environment
conda create -n qwen-2.5 python=3.10
# Activate the environment
conda activate qwen-2.5

II. Model Deployment

1. Setting up the model locally

Clone the code repository:

git clone https://github.com/QwenLM/Qwen2.5-VL
cd Qwen2.5-VL

# The repository contains the following files:
cookbooks  docker  LICENSE  qwen-vl-utils  README.md  requirements_web_demo.txt  web_demo_mm.py  web_demo_streaming

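2. Install the dependencies:

pip install -r requirements_web_demo.txt

3. Since I am in mainland China, install modelscope and use it to download the model weights and related files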

pip install modelscope

modelscope download --model Qwen/Qwen2.5-VL-3B-Instruct --local_dir ./model
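Before starting the demo, it can help to confirm that the download actually finished. The following is a minimal sketch that inspects the ./model directory; check_model_dir is a hypothetical helper, and the expected file names (a config.json plus one or more *.safetensors shards) are an assumption based on the usual checkpoint layout rather than something listed in this post.

```python
from pathlib import Path


def check_model_dir(model_dir: str) -> list[str]:
    """Return a list of problems found in a downloaded checkpoint directory.

    Assumption: the checkpoint follows the common layout with a config.json
    and one or more *.safetensors weight shards.
    """
    root = Path(model_dir)
    if not root.is_dir():
        return [f"{model_dir} is not a directory"]

    problems = []
    if not (root / "config.json").is_file():
        problems.append("config.json is missing")
    if not any(root.glob("*.safetensors")):
        problems.append("no *.safetensors weight files found")
    return problems


if __name__ == "__main__":
    # ./model is the --local_dir used in the download command above.
    issues = check_model_dir("./model")
    print("looks complete" if not issues else "\n".join(issues))
```

An empty result only means the expected files exist; it does not verify their contents.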

4. Start the web inference service

Note that the command below refers to the model by its hub ID. To load the weights downloaded into ./model in step 3, passing --checkpoint-path ./model should work instead.

$ python web_demo_mm.py --checkpoint-path "Qwen/Qwen2.5-VL-3B-Instruct"
Loading checkpoint shards: 100%|██████████| 2/2 [00:08<00:00,  4.13s/it]
Some parameters are on the meta device because they were offloaded to the cpu.
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
/home/pony/文档/workspace/Qwen2.5-VL/web_demo_mm.py:258: UserWarning: You have not specified a value for the `type` parameter. Defaulting to the 'tuples' format for chatbot messages, but this is deprecated and will be removed in a future version of Gradio. Please set type='messages' instead, which uses openai-style dictionaries with 'role' and 'content' keys.
  chatbot = gr.Chatbot(label='Qwen2.5-VL', elem_classes='control-height', height=500)
* Running on local URL:  http://127.0.0.1:7860
To create a public link, set `share=True` in `launch()`.

5. Inference results

6. Inference through an OpenAI-style API

Install the dependencies

pip install git+https://github.com/huggingface/transformers@f3f6c86582611976e72be054675e2bf0abb5f775
pip install accelerate
pip install qwen-vl-utils
pip install 'vllm>0.7.2'

Start the local inference API service

$ vllm serve Qwen/Qwen2.5-VL-3B-Instruct --port 8000 --host 0.0.0.0 --dtype bfloat16 --limit-mm-per-prompt image=1
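Once the server is up, a quick sanity check is to ask it which models it is serving. The sketch below assumes the server started above is reachable at http://localhost:8000 and that its /v1/models endpoint returns the usual OpenAI-style list; served_model_ids and list_models are hypothetical helper names, and the sample response is hand-written for illustration.

```python
import json
import urllib.request


def served_model_ids(response_body: str) -> list[str]:
    """Extract model IDs from an OpenAI-style /v1/models JSON response."""
    payload = json.loads(response_body)
    return [entry["id"] for entry in payload.get("data", [])]


def list_models(base_url: str = "http://localhost:8000") -> list[str]:
    """Query a running server; requires the vllm process to be up."""
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=10) as resp:
        return served_model_ids(resp.read().decode("utf-8"))


# Offline illustration with a hand-written response in the expected shape.
sample = '{"object": "list", "data": [{"id": "Qwen/Qwen2.5-VL-3B-Instruct", "object": "model"}]}'
print(served_model_ids(sample))
```

Calling list_models() against the live server should return the ID that requests must pass in their "model" field.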

Inference via the API

curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
    "model": "Qwen/Qwen2.5-VL-3B-Instruct",
    "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": [
        {"type": "image_url", "image_url": {"url": "https://modelscope.oss-cn-beijing.aliyuncs.com/resource/qwen.png"}},
        {"type": "text", "text": "What is the text in the illustrate?"}
    ]}
    ]
    }'
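For traffic-scene experiments it is usually more convenient to send requests from a script and to use local images. The following is a rough Python equivalent of the curl call using only the standard library. The helper names (image_to_data_url, build_request, ask) are hypothetical, and sending a local file as a base64 data URL assumes the server accepts data URLs in the image_url field.

```python
import base64
import json
import mimetypes
import urllib.request

API_URL = "http://localhost:8000/v1/chat/completions"
MODEL = "Qwen/Qwen2.5-VL-3B-Instruct"


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL."""
    mime = mimetypes.guess_type(path)[0] or "image/jpeg"
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_request(image_url: str, question: str) -> dict:
    """Build the same chat-completions payload as the curl call."""
    return {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": question},
            ]},
        ],
    }


def ask(image_url: str, question: str) -> str:
    """POST the payload and return the reply text (server must be running)."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_request(image_url, question)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, a call such as ask(image_to_data_url("frame.jpg"), "Describe the traffic participants in this image.") would send a local frame, where frame.jpg stands in for any image on disk.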

III. GPU Performance

An RTX 3070 running the 3B model
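To put numbers on GPU usage while the model is loaded, one option is to read them from nvidia-smi. The sketch below is an illustration under the assumption that an NVIDIA driver is installed; parse_gpu_memory and query_gpu_memory are hypothetical helpers, and the sample row is made up, not a measurement from this setup.

```python
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=name,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text: str) -> list[dict]:
    """Parse 'name, used, total' rows (values in MiB) into dictionaries."""
    gpus = []
    for line in csv_text.strip().splitlines():
        # Split from the right so a comma in the GPU name cannot break parsing.
        name, used, total = [field.strip() for field in line.rsplit(",", 2)]
        gpus.append({"name": name, "used_mib": int(used), "total_mib": int(total)})
    return gpus


def query_gpu_memory() -> list[dict]:
    """Run nvidia-smi (requires an NVIDIA driver) and parse its output."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_memory(out)


# Offline illustration; the numbers below are made up, not measurements.
print(parse_gpu_memory("NVIDIA GeForce RTX 3070, 6500, 8192"))
```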

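IV. Challenges

The main challenges:

Local GPU memory is too small to deploy the 7B, 32B, or 72B models locally, or to fine-tune the model for downstream tasks.
Model quantization: the model needs to be quantized to reduce its dependence on high-end GPU memory.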
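A back-of-the-envelope estimate shows why memory is the bottleneck and why quantization helps. The sketch below counts weight storage only, using the nominal sizes (3B, 7B, 32B, 72B) as parameter counts; real checkpoints differ somewhat, and activations, the KV cache, and framework overhead come on top. weight_gib is a hypothetical helper.

```python
GIB = 1024 ** 3


def weight_gib(params_billion: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in GiB."""
    return params_billion * 1e9 * bits_per_param / 8 / GIB


# Print a small table: nominal model size versus weight precision.
for size in (3, 7, 32, 72):
    row = ", ".join(
        f"{bits}-bit: {weight_gib(size, bits):6.1f} GiB" for bits in (16, 8, 4)
    )
    print(f"{size:>2}B  {row}")
```

Under these assumptions the 3B model needs roughly 5.6 GiB for bfloat16 weights, which is already tight on the 8 GB of an RTX 3070 and is consistent with the CPU offloading message in the web demo log. A nominal 7B model only drops below 8 GB once the weights are quantized to around 4 bits.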

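Model fine-tuning

V. Future Applications

This multimodal model can serve as a base for fine-tuning toward specific multimodal tasks. In follow-up posts I plan to share, step by step, how to fine-tune Qwen2.5-VL.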
