Deploying InternVL2_5 with vLLM

DeepHao · Published 2025-03-27 01:09:19

vLLM Chinese documentation
OpenAI-compatible server deployment parameters

Model download

The model can be downloaded from several sources; here we use modelscope (see the InternVL2_5-1B home page).

Install modelscope

pip install modelscope

Download the model

from modelscope import snapshot_download
model_dir = snapshot_download('OpenGVLab/InternVL2_5-1B', local_dir="xxx/OpenGVLab/InternVL2_5-1B")

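Serving and requests

OpenGVLab/InternVL2_5-1B/config.json specifies the initial hyperparameters, such as temperature, top_p, and top_k. A minimal deployment command is shown below; the server listens on port 8000 by default.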

vllm serve OpenGVLab/InternVL2_5-1B
or
python -m vllm.entrypoints.openai.api_server --model=OpenGVLab/InternVL2_5-1B
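To see which of the hyperparameters mentioned above a given config.json actually defines, the file can be searched for those keys. This is a minimal sketch: the helper names are hypothetical, and it assumes the keys may sit at any nesting depth, which is why it walks the whole JSON tree instead of reading fixed fields.

```python
import json

SAMPLING_KEYS = ("temperature", "top_p", "top_k")


def find_sampling_defaults(node, prefix=""):
    """Recursively collect sampling-related keys from a parsed JSON tree."""
    found = {}
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            if key in SAMPLING_KEYS:
                found[path] = value
            else:
                found.update(find_sampling_defaults(value, path))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            found.update(find_sampling_defaults(item, f"{prefix}[{index}]"))
    return found


def load_sampling_defaults(config_path):
    """Read a config.json file and return the sampling keys it defines."""
    with open(config_path, encoding="utf-8") as handle:
        return find_sampling_defaults(json.load(handle))
```

Calling `load_sampling_defaults("xxx/OpenGVLab/InternVL2_5-1B/config.json")` (with the placeholder replaced by the real download directory) returns a dict that maps each dotted key path to its value.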

The available routes are listed at http://127.0.0.1:8000/docs
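The same route list can also be read programmatically. The sketch below assumes the server publishes its OpenAPI schema at /openapi.json next to /docs (the FastAPI default, not something the original post confirms). The function names are hypothetical.

```python
import json
from urllib.request import urlopen


def list_routes(openapi_schema):
    """Return sorted (METHOD, path) pairs from an OpenAPI schema dict."""
    http_methods = {"get", "post", "put", "patch", "delete", "head", "options"}
    routes = []
    for path, operations in openapi_schema.get("paths", {}).items():
        for method in operations:
            # Skip non-method entries such as a shared "parameters" list.
            if method.lower() in http_methods:
                routes.append((method.upper(), path))
    return sorted(routes, key=lambda route: (route[1], route[0]))


def fetch_routes(base_url="http://127.0.0.1:8000"):
    """Download /openapi.json from a running server and list its routes."""
    with urlopen(f"{base_url}/openapi.json", timeout=10) as response:
        return list_routes(json.load(response))
```

With the server running, `for method, path in fetch_routes(): print(method, path)` prints one route per line.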

The request script is as follows:

import base64
import requests
from openai import OpenAI

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    # defaults to os.environ.get("OPENAI_API_KEY")
    api_key=openai_api_key,
    base_url=openai_api_base,
)

models = client.models.list()
model = models.data[0].id


def encode_image_base64_from_url(image_url: str) -> str:
    """Encode an image retrieved from a remote url to base64 format."""

    with requests.get(image_url) as response:
        response.raise_for_status()
        result = base64.b64encode(response.content).decode('utf-8')

    return result


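def image_to_base64(image_path: str) -> str:
    """Read a local image file and return its contents as a base64 string."""
    with open(image_path, "rb") as image_file:
        image_data = image_file.read()
    # Raw base64 only; the caller adds the "data:image/...;base64," prefix.
    return base64.b64encode(image_data).decode('utf-8')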


def single_image_call(image_path):
    image_base64 = image_to_base64(image_path=image_path)
    chat_completion_from_base64 = client.chat.completions.create(
        messages=[{
            "role":
                "user",
            "content": [
                {
                    "type": "text",
                    "text": "What’s in this image?"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{image_base64}"
                    },
                },
            ],
        }],
        model=model,
        max_tokens=8192,
        top_p=0.9,
        temperature=0.0,
    )
    return chat_completion_from_base64.choices[0].message.content


total_result = []
for i in range(20):
    result = single_image_call("demo.jpg")
    total_result.append(result)

# Check whether all 20 inference runs returned the same result
print(len(set(total_result)) == 1)
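The script hardcodes `image/jpeg` in the data URL, which only matches JPEG input. A possible variant, sketched here with a hypothetical helper name, guesses the MIME type from the file extension and falls back to `image/jpeg` when the extension is unknown.

```python
import base64
import mimetypes


def image_to_data_url(image_path, default_mime="image/jpeg"):
    """Encode a local image as a data URL whose MIME type matches its extension."""
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        mime_type = default_mime  # fall back to the type the script hardcodes
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"
```

The returned string can be passed directly as the `url` value of the `image_url` entry in the request.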

The request script sets temperature=0.0 so that every inference run returns the same result.
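Why temperature matters can be seen in a toy model of token sampling. The sketch below is an illustration only, not vLLM's sampler: logits are divided by the temperature before the softmax, and temperature 0 is treated as greedy decoding (always pick the highest-scoring token), which is how vLLM documents that value. The logits are made up.

```python
import math


def token_probabilities(logits, temperature):
    """Return the sampling distribution over tokens for a given temperature."""
    if temperature == 0:
        # Greedy decoding: all probability mass on the highest logit.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(value - peak) for value in scaled]
    total = sum(weights)
    return [weight / total for weight in weights]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for temperature in (1.0, 0.1, 0.0):
    probs = token_probabilities(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

As the temperature drops, the distribution concentrates on the top token, so there is nothing left to randomize at 0.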

Right after the vLLM server starts, the first few requests consistently produce differing results; this may be a bug.
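One way to work around this when checking reproducibility is to treat the first few responses as warm-up and compare only the rest. The helpers below are a hypothetical sketch; the default window of 3 is an arbitrary assumption, not a measured value.

```python
def split_warmup(results, warmup):
    """Separate the first `warmup` responses from the rest."""
    return results[:warmup], results[warmup:]


def is_stable_after_warmup(results, warmup=3):
    """Return True if every response after the warm-up window is identical."""
    _, steady = split_warmup(results, warmup)
    return len(set(steady)) <= 1


def first_stable_index(results):
    """Return the smallest index from which all remaining responses agree."""
    index = len(results)
    while index > 0 and results[index - 1] == results[-1]:
        index -= 1
    return index
```

Applied to the `total_result` list from the request script, `first_stable_index(total_result)` reports how many initial requests differed before the output settled.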

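--trust-remote-code: required when loading a model you trained yourself; it allows vLLM to run the custom model code shipped with the checkpoint

--port 8765: sets the port number

--tensor-parallel-size: the tensor-parallel degree, i.e. the number of GPUs the service is deployed across

--seed 42: sets the random seed; with temperature=0.0, results are identical across runs even without this flag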

Example vLLM command

vllm serve xxx/checkpoint-yyy --port 8567 --trust-remote-code --max-num-batched-tokens 8192 --seed 42 --tensor-parallel-size 8
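When the flags change between experiments, it can help to assemble the command from named options. The sketch below uses a hypothetical helper and only the flags that appear in this post; it prints the command and does not launch anything.

```python
import shlex


def build_serve_command(model, port=None, trust_remote_code=False,
                        max_num_batched_tokens=None, seed=None,
                        tensor_parallel_size=None):
    """Assemble a `vllm serve` argument list from keyword options."""
    command = ["vllm", "serve", model]
    if port is not None:
        command += ["--port", str(port)]
    if trust_remote_code:
        command.append("--trust-remote-code")
    if max_num_batched_tokens is not None:
        command += ["--max-num-batched-tokens", str(max_num_batched_tokens)]
    if seed is not None:
        command += ["--seed", str(seed)]
    if tensor_parallel_size is not None:
        command += ["--tensor-parallel-size", str(tensor_parallel_size)]
    return command


command = build_serve_command(
    "xxx/checkpoint-yyy",  # placeholder path, as in the command above
    port=8567,
    trust_remote_code=True,
    max_num_batched_tokens=8192,
    seed=42,
    tensor_parallel_size=8,
)
print(shlex.join(command))
```

The returned list can be handed to `subprocess.Popen(command)` to start the server from Python.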
