Speech Recognition: SenseVoice vs. FunASR

AI视觉网奇 · Published 2025-08-08 17:14:25

Contents

Alibaba SenseVoice vs. FunASR: Feature Comparison

FunASR is open source.

1. Functional positioning

2. Technical characteristics

Alibaba SenseVoice vs. FunASR: Feature Comparison

https://github.com/FunAudioLLM/SenseVoice/blob/main/finetune.sh

SenseVoice Small is open source and can be called directly. The Large version has not been open-sourced and has to be accessed through an SDK.

SenseVoice

import torch
from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

# SenseVoice Small is the open-source checkpoint. It is loaded through FunASR
# (pip install funasr), not through the transformers pipeline API.
MODEL_ID = "iic/SenseVoiceSmall"

def load_sensevoice():
    """
    Load SenseVoice Small once, with FSMN-VAD to split long audio into segments.
    """
    return AutoModel(
        model=MODEL_ID,
        vad_model="fsmn-vad",
        vad_kwargs={"max_single_segment_time": 30000},  # cap VAD segments at 30 s
        device="cuda:0" if torch.cuda.is_available() else "cpu",
    )

def sensevoice_asr(model, audio_path, language="zh"):
    """
    Recognize Chinese speech with SenseVoice and return plain text.
    """
    res = model.generate(
        input=audio_path,
        cache={},
        language=language,  # "auto", "zh", "en", "yue", "ja", "ko", "nospeech"
        use_itn=True,       # punctuation and inverse text normalization
        batch_size_s=60,
        merge_vad=True,     # merge short VAD segments before decoding
        merge_length_s=15,
    )
    # Turn the raw <|...|> tags in the output into readable text
    return rich_transcription_postprocess(res[0]["text"])

def sensevoice_asr_with_timestamps(model, audio_path, language="zh"):
    """
    Recognize speech and request timestamps.

    Assumption: output_timestamp is only honored by recent SenseVoice/FunASR
    releases; on older versions the result has no "timestamp" field.
    """
    res = model.generate(
        input=audio_path,
        cache={},
        language=language,
        use_itn=True,
        batch_size_s=60,
        output_timestamp=True,
    )
    return res[0]

def batch_asr(model, audio_paths, language="zh"):
    """
    Transcribe several audio files, reusing the same loaded model.
    """
    results = []
    for audio_path in audio_paths:
        results.append({
            "file": audio_path,
            "text": sensevoice_asr(model, audio_path, language),
        })
    return results

if __name__ == "__main__":
    # Example usage
    audio_file = "path/to/your/audio.wav"  # replace with your audio file path
    model = load_sensevoice()

    # Basic speech recognition
    print("=== Basic speech recognition ===")
    text = sensevoice_asr(model, audio_file)
    print(f"Result: {text}")

    # Recognition with timestamps
    print("\n=== Recognition with timestamps ===")
    result_with_timestamps = sensevoice_asr_with_timestamps(model, audio_file)
    print(f"Full result: {result_with_timestamps}")

    # Print the timestamp entries as returned, if the installed version provides them
    if "timestamp" in result_with_timestamps:
        print("\nTimestamp details:")
        for entry in result_with_timestamps["timestamp"]:
            print(entry)
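
Before any post-processing, SenseVoice Small's raw transcript usually begins with a few special tokens, for example `<|zh|><|NEUTRAL|><|Speech|><|withitn|>` followed by the text. The sketch below splits those tokens off so that language, emotion and audio event can be read separately. The tag order (language, emotion, event, ITN flag) and the name `parse_rich_text` are assumptions made for illustration and are not part of the SenseVoice API.

```python
import re
from typing import NamedTuple

# Matches one leading tag of the form <|...|>
TAG = re.compile(r"<\|([^|]*)\|>")

class RichResult(NamedTuple):
    language: str
    emotion: str
    event: str
    text: str

def parse_rich_text(raw: str) -> RichResult:
    """Split leading <|...|> tags from the transcript.

    Assumes the tags appear in the order: language, emotion, event, ITN flag.
    Missing tags are returned as empty strings.
    """
    tags = []
    pos = 0
    # Consume tags only while they sit at the very start of the remaining text
    while (m := TAG.match(raw, pos)) is not None:
        tags.append(m.group(1))
        pos = m.end()
    tags += [""] * (4 - len(tags))  # pad when fewer than four tags are present
    return RichResult(tags[0], tags[1], tags[2], raw[pos:].strip())

if __name__ == "__main__":
    print(parse_rich_text("<|zh|><|NEUTRAL|><|Speech|><|withitn|>你好。"))
```

Reading the tags positionally keeps the helper short. If a model version emits tags in another order, matching each tag against known label sets would be the safer choice.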

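FunASR is open source.

SenseVoice and FunASR, both from Alibaba, belong to the same field of speech processing, but they differ significantly in functional positioning, technical characteristics, and target scenarios. The main differences are listed below.

1. Functional positioning

SenseVoice
The speech understanding model of the FunAudioLLM project. It focuses on multi-task speech processing: automatic speech recognition (ASR), speech emotion recognition (SER), acoustic event detection (AED), and language identification (LID). Its core strengths are multilingual support (50+ languages) and low-latency inference (about 70 ms for 10 seconds of audio).
Example applications: emotion analysis of customer-service recordings, detection of events such as laughter and applause in meeting audio.
FunASR
An end-to-end speech recognition framework open-sourced by Alibaba DAMO Academy. It targets the full industrial ASR pipeline, including voice activity detection (VAD), punctuation restoration, and speaker diarization. Its core models, such as Paraformer-streaming, are optimized for real-time transcription (latency < 200 ms).
Example applications: real-time transcription of enterprise meetings, multi-speaker scenarios that need diarization.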
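
The FunASR pipeline described above (VAD, recognition, punctuation, optional speaker diarization) can be sketched as follows. The model aliases `paraformer-zh`, `fsmn-vad`, `ct-punc` and `cam++` are taken from the FunASR README. The helper names `funasr_pipeline_kwargs` and `transcribe` are hypothetical, and the sketch assumes `funasr` is installed and can download the models on first use.

```python
def funasr_pipeline_kwargs(with_speakers: bool = False) -> dict:
    """Build the AutoModel arguments for a VAD + ASR + punctuation pipeline."""
    kwargs = {
        "model": "paraformer-zh",  # non-streaming Paraformer ASR model
        "vad_model": "fsmn-vad",   # voice activity detection, splits long audio
        "punc_model": "ct-punc",   # punctuation restoration
    }
    if with_speakers:
        kwargs["spk_model"] = "cam++"  # speaker model used for diarization
    return kwargs

def transcribe(audio_path: str, with_speakers: bool = False) -> str:
    """Run the whole pipeline on one file and return the punctuated text."""
    # Imported lazily so the configuration helper works without funasr installed
    from funasr import AutoModel

    model = AutoModel(**funasr_pipeline_kwargs(with_speakers))
    res = model.generate(input=audio_path, batch_size_s=300)
    return res[0]["text"]
```

Calling `transcribe("path/to/your/audio.wav")` would return the transcript as one string. The full entry in `res[0]` may carry more fields depending on which models are enabled.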

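2. Technical characteristics

Aspect	SenseVoice	FunASR
Model architecture	Non-autoregressive end-to-end (Small) or encoder-decoder (Large)	Non-autoregressive Paraformer architecture
Multilingual support	50+ languages (Large)	12 languages (Chinese, English, Cantonese, etc.)
Latency	70 ms for 10 s of audio (Small)	Streaming model latency < 200 ms
Extended features	Emotion recognition, event detection	VAD, punctuation restoration, speaker diarization
Deployment scenarios	Multimodal interaction, emotion analysis	Enterprise real-time transcription, long-audio processing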
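
The two latency figures in the table measure different things: the SenseVoice number is the processing time for a whole 10-second clip, while the FunASR number is a streaming latency. For the offline figure, a common way to express it is the real-time factor (RTF), processing time divided by audio duration. A minimal sketch of that arithmetic, using only the 70 ms and 10 s values quoted above (the function name is hypothetical):

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds

if __name__ == "__main__":
    rtf = real_time_factor(0.070, 10.0)  # 70 ms to process 10 s of audio
    print(f"RTF = {rtf:.3f}")            # RTF = 0.007
    print(f"about {1 / rtf:.0f}x faster than real time")  # about 143x
```

An RTF below 1 means the model processes audio faster than it is spoken. It says nothing about how soon the first words appear in a streaming setup, which is what the FunASR figure describes.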
