Error when recognizing a 12-minute audio file #94

@evenZh

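Laptop with a 2060 GPU. Both paraformer-zh and iic/SenseVoiceSmall recognize the audio without problems.
FunAudioLLM/Fun-ASR-Nano-2512, however, always fails with the error below. Adjusting the configuration did not help, and with it switched off the GPU runs out of memory. -.- I asked an AI, which said that the VAD model split the audio into several segments, but the ASR model (Nano-2512) returned an empty result for one of them (usually a segment of pure silence). FunASR's merge logic does not check for an empty result and reads index 0 directly, so it crashes.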
vad_model="fsmn-vad",
vad_kwargs={"max_single_segment_time": 60000},

Traceback (most recent call last):
  File "D:\code\workbuddy\asr\funnano.py", line 38, in <module>
    res = model.generate(
          ^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\funasr\auto\auto_model.py", line 329, in generate
    return self.inference_with_vad(
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\funasr\auto\auto_model.py", line 558, in inference_with_vad
    t[0] += vadsegments[j][0]
    ~^^^
KeyError: 0
  0%|                                                                                            | 0/1 [02:23<?, ?it/s]
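
One detail in the traceback may be worth checking: Python raises `KeyError` when a dict is indexed with a missing key, while indexing an empty list raises `IndexError`. So `t` at that line could be a dict rather than an empty list, which would point to a timestamp format the merge step does not expect instead of (or in addition to) an empty segment. The sketch below shows what a defensive offset step might look like. It is not FunASR's code: `shift_timestamps` is a hypothetical helper, and the dict keys `start_time` and `end_time` are assumptions for illustration, as is the offset being in the same unit as the timestamps.

```python
def shift_timestamps(timestamps, offset):
    """Return a copy of `timestamps` shifted by `offset`.

    Hypothetical helper, not part of FunASR. Accepts:
      - None or an empty list (e.g. a silent segment) -> []
      - list-style entries:  [[start, end], ...]
      - dict-style entries:  [{"start_time": s, "end_time": e, ...}, ...]
        (key names are assumed for illustration)
    Entries of any other shape are skipped instead of raising.
    """
    shifted = []
    for t in timestamps or []:
        if isinstance(t, dict):
            item = dict(t)  # copy so the caller's data is not mutated
            for key in ("start_time", "end_time"):
                if key in item:
                    item[key] = item[key] + offset
            shifted.append(item)
        elif isinstance(t, (list, tuple)) and len(t) >= 2:
            shifted.append([t[0] + offset, t[1] + offset])
    return shifted


if __name__ == "__main__":
    # Offset of a VAD segment that starts 60000 units into the audio.
    print(shift_timestamps([[0, 200], [200, 450]], 60000))
    print(shift_timestamps([{"token": "a", "start_time": 0, "end_time": 50}], 60000))
    print(shift_timestamps(None, 60000))
```

Printing `type(t)` just before the failing line in a local copy of `auto_model.py` would show which of the two cases applies here.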