Fugu-MT Paper Translation (Abstract): SENS-ASR: Semantic Embedding injection in Neural-transducer for Streaming Automatic Speech Recognition

Paper overview: SENS-ASR: Semantic Embedding injection in Neural-transducer for Streaming Automatic Speech Recognition

arxiv url: http://arxiv.org/abs/2603.10005v2
Date: Thu, 12 Mar 2026 12:36:45 GMT
Status: Translation complete
Last updated in system: 2026-03-15 16:38:22.554801
Title: SENS-ASR: Semantic Embedding injection in Neural-transducer for Streaming Automatic Speech Recognition
Title (reference translation): SENS-ASR: Semantic Embedding Injection in a Neural Transducer for Streaming Speech Recognition
Authors: Youness Dkhissi, Valentin Vielzeuf, Elys Allesiardo, Anthony Larcher
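Abstract summary: This paper proposes SENS-ASR, an approach that improves the transcription quality of streaming ASR by reinforcing acoustic information with semantic information. Experiments on standard datasets show that SENS-ASR significantly improves the word error rate in small-chunk streaming scenarios.
Reference score (independently computed attention measure): 3.0406449751520754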
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Many Automatic Speech Recognition (ASR) applications require streaming processing of the audio data. In streaming mode, ASR systems need to start transcribing the input stream before it is complete, i.e., the systems have to process a stream of inputs with a limited (or no) future context. Compared to offline mode, this reduction of the future context degrades the performance of Streaming-ASR systems, especially while working with low-latency constraint. In this work, we present SENS-ASR, an approach to enhance the transcription quality of Streaming-ASR by reinforcing the acoustic information with semantic information. This semantic information is extracted from the available past frame-embeddings by a context module. This module is trained using knowledge distillation from a sentence embedding Language Model fine-tuned on the training dataset transcriptions. Experiments on standard datasets show that SENS-ASR significantly improves the Word Error Rate on small-chunk streaming scenarios.
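Abstract (reference translation): Many automatic speech recognition (ASR) applications require streaming processing of audio data. In streaming mode, an ASR system must begin transcribing the input stream before the stream is complete. Compared with offline mode, this reduction in future context degrades the performance of streaming ASR systems, especially under low-latency constraints. This work presents SENS-ASR, an approach that improves the transcription quality of streaming ASR by reinforcing acoustic information with semantic information. A context module extracts this semantic information from the available past frame embeddings. The module is trained by knowledge distillation from a sentence-embedding language model fine-tuned on the transcriptions of the training dataset. Experiments on standard datasets show that SENS-ASR significantly improves the word error rate in small-chunk streaming scenarios.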
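
The distillation idea in the abstract can be illustrated with a toy sketch. The abstract does not specify the context module's architecture or the distillation loss, so everything here is an assumption chosen for illustration: a running mean over past frame embeddings stands in for the context module, cosine distance stands in for the loss, and the names `causal_context`, `cosine_distance`, and `distillation_loss` are hypothetical. It is not the authors' implementation.

```python
import math


def causal_context(frames):
    """Running mean of frame embeddings: the vector at step t uses frames 0..t only.

    Assumption: a stand-in for a learned context module, not the paper's design.
    """
    running = [0.0] * len(frames[0])
    contexts = []
    for t, frame in enumerate(frames, start=1):
        running = [r + x for r, x in zip(running, frame)]
        contexts.append([r / t for r in running])
    return contexts


def cosine_distance(a, b):
    """Return 1 - cosine similarity; defined as 1.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 1.0
    return 1.0 - dot / norm


def distillation_loss(contexts, teacher_embedding):
    """Mean cosine distance between each causal context vector and the teacher vector.

    Assumption: the teacher vector plays the role of the sentence embedding
    produced by the fine-tuned language model mentioned in the abstract.
    """
    distances = [cosine_distance(c, teacher_embedding) for c in contexts]
    return sum(distances) / len(distances)


if __name__ == "__main__":
    # Toy 2-dimensional frame embeddings and a toy teacher sentence embedding.
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    teacher = [1.0, 1.0]
    print(distillation_loss(causal_context(frames), teacher))
```

Because each context vector depends only on frames seen so far, changing a later frame leaves earlier context vectors unchanged, which is the property a streaming system with no future context needs. In a trained system the running mean would be replaced by a learned module whose parameters are updated to reduce this loss.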

Related papers