Fugu-MT Paper Translation (Abstract): CV-18 NER: Augmented Common Voice for Named Entity Recognition from Arabic Speech

Paper summary: CV-18 NER: Augmented Common Voice for Named Entity Recognition from Arabic Speech

arxiv url: http://arxiv.org/abs/2604.02209v1
Date: Thu, 02 Apr 2026 16:02:20 GMT
Status: Translation complete
Last updated in system: 2026-04-03 14:21:10.90772
Title: CV-18 NER: Augmented Common Voice for Named Entity Recognition from Arabic Speech
Title (reference translation): CV-18 NER: Augmented Common Voice for Named Entity Recognition from Arabic Speech
Authors: Youssef Saidi, Haroun Elleuch, Fethi Bougares
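Abstract summary: We introduce CV-18 NER, the first publicly available dataset for NER from Arabic speech. We benchmark both pipeline systems (ASR + text NER) and E2E models based on Whisper and AraBEST-RQ. E2E systems substantially outperform the best pipeline configuration on the test set, reaching 37.0% CoER (AraBEST-RQ 300M) and 38.0% CVER (Whisper-medium).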
Reference score (independently computed attention score): 0.6168349254390701
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: End-to-end speech Named Entity Recognition (NER) aims to directly extract entities from speech. Prior work has shown that end-to-end (E2E) approaches can outperform cascaded pipelines for English, French, and Chinese, but Arabic remains under-explored due to its morphological complexity, the absence of short vowels, and limited annotated resources. We introduce CV-18 NER, the first publicly available dataset for NER from Arabic speech, created by augmenting the Arabic Common Voice 18 corpus with manual NER annotations following the fine-grained Wojood schema (21 entity types). We benchmark both pipeline systems (ASR + text NER) and E2E models based on Whisper and AraBEST-RQ. E2E systems substantially outperform the best pipeline configuration on the test set, reaching 37.0% CoER (AraBEST-RQ 300M) and 38.0% CVER (Whisper-medium). Further analysis shows that Arabic-specific self-supervised pretraining yields strong ASR performance, while multilingual weak supervision transfers more effectively to joint speech-to-entity learning, and that larger models may be harder to adapt in this low-resource setting. Our dataset and models are publicly released at https://huggingface.co/datasets/Elyadata/CV18-NER, providing the first open benchmark for end-to-end named entity recognition from Arabic speech.
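Abstract (reference translation): End-to-end speech Named Entity Recognition (NER) aims to extract entities directly from speech. Prior work has shown that E2E approaches can outperform cascaded pipelines for English, French, and Chinese, but Arabic remains under-explored because of its morphological complexity, the absence of short vowels, and limited annotated resources. We introduce CV-18 NER, the first publicly available dataset for NER from Arabic speech, created by adding manual NER annotations to the Arabic Common Voice 18 corpus following the Wojood schema (21 entity types). We benchmark both pipeline systems (ASR + text NER) and E2E models based on Whisper and AraBEST-RQ. E2E systems substantially outperform the best pipeline configuration on the test set, reaching 37.0% CoER (AraBEST-RQ 300M) and 38.0% CVER (Whisper-medium). Further analysis shows that Arabic-specific self-supervised pretraining yields strong ASR performance, whereas multilingual weak supervision transfers more effectively to joint speech-to-entity learning, and that larger models may be harder to adapt in this low-resource setting. Our dataset and models are publicly released at https://huggingface.co/datasets/Elyadata/CV18-NER, providing the first open benchmark for end-to-end named entity recognition from Arabic speech.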
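
Note on the metrics: the abstract reports CoER and CVER without defining them. In the spoken language understanding literature these abbreviations commonly stand for Concept Error Rate and Concept-Value Error Rate, both computed like word error rate: an edit distance between reference and hypothesis sequences, divided by the reference length. The sketch below illustrates that reading only. It is an assumption, not the paper's scoring script; the function names, the entity labels, and the example values are hypothetical.

```python
from typing import Hashable, Sequence


def edit_distance(ref: Sequence[Hashable], hyp: Sequence[Hashable]) -> int:
    """Levenshtein distance between two sequences (unit cost per edit)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference item
                curr[j - 1] + 1,     # insertion of a hypothesis item
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def error_rate(ref: Sequence[Hashable], hyp: Sequence[Hashable]) -> float:
    """Edit distance normalized by the reference length."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical (label, value) pairs for one utterance; not taken from the dataset.
ref_entities = [("PERS", "Ahmed"), ("GPE", "Tunis")]
hyp_entities = [("PERS", "Ahmad"), ("GPE", "Tunis")]

# Assumed CoER: compare entity labels only.
coer = error_rate([label for label, _ in ref_entities],
                  [label for label, _ in hyp_entities])

# Assumed CVER: compare (label, value) pairs, so a wrong value counts as an error.
cver = error_rate(ref_entities, hyp_entities)

print(f"CoER = {coer:.1%}, CVER = {cver:.1%}")
```

In this toy case both labels are correct but one value is misspelled, so the label-only rate is 0% while the label-plus-value rate is 50%. Under this reading, lower is better for both metrics.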

Related papers