Fugu-MT Paper Translation (Summary): CommonVoice-SpeechRE and RPG-MoGe: Advancing Speech Relation Extraction with a New Dataset and Multi-Order Generative Framework

Paper summary: CommonVoice-SpeechRE and RPG-MoGe: Advancing Speech Relation Extraction with a New Dataset and Multi-Order Generative Framework

arxiv url: http://arxiv.org/abs/2509.08438v1
Date: Wed, 10 Sep 2025 09:35:43 GMT
Status: Translation complete
Last updated in system: 2025-09-11 15:16:52.374944
Title: CommonVoice-SpeechRE and RPG-MoGe: Advancing Speech Relation Extraction with a New Dataset and Multi-Order Generative Framework
Title (reference translation): CommonVoice-SpeechRE and RPG-MoGe: Improving Speech Relation Extraction with a New Dataset and Multi-Order Generative Framework
Authors: Jinzhong Ning, Paerhati Tulajiang, Yingying Le, Yijia Zhang, Yuanyuan Sun, Hongfei Lin, Haifeng Liu
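Abstract summary: Speech Relation Extraction (SpeechRE) aims to extract relation triplets directly from speech. Existing benchmark datasets rely heavily on synthetic data and lack sufficient quantity and diversity of real human speech. CommonVoice-SpeechRE is a large-scale dataset comprising nearly 20,000 real human speech samples from diverse speakers.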
Reference score (independently computed attention score): 21.853908675421504
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Speech Relation Extraction (SpeechRE) aims to extract relation triplets directly from speech. However, existing benchmark datasets rely heavily on synthetic data, lacking sufficient quantity and diversity of real human speech. Moreover, existing models also suffer from rigid single-order generation templates and weak semantic alignment, substantially limiting their performance. To address these challenges, we introduce CommonVoice-SpeechRE, a large-scale dataset comprising nearly 20,000 real-human speech samples from diverse speakers, establishing a new benchmark for SpeechRE research. Furthermore, we propose the Relation Prompt-Guided Multi-Order Generative Ensemble (RPG-MoGe), a novel framework that features: (1) a multi-order triplet generation ensemble strategy, leveraging data diversity through diverse element orders during both training and inference, and (2) CNN-based latent relation prediction heads that generate explicit relation prompts to guide cross-modal alignment and accurate triplet generation. Experiments show our approach outperforms state-of-the-art methods, providing both a benchmark dataset and an effective solution for real-world SpeechRE. The source code and dataset are publicly available at https://github.com/NingJinzhong/SpeechRE_RPG_MoGe.
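Abstract (reference translation): Speech Relation Extraction (SpeechRE) aims to extract relation triplets directly from speech. However, existing benchmark datasets rely heavily on synthetic data and lack sufficient quantity and diversity of real human speech. In addition, existing models suffer from rigid single-order generation templates and weak semantic alignment, which substantially limits their performance. To address these challenges, we introduce CommonVoice-SpeechRE, a large-scale dataset of nearly 20,000 real speech samples from diverse speakers, and establish a new benchmark for SpeechRE research. We also propose the Relation Prompt-Guided Multi-Order Generative Ensemble (RPG-MoGe), a new framework that combines (1) a multi-order triplet generation ensemble strategy and (2) CNN-based latent relation prediction heads that generate explicit relation prompts to guide cross-modal alignment and accurate triplet generation. Experiments show that our approach outperforms state-of-the-art methods, providing both a benchmark dataset and an effective solution for real-world SpeechRE. The source code and dataset are publicly available at https://github.com/NingJinzhong/SpeechRE_RPG_MoGe.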
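
Illustrative sketch (not from the paper's code): the abstract describes generating relation triplets under several element orders and combining the results as an ensemble, but it does not say how the targets are formatted or how the per-order predictions are merged. The Python below is a minimal sketch under stated assumptions: the `head: ... | relation: ... | tail: ...` target format, the function names `linearize`, `parse`, and `vote`, and the vote-threshold merging rule are all hypothetical choices made for illustration. The example triplet is likewise illustrative and is not taken from CommonVoice-SpeechRE.

```python
from collections import Counter
from itertools import permutations

# Canonical element names for a relation triplet (naming is an assumption).
ELEMENTS = ("head", "relation", "tail")


def linearize(triplet, order):
    """Render one triplet as a target string in the given element order."""
    return " | ".join(f"{name}: {triplet[name]}" for name in order)


def parse(text):
    """Invert linearize(): recover a canonical (head, relation, tail) tuple."""
    fields = dict(part.split(": ", 1) for part in text.split(" | "))
    return tuple(fields[name] for name in ELEMENTS)


def vote(predictions, min_votes):
    """Keep triplets predicted under at least min_votes element orders.

    predictions holds one list of canonical triplets per element order.
    Threshold voting is an assumed merging rule, not the paper's stated one.
    """
    counts = Counter(t for per_order in predictions for t in set(per_order))
    return {t for t, n in counts.items() if n >= min_votes}


# All six permutations of the three elements serve as generation orders.
orders = list(permutations(ELEMENTS))
gold = {"head": "Marie Curie", "relation": "born in", "tail": "Warsaw"}
targets = [linearize(gold, order) for order in orders]
```

In this sketch, each string in `targets` would serve as a training target for one element order, and at inference time the outputs decoded under each order would be passed through `parse` and merged with `vote`. Because every order maps back to the same canonical tuple, predictions from different orders can be compared directly.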

Related papers