Fugu-MT Paper Translation (Abstract): SpokenUS: A Spoken User Simulator for Task-Oriented Dialogue

Paper overview: SpokenUS: A Spoken User Simulator for Task-Oriented Dialogue

arxiv url: http://arxiv.org/abs/2603.16783v1
Date: Tue, 17 Mar 2026 16:58:47 GMT
Status: Translation complete
Last updated in system: 2026-03-18 17:42:07.433742
Title: SpokenUS: A Spoken User Simulator for Task-Oriented Dialogue
Title (reference translation): SpokenUS: A Spoken User Simulator for Task-Oriented Dialogue
Authors: Jonggeun Lee, Junseong Pyo, Jeongmin Park, Yohan Jo
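Abstract summary: We introduce SpokenTOD, a spoken TOD dataset of 52,390 dialogues and 1,034 hours of speech augmented with four spoken user behaviors. We present SpokenUS, a spoken user simulator grounded in TOD with a dedicated architecture for barge-in.
Reference score (independently computed attention score): 11.90483692004643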
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Robust task-oriented spoken dialogue agents require exposure to the full diversity of how people interact through speech. Building spoken user simulators that address this requires large-scale spoken task-oriented dialogue (TOD) data encompassing spoken user behaviors, yet existing datasets are limited in scale and domain coverage, with no systematic pipeline for augmenting them. To address this, we introduce SpokenTOD, a spoken TOD dataset of 52,390 dialogues and 1,034 hours of speech augmented with four spoken user behaviors -- cross-turn slots, barge-in, disfluency, and emotional prosody -- across diverse speakers and domains. Building on SpokenTOD, we present SpokenUS, a spoken user simulator grounded in TOD with a dedicated architecture for barge-in. SpokenUS achieves comparable goal coverage to significantly larger models while substantially outperforming all baselines in Human MOS, disclosing slot values gradually across the dialogue as humans do rather than front-loading them. Further analysis confirms that SpokenUS's spoken behaviors pose meaningful challenges to downstream agents, making it a practical tool for training and evaluating more robust spoken dialogue systems.
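Abstract (reference translation): Robust task-oriented spoken dialogue agents need exposure to the full diversity of how people interact through speech. Building spoken user simulators that address this need requires large-scale spoken task-oriented dialogue (TOD) data that covers spoken user behaviors, but existing datasets are limited in scale and domain coverage, and no systematic pipeline exists for augmenting them. To address this, we introduce SpokenTOD, a spoken TOD dataset of 52,390 dialogues and 1,034 hours of speech. We then present SpokenUS, a spoken user simulator built on SpokenTOD. SpokenUS achieves goal coverage comparable to that of much larger models while substantially outperforming all baselines in Human MOS, disclosing slot values gradually across the dialogue as humans do rather than front-loading them. Further analysis confirms that SpokenUS's spoken behaviors pose meaningful challenges to downstream agents, making it a practical tool for training and evaluating more robust spoken dialogue systems.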
