Fugu-MT Paper Translation (Summary): Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech

Paper summary: Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech

arxiv url: http://arxiv.org/abs/2309.09510v1
Date: Mon, 18 Sep 2023 06:43:30 GMT
Status: Translation complete
Last updated in system: 2023-09-19 14:50:56.609629
Title: Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech
Title (reference translation): Dynamic-SUPERB: Towards a Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech
Authors: Chien-yu Huang, Ke-Han Lu, Shih-Heng Wang, Chi-Yuan Hsiao, Chun-Yi Kuan, Haibin Wu, Siddhant Arora, Kai-Wei Chang, Jiatong Shi, Yifan Peng, Roshan Sharma, Shinji Watanabe, Bhiksha Ramakrishnan, Shady Shehata, Hung-yi Lee
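Abstract summary: Text language models have shown remarkable zero-shot capability in generalizing to unseen tasks when given well-formulated instructions. We present Dynamic-SUPERB, a benchmark for building universal speech models that can leverage instruction tuning to perform multiple tasks in a zero-shot fashion.
Reference score (independently computed attention score): 110.03854819655098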
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Text language models have shown remarkable zero-shot capability in generalizing to unseen tasks when provided with well-formulated instructions. However, existing studies in speech processing primarily focus on limited or specific tasks. Moreover, the lack of standardized benchmarks hinders a fair comparison across different approaches. Thus, we present Dynamic-SUPERB, a benchmark designed for building universal speech models capable of leveraging instruction tuning to perform multiple tasks in a zero-shot fashion. To achieve comprehensive coverage of diverse speech tasks and harness instruction tuning, we invite the community to collaborate and contribute, facilitating the dynamic growth of the benchmark. To initiate, Dynamic-SUPERB features 55 evaluation instances by combining 33 tasks and 22 datasets. This spans a broad spectrum of dimensions, providing a comprehensive platform for evaluation. Additionally, we propose several approaches to establish benchmark baselines. These include the utilization of speech models, text language models, and the multimodal encoder. Evaluation results indicate that while these baselines perform reasonably on seen tasks, they struggle with unseen ones. We also conducted an ablation study to assess the robustness and seek improvements in the performance. We release all materials to the public and welcome researchers to collaborate on the project, advancing technologies in the field together.
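Abstract (reference translation): Text language models show a remarkable zero-shot ability to generalize to unseen tasks when provided with well-formulated instructions. However, existing research in speech processing focuses mainly on limited or specific tasks. Moreover, the lack of a standard benchmark hinders fair comparison between different approaches. We therefore propose Dynamic-SUPERB, a benchmark for building universal speech models that leverage instruction tuning to perform multiple tasks in a zero-shot fashion. To achieve comprehensive coverage of diverse speech tasks and to harness instruction tuning, we invite the community to collaborate and contribute, facilitating the dynamic growth of the benchmark. Dynamic-SUPERB features 55 evaluation instances, formed by combining 33 tasks and 22 datasets. This spans a broad range of dimensions and provides a comprehensive platform for evaluation. In addition, we propose several approaches for establishing benchmark baselines. These include the use of speech models, text language models, and a multimodal encoder. The evaluation results show that these baselines perform reasonably on seen tasks but struggle on unseen ones. We also conducted an ablation study to assess robustness and to improve performance. We release all materials to the public and welcome researchers to collaborate on the project, advancing the technology in this field together.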


Related papers