Fugu-MT Paper Translation (Abstract): Beyond Transcripts: Iterative Peer-Editing with Audio Unlocks High-Quality Human Summaries of Conversational Speech

Paper overview: Beyond Transcripts: Iterative Peer-Editing with Audio Unlocks High-Quality Human Summaries of Conversational Speech

arxiv url: http://arxiv.org/abs/2605.17652v1
Date: Sun, 17 May 2026 21:07:36 GMT
Status: Translation complete
Last updated in system: 2026-05-19 17:57:48.299341
Title: Beyond Transcripts: Iterative Peer-Editing with Audio Unlocks High-Quality Human Summaries of Conversational Speech
Title (reference translation): High-Quality Human Summaries of Conversational Speech through Iterative Peer-Editing with Audio
Authors: Kaavya Chaparala, Thomas Thebaud, Jesús Villalba López, Laureano Moro-Velazquez, Peter Viechnicki, Najim Dehak
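Abstract summary: The paper compares human audio-based summaries to human transcript-based summaries to track the impact of the different information modalities on summary quality. Audio-based summaries are found to be less informative and more compressed than transcript summaries. These findings validate iterative peer-editing among human annotators for the creation of benchmarks informed by both lexical and prosodic information.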
Reference score (independently computed attention score): 15.050836014853017
License: http://creativecommons.org/licenses/by/4.0/
Abstract: There are not enough established benchmarks for the task of speech summarization. Creating new benchmarks demands human annotation, as LLMs could embed systemic errors and bias into datasets. We test ten annotation workflows varying input modality (audio, transcript, or both) and the inclusion of editing (self or peer-editing) to investigate potential quality tradeoffs from using human annotators to summarize audio. We compare human audio-based summaries to human transcript-based summaries to track the impact of the different information modalities on summary quality. We also compare the human outputs against four LLM benchmarks (three text, one audio) to examine whether human-written summaries are less informative than highly fluent automated outputs. We find that audio-based summaries are less informative and more compressed than transcript summaries. However, iterative peer-editing with audio mitigates this difference, enabling audio-based summaries to be as informative as their transcript counterparts and LLM summaries. These findings validate iterative peer-editing among human annotators for the creation of benchmarks informed by both lexical and prosodic information. This enables crucial dataset collection even in settings where transcripts are unavailable.
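Abstract (reference translation): Established benchmarks for the speech summarization task are insufficient. Because LLMs could embed systemic errors and bias into datasets, creating new benchmarks requires human annotation. To investigate potential quality tradeoffs from using human annotators to summarize audio, the authors test ten annotation workflows that vary the input modality (audio, transcript, or both) and the inclusion of editing (self-editing or peer-editing). They compare human audio-based summaries to human transcript-based summaries to track how the different information modalities affect summary quality. They also compare the human outputs against four LLM benchmarks (three text, one audio) to examine whether human-written summaries are less informative than highly fluent automated outputs. Audio-based summaries are found to be less informative and more compressed than transcript summaries. However, iterative peer-editing with audio mitigates this difference, making audio-based summaries as informative as their transcript counterparts and LLM summaries. These findings validate iterative peer-editing among human annotators for creating benchmarks informed by both lexical and prosodic information. This enables crucial dataset collection even in settings where transcripts are unavailable.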

Related papers