Speech vs. Transcript: Does It Matter for Human Annotators in Speech Summarization?
- URL: http://arxiv.org/abs/2408.07277v1
- Date: Mon, 12 Aug 2024 13:25:53 GMT
- Title: Speech vs. Transcript: Does It Matter for Human Annotators in Speech Summarization?
- Authors: Roshan Sharma, Suwon Shon, Mark Lindsey, Hira Dhamyal, Rita Singh, Bhiksha Raj
- Abstract summary: We examine whether summaries based on annotators listening to the recordings differ from those based on annotators reading transcripts.
We find that summaries are indeed different based on the source modality, and that speech-based summaries are more factually consistent and information-selective than transcript-based summaries.
- Score: 35.71047777304832
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Reference summaries for abstractive speech summarization require human annotation, which can be performed by listening to an audio recording or by reading textual transcripts of the recording. In this paper, we examine whether summaries based on annotators listening to the recordings differ from those based on annotators reading transcripts. We evaluate the summaries using existing intrinsic methods: human evaluation, automatic metrics, LLM-based evaluation, and a retrieval-based reference-free method. We find that summaries are indeed different based on the source modality, and that speech-based summaries are more factually consistent and information-selective than transcript-based summaries. Meanwhile, transcript-based summaries are impacted by recognition errors in the source, and expert-written summaries are more informative and reliable. We make all the collected data and analysis code public (https://github.com/cmu-mlsp/interview_humanssum) to facilitate the reproduction of our work and advance research in this area.
Related papers
- AugSumm: towards generalizable speech summarization using synthetic
labels from large language model [61.73741195292997]
Abstractive speech summarization (SSUM) aims to generate human-like summaries from speech.
Conventional SSUM models are mostly trained and evaluated with a single ground-truth (GT) human-annotated deterministic summary.
We propose AugSumm, a method to leverage large language models (LLMs) as a proxy for human annotators to generate augmented summaries.
arXiv Detail & Related papers (2024-01-10T18:39:46Z)
- On Context Utilization in Summarization with Large Language Models [83.84459732796302]
Large language models (LLMs) excel in abstractive summarization tasks, delivering fluent and pertinent summaries.
Recent advancements have extended their capabilities to handle long-input contexts, exceeding 100k tokens.
We conduct the first comprehensive study on context utilization and position bias in summarization.
arXiv Detail & Related papers (2023-10-16T16:45:12Z)
- ESSumm: Extractive Speech Summarization from Untranscribed Meeting [7.309214379395552]
We propose a novel architecture for direct extractive speech-to-speech summarization, ESSumm.
We leverage the off-the-shelf self-supervised convolutional neural network to extract the deep speech features from raw audio.
Our approach automatically predicts the optimal sequence of speech segments that capture the key information with a target summary length.
arXiv Detail & Related papers (2022-09-14T20:13:15Z)
- Towards Abstractive Grounded Summarization of Podcast Transcripts [33.268079036601634]
Summarization of podcast transcripts is of practical benefit to both content providers and consumers.
It helps consumers quickly decide whether to listen to a podcast and reduces the burden on content providers of writing summaries.
However, podcast summarization faces significant challenges including factual inconsistencies with respect to the inputs.
arXiv Detail & Related papers (2022-03-22T02:44:39Z)
- StreamHover: Livestream Transcript Summarization and Annotation [54.41877742041611]
We present StreamHover, a framework for annotating and summarizing livestream transcripts.
With a total of over 500 hours of videos annotated with both extractive and abstractive summaries, our benchmark dataset is significantly larger than currently existing annotated corpora.
We show that our model generalizes better and improves performance over strong baselines.
arXiv Detail & Related papers (2021-09-11T02:19:37Z)
- Extractive Summarization of Call Transcripts [77.96603959765577]
This paper presents an in-house method that combines topic modeling and sentence selection with punctuation restoration for ill-punctuated or un-punctuated call transcripts.
Extensive testing, evaluation, and comparisons demonstrate the efficacy of this summarizer for call transcript summarization.
arXiv Detail & Related papers (2021-03-19T02:40:59Z)
- Abstractive Summarization of Spoken and Written Instructions with BERT [66.14755043607776]
We present the first application of the BERTSum model to conversational language.
We generate abstractive summaries of narrated instructional videos across a wide variety of topics.
We envision this being integrated as a feature in intelligent virtual assistants, enabling them to summarize both written and spoken instructional content upon request.
arXiv Detail & Related papers (2020-08-21T20:59:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.