Towards Abstractive Grounded Summarization of Podcast Transcripts
- URL: http://arxiv.org/abs/2203.11425v1
- Date: Tue, 22 Mar 2022 02:44:39 GMT
- Title: Towards Abstractive Grounded Summarization of Podcast Transcripts
- Authors: Kaiqiang Song, Chen Li, Xiaoyang Wang, Dong Yu, Fei Liu
- Abstract summary: Summarization of podcast transcripts is of practical benefit to both content providers and consumers.
It helps consumers to quickly decide whether they will listen to the podcasts and reduces the load of content providers to write summaries.
However, podcast summarization faces significant challenges including factual inconsistencies with respect to the inputs.
- Score: 33.268079036601634
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Podcasts have recently shown a rapid rise in popularity. Summarization of
podcast transcripts is of practical benefit to both content providers and
consumers. It helps consumers to quickly decide whether they will listen to the
podcasts and reduces the cognitive load of content providers to write
summaries. Nevertheless, podcast summarization faces significant challenges
including factual inconsistencies with respect to the inputs. The problem is
exacerbated by speech disfluencies and recognition errors in transcripts of
spoken language. In this paper, we explore a novel abstractive summarization
method to alleviate these challenges. Specifically, our approach learns to
produce an abstractive summary while grounding summary segments in specific
portions of the transcript to allow for full inspection of summary details. We
conduct a series of analyses of the proposed approach on a large podcast
dataset and show that the approach can achieve promising results. Grounded
summaries bring clear benefits in locating the summary and transcript segments
that contain inconsistent information, and hence significantly improve
summarization quality in both automatic and human evaluation metrics.
Related papers
- Speech vs. Transcript: Does It Matter for Human Annotators in Speech Summarization? [35.71047777304832]
We examine whether summaries based on annotators listening to the recordings differ from those based on annotators reading transcripts.
We find that summaries are indeed different based on the source modality, and that speech-based summaries are more factually consistent and information-selective than transcript-based summaries.
arXiv Detail & Related papers (2024-08-12T13:25:53Z) - Factually Consistent Summarization via Reinforcement Learning with
Textual Entailment Feedback [57.816210168909286]
We leverage recent progress on textual entailment models to address this problem for abstractive summarization systems.
We use reinforcement learning with reference-free, textual entailment rewards to optimize for factual consistency.
Our results, according to both automatic metrics and human evaluation, show that our method considerably improves the faithfulness, salience, and conciseness of the generated summaries.
arXiv Detail & Related papers (2023-05-31T21:04:04Z) - Salience Allocation as Guidance for Abstractive Summarization [61.31826412150143]
We propose a novel summarization approach with a flexible and reliable salience guidance, namely SEASON (SaliencE Allocation as Guidance for Abstractive SummarizatiON)
SEASON utilizes the allocation of salience expectation to guide abstractive summarization and adapts well to articles in different abstractiveness.
arXiv Detail & Related papers (2022-10-22T02:13:44Z) - StreamHover: Livestream Transcript Summarization and Annotation [54.41877742041611]
We present StreamHover, a framework for annotating and summarizing livestream transcripts.
With a total of over 500 hours of videos annotated with both extractive and abstractive summaries, our benchmark dataset is significantly larger than currently existing annotated corpora.
We show that our model generalizes better and improves performance over strong baselines.
arXiv Detail & Related papers (2021-09-11T02:19:37Z) - Learning Opinion Summarizers by Selecting Informative Reviews [81.47506952645564]
We collect a large dataset of summaries paired with user reviews for over 31,000 products, enabling supervised training.
The content of many reviews is not reflected in the human-written summaries, and, thus, the summarizer trained on random review subsets hallucinates.
We formulate the task as jointly learning to select informative subsets of reviews and summarizing the opinions expressed in these subsets.
arXiv Detail & Related papers (2021-09-09T15:01:43Z) - Extractive Summarization of Call Transcripts [77.96603959765577]
This paper presents an indigenously developed method that combines topic modeling and sentence selection with punctuation restoration in ill-punctuated or un-punctuated call transcripts.
Extensive testing, evaluation and comparisons have demonstrated the efficacy of this summarizer for call transcript summarization.
arXiv Detail & Related papers (2021-03-19T02:40:59Z) - Detecting Extraneous Content in Podcasts [6.335863593761816]
We present a model that leverage both textual and listening patterns to detect extraneous content in podcast descriptions and audio transcripts.
We show that our models can substantively improve ROUGE scores and reduce the extraneous content generated in the summaries.
arXiv Detail & Related papers (2021-03-03T18:30:50Z) - A Two-Phase Approach for Abstractive Podcast Summarization [18.35061145103997]
podcast summarization is different from summarization of other data formats.
We propose a two-phase approach: sentence selection and seq2seq learning.
Our approach achieves promising results regarding both ROUGE-based measures and human evaluations.
arXiv Detail & Related papers (2020-11-16T21:31:28Z) - Automatic Summarization of Open-Domain Podcast Episodes [33.268079036601634]
We present implementation details of our abstractive summarizers that achieve competitive results on the Podcast Summarization task of TREC 2020.
Our best system achieves a quality rating of 1.559 judged by NIST evaluators---an absolute increase of 0.268 (+21%) over the creator descriptions.
arXiv Detail & Related papers (2020-11-09T01:31:05Z) - PodSumm -- Podcast Audio Summarization [0.0]
We propose a method to automatically construct a podcast summary via guidance from the text-domain.
Motivated by a lack of datasets for this task, we curate an internal dataset, find an effective scheme for data augmentation, and design a protocol to gather summaries from annotators.
Our method achieves ROUGE-F(1/2/L) scores of 0.63/0.53/0.63 on our dataset.
arXiv Detail & Related papers (2020-09-22T04:49:33Z) - Unsupervised Opinion Summarization with Noising and Denoising [85.49169453434554]
We create a synthetic dataset from a corpus of user reviews by sampling a review, pretending it is a summary, and generating noisy versions thereof.
At test time, the model accepts genuine reviews and generates a summary containing salient opinions, treating those that do not reach consensus as noise.
arXiv Detail & Related papers (2020-04-21T16:54:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.