Related papers: Spotify at TREC 2020: Genre-Aware Abstractive Podcast Summarization

URL: http://arxiv.org/abs/2104.03343v1
Date: Wed, 7 Apr 2021 18:27:28 GMT
Title: Spotify at TREC 2020: Genre-Aware Abstractive Podcast Summarization
Authors: Rezvaneh Rezapour and Sravana Reddy and Ann Clifton and Rosie Jones
Abstract summary: The goal of this challenge was to generate short, informative summaries that contain the key information present in a podcast episode. We propose two summarization models that explicitly take genre and named entities into consideration. Our models are abstractive, and supervised using creator-provided descriptions as ground truth summaries.
Score: 4.456617185465443
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper contains the description of our submissions to the summarization task of the Podcast Track in TREC (the Text REtrieval Conference) 2020. The goal of this challenge was to generate short, informative summaries that contain the key information present in a podcast episode using automatically generated transcripts of the podcast audio. Since podcasts vary with respect to their genre, topic, and granularity of information, we propose two summarization models that explicitly take genre and named entities into consideration in order to generate summaries appropriate to the style of the podcasts. Our models are abstractive, and supervised using creator-provided descriptions as ground truth summaries. The results of the submitted summaries show that our best model achieves an aggregate quality score of 1.58 in comparison to the creator descriptions and a baseline abstractive system which both score 1.49 (an improvement of 9%) as assessed by human evaluators.

Related papers

Rhapsody: A Dataset for Highlight Detection in Podcasts [49.1662517033426]
We introduce Rhapsody, a feature paired with segment-level highlight from YouTube's 'most replayed' episodes. We frame the podcast highlight detection as a segment-level binary classification task. Models finetuned with in-domain data significantly outperform their zero-shot performance. These findings highlight the challenges for fine-grained information access in long-form spoken media.
arXiv Detail & Related papers (2025-05-26T02:39:34Z)
MoonCast: High-Quality Zero-Shot Podcast Generation [81.29927724674602]
MoonCast is a solution for high-quality zero-shot podcast generation. It aims to synthesize natural podcast-style speech from text-only sources. Experiments demonstrate that MoonCast outperforms baselines.
arXiv Detail & Related papers (2025-03-18T15:25:08Z)
Movie101v2: Improved Movie Narration Benchmark [53.54176725112229]
Automatic movie narration aims to generate video-aligned plot descriptions to assist visually impaired audiences. We introduce Movie101v2, a large-scale, bilingual dataset with enhanced data quality specifically designed for movie narration. Based on our new benchmark, we baseline a range of large vision-language models, including GPT-4V, and conduct an in-depth analysis of the challenges in narration generation.
arXiv Detail & Related papers (2024-04-20T13:15:27Z)
AugSumm: towards generalizable speech summarization using synthetic labels from large language model [61.73741195292997]
Abstractive speech summarization (SSUM) aims to generate human-like summaries from speech. Conventional SSUM models are mostly trained and evaluated with a single ground-truth (GT) human-annotated deterministic summary. We propose AugSumm, a method to leverage large language models (LLMs) as a proxy for human annotators to generate augmented summaries.
arXiv Detail & Related papers (2024-01-10T18:39:46Z)
Fine-grained Audible Video Description [61.81122862375985]
We construct the first fine-grained audible video description benchmark (FAVDBench). For each video clip, we first provide a one-sentence summary of the video, followed by 4-6 sentences describing the visual details and 1-2 audio-related descriptions at the end. We demonstrate that employing fine-grained video descriptions can create more intricate videos than using captions.
arXiv Detail & Related papers (2023-03-27T22:03:48Z)
Towards Abstractive Grounded Summarization of Podcast Transcripts [33.268079036601634]
Summarization of podcast transcripts is of practical benefit to both content providers and consumers. It helps consumers to quickly decide whether they will listen to the podcasts and reduces the load of content providers to write summaries. However, podcast summarization faces significant challenges including factual inconsistencies with respect to the inputs.
arXiv Detail & Related papers (2022-03-22T02:44:39Z)
StreamHover: Livestream Transcript Summarization and Annotation [54.41877742041611]
We present StreamHover, a framework for annotating and summarizing livestream transcripts. With a total of over 500 hours of videos annotated with both extractive and abstractive summaries, our benchmark dataset is significantly larger than currently existing annotated corpora. We show that our model generalizes better and improves performance over strong baselines.
arXiv Detail & Related papers (2021-09-11T02:19:37Z)
SummScreen: A Dataset for Abstractive Screenplay Summarization [52.56760815805357]
SummScreen is a dataset comprised of pairs of TV series transcripts and human written recaps. Plot details are often expressed indirectly in character dialogues and may be scattered across the entirety of the transcript. Since characters are fundamental to TV series, we also propose two entity-centric evaluation metrics.
arXiv Detail & Related papers (2021-04-14T19:37:40Z)
CUED_speech at TREC 2020 Podcast Summarisation Track [1.776746672434207]
Given a podcast episode with its transcription, the goal is to generate a summary that captures the most important information in the content. Our approach consists of two steps: (1) Filtering redundant or less informative sentences in the transcription using the attention of a hierarchical model; (2) Applying a state-of-the-art text summarisation system (BART) fine-tuned on the Podcast data using a sequence-level reward function. Our system won the Spotify Podcast Summarisation Challenge in the TREC 2020 Podcast Track in both human and automatic evaluation.
arXiv Detail & Related papers (2020-12-04T11:32:55Z)
A Two-Phase Approach for Abstractive Podcast Summarization [18.35061145103997]
Podcast summarization is different from summarization of other data formats. We propose a two-phase approach: sentence selection and seq2seq learning. Our approach achieves promising results regarding both ROUGE-based measures and human evaluations.
arXiv Detail & Related papers (2020-11-16T21:31:28Z)
PodSumm -- Podcast Audio Summarization [0.0]
We propose a method to automatically construct a podcast summary via guidance from the text-domain. Motivated by a lack of datasets for this task, we curate an internal dataset, find an effective scheme for data augmentation, and design a protocol to gather summaries from annotators. Our method achieves ROUGE-F(1/2/L) scores of 0.63/0.53/0.63 on our dataset.
arXiv Detail & Related papers (2020-09-22T04:49:33Z)
A Baseline Analysis for Podcast Abstractive Summarization [18.35061145103997]
This paper presents a baseline analysis of podcast summarization using the Spotify Podcast dataset. It aims to help researchers understand current state-of-the-art pre-trained models and hence build a foundation for creating better models.
arXiv Detail & Related papers (2020-08-24T18:38:42Z)
Abstractive Summarization of Spoken and Written Instructions with BERT [66.14755043607776]
We present the first application of the BERTSum model to conversational language. We generate abstractive summaries of narrated instructional videos across a wide variety of topics. We envision this integrated as a feature in intelligent virtual assistants, enabling them to summarize both written and spoken instructional content upon request.
arXiv Detail & Related papers (2020-08-21T20:59:34Z)
