CUED_speech at TREC 2020 Podcast Summarisation Track
- URL: http://arxiv.org/abs/2012.02535v2
- Date: Wed, 13 Jan 2021 11:33:16 GMT
- Title: CUED_speech at TREC 2020 Podcast Summarisation Track
- Authors: Potsawee Manakul and Mark Gales
- Abstract summary: Given a podcast episode with its transcription, the goal is to generate a summary that captures the most important information in the content.
Our approach consists of two steps: (1) Filtering redundant or less informative sentences in the transcription using the attention of a hierarchical model; (2) Applying a state-of-the-art text summarisation system (BART) fine-tuned on the Podcast data using a sequence-level reward function.
Our system won the Spotify Podcast Summarisation Challenge in the TREC 2020 Podcast Track in both human and automatic evaluation.
- Score: 1.776746672434207
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we describe our approach for the Podcast Summarisation
challenge in TREC 2020. Given a podcast episode with its transcription, the
goal is to generate a summary that captures the most important information in
the content. Our approach consists of two steps: (1) Filtering redundant or
less informative sentences in the transcription using the attention of a
hierarchical model; (2) Applying a state-of-the-art text summarisation system
(BART) fine-tuned on the Podcast data using a sequence-level reward function.
Furthermore, we perform ensembles of three and nine models for our submission
runs. We also fine-tune the BART model on the Podcast data as our baseline. The
human evaluation by NIST shows that our best submission achieves 1.777 on the
EGFB scale, while the score of the creator-provided description is 1.291. Our
system won the Spotify Podcast Summarisation Challenge in the TREC 2020 Podcast
Track in both human and automatic evaluation.
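To make step (1) concrete, the following is a minimal sketch of attention-based sentence filtering in Python. It assumes the hierarchical model exposes one attention weight per transcript sentence; the function name, the whitespace token count, and the 1024-token budget (BART's maximum input length) are illustrative assumptions, not the paper's exact procedure.

    # Hypothetical sketch of step (1): filter a transcript using per-sentence
    # attention weights from a hierarchical model. The interface and token
    # budget are assumptions; the paper's actual filtering may differ.
    from typing import List

    def filter_sentences(sentences: List[str],
                         attention: List[float],
                         max_tokens: int = 1024) -> List[str]:
        """Keep the highest-attention sentences within a token budget,
        then restore the original transcript order."""
        assert len(sentences) == len(attention)
        # Rank sentence indices by attention weight, highest first.
        ranked = sorted(range(len(sentences)),
                        key=lambda i: attention[i], reverse=True)
        kept, used = set(), 0
        for i in ranked:
            n_tokens = len(sentences[i].split())  # crude whitespace count
            if used + n_tokens > max_tokens:
                continue  # this sentence would exceed the budget
            kept.add(i)
            used += n_tokens
        # Preserve the original order so the summariser sees a coherent,
        # chronologically ordered (but shortened) transcript.
        return [sentences[i] for i in sorted(kept)]

Restoring the original order matters: the downstream summariser then receives a shortened but still chronological transcript that fits within BART's input limit.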
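Step (2) fine-tunes BART with a sequence-level reward. Since the exact reward function is not spelled out above, the sketch below uses a self-critical REINFORCE objective with ROUGE-L F1 as a stand-in reward; the checkpoint name, sampling settings, and learning rate are likewise assumptions.

    # Hypothetical sketch of step (2): sequence-level reward fine-tuning of
    # BART via self-critical REINFORCE, with ROUGE-L F1 as a stand-in reward.
    import torch
    from transformers import BartForConditionalGeneration, BartTokenizer

    tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
    model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)

    def rouge_l_f1(candidate: str, reference: str) -> float:
        """ROUGE-L F1 via longest common subsequence of whitespace tokens."""
        c, r = candidate.split(), reference.split()
        dp = [[0] * (len(r) + 1) for _ in range(len(c) + 1)]
        for i in range(len(c)):
            for j in range(len(r)):
                dp[i + 1][j + 1] = (dp[i][j] + 1 if c[i] == r[j]
                                    else max(dp[i][j + 1], dp[i + 1][j]))
        lcs = dp[-1][-1]
        if lcs == 0:
            return 0.0
        prec, rec = lcs / len(c), lcs / len(r)
        return 2 * prec * rec / (prec + rec)

    def reinforce_step(transcript: str, reference: str) -> None:
        inputs = tokenizer(transcript, truncation=True, max_length=1024,
                           return_tensors="pt")
        # Sample a summary and score it against the reference.
        sampled = model.generate(**inputs, do_sample=True, max_length=128)
        reward = rouge_l_f1(
            tokenizer.decode(sampled[0], skip_special_tokens=True), reference)
        # Greedy decode as a self-critical baseline to reduce variance.
        greedy = model.generate(**inputs, do_sample=False, max_length=128)
        baseline = rouge_l_f1(
            tokenizer.decode(greedy[0], skip_special_tokens=True), reference)
        # out.loss is the mean NLL of the sampled ids (used directly as
        # labels for brevity); scaling by the advantage raises the
        # likelihood of samples that beat the greedy baseline.
        out = model(**inputs, labels=sampled)
        loss = (reward - baseline) * out.loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

The greedy baseline makes the gradient estimate self-critical: only samples that outscore the model's own greedy output are reinforced, which keeps the variance of the REINFORCE estimator manageable.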
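For the ensemble runs, a standard way to combine seq2seq models is to average their per-step log-probabilities during decoding. The sketch below shows this for greedy decoding; whether the three- and nine-model ensembles are combined exactly this way is an assumption.

    # Hypothetical sketch of ensemble decoding: average next-token
    # log-probabilities across several fine-tuned checkpoints.
    import torch

    @torch.no_grad()
    def ensemble_greedy_decode(models, input_ids, attention_mask,
                               start_id: int, eos_id: int,
                               max_len: int = 128):
        decoder_ids = torch.full((input_ids.size(0), 1), start_id,
                                 dtype=torch.long)
        for _ in range(max_len):
            # Average the next-token log-probabilities of all members.
            step_log_probs = torch.stack([
                m(input_ids=input_ids, attention_mask=attention_mask,
                  decoder_input_ids=decoder_ids).logits[:, -1].log_softmax(-1)
                for m in models
            ]).mean(dim=0)
            next_id = step_log_probs.argmax(dim=-1, keepdim=True)
            decoder_ids = torch.cat([decoder_ids, next_id], dim=-1)
            if (next_id == eos_id).all():
                break
        return decoder_ids

In practice one would plug the averaged scores into beam search rather than greedy decoding; greedy decoding simply keeps the sketch short.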
Related papers
- TokenSplit: Using Discrete Speech Representations for Direct, Refined, and Transcript-Conditioned Speech Separation and Recognition [51.565319173790314]
TokenSplit is a sequence-to-sequence encoder-decoder model that uses the Transformer architecture.
We show that our model achieves excellent performance in terms of separation, both with and without transcript conditioning.
We also measure the automatic speech recognition (ASR) performance and provide audio samples of speech synthesis to demonstrate the additional utility of our model.
arXiv Detail & Related papers (2023-08-21T01:52:01Z)
- Learning to Ground Instructional Articles in Videos through Narrations [50.3463147014498]
We present an approach for localizing steps of procedural activities in narrated how-to videos.
We source the step descriptions from a language knowledge base (wikiHow) containing instructional articles.
Our model learns to temporally ground the steps of procedural articles in how-to videos by matching three modalities.
arXiv Detail & Related papers (2023-06-06T15:45:53Z)
- Generating EDU Extracts for Plan-Guided Summary Re-Ranking [77.7752504102925]
Two-step approaches, in which summary candidates are generated-then-reranked to return a single summary, can improve ROUGE scores over the standard single-step approach.
We design a novel method to generate summary candidates for re-ranking.
We show large relevance improvements over previously published methods on widely used single document news article corpora.
arXiv Detail & Related papers (2023-05-28T17:22:04Z)
- SVTS: Scalable Video-to-Speech Synthesis [105.29009019733803]
We introduce a scalable video-to-speech framework consisting of two components: a video-to-spectrogram predictor and a pre-trained neural vocoder.
We are the first to show intelligible results on the challenging LRS3 dataset.
arXiv Detail & Related papers (2022-05-04T13:34:07Z)
- Spotify at TREC 2020: Genre-Aware Abstractive Podcast Summarization [4.456617185465443]
The goal of this challenge was to generate short, informative summaries that contain the key information present in a podcast episode.
We propose two summarization models that explicitly take genre and named entities into consideration.
Our models are abstractive, and supervised using creator-provided descriptions as ground truth summaries.
arXiv Detail & Related papers (2021-04-07T18:27:28Z)
- Detecting Extraneous Content in Podcasts [6.335863593761816]
We present a model that leverages both textual and listening patterns to detect extraneous content in podcast descriptions and audio transcripts.
We show that our models can substantively improve ROUGE scores and reduce the extraneous content generated in the summaries.
arXiv Detail & Related papers (2021-03-03T18:30:50Z)
- A Two-Phase Approach for Abstractive Podcast Summarization [18.35061145103997]
Podcast summarization differs from summarization of other data formats.
We propose a two-phase approach: sentence selection and seq2seq learning.
Our approach achieves promising results regarding both ROUGE-based measures and human evaluations.
arXiv Detail & Related papers (2020-11-16T21:31:28Z)
- PodSumm -- Podcast Audio Summarization [0.0]
We propose a method to automatically construct a podcast summary via guidance from the text domain.
Motivated by a lack of datasets for this task, we curate an internal dataset, find an effective scheme for data augmentation, and design a protocol to gather summaries from annotators.
Our method achieves ROUGE-F(1/2/L) scores of 0.63/0.53/0.63 on our dataset.
arXiv Detail & Related papers (2020-09-22T04:49:33Z)
- A Baseline Analysis for Podcast Abstractive Summarization [18.35061145103997]
This paper presents a baseline analysis of podcast summarization using the Spotify Podcast dataset.
It aims to help researchers understand current state-of-the-art pre-trained models and hence build a foundation for creating better models.
arXiv Detail & Related papers (2020-08-24T18:38:42Z)
- Abstractive Summarization of Spoken and Written Instructions with BERT [66.14755043607776]
We present the first application of the BERTSum model to conversational language.
We generate abstractive summaries of narrated instructional videos across a wide variety of topics.
We envision this integrated as a feature in intelligent virtual assistants, enabling them to summarize both written and spoken instructional content upon request.
arXiv Detail & Related papers (2020-08-21T20:59:34Z)
- Audio ALBERT: A Lite BERT for Self-supervised Learning of Audio Representation [51.37980448183019]
We propose Audio ALBERT, a lite version of the self-supervised speech representation model.
We show that Audio ALBERT achieves performance competitive with those much larger models on downstream tasks.
In probing experiments, we find that the latent representations encode richer phoneme and speaker information than the last layer does.
arXiv Detail & Related papers (2020-05-18T10:42:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.