Related papers: A Two-Phase Approach for Abstractive Podcast Summarization

URL: http://arxiv.org/abs/2011.08291v1
Date: Mon, 16 Nov 2020 21:31:28 GMT
Title: A Two-Phase Approach for Abstractive Podcast Summarization
Authors: Chujie Zheng, Kunpeng Zhang, Harry Jiannan Wang, Ling Fan
Abstract summary: Podcast summarization is different from summarization of other data formats. We propose a two-phase approach: sentence selection and seq2seq learning. Our approach achieves promising results regarding both ROUGE-based measures and human evaluations.
Score: 18.35061145103997
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Podcast summarization is different from summarization of other data formats, such as news, patents, and scientific papers in that podcasts are often longer, conversational, colloquial, and full of sponsorship and advertising information, which imposes great challenges for existing models. In this paper, we focus on abstractive podcast summarization and propose a two-phase approach: sentence selection and seq2seq learning. Specifically, we first select important sentences from the noisy long podcast transcripts. The selection is based on sentence similarity to the reference to reduce the redundancy and the associated latent topics to preserve semantics. Then the selected sentences are fed into a pre-trained encoder-decoder framework for the summary generation. Our approach achieves promising results regarding both ROUGE-based measures and human evaluations.
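The selection phase described in the abstract can be illustrated with a minimal sketch. It assumes a ROUGE-1-style unigram F1 as the similarity measure, which the abstract does not specify, and it leaves out both the latent-topic criterion and the seq2seq generation phase. The function names (`tokenize`, `unigram_f1`, `select_sentences`) and the sample transcript are hypothetical and are not taken from the paper.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def unigram_f1(candidate, reference):
    """ROUGE-1-style F1 overlap between two token lists (assumed measure)."""
    if not candidate or not reference:
        return 0.0
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def select_sentences(sentences, reference, k=2):
    """Keep the k sentences most similar to the reference, in transcript order."""
    ref_tokens = tokenize(reference)
    scored = [(unigram_f1(tokenize(s), ref_tokens), i) for i, s in enumerate(sentences)]
    # Rank by score (descending), breaking ties by earlier position.
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:k]
    # Restore the original transcript order for the selected sentences.
    return [sentences[i] for _, i in sorted(top, key=lambda pair: pair[1])]


# Hypothetical toy transcript with sponsorship noise.
transcript = [
    "This episode is sponsored by a mattress company.",
    "Today we talk about training for your first marathon.",
    "Use the promo code for ten percent off.",
    "Our guest explains how to build a weekly running plan.",
]
reference = "A guest explains marathon training and how to build a running plan."
selected = select_sentences(transcript, reference, k=2)
print(selected)
```

In this toy example the two sponsorship sentences score lowest against the reference and are dropped. In the approach the abstract describes, the selected sentences would then be passed to a pre-trained encoder-decoder model for summary generation.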

Related papers

Rhapsody: A Dataset for Highlight Detection in Podcasts [49.1662517033426]
We introduce Rhapsody, a feature paired with segment-level highlight from YouTube's 'most replayed' episodes. We frame the podcast highlight detection as a segment-level binary classification task. Models finetuned with in-domain data significantly outperform their zero-shot performance. These findings highlight the challenges for fine-grained information access in long-form spoken media.
arXiv Detail & Related papers (2025-05-26T02:39:34Z)
MoonCast: High-Quality Zero-Shot Podcast Generation [81.29927724674602]
MoonCast is a solution for high-quality zero-shot podcast generation. It aims to synthesize natural podcast-style speech from text-only sources. Experiments demonstrate that MoonCast outperforms baselines.
arXiv Detail & Related papers (2025-03-18T15:25:08Z)
TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation [97.54885207518946]
We introduce a novel model framework TransVIP that leverages diverse datasets in a cascade fashion. We propose two separated encoders to preserve the speaker's voice characteristics and isochrony from the source speech during the translation process. Our experiments on the French-English language pair demonstrate that our model outperforms the current state-of-the-art speech-to-speech translation model.
arXiv Detail & Related papers (2024-05-28T04:11:37Z)
Aspect-based Meeting Transcript Summarization: A Two-Stage Approach with Weak Supervision on Sentence Classification [91.13086984529706]
Aspect-based meeting transcript summarization aims to produce multiple summaries. Traditional summarization methods produce one summary mixing information of all aspects. We propose a two-stage method for aspect-based meeting transcript summarization.
arXiv Detail & Related papers (2023-11-07T19:06:31Z)
A Survey on Audio Diffusion Models: Text To Speech Synthesis and Enhancement in Generative AI [64.71397830291838]
Generative AI has demonstrated impressive performance in various fields, among which speech synthesis is an interesting direction. With the diffusion model as the most popular generative model, numerous works have attempted two active tasks: text to speech and speech enhancement. This work conducts a survey on audio diffusion model, which is complementary to existing surveys.
arXiv Detail & Related papers (2023-03-23T15:17:15Z)
Towards Abstractive Grounded Summarization of Podcast Transcripts [33.268079036601634]
Summarization of podcast transcripts is of practical benefit to both content providers and consumers. It helps consumers to quickly decide whether they will listen to the podcasts and reduces the load of content providers to write summaries. However, podcast summarization faces significant challenges including factual inconsistencies with respect to the inputs.
arXiv Detail & Related papers (2022-03-22T02:44:39Z)
Topic Modeling on Podcast Short-Text Metadata [0.9539495585692009]
We assess the feasibility of discovering relevant topics from podcast metadata, titles and descriptions, using modeling techniques for short text. We propose a new strategy to leverage named entities (NEs), often present in podcast metadata, in a Non-negative Matrix Factorization modeling framework. Our experiments on two existing datasets from Spotify and iTunes and Deezer show that our proposed document representation, NEiCE, leads to improved coherence over the baselines.
arXiv Detail & Related papers (2022-01-12T11:07:05Z)
Spotify at TREC 2020: Genre-Aware Abstractive Podcast Summarization [4.456617185465443]
The goal of this challenge was to generate short, informative summaries that contain the key information present in a podcast episode. We propose two summarization models that explicitly take genre and named entities into consideration. Our models are abstractive, and supervised using creator-provided descriptions as ground truth summaries.
arXiv Detail & Related papers (2021-04-07T18:27:28Z)
Multi-View Sequence-to-Sequence Models with Conversational Structure for Abstractive Dialogue Summarization [72.54873655114844]
Text summarization is one of the most challenging and interesting problems in NLP. This work proposes a multi-view sequence-to-sequence model by first extracting conversational structures of unstructured daily chats from different views to represent conversations. Experiments on a large-scale dialogue summarization corpus demonstrated that our methods significantly outperformed previous state-of-the-art models via both automatic evaluations and human judgment.
arXiv Detail & Related papers (2020-10-04T20:12:44Z)
PodSumm -- Podcast Audio Summarization [0.0]
We propose a method to automatically construct a podcast summary via guidance from the text-domain. Motivated by a lack of datasets for this task, we curate an internal dataset, find an effective scheme for data augmentation, and design a protocol to gather summaries from annotators. Our method achieves ROUGE-F(1/2/L) scores of 0.63/0.53/0.63 on our dataset.
arXiv Detail & Related papers (2020-09-22T04:49:33Z)
Unsupervised Abstractive Dialogue Summarization for Tete-a-Tetes [49.901984490961624]
We propose the first unsupervised abstractive dialogue summarization model for tete-a-tetes (SuTaT). SuTaT consists of a conditional generative module and two unsupervised summarization modules. Experimental results show that SuTaT is superior on unsupervised dialogue summarization for both automatic and human evaluations.
arXiv Detail & Related papers (2020-09-15T03:27:52Z)
A Baseline Analysis for Podcast Abstractive Summarization [18.35061145103997]
This paper presents a baseline analysis of podcast summarization using the Spotify Podcast dataset. It aims to help researchers understand current state-of-the-art pre-trained models and hence build a foundation for creating better models.
arXiv Detail & Related papers (2020-08-24T18:38:42Z)
Abstractive Summarization of Spoken and Written Instructions with BERT [66.14755043607776]
We present the first application of the BERTSum model to conversational language. We generate abstractive summaries of narrated instructional videos across a wide variety of topics. We envision this integrated as a feature in intelligent virtual assistants, enabling them to summarize both written and spoken instructional content upon request.
arXiv Detail & Related papers (2020-08-21T20:59:34Z)

This list is automatically generated from the titles and abstracts of the papers on this site.