A Baseline Analysis for Podcast Abstractive Summarization
- URL: http://arxiv.org/abs/2008.10648v2
- Date: Wed, 26 Aug 2020 01:32:36 GMT
- Title: A Baseline Analysis for Podcast Abstractive Summarization
- Authors: Chujie Zheng, Harry Jiannan Wang, Kunpeng Zhang, Ling Fan
- Abstract summary: This paper presents a baseline analysis of podcast summarization using the Spotify Podcast dataset.
It aims to help researchers understand current state-of-the-art pre-trained models and hence build a foundation for creating better models.
- Score: 18.35061145103997
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Podcast summary, an important factor affecting end-users' listening
decisions, has often been considered a critical feature in podcast
recommendation systems, as well as many downstream applications. Existing
abstractive summarization approaches are mainly built on fine-tuned models on
professionally edited texts such as CNN and DailyMail news. Different from
news, podcasts are often longer, more colloquial and conversational, and
noisier with contents on commercials and sponsorship, which makes automatic
podcast summarization extremely challenging. This paper presents a baseline
analysis of podcast summarization using the Spotify Podcast Dataset provided by
TREC 2020. It aims to help researchers understand current state-of-the-art
pre-trained models and hence build a foundation for creating better models.
Related papers
- Mapping the Podcast Ecosystem with the Structured Podcast Research Corpus [23.70786221902932]
We introduce a massive dataset of over 1.1M podcast transcripts available through public RSS feeds from May and June of 2020.
This data is not limited to text, but rather includes audio features and speaker turns for a subset of 370K episodes.
Using this data, we also conduct a foundational investigation into the content, structure, and responsiveness of this popular impactful medium.
arXiv Detail & Related papers (2024-11-12T15:56:48Z) - Interactive Topic Models with Optimal Transport [75.26555710661908]
We present EdTM, as an approach for label name supervised topic modeling.
EdTM models topic modeling as an assignment problem while leveraging LM/LLM based document-topic affinities.
arXiv Detail & Related papers (2024-06-28T13:57:27Z) - Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization [70.13218512896032]
Generation of audio from text prompts is an important aspect of such processes in the music and film industry.
Our hypothesis is focusing on how these aspects of audio generation could improve audio generation performance in the presence of limited data.
We synthetically create a preference dataset where each prompt has a winner audio output and some loser audio outputs for the diffusion model to learn from.
arXiv Detail & Related papers (2024-04-15T17:31:22Z) - AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension [95.8442896569132]
We introduce AIR-Bench, the first benchmark to evaluate the ability of Large Audio-Language Models (LALMs) to understand various types of audio signals and interact with humans in the textual format.
Results demonstrate a high level of consistency between GPT-4-based evaluation and human evaluation.
arXiv Detail & Related papers (2024-02-12T15:41:22Z) - A Survey on Audio Diffusion Models: Text To Speech Synthesis and
Enhancement in Generative AI [64.71397830291838]
Generative AI has demonstrated impressive performance in various fields, among which speech synthesis is an interesting direction.
With the diffusion model as the most popular generative model, numerous works have attempted two active tasks: text to speech and speech enhancement.
This work conducts a survey on audio diffusion model, which is complementary to existing surveys.
arXiv Detail & Related papers (2023-03-23T15:17:15Z) - Topic Modeling on Podcast Short-Text Metadata [0.9539495585692009]
We assess the feasibility to discover relevant topics from podcast metadata, titles and descriptions, using modeling techniques for short text.
We propose a new strategy to named entities (NEs), often present in podcast metadata, in a Non-negative Matrix Factorization modeling framework.
Our experiments on two existing datasets from Spotify and iTunes and Deezer, show that our proposed document representation, NEiCE, leads to improved coherence over the baselines.
arXiv Detail & Related papers (2022-01-12T11:07:05Z) - ConvoSumm: Conversation Summarization Benchmark and Improved Abstractive
Summarization with Argument Mining [61.82562838486632]
We crowdsource four new datasets on diverse online conversation forms of news comments, discussion forums, community question answering forums, and email threads.
We benchmark state-of-the-art models on our datasets and analyze characteristics associated with the data.
arXiv Detail & Related papers (2021-06-01T22:17:13Z) - Spotify at TREC 2020: Genre-Aware Abstractive Podcast Summarization [4.456617185465443]
The goal of this challenge was to generate short, informative summaries that contain the key information present in a podcast episode.
We propose two summarization models that explicitly take genre and named entities into consideration.
Our models are abstractive, and supervised using creator-provided descriptions as ground truth summaries.
arXiv Detail & Related papers (2021-04-07T18:27:28Z) - A Two-Phase Approach for Abstractive Podcast Summarization [18.35061145103997]
podcast summarization is different from summarization of other data formats.
We propose a two-phase approach: sentence selection and seq2seq learning.
Our approach achieves promising results regarding both ROUGE-based measures and human evaluations.
arXiv Detail & Related papers (2020-11-16T21:31:28Z) - PodSumm -- Podcast Audio Summarization [0.0]
We propose a method to automatically construct a podcast summary via guidance from the text-domain.
Motivated by a lack of datasets for this task, we curate an internal dataset, find an effective scheme for data augmentation, and design a protocol to gather summaries from annotators.
Our method achieves ROUGE-F(1/2/L) scores of 0.63/0.53/0.63 on our dataset.
arXiv Detail & Related papers (2020-09-22T04:49:33Z) - Abstractive Summarization of Spoken and Written Instructions with BERT [66.14755043607776]
We present the first application of the BERTSum model to conversational language.
We generate abstractive summaries of narrated instructional videos across a wide variety of topics.
We envision this integrated as a feature in intelligent virtual assistants, enabling them to summarize both written and spoken instructional content upon request.
arXiv Detail & Related papers (2020-08-21T20:59:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.