PODTILE: Facilitating Podcast Episode Browsing with Auto-generated Chapters
- URL: http://arxiv.org/abs/2410.16148v1
- Date: Mon, 21 Oct 2024 16:17:22 GMT
- Title: PODTILE: Facilitating Podcast Episode Browsing with Auto-generated Chapters
- Authors: Azin Ghazimatin, Ekaterina Garmash, Gustavo Penha, Kristen Sheets, Martin Achenbach, Oguz Semerci, Remi Galvez, Marcus Tannenberg, Sahitya Mantravadi, Divya Narayanan, Ofeliya Kalaydzhyan, Douglas Cole, Ben Carterette, Ann Clifton, Paul N. Bennett, Claudia Hauff, Mounia Lalmas
- Abstract summary: We introduce PODTILE, a fine-tuned encoder-decoder transformer to segment conversational data.
PODTILE simultaneously generates chapter transitions and titles for the input transcript.
Our findings indicate that auto-generated chapters serve as a useful tool for engaging with less popular podcasts.
- Score: 15.856812659691238
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Listeners of long-form talk-audio content, such as podcast episodes, often find it challenging to understand the overall structure and locate relevant sections. A practical solution is to divide episodes into chapters: semantically coherent segments labeled with titles and timestamps. Since most episodes on our platform at Spotify currently lack creator-provided chapters, automating the creation of chapters is essential. Scaling the chapterization of podcast episodes presents unique challenges. First, episodes tend to be less structured than written texts, featuring spontaneous discussions with nuanced transitions. Second, the transcripts are usually lengthy, averaging about 16,000 tokens, which necessitates efficient processing that can preserve context. To address these challenges, we introduce PODTILE, a fine-tuned encoder-decoder transformer to segment conversational data. The model simultaneously generates chapter transitions and titles for the input transcript. To preserve context, each input text is augmented with global context, including the episode's title, description, and previous chapter titles. In our intrinsic evaluation, PODTILE achieved an 11% improvement in ROUGE score over the strongest baseline. Additionally, we provide insights into the practical benefits of auto-generated chapters for listeners navigating episode content. Our findings indicate that auto-generated chapters serve as a useful tool for engaging with less popular podcasts. Finally, we present empirical evidence that using chapter titles can enhance the effectiveness of sparse retrieval in search tasks.
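As a rough illustration of the pipeline the abstract describes (chunking a long transcript, prepending global context such as the episode title, description, and previously generated chapter titles, then decoding chapter titles with an encoder-decoder model), here is a minimal Python sketch built on a generic Hugging Face seq2seq checkpoint. The model name, prompt format, chunk size, and output parsing are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): chunk a long transcript,
# prepend global context (episode title, description, previously generated
# chapter titles), and ask a generic seq2seq model to emit chapter titles.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "google/flan-t5-base"  # stand-in checkpoint, not the fine-tuned PODTILE model
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

def chunk_transcript(sentences, max_tokens=1500):
    """Greedily pack transcript sentences into chunks that fit the encoder."""
    chunks, current, length = [], [], 0
    for sent in sentences:
        n = len(tokenizer.tokenize(sent))
        if current and length + n > max_tokens:
            chunks.append(" ".join(current))
            current, length = [], 0
        current.append(sent)
        length += n
    if current:
        chunks.append(" ".join(current))
    return chunks

def chapterize(title, description, sentences):
    """Process chunks sequentially, carrying earlier chapter titles as global context."""
    previous_titles = []
    for chunk in chunk_transcript(sentences):
        prompt = (
            f"episode title: {title}\n"
            f"episode description: {description}\n"
            f"previous chapters: {'; '.join(previous_titles) or 'none'}\n"
            f"transcript: {chunk}\n"
            "List the chapter titles that start in this transcript segment."
        )
        inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=2048)
        output = model.generate(**inputs, max_new_tokens=64)
        text = tokenizer.decode(output[0], skip_special_tokens=True)
        previous_titles.extend(t.strip() for t in text.split(";") if t.strip())
    return previous_titles
```

An off-the-shelf checkpoint will mostly demonstrate the data flow; producing useful chapters would require a fine-tuned model and a structured output format (for example, explicit boundary markers alongside titles).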
Related papers
- Rhapsody: A Dataset for Highlight Detection in Podcasts [49.1662517033426]
We introduce Rhapsody, a dataset of podcast episodes paired with segment-level highlight scores derived from YouTube's 'most replayed' feature.
We frame podcast highlight detection as a segment-level binary classification task.
Models fine-tuned with in-domain data significantly outperform their zero-shot performance.
These findings highlight the challenges of fine-grained information access in long-form spoken media.
arXiv Detail & Related papers (2025-05-26T02:39:34Z) - Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs [59.854331104466254]
We address the task of video chaptering, i.e., partitioning a long video timeline into semantic units and generating corresponding chapter titles.
We propose a lightweight speech-guided frame selection strategy based on the speech transcript content, and experimentally demonstrate its advantages.
Our results demonstrate substantial improvements over the state of the art on the recent VidChapters-7M benchmark.
arXiv Detail & Related papers (2025-03-31T17:41:29Z) - VidChapters-7M: Video Chapters at Scale [110.19323390486775]
We present VidChapters-7M, a dataset of 817K user-chaptered videos including 7M chapters in total.
VidChapters-7M is automatically created from videos online in a scalable manner by scraping user-annotated chapters.
We show that pretraining on VidChapters-7M transfers well to dense video captioning tasks in both zero-shot and finetuning settings.
arXiv Detail & Related papers (2023-09-25T08:38:11Z) - Uncovering the Potential of ChatGPT for Discourse Analysis in Dialogue: An Empirical Study [51.079100495163736]
This paper systematically inspects ChatGPT's performance in two discourse analysis tasks: topic segmentation and discourse parsing.
ChatGPT demonstrates proficiency in identifying topic structures in general-domain conversations yet struggles considerably in specific-domain conversations.
Our deeper investigation indicates that ChatGPT can give more reasonable topic structures than human annotations but only linearly parses the hierarchical rhetorical structures.
arXiv Detail & Related papers (2023-05-15T07:14:41Z) - AudioGen: Textually Guided Audio Generation [116.57006301417306]
We tackle the problem of generating audio samples conditioned on descriptive text captions.
In this work, we propose AudioGen, an auto-regressive model that generates audio samples conditioned on text inputs.
arXiv Detail & Related papers (2022-09-30T10:17:05Z) - Visual Subtitle Feature Enhanced Video Outline Generation [23.831220964676973]
We introduce a novel video understanding task, namely video outline generation (VOG).
To learn and evaluate VOG, we annotate a 10k+ dataset, called DuVOG.
We propose a Visual Subtitle feature Enhanced video outline generation model (VSENet).
arXiv Detail & Related papers (2022-08-24T05:26:26Z) - Topic Modeling on Podcast Short-Text Metadata [0.9539495585692009]
We assess the feasibility of discovering relevant topics from podcast metadata (titles and descriptions) using topic modeling techniques for short text.
We propose a new strategy to leverage named entities (NEs), often present in podcast metadata, in a Non-negative Matrix Factorization (NMF) topic modeling framework.
Our experiments on datasets from Spotify, iTunes, and Deezer show that our proposed document representation, NEiCE, leads to improved coherence over the baselines (a minimal NMF sketch follows this list).
arXiv Detail & Related papers (2022-01-12T11:07:05Z) - Identifying Introductions in Podcast Episodes from Automatically Generated Transcripts [0.0]
We build a novel dataset of complete transcriptions of over 400 podcast episodes.
We identify the episodes' introductions, which contain information about their topics, hosts, and guests.
We train three Transformer models based on pre-trained BERT and different augmentation strategies.
arXiv Detail & Related papers (2021-10-14T00:34:51Z) - Multi-View Sequence-to-Sequence Models with Conversational Structure for Abstractive Dialogue Summarization [72.54873655114844]
Text summarization is one of the most challenging and interesting problems in NLP.
This work proposes a multi-view sequence-to-sequence model by first extracting conversational structures of unstructured daily chats from different views to represent conversations.
Experiments on a large-scale dialogue summarization corpus demonstrated that our methods significantly outperformed previous state-of-the-art models via both automatic evaluations and human judgment.
arXiv Detail & Related papers (2020-10-04T20:12:44Z) - PodSumm -- Podcast Audio Summarization [0.0]
We propose a method to automatically construct a podcast summary via guidance from the text domain.
Motivated by a lack of datasets for this task, we curate an internal dataset, find an effective scheme for data augmentation, and design a protocol to gather summaries from annotators.
Our method achieves ROUGE-F(1/2/L) scores of 0.63/0.53/0.63 on our dataset.
arXiv Detail & Related papers (2020-09-22T04:49:33Z) - Abstractive Summarization of Spoken and Written Instructions with BERT [66.14755043607776]
We present the first application of the BERTSum model to conversational language.
We generate abstractive summaries of narrated instructional videos across a wide variety of topics.
We envision this being integrated as a feature in intelligent virtual assistants, enabling them to summarize both written and spoken instructional content upon request.
arXiv Detail & Related papers (2020-08-21T20:59:34Z) - The Shmoop Corpus: A Dataset of Stories with Loosely Aligned Summaries [72.48439126769627]
We introduce the Shmoop Corpus: a dataset of 231 stories paired with detailed multi-paragraph summaries for each individual chapter.
From the corpus, we construct a set of common NLP tasks, including Cloze-form question answering and a simplified form of abstractive summarization.
We believe that the unique structure of this corpus provides an important foothold towards making machine story comprehension more approachable.
arXiv Detail & Related papers (2019-12-30T21:03:59Z)
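The Topic Modeling on Podcast Short-Text Metadata entry above applies Non-negative Matrix Factorization (NMF) to short podcast metadata. The sketch below shows the general technique with plain TF-IDF features and scikit-learn's NMF; it is not the paper's NE-enriched NEiCE representation, and the example documents are invented.

```python
# Minimal NMF topic-modeling sketch over podcast titles/descriptions.
# Plain TF-IDF features stand in for the paper's NE-enriched NEiCE
# representation; the documents below are invented for illustration.
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "True crime stories and interviews with investigators",
    "Weekly NBA recap, trades, and playoff predictions",
    "Mindfulness, meditation, and better sleep habits",
    "Cold case files and forensic science deep dives",
    "Fantasy basketball advice and injury updates",
    "Guided breathing exercises for stress relief",
]

# Short texts -> sparse TF-IDF matrix; English stop words removed.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

# Factorize into 3 topics; each topic is a weight vector over terms.
nmf = NMF(n_components=3, init="nndsvd", random_state=0)
doc_topic = nmf.fit_transform(X)

terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(nmf.components_):
    top = [terms[i] for i in weights.argsort()[::-1][:4]]
    print(f"topic {k}: {', '.join(top)}")
```

Each row of doc_topic holds a document's weights over the three topics; in practice topic quality is judged with coherence metrics rather than by inspecting the top terms.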