AI-generated podcasts: Synthetic Intimacy and Cultural Translation in NotebookLM's Audio Overviews
- URL: http://arxiv.org/abs/2511.08654v1
- Date: Thu, 13 Nov 2025 01:01:33 GMT
- Title: AI-generated podcasts: Synthetic Intimacy and Cultural Translation in NotebookLM's Audio Overviews
- Authors: Jill Walker Rettberg
- Abstract summary: This paper analyses AI-generated podcasts produced by Google's NotebookLM. NotebookLM generates audio podcasts with two chatty AI hosts discussing whichever documents a user uploads. I show how the podcasts' structure is built around a fixed template.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper analyses AI-generated podcasts produced by Google's NotebookLM, which generates audio podcasts with two chatty AI hosts discussing whichever documents a user uploads. While AI-generated podcasts have been discussed as tools, for instance in medical education, they have not yet been analysed as media. By uploading different types of text and analysing the generated outputs, I show how the podcasts' structure is built around a fixed template. I also find that NotebookLM not only translates texts from other languages into a perky, standardised Mid-Western American accent, it also translates cultural contexts to a white, educated, middle-class American default. This is a distinct development in how publics are shaped by media: a departure from the multiple public spheres that scholars have described in human podcasting from the early 2000s until today, where hosts spoke to specific communities and responded to listener comments, towards an abstraction of the podcast genre.
Related papers
- Rhapsody: A Dataset for Highlight Detection in Podcasts [49.576469262265455]
We introduce Rhapsody, a dataset of 13K podcast episodes paired with segment-level highlight annotations. We frame podcast highlight detection as a segment-level binary classification task. We explore various baseline language models and lightweight fine-tuned language models.
arXiv Detail & Related papers (2025-05-26T02:39:34Z) - MoonCast: High-Quality Zero-Shot Podcast Generation [81.29927724674602]
MoonCast is a solution for high-quality zero-shot podcast generation. It aims to synthesize natural podcast-style speech from text-only sources. Experiments demonstrate that MoonCast outperforms baselines.
arXiv Detail & Related papers (2025-03-18T15:25:08Z) - Long-Form Speech Generation with Spoken Language Models [64.29591880693468]
Textless spoken language models struggle to generate plausible speech past tens of seconds. We derive SpeechSSM, the first speech language model family to learn from and sample long-form spoken audio. SpeechSSMs leverage recent advances in linear-time sequence modeling to greatly surpass current Transformer spoken LMs in coherence and efficiency.
arXiv Detail & Related papers (2024-12-24T18:56:46Z) - Mapping the Podcast Ecosystem with the Structured Podcast Research Corpus [23.70786221902932]
We introduce a massive dataset of over 1.1M podcast transcripts available through public RSS feeds from May and June of 2020.
This data is not limited to text, but rather includes audio features and speaker turns for a subset of 370K episodes.
Using this data, we also conduct a foundational investigation into the content, structure, and responsiveness of this popular and impactful medium.
arXiv Detail & Related papers (2024-11-12T15:56:48Z) - CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations [97.75037148056367]
CoVoMix is a novel model for zero-shot, human-like, multi-speaker, multi-round dialogue speech generation. We devise a comprehensive set of metrics for measuring the effectiveness of dialogue modeling and generation.
arXiv Detail & Related papers (2024-04-10T02:32:58Z) - Can Language Models Learn to Listen? [96.01685069483025]
We present a framework for generating appropriate facial responses from a listener in dyadic social interactions based on the speaker's words.
Our approach autoregressively predicts a response of a listener: a sequence of listener facial gestures, quantized using a VQ-VAE.
We show that our generated listener motion is fluent and reflective of language semantics through quantitative metrics and a qualitative user study.
arXiv Detail & Related papers (2023-08-21T17:59:02Z) - WavJourney: Compositional Audio Creation with Large Language Models [38.39551216587242]
We present WavJourney, a novel framework that leverages Large Language Models to connect various audio models for audio creation.
WavJourney allows users to create storytelling audio content with diverse audio elements simply from textual descriptions.
We show that WavJourney is capable of synthesizing realistic audio aligned with textually described semantic, spatial, and temporal conditions.
arXiv Detail & Related papers (2023-07-26T17:54:04Z) - AudioPaLM: A Large Language Model That Can Speak and Listen [79.44757696533709]
We introduce AudioPaLM, a large language model for speech understanding and generation.
AudioPaLM fuses text-based and speech-based language models.
It can process and generate text and speech with applications including speech recognition and speech-to-speech translation.
arXiv Detail & Related papers (2023-06-22T14:37:54Z) - NewsPod: Automatic and Interactive News Podcasts [18.968547560235347]
NewsPod is an automatically generated, interactive news podcast.
The podcast is divided into segments, each centered on a news event, with each segment structured as a Question and Answer conversation.
A novel aspect of NewsPod allows listeners to interact with the podcast by asking their own questions and receiving automatically generated answers.
arXiv Detail & Related papers (2022-02-15T02:37:04Z) - Topic Modeling on Podcast Short-Text Metadata [0.9539495585692009]
We assess the feasibility of discovering relevant topics from podcast metadata (titles and descriptions) using modeling techniques for short text.
We propose a new strategy for incorporating named entities (NEs), often present in podcast metadata, into a Non-negative Matrix Factorization modeling framework.
Our experiments on two existing datasets, from Spotify and from iTunes and Deezer, show that our proposed document representation, NEiCE, leads to improved coherence over the baselines.
arXiv Detail & Related papers (2022-01-12T11:07:05Z) - Modeling Language Usage and Listener Engagement in Podcasts [3.8966039534272916]
We investigate how various factors -- vocabulary diversity, distinctiveness, emotion, and syntax -- correlate with engagement.
We build models with different textual representations, and show that the identified features are highly predictive of engagement.
Our analysis tests popular wisdom about stylistic elements in high-engagement podcasts, corroborating some aspects, and adding new perspectives on others.
arXiv Detail & Related papers (2021-06-11T20:40:15Z) - A Baseline Analysis for Podcast Abstractive Summarization [18.35061145103997]
This paper presents a baseline analysis of podcast summarization using the Spotify Podcast dataset.
It aims to help researchers understand current state-of-the-art pre-trained models and hence build a foundation for creating better models.
arXiv Detail & Related papers (2020-08-24T18:38:42Z)