Will AI shape the way we speak? The emerging sociolinguistic influence of synthetic voices
- URL: http://arxiv.org/abs/2504.10650v1
- Date: Mon, 14 Apr 2025 19:04:32 GMT
- Title: Will AI shape the way we speak? The emerging sociolinguistic influence of synthetic voices
- Authors: Éva Székely, Jūra Miniota, Míša Hejná
- Abstract summary: We argue that the socioindexical influence of AI-generated speech warrants attention and should become a focus of interdisciplinary research.
- Score: 3.7777447186369786
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The growing prevalence of conversational voice interfaces, powered by developments in both speech and language technologies, raises important questions about their influence on human communication. While written communication can signal identity through lexical and stylistic choices, voice-based interactions inherently amplify socioindexical elements - such as accent, intonation, and speech style - which more prominently convey social identity and group affiliation. There is evidence that even passive media such as television are likely to influence the audience's linguistic patterns. Unlike passive media, conversational AI is interactive, creating a more immersive and reciprocal dynamic that holds a greater potential to impact how individuals speak in everyday interactions. Such heightened influence can be expected to arise from phenomena such as acoustic-prosodic entrainment and linguistic accommodation, which occur naturally during interaction and enable users to adapt their speech patterns in response to the system. While this phenomenon is still emerging, its potential societal impact could provide organisations, movements, and brands with a subtle yet powerful avenue for shaping and controlling public perception and social identity. We argue that the socioindexical influence of AI-generated speech warrants attention and should become a focus of interdisciplinary research, leveraging new and existing methodologies and technologies to better understand its implications.
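The entrainment mechanism the abstract appeals to can be made concrete. Below is a minimal sketch, on fully synthetic turn-level pitch data, of one standard proximity-style measure: correlating the user's mean F0 per turn with the system turn that immediately preceded it. None of the numbers or names come from the paper.

```python
# Illustrative sketch (not from the paper): quantify acoustic-prosodic
# entrainment as the correlation between a prosodic feature of the system's
# turns (mean F0) and the same feature in the user's responses.
import numpy as np

rng = np.random.default_rng(0)

# Mean F0 (Hz) per turn for a synthetic dialogue of 50 exchanges.
system_f0 = rng.normal(180.0, 15.0, size=50)                   # synthetic voice
user_f0 = 0.4 * system_f0 + rng.normal(120.0, 10.0, size=50)   # partially entrained user

def proximity_entrainment(system: np.ndarray, user: np.ndarray) -> float:
    """Pearson correlation between the system's turn-level feature and the
    user's response feature; higher values suggest stronger entrainment."""
    return float(np.corrcoef(system, user)[0, 1])

print(f"turn-level F0 entrainment: {proximity_entrainment(system_f0, user_f0):.2f}")
```

Real studies compute such measures over features extracted from recorded audio (F0, intensity, speech rate); the correlation here stands in for that whole pipeline.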
Related papers
- From Personas to Talks: Revisiting the Impact of Personas on LLM-Synthesized Emotional Support Conversations [19.67703146838264]
Large Language Models (LLMs) have revolutionized the generation of emotional support conversations.
This paper explores the role of personas in the creation of emotional support conversations.
arXiv Detail & Related papers (2025-02-17T05:24:30Z)
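As a concrete illustration of persona conditioning (not the paper's actual setup), here is a minimal sketch in which hypothetical persona fields are folded into a support-conversation prompt that could be sent to any chat LLM.

```python
# Minimal sketch of persona-conditioned generation; the persona fields and
# prompt template below are illustrative assumptions, not the paper's design.
from dataclasses import dataclass

@dataclass
class Persona:
    name: str
    age: int
    occupation: str
    stressor: str  # the situation the seeker wants support for

def build_support_prompt(seeker: Persona, seeker_message: str) -> str:
    """Compose a prompt asking an LLM to reply as an emotional supporter."""
    return (
        f"You are chatting with {seeker.name}, a {seeker.age}-year-old "
        f"{seeker.occupation} who is struggling with {seeker.stressor}.\n"
        f"{seeker.name} says: \"{seeker_message}\"\n"
        "Reply with one short, empathetic supporting message."
    )

prompt = build_support_prompt(
    Persona("Ana", 29, "nurse", "burnout from night shifts"),
    "I can't remember the last time I slept properly.",
)
print(prompt)  # would be passed to any chat-completion model
```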
- Speaker effects in spoken language comprehension [0.9514940899499753]
The identity of a speaker significantly influences spoken language comprehension by affecting both perception and expectation.
We propose an integrative model featuring the interplay between bottom-up perception-based processes driven by acoustic details and top-down expectation-based processes driven by a speaker model.
arXiv Detail & Related papers (2024-12-10T07:03:06Z)
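The proposed interplay of bottom-up and top-down processes is often formalized as Bayesian cue combination. The toy sketch below uses that generic formulation with invented numbers; it is not the paper's actual model.

```python
# Toy sketch: combine bottom-up acoustic evidence with top-down speaker
# expectations as a Bayesian product. A listener decides whether an
# ambiguous word was "formal" or "casual" given who is speaking.
import numpy as np

words = ["formal", "casual"]

# Bottom-up: likelihood of the acoustic signal under each word hypothesis.
acoustic_likelihood = np.array([0.45, 0.55])  # slightly favours "casual"

# Top-down: prior expectation from a speaker model (e.g. a newsreader voice).
speaker_prior = np.array([0.85, 0.15])        # strongly favours "formal"

posterior = acoustic_likelihood * speaker_prior
posterior /= posterior.sum()

for w, p in zip(words, posterior):
    print(f"P({w} | audio, speaker) = {p:.2f}")  # expectation overrides the signal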
- Humane Speech Synthesis through Zero-Shot Emotion and Disfluency Generation [0.6964027823688135]
Modern conversational systems lack the emotional depth and disfluent characteristics of human interactions.
To address this shortcoming, we have designed an innovative speech synthesis pipeline.
Within this framework, a cutting-edge language model introduces both human-like emotion and disfluencies in a zero-shot setting.
arXiv Detail & Related papers (2024-03-31T00:38:02Z)
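The paper performs this step zero-shot with a language model; purely for illustration, the toy below injects fillers and repetitions with hand-written rules, to show the kind of surface transformation involved.

```python
# Toy rule-based disfluency injector. The paper does this zero-shot with a
# language model; this hand-rolled stand-in only illustrates the surface
# transformation (filler insertion and word repetition).
import random

FILLERS = ["uh", "um", "you know"]

def add_disfluencies(text: str, p: float = 0.15, seed: int = 0) -> str:
    rng = random.Random(seed)
    out = []
    for word in text.split():
        if rng.random() < p:
            out.append(rng.choice(FILLERS) + ",")
        if rng.random() < p / 2:
            out.append(word)  # stutter-like repetition
        out.append(word)
    return " ".join(out)

print(add_disfluencies("I think we should move the meeting to Friday"))
```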
- Attention-based Interactive Disentangling Network for Instance-level Emotional Voice Conversion [81.1492897350032]
Emotional Voice Conversion aims to manipulate speech according to a given emotion while preserving non-emotion components.
We propose an Attention-based Interactive diseNtangling Network (AINN) that leverages instance-wise emotional knowledge for voice conversion.
arXiv Detail & Related papers (2023-12-29T08:06:45Z)
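The general pattern (two encoders plus cross-attention between content and emotion streams) can be sketched as follows; all module choices and dimensions are invented, and this is not the AINN architecture itself.

```python
# Schematic sketch of emotion/content disentangling for voice conversion:
# one encoder for what is said, one for how it is said, fused by attention.
import torch
import torch.nn as nn

class DisentanglingConverter(nn.Module):
    def __init__(self, n_mels: int = 80, d: int = 128):
        super().__init__()
        self.content_enc = nn.GRU(n_mels, d, batch_first=True)   # what is said
        self.emotion_enc = nn.GRU(n_mels, d, batch_first=True)   # how it is said
        self.cross_attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
        self.decoder = nn.Linear(d, n_mels)                      # back to mel frames

    def forward(self, src_mels, ref_mels):
        content, _ = self.content_enc(src_mels)    # (B, T, d) from source speech
        emotion, _ = self.emotion_enc(ref_mels)    # (B, T_ref, d) from reference
        fused, _ = self.cross_attn(content, emotion, emotion)
        return self.decoder(fused)                 # converted mel frames

model = DisentanglingConverter()
out = model(torch.randn(2, 100, 80), torch.randn(2, 60, 80))
print(out.shape)  # torch.Size([2, 100, 80])
```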
- Speech-Gesture GAN: Gesture Generation for Robots and Embodied Agents [5.244401764969407]
Embodied agents, in the form of virtual agents or social robots, are rapidly becoming more widespread.
We propose a novel framework that can generate sequences of joint angles from the speech text and speech audio utterances.
arXiv Detail & Related papers (2023-09-17T18:46:25Z)
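A minimal sketch of the underlying mapping (frame-aligned speech features to joint-angle sequences) follows; the paper proposes a GAN-based framework, whereas this stand-in uses a plain GRU regressor with invented dimensions.

```python
# Rough sketch of mapping speech features to joint-angle sequences. The
# feature sizes and the simple GRU regressor are illustrative assumptions.
import torch
import torch.nn as nn

class SpeechToGesture(nn.Module):
    def __init__(self, audio_dim=26, text_dim=50, n_joints=12, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(audio_dim + text_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_joints)   # one angle per joint per frame

    def forward(self, audio_feats, text_feats):
        x = torch.cat([audio_feats, text_feats], dim=-1)   # frame-aligned features
        h, _ = self.rnn(x)
        return self.head(h)                                # (B, T, n_joints) angles

gen = SpeechToGesture()
angles = gen(torch.randn(1, 200, 26), torch.randn(1, 200, 50))
print(angles.shape)  # torch.Size([1, 200, 12])
```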
- Expanding the Role of Affective Phenomena in Multimodal Interaction Research [57.069159905961214]
We examined over 16,000 papers from selected conferences in multimodal interaction, affective computing, and natural language processing.
We identify 910 affect-related papers and present our analysis of the role of affective phenomena in these papers.
We find limited research on how affect and emotion predictions might be used by AI systems to enhance machine understanding of human social behaviors and cognitive states.
arXiv Detail & Related papers (2023-05-18T09:08:39Z)
- Data-driven emotional body language generation for social robotics [58.88028813371423]
In social robotics, endowing humanoid robots with the ability to generate bodily expressions of affect can improve human-robot interaction and collaboration.
We implement a deep learning data-driven framework that learns from a few hand-designed robotic bodily expressions.
The evaluation study found that the anthropomorphism and animacy of the generated expressions are not perceived differently from the hand-designed ones.
arXiv Detail & Related papers (2022-05-02T09:21:39Z)
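One way such a framework can produce novel expressions from only a few designed ones is by perturbing a learned latent code; the autoencoder sketch below illustrates that idea with invented sizes and is not the paper's actual model.

```python
# Minimal autoencoder sketch for generating variations of hand-designed
# bodily expressions (layer sizes invented; only the latent-perturbation
# idea is the point).
import torch
import torch.nn as nn

class ExpressionAE(nn.Module):
    def __init__(self, pose_dim: int = 10, latent: int = 4):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(pose_dim, 32), nn.ReLU(), nn.Linear(32, latent))
        self.dec = nn.Sequential(nn.Linear(latent, 32), nn.ReLU(), nn.Linear(32, pose_dim))

    def forward(self, pose):
        return self.dec(self.enc(pose))

    def vary(self, pose, scale: float = 0.1):
        """Perturb the latent code to produce a novel but similar expression."""
        z = self.enc(pose)
        return self.dec(z + scale * torch.randn_like(z))

ae = ExpressionAE()
designed = torch.randn(1, 10)        # one hand-designed key pose
print(ae.vary(designed).shape)       # torch.Size([1, 10])
```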
- E-ffective: A Visual Analytic System for Exploring the Emotion and Effectiveness of Inspirational Speeches [57.279044079196105]
E-ffective is a visual analytic system allowing speaking experts and novices to analyze both the role of speech factors and their contribution to effective speeches.
Two novel visualizations include E-spiral (which shows the emotional shifts in speeches in a visually compact way) and E-script (which connects speech content with key speech delivery information).
arXiv Detail & Related papers (2021-10-28T06:14:27Z)
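A rough guess at the E-spiral idea, with synthetic data: wind per-sentence emotion values around an outward spiral so shifts stay visually compact. The mapping below is my own reconstruction, not the system's actual design.

```python
# Toy re-creation of the E-spiral idea: plot a speech's emotional trajectory
# along a spiral. The emotion values are synthetic.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
valence = np.cumsum(rng.normal(0, 0.2, 120))      # synthetic per-sentence emotion

theta = np.linspace(0, 6 * np.pi, valence.size)   # wind sentences around a spiral
radius = 1 + theta / (2 * np.pi)                  # grow outward over time

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
sc = ax.scatter(theta, radius, c=valence, cmap="coolwarm", s=15)
fig.colorbar(sc, label="valence")
ax.set_title("Emotional shifts along a speech (toy E-spiral)")
plt.savefig("e_spiral.png")
```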
- Reinforcement Learning for Emotional Text-to-Speech Synthesis with Improved Emotion Discriminability [82.39099867188547]
Emotional text-to-speech synthesis (ETTS) has seen much progress in recent years.
We propose a new interactive training paradigm for ETTS, denoted as i-ETTS.
We formulate an iterative training strategy with reinforcement learning to ensure the quality of i-ETTS optimization.
arXiv Detail & Related papers (2021-04-03T13:52:47Z)
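The training pattern, in which a recognizer's confidence in the target emotion rewards the synthesis policy, can be sketched as a toy REINFORCE loop. Both "models" below are stand-ins; only the loop structure reflects the idea.

```python
# Schematic REINFORCE loop in the spirit of reward-driven emotional TTS:
# an emotion recognizer's confidence in the target emotion is the reward.
import torch
import torch.nn as nn

N_STYLES, N_EMOTIONS, TARGET = 8, 4, 2

policy_logits = nn.Parameter(torch.zeros(N_STYLES))  # toy "TTS" policy
recognizer = torch.softmax(torch.randn(N_STYLES, N_EMOTIONS), dim=-1)  # frozen SER
opt = torch.optim.Adam([policy_logits], lr=0.1)

for step in range(200):
    dist = torch.distributions.Categorical(logits=policy_logits)
    style = dist.sample()                      # "synthesize" with a sampled style
    reward = recognizer[style, TARGET]         # emotion discriminability as reward
    loss = -dist.log_prob(style) * reward      # REINFORCE objective
    opt.zero_grad()
    loss.backward()
    opt.step()

best = torch.argmax(policy_logits).item()
print(f"policy prefers style {best} "
      f"(recognizer confidence {recognizer[best, TARGET]:.2f})")
```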
- Disambiguating Affective Stimulus Associations for Robot Perception and Dialogue [67.89143112645556]
We provide a NICO robot with the ability to learn the associations between a perceived auditory stimulus and an emotional expression.
NICO is able to do this for both individual subjects and specific stimuli, with the aid of an emotion-driven dialogue system.
The robot is then able to use this information to determine a subject's enjoyment of perceived auditory stimuli in a real HRI scenario.
arXiv Detail & Related papers (2021-03-05T20:55:48Z)
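A simplified version of this per-subject association learning can be written as a running average over (subject, stimulus) pairs; the data structure and update rule below are my own simplification, not the NICO system.

```python
# Toy per-subject association learner: each (subject, stimulus) pair
# accumulates a running estimate of the expressed enjoyment.
from collections import defaultdict

class AffectAssociator:
    def __init__(self):
        self.mean = defaultdict(float)   # (subject, stimulus) -> mean valence
        self.count = defaultdict(int)

    def observe(self, subject: str, stimulus: str, valence: float) -> None:
        key = (subject, stimulus)
        self.count[key] += 1
        self.mean[key] += (valence - self.mean[key]) / self.count[key]

    def enjoyment(self, subject: str, stimulus: str) -> float:
        return self.mean[(subject, stimulus)]

robot = AffectAssociator()
for v in (0.7, 0.9, 0.8):                          # three positive reactions
    robot.observe("subject_1", "jazz_clip", v)
print(robot.enjoyment("subject_1", "jazz_clip"))   # ~0.8
```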
- Generating Emotionally Aligned Responses in Dialogues using Affect Control Theory [15.848210524718219]
Affect Control Theory (ACT) is a socio-mathematical model of emotions for human-human interactions.
We investigate how ACT can be used to develop affect-aware neural conversational agents.
arXiv Detail & Related papers (2020-03-07T19:31:08Z)
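ACT's central quantity is deflection: the squared distance between fundamental (culturally expected) and transient (event-produced) EPA sentiments, which an affect-aware agent tries to keep small. The sketch below uses made-up EPA values purely for illustration.

```python
# Minimal sketch of the Affect Control Theory quantity driving such agents.
import numpy as np

# Evaluation, Potency, Activity profiles (EPA values here are invented).
fundamental = np.array([2.5, 1.2, 0.8])   # e.g. expected sentiment for "friend"
transient = np.array([0.4, 1.0, 1.5])     # impression produced by an event

def deflection(f: np.ndarray, t: np.ndarray) -> float:
    """ACT deflection: agents prefer actions that keep this small."""
    return float(np.sum((f - t) ** 2))

print(f"deflection = {deflection(fundamental, transient):.2f}")
# An affect-aware agent would choose the response minimizing this value.
```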
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.