Learning to mirror speaking styles incrementally
- URL: http://arxiv.org/abs/2003.04993v1
- Date: Thu, 5 Mar 2020 02:54:32 GMT
- Title: Learning to mirror speaking styles incrementally
- Authors: Siyi Liu (1), Ziang Leng (1), Derry Wijaya (1) ((1) Boston University)
- Abstract summary: Mirroring is the behavior in which one person subconsciously imitates the gesture, speech pattern, or attitude of another.
In this work, we explore a method that can learn to mirror the speaking styles of a person incrementally.
Our method extracts n-grams that capture a person's speaking styles and uses the n-grams to create patterns for transforming sentences into the person's speaking styles.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mirroring is the behavior in which one person subconsciously imitates the
gesture, speech pattern, or attitude of another. In conversations, mirroring
often signals the speakers' enjoyment and engagement in their communication. In
chatbots, methods have been proposed to add personas to the chatbots and to
train them to speak or to shift their dialogue style to that of the personas.
However, they often require a large dataset consisting of dialogues of the
target personalities to train. In this work, we explore a method that can learn
to mirror the speaking styles of a person incrementally. Our method extracts
n-grams that capture a person's speaking styles and uses the n-grams to create
patterns for transforming sentences into the person's speaking styles. Our
experiments show that our method is able to capture patterns of speaking style
that can be used to transform regular sentences into sentences with the target
style.
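The pipeline the abstract describes — mine recurring n-grams from a person's utterances as style markers, then apply them as patterns to transform plain sentences — can be sketched minimally. Everything below (function names, the frequency threshold, and the naive append-a-marker transformation) is an illustrative assumption, not the paper's actual implementation:

```python
from collections import Counter

def extract_style_ngrams(utterances, n=2, min_count=2):
    """Collect n-grams that recur across a person's utterances.

    N-grams appearing at least `min_count` times are treated as
    markers of that person's speaking style; the threshold and
    n-gram order are illustrative choices.
    """
    counts = Counter()
    for utt in utterances:
        tokens = utt.lower().split()
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return {ng for ng, c in counts.items() if c >= min_count}

def mirror(sentence, style_ngrams):
    """Naively transform a plain sentence toward the target style
    by appending one characteristic style n-gram."""
    if not style_ngrams:
        return sentence
    marker = " ".join(sorted(style_ngrams)[0])
    return f"{sentence}, {marker}"
```

Because the extraction step only needs raw utterances and simple counts, it can be updated incrementally as new utterances from the target speaker arrive, which matches the incremental-learning framing of the paper.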
Related papers
- ParaMETA: Towards Learning Disentangled Paralinguistic Speaking Styles Representations from Speech [15.969757677847504]
ParaMETA is a framework for learning and controlling speaking styles directly from speech.
It learns disentangled, task-specific embeddings by projecting speech into dedicated subspaces for each type of style.
It supports both speech- and text-based prompting and allows users to modify one speaking style while preserving others.
arXiv Detail & Related papers (2026-01-18T07:05:40Z)
- F-Actor: Controllable Conversational Behaviour in Full-Duplex Models [70.48189107402145]
We present the first open, instruction-following full-duplex conversational speech model that can be trained efficiently under typical academic resource constraints.
Our model requires just 2,000 hours of data, without relying on large-scale or multi-stage pretraining.
Both the model and training code will be released to enable reproducible research on controllable full-duplex speech systems.
arXiv Detail & Related papers (2026-01-16T14:25:57Z) - Style Amnesia: Investigating Speaking Style Degradation and Mitigation in Multi-Turn Spoken Language Models [61.494659340367605]
When spoken language models (SLMs) are instructed to speak in a specific speaking style, they cannot maintain the required speaking styles after several turns of interaction.<n>We focus on paralinguistic speaking styles, including emotion, accent, volume, and speaking speed.<n> explicitly asking the model to recall the style instruction can partially mitigate style amnesia.
arXiv Detail & Related papers (2025-12-29T16:23:54Z) - Aligning Spoken Dialogue Models from User Interactions [55.192134724622235]
We propose a novel preference alignment framework to improve spoken dialogue models on realtime conversations from user interactions.<n>We create a dataset of more than 150,000 preference pairs from raw multi-turn speech conversations annotated with AI feedback.<n>Our findings shed light on the importance of a well-calibrated balance among various dynamics, crucial for natural real-time speech dialogue systems.
arXiv Detail & Related papers (2025-06-26T16:45:20Z) - Enhancing Impression Change Prediction in Speed Dating Simulations Based on Speakers' Personalities [2.1740370446058708]
This paper focuses on simulating text dialogues in which impressions between speakers improve during speed dating.
We believe that whether an utterance improves a dialogue partner's impression of the speaker may depend on the personalities of both parties.
We propose a method that predicts whether an utterance improves a partner's impression of the speaker, considering the personalities.
arXiv Detail & Related papers (2025-02-07T07:18:32Z) - LLM Roleplay: Simulating Human-Chatbot Interaction [52.03241266241294]
We propose a goal-oriented, persona-based method to automatically generate diverse multi-turn dialogues simulating human-chatbot interaction.
Our method can simulate human-chatbot dialogues with a high indistinguishability rate.
arXiv Detail & Related papers (2024-07-04T14:49:46Z) - Advancing Large Language Models to Capture Varied Speaking Styles and Respond Properly in Spoken Conversations [65.29513437838457]
Even if two current turns are the same sentence, their responses might still differ when they are spoken in different styles.
We propose the Spoken-LLM framework, which can model both the linguistic content and the speaking styles.
We train Spoken-LLM using the StyleTalk dataset and devise a two-stage training pipeline to help the Spoken-LLM better learn the speaking styles.
arXiv Detail & Related papers (2024-02-20T07:51:43Z) - Conversation Style Transfer using Few-Shot Learning [56.43383396058639]
In this paper, we introduce conversation style transfer as a few-shot learning problem.
We propose a novel in-context learning approach to solve the task with style-free dialogues as a pivot.
We show that conversation style transfer can also benefit downstream tasks.
arXiv Detail & Related papers (2023-02-16T15:27:00Z) - StyleTalk: One-shot Talking Head Generation with Controllable Speaking
Styles [43.12918949398099]
We propose a one-shot style-controllable talking face generation framework.
We aim to attain a speaking style from an arbitrary reference speaking video.
We then drive the one-shot portrait to speak with the reference speaking style and another piece of audio.
arXiv Detail & Related papers (2023-01-03T13:16:24Z) - Text-driven Emotional Style Control and Cross-speaker Style Transfer in
Neural TTS [7.384726530165295]
Style control of synthetic speech is often restricted to discrete emotion categories.
We propose a text-based interface for emotional style control and cross-speaker style transfer in multi-speaker TTS.
arXiv Detail & Related papers (2022-07-13T07:05:44Z) - Imitating Arbitrary Talking Style for Realistic Audio-DrivenTalking Face
Synthesis [17.650661515807993]
We propose to inject style into the talking face synthesis framework through imitating arbitrary talking style of the particular reference video.
We devise a latent-style-fusion (LSF) model to synthesize stylized talking faces by imitating talking styles from the style codes.
arXiv Detail & Related papers (2021-10-30T08:15:27Z) - Stylized Dialogue Response Generation Using Stylized Unpaired Texts [63.69880979112312]
This paper proposes a stylized dialogue generation method that can capture stylistic features embedded in unpaired texts.
Our method can produce dialogue responses that are both coherent to the given context and conform to the target style.
arXiv Detail & Related papers (2020-09-27T01:04:06Z) - Style Transfer for Co-Speech Gesture Animation: A Multi-Speaker
Conditional-Mixture Approach [46.50460811211031]
The key challenge is to learn a model that generates gestures for a speaking agent 'A' in the gesturing style of a target speaker 'B'.
We propose Mix-StAGE, which trains a single model for multiple speakers while learning unique style embeddings for each speaker's gestures.
As Mix-StAGE disentangles style and content of gestures, gesturing styles for the same input speech can be altered by simply switching the style embeddings.
arXiv Detail & Related papers (2020-07-24T15:01:02Z) - I love your chain mail! Making knights smile in a fantasy game world:
Open-domain goal-oriented dialogue agents [69.68400056148336]
We train a goal-oriented model with reinforcement learning against an imitation-learned "chit-chat" model.
We show that both models outperform an inverse model baseline and can converse naturally with their dialogue partner in order to achieve goals.
arXiv Detail & Related papers (2020-02-07T16:22:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.