V-VAE: A Variational Auto Encoding Framework Towards Fine-Grained Control over Human-Like Chat
- URL: http://arxiv.org/abs/2506.01524v1
- Date: Mon, 02 Jun 2025 10:38:02 GMT
- Title: V-VAE: A Variational Auto Encoding Framework Towards Fine-Grained Control over Human-Like Chat
- Authors: Qi Lin, Weikai Xu, Lisi Chen, Bin Dai
- Abstract summary: Role-play and persona-based chat approaches rely heavily on static role descriptions, coarse-grained signal space, and low-quality synthetic data. Human-like chat requires modeling subtle latent traits, such as emotional tone, situational awareness, and evolving personality. To address these limitations, we propose a Verbal Variational Auto-Encoding (V-VAE) framework containing a variational auto-encoding module and fine-grained, interpretable latent variables.
- Score: 19.038481783630864
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the continued proliferation of Large Language Model (LLM) based chatbots, there is a growing demand for generating responses that are not only linguistically fluent but also consistently aligned with persona-specific traits in conversations. However, existing role-play and persona-based chat approaches rely heavily on static role descriptions, coarse-grained signal space, and low-quality synthetic data, which fail to capture dynamic fine-grained details in human-like chat. Human-like chat requires modeling subtle latent traits, such as emotional tone, situational awareness, and evolving personality, which are difficult to predefine and cannot be easily learned from synthetic or distillation-based data. To address these limitations, we propose a Verbal Variational Auto-Encoding (V-VAE) framework, containing a variational auto-encoding module and fine-grained control space which dynamically adapts dialogue behaviour based on fine-grained, interpretable latent variables across talking style, interaction patterns, and personal attributes. We also construct a high-quality dataset, HumanChatData, and benchmark HumanChatBench to address the scarcity of high-quality data in the human-like domain. Experiments show that LLMs based on V-VAE consistently outperform standard baselines on HumanChatBench and DialogBench, which further demonstrates the effectiveness of V-VAE and HumanChatData.
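The abstract describes the architecture only at a high level. As a rough illustration of the general idea of pairing a variational auto-encoding module with named, interpretable latent slots (talking style, interaction patterns, personal attributes), the sketch below shows a minimal PyTorch module; the module names, dimensions, and slot layout are assumptions made for illustration, not the paper's implementation.

```python
# Illustrative sketch only: a minimal latent-control module in the spirit of
# V-VAE, with interpretable latent slots for talking style, interaction
# patterns, and personal attributes. All names and sizes are assumptions.
import torch
import torch.nn as nn

LATENT_SLOTS = {"talking_style": 8, "interaction_pattern": 8, "personal_attributes": 16}

class LatentControlVAE(nn.Module):
    def __init__(self, hidden_dim: int = 256):
        super().__init__()
        total_latent = sum(LATENT_SLOTS.values())
        # Encoder maps a dialogue-context embedding to the parameters of a
        # diagonal Gaussian over the fine-grained latent variables.
        self.encoder = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.ReLU())
        self.mu_head = nn.Linear(hidden_dim, total_latent)
        self.logvar_head = nn.Linear(hidden_dim, total_latent)
        # Decoder maps the sampled latents to a control vector that could
        # condition a response generator (e.g., as a soft prompt).
        self.decoder = nn.Sequential(nn.Linear(total_latent, hidden_dim), nn.ReLU(),
                                     nn.Linear(hidden_dim, hidden_dim))

    def forward(self, context_emb: torch.Tensor):
        h = self.encoder(context_emb)
        mu, logvar = self.mu_head(h), self.logvar_head(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        control = self.decoder(z)
        # Standard VAE KL term against a unit Gaussian prior.
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1).mean()
        # Slice the latent vector into named, interpretable slots.
        slots, start = {}, 0
        for name, size in LATENT_SLOTS.items():
            slots[name] = z[..., start:start + size]
            start += size
        return control, slots, kl

if __name__ == "__main__":
    model = LatentControlVAE()
    ctx = torch.randn(4, 256)  # batch of dialogue-context embeddings
    control, slots, kl = model(ctx)
    print(control.shape, {k: v.shape for k, v in slots.items()}, kl.item())
```

In this sketch the decoded control vector would condition the response generator, while the per-slot latents remain separately inspectable, which is the sense in which such a latent space stays interpretable.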
Related papers
- DialogueForge: LLM Simulation of Human-Chatbot Dialogue [7.038493120049631]
We propose DialogueForge, a framework for generating AI-simulated conversations in human-chatbot style. To drive each generated conversation, DialogueForge uses seed prompts extracted from real human-chatbot interactions. We evaluate the quality of the simulated conversations and compare different models using the UniEval and GTEval evaluation protocols.
arXiv Detail & Related papers (2025-07-21T16:08:19Z) - Seamless Interaction: Dyadic Audiovisual Motion Modeling and Large-Scale Dataset [113.25650486482762]
We introduce the Seamless Interaction dataset, a large-scale collection of over 4,000 hours of face-to-face interaction footage. This dataset enables the development of AI technologies that understand dyadic embodied dynamics. We develop a suite of models that utilize the dataset to generate dyadic motion gestures and facial expressions aligned with human speech.
arXiv Detail & Related papers (2025-06-27T18:09:49Z) - REALTALK: A 21-Day Real-World Dataset for Long-Term Conversation [51.97224538045096]
We introduce REALTALK, a 21-day corpus of authentic messaging app dialogues. We compare EI attributes and persona consistency to understand the challenges posed by real-world dialogues. Our findings reveal that models struggle to simulate a user solely from dialogue history, while fine-tuning on specific user chats improves persona emulation.
arXiv Detail & Related papers (2025-02-18T20:29:01Z) - VAGUE: Visual Contexts Clarify Ambiguous Expressions [15.140825578254324]
We introduce VAGUE, a benchmark evaluating multimodal AI systems' ability to integrate visual context for intent disambiguation. VAGUE consists of 1.6K ambiguous textual expressions, each paired with an image and multiple-choice interpretations. Our experiments reveal that existing multimodal AI models struggle to infer the speaker's true intent.
arXiv Detail & Related papers (2024-11-21T14:01:42Z) - DiverseDialogue: A Methodology for Designing Chatbots with Human-Like Diversity [5.388338680646657]
We show that dialogues generated with GPT-4o mini as a simulated human participant systematically differ from those between actual humans across multiple linguistic features.
We propose an approach that automatically generates prompts for user simulations by incorporating features derived from real human interactions.
Our method of prompt optimization, tailored to target specific linguistic features, shows significant improvements.
arXiv Detail & Related papers (2024-08-30T21:33:58Z) - PersonalityChat: Conversation Distillation for Personalized Dialog Modeling with Facts and Traits [5.447308344436046]
PersonalityChat is a synthetic conversational dataset based upon the popular PersonaChat dataset.
We show that the personality trait labels can be used for trait-based personalization of generative dialogue models.
arXiv Detail & Related papers (2024-01-14T20:35:33Z) - Paralinguistics-Enhanced Large Language Modeling of Spoken Dialogue [71.15186328127409]
We propose the Paralinguistics-enhanced Generative Pretrained Transformer (ParalinGPT). The model takes the conversational context of text, speech embeddings, and paralinguistic attributes as input prompts within a serialized multitasking framework.
We utilize the Switchboard-1 corpus, including its sentiment labels as the paralinguistic attribute, as our spoken dialogue dataset.
arXiv Detail & Related papers (2023-12-23T18:14:56Z) - Faithful Persona-based Conversational Dataset Generation with Large Language Models [10.506653172302222]
High-quality conversational datasets are essential for developing AI models that can communicate with users.
We propose a Generator-Critic architecture framework to expand the initial dataset, while improving the quality of its conversations.
We release Synthetic-Persona-Chat, consisting of 20k conversations seeded from Persona-Chat.
arXiv Detail & Related papers (2023-12-15T18:23:50Z) - M$^{2}$Chat: Empowering VLM for Multimodal LLM Interleaved Text-Image Generation [45.79215260916687]
We propose $M^{2}Chat$, a novel unified multimodal LLM framework for generating interleaved text-image conversation.
$M^{3}Adapter$ integrates granular low-level visual information and high-level semantic features from multi-modality prompts.
The $M^{3}FT$ fine-tuning strategy optimizes disjoint groups of parameters for image-text alignment and visual-instruction.
arXiv Detail & Related papers (2023-11-29T11:30:33Z) - PlatoLM: Teaching LLMs in Multi-Round Dialogue via a User Simulator [39.40718009289621]
We propose a paradigm to simulate human behavior better and explore the benefits of incorporating more human-like questions in multi-turn conversations.
Specifically, we target human questions extracted from genuine human-machine conversations as a learning goal and provide a novel user simulator called Socratic.
Our results show that our response model, PlatoLM, achieves SoTA performance among LLaMA-based 7B models on MT-Bench.
arXiv Detail & Related papers (2023-08-21T06:51:56Z) - Enhancing Chat Language Models by Scaling High-quality Instructional Conversations [91.98516412612739]
We first provide a systematically designed, diverse, informative, large-scale dataset of instructional conversations, UltraChat.
Our objective is to capture the breadth of interactions that a human might have with an AI assistant.
We fine-tune a LLaMA model to create a powerful conversational model, UltraLLaMA.
arXiv Detail & Related papers (2023-05-23T16:49:14Z) - PLACES: Prompting Language Models for Social Conversation Synthesis [103.94325597273316]
We use a small set of expert-written conversations as in-context examples to synthesize a social conversation dataset using prompting.
We perform several thorough evaluations of our synthetic conversations compared to human-collected conversations.
arXiv Detail & Related papers (2023-02-07T05:48:16Z) - A Probabilistic Model Of Interaction Dynamics for Dyadic Face-to-Face
Settings [1.9544213396776275]
We develop a probabilistic model to capture the interaction dynamics between pairs of participants in a face-to-face setting.
This interaction encoding is then used to influence the generation when predicting one agent's future dynamics.
We show that our model successfully delineates between the modes, based on their interacting dynamics.
arXiv Detail & Related papers (2022-07-10T23:31:27Z) - Learning to Listen: Modeling Non-Deterministic Dyadic Facial Motion [89.01668641930206]
We present a framework for modeling interactional communication in dyadic conversations.
We autoregressively output multiple possibilities of corresponding listener motion.
Our method organically captures the multimodal and non-deterministic nature of nonverbal dyadic interactions.
arXiv Detail & Related papers (2022-04-18T17:58:04Z)