Social Agent: Mastering Dyadic Nonverbal Behavior Generation via Conversational LLM Agents
- URL: http://arxiv.org/abs/2510.04637v1
- Date: Mon, 06 Oct 2025 09:41:37 GMT
- Title: Social Agent: Mastering Dyadic Nonverbal Behavior Generation via Conversational LLM Agents
- Authors: Zeyi Zhang, Yanju Zhou, Heyuan Yao, Tenglong Ao, Xiaohang Zhan, Libin Liu,
- Abstract summary: Social Agent is a novel framework for synthesizing realistic and contextually appropriate co-speech nonverbal behaviors in dyadic conversations.<n>We develop an agentic system driven by a Large Language Model (LLM) to direct the conversation flow and determine appropriate interactive behaviors for both participants.<n>We propose a novel dual-person gesture generation model based on an auto-regressive diffusion model, which synthesizes coordinated motions from speech signals.
- Score: 13.902411927285328
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present Social Agent, a novel framework for synthesizing realistic and contextually appropriate co-speech nonverbal behaviors in dyadic conversations. In this framework, we develop an agentic system driven by a Large Language Model (LLM) to direct the conversation flow and determine appropriate interactive behaviors for both participants. Additionally, we propose a novel dual-person gesture generation model based on an auto-regressive diffusion model, which synthesizes coordinated motions from speech signals. The output of the agentic system is translated into high-level guidance for the gesture generator, resulting in realistic movement at both the behavioral and motion levels. Furthermore, the agentic system periodically examines the movements of interlocutors and infers their intentions, forming a continuous feedback loop that enables dynamic and responsive interactions between the two participants. User studies and quantitative evaluations show that our model significantly improves the quality of dyadic interactions, producing natural, synchronized nonverbal behaviors.
Related papers
- MIBURI: Towards Expressive Interactive Gesture Synthesis [62.45332399212876]
Embodied Conversational Agents (ECAs) aim to emulate human face-to-face interaction through speech, gestures, and facial expressions.<n>Existing solutions for ECAs produce rigid, low-diversity motions that are unsuitable for human-like interaction.<n>We present MIBURI, the first online, causal framework for generating expressive full-body gestures and facial expressions synchronized with real-time spoken dialogue.
arXiv Detail & Related papers (2026-03-03T18:59:51Z) - Interact2Ar: Full-Body Human-Human Interaction Generation via Autoregressive Diffusion Models [80.28579390566298]
We introduce Interact2Ar, a text-conditioned autoregressive diffusion model for generating full-body, human-human interactions.<n>Hand kinematics are incorporated through dedicated parallel branches, enabling high-fidelity full-body generation.<n>Our model enables a series of downstream applications, including temporal motion composition, real-time adaptation to disturbances, and extension beyond dyadic to multi-person scenarios.
arXiv Detail & Related papers (2025-12-22T18:59:50Z) - InteracTalker: Prompt-Based Human-Object Interaction with Co-Speech Gesture Generation [1.7523719472700858]
We introduce InteracTalker, a novel framework that seamlessly integrates prompt-based object-aware interactions with co-speech gesture generation.<n>Our framework utilizes a Generalized Motion Adaptation Module that enables independent training, adapting to the corresponding motion condition.<n>InteracTalker successfully unifies these previously separate tasks, outperforming prior methods in both co-speech gesture generation and object-interaction synthesis.
arXiv Detail & Related papers (2025-12-14T12:29:49Z) - Chronological Thinking in Full-Duplex Spoken Dialogue Language Models [66.84843878538207]
Chronological Thinking aims to improve response quality in full SDLMs.<n>No additional latency: once the user stops speaking, the agent halts thinking and begins speaking without further delay.<n>Results: Experiments demonstrate the effectiveness of chronological thinking through both objective metrics and human evaluations.
arXiv Detail & Related papers (2025-10-02T10:28:11Z) - MoReact: Generating Reactive Motion from Textual Descriptions [57.642436102978245]
MoReact is a diffusion-based method designed to disentangle the generation of global trajectories and local motions sequentially.<n>Our experiments, utilizing data adapted from a two-person motion dataset, demonstrate the efficacy of our approach.
arXiv Detail & Related papers (2025-09-28T14:31:41Z) - InterSyn: Interleaved Learning for Dynamic Motion Synthesis in the Wild [65.29569330744056]
We present Interleaved Learning for Motion Synthesis (InterSyn), a novel framework that targets the generation of realistic interaction motions.<n>InterSyn employs an interleaved learning strategy to capture the natural, dynamic interactions and nuanced coordination inherent in real-world scenarios.
arXiv Detail & Related papers (2025-08-14T03:00:06Z) - Inter-Diffusion Generation Model of Speakers and Listeners for Effective Communication [4.49451692966442]
This paper proposes an Inter-Diffusion Generation Model of Speakers and Listeners for Effective Communication.<n>For the first time, we integrate the full-body gestures of listeners into the generation framework.
arXiv Detail & Related papers (2025-05-08T07:00:58Z) - It Takes Two: Real-time Co-Speech Two-person's Interaction Generation via Reactive Auto-regressive Diffusion Model [34.94330722832987]
We introduce an audio-driven, auto-regressive system designed to synthesize dynamic movements for two characters during a conversation.<n>To the best of our knowledge, this is the first system capable of generating interactive full-body motions for two characters from speech in an online manner.
arXiv Detail & Related papers (2024-12-03T12:31:44Z) - PersLLM: A Personified Training Approach for Large Language Models [66.16513246245401]
We propose PersLLM, a framework for better data construction and model tuning.<n>For insufficient data usage, we incorporate strategies such as Chain-of-Thought prompting and anti-induction.<n>For rigid behavior patterns, we design the tuning process and introduce automated DPO to enhance the specificity and dynamism of the models' personalities.
arXiv Detail & Related papers (2024-07-17T08:13:22Z) - Dyadic Interaction Modeling for Social Behavior Generation [6.626277726145613]
We present an effective framework for creating 3D facial motions in dyadic interactions.
The heart of our framework is Dyadic Interaction Modeling (DIM), a pre-training approach.
Experiments demonstrate the superiority of our framework in generating listener motions.
arXiv Detail & Related papers (2024-03-14T03:21:33Z) - Persistent-Transient Duality: A Multi-mechanism Approach for Modeling
Human-Object Interaction [58.67761673662716]
Humans are highly adaptable, swiftly switching between different modes to handle different tasks, situations and contexts.
In Human-object interaction (HOI) activities, these modes can be attributed to two mechanisms: (1) the large-scale consistent plan for the whole activity and (2) the small-scale children interactive actions that start and end along the timeline.
This work proposes to model two concurrent mechanisms that jointly control human motion.
arXiv Detail & Related papers (2023-07-24T12:21:33Z) - A Probabilistic Model Of Interaction Dynamics for Dyadic Face-to-Face
Settings [1.9544213396776275]
We develop a probabilistic model to capture the interaction dynamics between pairs of participants in a face-to-face setting.
This interaction encoding is then used to influence the generation when predicting one agent's future dynamics.
We show that our model successfully delineates between the modes, based on their interacting dynamics.
arXiv Detail & Related papers (2022-07-10T23:31:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.