EASL: Multi-Emotion Guided Semantic Disentanglement for Expressive Sign Language Generation
- URL: http://arxiv.org/abs/2511.22135v1
- Date: Thu, 27 Nov 2025 06:04:15 GMT
- Title: EASL: Multi-Emotion Guided Semantic Disentanglement for Expressive Sign Language Generation
- Authors: Yanchao Zhao, Jihao Zhu, Yu Liu, Weizhuo Chen, Yuling Yang, Kun Peng,
- Abstract summary: We propose EASL (Emotion-Aware Sign Language), a multi-emotion-guided generation architecture for fine-grained emotional integration. We introduce emotion-semantic disentanglement modules with progressive training to separately extract semantic and affective features. During pose decoding, the emotional representations guide semantic interaction to generate sign poses with 7-class emotion confidence scores, enabling emotional expression recognition.
- Score: 7.76229483761977
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models have revolutionized sign language generation by automatically transforming text into high-quality sign language videos, providing accessible communication for the Deaf community. However, existing LLM-based approaches prioritize semantic accuracy while overlooking emotional expressions, resulting in outputs that lack naturalness and expressiveness. We propose EASL (Emotion-Aware Sign Language), a multi-emotion-guided generation architecture for fine-grained emotional integration. We introduce emotion-semantic disentanglement modules with progressive training to separately extract semantic and affective features. During pose decoding, the emotional representations guide semantic interaction to generate sign poses with 7-class emotion confidence scores, enabling emotional expression recognition. Experimental results demonstrate that EASL achieves pose accuracy superior to all compared baselines by integrating multi-emotion information and effectively adapts to diffusion models to generate expressive sign language videos.
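The abstract outlines two components: disentanglement modules that separate semantic and affective features, and a pose decoder in which the emotion representation guides the semantic stream while a 7-class head emits emotion confidence scores. The PyTorch snippet below is a minimal, hypothetical sketch of that idea, not the authors' implementation; module names, dimensions, and the use of cross-attention for the emotion-guided interaction are all assumptions for illustration.

```python
# Hypothetical sketch of emotion-semantic disentanglement with
# emotion-guided pose decoding, as described at a high level in the abstract.
# All names and dimensions are assumptions, not the EASL implementation.
import torch
import torch.nn as nn

class EmotionSemanticDisentangler(nn.Module):
    def __init__(self, d_model=512, n_emotions=7, pose_dim=150):
        super().__init__()
        # Two separate encoders extract semantic vs. affective features.
        self.semantic_enc = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=2)
        self.emotion_enc = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=2)
        # Emotional representation guides the semantic stream (assumed here
        # to be cross-attention; the paper only says "guides semantic interaction").
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
        self.pose_head = nn.Linear(d_model, pose_dim)       # per-frame sign pose
        self.emotion_head = nn.Linear(d_model, n_emotions)  # 7-class confidences

    def forward(self, text_emb):
        # text_emb: (batch, seq_len, d_model) token embeddings from an LLM encoder.
        sem = self.semantic_enc(text_emb)
        emo = self.emotion_enc(text_emb)
        # Semantic stream queries the emotional representation.
        guided, _ = self.cross_attn(query=sem, key=emo, value=emo)
        poses = self.pose_head(guided)                        # (B, T, pose_dim)
        emotion_conf = self.emotion_head(emo.mean(dim=1)).softmax(dim=-1)  # (B, 7)
        return poses, emotion_conf
```

In this reading, the pooled affective features double as input to the 7-class emotion head, which is one plausible way to obtain the emotion confidence scores mentioned in the abstract alongside the generated poses.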
Related papers
- E^2-LLM: Bridging Neural Signals and Interpretable Affective Analysis [54.763420895859035]
We present E^2-LLM (EEG-to-Emotion Large Language Model), the first MLLM framework for interpretable emotion analysis from EEG. E^2-LLM integrates a pretrained EEG encoder with Q-based LLMs through learnable projection layers, employing a multi-stage training pipeline. Experiments on a dataset covering seven emotion categories demonstrate that E^2-LLM achieves excellent performance on emotion classification.
arXiv Detail & Related papers (2026-01-11T13:21:20Z) - Emotion-Coherent Reasoning for Multimodal LLMs via Emotional Rationale Verifier [53.55996102181836]
We propose the Emotional Rationale Verifier (ERV) and an Explanation Reward. Our method guides the model to produce reasoning that is explicitly consistent with the target emotion. We show that our approach not only enhances alignment between explanation and prediction but also empowers MLLMs to deliver emotionally coherent, trustworthy interactions.
arXiv Detail & Related papers (2025-10-27T16:40:17Z) - EmoCAST: Emotional Talking Portrait via Emotive Text Description [56.42674612728354]
EmoCAST is a diffusion-based framework for precise text-driven emotional synthesis. In appearance modeling, emotional prompts are integrated through a text-guided decoupled emotive module. EmoCAST achieves state-of-the-art performance in generating realistic, emotionally expressive, and audio-synchronized talking-head videos.
arXiv Detail & Related papers (2025-08-28T10:02:06Z) - Seeing is Believing: Emotion-Aware Audio-Visual Language Modeling for Expressive Speech Generation [26.389793087374432]
We present an Audio-Visual Language Model (AVLM) for expressive speech generation. We explore multiple visual encoders and multimodal fusion strategies during pre-training to identify the most effective integration approach.
arXiv Detail & Related papers (2025-08-22T08:08:45Z) - From Coarse to Nuanced: Cross-Modal Alignment of Fine-Grained Linguistic Cues and Visual Salient Regions for Dynamic Emotion Recognition [7.362433184546492]
Dynamic Facial Expression Recognition aims to identify human emotions from temporally evolving facial movements. Our method integrates dynamic motion modeling, semantic text refinement, and token-level cross-modal alignment to facilitate the precise localization of emotionally salient features.
arXiv Detail & Related papers (2025-07-16T04:15:06Z) - DeepGesture: A conversational gesture synthesis system based on emotions and semantics [0.0]
DeepGesture is a diffusion-based gesture synthesis framework. It generates expressive co-speech gestures conditioned on multimodal signals. We show that DeepGesture produces gestures with improved human-likeness and contextual appropriateness.
arXiv Detail & Related papers (2025-07-03T20:04:04Z) - VAEmo: Efficient Representation Learning for Visual-Audio Emotion with Knowledge Injection [50.57849622045192]
We propose VAEmo, an efficient framework for emotion-centric joint VA representation learning with external knowledge injection. VAEmo achieves state-of-the-art performance with a compact design, highlighting the benefit of unified cross-modal encoding and emotion-aware semantic guidance.
arXiv Detail & Related papers (2025-05-05T03:00:51Z) - Contrastive Decoupled Representation Learning and Regularization for Speech-Preserving Facial Expression Manipulation [58.189703277322224]
Speech-preserving facial expression manipulation (SPFEM) aims to modify a talking head to display a specific reference emotion. Emotion and content information in the reference and source inputs can provide direct and accurate supervision signals for SPFEM models. We propose to learn content and emotion priors as guidance, augmented with contrastive learning, to obtain decoupled content and emotion representations.
arXiv Detail & Related papers (2025-04-08T04:34:38Z) - GatedxLSTM: A Multimodal Affective Computing Approach for Emotion Recognition in Conversations [35.63053777817013]
GatedxLSTM is a novel multimodal Emotion Recognition in Conversation (ERC) model. It considers voice and transcripts of both the speaker and their conversational partner to identify the most influential sentences driving emotional shifts. It achieves state-of-the-art (SOTA) performance among open-source methods in four-class emotion classification.
arXiv Detail & Related papers (2025-03-26T18:46:18Z) - When Words Smile: Generating Diverse Emotional Facial Expressions from Text [77.1867389815291]
We introduce an end-to-end text-to-expression model that explicitly focuses on emotional dynamics. Our model learns expressive facial variations in a continuous latent space and generates expressions that are diverse, fluid, and emotionally coherent.
arXiv Detail & Related papers (2024-12-03T15:39:05Z) - Empowering Sign Language Communication: Integrating Sentiment and Semantics for Facial Expression Synthesis [0.7223509567556217]
This paper introduces a new method focused on synthesizing facial expressions for sign language.
Our goal is to improve sign language production by integrating sentiment information in facial expression generation.
arXiv Detail & Related papers (2024-08-27T15:55:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.