Geometry-Aware Losses for Structure-Preserving Text-to-Sign Language Generation
- URL: http://arxiv.org/abs/2509.23011v1
- Date: Sat, 27 Sep 2025 00:06:17 GMT
- Title: Geometry-Aware Losses for Structure-Preserving Text-to-Sign Language Generation
- Authors: Zetian Wu, Tianshuo Zhou, Stefan Lee, Liang Huang
- Abstract summary: Sign language translation plays a crucial role in enabling effective communication for Deaf and hard-of-hearing individuals. Prior methods often neglect the anatomical constraints and coordination patterns of human skeletal motion. We propose a novel approach that explicitly models the relationships among skeletal joints.
- Score: 14.94145299705437
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Sign language translation from text to video plays a crucial role in enabling effective communication for Deaf and hard-of-hearing individuals. A major challenge lies in generating accurate and natural body poses and movements that faithfully convey intended meanings. Prior methods often neglect the anatomical constraints and coordination patterns of human skeletal motion, resulting in rigid or biomechanically implausible outputs. To address this, we propose a novel approach that explicitly models the relationships among skeletal joints (including shoulders, arms, and hands) by incorporating geometric constraints on joint positions, bone lengths, and movement dynamics. During training, we introduce a parent-relative reweighting mechanism to enhance finger flexibility and reduce motion stiffness. Additionally, bone-pose losses and bone-length constraints enforce anatomically consistent structures. Our method narrows the performance gap between the previous best and the ground-truth oracle by 56.51%, and further reduces discrepancies in bone length and movement variance by 18.76% and 5.48%, respectively, demonstrating significant gains in anatomical realism and motion naturalness.
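To make the idea of a bone-length constraint concrete, here is a minimal sketch of how such a term could be computed from predicted joint positions. This is our illustrative assumption, not the paper's actual loss: the function name, the skeleton layout (a parent-index array over joints), and the mean-squared formulation are all hypothetical.

```python
import numpy as np

def bone_length_loss(joints, parents, ref_lengths):
    """Penalize deviation of predicted bone lengths from reference lengths.

    joints:      (T, J, 3) array of predicted joint positions over T frames
    parents:     length-J list; parents[j] is the parent joint index of j (-1 for root)
    ref_lengths: length-J array of reference bone lengths (entry for the root is unused)
    """
    loss = 0.0
    n_bones = 0
    for j, p in enumerate(parents):
        if p < 0:  # the root joint has no incoming bone
            continue
        # Length of bone (p -> j) in every frame: shape (T,)
        lengths = np.linalg.norm(joints[:, j] - joints[:, p], axis=-1)
        loss += np.mean((lengths - ref_lengths[j]) ** 2)
        n_bones += 1
    return loss / max(n_bones, 1)
```

A term like this is zero whenever every predicted bone keeps its reference length in every frame, and grows quadratically as lengths drift, which is one simple way to discourage anatomically inconsistent skeletons during training.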
Related papers
- Beyond Global Alignment: Fine-Grained Motion-Language Retrieval via Pyramidal Shapley-Taylor Learning [56.6025512458557]
Motion-language retrieval aims to bridge the semantic gap between natural language and human motion. Existing approaches predominantly focus on aligning entire motion sequences with global textual representations. We propose a novel Pyramidal Shapley-Taylor (PST) learning framework for fine-grained motion-language retrieval.
arXiv Detail & Related papers (2026-01-29T16:00:12Z) - PALUM: Part-based Attention Learning for Unified Motion Retargeting [53.17113525688095]
Retargeting motion between characters with different skeleton structures is a fundamental challenge in computer animation. We present a novel approach that learns common motion representations across diverse skeleton topologies. Experiments demonstrate superior performance in handling diverse skeletal structures while maintaining motion realism and semantic fidelity.
arXiv Detail & Related papers (2026-01-12T07:29:44Z) - Interact2Ar: Full-Body Human-Human Interaction Generation via Autoregressive Diffusion Models [80.28579390566298]
We introduce Interact2Ar, a text-conditioned autoregressive diffusion model for generating full-body, human-human interactions. Hand kinematics are incorporated through dedicated parallel branches, enabling high-fidelity full-body generation. Our model enables a series of downstream applications, including temporal motion composition, real-time adaptation to disturbances, and extension beyond dyadic to multi-person scenarios.
arXiv Detail & Related papers (2025-12-22T18:59:50Z) - Bridging Structural Dynamics and Biomechanics: Human Motion Estimation through Footstep-Induced Floor Vibrations [2.7180946990643466]
Existing approaches involve monitoring devices such as cameras, wearables, and pressure mats. We leverage gait-induced floor vibration to estimate lower-limb joint motion. Our model imposes physical constraints to reduce uncertainty while allowing information sharing between the body and the floor.
arXiv Detail & Related papers (2025-02-21T20:10:15Z) - Continual Learning from Simulated Interactions via Multitask Prospective Rehearsal for Bionic Limb Behavior Modeling [0.7922558880545526]
We introduce a model for human behavior in the context of bionic prosthesis control. We propose a multitasking, continually adaptive model that anticipates and refines movements over time. We validate our model through experiments on real-world human gait datasets, including transtibial amputees.
arXiv Detail & Related papers (2024-05-02T09:22:54Z) - Two-Person Interaction Augmentation with Skeleton Priors [16.65884142618145]
We propose a new deep learning method for two-body skeletal interaction motion augmentation.
Our system can learn effectively from a relatively small amount of data and generalize to drastically different skeleton sizes.
arXiv Detail & Related papers (2024-04-08T13:11:57Z) - InterControl: Zero-shot Human Interaction Generation by Controlling Every Joint [67.6297384588837]
We introduce a novel controllable motion generation method, InterControl, to encourage synthesized motions to maintain the desired distance between joint pairs.
We demonstrate that the distances between joint pairs for human interactions can be generated using an off-the-shelf Large Language Model.
arXiv Detail & Related papers (2023-11-27T14:32:33Z) - Learning Realistic Joint Space Boundaries for Range of Motion Analysis of Healthy and Impaired Human Arms [0.5530212768657544]
We propose a data-driven method to learn realistic anatomically constrained upper-limb range of motion boundaries from motion capture data. We also propose an impairment index (II) metric that offers a quantitative assessment of capability/impairment when comparing healthy and impaired arms.
arXiv Detail & Related papers (2023-11-17T17:14:42Z) - Imposing Temporal Consistency on Deep Monocular Body Shape and Pose Estimation [67.23327074124855]
This paper presents an elegant solution for the integration of temporal constraints in the fitting process.
We derive parameters of a sequence of body models, representing shape and motion of a person, including jaw poses, facial expressions, and finger poses.
Our approach enables the derivation of realistic 3D body models from image sequences, including facial expression and articulated hands.
arXiv Detail & Related papers (2022-02-07T11:11:55Z) - Towards Understanding the Adversarial Vulnerability of Skeleton-based
Action Recognition [133.35968094967626]
Skeleton-based action recognition has attracted increasing attention due to its strong adaptability to dynamic circumstances.
With the help of deep learning techniques, it has also witnessed substantial progress and currently achieves around 90% accuracy in benign environments.
However, research on the vulnerability of skeleton-based action recognition under different adversarial settings remains scant.
arXiv Detail & Related papers (2020-05-14T17:12:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.