A$^{2}$V-SLP: Alignment-Aware Variational Modeling for Disentangled Sign Language Production
- URL: http://arxiv.org/abs/2602.11861v1
- Date: Thu, 12 Feb 2026 12:07:32 GMT
- Title: A$^{2}$V-SLP: Alignment-Aware Variational Modeling for Disentangled Sign Language Production
- Authors: Sümeyye Meryem Taşyürek, Enis Mücahid İskender, Hacer Yalim Keles
- Abstract summary: A$^{2}$V-SLP learns articulator-wise disentangled latent distributions rather than deterministic embeddings. A disentangled Variational Autoencoder encodes ground-truth sign pose sequences and extracts articulator-specific mean and variance vectors.
- Score: 0.9384603486206738
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Building upon recent structural disentanglement frameworks for sign language production, we propose A$^{2}$V-SLP, an alignment-aware variational framework that learns articulator-wise disentangled latent distributions rather than deterministic embeddings. A disentangled Variational Autoencoder (VAE) encodes ground-truth sign pose sequences and extracts articulator-specific mean and variance vectors, which are used as distributional supervision for training a non-autoregressive Transformer. Given text embeddings, the Transformer predicts both latent means and log-variances, while the VAE decoder reconstructs the final sign pose sequences through stochastic sampling at the decoding stage. This formulation maintains articulator-level representations by avoiding deterministic latent collapse through distributional latent modeling. In addition, we integrate a gloss attention mechanism to strengthen alignment between linguistic input and articulated motion. Experimental results show consistent gains over deterministic latent regression, achieving state-of-the-art back-translation performance and improved motion realism in a fully gloss-free setting.
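The abstract describes a two-stage pipeline: a disentangled VAE supplies articulator-specific Gaussian statistics from ground-truth poses, and a non-autoregressive Transformer predicts means and log-variances from text, which are sampled and decoded into pose sequences. The sketch below illustrates that kind of distributional supervision; the module names, dimensions, articulator grouping, and the Gaussian-KL matching loss are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

NUM_ARTICULATORS = 3   # assumed grouping, e.g. body, left hand, right hand
LATENT_DIM = 64

class ArticulatorVAEEncoder(nn.Module):
    """Encodes a ground-truth pose sequence into per-articulator (mu, logvar)."""
    def __init__(self, pose_dim=150, hidden=256):
        super().__init__()
        self.backbone = nn.GRU(pose_dim, hidden, batch_first=True)
        self.heads = nn.ModuleList(
            [nn.Linear(hidden, 2 * LATENT_DIM) for _ in range(NUM_ARTICULATORS)]
        )

    def forward(self, poses):                        # poses: (B, T, pose_dim)
        h, _ = self.backbone(poses)                  # (B, T, hidden)
        stats = [head(h) for head in self.heads]     # each (B, T, 2 * LATENT_DIM)
        mus, logvars = zip(*[s.chunk(2, dim=-1) for s in stats])
        return torch.stack(mus, 1), torch.stack(logvars, 1)   # (B, A, T, D)

def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL(N(mu_q, var_q) || N(mu_p, var_p)): one plausible way to supervise the
    Transformer's predicted distributions with the VAE's posterior statistics."""
    var_q, var_p = logvar_q.exp(), logvar_p.exp()
    return 0.5 * (logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1).mean()

def sample_latent(mu, logvar):
    """Reparameterized draw used for stochastic sampling at the decoding stage."""
    return mu + torch.randn_like(mu) * (0.5 * logvar).exp()
```

In this reading, the text-conditioned Transformer would output its own (mu, logvar) per articulator and frame, be penalized with `gaussian_kl` against the VAE statistics, and feed `sample_latent` into the frozen VAE decoder at inference.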
Related papers
- Disentangle and Regularize: Sign Language Production with Articulator-Based Disentanglement and Channel-Aware Regularization [0.9384603486206738]
We train a pose autoencoder that encodes sign poses into a compact latent space using an articulator-based disentanglement strategy. Next, a non-autoregressive transformer decoder is trained to predict these latent representations from word-level text embeddings of the input sentence. Our approach does not rely on gloss supervision or pretrained models, and achieves state-of-the-art results on the PHOENIX14T and CSL-Daily datasets.
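A minimal sketch of the second stage summarized above, assuming a frozen pose autoencoder already provides per-articulator latents: a non-autoregressive Transformer decoder maps word-level text embeddings to those latents in a single pass. All names, dimensions, and the MSE objective are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TextToLatentRegressor(nn.Module):
    """Non-autoregressive Transformer decoder: word embeddings -> pose latents."""
    def __init__(self, text_dim=300, latent_dim=64, num_articulators=3,
                 max_frames=200, d_model=256, nhead=4, num_layers=4):
        super().__init__()
        self.text_in = nn.Linear(text_dim, d_model)
        self.frame_queries = nn.Parameter(torch.randn(max_frames, d_model))
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.out = nn.Linear(d_model, num_articulators * latent_dim)

    def forward(self, text_emb):                     # text_emb: (B, L, text_dim)
        memory = self.text_in(text_emb)
        queries = self.frame_queries.unsqueeze(0).expand(text_emb.size(0), -1, -1)
        h = self.decoder(queries, memory)            # all frames predicted at once
        return self.out(h)                           # (B, max_frames, A * latent_dim)

# Training target: latents from the frozen pose autoencoder, e.g.
# loss = nn.functional.mse_loss(regressor(text_emb), target_latents)
```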
arXiv Detail & Related papers (2025-04-09T06:14:19Z) - Latent Lexical Projection in Large Language Models: A Novel Approach to Implicit Representation Refinement [0.0]
Latent Lexical Projection (LLP) is introduced to refine lexical representations through a structured transformation into a latent space. LLP integrates an optimized projection mechanism within an existing language model architecture. Evaluations indicate a reduction in perplexity and an increase in BLEU scores, suggesting improvements in predictive accuracy and fluency.
arXiv Detail & Related papers (2025-02-03T23:18:53Z) - PseudoNeg-MAE: Self-Supervised Point Cloud Learning using Conditional Pseudo-Negative Embeddings [55.55445978692678]
PseudoNeg-MAE enhances global feature representation of point cloud masked autoencoders by making them both discriminative and sensitive to transformations. We propose a novel loss that explicitly penalizes invariant collapse, enabling the network to capture richer transformation cues while preserving discriminative representations.
arXiv Detail & Related papers (2024-09-24T07:57:21Z) - Autoregressive Speech Synthesis without Vector Quantization [135.4776759536272]
We present MELLE, a novel continuous-valued token-based language modeling approach for text-to-speech synthesis (TTS). MELLE autoregressively generates continuous mel-spectrogram frames directly from the text condition. MELLE mitigates robustness issues by avoiding the inherent flaws of sampling vector-quantized codes.
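As a rough illustration of the continuous-token idea above, the sketch below regresses mel-spectrogram frames autoregressively from a text condition with a causal Transformer decoder, instead of predicting discrete codec tokens. The module names, dimensions, and plain regression loss are assumptions, not MELLE's actual architecture.

```python
import torch
import torch.nn as nn

class ContinuousFrameDecoder(nn.Module):
    """Autoregressive decoder that regresses continuous mel frames from text."""
    def __init__(self, n_mels=80, text_dim=256, d_model=256, nhead=4, num_layers=4):
        super().__init__()
        self.frame_in = nn.Linear(n_mels, d_model)
        self.text_in = nn.Linear(text_dim, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.frame_out = nn.Linear(d_model, n_mels)

    def forward(self, prev_frames, text_emb):
        # prev_frames: (B, T, n_mels), teacher-forced; text_emb: (B, L, text_dim)
        tgt = self.frame_in(prev_frames)
        T = tgt.size(1)
        causal = torch.triu(torch.full((T, T), float("-inf"), device=tgt.device), 1)
        h = self.decoder(tgt, self.text_in(text_emb), tgt_mask=causal)
        return self.frame_out(h)                     # next-frame predictions, (B, T, n_mels)

# Trained with an L1/MSE regression loss on the shifted frames rather than a
# cross-entropy over vector-quantized codes.
```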
arXiv Detail & Related papers (2024-07-11T14:36:53Z) - How to train your VAE [0.0]
Variational Autoencoders (VAEs) have become a cornerstone in generative modeling and representation learning within machine learning.
This paper explores interpreting the Kullback-Leibler (KL) Divergence, a critical component within the Evidence Lower Bound (ELBO).
The proposed method redefines the ELBO with a mixture of Gaussians for the posterior probability, introduces a regularization term, and employs a PatchGAN discriminator to enhance texture realism.
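For context, the KL term mentioned above sits inside the standard ELBO written below; the cited paper replaces the usual single-Gaussian posterior with a mixture of Gaussians and adds a further regularization term, neither of which is spelled out in this summary.

```latex
% Standard single-sample ELBO; the cited paper swaps the Gaussian posterior
% q_phi(z|x) for a mixture of Gaussians and adds an extra regularizer.
\mathcal{L}_{\mathrm{ELBO}}(x) =
  \mathbb{E}_{q_{\phi}(z \mid x)}\big[\log p_{\theta}(x \mid z)\big]
  - \mathrm{KL}\big(q_{\phi}(z \mid x) \,\|\, p(z)\big)
```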
arXiv Detail & Related papers (2023-09-22T19:52:28Z) - Recurrence Boosts Diversity! Revisiting Recurrent Latent Variable in Transformer-Based Variational AutoEncoder for Diverse Text Generation [85.5379146125199]
Variational Auto-Encoder (VAE) has been widely adopted in text generation.
We propose TRACE, a Transformer-based recurrent VAE structure.
arXiv Detail & Related papers (2022-10-22T10:25:35Z) - Towards Unsupervised Content Disentanglement in Sentence Representations via Syntactic Roles [0.9582466286528458]
We develop an Attention-Driven Variational Autoencoder (ADVAE).
We show that it is possible to obtain representations of sentences where different syntactic roles correspond to clearly identified latent variables.
Our work constitutes a first step towards unsupervised controllable content generation.
arXiv Detail & Related papers (2022-06-22T15:50:01Z) - Autoencoding Variational Autoencoder [56.05008520271406]
We study the implications of this behaviour on the learned representations and also the consequences of fixing it by introducing a notion of self consistency.
We show that encoders trained with our self-consistency approach lead to representations that are robust (insensitive) to perturbations in the input introduced by adversarial attacks.
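One plausible reading of the self-consistency idea mentioned above is that a latent sampled from the posterior should be recovered when its decoded output is re-encoded. The snippet below sketches such a penalty under that assumption; `encoder` and `decoder` are placeholder callables, and this is not the paper's exact objective.

```python
import torch

def self_consistency_penalty(encoder, decoder, x):
    """Illustrative penalty: the decoded sample, when re-encoded, should map
    back to the latent it was generated from (placeholder encoder/decoder)."""
    mu, logvar = encoder(x)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterized sample
    x_rec = decoder(z)
    mu_rec, _ = encoder(x_rec)                             # re-encode the reconstruction
    return ((mu_rec - z) ** 2).mean()
```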
arXiv Detail & Related papers (2020-12-07T14:16:14Z) - Unsupervised Controllable Generation with Self-Training [90.04287577605723]
Controllable generation with GANs remains a challenging research problem.
We propose an unsupervised framework to learn a distribution of latent codes that control the generator through self-training.
Our framework exhibits better disentanglement compared to other variants such as the variational autoencoder.
arXiv Detail & Related papers (2020-07-17T21:50:35Z) - Improve Variational Autoencoder for Text Generation with Discrete Latent Bottleneck [52.08901549360262]
Variational autoencoders (VAEs) are essential tools in end-to-end representation learning.
VAEs with a strong auto-regressive decoder tend to ignore latent variables.
We propose a principled approach to enforce an implicit latent feature matching in a more compact latent space.
arXiv Detail & Related papers (2020-04-22T14:41:37Z)