A$^{2}$V-SLP: Alignment-Aware Variational Modeling for Disentangled Sign Language Production
- URL: http://arxiv.org/abs/2602.11861v1
- Date: Thu, 12 Feb 2026 12:07:32 GMT
- Title: A$^{2}$V-SLP: Alignment-Aware Variational Modeling for Disentangled Sign Language Production
- Authors: Sümeyye Meryem Taşyürek, Enis Mücahid İskender, Hacer Yalim Keles
- Abstract summary: A$^{2}$V-SLP learns articulator-wise disentangled latent distributions rather than deterministic embeddings. A disentangled Variational Autoencoder encodes ground-truth sign pose sequences and extracts articulator-specific mean and variance vectors.
- Score: 0.9384603486206738
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Building upon recent structural disentanglement frameworks for sign language production, we propose A$^{2}$V-SLP, an alignment-aware variational framework that learns articulator-wise disentangled latent distributions rather than deterministic embeddings. A disentangled Variational Autoencoder (VAE) encodes ground-truth sign pose sequences and extracts articulator-specific mean and variance vectors, which are used as distributional supervision for training a non-autoregressive Transformer. Given text embeddings, the Transformer predicts both latent means and log-variances, while the VAE decoder reconstructs the final sign pose sequences through stochastic sampling at the decoding stage. This formulation maintains articulator-level representations by avoiding deterministic latent collapse through distributional latent modeling. In addition, we integrate a gloss attention mechanism to strengthen alignment between linguistic input and articulated motion. Experimental results show consistent gains over deterministic latent regression, achieving state-of-the-art back-translation performance and improved motion realism in a fully gloss-free setting.
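The abstract describes a two-stage pipeline: a disentangled VAE supplies articulator-specific Gaussian statistics from ground-truth poses, and a non-autoregressive Transformer predicts means and log-variances from text, which are sampled and decoded into pose sequences. The sketch below illustrates that kind of distributional supervision; the module names, dimensions, articulator grouping, and the Gaussian-KL matching loss are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

NUM_ARTICULATORS = 3   # assumed grouping, e.g. body, left hand, right hand
LATENT_DIM = 64

class ArticulatorVAEEncoder(nn.Module):
    """Encodes a ground-truth pose sequence into per-articulator (mu, logvar)."""
    def __init__(self, pose_dim=150, hidden=256):
        super().__init__()
        self.backbone = nn.GRU(pose_dim, hidden, batch_first=True)
        self.heads = nn.ModuleList(
            [nn.Linear(hidden, 2 * LATENT_DIM) for _ in range(NUM_ARTICULATORS)]
        )

    def forward(self, poses):                        # poses: (B, T, pose_dim)
        h, _ = self.backbone(poses)                  # (B, T, hidden)
        stats = [head(h) for head in self.heads]     # each (B, T, 2 * LATENT_DIM)
        mus, logvars = zip(*[s.chunk(2, dim=-1) for s in stats])
        return torch.stack(mus, 1), torch.stack(logvars, 1)   # (B, A, T, D)

def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL(N(mu_q, var_q) || N(mu_p, var_p)): one plausible way to supervise the
    Transformer's predicted distributions with the VAE's posterior statistics."""
    var_q, var_p = logvar_q.exp(), logvar_p.exp()
    return 0.5 * (logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1).mean()

def sample_latent(mu, logvar):
    """Reparameterized draw used for stochastic sampling at the decoding stage."""
    return mu + torch.randn_like(mu) * (0.5 * logvar).exp()
```

In this reading, the text-conditioned Transformer would output its own (mu, logvar) per articulator and frame, be penalized with `gaussian_kl` against the VAE statistics, and feed `sample_latent` into the frozen VAE decoder at inference.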
Related papers
- Disentangle and Regularize: Sign Language Production with Articulator-Based Disentanglement and Channel-Aware Regularization [0.9384603486206738]
We train a pose autoencoder that encodes sign poses into a compact latent space using an articulator-based disentanglement strategy. Next, a non-autoregressive transformer decoder is trained to predict these latent representations from word-level text embeddings of the input sentence. Our approach does not rely on gloss supervision or pretrained models, and achieves state-of-the-art results on the PHOENIX14T and CSL-Daily datasets.
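A minimal sketch of the second stage summarized above, assuming a frozen pose autoencoder already provides per-articulator latents: a non-autoregressive Transformer decoder maps word-level text embeddings to those latents in a single pass. All names, dimensions, and the MSE objective are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TextToLatentRegressor(nn.Module):
    """Non-autoregressive Transformer decoder: word embeddings -> pose latents."""
    def __init__(self, text_dim=300, latent_dim=64, num_articulators=3,
                 max_frames=200, d_model=256, nhead=4, num_layers=4):
        super().__init__()
        self.text_in = nn.Linear(text_dim, d_model)
        self.frame_queries = nn.Parameter(torch.randn(max_frames, d_model))
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.out = nn.Linear(d_model, num_articulators * latent_dim)

    def forward(self, text_emb):                     # text_emb: (B, L, text_dim)
        memory = self.text_in(text_emb)
        queries = self.frame_queries.unsqueeze(0).expand(text_emb.size(0), -1, -1)
        h = self.decoder(queries, memory)            # all frames predicted at once
        return self.out(h)                           # (B, max_frames, A * latent_dim)

# Training target: latents from the frozen pose autoencoder, e.g.
# loss = nn.functional.mse_loss(regressor(text_emb), target_latents)
```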
arXiv Detail & Related papers (2025-04-09T06:14:19Z) - Latent Lexical Projection in Large Language Models: A Novel Approach to Implicit Representation Refinement [0.0]
Latent Lexical Projection (LLP) is introduced to refine lexical representations through a structured transformation into a latent space. LLP integrates an optimized projection mechanism within an existing language model architecture. Evaluations indicate a reduction in perplexity and an increase in BLEU scores, suggesting improvements in predictive accuracy and fluency.
arXiv Detail & Related papers (2025-02-03T23:18:53Z) - PseudoNeg-MAE: Self-Supervised Point Cloud Learning using Conditional Pseudo-Negative Embeddings [55.55445978692678]
PseudoNeg-MAE enhances global feature representation of point cloud masked autoencoders by making them both discriminative and sensitive to transformations. We propose a novel loss that explicitly penalizes invariant collapse, enabling the network to capture richer transformation cues while preserving discriminative representations.
arXiv Detail & Related papers (2024-09-24T07:57:21Z) - Autoregressive Speech Synthesis without Vector Quantization [135.4776759536272]
We present MELLE, a novel continuous-valued token-based language modeling approach for text-to-speech synthesis (TTS). MELLE autoregressively generates continuous mel-spectrogram frames directly from the text condition. MELLE mitigates robustness issues by avoiding the inherent flaws of sampling vector-quantized codes.
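As a rough illustration of the continuous-token idea above, the sketch below regresses mel-spectrogram frames autoregressively from a text condition with a causal Transformer decoder, instead of predicting discrete codec tokens. The module names, dimensions, and plain regression loss are assumptions, not MELLE's actual architecture.

```python
import torch
import torch.nn as nn

class ContinuousFrameDecoder(nn.Module):
    """Autoregressive decoder that regresses continuous mel frames from text."""
    def __init__(self, n_mels=80, text_dim=256, d_model=256, nhead=4, num_layers=4):
        super().__init__()
        self.frame_in = nn.Linear(n_mels, d_model)
        self.text_in = nn.Linear(text_dim, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.frame_out = nn.Linear(d_model, n_mels)

    def forward(self, prev_frames, text_emb):
        # prev_frames: (B, T, n_mels), teacher-forced; text_emb: (B, L, text_dim)
        tgt = self.frame_in(prev_frames)
        T = tgt.size(1)
        causal = torch.triu(torch.full((T, T), float("-inf"), device=tgt.device), 1)
        h = self.decoder(tgt, self.text_in(text_emb), tgt_mask=causal)
        return self.frame_out(h)                     # next-frame predictions, (B, T, n_mels)

# Trained with an L1/MSE regression loss on the shifted frames rather than a
# cross-entropy over vector-quantized codes.
```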
arXiv Detail & Related papers (2024-07-11T14:36:53Z) - How to train your VAE [0.0]
Variational Autoencoders (VAEs) have become a cornerstone in generative modeling and representation learning within machine learning.
This paper explores interpreting the Kullback-Leibler (KL) Divergence, a critical component within the Evidence Lower Bound (ELBO).
The proposed method redefines the ELBO with a mixture of Gaussians for the posterior probability, introduces a regularization term, and employs a PatchGAN discriminator to enhance texture realism.
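For context, the KL term mentioned above sits inside the standard ELBO written below; the cited paper replaces the usual single-Gaussian posterior with a mixture of Gaussians and adds a further regularization term, neither of which is spelled out in this summary.

```latex
% Standard single-sample ELBO; the cited paper swaps the Gaussian posterior
% q_phi(z|x) for a mixture of Gaussians and adds an extra regularizer.
\mathcal{L}_{\mathrm{ELBO}}(x) =
  \mathbb{E}_{q_{\phi}(z \mid x)}\big[\log p_{\theta}(x \mid z)\big]
  - \mathrm{KL}\big(q_{\phi}(z \mid x) \,\|\, p(z)\big)
```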
arXiv Detail & Related papers (2023-09-22T19:52:28Z) - Recurrence Boosts Diversity! Revisiting Recurrent Latent Variable in Transformer-Based Variational AutoEncoder for Diverse Text Generation [85.5379146125199]
Variational Auto-Encoder (VAE) has been widely adopted in text generation.
We propose TRACE, a Transformer-based recurrent VAE structure.
arXiv Detail & Related papers (2022-10-22T10:25:35Z) - Towards Unsupervised Content Disentanglement in Sentence Representations via Syntactic Roles [0.9582466286528458]
We develop an Attention-Driven Variational Autoencoder (ADVAE).
We show that it is possible to obtain representations of sentences where different syntactic roles correspond to clearly identified latent variables.
Our work constitutes a first step towards unsupervised controllable content generation.
arXiv Detail & Related papers (2022-06-22T15:50:01Z) - Autoencoding Variational Autoencoder [56.05008520271406]
We study the implications of this behaviour on the learned representations and also the consequences of fixing it by introducing a notion of self consistency.
We show that encoders trained with our self-consistency approach lead to representations that are robust (insensitive) to perturbations in the input introduced by adversarial attacks.
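One plausible reading of the self-consistency idea mentioned above is that a latent sampled from the posterior should be recovered when its decoded output is re-encoded. The snippet below sketches such a penalty under that assumption; `encoder` and `decoder` are placeholder callables, and this is not the paper's exact objective.

```python
import torch

def self_consistency_penalty(encoder, decoder, x):
    """Illustrative penalty: the decoded sample, when re-encoded, should map
    back to the latent it was generated from (placeholder encoder/decoder)."""
    mu, logvar = encoder(x)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterized sample
    x_rec = decoder(z)
    mu_rec, _ = encoder(x_rec)                             # re-encode the reconstruction
    return ((mu_rec - z) ** 2).mean()
```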
arXiv Detail & Related papers (2020-12-07T14:16:14Z) - Unsupervised Controllable Generation with Self-Training [90.04287577605723]
Controllable generation with GANs remains a challenging research problem.
We propose an unsupervised framework to learn a distribution of latent codes that control the generator through self-training.
Our framework exhibits better disentanglement compared to other variants such as the variational autoencoder.
arXiv Detail & Related papers (2020-07-17T21:50:35Z) - Improve Variational Autoencoder for Text Generation with Discrete Latent Bottleneck [52.08901549360262]
Variational autoencoders (VAEs) are essential tools in end-to-end representation learning.
VAEs with a strong auto-regressive decoder tend to ignore latent variables.
We propose a principled approach to enforce an implicit latent feature matching in a more compact latent space.
arXiv Detail & Related papers (2020-04-22T14:41:37Z)