G2P-DDM: Generating Sign Pose Sequence from Gloss Sequence with Discrete
Diffusion Model
- URL: http://arxiv.org/abs/2208.09141v3
- Date: Mon, 18 Dec 2023 16:45:30 GMT
- Title: G2P-DDM: Generating Sign Pose Sequence from Gloss Sequence with Discrete
Diffusion Model
- Authors: Pan Xie, Qipeng Zhang, Taiyi Peng, Hao Tang, Yao Du, Zexian Li
- Abstract summary: The Sign Language Production project aims to automatically translate spoken languages into sign sequences.
We present a novel solution by converting the continuous pose space generation problem into a discrete sequence generation problem.
Our results show that our model outperforms state-of-the-art G2P models on the public SLP evaluation benchmark.
- Score: 8.047896755805981
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Sign Language Production (SLP) project aims to automatically translate
spoken languages into sign sequences. Our approach focuses on the
transformation of sign gloss sequences into their corresponding sign pose
sequences (G2P). In this paper, we present a novel solution for this task by
converting the continuous pose space generation problem into a discrete
sequence generation problem. We introduce the Pose-VQVAE framework, which
combines Variational Autoencoders (VAEs) with vector quantization to produce a
discrete latent representation for continuous pose sequences. Additionally, we
propose the G2P-DDM model, a discrete denoising diffusion architecture for
length-varied discrete sequence data, to model the latent prior. To further
enhance the quality of pose sequence generation in the discrete space, we
present the CodeUnet model to leverage spatial-temporal information. Lastly, we
develop a heuristic sequential clustering method to predict variable lengths of
pose sequences for corresponding gloss sequences. Our results show that our
model outperforms state-of-the-art G2P models on the public SLP evaluation
benchmark. For more generated results, please visit our project page:
https://slpdiffusier.github.io/g2p-ddm
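Two of the components named in the abstract are easy to make concrete: the Pose-VQVAE quantization step, which snaps each continuous pose feature to its nearest codebook entry and keeps only that entry's index, and the forward corruption process that a discrete diffusion prior such as G2P-DDM learns to reverse over those indices. The NumPy sketch below illustrates both ideas; the sequence length, feature dimension, codebook size, and the mask-style corruption are illustrative assumptions, not the authors' exact configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (assumed sizes, not the paper's): 16 pose frames,
# each a 64-dim continuous feature, and a codebook of 512 entries.
T, D, K = 16, 64, 512
pose_feats = rng.normal(size=(T, D))   # encoder output for one pose sequence
codebook = rng.normal(size=(K, D))     # learned VQ codebook (random here)

def quantize(feats, codebook):
    """Nearest-codebook lookup: continuous features -> discrete code indices."""
    # Squared L2 distance between every frame feature and every codebook entry.
    d2 = ((feats[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (T, K)
    return d2.argmin(axis=1)                                        # (T,)

codes = quantize(pose_feats, codebook)  # discrete latent pose sequence

# One forward-corruption step of a mask/absorbing-style discrete diffusion:
# each code is independently replaced by a special [MASK] token with prob. beta_t.
MASK = K                                # reserve index K as the mask token
def corrupt(codes, beta_t):
    m = rng.random(codes.shape) < beta_t
    return np.where(m, MASK, codes)

noisy_codes = corrupt(codes, beta_t=0.3)
print(codes[:8], noisy_codes[:8])
```

At sampling time, the reverse process would iteratively replace masked codes with predictions conditioned on the gloss sequence, and the Pose-VQVAE decoder would map the resulting code sequence back to continuous poses.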
Related papers
- Meta-DiffuB: A Contextualized Sequence-to-Sequence Text Diffusion Model with Meta-Exploration [53.63593099509471]
We propose a scheduler-exploiter S2S-Diffusion paradigm designed to overcome the limitations of existing S2S-Diffusion models.
We employ Meta-Exploration to train an additional scheduler model dedicated to scheduling contextualized noise for each sentence.
Our exploiter model, an S2S-Diffusion model, leverages the noise scheduled by our scheduler model for updating and generation.
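A rough way to picture the scheduler-exploiter split described above: the scheduler maps each sentence (its context) to its own noise level, and the exploiter, an S2S diffusion model, trains and generates under that per-sentence noise. The sketch below is only a stub of that interface, assuming a hypothetical scheduler that maps a sentence embedding to a scalar noise scale; the Meta-Exploration training loop itself is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

def scheduler(sentence_emb):
    """Hypothetical scheduler: map a sentence embedding to a noise scale in (0, 1).
    In Meta-DiffuB this model is trained with Meta-Exploration; here it is a stub."""
    return 1.0 / (1.0 + np.exp(-sentence_emb.mean()))  # sigmoid of mean feature

def exploiter_forward_noise(target_emb, noise_scale):
    """One forward-diffusion step used by the exploiter: corrupt the target
    representation with Gaussian noise whose magnitude is set per sentence."""
    eps = rng.normal(size=target_emb.shape)
    return np.sqrt(1.0 - noise_scale) * target_emb + np.sqrt(noise_scale) * eps

# Two sentences receive different noise levels because their contexts differ.
src_embs = [rng.normal(size=32), rng.normal(size=32) + 1.0]
tgt_embs = [rng.normal(size=(8, 32)), rng.normal(size=(10, 32))]
for src, tgt in zip(src_embs, tgt_embs):
    beta = scheduler(src)
    noisy = exploiter_forward_noise(tgt, beta)
    print(f"per-sentence noise scale: {beta:.3f}, noisy target shape: {noisy.shape}")
```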
arXiv Detail & Related papers (2024-10-17T04:06:02Z)
- IFH: a Diffusion Framework for Flexible Design of Graph Generative Models [53.219279193440734]
Graph generative models can be classified into two prominent families: one-shot models, which generate a graph in one go, and sequential models, which generate a graph by successive additions of nodes and edges.
This paper proposes a graph generative model, called Insert-Fill-Halt (IFH), that supports the specification of a sequentiality degree.
arXiv Detail & Related papers (2024-08-23T16:24:40Z)
- Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion [61.03681839276652]
Diffusion Forcing is a new training paradigm where a diffusion model is trained to denoise a set of tokens with independent per-token noise levels.
We apply Diffusion Forcing to sequence generative modeling by training a causal next-token prediction model to generate one or several future tokens.
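The distinguishing detail in the summary above is that each token receives its own independently sampled noise level during training. A minimal sketch of that corruption step, assuming Gaussian noise over token embeddings and a toy schedule (the paper's causal architecture and training loss are omitted):

```python
import numpy as np

rng = np.random.default_rng(0)

T_steps = 1000                               # number of diffusion steps (assumed)
alphas_bar = np.linspace(1.0, 0.01, T_steps) # toy cumulative noise schedule

def diffusion_forcing_corrupt(tokens):
    """Corrupt a sequence of token embeddings with an independent noise level
    (diffusion timestep) sampled per token, rather than one level per sequence."""
    L, D = tokens.shape
    t = rng.integers(0, T_steps, size=L)      # independent timestep per token
    a = alphas_bar[t][:, None]                # (L, 1)
    eps = rng.normal(size=(L, D))
    noisy = np.sqrt(a) * tokens + np.sqrt(1.0 - a) * eps
    return noisy, t                           # the model would be trained to denoise
                                              # `noisy` given the per-token levels `t`

tokens = rng.normal(size=(12, 16))            # 12 tokens, 16-dim embeddings
noisy, t = diffusion_forcing_corrupt(tokens)
print(t)                                      # every token can sit at a different step
```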
arXiv Detail & Related papers (2024-07-01T15:43:25Z)
- Discrete Graph Auto-Encoder [52.50288418639075]
We introduce a new framework named Discrete Graph Auto-Encoder (DGAE).
We first use a permutation-equivariant auto-encoder to convert graphs into sets of discrete latent node representations.
In the second step, we sort the sets of discrete latent representations and learn their distribution with a specifically designed auto-regressive model.
arXiv Detail & Related papers (2023-06-13T12:40:39Z)
- SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking [60.109453252858806]
A maximum-likelihood (MLE) objective does not match a downstream use-case of autoregressively generating high-quality sequences.
We formulate sequence generation as an imitation learning (IL) problem.
This allows us to minimize a variety of divergences between the distribution of sequences generated by an autoregressive model and sequences from a dataset.
Our resulting method, SequenceMatch, can be implemented without adversarial training or architectural changes.
arXiv Detail & Related papers (2023-06-08T17:59:58Z)
- A Framework for Bidirectional Decoding: Case Study in Morphological Inflection [4.602447284133507]
We propose a framework for decoding sequences from the "outside-in".
At each step, the model chooses to generate a token on the left, on the right, or to join the left and right sequences (illustrated in the sketch below).
Our model sets a new state of the art (SOTA) on the 2022 and 2023 shared tasks, beating the next best systems by over 4.7 and 2.7 points in average accuracy, respectively.
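A toy version of the outside-in decoding loop summarized above: maintain a left prefix and a right suffix, and at each step either extend one of them or join them. The `score` interface and the dummy policy here are hypothetical stand-ins for the paper's learned model.

```python
def outside_in_decode(score, max_len=32):
    """Toy 'outside-in' decoder: grow a left prefix and a right suffix, then join.
    `score(left, right)` is a hypothetical model interface returning an
    (action, token) pair, where action is 'left', 'right', or 'join'."""
    left, right = [], []
    for _ in range(max_len):
        action, token = score(left, right)
        if action == "join":
            return left + right
        elif action == "left":
            left.append(token)           # extend the prefix left-to-right
        else:
            right.insert(0, token)       # extend the suffix right-to-left
    return left + right

# A dummy policy that spells a word from both ends and then joins.
word = list("inflect")
def dummy_policy(left, right):
    if len(left) + len(right) >= len(word):
        return "join", None
    if len(left) <= len(right):
        return "left", word[len(left)]
    return "right", word[len(word) - 1 - len(right)]

print("".join(outside_in_decode(dummy_policy)))   # -> "inflect"
```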
arXiv Detail & Related papers (2023-05-21T22:08:31Z)
- Toeplitz Neural Network for Sequence Modeling [46.04964190407727]
We show that a Toeplitz matrix-vector product trick can reduce the space-time complexity of sequence modeling to log-linear, as sketched below.
A lightweight sub-network called relative position encoder is proposed to generate relative position coefficients with a fixed budget of parameters.
Despite being trained on 512-token sequences, our model can extrapolate input sequence length up to 14K tokens in inference with consistent performance.
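The log-linear complexity referenced above rests on a classical trick: a Toeplitz matrix-vector product can be computed by embedding the Toeplitz matrix in a circulant one and multiplying in the Fourier domain, which costs O(n log n) instead of O(n^2). A minimal NumPy sketch of that trick (the paper's relative position encoder is not shown):

```python
import numpy as np

rng = np.random.default_rng(0)

def toeplitz_matvec_fft(c, r, x):
    """Multiply a Toeplitz matrix by a vector in O(n log n) via circulant
    embedding and the FFT. `c` is the first column, `r` the first row
    (with r[0] == c[0]), and `x` the vector."""
    n = len(x)
    # First column of a 2n x 2n circulant matrix whose top-left n x n block
    # is the Toeplitz matrix (one arbitrary padding element in the middle).
    col = np.concatenate([c, [0.0], r[:0:-1]])
    v = np.concatenate([x, np.zeros(n)])
    y = np.fft.ifft(np.fft.fft(col) * np.fft.fft(v)).real
    return y[:n]

# Check against the explicit O(n^2) construction.
n = 8
c = rng.normal(size=n)                                   # first column
r = np.concatenate([[c[0]], rng.normal(size=n - 1)])     # first row
x = rng.normal(size=n)
T = np.array([[c[i - j] if i >= j else r[j - i] for j in range(n)]
              for i in range(n)])
print(np.allclose(T @ x, toeplitz_matvec_fft(c, r, x)))  # True
```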
arXiv Detail & Related papers (2023-05-08T14:49:01Z)
- Sequence-to-Action: Grammatical Error Correction with Action Guided Sequence Generation [21.886973310718457]
We propose a novel Sequence-to-Action (S2A) module for Grammatical Error Correction.
The S2A module jointly takes the source and target sentences as input and automatically generates a token-level action sequence (illustrated below).
Our model consistently outperforms the seq2seq baselines, while being able to significantly alleviate the over-correction problem.
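To make the token-level action sequence idea concrete, the toy below applies an edit-action sequence to a source sentence. The action inventory (KEEP, SUB, INS, DEL) is an assumption chosen for illustration and is not necessarily the paper's exact action set.

```python
def apply_actions(source_tokens, actions):
    """Apply a token-level edit-action sequence to a source sentence.
    Each action is ('KEEP',), ('DEL',), ('SUB', new_token) or ('INS', new_token);
    INS emits a token without consuming the next source token."""
    out, i = [], 0
    for act in actions:
        if act[0] == "INS":
            out.append(act[1])            # insert without consuming source
            continue
        tok = source_tokens[i]; i += 1
        if act[0] == "KEEP":
            out.append(tok)
        elif act[0] == "SUB":
            out.append(act[1])            # replace the source token
        # DEL: consume the source token, emit nothing
    return out

src = "he go to school yesterday".split()
acts = [("KEEP",), ("SUB", "went"), ("KEEP",), ("KEEP",), ("KEEP",)]
print(" ".join(apply_actions(src, acts)))  # -> "he went to school yesterday"
```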
arXiv Detail & Related papers (2022-05-22T17:47:06Z)
- A Contextual Latent Space Model: Subsequence Modulation in Melodic Sequence [0.0]
Some generative models for sequences such as music and text allow us to edit only subsequences, given surrounding context sequences.
We propose a contextual latent space model (CLSM) so that users can explore subsequence generation with a sense of direction in the generation space.
A context-informed prior and decoder constitute the generative model of CLSM, and a context position-informed posterior constitutes the inference model.
arXiv Detail & Related papers (2021-11-23T07:51:39Z)
- Structured Reordering for Modeling Latent Alignments in Sequence Transduction [86.94309120789396]
We present an efficient dynamic programming algorithm performing exact marginal inference of separable permutations.
The resulting seq2seq model exhibits better systematic generalization than standard models on synthetic problems and NLP tasks.
arXiv Detail & Related papers (2021-06-06T21:53:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.