G2P-DDM: Generating Sign Pose Sequence from Gloss Sequence with Discrete
Diffusion Model
- URL: http://arxiv.org/abs/2208.09141v3
- Date: Mon, 18 Dec 2023 16:45:30 GMT
- Title: G2P-DDM: Generating Sign Pose Sequence from Gloss Sequence with Discrete
Diffusion Model
- Authors: Pan Xie, Qipeng Zhang, Taiyi Peng, Hao Tang, Yao Du, Zexian Li
- Abstract summary: The Sign Language Production project aims to automatically translate spoken languages into sign sequences.
We present a novel solution by converting the continuous pose space generation problem into a discrete sequence generation problem.
Our results show that our model outperforms state-of-the-art G2P models on the public SLP evaluation benchmark.
- Score: 8.047896755805981
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Sign Language Production (SLP) project aims to automatically translate
spoken languages into sign sequences. Our approach focuses on the
transformation of sign gloss sequences into their corresponding sign pose
sequences (G2P). In this paper, we present a novel solution for this task by
converting the continuous pose space generation problem into a discrete
sequence generation problem. We introduce the Pose-VQVAE framework, which
combines Variational Autoencoders (VAEs) with vector quantization to produce a
discrete latent representation for continuous pose sequences. Additionally, we
propose the G2P-DDM model, a discrete denoising diffusion architecture for
length-varied discrete sequence data, to model the latent prior. To further
enhance the quality of pose sequence generation in the discrete space, we
present the CodeUnet model to leverage spatial-temporal information. Lastly, we
develop a heuristic sequential clustering method to predict variable lengths of
pose sequences for corresponding gloss sequences. Our results show that our
model outperforms state-of-the-art G2P models on the public SLP evaluation
benchmark. For more generated results, please visit our project page:
https://slpdiffusier.github.io/g2p-ddm
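Two of the components named in the abstract are easy to make concrete: the Pose-VQVAE quantization step, which snaps each continuous pose feature to its nearest codebook entry and keeps only that entry's index, and the forward corruption process that a discrete diffusion prior such as G2P-DDM learns to reverse over those indices. The NumPy sketch below illustrates both ideas; the sequence length, feature dimension, codebook size, and the mask-style corruption are illustrative assumptions, not the authors' exact configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (assumed sizes, not the paper's): 16 pose frames,
# each a 64-dim continuous feature, and a codebook of 512 entries.
T, D, K = 16, 64, 512
pose_feats = rng.normal(size=(T, D))   # encoder output for one pose sequence
codebook = rng.normal(size=(K, D))     # learned VQ codebook (random here)

def quantize(feats, codebook):
    """Nearest-codebook lookup: continuous features -> discrete code indices."""
    # Squared L2 distance between every frame feature and every codebook entry.
    d2 = ((feats[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (T, K)
    return d2.argmin(axis=1)                                        # (T,)

codes = quantize(pose_feats, codebook)  # discrete latent pose sequence

# One forward-corruption step of a mask/absorbing-style discrete diffusion:
# each code is independently replaced by a special [MASK] token with prob. beta_t.
MASK = K                                # reserve index K as the mask token
def corrupt(codes, beta_t):
    m = rng.random(codes.shape) < beta_t
    return np.where(m, MASK, codes)

noisy_codes = corrupt(codes, beta_t=0.3)
print(codes[:8], noisy_codes[:8])
```

At sampling time, the reverse process would iteratively replace masked codes with predictions conditioned on the gloss sequence, and the Pose-VQVAE decoder would map the resulting code sequence back to continuous poses.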
Related papers
- Meta-DiffuB: A Contextualized Sequence-to-Sequence Text Diffusion Model with Meta-Exploration [53.63593099509471]
We propose a scheduler-exploiter S2S-Diffusion paradigm designed to overcome the limitations of existing S2S-Diffusion models.
We employ Meta-Exploration to train an additional scheduler model dedicated to scheduling contextualized noise for each sentence.
Our exploiter model, an S2S-Diffusion model, leverages the noise scheduled by our scheduler model for updating and generation.
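A rough way to picture the scheduler-exploiter split described above: the scheduler maps each sentence (its context) to its own noise level, and the exploiter, an S2S diffusion model, trains and generates under that per-sentence noise. The sketch below is only a stub of that interface, assuming a hypothetical scheduler that maps a sentence embedding to a scalar noise scale; the Meta-Exploration training loop itself is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

def scheduler(sentence_emb):
    """Hypothetical scheduler: map a sentence embedding to a noise scale in (0, 1).
    In Meta-DiffuB this model is trained with Meta-Exploration; here it is a stub."""
    return 1.0 / (1.0 + np.exp(-sentence_emb.mean()))  # sigmoid of mean feature

def exploiter_forward_noise(target_emb, noise_scale):
    """One forward-diffusion step used by the exploiter: corrupt the target
    representation with Gaussian noise whose magnitude is set per sentence."""
    eps = rng.normal(size=target_emb.shape)
    return np.sqrt(1.0 - noise_scale) * target_emb + np.sqrt(noise_scale) * eps

# Two sentences receive different noise levels because their contexts differ.
src_embs = [rng.normal(size=32), rng.normal(size=32) + 1.0]
tgt_embs = [rng.normal(size=(8, 32)), rng.normal(size=(10, 32))]
for src, tgt in zip(src_embs, tgt_embs):
    beta = scheduler(src)
    noisy = exploiter_forward_noise(tgt, beta)
    print(f"per-sentence noise scale: {beta:.3f}, noisy target shape: {noisy.shape}")
```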
arXiv Detail & Related papers (2024-10-17T04:06:02Z)
- IFH: a Diffusion Framework for Flexible Design of Graph Generative Models [53.219279193440734]
Graph generative models can be classified into two prominent families: one-shot models, which generate a graph in one go, and sequential models, which generate a graph by successive additions of nodes and edges.
This paper proposes a graph generative model, called Insert-Fill-Halt (IFH), that supports the specification of a sequentiality degree.
arXiv Detail & Related papers (2024-08-23T16:24:40Z)
- Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion [61.03681839276652]
Diffusion Forcing is a new training paradigm where a diffusion model is trained to denoise a set of tokens with independent per-token noise levels.
We apply Diffusion Forcing to sequence generative modeling by training a causal next-token prediction model to generate one or several future tokens.
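The distinguishing detail in the summary above is that each token receives its own independently sampled noise level during training. A minimal sketch of that corruption step, assuming Gaussian noise over token embeddings and a toy schedule (the paper's causal architecture and training loss are omitted):

```python
import numpy as np

rng = np.random.default_rng(0)

T_steps = 1000                               # number of diffusion steps (assumed)
alphas_bar = np.linspace(1.0, 0.01, T_steps) # toy cumulative noise schedule

def diffusion_forcing_corrupt(tokens):
    """Corrupt a sequence of token embeddings with an independent noise level
    (diffusion timestep) sampled per token, rather than one level per sequence."""
    L, D = tokens.shape
    t = rng.integers(0, T_steps, size=L)      # independent timestep per token
    a = alphas_bar[t][:, None]                # (L, 1)
    eps = rng.normal(size=(L, D))
    noisy = np.sqrt(a) * tokens + np.sqrt(1.0 - a) * eps
    return noisy, t                           # the model would be trained to denoise
                                              # `noisy` given the per-token levels `t`

tokens = rng.normal(size=(12, 16))            # 12 tokens, 16-dim embeddings
noisy, t = diffusion_forcing_corrupt(tokens)
print(t)                                      # every token can sit at a different step
```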
arXiv Detail & Related papers (2024-07-01T15:43:25Z)
- Discrete Graph Auto-Encoder [52.50288418639075]
We introduce a new framework named Discrete Graph Auto-Encoder (DGAE).
We first use a permutation-equivariant auto-encoder to convert graphs into sets of discrete latent node representations.
In the second step, we sort the sets of discrete latent representations and learn their distribution with a specifically designed auto-regressive model.
arXiv Detail & Related papers (2023-06-13T12:40:39Z)
- SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking [60.109453252858806]
A maximum-likelihood (MLE) objective does not match a downstream use-case of autoregressively generating high-quality sequences.
We formulate sequence generation as an imitation learning (IL) problem.
This allows us to minimize a variety of divergences between the distribution of sequences generated by an autoregressive model and sequences from a dataset.
Our resulting method, SequenceMatch, can be implemented without adversarial training or architectural changes.
arXiv Detail & Related papers (2023-06-08T17:59:58Z)
- A Framework for Bidirectional Decoding: Case Study in Morphological Inflection [4.602447284133507]
We propose a framework for decoding sequences from the "outside-in".
At each step, the model chooses to generate a token on the left, on the right, or to join the left and right sequences (illustrated in the sketch below).
Our model sets a new state of the art (SOTA) on the 2022 and 2023 shared tasks, beating the next best systems by over 4.7 and 2.7 points in average accuracy, respectively.
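A toy version of the outside-in decoding loop summarized above: maintain a left prefix and a right suffix, and at each step either extend one of them or join them. The `score` interface and the dummy policy here are hypothetical stand-ins for the paper's learned model.

```python
def outside_in_decode(score, max_len=32):
    """Toy 'outside-in' decoder: grow a left prefix and a right suffix, then join.
    `score(left, right)` is a hypothetical model interface returning an
    (action, token) pair, where action is 'left', 'right', or 'join'."""
    left, right = [], []
    for _ in range(max_len):
        action, token = score(left, right)
        if action == "join":
            return left + right
        elif action == "left":
            left.append(token)           # extend the prefix left-to-right
        else:
            right.insert(0, token)       # extend the suffix right-to-left
    return left + right

# A dummy policy that spells a word from both ends and then joins.
word = list("inflect")
def dummy_policy(left, right):
    if len(left) + len(right) >= len(word):
        return "join", None
    if len(left) <= len(right):
        return "left", word[len(left)]
    return "right", word[len(word) - 1 - len(right)]

print("".join(outside_in_decode(dummy_policy)))   # -> "inflect"
```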
arXiv Detail & Related papers (2023-05-21T22:08:31Z)
- Toeplitz Neural Network for Sequence Modeling [46.04964190407727]
We show that a Toeplitz matrix-vector product trick can reduce the space-time complexity of sequence modeling to log-linear, as sketched below.
A lightweight sub-network called relative position encoder is proposed to generate relative position coefficients with a fixed budget of parameters.
Despite being trained on 512-token sequences, our model can extrapolate input sequence length up to 14K tokens in inference with consistent performance.
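The log-linear complexity referenced above rests on a classical trick: a Toeplitz matrix-vector product can be computed by embedding the Toeplitz matrix in a circulant one and multiplying in the Fourier domain, which costs O(n log n) instead of O(n^2). A minimal NumPy sketch of that trick (the paper's relative position encoder is not shown):

```python
import numpy as np

rng = np.random.default_rng(0)

def toeplitz_matvec_fft(c, r, x):
    """Multiply a Toeplitz matrix by a vector in O(n log n) via circulant
    embedding and the FFT. `c` is the first column, `r` the first row
    (with r[0] == c[0]), and `x` the vector."""
    n = len(x)
    # First column of a 2n x 2n circulant matrix whose top-left n x n block
    # is the Toeplitz matrix (one arbitrary padding element in the middle).
    col = np.concatenate([c, [0.0], r[:0:-1]])
    v = np.concatenate([x, np.zeros(n)])
    y = np.fft.ifft(np.fft.fft(col) * np.fft.fft(v)).real
    return y[:n]

# Check against the explicit O(n^2) construction.
n = 8
c = rng.normal(size=n)                                   # first column
r = np.concatenate([[c[0]], rng.normal(size=n - 1)])     # first row
x = rng.normal(size=n)
T = np.array([[c[i - j] if i >= j else r[j - i] for j in range(n)]
              for i in range(n)])
print(np.allclose(T @ x, toeplitz_matvec_fft(c, r, x)))  # True
```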
arXiv Detail & Related papers (2023-05-08T14:49:01Z)
- Sequence-to-Action: Grammatical Error Correction with Action Guided Sequence Generation [21.886973310718457]
We propose a novel Sequence-to-Action (S2A) module for Grammatical Error Correction.
The S2A module jointly takes the source and target sentences as input and automatically generates a token-level action sequence (illustrated below).
Our model consistently outperforms the seq2seq baselines, while being able to significantly alleviate the over-correction problem.
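To make the token-level action sequence idea concrete, the toy below applies an edit-action sequence to a source sentence. The action inventory (KEEP, SUB, INS, DEL) is an assumption chosen for illustration and is not necessarily the paper's exact action set.

```python
def apply_actions(source_tokens, actions):
    """Apply a token-level edit-action sequence to a source sentence.
    Each action is ('KEEP',), ('DEL',), ('SUB', new_token) or ('INS', new_token);
    INS emits a token without consuming the next source token."""
    out, i = [], 0
    for act in actions:
        if act[0] == "INS":
            out.append(act[1])            # insert without consuming source
            continue
        tok = source_tokens[i]; i += 1
        if act[0] == "KEEP":
            out.append(tok)
        elif act[0] == "SUB":
            out.append(act[1])            # replace the source token
        # DEL: consume the source token, emit nothing
    return out

src = "he go to school yesterday".split()
acts = [("KEEP",), ("SUB", "went"), ("KEEP",), ("KEEP",), ("KEEP",)]
print(" ".join(apply_actions(src, acts)))  # -> "he went to school yesterday"
```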
arXiv Detail & Related papers (2022-05-22T17:47:06Z)
- A Contextual Latent Space Model: Subsequence Modulation in Melodic Sequence [0.0]
Some generative models for sequences such as music and text allow us to edit only subsequences, given surrounding context sequences.
We propose a contextual latent space model (CLSM) so that users can explore subsequence generation with a sense of direction in the generation space.
A context-informed prior and decoder constitute the generative model of CLSM, and a context position-informed posterior constitutes the inference model.
arXiv Detail & Related papers (2021-11-23T07:51:39Z)
- Structured Reordering for Modeling Latent Alignments in Sequence Transduction [86.94309120789396]
We present an efficient dynamic programming algorithm performing exact marginal inference of separable permutations.
The resulting seq2seq model exhibits better systematic generalization than standard models on synthetic problems and NLP tasks.
arXiv Detail & Related papers (2021-06-06T21:53:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.