Related papers: BAD: Bidirectional Auto-regressive Diffusion for Text-to-Motion Generation

URL: http://arxiv.org/abs/2409.10847v1
Date: Tue, 17 Sep 2024 02:28:19 GMT
Title: BAD: Bidirectional Auto-regressive Diffusion for Text-to-Motion Generation
Authors: S. Rohollah Hosseyni, Ali Ahmad Rahmani, S. Jamal Seyedmohammadi, Sanaz Seyedin, Arash Mohammadi
Abstract summary: Bidirectional Autoregressive Diffusion (BAD) is a novel approach that unifies the strengths of autoregressive and mask-based generative models. BAD utilizes a permutation-based corruption technique that preserves the natural sequence structure while enforcing causal dependencies. Comprehensive experiments show that BAD outperforms autoregressive and mask-based models in text-to-motion generation.
Score: 4.945357788617835
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Autoregressive models excel in modeling sequential dependencies by enforcing causal constraints, yet they struggle to capture complex bidirectional patterns due to their unidirectional nature. In contrast, mask-based models leverage bidirectional context, enabling richer dependency modeling. However, they often assume token independence during prediction, which undermines the modeling of sequential dependencies. Additionally, the corruption of sequences through masking or absorption can introduce unnatural distortions, complicating the learning process. To address these issues, we propose Bidirectional Autoregressive Diffusion (BAD), a novel approach that unifies the strengths of autoregressive and mask-based generative models. BAD utilizes a permutation-based corruption technique that preserves the natural sequence structure while enforcing causal dependencies through randomized ordering, enabling the effective capture of both sequential and bidirectional relationships. Comprehensive experiments show that BAD outperforms autoregressive and mask-based models in text-to-motion generation, suggesting a novel pre-training strategy for sequence modeling. The codebase for BAD is available on https://github.com/RohollahHS/BAD.
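
The abstract's idea of "enforcing causal dependencies through randomized ordering" can be illustrated with a small sketch. This is not taken from the BAD codebase: it is a generic permutation-ordered attention mask in the spirit of permutation language modeling, and the function name `permutation_causal_mask` and its parameters are hypothetical. Each position may attend only to positions that come earlier in a randomly drawn generation order, so the dependency structure stays causal while the visible context can lie on either side of a token.

```python
import random

def permutation_causal_mask(seq_len, seed=None):
    """Hypothetical helper: build a causal mask over a random generation order.

    Returns (order, mask), where order[k] is the position predicted at step k
    and mask[i][j] is True when position i is allowed to attend to position j.
    """
    rng = random.Random(seed)
    order = list(range(seq_len))
    rng.shuffle(order)                      # random generation order

    # rank[p] = step at which position p is predicted
    rank = [0] * seq_len
    for step, pos in enumerate(order):
        rank[pos] = step

    # Position j is visible to position i only if j is predicted earlier.
    mask = [[rank[j] < rank[i] for j in range(seq_len)]
            for i in range(seq_len)]
    return order, mask

order, mask = permutation_causal_mask(6, seed=0)
print("generation order:", order)
for row in mask:
    print("".join("x" if visible else "." for visible in row))
```

With the identity order this reduces to the usual lower-triangular causal mask (without the diagonal); any other order lets a token see context to its right as well as its left.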

Related papers

Auto-Regressive Masked Diffusion Models [9.239507801466322]
Masked diffusion models (MDMs) have emerged as a promising approach for language modeling. They face a performance gap compared to autoregressive models (ARMs) and require more training iterations. We present the Auto-Regressive Masked Diffusion model, which unifies the training efficiency of autoregressive models with the parallel generation capabilities of diffusion-based models.
arXiv Detail & Related papers (2026-01-23T18:42:30Z)
Autoregressive Models Rival Diffusion Models at ANY-ORDER Generation [35.63237650402896]
We propose Any-order Any-subset Autoregressive modeling (A3). A3 is a framework that extends the standard AR factorization to arbitrary token groups and generation orders. Experiments on question answering, commonsense reasoning, and story infilling demonstrate that A3 outperforms diffusion-based models.
arXiv Detail & Related papers (2026-01-19T17:03:48Z)
Bridging the Discrete-Continuous Gap: Unified Multimodal Generation via Coupled Manifold Discrete Absorbing Diffusion [60.186310080523135]
Bifurcation of generative modeling into autoregressive approaches for discrete data (text) and diffusion approaches for continuous data (images) hinders development of truly unified multimodal systems. We propose CoM-DAD, a novel probabilistic framework that reformulates multimodal generation as a hierarchical dual-process. Our method demonstrates superior stability over standard masked modeling, establishing a new paradigm for scalable, unified text-image generation.
arXiv Detail & Related papers (2026-01-07T16:21:19Z)
Hybrid Autoregressive-Diffusion Model for Real-Time Streaming Sign Language Production [0.0]
We introduce a hybrid approach combining autoregressive and diffusion models to generate Sign Language Production (SLP) models. To capture fine-grained body movements, we design a Multi-Scale Pose Representation module that separately extracts detailed features from distinct articulators. We also introduce a Confidence-Aware Causal Attention mechanism that utilizes joint-level confidence scores to dynamically guide the pose generation process.
arXiv Detail & Related papers (2025-07-12T01:34:50Z)
CtrlDiff: Boosting Large Diffusion Language Models with Dynamic Block Prediction and Controllable Generation [7.250878248686215]
Diffusion-based language models have emerged as a compelling alternative due to their powerful parallel generation capabilities and inherent editability. We propose CtrlDiff, a dynamic and controllable semi-autoregressive framework that adaptively determines the size of each generation block based on local semantics.
arXiv Detail & Related papers (2025-05-20T14:52:41Z)
BBQRec: Behavior-Bind Quantization for Multi-Modal Sequential Recommendation [15.818669767036592]
We propose a Behavior-Bind multi-modal Quantization for Sequential Recommendation (BBQRec) featuring dual-aligned quantization and semantics-aware sequence modeling. BBQRec disentangles modality-agnostic behavioral patterns from noisy modality-specific features through contrastive codebook learning. We design a discretized similarity reweighting mechanism that dynamically adjusts self-attention scores using quantized semantic relationships.
arXiv Detail & Related papers (2025-04-09T07:19:48Z)
Unifying Autoregressive and Diffusion-Based Sequence Generation [2.3923884480793673]
We present extensions to diffusion-based sequence generation models, blurring the line with autoregressive language models. We introduce hyperschedules, which assign distinct noise schedules to individual token positions. Second, we propose two hybrid token-wise noising processes that interpolate between absorbing and uniform processes, enabling the model to fix past mistakes.
arXiv Detail & Related papers (2025-04-08T20:32:10Z)
Learning-Order Autoregressive Models with Application to Molecular Graph Generation [52.44913282062524]
We introduce a variant of ARM that generates high-dimensional data using a probabilistic ordering that is sequentially inferred from data. We demonstrate experimentally that our method can learn meaningful autoregressive orderings in image and graph generation.
arXiv Detail & Related papers (2025-03-07T23:24:24Z)
Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction [88.65168366064061]
We introduce Discrete Denoising Posterior Prediction (DDPP), a novel framework that casts the task of steering pre-trained MDMs as a problem of probabilistic inference. Our framework leads to a family of three novel objectives that are all simulation-free, and thus scalable. We substantiate our designs via wet-lab validation, where we observe transient expression of reward-optimized protein sequences.
arXiv Detail & Related papers (2024-10-10T17:18:30Z)
Derivative-Free Guidance in Continuous and Discrete Diffusion Models with Soft Value-Based Decoding [84.3224556294803]
Diffusion models excel at capturing the natural design spaces of images, molecules, DNA, RNA, and protein sequences. We aim to optimize downstream reward functions while preserving the naturalness of these design spaces. Our algorithm integrates soft value functions, which look ahead to how intermediate noisy states lead to high rewards in the future.
arXiv Detail & Related papers (2024-08-15T16:47:59Z)
Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines [74.42485647685272]
We focus on Generative Masked Language Models (GMLMs). We train a model to fit conditional probabilities of the data distribution via masking, which are subsequently used as inputs to a Markov Chain to draw samples from the model. We adapt the T5 model for iteratively-refined parallel decoding, achieving 2-3x speedup in machine translation with minimal sacrifice in quality.
arXiv Detail & Related papers (2024-07-22T18:00:00Z)
Non-autoregressive Sequence-to-Sequence Vision-Language Models [63.77614880533488]
We propose a parallel decoding sequence-to-sequence vision-language model that marginalizes over multiple inference paths in the decoder. The model achieves performance on-par with its state-of-the-art autoregressive counterpart, but is faster at inference time.
arXiv Detail & Related papers (2024-03-04T17:34:59Z)
SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking [60.109453252858806]
A maximum-likelihood (MLE) objective does not match a downstream use-case of autoregressively generating high-quality sequences. We formulate sequence generation as an imitation learning (IL) problem. This allows us to minimize a variety of divergences between the distribution of sequences generated by an autoregressive model and sequences from a dataset. Our resulting method, SequenceMatch, can be implemented without adversarial training or architectural changes.
arXiv Detail & Related papers (2023-06-08T17:59:58Z)
Mutual Exclusivity Training and Primitive Augmentation to Induce Compositionality [84.94877848357896]
Recent datasets expose the lack of the systematic generalization ability in standard sequence-to-sequence models. We analyze this behavior of seq2seq models and identify two contributing factors: a lack of mutual exclusivity bias and the tendency to memorize whole examples. We show substantial empirical improvements using standard sequence-to-sequence models on two widely-used compositionality datasets.
arXiv Detail & Related papers (2022-11-28T17:36:41Z)

This list is automatically generated from the titles and abstracts of the papers on this site.