Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning
- URL: http://arxiv.org/abs/2410.14157v3
- Date: Tue, 18 Feb 2025 03:52:31 GMT
- Title: Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning
- Authors: Jiacheng Ye, Jiahui Gao, Shansan Gong, Lin Zheng, Xin Jiang, Zhenguo Li, Lingpeng Kong
- Abstract summary: We show how diffusion models learn difficult subgoals that elude autoregressive approaches. We propose Multi-Granularity Diffusion Modeling (MGDM), which prioritizes subgoals based on difficulty during learning. MGDM significantly outperforms autoregressive models without using search techniques.
- Score: 89.96284387376119
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Autoregressive language models, despite their impressive capabilities, struggle with complex reasoning and long-term planning tasks. We introduce discrete diffusion models as a novel solution to these challenges. Through the lens of subgoal imbalance, we demonstrate how diffusion models effectively learn difficult subgoals that elude autoregressive approaches. We propose Multi-Granularity Diffusion Modeling (MGDM), which prioritizes subgoals based on difficulty during learning. On complex tasks like Countdown, Sudoku, and Boolean Satisfiability Problems, MGDM significantly outperforms autoregressive models without using search techniques. For instance, MGDM achieves 91.5% and 100% accuracy on Countdown and Sudoku, respectively, compared to 45.8% and 20.7% for autoregressive models. Our work highlights the potential of diffusion-based approaches in advancing AI capabilities for sophisticated language understanding and problem-solving tasks. All associated code is available at https://github.com/HKUNLP/diffusion-vs-ar.
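To make the difficulty-prioritized training idea concrete, here is a minimal sketch of a reweighted masked-diffusion training step. It is an illustration only, not the released implementation (see the repository above): the weighting rule `(1 - p)^alpha`, the `MASK_ID` constant, and the `mgdm_style_loss` name are assumptions made for exposition.

```python
import torch
import torch.nn.functional as F

MASK_ID = 0  # hypothetical [MASK] token id; depends on the tokenizer


def mgdm_style_loss(model, tokens, alpha=1.0):
    """One masked-diffusion training step with difficulty-based reweighting.

    Sketch only: tokens the model currently finds hard (low predicted
    probability for the ground truth) get larger weights, mirroring the
    "prioritize difficult subgoals" idea at the token level.
    """
    B, L = tokens.shape
    # Sample a masking ratio t ~ U(0, 1) per sequence (the diffusion "time").
    t = torch.rand(B, 1, device=tokens.device)
    mask = torch.rand(B, L, device=tokens.device) < t
    noised = torch.where(mask, torch.full_like(tokens, MASK_ID), tokens)

    logits = model(noised)                                # (B, L, vocab)
    logp = F.log_softmax(logits, dim=-1)
    token_logp = logp.gather(-1, tokens.unsqueeze(-1)).squeeze(-1)

    # Difficulty weight: up-weight low-probability (hard) tokens.
    with torch.no_grad():
        w = (1.0 - token_logp.exp()).clamp(min=1e-4) ** alpha
        w = w / w[mask].mean()                            # normalize over masked slots

    return -(w * token_logp)[mask].mean()                 # loss on masked positions only
```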
Related papers
- Generalized Interpolating Discrete Diffusion [65.74168524007484]
Masked diffusion is a popular choice due to its simplicity and effectiveness.
We derive the theoretical backbone of a family of general interpolating discrete diffusion processes.
Exploiting GIDD's flexibility, we explore a hybrid approach combining masking and uniform noise.
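A rough sketch of such a masking/uniform-noise hybrid forward corruption, assuming a per-position corruption probability `t` and an illustrative mixing weight `p_uniform` (not the paper's parameterization):

```python
import torch


def hybrid_corrupt(tokens, t, mask_id, vocab_size, p_uniform=0.2):
    """Forward-corrupt a token batch at noise level t in [0, 1].

    Sketch of a masking/uniform hybrid: each corrupted position becomes
    [MASK] with probability (1 - p_uniform) and a uniformly random token
    otherwise.
    """
    corrupt = torch.rand(tokens.shape, device=tokens.device) < t
    uniform = torch.rand(tokens.shape, device=tokens.device) < p_uniform
    random_tok = torch.randint_like(tokens, vocab_size)
    noised = torch.where(corrupt & uniform, random_tok, tokens)
    return torch.where(corrupt & ~uniform, torch.full_like(tokens, mask_id), noised)
```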
arXiv Detail & Related papers (2025-03-06T14:30:55Z)
- Train for the Worst, Plan for the Best: Understanding Token Ordering in Masked Diffusions [14.85882273040068]
Masked diffusion models (MDMs) have emerged as a promising alternative approach for generative modeling over discrete domains.
We show that adaptive inference can boost solving accuracy in pretrained MDMs from 7% to approximately 90%, even outperforming ARMs with 7x as many parameters.
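The adaptive-inference result suggests a confidence-ordered sampler along the lines of the sketch below; the greedy most-confident-first loop is a common MDM decoding heuristic assumed here for illustration, not the paper's exact procedure.

```python
import torch


@torch.no_grad()
def adaptive_decode(model, length, mask_id, device="cpu"):
    """Confidence-ordered decoding for a masked diffusion model (sketch).

    At each step, commit the masked position whose top prediction the
    model is most confident about, instead of a fixed left-to-right order.
    """
    seq = torch.full((1, length), mask_id, dtype=torch.long, device=device)
    for _ in range(length):
        probs = model(seq).softmax(-1)          # (1, L, vocab)
        conf, pred = probs.max(-1)              # per-position top confidence
        conf[seq != mask_id] = -1.0             # skip already-decoded slots
        pos = conf.argmax(-1)                   # most confident masked slot
        seq[0, pos] = pred[0, pos]
    return seq
```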
arXiv Detail & Related papers (2025-02-10T18:47:21Z)
- Diffusion-VLA: Scaling Robot Foundation Models via Unified Diffusion and Autoregression [9.923268972395107]
DiffusionVLA is a framework that seamlessly combines an autoregressive model with a diffusion model for learning visuomotor policies.
To enhance policy learning through self-reasoning, we introduce a novel reasoning injection module.
We conduct extensive experiments using multiple real robots to validate the effectiveness of DiffusionVLA.
arXiv Detail & Related papers (2024-12-04T13:11:38Z)
- Scaling Diffusion Language Models via Adaptation from Autoregressive Models [105.70889434492143]
Diffusion Language Models (DLMs) have emerged as a promising new paradigm for text generative modeling.
We show that we can convert AR models ranging from 127M to 7B parameters into diffusion models DiffuGPT and DiffuLLaMA, using less than 200B tokens for training.
Our experimental results reveal that these models outperform earlier DLMs and are competitive with their AR counterparts.
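A highly simplified sketch of what such an AR-to-diffusion adaptation could look like: switch attention from causal to bidirectional and continue training on a masked-denoising objective. The `is_causal` attribute probe and the single-stage switch are assumptions for exposition; the actual DiffuGPT/DiffuLLaMA recipe differs in detail.

```python
import torch
import torch.nn.functional as F


def adapt_ar_to_diffusion(ar_model, loader, steps=1000, mask_id=0, lr=1e-5):
    """Continued training that turns an AR LM into a text denoiser (sketch).

    Two illustrative ingredients: (1) make attention bidirectional (the
    `is_causal` probe below is a placeholder; real models expose this
    differently), and (2) swap next-token prediction for masked denoising.
    """
    for module in ar_model.modules():           # (1) drop the causal mask
        if hasattr(module, "is_causal"):
            module.is_causal = False
    opt = torch.optim.AdamW(ar_model.parameters(), lr=lr)
    for _, tokens in zip(range(steps), loader):
        t = torch.rand(tokens.size(0), 1)
        mask = torch.rand(tokens.shape) < t
        noised = torch.where(mask, torch.full_like(tokens, mask_id), tokens)
        logits = ar_model(noised)               # (2) denoise masked positions
        loss = F.cross_entropy(logits[mask], tokens[mask])
        loss.backward()
        opt.step()
        opt.zero_grad()
    return ar_model
```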
arXiv Detail & Related papers (2024-10-23T14:04:22Z)
- MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling [64.09238330331195]
We propose a novel Multi-Modal Auto-Regressive (MMAR) probabilistic modeling framework.
Unlike discretization-based methods, MMAR takes in continuous-valued image tokens to avoid information loss.
We show that MMAR demonstrates superior performance compared to other joint multi-modal models.
arXiv Detail & Related papers (2024-10-14T17:57:18Z)
- Model Inversion Attacks Through Target-Specific Conditional Diffusion Models [54.69008212790426]
Model inversion attacks (MIAs) aim to reconstruct private images from a target classifier's training set, thereby raising privacy concerns in AI applications.
Previous GAN-based MIAs tend to suffer from inferior generative fidelity due to GANs' inherent flaws and biased optimization within the latent space.
We propose Diffusion-based Model Inversion (Diff-MI) attacks to alleviate these issues.
arXiv Detail & Related papers (2024-07-16T06:38:49Z)
- Adv-KD: Adversarial Knowledge Distillation for Faster Diffusion Sampling [2.91204440475204]
Diffusion Probabilistic Models (DPMs) have emerged as a powerful class of deep generative models.
They rely on sequential denoising steps during sample generation.
We propose a novel method that integrates denoising phases directly into the model's architecture.
arXiv Detail & Related papers (2024-05-31T08:19:44Z)
- Model-Based Diffusion for Trajectory Optimization [8.943418808959494]
We introduce Model-Based Diffusion (MBD), an optimization approach using the diffusion process to solve trajectory optimization (TO) problems without data.
Although MBD does not require external data, it can be naturally integrated with data of diverse qualities to steer the diffusion process.
MBD outperforms state-of-the-art reinforcement learning and sampling-based TO methods in challenging contact-rich tasks.
arXiv Detail & Related papers (2024-05-28T22:14:25Z)
- Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models [100.53662473219806]
Diffusion-of-Thought (DoT) is a novel approach that integrates diffusion models with Chain-of-Thought.
DoT allows reasoning steps to diffuse over time through a diffusion language model.
Our results demonstrate the effectiveness of DoT in multi-digit multiplication, logic, and grade school math problems.
arXiv Detail & Related papers (2024-02-12T16:23:28Z)
- AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging).
It aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data.
Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance.
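The merging step that AdaMerging optimizes can be sketched as task arithmetic with learnable coefficients, theta = theta_0 + sum_k lambda_k * tau_k. The function below shows only that step, with the coefficient-learning loop omitted; names and the task-wise granularity are illustrative assumptions.

```python
def merge_state_dicts(base, task_vectors, lambdas):
    """Task-arithmetic merge: theta = theta_0 + sum_k lambda_k * tau_k (sketch).

    `base` is the pretrained state dict, `task_vectors` the per-task weight
    deltas, and `lambdas` the coefficients AdaMerging learns without the
    original training data (e.g., from unlabeled test samples).
    """
    return {
        name: theta0 + sum(lam * tau[name] for lam, tau in zip(lambdas, task_vectors))
        for name, theta0 in base.items()
    }
```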
arXiv Detail & Related papers (2023-10-04T04:26:33Z)
- Eliminating Lipschitz Singularities in Diffusion Models [51.806899946775076]
We show that diffusion models frequently exhibit infinite Lipschitz constants near the zero point of timesteps.
This poses a threat to the stability and accuracy of the diffusion process, which relies on integral operations.
We propose a novel approach, dubbed E-TSDM, which eliminates the Lipschitz singularity of the diffusion model near zero.
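One way to picture a fix of this kind is to share (quantize) the timestep condition within small intervals near t = 0, so the network's output cannot vary arbitrarily fast in t. The sketch below illustrates that idea under stated assumptions, not E-TSDM's exact formulation; the threshold and bin count are placeholders.

```python
import torch


def share_timestep_condition(t, t_thresh=100, n_bins=10):
    """Quantize timesteps below a threshold into shared condition values (sketch).

    Near t = 0 the network then sees a piecewise-constant timestep, which
    bounds how fast its output can change with t. Threshold and bin count
    are placeholder choices, not the paper's settings.
    """
    width = t_thresh // n_bins
    shared = (t // width) * width       # floor each t to its bin's left edge
    return torch.where(t < t_thresh, shared, t)
```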
arXiv Detail & Related papers (2023-06-20T03:05:28Z)
- MADiff: Offline Multi-agent Learning with Diffusion Models [79.18130544233794]
MADiff is a diffusion-based multi-agent learning framework.
It works as both a decentralized policy and a centralized controller.
Our experiments demonstrate that MADiff outperforms baseline algorithms across various multi-agent learning tasks.
arXiv Detail & Related papers (2023-05-27T02:14:09Z)
- Efficient Diffusion Models for Vision: A Survey [34.610299976294904]
Diffusion Models (DMs) have demonstrated state-of-the-art performance in content generation without requiring adversarial training.
DMs are inspired by non-equilibrium thermodynamics and have inherently high computational complexity.
DMs incur considerable computational overhead during both training and inference stages.
arXiv Detail & Related papers (2022-10-07T06:46:13Z)