Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning
- URL: http://arxiv.org/abs/2410.14157v3
- Date: Tue, 18 Feb 2025 03:52:31 GMT
- Title: Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning
- Authors: Jiacheng Ye, Jiahui Gao, Shansan Gong, Lin Zheng, Xin Jiang, Zhenguo Li, Lingpeng Kong
- Abstract summary: We show how diffusion models learn difficult subgoals that elude autoregressive approaches.
We propose Multi-Granularity Diffusion Modeling (MGDM), which prioritizes subgoals based on difficulty during learning.
MGDM significantly outperforms autoregressive models without using search techniques.
- Score: 89.96284387376119
- Abstract: Autoregressive language models, despite their impressive capabilities, struggle with complex reasoning and long-term planning tasks. We introduce discrete diffusion models as a novel solution to these challenges. Through the lens of subgoal imbalance, we demonstrate how diffusion models effectively learn difficult subgoals that elude autoregressive approaches. We propose Multi-Granularity Diffusion Modeling (MGDM), which prioritizes subgoals based on difficulty during learning. On complex tasks like Countdown, Sudoku, and Boolean Satisfiability Problems, MGDM significantly outperforms autoregressive models without using search techniques. For instance, MGDM achieves 91.5% and 100% accuracy on Countdown and Sudoku, respectively, compared to 45.8% and 20.7% for autoregressive models. Our work highlights the potential of diffusion-based approaches in advancing AI capabilities for sophisticated language understanding and problem-solving tasks. All associated code is available at https://github.com/HKUNLP/diffusion-vs-ar.
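The abstract describes MGDM only at a high level: the training loss is reweighted so that hard subgoals (tokens the model struggles to denoise) receive more of the gradient. The paper defines the actual reweighting; the PyTorch sketch below is only a minimal illustration of that idea under assumed choices (an exponential weight with a hypothetical sharpness knob `gamma`), not the authors' formula.

```python
import torch
import torch.nn.functional as F

def difficulty_weighted_diffusion_loss(logits, targets, mask, gamma=1.0):
    """Sketch of a difficulty-weighted loss for one masked-diffusion step.

    logits:  (batch, seq, vocab) predictions for the sequence
    targets: (batch, seq) ground-truth token ids
    mask:    (batch, seq) 1 where the token was masked for this step
    gamma:   assumed sharpness of the reweighting (illustrative only)
    """
    # Per-token cross-entropy; only masked positions will contribute.
    ce = F.cross_entropy(logits.transpose(1, 2), targets, reduction="none")

    # Difficulty proxy: upweight tokens with high current loss so that
    # hard subgoals dominate the gradient (the MGDM intuition).
    with torch.no_grad():
        w = torch.exp(gamma * ce) * mask.float()
        w = w / w.sum(dim=-1, keepdim=True).clamp_min(1e-8)

    return (w * ce).sum(dim=-1).mean()
```

With `gamma=0` this reduces to a uniform average over masked tokens; increasing `gamma` shifts weight toward the hardest subgoals.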
Related papers
- Train for the Worst, Plan for the Best: Understanding Token Ordering in Masked Diffusions [14.85882273040068]
Masked diffusion models (MDMs) have emerged as a promising alternative approach for generative modeling over discrete domains.
We show that adaptive inference can boost solving accuracy in pretrained MDMs from 7% to approximately 90%, even outperforming ARMs with 7× as many parameters (a confidence-ordered decoding sketch follows the list below).
arXiv Detail & Related papers (2025-02-10T18:47:21Z)
- ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer [95.80384464922147]
Continuous visual generation requires a full-sequence, diffusion-based approach.
We present ACDiT, an Autoregressive blockwise Conditional Diffusion Transformer.
We demonstrate that ACDiT can be seamlessly used in visual understanding tasks despite being trained on the diffusion objective.
arXiv Detail & Related papers (2024-12-10T18:13:20Z)
- Diffusion-VLA: Scaling Robot Foundation Models via Unified Diffusion and Autoregression [9.923268972395107]
DiffusionVLA is a framework that seamlessly combines an autoregressive model with a diffusion model for learning visuomotor policies.
To enhance policy learning through self-reasoning, we introduce a novel reasoning injection module.
We conduct extensive experiments using multiple real robots to validate the effectiveness of DiffusionVLA.
arXiv Detail & Related papers (2024-12-04T13:11:38Z)
- Scaling Diffusion Language Models via Adaptation from Autoregressive Models [105.70889434492143]
Diffusion Language Models (DLMs) have emerged as a promising new paradigm for text generative modeling.
We show that AR models ranging from 127M to 7B parameters can be converted into the diffusion models DiffuGPT and DiffuLLaMA using fewer than 200B training tokens.
Our experimental results reveal that these models outperform earlier DLMs and are competitive with their AR counterparts.
arXiv Detail & Related papers (2024-10-23T14:04:22Z)
- Model Inversion Attacks Through Target-Specific Conditional Diffusion Models [54.69008212790426]
Model inversion attacks (MIAs) aim to reconstruct private images from a target classifier's training set, thereby raising privacy concerns in AI applications.
Previous GAN-based MIAs tend to suffer from inferior generative fidelity due to GANs' inherent flaws and biased optimization within the latent space.
We propose Diffusion-based Model Inversion (Diff-MI) attacks to alleviate these issues.
arXiv Detail & Related papers (2024-07-16T06:38:49Z)
- Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models [100.53662473219806]
Diffusion-of-Thought (DoT) is a novel approach that integrates diffusion models with Chain-of-Thought.
DoT allows reasoning steps to diffuse over time through a diffusion language model.
Our results demonstrate the effectiveness of DoT in multi-digit multiplication, logic, and grade school math problems.
arXiv Detail & Related papers (2024-02-12T16:23:28Z)
- MADiff: Offline Multi-agent Learning with Diffusion Models [79.18130544233794]
MADiff is a diffusion-based multi-agent learning framework.
It works as both a decentralized policy and a centralized controller.
Our experiments demonstrate that MADiff outperforms baseline algorithms across various multi-agent learning tasks.
arXiv Detail & Related papers (2023-05-27T02:14:09Z)
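As noted in the "Train for the Worst, Plan for the Best" entry above, much of an MDM's gain comes from choosing the token ordering adaptively at inference time. That paper's exact criterion is not reproduced here, so the sketch below assumes a common confidence-based rule: at each step, commit only the masked positions the model is most certain about and defer the rest. The `model(tokens)` interface, `mask_id`, and `tokens_per_step` are illustrative assumptions, not a specific library's API.

```python
import torch

@torch.no_grad()
def confidence_ordered_decode(model, tokens, mask_id, tokens_per_step=4):
    """Adaptive unmasking for a masked diffusion model (sketch).

    tokens: (seq,) LongTensor holding a partially masked sequence
    model:  callable mapping (1, seq) ids -> (1, seq, vocab) logits
            (an assumed interface for illustration)
    """
    tokens = tokens.clone()
    while (tokens == mask_id).any():
        logits = model(tokens.unsqueeze(0))[0]      # (seq, vocab)
        conf, pred = logits.softmax(dim=-1).max(dim=-1)
        conf[tokens != mask_id] = -1.0              # rank only masked slots
        # Commit the few positions the model is most certain about;
        # harder positions wait until more context has been revealed.
        k = min(tokens_per_step, int((tokens == mask_id).sum()))
        idx = conf.topk(k).indices
        tokens[idx] = pred[idx]
    return tokens
```

The point of the sketch is simply that the decoding order is chosen by the model's own confidence rather than fixed left to right, which is what lets hard positions be resolved last.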