Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models
- URL: http://arxiv.org/abs/2402.07754v3
- Date: Thu, 05 Dec 2024 06:49:06 GMT
- Title: Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models
- Authors: Jiacheng Ye, Shansan Gong, Liheng Chen, Lin Zheng, Jiahui Gao, Han Shi, Chuan Wu, Xin Jiang, Zhenguo Li, Wei Bi, Lingpeng Kong
- Abstract summary: Diffusion-of-Thought (DoT) is a novel approach that integrates diffusion models with Chain-of-Thought.
DoT allows reasoning steps to diffuse over time through a diffusion language model.
Our results demonstrate the effectiveness of DoT in multi-digit multiplication, boolean logic, and grade school math problems.
- Abstract: Recently, diffusion models have garnered significant interest in the field of text processing due to their many potential advantages compared to conventional autoregressive models. In this work, we propose Diffusion-of-Thought (DoT), a novel approach that integrates diffusion models with Chain-of-Thought, a well-established technique for improving the reasoning ability of autoregressive language models. In contrast to autoregressive language models, which make decisions in a left-to-right, token-by-token manner, DoT allows reasoning steps to diffuse over time through a diffusion language model and offers greater flexibility in trading off computation for reasoning performance. Our experimental results demonstrate the effectiveness of DoT in multi-digit multiplication, boolean logic, and grade school math problems, with a small diffusion model outperforming a much larger autoregressive model in both efficiency and accuracy. In addition, DoT showcases promising self-correction abilities and benefits from existing reasoning-enhancing techniques such as self-consistency decoding. Our findings contribute to the understanding and development of reasoning with diffusion language models.
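To make the contrast with left-to-right decoding concrete, here is a minimal sketch of DoT-style sampling plus self-consistency decoding; `denoise_step`, `extract_answer`, and all parameter names are hypothetical stand-ins for this illustration, not the authors' API.

```python
# Minimal sketch of DoT-style sampling (not the authors' implementation).
# `denoise_step` is a hypothetical trained diffusion language model that
# refines ALL rationale positions in parallel at noise level t;
# `extract_answer` is a hypothetical parser for the final answer.
from collections import Counter

def dot_sample(prompt, seq_len, num_steps, denoise_step, mask_id=0):
    # Start from a fully noised rationale; AR decoding would instead
    # commit to tokens strictly left to right.
    tokens = [mask_id] * seq_len
    for t in reversed(range(num_steps)):
        # Each step rewrites the whole sequence, so later reasoning
        # tokens can revise earlier ones (self-correction); the number
        # of steps trades computation for reasoning performance.
        tokens = denoise_step(prompt, tokens, t)
    return tokens

def self_consistency(prompt, extract_answer, n_chains=10, **sample_kwargs):
    # Self-consistency decoding: sample several independent chains
    # and majority-vote the final answers.
    answers = [extract_answer(dot_sample(prompt, **sample_kwargs))
               for _ in range(n_chains)]
    return Counter(answers).most_common(1)[0][0]
```

Raising `num_steps` revises every position more times, which is the compute-for-accuracy trade-off the abstract describes.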
Related papers
- DiTAR: Diffusion Transformer Autoregressive Modeling for Speech Generation [24.85655658070008]
Diffusion Transformer Autoregressive Modeling (DiTAR) is a patch-based autoregressive framework combining a language model with a diffusion transformer.
In zero-shot speech generation, DiTAR achieves state-of-the-art performance in robustness, speaker similarity, and naturalness.
arXiv Detail & Related papers (2025-02-06T10:09:49Z)
- ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer [95.80384464922147]
Continuous visual generation requires a full-sequence, diffusion-based approach.
We present ACDiT, an Autoregressive blockwise Conditional Diffusion Transformer.
We demonstrate that ACDiT can be seamlessly used in visual understanding tasks despite being trained on the diffusion objective.
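The summary names the interpolation but not its mechanics; a plausible reading, sketched below under that assumption, is autoregression across blocks with diffusion inside each block (`denoiser` and its signature are hypothetical, not ACDiT's actual interface).

```python
# Rough sketch of blockwise autoregressive + diffusion generation;
# a generic reading of the idea, not ACDiT's implementation.
import torch

def generate_blockwise(denoiser, num_blocks, block_shape, num_steps):
    done = []  # clean blocks generated so far: the AR context
    for _ in range(num_blocks):
        x = torch.randn(block_shape)  # each new block starts as noise
        for t in reversed(range(num_steps)):
            # Diffusion within the block: the hypothetical denoiser
            # predicts a cleaner block at noise level t, conditioned
            # autoregressively on all previously completed blocks.
            x = denoiser(x, t, context=done)
        done.append(x)  # commit the finished block, AR-style
    return torch.cat(done, dim=0)
```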
arXiv Detail & Related papers (2024-12-10T18:13:20Z)
- Text Diffusion with Reinforced Conditioning [92.17397504834825]
This paper thoroughly analyzes text diffusion models and uncovers two significant limitations: degradation of self-conditioning during training and misalignment between training and sampling.
Motivated by our findings, we propose a novel Text Diffusion model called TREC, which mitigates the degradation with Reinforced Conditioning and the misalignment by Time-Aware Variance Scaling.
arXiv Detail & Related papers (2024-02-19T09:24:02Z)
- Guided Diffusion from Self-Supervised Diffusion Features [49.78673164423208]
Guidance serves as a key concept in diffusion models, yet its effectiveness is often limited by the need for extra data annotation or pretraining.
We propose a framework to extract guidance from, and specifically for, diffusion models.
arXiv Detail & Related papers (2023-12-14T11:19:11Z)
- Likelihood-Based Diffusion Language Models [13.916640262862215]
We take the first steps towards closing the likelihood gap between autoregressive and diffusion-based language models.
We pursue this goal through algorithmic improvements, scaling laws, and increased compute.
We release Plaid 1B, a large diffusion language model which outperforms GPT-2 124M in likelihood on benchmark datasets.
arXiv Detail & Related papers (2023-05-30T16:43:31Z)
- A Survey of Diffusion Models in Natural Language Processing [11.233768932957771]
Diffusion models capture the diffusion of information or signals across a network or manifold.
This paper discusses the different formulations of diffusion models used in NLP, their strengths and limitations, and their applications.
arXiv Detail & Related papers (2023-05-24T03:25:32Z)
- DiffusionBERT: Improving Generative Masked Language Models with Diffusion Models [81.84866217721361]
DiffusionBERT is a new generative masked language model based on discrete diffusion models.
We propose a new noise schedule for the forward diffusion process that controls the degree of noise added at each step.
Experiments on unconditional text generation demonstrate that DiffusionBERT achieves significant improvement over existing diffusion models for text.
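The new schedule itself is not described here; purely as a generic illustration of how a forward-process schedule controls the per-step noise in a discrete (masking) diffusion model, a linear absorbing-state schedule looks like the sketch below. DiffusionBERT's actual schedule is more refined and token-aware, which this does not reproduce.

```python
# Generic absorbing-state forward process for discrete text diffusion;
# an illustrative linear schedule, NOT DiffusionBERT's proposed one.
import random

def forward_mask(tokens, t, num_steps, mask_id):
    # At step t each token has independently been replaced by [MASK]
    # with probability t / num_steps, so corruption grows with t.
    p = t / num_steps
    return [mask_id if random.random() < p else tok for tok in tokens]
```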
arXiv Detail & Related papers (2022-11-28T03:25:49Z)
- How Much is Enough? A Study on Diffusion Times in Score-based Generative Models [76.76860707897413]
Current best practice advocates for a large T to ensure that the forward dynamics brings the diffusion sufficiently close to a known and simple noise distribution.
We show how an auxiliary model can be used to bridge the gap between the ideal and the simulated forward dynamics, followed by a standard reverse diffusion process.
arXiv Detail & Related papers (2022-06-10T15:09:46Z)
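A small numeric check makes the role of T concrete. Under standard DDPM conventions (a linear beta schedule, which is an assumption here, not necessarily the paper's setup), q(x_T | x_0) = N(sqrt(abar_T) x_0, (1 - abar_T) I), so sqrt(abar_T) measures how far the simulated forward dynamics still is from the ideal N(0, I) prior.

```python
# How close is the forward process to pure noise after T steps?
# Uses the common DDPM linear beta schedule as an assumption.
import math

def residual_signal(T, beta_min=1e-4, beta_max=0.02):
    abar = 1.0
    for i in range(T):
        beta = beta_min + (beta_max - beta_min) * i / max(T - 1, 1)
        abar *= 1.0 - beta
    # sqrt(abar_T): fraction of the original signal still present at x_T
    return math.sqrt(abar)

for T in (10, 100, 1000):
    print(T, residual_signal(T))  # shrinks toward 0 as T grows
```

This prints roughly 0.95, 0.60, and 0.006 for T = 10, 100, and 1000: small T leaves a visible gap to the prior, which is the gap the paper's auxiliary model is meant to bridge.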