T-SCEND: Test-time Scalable MCTS-enhanced Diffusion Model
- URL: http://arxiv.org/abs/2502.01989v2
- Date: Wed, 05 Feb 2025 02:38:55 GMT
- Title: T-SCEND: Test-time Scalable MCTS-enhanced Diffusion Model
- Authors: Tao Zhang, Jia-Shu Pan, Ruiqi Feng, Tailin Wu
- Abstract summary: Test-time Scalable MCTS-enhanced Diffusion Model (T-SCEND) is a novel framework that significantly improves diffusion models' reasoning capabilities.
T-SCEND integrates the denoising process with a novel hybrid Monte Carlo Tree Search.
We demonstrate the effectiveness of T-SCEND's training objective and scalable inference method.
- Score: 7.250494262573953
- Abstract: We introduce the Test-time Scalable MCTS-enhanced Diffusion Model (T-SCEND), a novel framework that significantly improves diffusion models' reasoning capabilities through better energy-based training and scaled-up test-time computation. We first show that naïvely scaling up the inference budget for diffusion models yields only marginal gains. To address this, T-SCEND's training combines a novel linear-regression negative contrastive learning objective, which improves the performance-energy consistency of the energy landscape, with a KL regularization that reduces adversarial sampling. During inference, T-SCEND integrates the denoising process with a novel hybrid Monte Carlo Tree Search (hMCTS), which sequentially performs best-of-N random search and MCTS as denoising proceeds. On the challenging reasoning tasks of Maze and Sudoku, we demonstrate the effectiveness of T-SCEND's training objective and scalable inference method. In particular, trained on Maze sizes of up to $6\times6$, T-SCEND solves $88\%$ of much larger $15\times15$ Maze problems, while standard diffusion fails completely. Code to reproduce the experiments can be found at https://github.com/AI4Science-WestlakeU/t_scend.
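The hybrid schedule described in the abstract is concrete enough to sketch. Below is a minimal, illustrative Python inference loop: best-of-N random search at high noise levels, switching to a heavily simplified one-step-lookahead stand-in for full MCTS as denoising proceeds. The callables `denoise_step` and `energy` and the `switch_step` threshold are assumptions for illustration, not the released implementation.
```python
import numpy as np

def hmcts_denoise(denoise_step, energy, x, num_steps, switch_step, n=8, rng=None):
    """Schematic hMCTS inference loop. Early (high-noise) steps use best-of-N
    random search; later steps use a simplified one-step-lookahead tree search
    scored by the energy model. Illustrative only."""
    rng = rng or np.random.default_rng(0)
    for t in reversed(range(num_steps)):
        # Propose n candidate denoising steps from the current state.
        children = [denoise_step(x, t, rng) for _ in range(n)]
        if t >= switch_step:
            # Best-of-N phase: keep the single lowest-energy proposal.
            x = min(children, key=energy)
        else:
            # "Tree search" phase (stand-in for full MCTS): score each child
            # by the best energy reachable with one more denoising step.
            def lookahead(c):
                if t == 0:
                    return energy(c)
                return min(energy(denoise_step(c, t - 1, rng)) for _ in range(n))
            x = min(children, key=lookahead)
    return x
```
A plausible reading of this design, though not stated in the abstract, is that energy estimates are less informative at high noise, so cheap random search suffices early while search effort pays off late.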
Related papers
- Monte Carlo Tree Diffusion for System 2 Planning [57.50512800900167]
We introduce Monte Carlo Tree Diffusion (MCTD), a novel framework that integrates the generative strength of diffusion models with the adaptive search capabilities of Monte Carlo Tree Search (MCTS).
MCTD retains the benefits of MCTS, such as control over exploration-exploitation trade-offs, within the diffusion framework.
arXiv Detail & Related papers (2025-02-11T02:51:42Z)
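The exploration-exploitation control that the MCTD entry credits to MCTS is conventionally implemented with a UCT selection rule. A generic sketch of that standard formula, not MCTD-specific code:
```python
import math

def uct_select(children, c=1.4):
    """Standard UCT/UCB1 child selection: children is a list of
    (total_value, visit_count) pairs. Returns the index to descend into.
    Generic MCTS machinery, not MCTD's implementation."""
    parent_visits = sum(visits for _, visits in children)
    def score(child):
        value, visits = child
        if visits == 0:
            return float("inf")  # always try unvisited children first
        # exploitation term + exploration bonus that shrinks with visits
        return value / visits + c * math.sqrt(math.log(parent_visits) / visits)
    return max(range(len(children)), key=lambda i: score(children[i]))
```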
- Rethinking Fine-Tuning when Scaling Test-Time Compute: Limiting Confidence Improves Mathematical Reasoning [32.45574194957491]
We show that training with cross-entropy loss can be misaligned with pass@N, in that pass@N accuracy decreases with longer training.
We suggest a principled, modified training loss that is better aligned with pass@N: it limits model confidence and thereby rescues pass@N test performance.
arXiv Detail & Related papers (2025-02-11T00:33:31Z)
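The misalignment in the entry above is easy to see from the definition pass@N = 1 - (1 - p)^N, where p is the per-sample success probability. A toy calculation (the two-problem setup and numbers are invented for illustration):
```python
def pass_at_n(p, n):
    """Chance that at least one of n independent samples is correct,
    given per-sample success probability p."""
    return 1.0 - (1.0 - p) ** n

# Two hypothetical models on a two-problem benchmark (numbers invented):
confident = [0.99, 0.01]  # high confidence: nails one problem, misses the other
hedged = [0.50, 0.50]     # lower confidence, more diverse samples

for n in (1, 16):
    for name, probs in (("confident", confident), ("hedged", hedged)):
        avg = sum(pass_at_n(p, n) for p in probs) / len(probs)
        print(f"pass@{n} {name}: {avg:.3f}")
# pass@1 ties at 0.500, but pass@16 is ~0.575 (confident) vs ~1.000 (hedged):
# sharpening confidence can hurt pass@N even while pass@1 holds steady.
```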
- Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling [52.34735382627312]
Large language models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks.
Existing approaches mainly rely on imitation learning and struggle to achieve effective test-time scaling.
We present T1, which scales reinforcement learning by encouraging exploration, and we study inference scaling.
arXiv Detail & Related papers (2025-01-20T18:33:33Z)
- Mitigating Embedding Collapse in Diffusion Models for Categorical Data [52.90687881724333]
We introduce CATDM, a continuous diffusion framework within the embedding space that stabilizes training.
Experiments on benchmarks show that CATDM mitigates embedding collapse, yielding superior results on FFHQ, LSUN Churches, and LSUN Bedrooms.
arXiv Detail & Related papers (2024-10-18T09:12:33Z)
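For context on the setup this entry assumes: continuous diffusion over categorical data typically applies Gaussian noising to token embeddings rather than to discrete ids. A generic forward-noising sketch (none of CATDM's anti-collapse machinery is reproduced here):
```python
import numpy as np

def noise_embeddings(emb, t, alphas_cumprod, rng=None):
    """Standard DDPM-style forward noising applied to token embeddings
    (shape: [seq_len, dim]) instead of discrete ids. Generic sketch of
    continuous embedding-space diffusion, not CATDM's specifics."""
    rng = rng or np.random.default_rng(0)
    a_bar = alphas_cumprod[t]  # cumulative product of (1 - beta) up to step t
    eps = rng.standard_normal(emb.shape)
    return np.sqrt(a_bar) * emb + np.sqrt(1.0 - a_bar) * eps
```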
- Step-by-Step Reasoning for Math Problems via Twisted Sequential Monte Carlo [55.452453947359736]
We introduce a novel verification method based on Twisted Sequential Monte Carlo (TSMC).
We apply TSMC to Large Language Models by estimating the expected future rewards at partial solutions.
This approach results in a more straightforward training target that eliminates the need for step-wise human annotations.
arXiv Detail & Related papers (2024-10-02T18:17:54Z)
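A minimal sketch of the sequential Monte Carlo machinery the entry refers to: weighted particles over partial solutions, reweighted by an estimated expected future reward (the "twist") and resampled. The `extend` and `value` callables are placeholders; the paper's exact twisting scheme is not reproduced.
```python
import numpy as np

def smc_search(extend, value, prompt, n_particles=16, n_steps=8, rng=None):
    """Schematic (twisted) SMC over partial solutions: grow each particle by
    one reasoning step, weight by estimated expected future reward, then
    resample. Illustrative placeholders, not the paper's TSMC."""
    rng = rng or np.random.default_rng(0)
    particles = [prompt] * n_particles
    for _ in range(n_steps):
        particles = [extend(p, rng) for p in particles]      # grow partial solutions
        w = np.array([value(p) for p in particles], float)   # estimated future reward
        w = np.maximum(w, 1e-12)                             # keep weights positive
        idx = rng.choice(n_particles, size=n_particles, p=w / w.sum())
        particles = [particles[i] for i in idx]              # multinomial resampling
    return particles
```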
- Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution [67.9215891673174]
We propose score entropy as a novel loss that naturally extends score matching to discrete spaces.
We test our Score Entropy Discrete Diffusion models on standard language modeling tasks.
arXiv Detail & Related papers (2023-10-25T17:59:12Z)
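Read schematically, score entropy penalizes a model's estimated probability ratios with a Bregman-type term that vanishes exactly when the ratio is correct. A per-transition sketch under that reading; this is not the paper's full weighted objective:
```python
import numpy as np

def score_entropy_term(s_hat, r, eps=1e-12):
    """Per-transition Bregman-type penalty on an estimated probability ratio:
    s_hat approximates r = p(y)/p(x) for a neighboring state y of x. The term
    is nonnegative and zero iff s_hat == r. Schematic reading of "score
    entropy", not the exact objective from the paper."""
    s_hat = np.maximum(s_hat, eps)
    r = np.maximum(r, eps)
    return s_hat - r * np.log(s_hat) + r * np.log(r) - r
```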
- Learning Energy-Based Prior Model with Diffusion-Amortized MCMC [89.95629196907082]
The common practice of learning latent-space EBMs with non-convergent, short-run MCMC for prior and posterior sampling hinders the model from further progress.
We introduce a simple but effective diffusion-based amortization method for long-run MCMC sampling and develop a novel learning algorithm for the latent space EBM based on it.
arXiv Detail & Related papers (2023-10-05T00:23:34Z)
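The short-run vs. long-run MCMC at issue in the entry above is usually unadjusted Langevin dynamics in latent space. A generic step for reference, not the paper's diffusion-amortized algorithm:
```python
import numpy as np

def langevin_sample(grad_log_p, z0, n_steps, step_size, rng=None):
    """Unadjusted Langevin dynamics in latent space. "Short-run" means few
    steps from a fixed initialization, which may not converge; amortization
    targets the long-run regime. Generic MCMC, not the paper's method."""
    rng = rng or np.random.default_rng(0)
    z = np.asarray(z0, dtype=float).copy()
    for _ in range(n_steps):
        z += 0.5 * step_size * grad_log_p(z)                    # drift up the log-density
        z += np.sqrt(step_size) * rng.standard_normal(z.shape)  # diffusion noise
    return z
```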
- Optimizing DDPM Sampling with Shortcut Fine-Tuning [16.137936204766692]
Shortcut Fine-Tuning (SFT) is a new approach to fast sampling from pretrained Denoising Diffusion Probabilistic Models (DDPMs).
SFT advocates fine-tuning DDPM samplers by directly minimizing Integral Probability Metrics (IPMs).
Inspired by a control perspective, we propose a new algorithm SFT-PG: Shortcut Fine-Tuning with Policy Gradient.
arXiv Detail & Related papers (2023-01-31T01:37:48Z)
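A policy-gradient update of the kind SFT-PG builds on, treating the stochastic sampler as a policy. This is a generic REINFORCE sketch with a mean-reward baseline; `sample_with_logprob` is a hypothetical interface, and the paper's IPM-derived reward is not reproduced:
```python
import torch

def reinforce_step(sampler, reward_fn, optimizer, batch_size=32):
    """Generic REINFORCE update for a stochastic sampler-as-policy.
    `sampler.sample_with_logprob` is a hypothetical interface, not the
    paper's code; logp must carry gradients back to the sampler."""
    samples, logp = sampler.sample_with_logprob(batch_size)  # logp: (B,)
    with torch.no_grad():
        reward = reward_fn(samples)          # (B,) scalar reward per sample
        advantage = reward - reward.mean()   # baseline reduces gradient variance
    loss = -(advantage * logp).mean()        # log-derivative trick
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```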
- Mitigating Catastrophic Forgetting in Scheduled Sampling with Elastic Weight Consolidation in Neural Machine Translation [15.581515781839656]
Autoregressive models trained with maximum likelihood estimation suffer from exposure bias.
We propose using Elastic Weight Consolidation as a trade-off between mitigating exposure bias and retaining output quality.
Experiments on two IWSLT'14 translation tasks demonstrate that our approach alleviates catastrophic forgetting and significantly improves BLEU.
arXiv Detail & Related papers (2021-09-13T20:37:58Z)
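Elastic Weight Consolidation itself has a standard form: a quadratic penalty anchoring parameters to a reference checkpoint, weighted by a diagonal Fisher information estimate. A generic PyTorch sketch, not the paper's exact configuration:
```python
import torch

def ewc_penalty(model, ref_params, fisher, lam=1.0):
    """Standard EWC regularizer: (lam/2) * sum_i F_i * (theta_i - theta_i*)^2,
    where theta* is a reference checkpoint and F is a diagonal Fisher
    estimate. Generic EWC, not the paper's exact setup."""
    penalty = torch.zeros(())
    for name, p in model.named_parameters():
        penalty = penalty + (fisher[name] * (p - ref_params[name]) ** 2).sum()
    return 0.5 * lam * penalty
```
This term is added to the task loss, so gradients pull weights toward the reference checkpoint in proportion to how important each weight was there.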
- On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer [40.63693071222628]
We study minimum word error rate (MWER) training of the Hybrid Autoregressive Transducer (HAT).
From experiments with around 30,000 hours of training data, we show that MWER training can improve the accuracy of HAT models.
arXiv Detail & Related papers (2020-10-23T21:16:30Z)
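MWER training has a standard N-best formulation: minimize the expected word errors under hypothesis probabilities renormalized over the list, with the mean error subtracted as a baseline. A generic sketch, not HAT-specific:
```python
import torch

def mwer_loss(hyp_logprobs, word_errors):
    """Standard N-best MWER objective: expected word errors under hypothesis
    probabilities renormalized over the N-best list, minus the mean error as
    a baseline. Generic formulation, not HAT-specific.
    hyp_logprobs: (N,) log P(y_i | x); word_errors: (N,) edit counts."""
    word_errors = word_errors.float()
    p = torch.softmax(hyp_logprobs, dim=0)        # renormalize over the list
    relative = word_errors - word_errors.mean()   # baseline-subtracted errors
    return (p * relative).sum()
```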