GTS: Inference-Time Scaling of Latent Reasoning with a Learnable Gaussian Thought Sampler
- URL: http://arxiv.org/abs/2602.14077v1
- Date: Sun, 15 Feb 2026 09:57:47 GMT
- Title: GTS: Inference-Time Scaling of Latent Reasoning with a Learnable Gaussian Thought Sampler
- Authors: Minghan Wang, Ye Bai, Thuy-Trang Vu, Ehsan Shareghi, Gholamreza Haffari
- Abstract summary: We model latent thought exploration as conditional sampling from learnable densities and instantiate this idea as a Gaussian Thought Sampler (GTS). GTS predicts context-dependent perturbation distributions over continuous reasoning states and is trained with GRPO-style policy optimization while keeping the backbone frozen.
- Score: 54.10960908347221
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Inference-time scaling (ITS) in latent reasoning models typically introduces stochasticity through heuristic perturbations, such as dropout or fixed Gaussian noise. While these methods increase trajectory diversity, their exploration behavior is not explicitly modeled and can be inefficient under finite sampling budgets. We observe that stronger perturbations do not necessarily translate into more effective candidate trajectories, as unguided noise may disrupt internal decision structure rather than steer it. To provide a more structured alternative, we model latent thought exploration as conditional sampling from learnable densities and instantiate this idea as a Gaussian Thought Sampler (GTS). GTS predicts context-dependent perturbation distributions over continuous reasoning states and is trained with GRPO-style policy optimization while keeping the backbone frozen. Experiments on GSM8K with two latent reasoning architectures show that GTS achieves more reliable inference-time scaling than heuristic baselines. These findings indicate that improving latent ITS requires structured and optimizable exploration mechanisms rather than simply amplifying stochasticity.
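The mechanism described in the abstract lends itself to a compact sketch. Below is a minimal, hypothetical PyTorch rendering of the idea, assuming the sampler is a small head over the backbone's hidden state and that the perturbation is applied additively; the module names, shapes, and the group-advantage helper are assumptions for illustration, not the paper's code.

```python
# Illustrative sketch only: a hypothetical Gaussian thought sampler in PyTorch.
import torch
import torch.nn as nn

class GaussianThoughtSampler(nn.Module):
    """Small head predicting a context-dependent Gaussian over perturbations
    of a continuous reasoning state (assumed architecture, not the paper's)."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.mu_head = nn.Linear(hidden_dim, hidden_dim)
        self.log_std_head = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, h: torch.Tensor):
        # h: (batch, hidden_dim) latent state from the *frozen* backbone.
        mu = self.mu_head(h)
        log_std = self.log_std_head(h).clamp(-5.0, 2.0)   # keep std well-behaved
        dist = torch.distributions.Normal(mu, log_std.exp())
        eps = dist.rsample()                    # reparameterized perturbation
        log_prob = dist.log_prob(eps).sum(-1)   # needed for the policy update
        return h + eps, log_prob

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    # GRPO-style group-relative advantage: standardize rewards within one
    # group of trajectories sampled for the same prompt.
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)
```

In this reading, only the sampler's two linear heads would receive gradients; the backbone's parameters stay frozen, matching the abstract's description.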
Related papers
- Generative Bayesian Filtering and Parameter Learning [0.0]
Generative Bayesian Filtering (GBF) provides a powerful framework for performing posterior inference in complex nonlinear and non-Gaussian state-space models. GBF does not require explicit density evaluations, making it particularly effective when observation or transition distributions are analytically intractable. We introduce the Generative-Gibbs sampler, which bypasses explicit density evaluation by iteratively sampling each variable from its implicit full conditional distribution.
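A generic Gibbs-sweep skeleton illustrating "iteratively sampling each variable from its full conditional"; the `sample_conditional` callback is hypothetical and stands in for the paper's implicit, generator-based conditionals.

```python
# Generic Gibbs-sweep skeleton (illustration only, not Generative-Gibbs itself).
import numpy as np

def gibbs_sweeps(x0: np.ndarray, sample_conditional, n_sweeps: int = 1000):
    """x0: initial state vector; sample_conditional(i, x) returns a draw of
    coordinate i from its full conditional given the rest of x."""
    x = x0.copy()
    samples = []
    for _ in range(n_sweeps):
        for i in range(len(x)):
            x[i] = sample_conditional(i, x)  # one coordinate at a time
        samples.append(x.copy())
    return np.array(samples)
```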
arXiv Detail & Related papers (2025-11-06T17:04:48Z)
- Parallel Test-Time Scaling for Latent Reasoning Models [58.428340345068214]
Parallel test-time scaling (TTS) is a pivotal approach for enhancing large language models (LLMs). Recent advances in latent reasoning, where intermediate reasoning unfolds in continuous vector spaces, offer a more efficient alternative to explicit Chain-of-Thought. This work enables parallel TTS for latent reasoning models by addressing these challenges.
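As a rough operational picture of parallel TTS, a best-of-N loop over stochastic rollouts; `rollout` and `score` are hypothetical callables, and the paper's selection mechanism for latent trajectories is more involved than this.

```python
# Schematic parallel test-time scaling (assumption-laden illustration only).
def parallel_tts(prompt, rollout, score, n_samples: int = 8):
    # Draw several stochastic latent-reasoning rollouts, keep the best-scored.
    candidates = [rollout(prompt, seed=s) for s in range(n_samples)]
    return max(candidates, key=score)  # best-of-N selection
```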
arXiv Detail & Related papers (2025-10-09T03:33:00Z)
- G$^2$RPO: Granular GRPO for Precise Reward in Flow Models [74.21206048155669]
We propose a novel Granular-GRPO (G$^2$RPO) framework that achieves precise and comprehensive reward assessments of sampling directions. We introduce a Multi-Granularity Advantage Integration module that aggregates advantages computed at multiple diffusion scales. Our G$^2$RPO significantly outperforms existing flow-based GRPO baselines.
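A toy sketch of the aggregation idea: per-scale rewards are turned into group-relative advantages and combined by a weighted sum. The weighting and normalization are my assumptions, not the paper's module.

```python
# Illustration only: aggregate group-relative advantages across scales.
import numpy as np

def multi_granularity_advantage(rewards_per_scale, weights=None):
    """rewards_per_scale: list of (group_size,) reward arrays, one per scale."""
    advs = []
    for r in rewards_per_scale:
        advs.append((r - r.mean()) / (r.std() + 1e-8))  # per-scale GRPO advantage
    advs = np.stack(advs)                       # (n_scales, group_size)
    w = np.ones(len(advs)) if weights is None else np.asarray(weights, float)
    w = w / w.sum()
    return (w[:, None] * advs).sum(axis=0)      # weighted aggregate per sample
```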
arXiv Detail & Related papers (2025-10-02T12:57:12Z)
- Locally Adaptive Conformal Inference for Operator Models [5.78532405664684]
We introduce Local Sliced Conformal Inference (LSCI), a distribution-free framework for generating function-valued, locally adaptive prediction sets for operator models. We prove finite-sample validity and derive a data-dependent upper bound on the coverage gap under local exchangeability. We empirically demonstrate robustness against biased predictions and certain out-of-distribution noise regimes.
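For orientation, a generic split-conformal interval in Python; LSCI's function-valued, locally adaptive sets generalize this considerably, so treat the snippet as background rather than the paper's method.

```python
# Standard split-conformal interval (background illustration, not LSCI).
import numpy as np

def conformal_interval(cal_preds, cal_targets, test_pred, alpha=0.1):
    scores = np.abs(np.asarray(cal_targets) - np.asarray(cal_preds))  # nonconformity
    n = len(scores)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)  # finite-sample correction
    q = np.quantile(scores, level)
    return test_pred - q, test_pred + q   # marginal 1 - alpha coverage band
```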
arXiv Detail & Related papers (2025-07-28T16:37:56Z)
- Inference-Time Scaling of Diffusion Language Models with Particle Gibbs Sampling [70.8832906871441]
We study how to steer generation toward desired rewards without retraining the models. Prior methods typically resample or filter within a single denoising trajectory, optimizing rewards step-by-step without trajectory-level refinement. We introduce particle Gibbs sampling for diffusion language models (PG-DLM), a novel inference-time algorithm enabling trajectory-level refinement while preserving generation perplexity.
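A heavily simplified particle-Gibbs-style sweep, assuming black-box `sample_trajectory` and nonnegative `reward` callables; both names are hypothetical, and PG-DLM operates on denoising trajectories with a proper conditional-SMC construction rather than this toy loop.

```python
# Toy particle-Gibbs-flavored loop (illustration under assumptions).
import random

def particle_gibbs(sample_trajectory, reward, n_particles=4, n_sweeps=3):
    ref = sample_trajectory(condition_on=None)      # initial reference draw
    for _ in range(n_sweeps):
        # Retain the reference particle; propose fresh companions around it.
        particles = [ref] + [sample_trajectory(condition_on=ref)
                             for _ in range(n_particles - 1)]
        weights = [max(reward(p), 1e-8) for p in particles]  # assumes reward >= 0
        ref = random.choices(particles, weights=weights)[0]  # trajectory-level move
    return ref
```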
arXiv Detail & Related papers (2025-07-11T08:00:47Z)
- Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL [20.177871969184004]
Chain-of-thought (CoT) reasoning can be formalized as a latent variable problem, where the model needs to generate intermediate reasoning steps. Prior approaches such as iterative reward-ranked fine-tuning fail to account for variability in difficulty and convergence behavior. We propose GVMRAFT, a prompt-specific dynamic sample allocation strategy that minimizes gradient variance under a computational budget constraint.
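One way to read "dynamic sample allocation to minimize gradient variance" is a Neyman-style rule that gives noisier prompts more rollouts; the sketch below is my toy interpretation, not GVMRAFT's exact allocation.

```python
# Toy budgeted allocation proportional to estimated per-prompt noise.
import numpy as np

def allocate_samples(variance_estimates, total_budget: int, min_per_prompt: int = 1):
    sigma = np.sqrt(np.asarray(variance_estimates, dtype=float))
    raw = total_budget * sigma / sigma.sum()   # Neyman-style proportional rule
    alloc = np.maximum(raw.astype(int), min_per_prompt)
    # Trim overflow greedily so the allocation respects the total budget.
    while alloc.sum() > total_budget:
        alloc[np.argmax(alloc)] -= 1
    return alloc
```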
arXiv Detail & Related papers (2025-05-05T06:26:00Z)
- Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level Stability and High-Level Behavior [51.60683890503293]
We propose a theoretical framework for studying behavior cloning of complex expert demonstrations using generative modeling.
We show that pure supervised cloning can generate trajectories matching the per-time-step distribution of arbitrary expert trajectories.
arXiv Detail & Related papers (2023-07-27T04:27:26Z)
- Training Discrete Deep Generative Models via Gapped Straight-Through Estimator [72.71398034617607]
We propose a Gapped Straight-Through (GST) estimator to reduce the variance without incurring resampling overhead.
This estimator is inspired by the essential properties of Straight-Through Gumbel-Softmax.
Experiments demonstrate that the proposed GST estimator enjoys better performance compared to strong baselines on two discrete deep generative modeling tasks.
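For context, the standard straight-through Gumbel-Softmax recipe that GST builds on; the gapped modification itself is not shown here, so this is only the familiar baseline the abstract references.

```python
# Standard straight-through Gumbel-Softmax estimator (baseline, not GST).
import torch
import torch.nn.functional as F

def straight_through_gumbel_softmax(logits: torch.Tensor, tau: float = 1.0):
    y_soft = F.gumbel_softmax(logits, tau=tau, hard=False)   # relaxed sample
    index = y_soft.argmax(dim=-1, keepdim=True)
    y_hard = torch.zeros_like(y_soft).scatter_(-1, index, 1.0)
    # Forward pass uses the discrete one-hot; gradients flow through y_soft.
    return y_hard + (y_soft - y_soft.detach())
```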
arXiv Detail & Related papers (2022-06-15T01:46:05Z)
- BCD Nets: Scalable Variational Approaches for Bayesian Causal Discovery [97.79015388276483]
A structural equation model (SEM) is an effective framework to reason over causal relationships represented via a directed acyclic graph (DAG).
Recent advances enabled effective maximum-likelihood point estimation of DAGs from observational data.
We propose BCD Nets, a variational framework for estimating a distribution over DAGs characterizing a linear-Gaussian SEM.
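As background, a minimal forward-sampler for the linear-Gaussian SEM data model the abstract assumes; BCD Nets addresses the inverse problem of inferring a posterior over the DAG, which this sketch does not attempt.

```python
# Minimal linear-Gaussian SEM forward-sampler (background illustration only).
import numpy as np

def sample_linear_gaussian_sem(W: np.ndarray, noise_std: float, n: int):
    """W[i, j] != 0 means edge i -> j; W is assumed strictly upper triangular
    under a topological ordering, so the graph is a DAG."""
    d = W.shape[0]
    X = np.zeros((n, d))
    eps = np.random.normal(0.0, noise_std, size=(n, d))
    for j in range(d):                 # ancestors of j occupy columns < j
        X[:, j] = X @ W[:, j] + eps[:, j]
    return X
```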
arXiv Detail & Related papers (2021-12-06T03:35:21Z)