AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
- URL: http://arxiv.org/abs/2505.24863v1
- Date: Fri, 30 May 2025 17:58:36 GMT
- Title: AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
- Authors: Junyu Zhang, Runpei Dong, Han Wang, Xuying Ning, Haoran Geng, Peihao Li, Xialin He, Yutong Bai, Jitendra Malik, Saurabh Gupta, Huan Zhang,
- Abstract summary: $\alpha$1 first introduces the $\alpha$ moment, which represents the scaled thinking phase with a universal parameter $\alpha$. After the $\alpha$ moment, $\alpha$1 deterministically terminates slow thinking with the end-of-thinking token. This approach unifies and generalizes existing monotonic scaling methods by enabling flexible and dense slow-to-fast reasoning modulation.
- Score: 52.56648646336559
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents AlphaOne ($\alpha$1), a universal framework for modulating reasoning progress in large reasoning models (LRMs) at test time. $\alpha$1 first introduces $\alpha$ moment, which represents the scaled thinking phase with a universal parameter $\alpha$. Within this scaled pre-$\alpha$ moment phase, it dynamically schedules slow thinking transitions by modeling the insertion of reasoning transition tokens as a Bernoulli stochastic process. After the $\alpha$ moment, $\alpha$1 deterministically terminates slow thinking with the end-of-thinking token, thereby fostering fast reasoning and efficient answer generation. This approach unifies and generalizes existing monotonic scaling methods by enabling flexible and dense slow-to-fast reasoning modulation. Extensive empirical studies on various challenging benchmarks across mathematical, coding, and scientific domains demonstrate $\alpha$1's superior reasoning capability and efficiency. Project page: https://alphaone-project.github.io/
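The abstract describes a concrete test-time schedule: before the $\alpha$ moment, slow-thinking transition tokens are inserted according to a Bernoulli process; after it, the end-of-thinking token is forced so the model answers in fast mode. The sketch below illustrates that schedule under stated assumptions; the transition token ("wait"), the end-of-thinking token ("</think>"), the linearly annealed insertion probability, and the thinking budget are illustrative choices, not the authors' exact settings.
```python
# Minimal sketch of an alpha1-style slow-to-fast schedule, assuming illustrative
# token strings and a linearly annealed Bernoulli insertion probability.
import random

def alpha_one_schedule(generate_token, base_budget=1000, alpha=1.4, p_max=0.3, seed=0):
    """Modulate a reasoning trace at test time.

    generate_token: callable returning the model's next thinking token (stub here).
    base_budget:    average unmodulated thinking length (assumed).
    alpha:          universal scaling parameter; the alpha moment is alpha * base_budget.
    p_max:          peak Bernoulli probability of inserting a transition token (assumed).
    """
    rng = random.Random(seed)
    alpha_moment = int(alpha * base_budget)
    tokens = []
    for t in range(alpha_moment):
        # Pre-alpha-moment phase: stochastically insert a slow-thinking
        # transition token; here the probability anneals linearly to zero.
        p_t = p_max * (1 - t / alpha_moment)
        if rng.random() < p_t:
            tokens.append("wait")        # reasoning-transition token (assumed string)
        tokens.append(generate_token())
    # Post-alpha-moment phase: deterministically end slow thinking so the model
    # switches to fast reasoning and produces the final answer.
    tokens.append("</think>")
    return tokens
```
In this reading, $\alpha > 1$ lengthens the slow-thinking phase and $\alpha < 1$ shortens it, which is how a single parameter can cover both "think more" and "think less" scaling regimes.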
Related papers
- Scaling Speculative Decoding with Lookahead Reasoning [11.349400331288257]
Token-level speculative decoding (SD) helps, but its benefit is capped. We develop Lookahead Reasoning, which exploits a second, step-level layer of parallelism. Lookahead Reasoning improves the speedup of SD from 1.4x to 2.1x while preserving answer quality.
arXiv Detail & Related papers (2025-06-24T17:48:10Z)
- Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute [57.16286134405821]
We propose Fractional Reasoning, a framework that enables continuous control over reasoning intensity at inference time. Our method operates by extracting the latent steering vector associated with deeper reasoning and reapplying it with a tunable scaling factor. Experiments on GSM8K, MATH500, and GPQA demonstrate that Fractional Reasoning consistently improves performance across diverse reasoning tasks and models.
arXiv Detail & Related papers (2025-06-18T21:15:59Z)
- Z1: Efficient Test-time Scaling with Code [26.374317704720234]
Large Language Models (LLMs) can achieve enhanced complex problem-solving through test-time computing scaling. We propose an efficient test-time scaling method that trains LLMs on code-related reasoning trajectories. We present a novel Shifted Thinking Window to mitigate overthinking overhead.
arXiv Detail & Related papers (2025-04-01T14:01:50Z)
- An Empirical Study of $\mu$P Learning Rate Transfer [0.0]
We show that the $\mu$-Transfer method can yield near-optimal learning rates in practice. Despite its evident promise, the $\mu$P method is not yet widely adopted.
arXiv Detail & Related papers (2024-04-08T17:59:44Z) - SpecTr: Fast Speculative Decoding via Optimal Transport [30.18181671899423]
We develop a new autoregressive sampling algorithm called $\textit{SpecTr}$, which provides speedup in decoding while ensuring that there is no quality degradation in the decoded output.
We experimentally demonstrate that for state-of-the-art large language models, the proposed approach achieves a wall clock speedup of 2.13X, a further 1.37X speedup over speculative decoding on standard benchmarks.
arXiv Detail & Related papers (2023-10-23T17:47:34Z) - Think before you speak: Training Language Models With Pause Tokens [73.61375226378712]
Language models generate responses by producing a series of tokens in immediate succession.
What if instead we were to let the model manipulate, say, $K+10$ hidden vectors before it outputs the $(K+1)$-th token?
We operationalize this idea by performing training and inference on language models with a (learnable) $\textit{pause}$ token.
arXiv Detail & Related papers (2023-10-03T17:32:41Z) - Acting in Delayed Environments with Non-Stationary Markov Policies [57.52103323209643]
We introduce a framework for learning and planning in MDPs where the decision-maker commits actions that are executed with a delay of $m$ steps.
We prove that with execution delay, deterministic Markov policies in the original state-space are sufficient for attaining maximal reward, but need to be non-stationary.
We devise a non-stationary Q-learning style model-based algorithm that solves delayed execution tasks without resorting to state-augmentation.
arXiv Detail & Related papers (2021-01-28T13:35:37Z) - Improving Robustness and Generality of NLP Models Using Disentangled
Representations [62.08794500431367]
Supervised neural networks first map an input $x$ to a single representation $z$, and then map $z$ to the output label $y$.
We present methods to improve robustness and generality of NLP models from the standpoint of disentangled representation learning.
We show that models trained with the proposed criteria provide better robustness and domain adaptation ability in a wide range of supervised learning tasks.
arXiv Detail & Related papers (2020-09-21T02:48:46Z) - A new regret analysis for Adam-type algorithms [78.825194932103]
In theory, regret guarantees for online convex optimization require a rapidly decaying $\beta_1 \to 0$ schedule.
We propose a novel framework that allows us to derive optimal, data-dependent regret bounds with a constant $\beta_1$.
arXiv Detail & Related papers (2020-03-21T19:19:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.