LYNX: Learning Dynamic Exits for Confidence-Controlled Reasoning
- URL: http://arxiv.org/abs/2512.05325v1
- Date: Fri, 05 Dec 2025 00:04:42 GMT
- Title: LYNX: Learning Dynamic Exits for Confidence-Controlled Reasoning
- Authors: Ömer Faruk Akgül, Yusuf Hakan Kalaycı, Rajgopal Kannan, Willie Neiswanger, Viktor Prasanna,
- Abstract summary: LYNX is an online early-exit mechanism that turns a model's own hidden-state awareness into confidence-controlled stopping decisions.<n>We train and calibrate this probe once on a generic mathematical corpus and reuse it unchanged across benchmarks, decoding temperatures, and even non-mathematical tasks.
- Score: 15.597220136913258
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large reasoning models achieve strong performance on complex tasks by generating extended chains of thought, but they often "overthink": continuing to reason long after they have enough information to answer correctly. This wastes inference-time compute and can hurt accuracy. Existing attempts to stop early either manipulate decoding with extra sampling and heuristics, rely on auxiliary verifier models, or operate only as post-hoc analysis pipelines without formal guarantees. We introduce LYNX, an online early-exit mechanism that turns a model's own hidden-state awareness into confidence-controlled stopping decisions. LYNX attaches exit decisions to naturally occurring reasoning cues (e.g., "hmm", "wait") during generation, trains a lightweight probe on hidden states at those cue tokens using supervision from forced exits, and wraps the resulting scores in split conformal prediction to obtain distribution-free control over premature exits. Crucially, we train and calibrate this probe once on a generic mathematical corpus and reuse it unchanged across benchmarks, decoding temperatures, and even non-mathematical tasks. Across three model families spanning 1.5B to 32B parameters, a single mathematically trained probe per base model yields strong accuracy--efficiency tradeoffs. On GSM8K, LYNX matches or improves baseline accuracy while reducing tokens by 40--65\%; on MATH-500 it improves accuracy by up to 12 points with roughly 35--60\% fewer tokens; on AIME 2024 it recovers baseline accuracy with more than 50\% token savings; and on CommonsenseQA, a non-math benchmark, it transfers zero-shot with modest accuracy gains and up to 70\% fewer tokens. Compared to state-of-the-art early-exit methods, LYNX offers competitive or superior Pareto frontiers while remaining fully online, requiring no proxy models at inference, and providing explicit, user-tunable confidence guarantees.
Related papers
- Leap+Verify: Regime-Adaptive Speculative Weight Prediction for Accelerating Neural Network Training [0.0]
We introduce Leap+Verify to accelerate neural network training.<n>It applies speculative execution -- predicting future model weights and validating predictions before acceptance.<n>We evaluate Leap+Verify on GPT-2 124M and Qwen 2.5-1.5B trained on WikiText-103 across five random seeds.
arXiv Detail & Related papers (2026-02-23T08:01:44Z) - CoRefine: Confidence-Guided Self-Refinement for Adaptive Test-Time Compute [10.548368675645403]
CoRefine is a confidence-guided self-refinement method that achieves competitive accuracy using a fraction of the tokens.<n>The controller consumes full-trace confidence to decide whether to halt, re-examine, or try a different approach.<n>We extend this to CoRefine-Tree, a hybrid sequential-parallel variant that adaptively balances exploration and exploitation.
arXiv Detail & Related papers (2026-02-09T17:44:41Z) - ZIP-RC: Optimizing Test-Time Compute via Zero-Overhead Joint Reward-Cost Prediction [57.799425838564]
We present ZIP-RC, an adaptive inference method that equips models with zero-overhead inference-time predictions of reward and cost.<n> ZIP-RC improves accuracy by up to 12% over majority voting at equal or lower average cost.
arXiv Detail & Related papers (2025-12-01T09:44:31Z) - LaSeR: Reinforcement Learning with Last-Token Self-Rewarding [54.72617309922891]
Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a core paradigm for enhancing the reasoning capabilities of Large Language Models (LLMs)<n>Previous practice requires the LLM to sequentially generate solutions and self-verifications using two separate prompt templates, which significantly reduces efficiency.<n>We propose LaSeR (Reinforcement Learning with Last-Token Self-Rewarding), an algorithm that simply augments the original RLVR loss with a MSE loss.
arXiv Detail & Related papers (2025-10-16T17:55:11Z) - Think Just Enough: Sequence-Level Entropy as a Confidence Signal for LLM Reasoning [5.37133760455631]
We introduce a novel entropy-based framework to drive token efficiency in large language models during reasoning tasks.<n>Our approach uses Shannon entropy from token-level logprobs as a confidence signal to enable early stopping.<n>We show that entropy-based confidence calibration represents an emergent property of advanced post-training optimization.
arXiv Detail & Related papers (2025-10-09T12:33:16Z) - CLUE: Non-parametric Verification from Experience via Hidden-State Clustering [64.50919789875233]
We show that correctness of a solution is encoded as a geometrically separable signature within the trajectory of hidden activations.<n>ClUE consistently outperforms LLM-as-a-judge baselines and matches or exceeds modern confidence-based methods in reranking candidates.
arXiv Detail & Related papers (2025-10-02T02:14:33Z) - Beyond Greedy Exits: Improved Early Exit Decisions for Risk Control and Reliability [14.00844847268286]
Early-Exit Deep Neural Networks enable adaptive inference by allowing prediction at intermediary layers.<n>Our framework demonstrates consistent improvements in speedup (1.70-2.10x) with a minimal performance drop (2%) as compared to full model performance.
arXiv Detail & Related papers (2025-09-28T06:05:24Z) - Accelerated Test-Time Scaling with Model-Free Speculative Sampling [58.69141724095398]
We introduce STAND (STochastic Adaptive N-gram Drafting), a novel model-free speculative decoding approach.<n>We show that STAND reduces inference latency by 60-65% compared to standard autoregressive decoding.<n>As a model-free approach, STAND can be applied to any existing language model without additional training.
arXiv Detail & Related papers (2025-06-05T07:31:18Z) - VeriThinker: Learning to Verify Makes Reasoning Model Efficient [52.74493506816969]
Large Reasoning Models excel at complex tasks using Chain-of-Thought (CoT) reasoning.<n>Their tendency to overthinking leads to unnecessarily lengthy reasoning chains.<n>We introduce VeriThinker, a novel approach for CoT compression.
arXiv Detail & Related papers (2025-05-23T14:17:56Z) - Dynamic Early Exit in Reasoning Models [21.30793518631921]
Overthinking in long chain-of-thought (CoT) generation slows down the efficiency of problem solving, but also risks accuracy loss.<n>We propose a simple yet effective method that allows LLMs to self-truncate CoT sequences by early exit during generation.<n>Our method requires no additional training and can be seamlessly integrated into existing o1-like reasoning LLMs.
arXiv Detail & Related papers (2025-04-22T13:36:53Z) - The Right Tool for the Job: Matching Model and Instance Complexities [62.95183777679024]
As NLP models become larger, executing a trained model requires significant computational resources incurring monetary and environmental costs.
We propose a modification to contextual representation fine-tuning which, during inference, allows for an early (and fast) "exit"
We test our proposed modification on five different datasets in two tasks: three text classification datasets and two natural language inference benchmarks.
arXiv Detail & Related papers (2020-04-16T04:28:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.