Leap+Verify: Regime-Adaptive Speculative Weight Prediction for Accelerating Neural Network Training
- URL: http://arxiv.org/abs/2602.19580v1
- Date: Mon, 23 Feb 2026 08:01:44 GMT
- Title: Leap+Verify: Regime-Adaptive Speculative Weight Prediction for Accelerating Neural Network Training
- Authors: Jeremy McEntire
- Abstract summary: We introduce Leap+Verify to accelerate neural network training. It applies speculative execution -- predicting future model weights and validating predictions before acceptance. We evaluate Leap+Verify on GPT-2 124M and Qwen 2.5-1.5B trained on WikiText-103 across five random seeds.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce Leap+Verify, a framework that applies speculative execution -- predicting future model weights and validating predictions before acceptance -- to accelerate neural network training. Inspired by speculative decoding in language model inference and by the Automatically Scalable Computation (ASC) architecture for program execution, Leap+Verify decomposes training into three dynamically detected regimes (chaotic, transition, stable) using activation-space cosine similarity as a real-time Lyapunov proxy signal. Within each regime, analytic weight predictors (momentum, linear, quadratic extrapolation) attempt to forecast model parameters K training steps ahead; predictions are accepted only when validated against a held-out loss criterion. We evaluate Leap+Verify on GPT-2 124M and Qwen 2.5-1.5B trained on WikiText-103 across five random seeds, sweeping prediction depth K ∈ {5, 10, 25, 50, 75, 100}. Momentum-based prediction (Adam moment extrapolation) fails catastrophically at both scales, with predicted losses exceeding actuals by 100-10,000× -- a universal norm explosion in optimizer-state extrapolation. Finite-difference predictors (linear, quadratic) succeed where momentum fails: at 124M, they achieve 24% strict acceptance at K=5 in stable regimes; at 1.5B, they achieve 37% strict acceptance in transition regimes. The scale-dependent finding is in regime distribution: GPT-2 124M spends 34% of training in the stable regime, while Qwen 1.5B spends 64% in the chaotic regime and reaches the stable regime in only 0-2 of 40 checkpoints. Larger models are more predictable when predictable, but less often predictable -- the practical bottleneck shifts from predictor accuracy to regime availability. Cross-seed results are highly consistent (less than 1% validation loss variance), and the three-regime framework produces identical phase boundaries (±50 steps) across seeds.
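The abstract's leap/verify loop can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the regime thresholds, and the loss-tolerance acceptance criterion are all assumptions; only the overall shape (regime detection from an activation-similarity signal, finite-difference weight extrapolation K steps ahead, acceptance gated on a held-out loss) comes from the abstract.

```python
# Hypothetical sketch of the Leap+Verify loop described in the abstract.
# All names and threshold values here are illustrative assumptions.
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two flattened activation vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def detect_regime(sim, chaotic_thresh=0.5, stable_thresh=0.9):
    """Map the activation-similarity signal (the paper's Lyapunov proxy)
    to one of three regimes. Threshold values are assumed."""
    if sim < chaotic_thresh:
        return "chaotic"
    if sim < stable_thresh:
        return "transition"
    return "stable"

def linear_extrapolate(w_hist, K):
    """Forecast weights K steps ahead from the last two checkpoints via a
    first-order finite difference: w(t+K) ≈ w(t) + K * (w(t) - w(t-1))."""
    return w_hist[-1] + K * (w_hist[-1] - w_hist[-2])

def quadratic_extrapolate(w_hist, K):
    """Second-order (Newton backward-difference) extrapolation from the
    last three checkpoints, assumed to be one step apart."""
    w0, w1, w2 = w_hist[-3], w_hist[-2], w_hist[-1]
    d1 = w2 - w1                # first backward difference
    d2 = (w2 - w1) - (w1 - w0)  # second backward difference
    return w2 + K * d1 + 0.5 * K * (K + 1) * d2

def leap_verify(w_hist, K, held_out_loss, loss_tol=1.0):
    """Accept a predicted leap only if its held-out loss does not exceed
    the current loss by more than loss_tol (an assumed criterion)."""
    w_pred = linear_extrapolate(w_hist, K)
    if held_out_loss(w_pred) <= held_out_loss(w_hist[-1]) + loss_tol:
        return w_pred, True   # leap accepted: skip K optimizer steps
    return w_hist[-1], False  # rejected: fall back to normal training
```

On weights evolving exactly linearly or quadratically in the step index, these extrapolators are exact, which matches the intuition that finite-difference predictors work best in the smooth "stable" regime the paper identifies.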
Related papers
- d-TreeRPO: Towards More Reliable Policy Optimization for Diffusion Language Models [45.27333046908981]
d-TreeRPO is a reliable reinforcement learning framework for dLLMs. We show that d-TreeRPO achieves significant gains on multiple reasoning benchmarks.
arXiv Detail & Related papers (2025-12-10T14:20:07Z) - LYNX: Learning Dynamic Exits for Confidence-Controlled Reasoning [15.597220136913258]
LYNX is an online early-exit mechanism that turns a model's own hidden-state awareness into confidence-controlled stopping decisions. We train and calibrate this probe once on a generic mathematical corpus and reuse it unchanged across benchmarks, decoding temperatures, and even non-mathematical tasks.
arXiv Detail & Related papers (2025-12-05T00:04:42Z) - Cost-Sensitive Conformal Training with Provably Controllable Learning Bounds [21.86960662161151]
Conformal prediction is a framework to quantify the predictive uncertainty of machine learning models. To align the uncertainty measured by CP, conformal training methods minimize the size of the prediction sets. We propose a simple cost-sensitive conformal training algorithm that does not rely on the indicator approximation mechanism.
arXiv Detail & Related papers (2025-11-22T01:11:44Z) - Bridging the Gap Between Bayesian Deep Learning and Ensemble Weather Forecasts [100.26854618129039]
Weather forecasting is fundamentally challenged by the chaotic nature of the atmosphere. Recent advances in Bayesian Deep Learning (BDL) offer a promising but often disconnected alternative. We bridge these paradigms through a unified hybrid BDL framework for ensemble weather forecasting.
arXiv Detail & Related papers (2025-11-18T07:49:52Z) - Re-FORC: Adaptive Reward Prediction for Efficient Chain-of-Thought Reasoning [85.76121000710522]
We propose Re-FORC, an adaptive reward prediction method. It enables prediction of the expected future rewards as a function of the number of future thinking tokens.
arXiv Detail & Related papers (2025-11-03T23:47:49Z) - One Sample is Enough to Make Conformal Prediction Robust [53.78604391939934]
We show that conformal prediction attains some robustness even with a forward pass on a single randomly perturbed input. Our approach returns robust sets with smaller average set size compared to SOTA methods which use many (e.g. around 100) passes per input.
arXiv Detail & Related papers (2025-06-19T19:14:25Z) - Mamba time series forecasting with uncertainty quantification [0.0]
In electricity consumption benchmarks, Mamba forecasts exhibit a mean error of approximately 8%. In traffic occupancy benchmarks, the mean error reaches 18%.
arXiv Detail & Related papers (2025-03-13T20:39:38Z) - Movement-Prediction-Adjusted Naive Forecast [6.935130578959931]
The movement-prediction-adjusted naive forecast (MPANF) is designed to improve point forecasts beyond the naive baseline. MPANF can serve as an effective second-stage method when reliable movement predictions are available.
arXiv Detail & Related papers (2024-06-20T16:32:18Z) - Regression Trees for Fast and Adaptive Prediction Intervals [2.6763498831034043]
We present a family of methods to calibrate prediction intervals for regression problems with local coverage guarantees.
We create a partition by training regression trees and Random Forests on conformity scores.
Our proposal is versatile, as it applies to various conformity scores and prediction settings.
arXiv Detail & Related papers (2024-02-12T01:17:09Z) - Learning Sample Difficulty from Pre-trained Models for Reliable Prediction [55.77136037458667]
We propose to utilize large-scale pre-trained models to guide downstream model training with sample difficulty-aware entropy regularization.
We simultaneously improve accuracy and uncertainty calibration across challenging benchmarks.
arXiv Detail & Related papers (2023-04-20T07:29:23Z) - Uncertainty estimation of pedestrian future trajectory using Bayesian approximation [137.00426219455116]
Under dynamic traffic scenarios, planning based on deterministic predictions is not trustworthy.
The authors propose to quantify forecasting uncertainty using Bayesian approximation, capturing effects that deterministic approaches fail to capture.
The effect of dropout weights and long-term prediction on future state uncertainty has been studied.
arXiv Detail & Related papers (2022-05-04T04:23:38Z) - Dense Uncertainty Estimation [62.23555922631451]
In this paper, we investigate neural networks and uncertainty estimation techniques to achieve both accurate deterministic prediction and reliable uncertainty estimation.
We work on two types of uncertainty estimation solutions, namely ensemble-based methods and generative-model-based methods, and explain their pros and cons when using them in fully-, semi-, and weakly-supervised frameworks.
arXiv Detail & Related papers (2021-10-13T01:23:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.