Amortized Latent Steering: Low-Cost Alternative to Test-Time Optimization
- URL: http://arxiv.org/abs/2509.18116v2
- Date: Fri, 07 Nov 2025 13:28:17 GMT
- Title: Amortized Latent Steering: Low-Cost Alternative to Test-Time Optimization
- Authors: Nathan Egbuna, Saatvik Gaur, Sunishchal Dev, Ashwinee Panda, Maheep Chaudhary
- Abstract summary: Amortized Latent Steering (ALS) collapses iterative optimization into a single offline-computed vector. ALS achieves $2-5\times$ speedup over iterative methods while matching or surpassing greedy Chain-of-Thought (CoT) and Self-Consistency baselines. Results show that much of latent optimization's benefit can be captured offline, making sophisticated reasoning techniques viable for production deployment.
- Score: 3.9311957222075935
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Test-time optimization remains impractical at scale due to prohibitive inference costs--techniques like iterative refinement and multi-step verification can require $10-100\times$ more compute per query than standard decoding. Latent space test-time optimization methods like LatentSeek offer a more direct approach by steering hidden representations, but still demand expensive per-query optimization loops with multiple backward passes. We propose Amortized Latent Steering (ALS), which collapses this iterative optimization into a single offline-computed vector applied at constant cost during inference. ALS computes the mean difference between hidden states from successful versus unsuccessful generations, then uses this direction to calibrate the model's hidden representations: when decoding drifts away from the success manifold, ALS nudges activations back toward it. Across GSM8K and MATH-500 benchmarks, ALS achieves $2-5\times$ speedup over iterative methods while matching or surpassing greedy Chain-of-Thought (CoT) and Self-Consistency baselines, yielding up to 101% improvement in efficiency--accuracy trade-off. These results show that much of latent optimization's benefit can be captured offline, making sophisticated reasoning techniques viable for production deployment. Code is available at https://github.com/negbuna/ALS.
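The abstract describes a two-step recipe: offline, take the mean difference between hidden states of successful and unsuccessful generations; online, add that direction to the model's activations at constant per-token cost. A minimal numpy sketch of this idea follows; the shapes (rows as per-generation mean hidden states), the unit normalization, and the steering strength `alpha` are illustrative assumptions, not details confirmed by the abstract.

```python
import numpy as np

def compute_steering_vector(success_states, failure_states):
    """Offline step: mean hidden state of successful generations
    minus mean hidden state of unsuccessful ones, as a unit direction."""
    delta = success_states.mean(axis=0) - failure_states.mean(axis=0)
    return delta / np.linalg.norm(delta)

def steer(hidden, direction, alpha=0.1):
    """Inference step: nudge the current hidden state back toward the
    success manifold along the precomputed direction (constant cost)."""
    return hidden + alpha * direction
```

In a real decoder this nudge would be applied to the hidden state at a chosen layer on each decoding step, with `alpha` tuned on a validation set.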
Related papers
- $\nabla$-Reasoner: LLM Reasoning via Test-Time Gradient Descent in Latent Space [71.23672814629448]
$\nabla$-Reasoner is an iterative generation framework that integrates differentiable optimization over token logits into the decoding loop. $\nabla$-Reasoner achieves over 20% accuracy improvement on a challenging mathematical reasoning benchmark.
arXiv Detail & Related papers (2026-03-05T08:42:54Z) - ZIP-RC: Optimizing Test-Time Compute via Zero-Overhead Joint Reward-Cost Prediction [57.799425838564]
We present ZIP-RC, an adaptive inference method that equips models with zero-overhead inference-time predictions of reward and cost. ZIP-RC improves accuracy by up to 12% over majority voting at equal or lower average cost.
arXiv Detail & Related papers (2025-12-01T09:44:31Z) - $\texttt{SPECS}$: Faster Test-Time Scaling through Speculative Drafts [55.231201692232894]
$\texttt{SPECS}$ is a latency-aware test-time scaling method inspired by speculative decoding. Our results show that $\texttt{SPECS}$ matches or surpasses beam search accuracy while reducing latency by up to $\sim$19.1%.
arXiv Detail & Related papers (2025-06-15T05:50:05Z) - Stacey: Promoting Stochastic Steepest Descent via Accelerated $\ell_p$-Smooth Nonconvex Optimization [15.179519413549086]
We introduce a new accelerated $\ell_p$ steepest descent algorithm, called Stacey, to handle non-Euclidean smooth optimization tasks. In addition to providing theoretical guarantees for the foundations of our algorithm, we empirically compare our approach against popular methods.
arXiv Detail & Related papers (2025-06-07T00:47:07Z) - Thinking Before Running! Efficient Code Generation with Thorough Exploration and Optimal Refinement [47.89758553708932]
We introduce ThinkCoder, a framework that combines thorough exploration with optimal refinement. The exploration phase diversifies the solution space by searching for potential solutions, followed by a refinement phase that enhances precision. To further minimize test-time computation overhead, we introduce preference-driven optimization with Reinforced Self-Training (ReST).
arXiv Detail & Related papers (2024-12-30T07:02:15Z) - Online Mirror Descent for Tchebycheff Scalarization in Multi-Objective Optimization [14.970965673760427]
We propose an online mirror descent algorithm for Tchebycheff scalarization, which we call OMD-TCH.
We show the effectiveness of OMD-TCH on both synthetic problems and federated learning tasks under fairness constraints.
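For reference, the weighted Tchebycheff scalarization that OMD-TCH targets is the standard min-max formulation (the weights $w_i$ and ideal point $z^*$ below are the usual textbook ingredients, not details taken from this summary):

$$\min_{x} \; \max_{1 \le i \le m} \; w_i \left( f_i(x) - z_i^{*} \right),$$

where $f_1, \dots, f_m$ are the objectives, $w_i \ge 0$ are preference weights, and $z^{*}$ is an ideal reference point.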
arXiv Detail & Related papers (2024-10-29T05:58:33Z) - Correcting the Mythos of KL-Regularization: Direct Alignment without Overoptimization via Chi-Squared Preference Optimization [78.82586283794886]
$\chi^2$-Preference Optimization ($\chi$PO) is an efficient offline alignment algorithm provably robust to overoptimization. $\chi$PO implements the principle of pessimism in the face of uncertainty via regularization. $\chi$PO's simplicity and strong guarantees make it the first practical and general-purpose offline alignment algorithm provably robust to overoptimization.
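For context, the $\chi^2$-divergence underlying this method is the standard one (this definition is general background, not drawn from the summary itself):

$$\chi^2(p \,\|\, q) \;=\; \mathbb{E}_{q}\!\left[ \left( \frac{p}{q} - 1 \right)^{2} \right],$$

which penalizes large density ratios $p/q$ far more heavily than the KL divergence does, giving the pessimistic regularization the summary refers to.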
arXiv Detail & Related papers (2024-07-18T11:08:40Z) - Sparsity-Constraint Optimization via Splicing Iteration [1.3622424109977902]
We develop an algorithm named Sparsity-Constraint Optimization via sPlicing itEration (SCOPE).
SCOPE converges effectively without tuning parameters.
We apply SCOPE to solve quadratic optimization, learn sparse classifiers, and recover sparse Markov networks for binary variables.
Our open-source Python package skscope based on C++ implementation is publicly available on GitHub.
arXiv Detail & Related papers (2024-06-17T18:34:51Z) - Non-stationary Delayed Online Convex Optimization: From Full-information to Bandit Setting [71.82716109461967]
We propose an algorithm called Mild-OGD for the full-information case, where delayed gradients are available. We show that the dynamic regret of Mild-OGD can be automatically bounded by $O(\sqrt{\bar{d}T(P_T+1)})$ under the in-order assumption. We also develop a bandit variant of Mild-OGD for a more challenging case with only delayed loss values.
arXiv Detail & Related papers (2023-05-20T07:54:07Z) - Optimal Stochastic Non-smooth Non-convex Optimization through Online-to-Non-convex Conversion [46.46038357597395]
We present new algorithms for optimizing non-convex, non-smooth objectives based on a novel analysis technique. For deterministic second-order smooth objectives, applying advanced optimistic online learning techniques enables a new $O(\delta^{0.5})$ rate to recover optimal or best-known results.
arXiv Detail & Related papers (2023-02-07T22:09:20Z) - Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning [52.76230802067506]
A novel model-free algorithm is proposed to minimize regret in episodic reinforcement learning.
The proposed algorithm employs an early-settled reference update rule, with the aid of two Q-learning sequences.
The design principle of our early-settled variance reduction method might be of independent interest to other RL settings.
arXiv Detail & Related papers (2021-10-09T21:13:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.