Evolutionary Guided Decoding: Iterative Value Refinement for LLMs
- URL: http://arxiv.org/abs/2503.02368v3
- Date: Sat, 04 Oct 2025 08:17:16 GMT
- Title: Evolutionary Guided Decoding: Iterative Value Refinement for LLMs
- Authors: Zhenhua Liu, Lijun Li, Ruizhe Chen, Yuxian Jiang, Tong Zhu, Zhaochen Su, Wenliang Chen, Jing Shao,
- Abstract summary: Iterative Value Refinement is a novel framework designed to bridge this gap. It employs Value Exploration to provide a more comprehensive and robust training signal. Iterative Self-Refinement uses the improved value function from one iteration to guide the generation of higher-quality data.
- Score: 41.56764640311065
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While guided decoding, especially value-guided methods, has emerged as a cost-effective alternative for controlling language model outputs without re-training models, its effectiveness is limited by the accuracy of the value function. We identify that this inaccuracy stems from a core distributional gap: existing methods train static value functions on trajectories sampled exclusively from the base policy, which inherently confines their training to a narrow and suboptimal view of the potential output space. We propose Iterative Value Refinement, a novel framework designed to bridge this gap. It employs Value Exploration to provide a more comprehensive and robust training signal, complemented by Iterative Self-Refinement, which uses the improved value function from one iteration to guide the generation of higher-quality data for the next. Extensive experiments on text summarization, multi-turn dialogue, and instruction following demonstrate the effectiveness of our framework in aligning language models. Our approach not only achieves alignment but also significantly reduces computational costs by leveraging principled value function optimization for efficient and effective control.
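The core mechanism described in the abstract, value-guided decoding, can be illustrated with a minimal sketch: at each step, candidate next tokens from the base policy are rescored by adding a weighted value-function estimate of the extended prefix. Everything below (the toy vocabulary, the uniform base policy, `value_fn`, and the weight `beta`) is an illustrative assumption, not the paper's implementation:

```python
import math

# Toy stand-ins (assumptions for illustration): a base "policy" that scores
# next tokens and a learned value function over prefixes.
VOCAB = ["good", "bad", "ok", "</s>"]

def base_logprobs(prefix):
    # Uniform base policy, purely for illustration.
    return {tok: math.log(1.0 / len(VOCAB)) for tok in VOCAB}

def value_fn(prefix):
    # Hypothetical value function: prefers sequences containing "good".
    return 1.0 if "good" in prefix else 0.0

def value_guided_step(prefix, beta=2.0, top_k=4):
    """Rescore the base policy's top-k candidates with the value function:
    score(t) = log p_base(t | prefix) + beta * V(prefix + [t])."""
    logps = base_logprobs(prefix)
    candidates = sorted(logps, key=logps.get, reverse=True)[:top_k]
    return max(candidates, key=lambda t: logps[t] + beta * value_fn(prefix + [t]))

def decode(max_len=5, beta=2.0):
    prefix = []
    for _ in range(max_len):
        tok = value_guided_step(prefix, beta)
        prefix.append(tok)
        if tok == "</s>":
            break
    return prefix

print(decode())  # the value term steers decoding toward "good"
```

The paper's point is that if `value_fn` is trained only on trajectories from the base policy, it is unreliable exactly on the higher-value prefixes that guided decoding steers toward; the proposed framework iteratively retrains the value function on data generated under the current guided policy to close that gap.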
Related papers
- VISA: Value Injection via Shielded Adaptation for Personalized LLM Alignment [24.492954219955788]
We propose a closed-loop framework designed to navigate the trade-off between fine-tuning and aligning Large Language Models (LLMs). VISA features a high-precision value detector, a semantic-to-value translator, and a core value-rewriter. Our experiments demonstrate that this approach enables precise control over a model's value expression while maintaining its factual consistency and general capabilities.
arXiv Detail & Related papers (2026-03-05T05:12:26Z) - Cost-aware Stopping for Bayesian Optimization [53.34052774820105]
We propose a cost-aware stopping rule for Bayesian optimization that adapts to varying evaluation costs and requires no tuning. We prove a theoretical guarantee bounding the expected cumulative evaluation cost incurred by our stopping rule when paired with state-of-the-art acquisition functions.
arXiv Detail & Related papers (2025-07-16T17:54:14Z) - A Simple and Effective Reinforcement Learning Method for Text-to-Image Diffusion Fine-tuning [61.403275660120606]
Reinforcement learning (RL)-based fine-tuning has emerged as a powerful approach for aligning diffusion models with black-box objectives. We propose leave-one-out PPO (LOOP), a novel RL method for diffusion fine-tuning. Our results demonstrate that LOOP effectively improves diffusion models on various black-box objectives.
arXiv Detail & Related papers (2025-03-02T13:43:53Z) - Direct Value Optimization: Improving Chain-of-Thought Reasoning in LLMs with Refined Values [31.415598465903884]
Direct Value Optimization (DVO) is an innovative reinforcement learning framework for enhancing large language models in complex reasoning tasks. DVO utilizes value signals at individual reasoning steps, optimizing models via a mean squared error loss. Our empirical analysis on both mathematical and commonsense reasoning tasks shows that DVO consistently outperforms existing offline preference optimization techniques.
arXiv Detail & Related papers (2025-02-19T13:51:05Z) - Reward-Guided Speculative Decoding for Efficient LLM Reasoning [80.55186052123196]
We introduce Reward-Guided Speculative Decoding (RSD), a novel framework aimed at improving the efficiency of inference in large language models (LLMs). RSD incorporates a controlled bias to prioritize high-reward outputs, in contrast to existing speculative decoding methods that enforce strict unbiasedness. RSD delivers significant efficiency gains over decoding with the target model only, while achieving significantly better accuracy than parallel decoding methods on average.
arXiv Detail & Related papers (2025-01-31T17:19:57Z) - Efficient Estimation and Sequential Optimization of Cost Functions in Variational Quantum Algorithms [1.4981317129908267]
We introduce a novel optimization methodology that conceptualizes the parameterized quantum circuit as a weighted sum of distinct unitary operators. This representation facilitates the efficient evaluation of nonlocal characteristics of cost functions, as well as their arbitrary derivatives. Our findings reveal substantial enhancements in convergence speed and accuracy relative to traditional optimization methods.
arXiv Detail & Related papers (2024-12-30T14:24:53Z) - Direct Preference Optimization Using Sparse Feature-Level Constraints [47.15096507230884]
Feature-level constrained Preference Optimization is a novel method designed to simplify the alignment process while ensuring stability.
Our approach gains efficiency by using sparse features activated in a well-trained sparse autoencoder, and preserves quality through sequential KL-divergence constraints.
arXiv Detail & Related papers (2024-11-12T07:54:13Z) - Learn from the Learnt: Source-Free Active Domain Adaptation via Contrastive Sampling and Visual Persistence [60.37934652213881]
Domain Adaptation (DA) facilitates knowledge transfer from a source domain to a related target domain.
This paper investigates a practical DA paradigm, namely Source data-Free Active Domain Adaptation (SFADA), where source data becomes inaccessible during adaptation.
We present learn from the learnt (LFTL), a novel paradigm for SFADA to leverage the learnt knowledge from the source pretrained model and actively iterated models without extra overhead.
arXiv Detail & Related papers (2024-07-26T17:51:58Z) - Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization [77.62516752323207]
We introduce an orthogonal fine-tuning method for efficiently fine-tuning pretrained weights and enabling enhanced robustness and generalization.
A self-regularization strategy is further exploited to maintain the stability in terms of zero-shot generalization of VLMs, dubbed OrthSR.
For the first time, we revisit CLIP and CoOp with our method to effectively improve the models in few-shot image classification scenarios.
arXiv Detail & Related papers (2024-07-11T10:35:53Z) - Boosting Vision-Language Models with Transduction [12.281505126587048]
We present TransCLIP, a novel and computationally efficient transductive approach for vision-language models.
TransCLIP is applicable as a plug-and-play module on top of popular inductive zero- and few-shot models.
arXiv Detail & Related papers (2024-06-03T23:09:30Z) - Efficient Off-Policy Learning for High-Dimensional Action Spaces [22.129001951441015]
Existing off-policy reinforcement learning algorithms often rely on an explicit state-action-value function representation. We present an efficient approach that utilizes only a state-value function as the critic for off-policy deep reinforcement learning.
arXiv Detail & Related papers (2024-03-07T12:45:51Z) - An Experimental Design Framework for Label-Efficient Supervised Finetuning of Large Language Models [55.01592097059969]
Supervised finetuning on instruction datasets has played a crucial role in achieving the remarkable zero-shot generalization capabilities of large language models.
Active learning is effective in identifying useful subsets of samples to annotate from an unlabeled pool.
We propose using experimental design to circumvent the computational bottlenecks of active learning.
arXiv Detail & Related papers (2024-01-12T16:56:54Z) - Learning to Rank for Active Learning via Multi-Task Bilevel Optimization [29.207101107965563]
We propose a novel approach for active learning, which aims to select batches of unlabeled instances through a learned surrogate model for data acquisition.
A key challenge in this approach is developing an acquisition function that generalizes well, as the history of data, which forms part of the utility function's input, grows over time.
arXiv Detail & Related papers (2023-10-25T22:50:09Z) - Landscape-Sketch-Step: An AI/ML-Based Metaheuristic for Surrogate Optimization Problems [0.0]
We introduce a new metaheuristic for global optimization in scenarios where extensive evaluations of the cost function are expensive, inaccessible, or even prohibitive.
The method, which we call Landscape-Sketch-and-Step (LSS), combines Machine Learning, Replica Optimization, and Reinforcement Learning techniques.
arXiv Detail & Related papers (2023-09-14T01:53:45Z) - Value function estimation using conditional diffusion models for control [62.27184818047923]
We propose a simple algorithm called Diffused Value Function (DVF).
It learns a joint multi-step model of the environment-robot interaction dynamics using a diffusion model.
We show how DVF can be used to efficiently capture the state visitation measure for multiple controllers.
arXiv Detail & Related papers (2023-06-09T18:40:55Z) - Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes [80.89852729380425]
We propose the first computationally efficient algorithm that achieves the nearly minimax optimal regret $\tilde{O}(d\sqrt{H^3 K})$.
Our work provides a complete answer to optimal RL with linear MDPs, and the developed algorithm and theoretical tools may be of independent interest.
arXiv Detail & Related papers (2022-12-12T18:58:59Z) - Neural Solvers for Fast and Accurate Numerical Optimal Control [12.80824586913772]
This paper provides techniques to improve the quality of optimized control policies given a fixed computational budget.
We achieve the above via a hypersolvers approach, which hybridizes a differential equation solver and a neural network.
arXiv Detail & Related papers (2022-03-13T10:46:50Z) - Implicit Rate-Constrained Optimization of Non-decomposable Objectives [37.43791617018009]
We consider a family of constrained optimization problems arising in machine learning.
Our key idea is to formulate a rate-constrained optimization that expresses the threshold parameter as a function of the model parameters.
We show how the resulting optimization problem can be solved using standard gradient based methods.
arXiv Detail & Related papers (2021-07-23T00:04:39Z) - Variance-Aware Off-Policy Evaluation with Linear Function Approximation [85.75516599931632]
We study the off-policy evaluation problem in reinforcement learning with linear function approximation.
We propose an algorithm, VA-OPE, which uses the estimated variance of the value function to reweight the Bellman residual in Fitted Q-Iteration.
arXiv Detail & Related papers (2021-06-22T17:58:46Z) - Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.
The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
arXiv Detail & Related papers (2020-10-21T17:14:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.