Reinforcement Learning-assisted Constraint Relaxation for Constrained Expensive Optimization
- URL: http://arxiv.org/abs/2602.00532v1
- Date: Sat, 31 Jan 2026 05:52:36 GMT
- Title: Reinforcement Learning-assisted Constraint Relaxation for Constrained Expensive Optimization
- Authors: Qianhao Zhu, Sijie Ma, Zeyuan Ma, Hongshu Guo, Yue-Jiao Gong
- Abstract summary: We propose learning an effective, adaptive, and generalizable constraint-handling policy through reinforcement learning. Specifically, a tailored Markov Decision Process is first formulated in which, given optimization-dynamics features, a deep Q-network-based policy controls the constraint relaxation level. Such adaptive constraint handling provides a flexible tradeoff between objective-oriented exploitation and feasible-region-oriented exploration.
- Score: 14.12072551134237
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Constraint handling plays a key role in solving realistic, complex optimization problems. Though intensively discussed over the last few decades, existing constraint handling techniques predominantly rely on human experts' designs, which often fall short in general cases. Motivated by recent progress in Meta-Black-Box Optimization, where automated algorithm design can be learned to boost optimization performance, in this paper we propose learning an effective, adaptive, and generalizable constraint-handling policy through reinforcement learning. Specifically, a tailored Markov Decision Process is first formulated in which, given optimization-dynamics features, a deep Q-network-based policy controls the constraint relaxation level along the underlying optimization process. Such adaptive constraint handling provides a flexible tradeoff between objective-oriented exploitation and feasible-region-oriented exploration, and hence leads to promising optimization performance. We train our approach on the CEC 2017 Constrained Optimization benchmark under a limited evaluation budget (expensive cases) and compare the trained constraint-handling policy against strong baselines, such as recent winners of CEC/GECCO competitions. Extensive experimental results show that our approach performs competitively with, or even surpasses, the compared baselines under both leave-one-out cross-validation and an ordinary train-test split. Further analysis and ablation studies reveal key insights into our designs.
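To make the mechanism concrete, here is a minimal sketch of the idea under stated assumptions: a deep Q-network maps optimization-dynamics features to a discrete relaxation level, which then drives an epsilon-style feasibility comparison inside the optimizer. The action set, feature list, and network shape are illustrative, not taken from the paper.

```python
# Hypothetical sketch of DQN-controlled constraint relaxation; the discrete
# action set, features, and comparison rule are assumptions for illustration.
import torch
import torch.nn as nn

RELAXATION_LEVELS = [0.0, 0.01, 0.1, 1.0]  # assumed discrete relaxation levels

class RelaxationDQN(nn.Module):
    def __init__(self, n_features: int, n_actions: int = len(RELAXATION_LEVELS)):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.net(features)  # one Q-value per relaxation level

def epsilon_better(f_a, v_a, f_b, v_b, eps):
    """Epsilon-constrained comparison: violations <= eps count as feasible."""
    feas_a, feas_b = v_a <= eps, v_b <= eps
    if feas_a and feas_b:
        return f_a < f_b   # both within the relaxed region: compare objectives
    if feas_a != feas_b:
        return feas_a      # feasible (under relaxation) beats infeasible
    return v_a < v_b       # both outside: smaller violation wins

# Per generation: featurize the optimization dynamics, pick eps greedily,
# then rank offspring with the relaxed comparison above.
policy = RelaxationDQN(n_features=8)
features = torch.zeros(8)  # e.g. feasible ratio, stagnation, budget consumed
eps = RELAXATION_LEVELS[policy(features).argmax().item()]
```

In a full method, the Q-network would be trained with standard Q-learning on rewards reflecting optimization progress; this sketch shows only the inference-time control loop.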
Related papers
- Stabilizing Policy Gradients for Sample-Efficient Reinforcement Learning in LLM Reasoning [77.92320830700797]
Reinforcement Learning has played a central role in enabling the reasoning capabilities of Large Language Models. We propose a tractable computational framework that tracks and leverages curvature information during policy updates. The algorithm, Curvature-Aware Policy Optimization (CAPO), identifies samples that contribute to unstable updates and masks them out.
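The summary does not say how CAPO measures curvature; purely as a hedged stand-in, the sketch below masks samples whose per-sample gradient norms are outliers before aggregating the loss. The proxy and the threshold are assumptions, not the paper's criterion.

```python
# Illustration only: drop samples that look unstable, using per-sample
# gradient norms as a crude curvature proxy (an assumption; CAPO's actual
# statistic is not described in this summary).
import torch

def masked_loss(per_sample_losses: torch.Tensor, params, q: float = 0.95):
    norms = []
    for loss in per_sample_losses:  # losses must share a graph with params
        grads = torch.autograd.grad(loss, params, retain_graph=True)
        norms.append(torch.sqrt(sum(g.pow(2).sum() for g in grads)))
    norms = torch.stack(norms)
    mask = (norms <= torch.quantile(norms, q)).float()  # mask out the worst
    return (per_sample_losses * mask).sum() / mask.sum().clamp(min=1.0)
```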
arXiv Detail & Related papers (2025-10-01T12:29:32Z)
- Incentivizing Safer Actions in Policy Optimization for Constrained Reinforcement Learning [9.62939764063531]
Constrained Reinforcement Learning aims to maximize the return while adhering to predefined constraint limits. In continuous control settings, balancing the trade-off between reward and constraint satisfaction remains a significant challenge. We introduce a novel approach that augments the reward structure with an adaptive incentive mechanism to keep the policy within the constraint bound.
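The incentive mechanism itself is not detailed in this summary; a common pattern in constrained RL, sketched here only as an assumption, is a Lagrangian-style multiplier that penalizes cost and adapts toward a predefined limit.

```python
# Generic constrained-RL incentive pattern (not the paper's exact mechanism):
# a multiplier shapes the reward and grows or shrinks with observed cost.
class AdaptiveIncentive:
    def __init__(self, limit: float, lr: float = 1e-3):
        self.limit = limit  # allowed expected cost per episode
        self.lr = lr
        self.lam = 0.0      # Lagrangian-style multiplier

    def shaped_reward(self, reward: float, cost: float) -> float:
        return reward - self.lam * cost

    def update(self, mean_episode_cost: float) -> None:
        # Increase pressure when over the limit, relax when safely under it.
        self.lam = max(0.0, self.lam + self.lr * (mean_episode_cost - self.limit))
```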
arXiv Detail & Related papers (2025-09-11T07:33:35Z)
- TCPO: Thought-Centric Preference Optimization for Effective Embodied Decision-making [75.29820290660065]
This paper proposes Thought-Centric Preference Optimization (TCPO) for effective embodied decision-making. It emphasizes the alignment of the model's intermediate reasoning process, mitigating the problem of model degradation. Experiments in the ALFWorld environment demonstrate an average success rate of 26.67%, achieving a 6% improvement over RL4VLM.
arXiv Detail & Related papers (2025-09-10T11:16:21Z)
- Off-Policy Learning in Large Action Spaces: Optimization Matters More Than Estimation [6.001574550157585]
Off-policy evaluation (OPE) and off-policy learning (OPL) are foundational for decision-making in offline contextual bandits. Recent advances in OPL primarily optimize OPE estimators with improved statistical properties. We argue this estimator-centric approach neglects a critical practical obstacle: challenging optimization landscapes.
arXiv Detail & Related papers (2025-09-03T16:25:45Z)
- LLM-guided Chemical Process Optimization with a Multi-Agent Approach [8.714038047141202]
We present a multi-agent LLM framework that autonomously infers operating constraints from minimal process descriptions. Our AutoGen-based framework employs OpenAI's o3 model with specialized agents for constraint generation, parameter validation, simulation, and optimization guidance.
arXiv Detail & Related papers (2025-06-26T01:03:44Z)
- Optimizing Anytime Reasoning via Budget Relative Policy Optimization [70.32755424260336]
We present a novel framework, AnytimeReasoner, to optimize anytime reasoning performance. We truncate the complete thinking process to fit within token budgets sampled from a prior distribution. We then optimize the thinking and summary policies in a decoupled manner to maximize the cumulative reward.
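A minimal sketch of the budget-sampling loop as described, assuming a discrete budget prior and caller-supplied generation functions (both hypothetical):

```python
# Sketch of anytime reasoning via sampled token budgets; the budget support
# and the generation callables are illustrative assumptions.
import random

BUDGETS = [128, 256, 512, 1024]  # assumed support of the budget prior

def anytime_rollout(generate_thinking, generate_summary, prompt: str):
    budget = random.choice(BUDGETS)             # budget ~ prior distribution
    thought_tokens = generate_thinking(prompt)  # full reasoning trace
    truncated = thought_tokens[:budget]         # fit the sampled budget
    summary = generate_summary(prompt, truncated)
    # Thinking and summary policies would be rewarded separately (decoupled).
    return truncated, summary
```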
arXiv Detail & Related papers (2025-05-19T17:58:44Z)
- Preference Optimization for Combinatorial Optimization Problems [54.87466279363487]
Reinforcement Learning (RL) has emerged as a powerful tool for neural combinatorial optimization, enabling models to learn to solve complex problems without requiring expert knowledge. Despite significant progress, existing RL approaches face challenges such as diminishing reward signals and inefficient exploration in vast action spaces. We propose Preference Optimization, a novel method that transforms quantitative reward signals into qualitative preference signals via statistical comparison modeling.
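The "statistical comparison modeling" is not spelled out here; one standard way to convert rewards into pairwise preferences, shown purely as an assumption, is a Bradley-Terry model over solution pairs.

```python
# Turn quantitative rewards into qualitative pairwise preference signals
# via a Bradley-Terry form (an assumed choice, not the paper's exact model).
import math
from itertools import combinations

def reward_to_preferences(rewards, temperature: float = 1.0):
    prefs = []
    for i, j in combinations(range(len(rewards)), 2):
        # Probability that solution i is preferred over solution j.
        p_ij = 1.0 / (1.0 + math.exp(-(rewards[i] - rewards[j]) / temperature))
        prefs.append((i, j, p_ij))
    return prefs

print(reward_to_preferences([1.0, 0.2, 0.7]))  # e.g. (0, 1, ~0.69), ...
```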
arXiv Detail & Related papers (2025-05-13T16:47:00Z)
- Advancing CMA-ES with Learning-Based Cooperative Coevolution for Scalable Optimization [12.899626317088885]
This paper introduces LCC, a pioneering learning-based cooperative coevolution framework. LCC dynamically schedules decomposition strategies during optimization processes. It offers certain advantages over state-of-the-art baselines in terms of optimization effectiveness and resource consumption.
arXiv Detail & Related papers (2025-04-24T14:09:22Z)
- Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer [52.09480867526656]
We identify the source of misalignment as a form of distributional shift and uncertainty in learning human preferences. To mitigate overoptimization, we first propose a theoretical algorithm that chooses the best policy for an adversarially chosen reward model. Using the equivalence between reward models and the corresponding optimal policy, the algorithm features a simple objective that combines a preference optimization loss and a supervised learning loss.
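An objective of that shape can be sketched directly; the DPO-style preference term and the SFT weighting below are assumptions about the specific losses, kept generic on purpose.

```python
# Combined objective: preference-optimization loss plus a supervised (SFT)
# loss acting as the regularizer; beta and lam are assumed hyperparameters.
import torch
import torch.nn.functional as F

def combined_loss(logp_chosen, logp_rejected,
                  ref_logp_chosen, ref_logp_rejected,
                  sft_logp_chosen,
                  beta: float = 0.1, lam: float = 1.0):
    # DPO-style preference loss on chosen vs. rejected log-probabilities.
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    pref_loss = -F.logsigmoid(margin).mean()
    # Supervised loss on chosen responses discourages drifting off-distribution.
    sft_loss = -sft_logp_chosen.mean()
    return pref_loss + lam * sft_loss
```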
arXiv Detail & Related papers (2024-05-26T05:38:50Z)
- Backpropagation of Unrolled Solvers with Folded Optimization [55.04219793298687]
The integration of constrained optimization models as components in deep networks has led to promising advances on many specialized learning tasks.
One typical strategy is algorithm unrolling, which relies on automatic differentiation through the operations of an iterative solver.
This paper provides theoretical insights into the backward pass of unrolled optimization, leading to a system for generating efficiently solvable analytical models of backpropagation.
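A minimal unrolling example (illustrative toy inner problem, not the paper's folded-optimization construction): run k differentiable gradient steps of an inner solver so the outer loss can backpropagate into the problem parameters.

```python
# Algorithm unrolling: autodiff through the iterations of an inner solver.
import torch

def unrolled_solver(theta: torch.Tensor, k: int = 20, lr: float = 0.1):
    x = torch.zeros_like(theta, requires_grad=True)
    for _ in range(k):
        # Toy inner objective: smoothed lasso-like fit to theta.
        inner = ((x - theta) ** 2).sum() + 0.01 * torch.sqrt(x ** 2 + 1e-8).sum()
        (g,) = torch.autograd.grad(inner, x, create_graph=True)
        x = x - lr * g  # create_graph keeps each update differentiable
    return x

theta = torch.randn(5, requires_grad=True)
x_star = unrolled_solver(theta)
outer = (x_star ** 2).sum()
outer.backward()     # gradients flow back through all k inner steps
print(theta.grad)
```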
arXiv Detail & Related papers (2023-01-28T01:50:42Z)
- Self-Supervised Primal-Dual Learning for Constrained Optimization [19.965556179096385]
This paper studies how to train machine-learning models that directly approximate the optimal solutions of constrained optimization problems.
It proposes the idea of Primal-Dual Learning (PDL), a self-supervised training method that does not require a set of pre-solved instances or an optimization solver for training and inference.
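A hedged sketch of the primal-dual idea: one network proposes solutions, another proposes multipliers, and an augmented-Lagrangian loss supervises training directly from raw instances, with no pre-solved labels. The network sizes, toy problem, and penalty weight are assumptions.

```python
# Self-supervised primal-dual sketch with an augmented-Lagrangian loss.
import torch
import torch.nn as nn

primal = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
dual = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 1), nn.Softplus())

def objective(x):            # toy objective: minimize ||x||^2
    return (x ** 2).sum(dim=1)

def violation(x, inst):      # toy constraint x1 + x2 >= b, written as g(x) <= 0
    return torch.relu(inst[:, 0] - x.sum(dim=1))

rho, opt = 10.0, torch.optim.Adam(primal.parameters(), lr=1e-3)
inst = torch.rand(32, 4)                  # raw instances, no solver labels
x_hat = primal(inst)
lam = dual(inst).squeeze(-1).detach()     # current multiplier estimates
g = violation(x_hat, inst)
loss = (objective(x_hat) + lam * g + 0.5 * rho * g ** 2).mean()
opt.zero_grad(); loss.backward(); opt.step()
# An alternating step (omitted) trains `dual` to track lam + rho * g,
# keeping the whole pipeline self-supervised.
```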
arXiv Detail & Related papers (2022-08-18T20:07:10Z)