Automatic Constraint Policy Optimization based on Continuous Constraint Interpolation Framework for Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2601.23010v1
- Date: Fri, 30 Jan 2026 14:21:41 GMT
- Title: Automatic Constraint Policy Optimization based on Continuous Constraint Interpolation Framework for Offline Reinforcement Learning
- Authors: Xinchen Han, Qiuyang Fang, Hossam Afifi, Michel Marot
- Abstract summary: Offline Reinforcement Learning (RL) relies on policy constraints to shape performance. Most existing methods commit to a single constraint family. We propose Continuous Constraint Interpolation (CCI), a unified optimization framework.
- Score: 2.0719232729184145
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline Reinforcement Learning (RL) relies on policy constraints to mitigate extrapolation error, where both the constraint form and constraint strength critically shape performance. However, most existing methods commit to a single constraint family: weighted behavior cloning, density regularization, or support constraints, without a unified principle that explains their connections or trade-offs. In this work, we propose Continuous Constraint Interpolation (CCI), a unified optimization framework in which these three constraint families arise as special cases along a common constraint spectrum. The CCI framework introduces a single interpolation parameter that enables smooth transitions and principled combinations across constraint types. Building on CCI, we develop Automatic Constraint Policy Optimization (ACPO), a practical primal--dual algorithm that adapts the interpolation parameter via a Lagrangian dual update. Moreover, we establish a maximum-entropy performance difference lemma and derive performance lower bounds for both the closed-form optimal policy and its parametric projection. Experiments on D4RL and NeoRL2 demonstrate robust gains across diverse domains, achieving state-of-the-art performance overall.
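The abstract describes ACPO as a primal-dual loop: one interpolation parameter blends constraint families, and a Lagrangian dual update adapts constraint strength. The paper's actual objective is not reproduced here; what follows is a minimal, hypothetical numpy sketch of that general pattern, where the two penalties, the toy reward, and the alpha-adaptation rule are all illustrative stand-ins rather than the paper's formulas.

```python
import numpy as np

# Hypothetical primal-dual sketch in the spirit of ACPO: an interpolation
# parameter alpha in [0, 1] blends two constraint penalties (stand-ins for
# behavior-cloning-style vs. support-style terms), and a Lagrangian
# multiplier is updated from the measured constraint violation.
rng = np.random.default_rng(0)
theta = rng.normal(size=4)   # toy policy parameters
lam, alpha = 1.0, 0.5        # dual multiplier and interpolation parameter
target = 0.1                 # toy constraint budget

def bc_penalty(theta):       # stand-in for a behavior-cloning-style constraint
    return float(np.sum(theta ** 2))

def support_penalty(theta):  # stand-in for a support-style constraint
    return float(np.sum(np.maximum(np.abs(theta) - 1.0, 0.0)))

for step in range(200):
    # Primal step: ascend a toy reward sum(theta) (gradient = ones) while
    # descending the interpolated constraint penalty.
    grad = np.ones_like(theta) - lam * (
        alpha * 2 * theta
        + (1 - alpha) * np.sign(theta) * (np.abs(theta) > 1.0)
    )
    theta += 1e-2 * grad

    # Dual step: projected ascent grows lam when the blended constraint
    # exceeds its budget.
    violation = (alpha * bc_penalty(theta)
                 + (1 - alpha) * support_penalty(theta) - target)
    lam = max(0.0, lam + 1e-2 * violation)

    # Adapt the interpolation parameter (illustrative rule, not the paper's).
    alpha = float(np.clip(
        alpha + 1e-3 * (bc_penalty(theta) - support_penalty(theta)), 0.0, 1.0))
```

The moving parts worth noting are the projected dual ascent on the multiplier and the clipped update of the interpolation parameter; everything else is placeholder.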
Related papers
- Decoupling Constraint from Two Directions in Evolutionary Constrained Multi-objective Optimization [26.967831462095067]
We propose a novel algorithm named Decoupling Constraint from Two Directions (DCF2D). It periodically detects constraint couplings and spawns an auxiliary population for each relevant constraint with an appropriate search direction. Experiments on seven challenging CMOP benchmark suites and on a collection of real-world CMOPs demonstrate that DCF2D outperforms five state-of-the-art CMOEAs.
arXiv Detail & Related papers (2025-12-30T02:22:32Z)
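A loose, hypothetical sketch of the auxiliary-population mechanism summarized in the DCF2D entry above: one auxiliary population per constraint, each evolved along its own search direction, with periodic migration into the main population. The bi-objective problem, mutation operator, selection keys, and migration schedule are invented for illustration, and the coupling-detection step is not modeled.

```python
import numpy as np

rng = np.random.default_rng(4)

def objectives(x):   # toy bi-objective over [0, 1]^2
    return np.stack([x[:, 0], 1 - np.sqrt(np.clip(x[:, 0], 0, 1))], axis=1)

def violations(x):   # two toy constraints, one violation column each
    return np.stack([np.maximum(0.3 - x[:, 1], 0),
                     np.maximum(x[:, 0] - 0.8, 0)], axis=1)

main = rng.uniform(0, 1, size=(40, 2))
aux = [rng.uniform(0, 1, size=(20, 2)) for _ in range(2)]  # one per constraint

for gen in range(100):
    # Evolve each auxiliary population along its own constraint's direction.
    for k in range(2):
        children = np.clip(aux[k] + rng.normal(0, 0.05, aux[k].shape), 0, 1)
        pool = np.vstack([aux[k], children])
        aux[k] = pool[np.argsort(violations(pool)[:, k])[:20]]

    # Evolve the main population by total violation, tie-broken by objectives,
    # with auxiliary individuals migrating in every 10 generations.
    children = np.clip(main + rng.normal(0, 0.05, main.shape), 0, 1)
    pool = np.vstack([main, children] + (aux if gen % 10 == 0 else []))
    keys = violations(pool).sum(axis=1) + 1e-3 * objectives(pool).sum(axis=1)
    main = pool[np.argsort(keys)[:40]]
```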
- Adaptive Neighborhood-Constrained Q Learning for Offline Reinforcement Learning [52.03884701766989]
Offline reinforcement learning (RL) algorithms typically impose constraints on action selection. We propose a new neighborhood constraint that restricts action selection in the Bellman target to the union of neighborhoods of dataset actions. We develop a simple yet effective algorithm, Adaptive Neighborhood-constrained Q learning (ANQ), to perform Q learning with target actions satisfying this constraint.
arXiv Detail & Related papers (2025-11-04T13:42:05Z)
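A hypothetical numpy sketch of the neighborhood constraint described in the ANQ entry above: candidate target actions survive only if they fall within an epsilon-ball of some dataset action, and the Bellman target takes the max Q over the survivors. The toy Q-function, epsilon, and fallback rule are assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(1)
dataset_actions = rng.uniform(-1, 1, size=(256, 2))  # actions seen in the data
candidates = rng.uniform(-1, 1, size=(64, 2))        # proposed target actions
epsilon = 0.1

def q_value(a):                                      # toy Q-function
    return -np.sum(a ** 2, axis=-1)

# Keep candidates inside the union of epsilon-neighborhoods of dataset actions.
dists = np.linalg.norm(
    candidates[:, None, :] - dataset_actions[None, :, :], axis=-1)
in_support = dists.min(axis=1) <= epsilon
valid = candidates[in_support]

# Bellman target uses the best surviving action; fall back to the best
# dataset action if no candidate passes the neighborhood test.
if len(valid) > 0:
    target_q = q_value(valid).max()
else:
    target_q = q_value(dataset_actions).max()
```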
- Rectified Robust Policy Optimization for Model-Uncertain Constrained Reinforcement Learning without Strong Duality [53.525547349715595]
We propose a novel primal-only algorithm called Rectified Robust Policy Optimization (RRPO). RRPO operates directly on the primal problem without relying on dual formulations. We show convergence to an approximately optimal feasible policy with complexity matching the best-known lower bound.
arXiv Detail & Related papers (2025-08-24T16:59:38Z)
- MOSIC: Model-Agnostic Optimal Subgroup Identification with Multi-Constraint for Improved Reliability [11.997050225896679]
We propose a unified optimization framework that directly solves the primal constrained optimization problem to identify optimal subgroups. Our key innovation is a reformulation of the constrained primal problem as an unconstrained differentiable min-max objective, solved via a gradient descent-ascent algorithm. The framework is model-agnostic, compatible with a wide range of CATE estimators, and accommodates additional constraints such as cost limits or fairness criteria.
arXiv Detail & Related papers (2025-04-29T16:25:23Z)
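A minimal sketch of the gradient descent-ascent pattern named in the MOSIC entry above, on a one-dimensional toy: the constrained problem min_x f(x) s.t. g(x) <= 0 becomes the unconstrained saddle problem min_x max_{lam >= 0} f(x) + lam * g(x). Here f, g, and the step sizes are stand-ins, not MOSIC's estimators or objective.

```python
# Toy objective (e.g., a negative subgroup utility) and constraint
# (feasible when x <= 1); the constrained optimum is x = 1 with lam = 4.
def f(x):
    return (x - 3.0) ** 2

def g(x):
    return x - 1.0

x, lam = 0.0, 0.0
eta_x, eta_lam = 0.05, 0.05
for _ in range(1000):
    grad_x = 2 * (x - 3.0) + lam * 1.0    # d/dx of [f + lam * g]
    x -= eta_x * grad_x                   # descent on the primal variable
    lam = max(0.0, lam + eta_lam * g(x))  # projected ascent on the multiplier
```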
- One-Shot Safety Alignment for Large Language Models via Optimal Dualization [64.52223677468861]
This paper presents a perspective of dualization that reduces constrained alignment to an equivalent unconstrained alignment problem.
We do so by pre-optimizing a smooth and convex dual function that has a closed form.
Our strategy leads to two practical algorithms in model-based and preference-based settings.
arXiv Detail & Related papers (2024-05-29T22:12:52Z)
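A toy illustration of the dualization idea in the entry above, assuming a KL-regularized objective with a single safety constraint over a finite response set: for each multiplier the optimal policy has a closed-form Gibbs expression, so the smooth convex dual can be pre-optimized once (here by a crude grid search), after which the aligned policy is recovered in closed form. All quantities and the grid-search solve are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(2)
p0 = np.full(5, 0.2)              # reference policy over 5 responses
r = rng.normal(size=5)            # reward scores
c = rng.normal(size=5)            # safety scores
b = 0.2                           # required expected safety level

def gibbs(lam):
    # Closed-form optimal policy for a fixed multiplier lam.
    w = p0 * np.exp(r + lam * c)
    return w / w.sum()

def dual(lam):
    # D(lam) = log-partition - lam * b; smooth and convex in lam >= 0.
    return np.log(np.sum(p0 * np.exp(r + lam * c))) - lam * b

lams = np.linspace(0.0, 10.0, 1001)
lam_star = lams[np.argmin([dual(l) for l in lams])]  # one-shot dual solve
policy = gibbs(lam_star)          # unconstrained problem with shifted reward
```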
- A Dual Perspective of Reinforcement Learning for Imposing Policy Constraints [0.0]
We use a generic primal-dual framework for value-based and actor-critic reinforcement learning methods. The obtained dual formulations turn out to be especially useful for imposing additional constraints on the learned policy. A practical algorithm is derived that supports various combinations of policy constraints, which are automatically handled throughout training.
arXiv Detail & Related papers (2024-04-25T09:50:57Z)
- Double Duality: Variational Primal-Dual Policy Optimization for Constrained Reinforcement Learning [132.7040981721302]
We study the Constrained Convex Markov Decision Process (MDP), where the goal is to minimize a convex functional of the visitation measure. Designing algorithms for a constrained convex MDP poses several challenges, including handling the large state space.
arXiv Detail & Related papers (2024-02-16T16:35:18Z)
- Constrained Proximal Policy Optimization [36.20839673950677]
We propose a novel first-order feasible method named Constrained Proximal Policy Optimization (CPPO). Our approach integrates the Expectation-Maximization framework to solve the problem in two steps: 1) calculating the optimal policy distribution within the feasible region (E-step), and 2) conducting a first-order update to adjust the current policy towards the optimal policy obtained in the E-step (M-step).
Empirical evaluations conducted in complex and uncertain environments validate the effectiveness of our proposed method.
arXiv Detail & Related papers (2023-05-23T16:33:55Z) - Faster Algorithm and Sharper Analysis for Constrained Markov Decision
Process [56.55075925645864]
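A hypothetical sketch of the E-step/M-step pattern from the CPPO entry above, on a discrete-action toy: the E-step forms a nonparametric target distribution by reweighting feasible actions with exponentiated advantages, and the M-step takes a first-order step moving the softmax policy toward that target (a KL projection). The feasibility mask, advantages, and temperature are stand-ins, not the paper's quantities.

```python
import numpy as np

rng = np.random.default_rng(3)
logits = np.zeros(4)                      # current policy parameters
advantages = rng.normal(size=4)           # toy advantage estimates
feasible = np.array([True, True, False, True])  # toy feasible-region mask
temperature, lr = 1.0, 0.5

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

for _ in range(100):
    pi = softmax(logits)

    # E-step: optimal (nonparametric) distribution within the feasible region.
    w = pi * np.exp(advantages / temperature) * feasible
    q = w / w.sum()

    # M-step: first-order update toward q; for a softmax policy, the gradient
    # of KL(q || pi) with respect to the logits is (pi - q).
    logits -= lr * (pi - q)
```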
- Faster Algorithm and Sharper Analysis for Constrained Markov Decision Process [56.55075925645864]
The problem of the constrained Markov decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated discounted reward subject to multiple constraints.
A new primal-dual convex approach is proposed with a novel integration of three ingredients: a regularized policy optimizer, a dual-variable regularizer, and Nesterov's accelerated gradient descent for the dual update. This is the first demonstration that nonconcave CMDP problems can attain the complexity lower bound of $\mathcal{O}(1/\epsilon)$ for convex optimization subject to convex constraints.
arXiv Detail & Related papers (2021-10-20T02:57:21Z) - CRPO: A New Approach for Safe Reinforcement Learning with Convergence
Guarantee [61.176159046544946]
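A minimal sketch of the last ingredient named in the entry above: Nesterov's accelerated gradient descent applied to a smooth convex dual function. The quadratic dual below is a toy stand-in; the point is the momentum/look-ahead update that yields the accelerated $\mathcal{O}(1/k^2)$ rate on smooth convex problems.

```python
import numpy as np

def dual_grad(lam):     # gradient of a toy smooth convex dual
    return lam - 2.0    # minimizer at lam = 2

lam = np.array(10.0)
y, t = lam.copy(), 1.0  # look-ahead point and momentum parameter
eta = 0.5
for _ in range(50):
    lam_next = y - eta * dual_grad(y)            # step at the look-ahead point
    t_next = (1 + np.sqrt(1 + 4 * t * t)) / 2
    y = lam_next + ((t - 1) / t_next) * (lam_next - lam)  # extrapolation
    lam, t = lam_next, t_next
# Projection onto lam >= 0 is omitted since the toy minimizer is positive.
```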
- CRPO: A New Approach for Safe Reinforcement Learning with Convergence Guarantee [61.176159046544946]
In safe reinforcement learning (SRL) problems, an agent explores the environment to maximize an expected total reward while avoiding violations of certain constraints.
This is the first analysis of SRL algorithms with global optimality guarantees.
arXiv Detail & Related papers (2020-11-11T16:05:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.