Automatic Constraint Policy Optimization based on Continuous Constraint Interpolation Framework for Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2601.23010v1
- Date: Fri, 30 Jan 2026 14:21:41 GMT
- Title: Automatic Constraint Policy Optimization based on Continuous Constraint Interpolation Framework for Offline Reinforcement Learning
- Authors: Xinchen Han, Qiuyang Fang, Hossam Afifi, Michel Marot
- Abstract summary: Offline Reinforcement Learning (RL) relies on policy constraints to shape performance. Most existing methods commit to a single constraint family. We propose Continuous Constraint Interpolation (CCI), a unified optimization framework.
- Score: 2.0719232729184145
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline Reinforcement Learning (RL) relies on policy constraints to mitigate extrapolation error, where both the constraint form and constraint strength critically shape performance. However, most existing methods commit to a single constraint family: weighted behavior cloning, density regularization, or support constraints, without a unified principle that explains their connections or trade-offs. In this work, we propose Continuous Constraint Interpolation (CCI), a unified optimization framework in which these three constraint families arise as special cases along a common constraint spectrum. The CCI framework introduces a single interpolation parameter that enables smooth transitions and principled combinations across constraint types. Building on CCI, we develop Automatic Constraint Policy Optimization (ACPO), a practical primal--dual algorithm that adapts the interpolation parameter via a Lagrangian dual update. Moreover, we establish a maximum-entropy performance difference lemma and derive performance lower bounds for both the closed-form optimal policy and its parametric projection. Experiments on D4RL and NeoRL2 demonstrate robust gains across diverse domains, achieving state-of-the-art performance overall.
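The abstract describes ACPO as a primal-dual loop: one interpolation parameter blends constraint families, and a Lagrangian dual update adapts constraint strength. The paper's actual objective is not reproduced here; what follows is a minimal, hypothetical numpy sketch of that general pattern, where the two penalties, the toy reward, and the alpha-adaptation rule are all illustrative stand-ins rather than the paper's formulas.

```python
import numpy as np

# Hypothetical primal-dual sketch in the spirit of ACPO: an interpolation
# parameter alpha in [0, 1] blends two constraint penalties (stand-ins for
# behavior-cloning-style vs. support-style terms), and a Lagrangian
# multiplier is updated from the measured constraint violation.
rng = np.random.default_rng(0)
theta = rng.normal(size=4)   # toy policy parameters
lam, alpha = 1.0, 0.5        # dual multiplier and interpolation parameter
target = 0.1                 # toy constraint budget

def bc_penalty(theta):       # stand-in for a behavior-cloning-style constraint
    return float(np.sum(theta ** 2))

def support_penalty(theta):  # stand-in for a support-style constraint
    return float(np.sum(np.maximum(np.abs(theta) - 1.0, 0.0)))

for step in range(200):
    # Primal step: ascend a toy reward sum(theta) (gradient = ones) while
    # descending the interpolated constraint penalty.
    grad = np.ones_like(theta) - lam * (
        alpha * 2 * theta
        + (1 - alpha) * np.sign(theta) * (np.abs(theta) > 1.0)
    )
    theta += 1e-2 * grad

    # Dual step: projected ascent grows lam when the blended constraint
    # exceeds its budget.
    violation = (alpha * bc_penalty(theta)
                 + (1 - alpha) * support_penalty(theta) - target)
    lam = max(0.0, lam + 1e-2 * violation)

    # Adapt the interpolation parameter (illustrative rule, not the paper's).
    alpha = float(np.clip(
        alpha + 1e-3 * (bc_penalty(theta) - support_penalty(theta)), 0.0, 1.0))
```

The moving parts worth noting are the projected dual ascent on the multiplier and the clipped update of the interpolation parameter; everything else is placeholder.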
Related papers
- Decoupling Constraint from Two Directions in Evolutionary Constrained Multi-objective Optimization [26.967831462095067]
We propose a novel algorithm named Decoupling Constraint from Two Directions (DCF2D). It periodically detects constraint couplings and spawns an auxiliary population for each relevant constraint with an appropriate search direction. Experiments on seven challenging CMOP benchmark suites and on a collection of real-world CMOPs demonstrate that DCF2D outperforms five state-of-the-art CMOEAs.
arXiv Detail & Related papers (2025-12-30T02:22:32Z)
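A loose, hypothetical sketch of the auxiliary-population mechanism summarized in the DCF2D entry above: one auxiliary population per constraint, each evolved along its own search direction, with periodic migration into the main population. The bi-objective problem, mutation operator, selection keys, and migration schedule are invented for illustration, and the coupling-detection step is not modeled.

```python
import numpy as np

rng = np.random.default_rng(4)

def objectives(x):   # toy bi-objective over [0, 1]^2
    return np.stack([x[:, 0], 1 - np.sqrt(np.clip(x[:, 0], 0, 1))], axis=1)

def violations(x):   # two toy constraints, one violation column each
    return np.stack([np.maximum(0.3 - x[:, 1], 0),
                     np.maximum(x[:, 0] - 0.8, 0)], axis=1)

main = rng.uniform(0, 1, size=(40, 2))
aux = [rng.uniform(0, 1, size=(20, 2)) for _ in range(2)]  # one per constraint

for gen in range(100):
    # Evolve each auxiliary population along its own constraint's direction.
    for k in range(2):
        children = np.clip(aux[k] + rng.normal(0, 0.05, aux[k].shape), 0, 1)
        pool = np.vstack([aux[k], children])
        aux[k] = pool[np.argsort(violations(pool)[:, k])[:20]]

    # Evolve the main population by total violation, tie-broken by objectives,
    # with auxiliary individuals migrating in every 10 generations.
    children = np.clip(main + rng.normal(0, 0.05, main.shape), 0, 1)
    pool = np.vstack([main, children] + (aux if gen % 10 == 0 else []))
    keys = violations(pool).sum(axis=1) + 1e-3 * objectives(pool).sum(axis=1)
    main = pool[np.argsort(keys)[:40]]
```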
- Adaptive Neighborhood-Constrained Q Learning for Offline Reinforcement Learning [52.03884701766989]
Offline reinforcement learning (RL) algorithms typically impose constraints on action selection. We propose a new neighborhood constraint that restricts action selection in the Bellman target to the union of neighborhoods of dataset actions. We develop a simple yet effective algorithm, Adaptive Neighborhood-constrained Q learning (ANQ), to perform Q learning with target actions satisfying this constraint.
arXiv Detail & Related papers (2025-11-04T13:42:05Z)
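A hypothetical numpy sketch of the neighborhood constraint described in the ANQ entry above: candidate target actions survive only if they fall within an epsilon-ball of some dataset action, and the Bellman target takes the max Q over the survivors. The toy Q-function, epsilon, and fallback rule are assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(1)
dataset_actions = rng.uniform(-1, 1, size=(256, 2))  # actions seen in the data
candidates = rng.uniform(-1, 1, size=(64, 2))        # proposed target actions
epsilon = 0.1

def q_value(a):                                      # toy Q-function
    return -np.sum(a ** 2, axis=-1)

# Keep candidates inside the union of epsilon-neighborhoods of dataset actions.
dists = np.linalg.norm(
    candidates[:, None, :] - dataset_actions[None, :, :], axis=-1)
in_support = dists.min(axis=1) <= epsilon
valid = candidates[in_support]

# Bellman target uses the best surviving action; fall back to the best
# dataset action if no candidate passes the neighborhood test.
if len(valid) > 0:
    target_q = q_value(valid).max()
else:
    target_q = q_value(dataset_actions).max()
```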
- Rectified Robust Policy Optimization for Model-Uncertain Constrained Reinforcement Learning without Strong Duality [53.525547349715595]
We propose a novel primal-only algorithm called Rectified Robust Policy Optimization (RRPO). RRPO operates directly on the primal problem without relying on dual formulations. We show convergence to an approximately optimal feasible policy with complexity matching the best-known lower bound.
arXiv Detail & Related papers (2025-08-24T16:59:38Z)
- MOSIC: Model-Agnostic Optimal Subgroup Identification with Multi-Constraint for Improved Reliability [11.997050225896679]
We propose a unified optimization framework that directly solves the primal constrained optimization problem to identify optimal subgroups. Our key innovation is a reformulation of the constrained primal problem as an unconstrained differentiable min-max objective, solved via a gradient descent-ascent algorithm. The framework is model-agnostic, compatible with a wide range of CATE estimators, and accommodates additional constraints such as cost limits or fairness criteria.
arXiv Detail & Related papers (2025-04-29T16:25:23Z)
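A minimal sketch of the gradient descent-ascent pattern named in the MOSIC entry above, on a one-dimensional toy: the constrained problem min_x f(x) s.t. g(x) <= 0 becomes the unconstrained saddle problem min_x max_{lam >= 0} f(x) + lam * g(x). Here f, g, and the step sizes are stand-ins, not MOSIC's estimators or objective.

```python
# Toy objective (e.g., a negative subgroup utility) and constraint
# (feasible when x <= 1); the constrained optimum is x = 1 with lam = 4.
def f(x):
    return (x - 3.0) ** 2

def g(x):
    return x - 1.0

x, lam = 0.0, 0.0
eta_x, eta_lam = 0.05, 0.05
for _ in range(1000):
    grad_x = 2 * (x - 3.0) + lam * 1.0    # d/dx of [f + lam * g]
    x -= eta_x * grad_x                   # descent on the primal variable
    lam = max(0.0, lam + eta_lam * g(x))  # projected ascent on the multiplier
```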
- One-Shot Safety Alignment for Large Language Models via Optimal Dualization [64.52223677468861]
This paper presents a perspective of dualization that reduces constrained alignment to an equivalent unconstrained alignment problem.
We do so by pre-optimizing a smooth and convex dual function that has a closed form.
Our strategy leads to two practical algorithms in model-based and preference-based settings.
arXiv Detail & Related papers (2024-05-29T22:12:52Z)
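A toy illustration of the dualization idea in the entry above, assuming a KL-regularized objective with a single safety constraint over a finite response set: for each multiplier the optimal policy has a closed-form Gibbs expression, so the smooth convex dual can be pre-optimized once (here by a crude grid search), after which the aligned policy is recovered in closed form. All quantities and the grid-search solve are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(2)
p0 = np.full(5, 0.2)              # reference policy over 5 responses
r = rng.normal(size=5)            # reward scores
c = rng.normal(size=5)            # safety scores
b = 0.2                           # required expected safety level

def gibbs(lam):
    # Closed-form optimal policy for a fixed multiplier lam.
    w = p0 * np.exp(r + lam * c)
    return w / w.sum()

def dual(lam):
    # D(lam) = log-partition - lam * b; smooth and convex in lam >= 0.
    return np.log(np.sum(p0 * np.exp(r + lam * c))) - lam * b

lams = np.linspace(0.0, 10.0, 1001)
lam_star = lams[np.argmin([dual(l) for l in lams])]  # one-shot dual solve
policy = gibbs(lam_star)          # unconstrained problem with shifted reward
```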
- A Dual Perspective of Reinforcement Learning for Imposing Policy Constraints [0.0]
We use a generic primal-dual framework for value-based and actor-critic reinforcement learning methods. The obtained dual formulations turn out to be especially useful for imposing additional constraints on the learned policy. A practical algorithm is derived that supports various combinations of policy constraints, which are automatically handled throughout training.
arXiv Detail & Related papers (2024-04-25T09:50:57Z)
- Double Duality: Variational Primal-Dual Policy Optimization for Constrained Reinforcement Learning [132.7040981721302]
We study the Constrained Convex Markov Decision Process (MDP), where the goal is to minimize a convex functional of the visitation measure. Designing algorithms for a constrained convex MDP poses several challenges, including handling the large state space.
arXiv Detail & Related papers (2024-02-16T16:35:18Z)
- Constrained Proximal Policy Optimization [36.20839673950677]
We propose a novel first-order feasible method named Constrained Proximal Policy Optimization (CPPO). Our approach integrates the Expectation-Maximization framework to solve the problem in two steps: 1) calculating the optimal policy distribution within the feasible region (E-step), and 2) conducting a first-order update to adjust the current policy towards the optimal policy obtained in the E-step (M-step).
Empirical evaluations conducted in complex and uncertain environments validate the effectiveness of our proposed method.
arXiv Detail & Related papers (2023-05-23T16:33:55Z) - Faster Algorithm and Sharper Analysis for Constrained Markov Decision
Process [56.55075925645864]
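A hypothetical sketch of the E-step/M-step pattern from the CPPO entry above, on a discrete-action toy: the E-step forms a nonparametric target distribution by reweighting feasible actions with exponentiated advantages, and the M-step takes a first-order step moving the softmax policy toward that target (a KL projection). The feasibility mask, advantages, and temperature are stand-ins, not the paper's quantities.

```python
import numpy as np

rng = np.random.default_rng(3)
logits = np.zeros(4)                      # current policy parameters
advantages = rng.normal(size=4)           # toy advantage estimates
feasible = np.array([True, True, False, True])  # toy feasible-region mask
temperature, lr = 1.0, 0.5

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

for _ in range(100):
    pi = softmax(logits)

    # E-step: optimal (nonparametric) distribution within the feasible region.
    w = pi * np.exp(advantages / temperature) * feasible
    q = w / w.sum()

    # M-step: first-order update toward q; for a softmax policy, the gradient
    # of KL(q || pi) with respect to the logits is (pi - q).
    logits -= lr * (pi - q)
```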
- Faster Algorithm and Sharper Analysis for Constrained Markov Decision Process [56.55075925645864]
The problem of the constrained Markov decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated discounted reward subject to multiple constraints.
A new primal-dual convex approach is proposed with a novel integration of three ingredients: a regularized policy optimizer, a dual-variable regularizer, and Nesterov's accelerated gradient descent for the dual update. This is the first demonstration that nonconcave CMDP problems can attain the complexity lower bound of $\mathcal{O}(1/\epsilon)$ for convex optimization subject to convex constraints.
arXiv Detail & Related papers (2021-10-20T02:57:21Z) - CRPO: A New Approach for Safe Reinforcement Learning with Convergence
Guarantee [61.176159046544946]
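A minimal sketch of the last ingredient named in the entry above: Nesterov's accelerated gradient descent applied to a smooth convex dual function. The quadratic dual below is a toy stand-in; the point is the momentum/look-ahead update that yields the accelerated $\mathcal{O}(1/k^2)$ rate on smooth convex problems.

```python
import numpy as np

def dual_grad(lam):     # gradient of a toy smooth convex dual
    return lam - 2.0    # minimizer at lam = 2

lam = np.array(10.0)
y, t = lam.copy(), 1.0  # look-ahead point and momentum parameter
eta = 0.5
for _ in range(50):
    lam_next = y - eta * dual_grad(y)            # step at the look-ahead point
    t_next = (1 + np.sqrt(1 + 4 * t * t)) / 2
    y = lam_next + ((t - 1) / t_next) * (lam_next - lam)  # extrapolation
    lam, t = lam_next, t_next
# Projection onto lam >= 0 is omitted since the toy minimizer is positive.
```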
- CRPO: A New Approach for Safe Reinforcement Learning with Convergence Guarantee [61.176159046544946]
In safe reinforcement learning (SRL) problems, an agent explores the environment to maximize an expected total reward while avoiding violations of certain constraints.
This is the first analysis of SRL algorithms with global optimality guarantees.
arXiv Detail & Related papers (2020-11-11T16:05:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.