Provably Efficient Exploration in Inverse Constrained Reinforcement Learning
- URL: http://arxiv.org/abs/2409.15963v4
- Date: Fri, 16 May 2025 09:40:19 GMT
- Title: Provably Efficient Exploration in Inverse Constrained Reinforcement Learning
- Authors: Bo Yue, Jian Li, Guiliang Liu,
- Abstract summary: Inverse Constrained Reinforcement Learning is a common solver for recovering feasible constraints in complex environments.<n>We propose a strategic exploration framework for sampling with guaranteed efficiency to bridge this gap.<n>We introduce two exploratory algorithms to achieve efficient constraint inference.
- Score: 12.178081346315523
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Optimizing objective functions subject to constraints is fundamental in many real-world applications. However, these constraints are often not readily defined and must be inferred from expert agent behaviors, a problem known as Inverse Constraint Inference. Inverse Constrained Reinforcement Learning (ICRL) is a common solver for recovering feasible constraints in complex environments, relying on training samples collected from interactive environments. However, the efficacy and efficiency of current sampling strategies remain unclear. We propose a strategic exploration framework for sampling with guaranteed efficiency to bridge this gap. By defining the feasible cost set for ICRL problems, we analyze how estimation errors in transition dynamics and the expert policy influence the feasibility of inferred constraints. Based on this analysis, we introduce two exploratory algorithms to achieve efficient constraint inference via 1) dynamically reducing the bounded aggregate error of cost estimations or 2) strategically constraining the exploration policy around plausibly optimal ones. Both algorithms are theoretically grounded with tractable sample complexity, and their performance is validated empirically across various environments.
Related papers
- Learning to Explore: An In-Context Learning Approach for Pure Exploration [23.16863295063427]
In this work, we study the active sequential hypothesis testing problem, also known as pure exploration.<n>We introduce In-Context Pure Exploration (ICPE), an in-context learning approach that uses Transformers to learn exploration strategies directly from experience.<n>ICPE combines supervised learning and reinforcement learning to identify and exploit latent structure across related tasks, without requiring prior assumptions.
arXiv Detail & Related papers (2025-06-02T17:04:50Z) - Single-loop Algorithms for Stochastic Non-convex Optimization with Weakly-Convex Constraints [49.76332265680669]
This paper examines a crucial subset of problems where both the objective and constraint functions are weakly convex.<n>Existing methods often face limitations, including slow convergence rates or reliance on double-loop designs.<n>We introduce a novel single-loop penalty-based algorithm to overcome these challenges.
arXiv Detail & Related papers (2025-04-21T17:15:48Z) - On The Sample Complexity Bounds In Bilevel Reinforcement Learning [36.239015146313136]
Bilevel reinforcement learning (BRL) has emerged as a powerful mathematical framework for studying generative AI alignment.
We present the first sample complexity result for BRL, achieving a bound of $epsilon-4$.
This result extends to standard bilevel optimization problems, providing an interesting theoretical contribution with practical implications.
arXiv Detail & Related papers (2025-03-22T04:22:04Z) - COMBO-Grasp: Learning Constraint-Based Manipulation for Bimanual Occluded Grasping [56.907940167333656]
Occluded robot grasping is where the desired grasp poses are kinematically infeasible due to environmental constraints such as surface collisions.
Traditional robot manipulation approaches struggle with the complexity of non-prehensile or bimanual strategies commonly used by humans.
We introduce Constraint-based Manipulation for Bimanual Occluded Grasping (COMBO-Grasp), a learning-based approach which leverages two coordinated policies.
arXiv Detail & Related papers (2025-02-12T01:31:01Z) - Evolutionary Algorithm with Detection Region Method for Constrained Multi-Objective Problems with Binary Constraints [9.764702512419946]
This paper proposes a novel algorithm called DRMCMO based on the detection region method.
In DRMCMO, detection regions dynamic monitor feasible solutions to enhance convergence, helping the population escape local optima.
We have modified three existing test suites to serve as benchmark test problems for CMOPs with binary constraints.
arXiv Detail & Related papers (2024-11-13T08:39:04Z) - EVOLvE: Evaluating and Optimizing LLMs For Exploration [76.66831821738927]
Large language models (LLMs) remain under-studied in scenarios requiring optimal decision-making under uncertainty.
We measure LLMs' (in)ability to make optimal decisions in bandits, a state-less reinforcement learning setting relevant to many applications.
Motivated by the existence of optimal exploration algorithms, we propose efficient ways to integrate this algorithmic knowledge into LLMs.
arXiv Detail & Related papers (2024-10-08T17:54:03Z) - Resilient Constrained Reinforcement Learning [87.4374430686956]
We study a class of constrained reinforcement learning (RL) problems in which multiple constraint specifications are not identified before study.
It is challenging to identify appropriate constraint specifications due to the undefined trade-off between the reward training objective and the constraint satisfaction.
We propose a new constrained RL approach that searches for policy and constraint specifications together.
arXiv Detail & Related papers (2023-12-28T18:28:23Z) - Adaptive trajectory-constrained exploration strategy for deep
reinforcement learning [6.589742080994319]
Deep reinforcement learning (DRL) faces significant challenges in addressing the hard-exploration problems in tasks with sparse or deceptive rewards and large state spaces.
We propose an efficient adaptive trajectory-constrained exploration strategy for DRL.
We conduct experiments on two large 2D grid world mazes and several MuJoCo tasks.
arXiv Detail & Related papers (2023-12-27T07:57:15Z) - Controlling Continuous Relaxation for Combinatorial Optimization [0.0]
Unsupervised learning solvers for optimization (CO) train a neural network that generates a soft solution using a continuous relaxation strategy.
This study introduces a Continuous Relaxation Anneal (CRA) strategy, an effective-free learning method for UL-based solvers.
arXiv Detail & Related papers (2023-09-29T04:23:58Z) - Evolving Constrained Reinforcement Learning Policy [5.4444944707433525]
We propose a novel evolutionary constrained reinforcement learning algorithm, which adaptively balances the reward and constraint violation with ranking.
Experiments on robotic control benchmarks show that our ECRL achieves outstanding performance compared to state-of-the-art algorithms.
arXiv Detail & Related papers (2023-04-19T03:54:31Z) - Reinforcement Learning with Stepwise Fairness Constraints [50.538878453547966]
We introduce the study of reinforcement learning with stepwise fairness constraints.
We provide learning algorithms with strong theoretical guarantees in regard to policy optimality and fairness violation.
arXiv Detail & Related papers (2022-11-08T04:06:23Z) - Exploring Viable Algorithmic Options for Learning from Demonstration
(LfD): A Parameterized Complexity Approach [0.0]
In this paper, we show how such a systematic exploration of algorithmic options can be done using parameterized complexity analysis.
We show that none of our problems can be solved efficiently either in general or relative to a number of (often simultaneous) restrictions on environments, demonstrations, and policies.
arXiv Detail & Related papers (2022-05-10T15:54:06Z) - A Globally Convergent Evolutionary Strategy for Stochastic Constrained
Optimization with Applications to Reinforcement Learning [0.6445605125467573]
Evolutionary strategies have been shown to achieve competing levels of performance for complex optimization problems in reinforcement learning.
Convergence guarantees for evolutionary strategies to optimize constrained problems are however lacking in the literature.
arXiv Detail & Related papers (2022-02-21T17:04:51Z) - Instance-Dependent Confidence and Early Stopping for Reinforcement
Learning [99.57168572237421]
Various algorithms for reinforcement learning (RL) exhibit dramatic variation in their convergence rates as a function of problem structure.
This research provides guarantees that explain textitex post the performance differences observed.
A natural next step is to convert these theoretical guarantees into guidelines that are useful in practice.
arXiv Detail & Related papers (2022-01-21T04:25:35Z) - The Statistical Complexity of Interactive Decision Making [126.04974881555094]
We provide a complexity measure, the Decision-Estimation Coefficient, that is proven to be both necessary and sufficient for sample-efficient interactive learning.
A unified algorithm design principle, Estimation-to-Decisions (E2D), transforms any algorithm for supervised estimation into an online algorithm for decision making.
arXiv Detail & Related papers (2021-12-27T02:53:44Z) - Shortest-Path Constrained Reinforcement Learning for Sparse Reward Tasks [59.419152768018506]
We show that any optimal policy necessarily satisfies the k-SP constraint.
We propose a novel cost function that penalizes the policy violating SP constraint, instead of completely excluding it.
Our experiments on MiniGrid, DeepMind Lab, Atari, and Fetch show that the proposed method significantly improves proximal policy optimization (PPO)
arXiv Detail & Related papers (2021-07-13T21:39:21Z) - DEALIO: Data-Efficient Adversarial Learning for Imitation from
Observation [57.358212277226315]
In imitation learning from observation IfO, a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm
arXiv Detail & Related papers (2021-03-31T23:46:32Z) - Constrained Reinforcement Learning for Dynamic Optimization under
Uncertainty [1.5797349391370117]
Dynamic real-time optimization (DRTO) is a challenging task due to the fact that optimal operating conditions must be computed in real time.
The main bottleneck in the industrial application of DRTO is the presence of uncertainty.
We present a constrained reinforcement learning (RL) based approach to accommodate these difficulties.
arXiv Detail & Related papers (2020-06-04T10:17:35Z) - Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it on correlated actions, and combine these critic estimated action values to control the variance of gradient estimation.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
arXiv Detail & Related papers (2020-02-10T04:23:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.