Constrained Exploration in Reinforcement Learning with Optimality
Preservation
- URL: http://arxiv.org/abs/2304.03104v1
- Date: Wed, 5 Apr 2023 15:49:51 GMT
- Title: Constrained Exploration in Reinforcement Learning with Optimality
Preservation
- Authors: Peter C. Y. Chen
- Abstract summary: We consider a class of reinforcement-learning systems in which the agent follows a behavior policy to explore a discrete state-action space to find an optimal policy.
Such a restriction may prevent the agent from visiting some state-action pairs, possibly leading to the agent finding only a sub-optimal policy.
We introduce the concept of constrained exploration with optimality preservation, whereby the exploration behavior of the agent is constrained to meet a specification.
- Score: 2.4671396651514983
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider a class of reinforcement-learning systems in which the agent
follows a behavior policy to explore a discrete state-action space to find an
optimal policy while adhering to some restriction on its behavior. Such a
restriction may prevent the agent from visiting some state-action pairs,
possibly leading to the agent finding only a sub-optimal policy. To address
this problem we introduce the concept of constrained exploration with
optimality preservation, whereby the exploration behavior of the agent is
constrained to meet a specification while the optimality of the (original)
unconstrained learning process is preserved. We first establish a
feedback-control structure that models the dynamics of the unconstrained
learning process. We then extend this structure by adding a supervisor to
ensure that the behavior of the agent meets the specification, and establish
(for a class of reinforcement-learning problems with a known deterministic
environment) a necessary and sufficient condition under which optimality is
preserved. This work demonstrates the utility and the prospect of studying
reinforcement-learning problems in the context of the theories of
discrete-event systems, automata and formal languages.
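To make the idea concrete, below is a minimal sketch (not the paper's construction) of tabular Q-learning in which a supervisor restricts the actions the behavior policy may explore at each state, in the spirit of constrained exploration with optimality preservation. The toy chain environment, the specification encoded in the hypothetical supervisor function, and all hyperparameters are illustrative assumptions; in this example the greedy policy still converges to the unconstrained optimum because every state retains at least one optimal action.

```python
# Minimal illustrative sketch, NOT the paper's implementation: tabular Q-learning
# in which a supervisor restricts which actions the behavior policy may explore.
# The environment, the specification in supervisor(), and all hyperparameters
# are assumptions made for illustration only.
import random

N_STATES, ACTIONS, GOAL = 4, (0, 1), 3   # small deterministic chain: 0 = left, 1 = right

def step(s, a):
    """Deterministic environment: move along the chain; reward 1 on reaching the goal."""
    s_next = max(0, min(GOAL, s + (1 if a == 1 else -1)))
    return s_next, (1.0 if s_next == GOAL else 0.0)

def supervisor(s):
    """Exploration specification (assumed): forbid moving left from state 0.
    Optimality is preserved here because every state keeps at least one optimal action."""
    return [a for a in ACTIONS if not (s == 0 and a == 0)]

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.2

for _ in range(300):                                  # episodes
    s = 0
    for _ in range(100):                              # step cap per episode
        allowed = supervisor(s)                       # constrained exploration
        if random.random() < eps:
            a = random.choice(allowed)
        else:                                         # greedy over the allowed set, random tie-break
            best = max(Q[(s, b)] for b in allowed)
            a = random.choice([b for b in allowed if Q[(s, b)] == best])
        s_next, r = step(s, a)
        # standard Q-learning update; the bootstrap max is taken over all actions
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s_next, b)] for b in ACTIONS) - Q[(s, a)])
        s = s_next
        if s == GOAL:
            break

greedy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)}
print(greedy)   # the greedy policy moves right toward the goal from every non-goal state
```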
Related papers
- Learning to Lead: Incentivizing Strategic Agents in the Dark [50.93875404941184]
We study an online learning version of the generalized principal-agent model. We develop the first provably sample-efficient algorithm for this challenging setting, and establish a near-optimal $\tilde{O}(\sqrt{T})$ regret bound for learning the principal's optimal policy.
arXiv Detail & Related papers (2025-06-10T04:25:04Z) - Towards Unsupervised Multi-Agent Reinforcement Learning via Task-Agnostic Exploration [44.601019677298005]
We present a scalable, decentralized, trust-region policy search algorithm to address the problem in practical settings. We show that optimizing for a specific objective, namely mixture entropy, provides an excellent trade-off between tractability and performance.
arXiv Detail & Related papers (2025-02-12T12:51:36Z) - Resilient Constrained Reinforcement Learning [87.4374430686956]
We study a class of constrained reinforcement learning (RL) problems in which the constraint specifications are not identified in advance.
Identifying appropriate constraint specifications is challenging because of the undefined trade-off between the reward training objective and constraint satisfaction.
We propose a new constrained RL approach that searches for policy and constraint specifications together.
arXiv Detail & Related papers (2023-12-28T18:28:23Z) - Hierarchical Decomposition of Prompt-Based Continual Learning:
Rethinking Obscured Sub-optimality [55.88910947643436]
Self-supervised pre-training is essential for handling vast quantities of unlabeled data in practice.
HiDe-Prompt is an innovative approach that explicitly optimizes the hierarchical components with an ensemble of task-specific prompts and statistics.
Our experiments demonstrate the superior performance of HiDe-Prompt and its robustness to pre-training paradigms in continual learning.
arXiv Detail & Related papers (2023-10-11T06:51:46Z) - Resilient Constrained Learning [94.27081585149836]
This paper presents a constrained learning approach that adapts the requirements while simultaneously solving the learning task.
We call this approach resilient constrained learning after the term used to describe ecological systems that adapt to disruptions by modifying their operation.
arXiv Detail & Related papers (2023-06-04T18:14:18Z) - Maximum Causal Entropy Inverse Constrained Reinforcement Learning [3.409089945290584]
We propose a novel method that utilizes the principle of maximum causal entropy to learn constraints and an optimal policy.
We evaluate the effectiveness of the learned policy by assessing the reward received and the number of constraint violations.
Our method has been shown to outperform state-of-the-art approaches across a variety of tasks and environments.
arXiv Detail & Related papers (2023-05-04T14:18:19Z) - When Demonstrations Meet Generative World Models: A Maximum Likelihood
Framework for Offline Inverse Reinforcement Learning [62.00672284480755]
This paper aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent.
Accurate models of expertise in executing a task have applications in safety-sensitive settings such as clinical decision making and autonomous driving.
arXiv Detail & Related papers (2023-02-15T04:14:20Z) - A Globally Convergent Evolutionary Strategy for Stochastic Constrained
Optimization with Applications to Reinforcement Learning [0.6445605125467573]
Evolutionary strategies have been shown to achieve competing levels of performance for complex optimization problems in reinforcement learning.
Convergence guarantees for evolutionary strategies to optimize constrained problems are however lacking in the literature.
arXiv Detail & Related papers (2022-02-21T17:04:51Z) - Towards Robust Bisimulation Metric Learning [3.42658286826597]
Bisimulation metrics offer one solution to the representation learning problem.
We generalize value function approximation bounds for on-policy bisimulation metrics to non-optimal policies.
We find that these issues stem from an underconstrained dynamics model and an unstable dependence of the embedding norm on the reward signal.
arXiv Detail & Related papers (2021-10-27T00:32:07Z) - Closing the Closed-Loop Distribution Shift in Safe Imitation Learning [80.05727171757454]
We treat safe optimization-based control strategies as experts in an imitation learning problem.
We train a learned policy that can be cheaply evaluated at run-time and that provably satisfies the same safety guarantees as the expert.
arXiv Detail & Related papers (2021-02-18T05:11:41Z) - Constrained Markov Decision Processes via Backward Value Functions [43.649330976089004]
We model the problem of learning with constraints as a Constrained Markov Decision Process.
A key contribution of our approach is to translate cumulative cost constraints into state-based constraints.
We provide theoretical guarantees under which the agent converges while ensuring safety over the course of training.
arXiv Detail & Related papers (2020-08-26T20:56:16Z) - Cautious Reinforcement Learning with Logical Constraints [78.96597639789279]
An adaptive safe padding forces Reinforcement Learning (RL) to synthesise optimal control policies while ensuring safety during the learning process.
Theoretical guarantees are available on the optimality of the synthesised policies and on the convergence of the learning algorithm.
arXiv Detail & Related papers (2020-02-26T00:01:08Z)