Reinforcement Learning with Almost Sure Constraints
- URL: http://arxiv.org/abs/2112.05198v1
- Date: Thu, 9 Dec 2021 20:07:53 GMT
- Title: Reinforcement Learning with Almost Sure Constraints
- Authors: Agustin Castellano, Hancheng Min, Juan Bazerque, Enrique Mallada
- Abstract summary: We argue that stationary policies are not sufficient for solving this problem.
We show that the minimal budget required to act safely can be obtained as the smallest fixed point of a Bellman-like operator.
- Score: 1.0323063834827415
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work we address the problem of finding feasible policies for
Constrained Markov Decision Processes under probability one constraints. We
argue that stationary policies are not sufficient for solving this problem, and
that a rich class of policies can be found by endowing the controller with a
scalar quantity, a so-called budget, that tracks how close the agent is to
violating the constraint. We show that the minimal budget required to act
safely can be obtained as the smallest fixed point of a Bellman-like operator,
whose convergence properties we analyze. We also show how to learn this
quantity when the true kernel of the Markov decision process is not known,
while providing sample-complexity bounds. The utility of knowing this minimal
budget lies in that it can aid the search for optimal or near-optimal
policies by shrinking the region of the state space the agent must
navigate. Simulations illustrate how probability-one constraints differ in
nature from the commonly used constraints in expectation.
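To make the fixed-point characterization above concrete, here is a minimal Python sketch of computing a safety budget by iterating a Bellman-like operator on a small tabular MDP with a known kernel. The operator form (a worst-case max over reachable successors, reflecting the probability-one constraint) and all names here are illustrative assumptions, not the paper's exact definitions.

```python
import numpy as np

# Minimal sketch (assumptions): a small tabular MDP with known kernel
# P[s, a, s'] and per-step damage d[s, a].  The operator below is an
# illustrative guess at a Bellman-like operator for the minimal budget under
# probability-one constraints: every reachable successor matters, so it takes
# a worst-case max over supported next states instead of an expectation.

def minimal_budget(P, d, tol=1e-8, max_iter=1000):
    """Iterate the operator from the zero function; for a monotone operator
    the iterates increase toward its smallest fixed point."""
    n_states, n_actions, _ = P.shape
    delta = np.zeros(n_states)
    for _ in range(max_iter):
        # worst-case successor budget per (s, a): max over reachable s'
        succ = np.where(P > 0, delta[None, None, :], -np.inf).max(axis=2)
        new_delta = np.min(d + succ, axis=1)  # safest action per state
        if np.max(np.abs(new_delta - delta)) < tol:
            return new_delta
        delta = new_delta
    return delta

# Toy 3-state example: state 1 is safe and absorbing, state 2 incurs one
# unit of damage and then returns to the safe state.
P = np.zeros((3, 2, 3))
P[0, 0, 1] = 1.0   # action 0 from state 0 reaches the safe state
P[0, 1, 2] = 1.0   # action 1 from state 0 reaches the damaging state
P[1, :, 1] = 1.0
P[2, :, 1] = 1.0
d = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 1.0]])
print(minimal_budget(P, d))   # -> [0. 0. 1.]: state 0 needs no budget if it acts safely
```

Starting the iteration from the zero function exploits monotonicity: the iterates increase toward the smallest fixed point, which is the sense in which the abstract characterizes the minimal budget.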
Related papers
- A learning-based approach to stochastic optimal control under reach-avoid constraint [7.036452261968767]
We develop a model-free approach to optimally control Markovian systems subject to a reach-avoid constraint.
We prove that under suitable assumptions, the policy parameters converge to the optimal parameters, while ensuring that the system trajectories satisfy the reach-avoid constraint with high probability.
arXiv Detail & Related papers (2024-12-21T10:07:40Z) - Statistical Analysis of Policy Space Compression Problem [54.1754937830779]
Policy search methods are crucial in reinforcement learning, offering a framework to address continuous state-action and partially observable problems.
Reducing the policy space through policy compression emerges as a powerful, reward-free approach to accelerate the learning process.
This technique condenses the policy space into a smaller, representative set while maintaining most of the original effectiveness.
arXiv Detail & Related papers (2024-11-15T02:46:55Z) - Algorithms for learning value-aligned policies considering admissibility relaxation [1.8336820954218835]
The emerging field of value awareness engineering claims that software agents and systems should be value-aware.
In this paper, we present two algorithms, ε-ADQL for strategies based on local alignment and its extension ε-CADQL for a sequence of decisions.
We have validated their efficiency in a water distribution problem in a drought scenario.
arXiv Detail & Related papers (2024-06-07T11:10:07Z) - Offline Minimax Soft-Q-learning Under Realizability and Partial Coverage [100.8180383245813]
We propose value-based algorithms for offline reinforcement learning (RL).
We show an analogous result for vanilla Q-functions under a soft margin condition.
Our algorithms' loss functions arise from casting the estimation problems as nonlinear convex optimization problems and Lagrangifying.
arXiv Detail & Related papers (2023-02-05T14:22:41Z) - Constrained Pure Exploration Multi-Armed Bandits with a Fixed Budget [4.226118870861363]
We consider a constrained, pure exploration, multi-armed bandit formulation under a fixed budget.
We propose an algorithm called Constrained-SR based on the Successive Rejects framework (a sketch of that framework follows this entry).
We show that the associated decay rate is nearly optimal relative to an information theoretic lower bound in certain special cases.
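For context on this entry, below is a minimal Python sketch of the classic Successive Rejects framework for fixed-budget best-arm identification that Constrained-SR builds on. This is not the paper's algorithm: the constrained variant would additionally track per-arm constraint estimates and rule out infeasible arms, which is omitted here.

```python
import numpy as np

# Classic Successive Rejects (Audibert & Bubeck style): split a fixed budget
# of pulls into K-1 phases and drop the empirically worst arm after each one.

def successive_rejects(pull, n_arms, budget):
    log_bar = 0.5 + sum(1.0 / i for i in range(2, n_arms + 1))
    active = list(range(n_arms))
    counts = np.zeros(n_arms)
    sums = np.zeros(n_arms)
    n_prev = 0
    for k in range(1, n_arms):
        # cumulative per-arm pull count prescribed for phase k
        n_k = int(np.ceil((budget - n_arms) / (log_bar * (n_arms + 1 - k))))
        for arm in active:
            for _ in range(max(n_k - n_prev, 0)):
                sums[arm] += pull(arm)
                counts[arm] += 1
        n_prev = n_k
        means = sums / np.maximum(counts, 1)
        worst = min(active, key=lambda a: means[a])
        active.remove(worst)          # reject the empirically worst arm
    return active[0]

# Usage: Bernoulli arms with unknown means; arm 2 has the highest mean.
rng = np.random.default_rng(0)
true_means = [0.2, 0.5, 0.7, 0.65]
best = successive_rejects(lambda a: rng.binomial(1, true_means[a]), 4, budget=2000)
print("identified arm:", best)
```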
arXiv Detail & Related papers (2022-11-27T08:58:16Z) - Quantization for decentralized learning under subspace constraints [61.59416703323886]
We consider decentralized optimization problems where agents have individual cost functions to minimize subject to subspace constraints.
We propose and study an adaptive decentralized strategy where the agents employ differential randomized quantizers to compress their estimates (an illustrative sketch of such a quantizer follows this entry).
The analysis shows that, under some general conditions on the quantization noise, the strategy is stable both in terms of mean-square error and average bit rate.
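As an illustration of the mechanism in this entry, the following is a hedged Python sketch of one common realization of a differential randomized quantizer: a subtractively dithered uniform quantizer applied to the innovation between an agent's current estimate and the last reconstruction it shared. The paper's exact scheme, noise conditions, and analysis may differ.

```python
import numpy as np

# Illustrative sketch only.  The innovation (estimate minus last shared
# reconstruction) is quantized with a dithered uniform quantizer; a dither
# seed shared over each link lets the receiver subtract the same dither from
# the integer codes and rebuild the identical reconstruction.

class DifferentialRandomizedQuantizer:
    def __init__(self, step, seed=0):
        self.step = step                          # quantization step size
        self.rng = np.random.default_rng(seed)    # seed shared with the receiver
        self.ref = None                           # last transmitted reconstruction

    def encode(self, estimate):
        if self.ref is None:
            self.ref = np.zeros_like(estimate, dtype=float)
        diff = estimate - self.ref                # only the innovation is sent
        dither = self.rng.uniform(-0.5, 0.5, size=diff.shape)
        codes = np.round(diff / self.step + dither).astype(int)
        self.ref = self.ref + (codes - dither) * self.step
        return codes                              # integers sent over the network

# A coarser step lowers the bit rate but raises the reconstruction error;
# the cited analysis studies stability under this kind of quantization noise.
q = DifferentialRandomizedQuantizer(step=0.05)
x = np.array([0.33, -1.20, 0.05])
codes = q.encode(x)
print(codes, q.ref)   # reconstruction tracks x up to quantization error
```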
arXiv Detail & Related papers (2022-09-16T09:38:38Z) - Nearly Optimal Latent State Decoding in Block MDPs [74.51224067640717]
In episodic Block MDPs, the decision maker has access to rich observations or contexts generated from a small number of latent states.
We are first interested in estimating the latent state decoding function based on data generated under a fixed behavior policy.
We then study the problem of learning near-optimal policies in the reward-free framework.
arXiv Detail & Related papers (2022-08-17T18:49:53Z) - Shortest-Path Constrained Reinforcement Learning for Sparse Reward Tasks [59.419152768018506]
We show that any optimal policy necessarily satisfies the k-SP constraint.
We propose a novel cost function that penalizes a policy for violating the SP constraint, instead of completely excluding it.
Our experiments on MiniGrid, DeepMind Lab, Atari, and Fetch show that the proposed method significantly improves proximal policy optimization (PPO).
arXiv Detail & Related papers (2021-07-13T21:39:21Z) - Navigating to the Best Policy in Markov Decision Processes [68.8204255655161]
We investigate the active pure exploration problem in Markov Decision Processes.
The agent sequentially selects actions and, from the resulting system trajectory, aims at identifying the best policy as fast as possible.
arXiv Detail & Related papers (2021-06-05T09:16:28Z)