Efficient Algorithms for Planning with Participation Constraints
- URL: http://arxiv.org/abs/2205.07767v1
- Date: Mon, 16 May 2022 15:47:41 GMT
- Title: Efficient Algorithms for Planning with Participation Constraints
- Authors: Hanrui Zhang, Yu Cheng, Vincent Conitzer
- Abstract summary: We consider the problem of planning with participation constraints introduced in [Zhang et al., 2022].
In this problem, a principal chooses actions in a Markov decision process, resulting in separate utilities for the principal and the agent.
We provide the first polynomial-time exact algorithm for this problem for finite-horizon settings, where previously only an additive $\varepsilon$-approximation algorithm was known.
- Score: 74.74967476995572
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of planning with participation constraints introduced
in [Zhang et al., 2022]. In this problem, a principal chooses actions in a
Markov decision process, resulting in separate utilities for the principal and
the agent. However, the agent can and will choose to end the process whenever
his expected onward utility becomes negative. The principal seeks to compute
and commit to a policy that maximizes her expected utility, under the
constraint that the agent should always want to continue participating. We
provide the first polynomial-time exact algorithm for this problem for
finite-horizon settings, where previously only an additive
$\varepsilon$-approximation algorithm was known. Our approach can also be
extended to the (discounted) infinite-horizon case, for which we give an
algorithm that runs in time polynomial in the size of the input and
$\log(1/\varepsilon)$, and returns a policy that is optimal up to an additive
error of $\varepsilon$.
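To make the participation constraint concrete, here is a minimal sketch under stated assumptions. It is not the paper's polynomial-time algorithm. Assumptions not taken from the paper: a tabular finite-horizon MDP encoded as Python lists; deterministic, time-dependent Markov policies only; and the agent's "onward utility" at step t taken to include the reward collected at step t. The function names (`evaluate`, `reachable`, `satisfies_participation`, `best_deterministic`) and the toy instance are hypothetical. The brute-force search is exponential and serves only to show how the constraint restricts the principal.

```python
from itertools import product
from typing import List, Optional, Set, Tuple

# Hypothetical tabular encoding (not from the paper):
#   P[s][a]  -> list of (next_state, probability) pairs
#   rP[s][a] -> principal's reward, rA[s][a] -> agent's reward
#   policy[t][s] -> action taken at step t in state s (deterministic, Markov)
Transitions = List[List[List[Tuple[int, float]]]]
Rewards = List[List[float]]
Policy = List[List[int]]


def evaluate(policy: Policy, P: Transitions, rP: Rewards, rA: Rewards,
             horizon: int) -> Tuple[List[List[float]], List[List[float]]]:
    """Backward induction: VP[t][s] and VA[t][s] are the principal's and the
    agent's expected utility from step t onward (step-t reward included)."""
    n = len(P)
    VP = [[0.0] * n for _ in range(horizon + 1)]
    VA = [[0.0] * n for _ in range(horizon + 1)]
    for t in range(horizon - 1, -1, -1):
        for s in range(n):
            a = policy[t][s]
            VP[t][s] = rP[s][a] + sum(p * VP[t + 1][s2] for s2, p in P[s][a])
            VA[t][s] = rA[s][a] + sum(p * VA[t + 1][s2] for s2, p in P[s][a])
    return VP, VA


def reachable(policy: Policy, P: Transitions, start: int,
              horizon: int) -> Set[Tuple[int, int]]:
    """All (t, s) pairs visited with positive probability under `policy`."""
    frontier, visited = {start}, set()
    for t in range(horizon):
        visited |= {(t, s) for s in frontier}
        frontier = {s2 for s in frontier
                    for s2, p in P[s][policy[t][s]] if p > 0}
    return visited


def satisfies_participation(policy: Policy, P: Transitions, rP: Rewards,
                            rA: Rewards, start: int, horizon: int,
                            tol: float = 1e-9) -> bool:
    """True if the agent's onward utility is nonnegative at every reachable (t, s)."""
    _, VA = evaluate(policy, P, rP, rA, horizon)
    return all(VA[t][s] >= -tol for t, s in reachable(policy, P, start, horizon))


def best_deterministic(P: Transitions, rP: Rewards, rA: Rewards, start: int,
                       horizon: int) -> Tuple[Optional[float], Optional[Policy]]:
    """Exhaustive search over deterministic Markov policies (exponential!).
    Assumes every state has the same number of actions."""
    n, n_actions = len(P), len(P[0])
    best_value, best_policy = None, None
    for flat in product(range(n_actions), repeat=horizon * n):
        policy = [list(flat[t * n:(t + 1) * n]) for t in range(horizon)]
        if not satisfies_participation(policy, P, rP, rA, start, horizon):
            continue
        VP, _ = evaluate(policy, P, rP, rA, horizon)
        if best_value is None or VP[0][start] > best_value:
            best_value, best_policy = VP[0][start], policy
    return best_value, best_policy


# Hypothetical toy instance: in state 0, action 0 pays the principal 2, costs
# the agent 1 and moves to state 1; action 1 is neutral for the agent and stays
# in state 0. In state 1, action 0 rewards the agent; action 1 pays the
# principal 1 but costs the agent 3. All transitions are deterministic.
P = [[[(1, 1.0)], [(0, 1.0)]],
     [[(1, 1.0)], [(1, 1.0)]]]
rP = [[2.0, 1.0], [0.0, 1.0]]
rA = [[-1.0, 0.0], [2.0, -3.0]]
H, START = 2, 0

policy_a = [[0, 0], [0, 0]]  # compensate the agent at step 1
policy_b = [[0, 0], [0, 1]]  # exploit the agent at step 1
print(satisfies_participation(policy_a, P, rP, rA, START, H))  # True
print(satisfies_participation(policy_b, P, rP, rA, START, H))  # False
print(best_deterministic(P, rP, rA, START, H)[0])              # 2.0
```

Here `policy_b` would give the principal 3, but it drives the agent's onward utility at step 1 to -3, so he would quit; the search therefore returns 2.0. Restricting to deterministic Markov policies is a simplification for illustration only: in this instance, choosing action 0 in state 1 at step 1 with probability 0.8 keeps the agent's onward utility nonnegative (0 at step 0, 1 at step 1) and raises the principal's value to 2.2.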
Related papers
- Provably Efficient Reinforcement Learning via Surprise Bound [66.15308700413814]
We propose a provably efficient reinforcement learning algorithm (both computationally and statistically) with general value function approximations.
Our algorithm achieves reasonable regret bounds when applied to both the linear setting and the sparse high-dimensional linear setting.
Date: 2023-02-22T20:21:25Z
- Refined Regret for Adversarial MDPs with Linear Function Approximation [50.00022394876222]
We consider learning in an adversarial Markov Decision Process (MDP) where the loss functions can change arbitrarily over $K$ episodes.
This paper provides two algorithms that improve the regret to $\tilde{\mathcal{O}}(K^{2/3})$ in the same setting.
Date: 2023-01-30T14:37:21Z
- Learning Infinite-Horizon Average-Reward Markov Decision Processes with Constraints [39.715977181666766]
We study regret for infinite-horizon average-reward Markov Decision Processes (MDPs) under cost constraints.
Our algorithm ensures $\widetilde{O}(\sqrt{T})$ regret and constant constraint violation for ergodic MDPs.
These are the first set of provable algorithms for weakly communicating MDPs with cost constraints.
Date: 2022-01-31T23:52:34Z
- Safe Online Bid Optimization with Return-On-Investment and Budget Constraints subject to Uncertainty [87.81197574939355]
We study the nature of both the optimization and learning problems.
We provide an algorithm, namely GCB, guaranteeing sublinear regret at the cost of a potentially linear number of constraint violations.
More interestingly, we provide an algorithm, namely GCB_safe($\psi$, $\phi$), guaranteeing both sublinear pseudo-regret and safety with high probability at the cost of accepting tolerances $\psi$ and $\phi$.
Date: 2022-01-18T17:24:20Z
- Navigating to the Best Policy in Markov Decision Processes [68.8204255655161]
We investigate the active pure exploration problem in Markov Decision Processes.
An agent sequentially selects actions and, from the resulting system trajectory, aims to identify the best policy as fast as possible.
Date: 2021-06-05T09:16:28Z
- Model-Free Algorithm and Regret Analysis for MDPs with Long-Term Constraints [38.2783003051101]
This paper uses concepts from constrained optimization and Q-learning to propose an algorithm for CMDPs with long-term constraints.
We note that these are the first results on regret analysis for MDPs with long-term constraints where the transition probabilities are not known a priori.
Date: 2020-06-10T17:19:29Z
- Provably Efficient Model-Free Algorithm for MDPs with Peak Constraints [38.2783003051101]
This paper considers the peak Constrained Markov Decision Process (PCMDP), where the agent chooses the policy to maximize total reward in the finite horizon as well as satisfy constraints at each epoch with probability 1.
We propose a model-free algorithm that converts the PCMDP problem into an unconstrained problem, to which a Q-learning-based approach is applied.
Date: 2020-03-11T23:23:29Z
- Provably Efficient Safe Exploration via Primal-Dual Policy Optimization [105.7510838453122]
We study the Safe Reinforcement Learning (SRL) problem using the Constrained Markov Decision Process (CMDP) formulation.
We present a provably efficient online policy optimization algorithm for CMDPs with safe exploration in the function approximation setting.
Date: 2020-03-01T17:47:03Z