Primal-Dual Sample Complexity Bounds for Constrained Markov Decision Processes with Multiple Constraints
- URL: http://arxiv.org/abs/2503.06751v1
- Date: Sun, 09 Mar 2025 20:10:35 GMT
- Title: Primal-Dual Sample Complexity Bounds for Constrained Markov Decision Processes with Multiple Constraints
- Authors: Max Buckley, Konstantinos Papathanasiou, Andreas Spanopoulos,
- Abstract summary: This paper addresses the challenge of solving Constrained Markov Decision Processes (CMDPs) with $d > 1$ constraints when the transition dynamics are unknown. We propose a model-based algorithm for infinite horizon CMDPs with multiple constraints, aiming to derive and prove sample complexity bounds for learning near-optimal policies.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper addresses the challenge of solving Constrained Markov Decision Processes (CMDPs) with $d > 1$ constraints when the transition dynamics are unknown, but samples can be drawn from a generative model. We propose a model-based algorithm for infinite horizon CMDPs with multiple constraints in the tabular setting, aiming to derive and prove sample complexity bounds for learning near-optimal policies. Our approach tackles both the relaxed and strict feasibility settings, where relaxed feasibility allows some constraint violations and strict feasibility requires adherence to all constraints. The main contributions include the development of the algorithm and the derivation of sample complexity bounds for both settings. For the relaxed feasibility setting, we show that our algorithm requires $\tilde{\mathcal{O}} \left( \frac{d |\mathcal{S}| |\mathcal{A}| \log(1/\delta)}{(1-\gamma)^3\epsilon^2} \right)$ samples to return an $\epsilon$-optimal policy, while in the strict feasibility setting it requires $\tilde{\mathcal{O}} \left( \frac{d^3 |\mathcal{S}| |\mathcal{A}| \log(1/\delta)}{(1-\gamma)^5\epsilon^2{\zeta_{\mathbf{c}}^*}^2} \right)$ samples.
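To make the model-based primal-dual recipe concrete, the sketch below shows one standard instantiation (it is not the paper's exact algorithm): query the generative model a fixed number of times per state-action pair to form an empirical transition model, then alternate a best-response primal step with projected dual ascent over the $d$ constraint multipliers on that empirical CMDP. The helper names, the constraint convention $V_{c_i}(\pi) \ge b_i$, the initial distribution $\rho$, and all hyperparameters are illustrative assumptions.

```python
import numpy as np

def estimate_model(sample_next_state, S, A, n):
    """Empirical transition model from n generative-model queries per (s, a).
    `sample_next_state(s, a)` returning one draw from P(.|s, a) is an assumed
    interface, not the paper's specification."""
    P_hat = np.zeros((S, A, S))
    for s in range(S):
        for a in range(A):
            for _ in range(n):
                P_hat[s, a, sample_next_state(s, a)] += 1.0
    return P_hat / n

def evaluate(P, pi, g, gamma, rho):
    """Discounted return of policy pi for per-step signal g(s, a), start dist rho."""
    S = P.shape[0]
    P_pi = np.einsum('sax,sa->sx', P, pi)                 # transition matrix under pi
    g_pi = (pi * g).sum(axis=1)
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, g_pi)
    return rho @ V

def primal_dual_cmdp(P_hat, r, costs, b, gamma, rho, iters=2000, lr=0.05):
    """Lagrangian primal-dual loop on the empirical CMDP with d constraints,
    written as V_{c_i}(pi) >= b_i purely for illustration."""
    S, A, _ = P_hat.shape
    d = len(b)
    lam = np.zeros(d)                                     # one multiplier per constraint
    avg_pi = np.zeros((S, A))
    for _ in range(iters):
        # Primal step: best response to the Lagrangian reward r + sum_i lam_i * c_i.
        r_lag = r + np.tensordot(lam, costs, axes=1)
        Q = np.zeros((S, A))
        for _ in range(300):                              # value iteration on P_hat
            Q = r_lag + gamma * P_hat @ Q.max(axis=1)
        pi = np.eye(A)[Q.argmax(axis=1)]                  # greedy deterministic policy
        avg_pi += pi
        # Dual step: projected subgradient ascent on the constraint violations.
        viol = np.array([b[i] - evaluate(P_hat, pi, costs[i], gamma, rho)
                         for i in range(d)])
        lam = np.maximum(lam + lr * viol, 0.0)
    return avg_pi / iters                                 # mixed (averaged) policy
```

Returning the average of the primal iterates is the usual way primal-dual analyses turn an averaged duality-gap guarantee into a near-optimal, near-feasible mixed policy.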
Related papers
- Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs [56.237917407785545]
We consider the problem of learning an $\varepsilon$-optimal policy in a general class of continuous-space Markov decision processes (MDPs) having smooth Bellman operators.
Key to our solution is a novel projection technique based on ideas from harmonic analysis.
Our result bridges the gap between two popular but conflicting perspectives on continuous-space MDPs.
arXiv Detail & Related papers (2024-05-10T09:58:47Z) - Safe Posterior Sampling for Constrained MDPs with Bounded Constraint
Violation [8.849815837266977]
Constrained Markov decision processes (CMDPs) model scenarios of sequential decision making with multiple objectives that are increasingly important in many applications.
We propose the Safe PSRL (posterior sampling-based RL) algorithm that does not need such assumptions and yet performs very well.
We establish a sub-linear $\tilde{\mathcal{O}}\left( H^{2.5} \sqrt{|\mathcal{S}|^2 |\mathcal{A}| K} \right)$ upper bound on the Bayesian reward objective regret, along with a bounded constraint violation.
arXiv Detail & Related papers (2023-01-27T06:18:25Z) - Penalized Overdamped and Underdamped Langevin Monte Carlo Algorithms for Constrained Sampling [17.832449046193933]
We consider the constrained sampling problem where the goal is to sample from a target distribution $\pi(x) \propto e^{-f(x)}$ when $x$ is constrained.
Motivated by penalty methods, we convert the constrained problem into an unconstrained sampling problem by introducing a penalty function for constraint violations.
For PSGLD and PSGULMC, when $f$ is strongly convex and smooth, we obtain an iteration complexity of $\tilde{\mathcal{O}}(d/\varepsilon^{18})$.
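As a rough illustration of the penalty idea (a sketch under simple assumptions, not the paper's PLD/PSGLD/PSGULMC algorithms): replace the constrained target with an unconstrained one whose potential is $f(x)$ plus a penalty that vanishes on the feasible set, and run overdamped Langevin dynamics on the penalized potential. The quadratic penalty, the fixed weight `lam`, and the step size below are illustrative choices.

```python
import numpy as np

def penalized_langevin(grad_f, grad_pen, x0, step=1e-3, lam=100.0, iters=10_000, rng=None):
    """Overdamped Langevin iteration on the penalized potential f(x) + lam * penalty(x).

    grad_f:   gradient of the (negative log) target potential f
    grad_pen: gradient of a penalty that is zero on the constraint set
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    samples = []
    for _ in range(iters):
        drift = grad_f(x) + lam * grad_pen(x)
        x = x - step * drift + np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
        samples.append(x.copy())
    return np.array(samples)

# Example: sample from N(0, I) softly restricted to the unit ball ||x|| <= 1.
grad_f = lambda x: x                                   # f(x) = ||x||^2 / 2
grad_pen = lambda x: (2.0 * max(np.linalg.norm(x) - 1.0, 0.0)
                      * x / (np.linalg.norm(x) + 1e-12))
draws = penalized_langevin(grad_f, grad_pen, x0=np.zeros(2))
```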
arXiv Detail & Related papers (2022-11-29T18:43:22Z) - Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best policy identification in discounted linear Markov decision processes (MDPs) in the fixed confidence setting under a generative model.
The lower bound, obtained as the solution of an intricate non-convex optimization program, can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z) - Near-Optimal Sample Complexity Bounds for Constrained MDPs [25.509556551558834]
We provide minimax upper and lower bounds on the sample complexity for learning near-optimal policies in a discounted CMDP.
We show that learning CMDPs is as easy as MDPs when small constraint violations are allowed, but inherently more difficult when we demand zero constraint violation.
arXiv Detail & Related papers (2022-06-13T15:58:14Z) - Settling the Sample Complexity of Model-Based Offline Reinforcement
Learning [50.5790774201146]
Offline reinforcement learning (RL) learns using pre-collected data without further exploration.
Prior algorithms or analyses either suffer from suboptimal sample complexities or incur high burn-in cost to reach sample optimality.
We demonstrate that the model-based (or "plug-in") approach achieves minimax-optimal sample complexity without burn-in cost.
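For intuition, a vanilla "plug-in" estimator looks like the sketch below: count transitions in the offline dataset, normalize them into an empirical model, and plan in that model. The pessimism (lower-confidence-bound) adjustments that the cited analysis relies on under partial data coverage are deliberately omitted, so this is only an illustration of the plug-in idea, not the analyzed algorithm.

```python
import numpy as np

def plug_in_policy(dataset, S, A, r, gamma, vi_iters=500):
    """Vanilla plug-in approach: estimate the transition model from an offline
    dataset of (s, a, s') transitions, then run value iteration in the empirical MDP."""
    counts = np.zeros((S, A, S))
    for s, a, s_next in dataset:
        counts[s, a, s_next] += 1.0
    totals = counts.sum(axis=2, keepdims=True)
    P_hat = np.divide(counts, totals, out=np.full_like(counts, 1.0 / S),
                      where=totals > 0)           # uniform fallback for unseen (s, a)
    Q = np.zeros((S, A))
    for _ in range(vi_iters):                     # value iteration on the empirical MDP
        Q = r + gamma * P_hat @ Q.max(axis=1)
    return Q.argmax(axis=1)                       # greedy policy w.r.t. the plug-in Q
```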
arXiv Detail & Related papers (2022-04-11T17:26:19Z) - Achieving Zero Constraint Violation for Constrained Reinforcement
Learning via Primal-Dual Approach [37.80609997145897]
Reinforcement learning is widely used in applications where one needs to perform sequential decisions while interacting with the environment.
The problem becomes more challenging when the decision requirement includes satisfying some safety constraints.
Various algorithms are available to solve CMDP problems in a model-free manner to achieve $\epsilon$-optimal cumulative reward with $\epsilon$-feasible policies.
An important question here is whether we can achieve $epsilon$-optimal cumulative reward with zero constraint violations or not.
arXiv Detail & Related papers (2021-09-13T21:27:03Z) - Sample-Efficient Reinforcement Learning for Linearly-Parameterized MDPs
with a Generative Model [3.749193647980305]
This paper considers a Markov decision process (MDP) that admits a set of state-action features.
We show that a model-based approach (resp. $Q$-learning) provably learns an $\varepsilon$-optimal policy with high probability.
arXiv Detail & Related papers (2021-05-28T17:49:39Z) - Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample
Complexity [59.34067736545355]
Given an MDP with $S$ states, $A$ actions, the discount factor $\gamma \in (0,1)$, and an approximation threshold $\epsilon > 0$, we provide a model-free algorithm to learn an $\epsilon$-optimal policy.
For small enough $\epsilon$, we show an improved algorithm with a sharper sample complexity bound.
arXiv Detail & Related papers (2020-06-06T13:34:41Z) - Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and
Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP)
We show that the number of samples needed to yield an entrywise $\varepsilon$-accurate estimate of the Q-function is at most on the order of $\frac{1}{\mu_{\min}(1-\gamma)^5\varepsilon^2} + \frac{t_{\mathrm{mix}}}{\mu_{\min}(1-\gamma)}$ up to some logarithmic factor.
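For reference, asynchronous Q-learning updates a single entry of the Q-table along one Markovian trajectory, as in the sketch below; the constant step size, the $\varepsilon$-greedy behavior policy, and the `env_step` interface are illustrative assumptions rather than the paper's exact setting.

```python
import numpy as np

def async_q_learning(env_step, S, A, gamma, steps=100_000, lr=0.1, eps=0.1, rng=None):
    """Single-trajectory (asynchronous) Q-learning: only the visited (s, a) entry
    is updated at each step, using samples from an eps-greedy behavior policy."""
    rng = np.random.default_rng() if rng is None else rng
    Q = np.zeros((S, A))
    s = 0                                     # assumed initial state, illustrative
    for _ in range(steps):
        a = int(rng.integers(A)) if rng.random() < eps else int(Q[s].argmax())
        s_next, r = env_step(s, a)            # one transition from the Markov chain
        target = r + gamma * Q[s_next].max()
        Q[s, a] += lr * (target - Q[s, a])    # update only the visited entry
        s = s_next
    return Q
```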
arXiv Detail & Related papers (2020-06-04T17:51:00Z) - Breaking the Sample Size Barrier in Model-Based Reinforcement Learning
with a Generative Model [50.38446482252857]
This paper is concerned with the sample efficiency of reinforcement learning, assuming access to a generative model (or simulator)
We first consider $\gamma$-discounted infinite-horizon Markov decision processes (MDPs) with state space $\mathcal{S}$ and action space $\mathcal{A}$.
We prove that a plain model-based planning algorithm suffices to achieve minimax-optimal sample complexity given any target accuracy level.
arXiv Detail & Related papers (2020-05-26T17:53:18Z)