A Primal-Dual Algorithm for Offline Constrained Reinforcement Learning with Linear MDPs
- URL: http://arxiv.org/abs/2402.04493v2
- Date: Sun, 2 Jun 2024 21:38:47 GMT
- Title: A Primal-Dual Algorithm for Offline Constrained Reinforcement Learning with Linear MDPs
- Authors: Kihyuk Hong, Ambuj Tewari,
- Abstract summary: We propose a primal-dual algorithm for offline RL with linear MDPs in the infinite-horizon discounted setting.
Our algorithm is the first computationally efficient algorithm in this setting that achieves a sample complexity of $O(\epsilon^{-2})$ under a partial data coverage assumption.
We extend our algorithm to the offline constrained RL setting, which enforces constraints on additional reward signals.
- Score: 18.449996575976993
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study offline reinforcement learning (RL) with linear MDPs in the infinite-horizon discounted setting, which aims to learn a policy that maximizes the expected discounted cumulative reward using a pre-collected dataset. Existing algorithms for this setting either require a uniform data coverage assumption or are computationally inefficient for finding an $\epsilon$-optimal policy with $O(\epsilon^{-2})$ sample complexity. In this paper, we propose a primal-dual algorithm for offline RL with linear MDPs in the infinite-horizon discounted setting. Our algorithm is the first computationally efficient algorithm in this setting that achieves a sample complexity of $O(\epsilon^{-2})$ under a partial data coverage assumption. Our work improves upon a recent work that requires $O(\epsilon^{-4})$ samples. Moreover, we extend our algorithm to the offline constrained RL setting, which enforces constraints on additional reward signals.
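The sketch below is a minimal illustration, and NOT the algorithm analyzed in the paper: a generic Lagrangian primal-dual loop for offline constrained RL with linear features in the infinite-horizon discounted setting, included only to make the roles of the primal (policy) and dual (multiplier) variables concrete. It alternates an LSTD-style least-squares evaluation of the scalarized reward $r + \lambda g$ under the current greedy policy with a projected gradient step on the multiplier $\lambda \ge 0$ of the Lagrangian $L(\pi, \lambda) = V_r(\pi) + \lambda\,(V_g(\pi) - b)$, where $b$ is the constraint threshold. The dataset, feature map, and hyperparameters are synthetic placeholders.

```python
# Hedged sketch (NOT the paper's algorithm): a generic Lagrangian primal-dual loop
# for offline constrained RL with linear features, infinite-horizon discounted setting.
# The dataset, feature map, and hyperparameters below are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical offline dataset of transitions (s, a, r, g, s').
d, n_actions, N, gamma = 8, 4, 2000, 0.99
S = rng.normal(size=(N, 3))                     # states (placeholder)
A = rng.integers(n_actions, size=N)             # behavior-policy actions
S_next = S + 0.1 * rng.normal(size=S.shape)     # next states (placeholder dynamics)
R = rng.uniform(size=N)                         # main reward signal r
G = rng.uniform(size=N)                         # constraint reward signal g
b = 0.5 / (1.0 - gamma)                         # required constraint value: V_g(pi) >= b

def phi(s, a):
    """Hypothetical feature map phi(s, a) in R^d (linear-MDP-style features)."""
    z = np.zeros(d)
    z[:3] = s
    z[3 + a] = 1.0
    return z

Phi = np.stack([phi(S[i], A[i]) for i in range(N)])   # features of dataset (s, a) pairs

def lstd_q(reward_vec, policy_w):
    """LSTD-style evaluation of the Q-function of `reward_vec` under the greedy
    policy induced by `policy_w`: solves (Phi^T (Phi - gamma Phi')) w = Phi^T reward_vec."""
    next_feats = np.stack([
        max((phi(S_next[i], a) for a in range(n_actions)), key=lambda f: f @ policy_w)
        for i in range(N)
    ])
    A_mat = Phi.T @ (Phi - gamma * next_feats) + 1e-3 * np.eye(d)   # ridge for stability
    return np.linalg.solve(A_mat, Phi.T @ reward_vec)

# Primal-dual loop: primal = approximate policy iteration on r + lam * g,
#                   dual   = projected gradient step on the multiplier lam >= 0.
lam, eta, w = 1.0, 0.05, np.zeros(d)
for _ in range(25):
    w = lstd_q(R + lam * G, w)          # primal: evaluate scalarized reward, act greedily w.r.t. w
    w_g = lstd_q(G, w)                  # constraint value of the same greedy policy
    v_g = float(np.mean(Phi @ w_g))     # crude data-distribution estimate of V_g(pi)
    lam = float(np.clip(lam - eta * (v_g - b), 0.0, 10.0))   # dual: raise lam if constraint is violated

print(f"final multiplier lambda = {lam:.3f}")
```

The paper's actual algorithm and analysis differ from this toy loop (in particular, its $O(\epsilon^{-2})$ guarantee under partial data coverage relies on structure this sketch does not use); the code above is only meant to show how the primal update on the policy and the dual update on the constraint multiplier interact.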
Related papers
- Offline Primal-Dual Reinforcement Learning for Linear MDPs [16.782625445546273]
Offline Reinforcement Learning (RL) aims to learn a near-optimal policy from a fixed dataset of transitions collected by another policy.
This paper proposes a primal-dual optimization method based on the linear programming formulation of RL.
arXiv Detail & Related papers (2023-05-22T11:45:23Z)
- On Instance-Dependent Bounds for Offline Reinforcement Learning with Linear Function Approximation [80.86358123230757]
We present an algorithm called Bootstrapped and Constrained Pessimistic Value Iteration (BCP-VI).
Under a partial data coverage assumption, BCP-VI yields a fast rate of $\tilde{\mathcal{O}}(\frac{1}{K})$ for offline RL when there is a positive gap in the optimal Q-value functions.
These are the first $\tilde{\mathcal{O}}(\frac{1}{K})$ bound and absolute zero sub-optimality bound, respectively, for offline RL with linear function approximation from adaptive data.
arXiv Detail & Related papers (2022-11-23T18:50:44Z)
- Settling the Sample Complexity of Model-Based Offline Reinforcement Learning [50.5790774201146]
Offline reinforcement learning (RL) learns from pre-collected data without further exploration.
Prior algorithms or analyses either suffer from suboptimal sample complexities or incur high burn-in cost to reach sample optimality.
We demonstrate that the model-based (or "plug-in") approach achieves minimax-optimal sample complexity without burn-in cost.
arXiv Detail & Related papers (2022-04-11T17:26:19Z)
- Reward-Free RL is No Harder Than Reward-Aware RL in Linear Markov Decision Processes [61.11090361892306]
Reward-free reinforcement learning (RL) considers the setting where the agent does not have access to a reward function during exploration.
We show that the separation between reward-free and reward-aware RL does not exist in the setting of linear MDPs.
We develop a computationally efficient algorithm for reward-free RL in a $d$-dimensional linear MDP.
arXiv Detail & Related papers (2022-01-26T22:09:59Z) - Locally Differentially Private Reinforcement Learning for Linear Mixture
Markov Decision Processes [78.27542864367821]
Reinforcement learning (RL) algorithms can be used to provide personalized services, which rely on users' private and sensitive data.
To protect the users' privacy, privacy-preserving RL algorithms are in demand.
We propose a novel $(\varepsilon, \delta)$-LDP algorithm for learning a class of Markov decision processes (MDPs) dubbed linear mixture MDPs.
arXiv Detail & Related papers (2021-10-19T17:44:09Z) - Online Sub-Sampling for Reinforcement Learning with General Function
Approximation [111.01990889581243]
In this paper, we establish an efficient online sub-sampling framework that measures the information gain of data points collected by an RL algorithm.
For a value-based method with a complexity-bounded function class, we show that the policy only needs to be updated $\propto \operatorname{polylog}(K)$ times.
In contrast to existing approaches that update the policy at least $\Omega(K)$ times, our approach drastically reduces the number of optimization calls in solving for a policy.
arXiv Detail & Related papers (2021-06-14T07:36:25Z) - Policy Finetuning: Bridging Sample-Efficient Offline and Online
Reinforcement Learning [59.02541753781001]
This paper initiates the theoretical study of policy finetuning, that is, online RL where the learner has additional access to a "reference policy".
We first design a sharp offline reduction algorithm that finds an $\varepsilon$ near-optimal policy within $\widetilde{O}(H^3 S C^\star/\varepsilon^2)$ episodes.
We then establish an $\Omega(H^3 S \min\{C^\star, A\}/\varepsilon^2)$ sample complexity lower bound for any policy finetuning algorithm, including those that can adaptively explore the environment.
arXiv Detail & Related papers (2021-06-09T08:28:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.