Related papers: Adaptive Resolving Methods for Reinforcement Learning with Function Approximations

Adaptive Resolving Methods for Reinforcement Learning with Function Approximations

URL: http://arxiv.org/abs/2505.12037v1
Date: Sat, 17 May 2025 14:59:15 GMT
Title: Adaptive Resolving Methods for Reinforcement Learning with Function Approximations
Authors: Jiashuo Jiang, Yiming Zong, Yinyu Ye,
Abstract summary: We develop a new algorithm to solve theReinforcement learning problems with function approximation.<n>Our algorithm is based on the linear programming (LP) reformulation and it resolves the LP at each improved with new data arrival.<n>In comparison to the $O(1/sqrtN)$ worst-case guarantee established in the previous literature, our instance-dependent guarantee is tighter when the underlying instance is favorable.
Score: 4.168629519090361
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Reinforcement learning (RL) problems are fundamental in online decision-making and have been instrumental in finding an optimal policy for Markov decision processes (MDPs). Function approximations are usually deployed to handle large or infinite state-action space. In our work, we consider the RL problems with function approximation and we develop a new algorithm to solve it efficiently. Our algorithm is based on the linear programming (LP) reformulation and it resolves the LP at each iteration improved with new data arrival. Such a resolving scheme enables our algorithm to achieve an instance-dependent sample complexity guarantee, more precisely, when we have $N$ data, the output of our algorithm enjoys an instance-dependent $\tilde{O}(1/N)$ suboptimality gap. In comparison to the $O(1/\sqrt{N})$ worst-case guarantee established in the previous literature, our instance-dependent guarantee is tighter when the underlying instance is favorable, and the numerical experiments also reveal the efficient empirical performances of our algorithms.

Related papers

Span-Agnostic Optimal Sample Complexity and Oracle Inequalities for Average-Reward RL [6.996002801232415]
We study the sample complexity of finding an $varepsilon$-optimal policy in Markov Decision Processes (MDPs) with a generative model.<n>We develop the first algorithms matching the optimal span-based complexity without $H$ knowledge.
arXiv Detail & Related papers (2025-02-16T19:10:55Z)
Provably Efficient RL under Episode-Wise Safety in Constrained MDPs with Linear Function Approximation [24.299769025346368]
We study the reinforcement learning problem in a constrained decision process (CMDP)<n>We propose an RL algorithm for linear CMDPs that achieves $tildemathcalO(sqrtK)$ regret with an episode-wise zero-violation guarantee.<n>Our results significantly improve upon recent linear CMDP algorithms, which either violate the constraint or incur exponential computational costs.
arXiv Detail & Related papers (2025-02-14T13:07:25Z)
Maximum-Likelihood Inverse Reinforcement Learning with Finite-Time Guarantees [56.848265937921354]
Inverse reinforcement learning (IRL) aims to recover the reward function and the associated optimal policy. Many algorithms for IRL have an inherently nested structure. We develop a novel single-loop algorithm for IRL that does not compromise reward estimation accuracy.
arXiv Detail & Related papers (2022-10-04T17:13:45Z)
Distributionally Robust Offline Reinforcement Learning with Linear Function Approximation [16.128778192359327]
We learn an RL agent with the historical data obtained from the source environment and optimize it to perform well in the perturbed one. We prove our algorithm can achieve the suboptimality of $O(sqrtK)$ depending on the linear function dimension $d$.
arXiv Detail & Related papers (2022-09-14T13:17:59Z)
Instance-Dependent Confidence and Early Stopping for Reinforcement Learning [99.57168572237421]
Various algorithms for reinforcement learning (RL) exhibit dramatic variation in their convergence rates as a function of problem structure. This research provides guarantees that explain textitex post the performance differences observed. A natural next step is to convert these theoretical guarantees into guidelines that are useful in practice.
arXiv Detail & Related papers (2022-01-21T04:25:35Z)
A Boosting Approach to Reinforcement Learning [59.46285581748018]
We study efficient algorithms for reinforcement learning in decision processes whose complexity is independent of the number of states. We give an efficient algorithm that is capable of improving the accuracy of such weak learning methods.
arXiv Detail & Related papers (2021-08-22T16:00:45Z)
Uniform-PAC Bounds for Reinforcement Learning with Linear Function Approximation [92.3161051419884]
We study reinforcement learning with linear function approximation. Existing algorithms only have high-probability regret and/or Probably Approximately Correct (PAC) sample complexity guarantees. We propose a new algorithm called FLUTE, which enjoys uniform-PAC convergence to the optimal policy with high probability.
arXiv Detail & Related papers (2021-06-22T08:48:56Z)
Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs. The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
arXiv Detail & Related papers (2020-10-21T17:14:31Z)
Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision (MDPs) when the learner has access to a generative model. The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.