RL in Markov Games with Independent Function Approximation: Improved Sample Complexity Bound under the Local Access Model
- URL: http://arxiv.org/abs/2403.11544v2
- Date: Wed, 20 Mar 2024 01:22:11 GMT
- Title: RL in Markov Games with Independent Function Approximation: Improved Sample Complexity Bound under the Local Access Model
- Authors: Junyi Fan, Yuxuan Han, Jialin Zeng, Jian-Feng Cai, Yang Wang, Yang Xiang, Jiheng Zhang,
- Abstract summary: We introduce a new algorithm, Lin-Confident-FTRL, for learning coarse correlated equilibria (CCE) with local access to the simulator.
Up to a logarithmic dependence on the size of the state space, Lin-Confident-FTRL learns $\epsilon$-CCE with a provably optimal accuracy bound $O(\epsilon^{-2})$.
Our analysis generalizes the virtual policy iteration technique in the single-agent local planning literature.
- Score: 15.596599935486534
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Efficiently learning equilibria with large state and action spaces in general-sum Markov games while overcoming the curse of multi-agency is a challenging problem. Recent works have attempted to solve this problem by employing independent linear function classes to approximate the marginal $Q$-value for each agent. However, existing sample complexity bounds under such a framework have a suboptimal dependency on the desired accuracy $\varepsilon$ or the action space. In this work, we introduce a new algorithm, Lin-Confident-FTRL, for learning coarse correlated equilibria (CCE) with local access to the simulator, i.e., one can interact with the underlying environment on the visited states. Up to a logarithmic dependence on the size of the state space, Lin-Confident-FTRL learns $\epsilon$-CCE with a provably optimal accuracy bound $O(\epsilon^{-2})$ and gets rid of the linear dependency on the action space, while scaling polynomially with relevant problem parameters (such as the number of agents and time horizon). Moreover, our analysis of Lin-Confident-FTRL generalizes the virtual policy iteration technique in the single-agent local planning literature, which yields a new computationally efficient algorithm with a tighter sample complexity bound when assuming random access to the simulator.
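The abstract combines three ingredients: local simulator access (queries only at already-visited states), independent linear approximation of each agent's marginal $Q$-values, and FTRL-style no-regret updates whose averaged iterates form an approximate CCE. The sketch below illustrates those ingredients only, under assumed conventions; the class and function names (`LocalAccessSimulator`, `ftrl_policy`, `run_ftrl_at_state`), the exponential-weights regularizer, and all hyperparameters are illustrative choices of ours, not the paper's Lin-Confident-FTRL algorithm.

```python
import numpy as np

class LocalAccessSimulator:
    """Local access: the learner may only query states it has already visited."""
    def __init__(self, step_fn, initial_state):
        self.step_fn = step_fn              # (state, joint_action) -> (reward_vector, next_state)
        self.visited = {initial_state}

    def query(self, state, joint_action):
        assert state in self.visited, "local access only allows queries at visited states"
        rewards, next_state = self.step_fn(state, joint_action)
        self.visited.add(next_state)
        return rewards, next_state

def ftrl_policy(cum_q, eta):
    """FTRL with an entropy regularizer, i.e. exponential weights over cumulative Q estimates."""
    logits = eta * (cum_q - cum_q.max())    # subtract the max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

def run_ftrl_at_state(phi, theta_per_round, eta=0.1):
    """One agent's no-regret loop at a single state.

    phi: (num_actions, d) feature matrix for this agent's actions at the state.
    theta_per_round: iterable of d-dimensional regression estimates, one per round
    (the regression itself, driven by local simulator queries, is omitted here).
    """
    num_actions = phi.shape[0]
    cum_q = np.zeros(num_actions)
    iterates = []
    for theta in theta_per_round:
        iterates.append(ftrl_policy(cum_q, eta))   # play FTRL against past Q estimates
        cum_q += phi @ theta                       # Q_i(s, a_i) ~ <phi(s, a_i), theta_i>
    return np.mean(iterates, axis=0)               # averaged iterate ~ agent i's CCE marginal
```

In the paper's setting, the per-round estimates would be fit from transitions gathered through the local-access simulator on a set of "confident" states; here they are simply passed in to keep the sketch self-contained.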
Related papers
- Sample and Oracle Efficient Reinforcement Learning for MDPs with Linearly-Realizable Value Functions [10.225358400539719]
We present an efficient reinforcement learning algorithm for Markov Decision Processes (MDPs) whose state-action value functions are linearly realizable in a given feature map.
Specifically, we introduce a new algorithm that efficiently finds a near-optimal policy in this setting.
arXiv Detail & Related papers (2024-09-07T14:38:05Z) - The Power of Resets in Online Reinforcement Learning [73.64852266145387]
We explore the power of simulators through online reinforcement learning with local simulator access (or local planning).
We show that MDPs with low coverability can be learned in a sample-efficient fashion with only $Q^\star$-realizability.
We show that the notorious Exogenous Block MDP problem is tractable under local simulator access.
arXiv Detail & Related papers (2024-04-23T18:09:53Z) - Learning RL-Policies for Joint Beamforming Without Exploration: A Batch
Constrained Off-Policy Approach [1.0080317855851213]
We consider the problem of network parameter optimization for wireless networks.
We show that learning can be achieved from previously collected data alone, without deploying an exploratory algorithm in the real world.
arXiv Detail & Related papers (2023-10-12T18:36:36Z) - Breaking the Curse of Multiagency: Provably Efficient Decentralized
Multi-Agent RL with Function Approximation [44.051717720483595]
This paper presents the first line of MARL algorithms that provably resolve the curse of multiagency under function approximation.
In exchange for learning a weaker version of CCEs, this algorithm applies to a wider range of problems under generic function approximation.
Our algorithm always outputs Markov CCEs, and achieves an optimal rate of $\widetilde{\mathcal{O}}(\epsilon^{-2})$ for finding $\epsilon$-optimal solutions.
arXiv Detail & Related papers (2023-02-13T18:59:25Z) - Near-optimal Policy Identification in Active Reinforcement Learning [84.27592560211909]
AE-LSVI is a novel variant of the kernelized least-squares value iteration (LSVI) algorithm that combines optimism with pessimism for active exploration.
We show that AE-LSVI outperforms other algorithms in a variety of environments when robustness to the initial state is required.
arXiv Detail & Related papers (2022-12-19T14:46:57Z) - On the Convergence of Heterogeneous Federated Learning with Arbitrary
Adaptive Online Model Pruning [15.300983585090794]
We present a unifying framework for heterogeneous FL algorithms with arbitrary adaptive online model pruning.
In particular, we prove that under certain sufficient conditions, these algorithms converge to a stationary point of standard FL for general smooth cost functions.
We illuminate two key factors impacting convergence: pruning-induced noise and minimum coverage index.
arXiv Detail & Related papers (2022-01-27T20:43:38Z) - Settling the Horizon-Dependence of Sample Complexity in Reinforcement
Learning [82.31436758872715]
We develop an algorithm that achieves the same PAC guarantee while using only $O(1)$ episodes of environment interactions.
We establish a connection between value functions in discounted and finite-horizon Markov decision processes.
arXiv Detail & Related papers (2021-11-01T00:21:24Z) - Online Sub-Sampling for Reinforcement Learning with General Function
Approximation [111.01990889581243]
In this paper, we establish an efficient online sub-sampling framework that measures the information gain of data points collected by an RL algorithm.
For a value-based method with complexity-bounded function class, we show that the policy only needs to be updated $\propto \operatorname{polylog}(K)$ times.
In contrast to existing approaches that update the policy at least $\Omega(K)$ times, our approach drastically reduces the number of optimization calls in solving for a policy.
arXiv Detail & Related papers (2021-06-14T07:36:25Z) - Sample-Efficient Reinforcement Learning Is Feasible for Linearly
Realizable MDPs with Limited Revisiting [60.98700344526674]
Low-complexity models such as linear function representation play a pivotal role in enabling sample-efficient reinforcement learning.
In this paper, we investigate a new sampling protocol, which draws samples in an online/exploratory fashion but allows one to backtrack and revisit previous states in a controlled and infrequent manner.
We develop an algorithm tailored to this setting, achieving a sample complexity that scales polynomially with the feature dimension, the horizon, and the inverse sub-optimality gap, but not the size of the state/action space.
arXiv Detail & Related papers (2021-05-17T17:22:07Z) - Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal
Sample Complexity [67.02490430380415]
We show that model-based MARL achieves a sample complexity of $\tilde{O}(|S||A||B|(1-\gamma)^{-3}\epsilon^{-2})$ for finding the Nash equilibrium (NE) value up to some $\epsilon$ error.
We also show that such a sample bound is minimax-optimal (up to logarithmic factors) if the algorithm is reward-agnostic, where the algorithm queries state transition samples without reward knowledge.
arXiv Detail & Related papers (2020-07-15T03:25:24Z)
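As background for the last entry above, "finding the Nash equilibrium (NE) value up to some $\epsilon$ error" is commonly formalized through the duality gap of the returned policy pair; the display below is a standard convention stated as an assumption about the setting, not text quoted from that paper.

```latex
% Background convention for an \epsilon-approximate NE in a zero-sum Markov game:
% a policy pair (\hat{\mu}, \hat{\nu}) attains the NE value V^{\star} up to \epsilon if
\[
  \max_{\mu} V^{\mu, \hat{\nu}}(s) \;-\; \min_{\nu} V^{\hat{\mu}, \nu}(s) \;\le\; \epsilon
  \quad \text{for all } s,
  \qquad \text{where } V^{\star}(s) = \max_{\mu} \min_{\nu} V^{\mu, \nu}(s).
\]
```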