Related papers: Distributionally Robust Online Markov Game with Linear Function Approximation

URL: http://arxiv.org/abs/2511.07831v1
Date: Wed, 12 Nov 2025 01:22:42 GMT
Title: Distributionally Robust Online Markov Game with Linear Function Approximation
Authors: Zewu Zheng, Yuanyuan Lin
Abstract summary: The sim-to-real gap, where agents trained in a simulator face significant performance degradation during testing, is a fundamental challenge in reinforcement learning. We devise algorithms that are sample efficient with interactive data collection and large state spaces. Our work introduces the first sample-efficient algorithm for this setting, matches the best result so far in the single-agent setting, and achieves minimax optimal sample complexity.
Score: 2.4636535146231613
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The sim-to-real gap, where agents trained in a simulator face significant performance degradation during testing, is a fundamental challenge in reinforcement learning. Extensive work adopts the framework of distributionally robust RL to learn a policy that acts robustly under worst-case environment shift. Within this framework, our objective is to devise algorithms that are sample efficient with interactive data collection and large state spaces. Assuming d-rectangularity of the environment dynamics shift, we identify a fundamental hardness result for learning in online Markov games and address it by adopting a minimum value assumption. Then, a novel least-squares value iteration type algorithm, DR-CCE-LSI, with an exploration bonus devised specifically for multiple agents, is proposed to find an $\epsilon$-approximate robust Coarse Correlated Equilibrium (CCE). For sample-efficient learning, we find that when the feature mapping function satisfies certain properties, DR-CCE-LSI achieves an $\epsilon$-approximate CCE with a regret bound of $\mathcal{O}\{dH\min\{H, 1/\min\{\sigma_i\}\}\sqrt{K}\}$, where $K$ is the number of interacting episodes, $H$ is the horizon length, $d$ is the feature dimension, and $\sigma_i$ represents the uncertainty level of player $i$. Our work introduces the first sample-efficient algorithm for this setting, matches the best result so far in the single-agent setting, and achieves minimax optimal sample complexity in terms of the feature dimension $d$. We also conduct a simulation study to validate the efficacy of our algorithm in learning a robust equilibrium.
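To make the scaling of the regret bound in the abstract concrete, the following minimal Python sketch evaluates the leading term d * H * min{H, 1/min_i sigma_i} * sqrt(K). It is an illustration only: the function name `regret_scale` and the example values are assumptions, and constants and any logarithmic factors hidden by the O-notation are dropped.

```python
import math


def regret_scale(d, H, K, sigmas):
    """Leading term of the stated regret bound, with constants dropped.

    d: feature dimension, H: horizon length, K: number of episodes,
    sigmas: per-player uncertainty levels sigma_i (assumed strictly positive).
    """
    if not sigmas or min(sigmas) <= 0:
        raise ValueError("sigmas must be a non-empty list of positive values")
    # min{H, 1 / min_i sigma_i}: the horizon caps the factor when every sigma_i is small.
    robustness_factor = min(H, 1.0 / min(sigmas))
    return d * H * robustness_factor * math.sqrt(K)


# Hypothetical values: d = 10, H = 20, K = 10_000 episodes, two players.
large_shift = regret_scale(10, 20, 10_000, [0.1, 0.5])   # 1 / min(sigma) = 10 < H
small_shift = regret_scale(10, 20, 10_000, [0.01, 0.5])  # 1 / min(sigma) = 100, capped at H
print(large_shift, small_shift)
```

Under these assumed values, the smallest uncertainty level determines the bound only while 1/min_i sigma_i stays below H; beyond that point the term reduces to d * H^2 * sqrt(K).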

Related papers

XQC: Well-conditioned Optimization Accelerates Deep Reinforcement Learning [26.063477716451512]
We introduce XQC: a well-motivated, sample-efficient deep actor-critic algorithm built upon soft actor-critic. We achieve state-of-the-art sample efficiency across 55 proprioception and 15 vision-based continuous control tasks.
arXiv Detail & Related papers (2025-09-29T17:58:53Z)
Offline RL via Feature-Occupancy Gradient Ascent [9.983014605039658]
We study offline Reinforcement Learning in large infinite-horizon discounted Markov Decision Processes (MDPs). We develop a new algorithm that performs a form of gradient ascent in the space of feature occupancies. We show that the resulting simple algorithm satisfies strong computational and sample complexity guarantees.
arXiv Detail & Related papers (2024-05-22T15:39:05Z)
Distributionally Robust Reinforcement Learning with Interactive Data Collection: Fundamental Hardness and Near-Optimal Algorithm [14.517103323409307]
The sim-to-real gap represents the disparity between training and testing environments. A promising approach to addressing this challenge is distributionally robust RL. We tackle robust RL via interactive data collection and present an algorithm with a provable sample complexity guarantee.
arXiv Detail & Related papers (2024-04-04T16:40:22Z)
Towards Continual Learning Desiderata via HSIC-Bottleneck Orthogonalization and Equiangular Embedding [55.107555305760954]
We propose a conceptually simple yet effective method that attributes forgetting to layer-wise parameter overwriting and the resulting decision boundary distortion. Our method achieves competitive accuracy performance, even with absolute superiority of zero exemplar buffer and 1.02x the base model.
arXiv Detail & Related papers (2024-01-17T09:01:29Z)
Efficient Model-Free Exploration in Low-Rank MDPs [76.87340323826945]
Low-Rank Markov Decision Processes offer a simple, yet expressive framework for RL with function approximation. Existing algorithms are either (1) computationally intractable, or (2) reliant upon restrictive statistical assumptions. We propose the first provably sample-efficient algorithm for exploration in Low-Rank MDPs.
arXiv Detail & Related papers (2023-07-08T15:41:48Z)
Risk-Aware Distributed Multi-Agent Reinforcement Learning [8.287693091673658]
We develop a distributed MARL approach to solve decision-making problems in unknown environments by learning risk-aware actions. We then propose a distributed MARL algorithm called the CVaR QD-Learning algorithm, and establish that the value functions of individual agents reach consensus.
arXiv Detail & Related papers (2023-04-04T17:56:44Z)
Sample Efficient Deep Reinforcement Learning via Local Planning [21.420851589712626]
This work focuses on sample-efficient deep reinforcement learning (RL) with a simulator. We propose an algorithmic framework, named uncertainty-first local planning (UFLP), that takes advantage of this property. We demonstrate that this simple procedure can dramatically improve the sample cost of several baseline RL algorithms on difficult exploration tasks.
arXiv Detail & Related papers (2023-01-29T23:17:26Z)
Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation. We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
arXiv Detail & Related papers (2022-06-01T23:26:51Z)
On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game [140.19656665344917]
We study the reward-free RL problem, where an agent aims to thoroughly explore the environment without any pre-specified reward function. We tackle this problem under the context of function approximation, leveraging powerful function approximators. We establish the first provably efficient reward-free RL algorithm with kernel and neural function approximators.
arXiv Detail & Related papers (2021-10-19T07:26:33Z)
Learning the Linear Quadratic Regulator from Nonlinear Observations [135.66883119468707]
We introduce a new problem setting for continuous control called the LQR with Rich Observations, or RichLQR. In our setting, the environment is summarized by a low-dimensional continuous latent state with linear dynamics and quadratic costs. Our results constitute the first provable sample complexity guarantee for continuous control with an unknown nonlinearity in the system model and general function approximation.
arXiv Detail & Related papers (2020-10-08T07:02:47Z)
Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration [143.43658264904863]
We show how iteration under a more standard notion of low inherent Bellman error, typically employed in least-square value-style algorithms, can provide strong PAC guarantees on learning a near optimal value function. We present a computationally tractable algorithm for the reward-free setting and show how it can be used to learn a near optimal policy for any (linear) reward function.
arXiv Detail & Related papers (2020-08-18T04:34:21Z)

This list is automatically generated from the titles and abstracts of the papers on this site.