Related papers: Learning Equilibria from Data: Provably Efficient Multi-Agent Imitation Learning

URL: http://arxiv.org/abs/2505.17610v1
Date: Fri, 23 May 2025 08:18:35 GMT
Title: Learning Equilibria from Data: Provably Efficient Multi-Agent Imitation Learning
Authors: Till Freihaut, Luca Viano, Volkan Cevher, Matthieu Geist, Giorgia Ramponi
Abstract summary: We show that a new quantity named the single policy deviation concentrability coefficient is unavoidable in the non-interactive imitation learning setting. We introduce two novel solution algorithms: MAIL-BRO and MURMAIL. The latter bypasses completely the best response oracle at the cost of a worse expert query complexity of order $\mathcal{O}(\varepsilon^{-8})$.
Score: 69.45910671974296
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper provides the first expert sample complexity characterization for learning a Nash equilibrium from expert data in Markov Games. We show that a new quantity named the single policy deviation concentrability coefficient is unavoidable in the non-interactive imitation learning setting, and we provide an upper bound for behavioral cloning (BC) featuring such coefficient. BC exhibits substantial regret in games with high concentrability coefficient, leading us to utilize expert queries to develop and introduce two novel solution algorithms: MAIL-BRO and MURMAIL. The former employs a best response oracle and learns an $\varepsilon$-Nash equilibrium with $\mathcal{O}(\varepsilon^{-4})$ expert and oracle queries. The latter bypasses completely the best response oracle at the cost of a worse expert query complexity of order $\mathcal{O}(\varepsilon^{-8})$. Finally, we provide numerical evidence, confirming our theoretical findings.

Related papers

Matching Multiple Experts: On the Exploitability of Multi-Agent Imitation Learning [51.77462571479799]
Multi-agent imitation learning (MA-IL) aims to learn optimal policies from expert demonstrations of interactions in multi-agent interactive domains. Despite existing guarantees on the performance of the resulting learned policies, characterizations of how far the learned policies are from a Nash equilibrium are missing for offline MA-IL.
arXiv Detail & Related papers (2026-02-24T15:38:11Z)
Rate optimal learning of equilibria from data [63.14746189846806]
We close theoretical gaps in Multi-Agent Imitation Learning (MAIL) by characterizing the limits of non-interactive MAIL and presenting the first interactive algorithm with near-optimal sample complexity. For the interactive setting, we introduce a framework that combines reward-free reinforcement learning with interactive MAIL and instantiate it with an algorithm, MAIL-WARM. We provide numerical results that support our theory and illustrate, in environments such as grid worlds, where Behavior Cloning fails to learn.
arXiv Detail & Related papers (2025-10-10T12:28:35Z)
Computational Lower Bounds for Regret Minimization in Normal-Form Games [68.66209476382213]
We provide evidence that existing learning algorithms, such as multiplicative weights update, are close to optimal. Our results are obtained in the algorithmic framework put forward by Kothari and Mehta.
arXiv Detail & Related papers (2024-11-04T00:39:52Z)
Barriers to Welfare Maximization with No-Regret Learning [68.66209476382213]
We prove lower bounds for computing a near-optimal $T$-sparse CCE. In particular, we show that the inapproximability of maximum clique precludes attaining any non-trivial sparsity in polynomial time.
arXiv Detail & Related papers (2024-11-04T00:34:56Z)
Inverse Entropic Optimal Transport Solves Semi-supervised Learning via Data Likelihood Maximization [65.8915778873691]
Learning conditional distributions is a central problem in machine learning. We propose a new paradigm that integrates both paired and unpaired data. We show that our approach can theoretically recover true conditional distributions with arbitrarily small error.
arXiv Detail & Related papers (2024-10-03T16:12:59Z)
Last-Iterate Convergence of Payoff-Based Independent Learning in Zero-Sum Stochastic Games [31.554420227087043]
We develop learning dynamics that are payoff-based, convergent, rational, and symmetric between the two players. In the matrix game setting, the results imply a complexity of $O(\epsilon^{-1})$ to find the Nash distribution. In the stochastic game setting, the results also imply a complexity of $O(\epsilon^{-8})$ to find a Nash equilibrium.
arXiv Detail & Related papers (2024-09-02T20:07:25Z)
Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions. We propose a novel algorithm that yields an $\varepsilon$-optimal randomized hypothesis with a sample complexity on the order of $(d+k)/\varepsilon^2$.
arXiv Detail & Related papers (2023-12-08T16:06:29Z)
A Few Expert Queries Suffices for Sample-Efficient RL with Resets and Linear Value Approximation [16.29514743112387]
We study sample-efficient Reinforcement Learning (RL) in settings where only the optimal value function is assumed to be linearly realizable. We present a statistically and computationally efficient algorithm (Delphi) for blending exploration with expert queries. Delphi requires $\tilde{\mathcal{O}}(d)$ expert queries and a $\texttt{poly}(d,|\mathcal{A}|,1/\varepsilon)$ amount of exploratory samples to provably recover an $\varepsilon$-suboptimal policy.
arXiv Detail & Related papers (2022-07-18T01:39:13Z)
DASHA: Distributed Nonconvex Optimization with Communication Compression, Optimal Oracle Complexity, and No Client Synchronization [77.34726150561087]
We develop and analyze DASHA: a new family of methods for nonconvex distributed optimization problems. Unlike MARINA, the new methods DASHA and DASHA-MVR send compressed vectors only and never synchronize the nodes, which makes them more practical for learning.
arXiv Detail & Related papers (2022-02-02T20:10:40Z)
Permutation Compressors for Provably Faster Distributed Nonconvex Optimization [68.8204255655161]
We show that the MARINA method of Gorbunov et al. (2021) can be considered a state-of-the-art method in terms of theoretical communication complexity. We extend the theory of MARINA to support potentially correlated compressors, taking the method beyond the classical independent compressors setting.
arXiv Detail & Related papers (2021-10-07T09:38:15Z)

This list is automatically generated from the titles and abstracts of the papers on this site.