Provably Efficient Adversarial Imitation Learning with Unknown
Transitions
- URL: http://arxiv.org/abs/2306.06563v1
- Date: Sun, 11 Jun 2023 02:46:41 GMT
- Title: Provably Efficient Adversarial Imitation Learning with Unknown
Transitions
- Authors: Tian Xu, Ziniu Li, Yang Yu, Zhi-Quan Luo
- Abstract summary: Imitation learning (IL) has proven to be an effective method for learning good policies from expert demonstrations.
This paper explores the theoretical underpinnings of AIL in the presence of unknown transitions.
We propose an algorithm, MB-TAIL, that achieves the minimax optimal expert sample complexity of $\widetilde{O}(H^{3/2} |S|/\varepsilon)$ and interaction complexity of $\widetilde{O}(H^{3} |S|^2 |A|/\varepsilon^2)$.
- Score: 24.70187647541753
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Imitation learning (IL) has proven to be an effective method for learning
good policies from expert demonstrations. Adversarial imitation learning (AIL),
a subset of IL methods, is particularly promising, but its theoretical
foundation in the presence of unknown transitions has yet to be fully
developed. This paper explores the theoretical underpinnings of AIL in this
context, where the stochastic and uncertain nature of environment transitions
presents a challenge. We examine the expert sample complexity and interaction
complexity required to recover good policies. To this end, we establish a
framework connecting reward-free exploration and AIL, and propose an algorithm,
MB-TAIL, that achieves the minimax optimal expert sample complexity of
$\widetilde{O} (H^{3/2} |S|/\varepsilon)$ and interaction complexity of
$\widetilde{O} (H^{3} |S|^2 |A|/\varepsilon^2)$. Here, $H$ represents the
planning horizon, $|S|$ is the state space size, $|A|$ is the action space
size, and $\varepsilon$ is the desired imitation gap. MB-TAIL is the first
algorithm to achieve this level of expert sample complexity in the unknown
transition setting and improves upon the interaction complexity of the
best-known algorithm, OAL, by $O(H)$. Additionally, we demonstrate the
generalization ability of MB-TAIL by extending it to the function approximation
setting and proving that it can achieve expert sample and interaction
complexity independent of $|S|$.
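To make the two bounds concrete, the following minimal sketch instantiates them numerically. The functions track only the leading terms; the constants and logarithmic factors hidden by $\widetilde{O}(\cdot)$ are omitted, and the parameter values are hypothetical, chosen purely for illustration.

```python
def expert_sample_complexity(H, S, eps):
    """MB-TAIL expert sample complexity, leading term only:
    H^{3/2} * |S| / eps (constants and log factors omitted)."""
    return H ** 1.5 * S / eps

def interaction_complexity(H, S, A, eps):
    """MB-TAIL interaction complexity, leading term only:
    H^3 * |S|^2 * |A| / eps^2 (constants and log factors omitted)."""
    return H ** 3 * S ** 2 * A / eps ** 2

# Hypothetical tabular instance (values for illustration only):
# horizon H=10, |S|=100 states, |A|=5 actions, imitation gap eps=0.1.
H, S, A, eps = 10, 100, 5, 0.1
expert = expert_sample_complexity(H, S, eps)
interact = interaction_complexity(H, S, A, eps)

# Halving eps doubles the expert demand but quadruples the interaction
# demand, reflecting the 1/eps vs. 1/eps^2 dependence. Per the abstract,
# OAL's interaction complexity would carry an extra O(H) factor on top.
```

Note how the interaction bound dominates: exploring unknown transitions is the expensive part, while the expert demonstrations themselves remain comparatively cheap.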
Related papers
- MOOSE-Star: Unlocking Tractable Training for Scientific Discovery by Breaking the Complexity Barrier [56.250921274032066]
MOOSE-Star is a unified framework enabling tractable training and scalable inference. TOMATO-Star is a dataset of 108,717 decomposed papers (38,400 GPU hours) for training.
arXiv Detail & Related papers (2026-03-04T06:11:18Z) - Sample-Adaptivity Tradeoff in On-Demand Sampling [27.212536031930643]
We study the tradeoff between sample complexity and round complexity in on-demand sampling, where the learning algorithm adaptively samples from $k$ distributions over a limited number of rounds. We present an algorithm that achieves near-optimal sample complexity of $\widetilde{O}((d + k)/\varepsilon^2)$ within $\widetilde{O}(\sqrt{k})$ rounds.
arXiv Detail & Related papers (2025-11-19T14:59:47Z) - Rate optimal learning of equilibria from data [63.14746189846806]
We close theoretical gaps in Multi-Agent Imitation Learning (MAIL) by characterizing the limits of non-interactive MAIL and presenting the first interactive algorithm with near-optimal sample complexity. For the interactive setting, we introduce a framework that combines reward-free reinforcement learning with interactive MAIL and instantiate it with an algorithm, MAIL-WARM. We provide numerical results that support our theory and illustrate environments, such as grid worlds, where Behavior Cloning fails to learn.
arXiv Detail & Related papers (2025-10-10T12:28:35Z) - From Generative to Episodic: Sample-Efficient Replicable Reinforcement Learning [18.59097820898079]
In batch settings where data comes from a fixed i.i.d. source, the design of data-efficient replicable algorithms is more or less understood. We show that exploration is not a significant barrier to replicable learning. Our main result is a replicable RL algorithm on $\tilde{O}(S^2 A)$ samples, bridging the gap between the generative and episodic settings.
arXiv Detail & Related papers (2025-07-16T05:43:46Z) - Randomized Exploration in Cooperative Multi-Agent Reinforcement Learning [15.46907000938726]
We present the first study on provably efficient randomized exploration in cooperative multi-agent reinforcement learning (MARL).
We propose a unified algorithm framework for randomized exploration in parallel Markov Decision Processes (MDPs), and two Thompson Sampling (TS)-type algorithms, CoopTS-PHE and CoopTS-LMC.
We evaluate our proposed method on multiple parallel RL environments, including a deep exploration problem (i.e., $N$-chain), a video game, and a real-world problem in energy systems.
arXiv Detail & Related papers (2024-04-16T17:01:38Z) - Sample Efficient Reinforcement Learning with Partial Dynamics Knowledge [0.704590071265998]
We study the sample complexity of online Q-learning methods when some prior knowledge about the dynamics is available or can be learned efficiently.
We present an optimistic Q-learning algorithm that achieves $\tilde{\mathcal{O}}(\mathrm{Poly}(H)\sqrt{SAT})$ regret under perfect knowledge of $f$.
arXiv Detail & Related papers (2023-12-19T19:53:58Z) - Faster Stochastic Variance Reduction Methods for Compositional MiniMax
Optimization [50.10952609321302]
Compositional minimax optimization is a pivotal challenge across various machine learning domains.
Current methods of compositional minimax optimization are plagued by sub-optimal complexities or heavy reliance on sizable batch sizes.
This paper introduces a novel method, called Nested STOchastic Recursive Momentum (NSTORM), which can achieve the optimal sample complexity of $O(\kappa^3/\epsilon^3)$.
arXiv Detail & Related papers (2023-08-18T14:57:21Z) - Optimal Horizon-Free Reward-Free Exploration for Linear Mixture MDPs [60.40452803295326]
We propose a new reward-free algorithm for learning linear Markov decision processes (MDPs)
At the core of our algorithm is uncertainty-weighted value-targeted regression with exploration-driven pseudo-reward.
We show that our algorithm only needs to explore $\tilde{O}(d^2 \varepsilon^{-2})$ episodes to find an $\varepsilon$-optimal policy.
arXiv Detail & Related papers (2023-03-17T17:53:28Z) - A Few Expert Queries Suffices for Sample-Efficient RL with Resets and
Linear Value Approximation [16.29514743112387]
We study sample-efficient Reinforcement Learning (RL) in settings where only the optimal value function is assumed to be linearly realizable.
We present a statistically and computationally efficient algorithm (Delphi) for blending exploration with expert queries.
Delphi requires $\tilde{\mathcal{O}}(d)$ expert queries and $\texttt{poly}(d, |\mathcal{A}|, 1/\varepsilon)$ exploratory samples to provably recover an $\varepsilon$-suboptimal policy.
arXiv Detail & Related papers (2022-07-18T01:39:13Z) - Settling the Horizon-Dependence of Sample Complexity in Reinforcement
Learning [82.31436758872715]
We develop an algorithm that achieves the same PAC guarantee while using only $O(1)$ episodes of environment interactions.
We establish a connection between value functions in discounted and finite-horizon Markov decision processes.
arXiv Detail & Related papers (2021-11-01T00:21:24Z) - Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free
Reinforcement Learning [52.76230802067506]
A novel model-free algorithm is proposed to minimize regret in episodic reinforcement learning.
The proposed algorithm employs an early-settled reference update rule, with the aid of two Q-learning sequences.
The design principle of our early-settled variance reduction method might be of independent interest to other RL settings.
arXiv Detail & Related papers (2021-10-09T21:13:48Z) - Nearly Minimax Optimal Adversarial Imitation Learning with Known and
Unknown Transitions [13.9603281084922]
This paper is dedicated to designing provably efficient adversarial imitation learning (AIL) algorithms that directly optimize policies from expert demonstrations.
We develop a transition-aware AIL algorithm named TAIL with an expert sample complexity of $\tilde{O}(H^{3/2} |S|/\varepsilon)$ under the known transition setting.
In particular, MB-TAIL builds an empirical transition model by interacting with the environment and performs imitation under the recovered empirical model.
arXiv Detail & Related papers (2021-06-19T04:41:33Z) - Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal
Sample Complexity [67.02490430380415]
We show that model-based MARL achieves a sample complexity of $\tilde{O}(|S||B|(1-\gamma)^{-3}\epsilon^{-2})$ for finding the Nash equilibrium (NE) value up to some $\epsilon$ error.
We also show that such a sample bound is minimax-optimal (up to logarithmic factors) if the algorithm is reward-agnostic, where the algorithm queries state transition samples without reward knowledge.
arXiv Detail & Related papers (2020-07-15T03:25:24Z) - A Provably Efficient Sample Collection Strategy for Reinforcement
Learning [123.69175280309226]
One of the challenges in online reinforcement learning (RL) is that the agent needs to trade off the exploration of the environment and the exploitation of the samples to optimize its behavior.
We propose to tackle the exploration-exploitation problem following a decoupled approach composed of: 1) An "objective-specific" algorithm that prescribes how many samples to collect at which states, as if it has access to a generative model (i.e., sparse simulator of the environment); 2) An "objective-agnostic" sample collection responsible for generating the prescribed samples as fast as possible.
arXiv Detail & Related papers (2020-07-13T15:17:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.