A Ranking Game for Imitation Learning
- URL: http://arxiv.org/abs/2202.03481v1
- Date: Mon, 7 Feb 2022 19:38:22 GMT
- Title: A Ranking Game for Imitation Learning
- Authors: Harshit Sikchi, Akanksha Saran, Wonjoon Goo, Scott Niekum
- Abstract summary: We treat imitation as a two-player ranking-based Stackelberg game between a $\textit{policy}$ and a $\textit{reward}$ function.
This game encompasses a large subset of both inverse reinforcement learning (IRL) methods and methods which learn from offline preferences.
We theoretically analyze the requirements of the loss function used for ranking policy performances to facilitate near-optimal imitation learning at equilibrium.
- Score: 22.028680861819215
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a new framework for imitation learning - treating imitation as a
two-player ranking-based Stackelberg game between a $\textit{policy}$ and a
$\textit{reward}$ function. In this game, the reward agent learns to satisfy
pairwise performance rankings within a set of policies, while the policy agent
learns to maximize this reward. This game encompasses a large subset of both
inverse reinforcement learning (IRL) methods and methods which learn from
offline preferences. The Stackelberg game formulation allows us to use
optimization methods that take the game structure into account, leading to more
sample efficient and stable learning dynamics compared to existing IRL methods.
We theoretically analyze the requirements of the loss function used for ranking
policy performances to facilitate near-optimal imitation learning at
equilibrium. We use insights from this analysis to further increase the sample
efficiency of the ranking game by using automatically generated rankings or
offline annotated rankings. Our experiments show that the proposed method
achieves state-of-the-art sample efficiency and is able to solve previously
unsolvable tasks in the Learning from Observation (LfO) setting.
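To make the reward player's role concrete, below is a minimal sketch of the kind of pairwise ranking loss such a reward agent could minimize, assuming a Bradley-Terry style loss over trajectory returns and a tabular reward; the names (reward_params, ranked_pairs, trajectory_return) and the specific loss form are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

# Minimal sketch of the reward player's objective in the ranking game,
# assuming a Bradley-Terry style pairwise ranking loss and a tabular
# reward table indexed by (state, action). All names here are
# illustrative, not taken from the paper.

def trajectory_return(reward_params, traj):
    """Sum of learned rewards along a trajectory of (state, action) indices."""
    return sum(reward_params[s, a] for s, a in traj)

def ranking_loss(reward_params, ranked_pairs):
    """Cross-entropy loss over pairwise rankings: for each (worse, better)
    pair, push the learned return of the better trajectory above that of
    the worse one. The exact loss family is an assumption; the paper
    analyzes which ranking losses admit near-optimal imitation at equilibrium."""
    loss = 0.0
    for worse, better in ranked_pairs:
        r_w = trajectory_return(reward_params, worse)
        r_b = trajectory_return(reward_params, better)
        loss += -(r_b - np.logaddexp(r_w, r_b))  # -log P(better > worse)
    return loss / max(len(ranked_pairs), 1)

# Example: a 3-state, 2-action reward table and one ranked pair of trajectories.
reward_params = np.zeros((3, 2))
worse_traj = [(0, 0), (1, 0)]    # e.g., a rollout of the current policy
better_traj = [(0, 1), (2, 1)]   # e.g., an expert demonstration
print(ranking_loss(reward_params, [(worse_traj, better_traj)]))
```

In the full game, the policy player would run reinforcement learning against the reward induced by minimizing such a loss, and the Stackelberg formulation has one player update while anticipating the other's best response; the rankings themselves can come from expert demonstrations, automatically generated comparisons, or offline annotations.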
Related papers
- Finite-Sample Guarantees for Best-Response Learning Dynamics in Zero-Sum Matrix Games [22.380293155135096]
We study best-response type learning dynamics for two player zero-sum matrix games.
We consider two settings that are distinguished by the type of information that each player has about the game and their opponent's strategy.
arXiv Detail & Related papers (2024-07-29T15:56:49Z) - Learning to Play Text-based Adventure Games with Maximum Entropy Reinforcement Learning [4.698846136465861]
We adapt the soft actor-critic (SAC) algorithm to the text-based environment.
We show that the reward shaping technique helps the agent to learn the policy faster and achieve higher scores.
arXiv Detail & Related papers (2023-02-21T15:16:12Z) - ApproxED: Approximate exploitability descent via learned best responses [61.17702187957206]
We study the problem of finding an approximate Nash equilibrium of games with continuous action sets.
We propose two new methods that minimize an approximation of exploitability with respect to the strategy profile.
arXiv Detail & Related papers (2023-01-20T23:55:30Z) - No-Regret Learning in Time-Varying Zero-Sum Games [99.86860277006318]
Learning from repeated play in a fixed zero-sum game is a classic problem in game theory and online learning.
We develop a single parameter-free algorithm that simultaneously enjoys favorable guarantees under three performance measures.
Our algorithm is based on a two-layer structure with a meta-algorithm learning over a group of black-box base-learners satisfying a certain property.
arXiv Detail & Related papers (2022-01-30T06:10:04Z) - Markov Cricket: Using Forward and Inverse Reinforcement Learning to Model, Predict and Optimize Batting Performance in One-Day International Cricket [0.8122270502556374]
We model one-day international cricket games as Markov processes, applying forward and inverse Reinforcement Learning (RL) to develop three novel tools for the game.
We show that, when used as a proxy for remaining scoring resources, this approach outperforms the state-of-the-art Duckworth-Lewis-Stern method by a factor of 3 to 10.
We envisage our prediction and simulation techniques may provide a fairer alternative for estimating final scores in interrupted games, while the inferred reward model may provide useful insights for the professional game to optimize playing strategy.
arXiv Detail & Related papers (2021-03-07T13:11:16Z) - Provable Fictitious Play for General Mean-Field Games [111.44976345867005]
We propose a reinforcement learning algorithm for stationary mean-field games.
The goal is to learn a pair of mean-field state and stationary policy that constitutes the Nash equilibrium.
arXiv Detail & Related papers (2020-10-08T18:46:48Z) - Fast Few-Shot Classification by Few-Iteration Meta-Learning [173.32497326674775]
We introduce a fast optimization-based meta-learning method for few-shot classification.
Our strategy enables important aspects of the base learner objective to be learned during meta-training.
We perform a comprehensive experimental analysis, demonstrating the speed and effectiveness of our approach.
arXiv Detail & Related papers (2020-10-01T15:59:31Z) - Efficient Competitive Self-Play Policy Optimization [20.023522000925094]
We propose a new algorithmic framework for competitive self-play reinforcement learning in two-player zero-sum games.
Our method trains several agents simultaneously and intelligently pairs them against each other as opponents based on simple adversarial rules.
We prove theoretically that our algorithm converges to an approximate equilibrium with high probability in convex-concave games.
arXiv Detail & Related papers (2020-09-13T21:01:38Z) - Learning to Play Sequential Games versus Unknown Opponents [93.8672371143881]
We consider a repeated sequential game between a learner, who plays first, and an opponent who responds to the chosen action.
We propose a novel algorithm for the learner when playing against an adversarial sequence of opponents.
Our results include regret guarantees for the algorithm that depend on the regularity of the opponent's responses.
arXiv Detail & Related papers (2020-07-10T09:33:05Z) - Single-Agent Optimization Through Policy Iteration Using Monte-Carlo Tree Search [8.22379888383833]
The combination of Monte-Carlo Tree Search (MCTS) and deep reinforcement learning is state of the art in two-player perfect-information games.
We describe a search algorithm that uses a variant of MCTS which we enhanced by 1) a novel action value normalization mechanism for games with potentially unbounded rewards, 2) defining a virtual loss function that enables effective search parallelization, and 3) a policy network, trained by generations of self-play, to guide the search.
arXiv Detail & Related papers (2020-05-22T18:02:36Z) - Efficient exploration of zero-sum stochastic games [83.28949556413717]
We investigate the increasingly important and common game-solving setting where we do not have an explicit description of the game but only oracle access to it through gameplay.
During a limited-duration learning phase, the algorithm can control the actions of both players in order to try to learn the game and how to play it well.
Our motivation is to quickly learn strategies that have low exploitability in situations where evaluating the payoffs of a queried strategy profile is costly.
arXiv Detail & Related papers (2020-02-24T20:30:38Z)