Stackelberg Actor-Critic: Game-Theoretic Reinforcement Learning Algorithms
- URL: http://arxiv.org/abs/2109.12286v1
- Date: Sat, 25 Sep 2021 06:18:41 GMT
- Title: Stackelberg Actor-Critic: Game-Theoretic Reinforcement Learning Algorithms
- Authors: Liyuan Zheng, Tanner Fiez, Zane Alumbaugh, Benjamin Chasnov and
Lillian J. Ratliff
- Abstract summary: The hierarchical interaction between the actor and critic in actor-critic based reinforcement learning algorithms naturally lends itself to a game-theoretic interpretation.
We propose a meta-framework for Stackelberg actor-critic algorithms where the leader player follows the total derivative of its objective instead of the usual individual gradient.
Experiments on OpenAI gym environments show that Stackelberg actor-critic algorithms always perform at least as well and often significantly outperform the standard actor-critic algorithm counterparts.
- Score: 13.649494534428745
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The hierarchical interaction between the actor and critic in actor-critic
based reinforcement learning algorithms naturally lends itself to a
game-theoretic interpretation. We adopt this viewpoint and model the actor and
critic interaction as a two-player general-sum game with a leader-follower
structure known as a Stackelberg game. Given this abstraction, we propose a
meta-framework for Stackelberg actor-critic algorithms where the leader player
follows the total derivative of its objective instead of the usual individual
gradient. From a theoretical standpoint, we develop a policy gradient theorem
for the refined update and provide a local convergence guarantee for the
Stackelberg actor-critic algorithms to a local Stackelberg equilibrium. From an
empirical standpoint, we demonstrate via simple examples that the learning
dynamics we study mitigate cycling and accelerate convergence compared to the
usual gradient dynamics given cost structures induced by actor-critic
formulations. Finally, extensive experiments on OpenAI gym environments show
that Stackelberg actor-critic algorithms always perform at least as well and
often significantly outperform the standard actor-critic algorithm
counterparts.
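
The leader's total-derivative update described in the abstract can be made concrete via implicit differentiation. The sketch below is a minimal illustration only: `leader_loss` and `follower_loss` are assumed toy quadratics standing in for the actor and critic objectives (they are not the paper's actual losses), and the learning rates are arbitrary.

```python
# A minimal sketch (not the paper's implementation) of a Stackelberg update:
# the leader follows the total derivative of its objective, treating the
# follower as an implicit best-responder; the follower follows its usual
# individual gradient. Objectives below are placeholder quadratics.
import jax
import jax.numpy as jnp


def leader_loss(theta, w):
    # Placeholder leader (actor) objective f1(theta, w).
    return jnp.sum((theta - w) ** 2) + 0.1 * jnp.sum(theta ** 2)


def follower_loss(theta, w):
    # Placeholder follower (critic) objective f2(theta, w).
    return jnp.sum((w - 2.0 * theta) ** 2)


def stackelberg_leader_grad(theta, w):
    """Total derivative of f1 w.r.t. theta:
    D_theta f1 = grad_theta f1
                 - (d(grad_w f2)/dtheta)^T (d(grad_w f2)/dw)^{-1} grad_w f1."""
    g1_theta = jax.grad(leader_loss, argnums=0)(theta, w)
    g1_w = jax.grad(leader_loss, argnums=1)(theta, w)
    # Second-order terms of the follower objective.
    h_ww = jax.hessian(follower_loss, argnums=1)(theta, w)
    h_wtheta = jax.jacfwd(jax.grad(follower_loss, argnums=1), argnums=0)(theta, w)
    # Implicit-function-theorem correction to the individual gradient.
    correction = h_wtheta.T @ jnp.linalg.solve(h_ww, g1_w)
    return g1_theta - correction


def stackelberg_step(theta, w, lr_leader=1e-2, lr_follower=1e-1):
    # Leader uses the total derivative; follower uses its individual gradient.
    d_theta = stackelberg_leader_grad(theta, w)
    d_w = jax.grad(follower_loss, argnums=1)(theta, w)
    return theta - lr_leader * d_theta, w - lr_follower * d_w


theta, w = jnp.ones(3), jnp.zeros(3)
for _ in range(100):
    theta, w = stackelberg_step(theta, w)
```

The `correction` term is the implicit-function-theorem adjustment that distinguishes the Stackelberg leader update from the usual individual gradient of the leader's objective.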
Related papers
- Learning a Diffusion Model Policy from Rewards via Q-Score Matching [93.0191910132874]
We present a theoretical framework linking the structure of diffusion model policies to a learned Q-function.
We propose a new policy update method from this theory, which we denote Q-score matching.
arXiv Detail & Related papers (2023-12-18T23:31:01Z)
- Stackelberg Batch Policy Learning [3.5426153040167754]
Batch reinforcement learning (RL) defines the task of learning from a fixed batch of data lacking exhaustive exploration.
Worst-case optimality algorithms, which calibrate a value-function model class from logged experience, have emerged as a promising paradigm for batch RL.
We propose a novel gradient-based learning algorithm: StackelbergLearner, in which the leader player updates according to the total derivative of its objective instead of the usual individual gradient.
arXiv Detail & Related papers (2023-09-28T06:18:34Z)
- Decision-Aware Actor-Critic with Function Approximation and Theoretical Guarantees [12.259191000019033]
Actor-critic (AC) methods are widely used in reinforcement learning (RL).
We design a joint objective for training the actor and critic in a decision-aware fashion.
We empirically demonstrate the benefit of our decision-aware actor-critic framework on simple RL problems.
arXiv Detail & Related papers (2023-05-24T15:34:21Z)
- Follower Agnostic Methods for Stackelberg Games [14.143502615941648]
We present an efficient algorithm to solve online Stackelberg games, featuring multiple followers, in a follower-agnostic manner.
Our approach works even when the leader has no knowledge about the followers' utility functions or strategy space.
arXiv Detail & Related papers (2023-02-02T21:21:14Z)
- Differentiable Bilevel Programming for Stackelberg Congestion Games [47.60156422249365]
In a Stackelberg congestion game (SCG), a leader aims to maximize their own gain by anticipating and manipulating the equilibrium state at which the followers settle by playing a congestion game.
Here, we attempt to tackle this computational challenge by marrying traditional methodologies with the latest differentiable programming techniques in machine learning.
We propose two new local search algorithms for SCGs. The first is a gradient descent algorithm that obtains the derivatives by unrolling ILD via differentiable programming.
The second algorithm adds a twist by cutting short the followers' evolution trajectory.
arXiv Detail & Related papers (2022-09-15T21:32:23Z)
- Learning in Stackelberg Games with Non-myopic Agents [60.927889817803745]
We study Stackelberg games where a principal repeatedly interacts with a non-myopic long-lived agent, without knowing the agent's payoff function.
We provide a general framework that reduces learning in the presence of non-myopic agents to robust bandit optimization in the presence of myopic agents.
arXiv Detail & Related papers (2022-08-19T15:49:30Z)
- Analysis of a Target-Based Actor-Critic Algorithm with Linear Function Approximation [2.1592777170316366]
Actor-critic methods integrating target networks have exhibited a stupendous empirical success in deep reinforcement learning.
We bridge this gap by proposing the first theoretical analysis of an online target-based actor-critic with linear function approximation in the discounted reward setting.
arXiv Detail & Related papers (2021-06-14T14:59:05Z)
- Sample-Efficient Learning of Stackelberg Equilibria in General-Sum Games [78.65798135008419]
It remains vastly open how to learn the Stackelberg equilibrium in general-sum games efficiently from samples.
This paper initiates the theoretical study of sample-efficient learning of the Stackelberg equilibrium in two-player turn-based general-sum games.
arXiv Detail & Related papers (2021-02-23T05:11:07Z)
- A Deeper Look at Discounting Mismatch in Actor-Critic Algorithms [81.01917016753644]
We investigate the discounting mismatch in actor-critic algorithm implementations from a representation learning perspective.
Theoretically, actor-critic algorithms usually have discounting for both the actor and the critic, i.e., there is a $\gamma^t$ term in the actor update for the transition observed at time $t$ in a trajectory.
Practitioners, however, usually ignore the discounting ($\gamma^t$) for the actor while using a discounted critic (a worked form of this mismatch is sketched after this list).
arXiv Detail & Related papers (2020-10-02T15:51:48Z)
- Single-Timescale Actor-Critic Provably Finds Globally Optimal Policy [122.01837436087516]
We study the global convergence and global optimality of actor-critic, one of the most popular families of reinforcement learning algorithms.
We establish the rate of convergence and global optimality of single-timescale actor-critic with linear function approximation for the first time.
arXiv Detail & Related papers (2020-08-02T14:01:49Z)
- Follow the Neurally-Perturbed Leader for Adversarial Training [0.0]
We propose a novel leader algorithm for zero-sum training that converges to a mixed equilibrium without cycling behaviors.
We validate our theoretical results by applying this training algorithm to games with convex and non-perturbed loss as well as generative adversarial architectures.
We customize the implementation of this algorithm for adversarial imitation learning applications.
arXiv Detail & Related papers (2020-02-16T00:09:02Z)
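
For the discounting-mismatch entry above, the point admits a short worked form. Under the standard discounted policy gradient theorem (standard notation assumed here, not drawn from that paper's abstract), the actor update weights the transition at time $t$ by $\gamma^t$:

$\nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta}\Big[\sum_{t \ge 0} \gamma^t \, \nabla_\theta \log \pi_\theta(a_t \mid s_t) \, Q^{\pi_\theta}(s_t, a_t)\Big],$

whereas common implementations drop the $\gamma^t$ factor in the actor update while still training the critic on discounted returns.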