Approximating Discontinuous Nash Equilibrial Values of Two-Player
General-Sum Differential Games
- URL: http://arxiv.org/abs/2207.01773v1
- Date: Tue, 5 Jul 2022 02:22:05 GMT
- Title: Approximating Discontinuous Nash Equilibrial Values of Two-Player
General-Sum Differential Games
- Authors: Lei Zhang, Mukesh Ghimire, Wenlong Zhang, Zhe Xu, Yi Ren
- Abstract summary: This paper extends from previous SOTA on zero-sum games with continuous values to general-sum games with discontinuous values, where the discontinuity is caused by that of the players' losses.
We show that due to its lack of convergence proof and generalization analysis on discontinuous losses, the existing self-supervised learning technique fails to generalize and raises safety concerns in an autonomous driving application.
Our solution is to first pre-train the value network on supervised Nash equilibria, and then refine it by minimizing a loss that combines the supervised data with the PDE and boundary conditions.
- Score: 21.291449080239673
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Finding Nash equilibrial policies for two-player differential games requires
solving Hamilton-Jacobi-Isaacs PDEs. Recent studies achieved success in
circumventing the curse of dimensionality in solving such PDEs with underlying
applications to human-robot interactions (HRI), by adopting self-supervised
(physics-informed) neural networks as universal value approximators. This paper
extends from previous SOTA on zero-sum games with continuous values to
general-sum games with discontinuous values, where the discontinuity is caused
by that of the players' losses. We show that due to its lack of convergence
proof and generalization analysis on discontinuous losses, the existing
self-supervised learning technique fails to generalize and raises safety
concerns in an autonomous driving application. Our solution is to first
pre-train the value network on supervised Nash equilibria, and then refine it
by minimizing a loss that combines the supervised data with the PDE and
boundary conditions. Importantly, the demonstrated advantage of the proposed
learning method against purely supervised and self-supervised approaches
requires careful choice of the neural activation function: Among
$\texttt{relu}$, $\texttt{sin}$, and $\texttt{tanh}$, we show that
$\texttt{tanh}$ is the only choice that achieves optimal generalization and
safety performance. Our conjecture is that $\texttt{tanh}$ (similar to
$\texttt{sin}$) allows continuity of value and its gradient, which is
sufficient for the convergence of learning, and at the same time is expressive
enough (similar to $\texttt{relu}$) at approximating discontinuous value
landscapes. Lastly, we apply our method to approximating control policies for
an incomplete-information interaction and demonstrate its contribution to safe
interactions.
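For readers who want a concrete picture of the hybrid training recipe described in the abstract, the sketch below shows roughly what such an objective could look like in PyTorch. It is a minimal illustration under stated assumptions, not the authors' implementation: the network shape, the loss weights, and the hypothetical `hji_residual` callable are placeholders, and only the tanh activation reflects the paper's stated preference.

```python
import torch
import torch.nn as nn


class ValueNet(nn.Module):
    """Approximates one player's value V(t, x); tanh is the activation the paper favors."""

    def __init__(self, state_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + 1, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, t: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([t, x], dim=-1))


def hybrid_loss(model, t, x, v_star, hji_residual, t_T, x_T, v_T,
                w_data=1.0, w_pde=1.0, w_bc=1.0):
    """Supervised data term + HJI PDE residual + terminal boundary condition.

    `hji_residual` is a hypothetical callable mapping (V, dV/dt, dV/dx) to the PDE
    residual; its exact form depends on the game's dynamics and running losses.
    """
    t = t.clone().requires_grad_(True)
    x = x.clone().requires_grad_(True)
    v = model(t, x)

    # (1) Supervised term: match values of precomputed Nash equilibria.
    data_loss = ((v - v_star) ** 2).mean()

    # (2) PDE term: differentiate the network output to assemble the HJI residual.
    dv_dt, dv_dx = torch.autograd.grad(v.sum(), (t, x), create_graph=True)
    pde_loss = (hji_residual(v, dv_dt, dv_dx) ** 2).mean()

    # (3) Boundary term: enforce the known terminal value at t = T.
    bc_loss = ((model(t_T, x_T) - v_T) ** 2).mean()

    return w_data * data_loss + w_pde * pde_loss + w_bc * bc_loss
```

Pre-training would minimize only the supervised term on equilibrium data; the refinement stage then switches to the full weighted sum, which is the two-stage procedure the abstract describes.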
Related papers
- HSVI-based Online Minimax Strategies for Partially Observable Stochastic Games with Neural Perception Mechanisms [31.51588071503617]
We consider a variant of continuous-state partially-observable games with neural perception mechanisms and an asymmetric information structure.
One agent has partial information, while the other agent is assumed to have full knowledge of the state.
We present an efficient online method to compute an $\varepsilon$-minimax strategy profile for each agent.
arXiv Detail & Related papers (2024-04-16T15:58:20Z)
- On Tractable $\Phi$-Equilibria in Non-Concave Games [53.212133025684224]
Non-concave games introduce significant game-theoretic and optimization challenges.
We show that when $\Phi$ is finite, there exists an efficient uncoupled learning algorithm that converges to the corresponding $\Phi$-equilibria.
We also show that Online Gradient Descent can efficiently approximate $Phi$-equilibria in non-trivial regimes.
arXiv Detail & Related papers (2024-03-13T01:51:30Z)
- Pontryagin Neural Operator for Solving Parametric General-Sum Differential Games [24.012924492073974]
We show that a Pontryagin-mode neural operator outperforms the current state-of-the-art hybrid PINN model on safety performance across games with parametric state constraints.
Our key contribution is the introduction of a costate loss defined on the discrepancy between forward and backward costate rollouts.
We show that the costate dynamics, which can reflect state constraint violation, effectively enables the learning of differentiable values with large Lipschitz constants (a minimal sketch of such a costate-discrepancy loss appears after this list).
arXiv Detail & Related papers (2024-01-03T02:15:32Z)
- Value Approximation for Two-Player General-Sum Differential Games with State Constraints [24.012924492073974]
Solving Hamilton-Jacobi-Isaacs (HJI) PDEs numerically enables equilibrial feedback control in two-player differential games, yet faces the curse of dimensionality (CoD).
While physics-informed neural networks (PINNs) have shown promise in alleviating CoD in solving PDEs, vanilla PINNs fall short in learning discontinuous solutions due to their sampling nature.
In this study, we explore three potential solutions to this challenge: (1) a hybrid learning method that is guided by both supervisory equilibria and the HJI PDE, (2) a value-hardening method
arXiv Detail & Related papers (2023-11-28T04:58:41Z)
- Adaptive, Doubly Optimal No-Regret Learning in Strongly Monotone and Exp-Concave Games with Gradient Feedback [75.29048190099523]
Online gradient descent (OGD) is well known to be doubly optimal under strong convexity or monotonicity assumptions.
In this paper, we design a fully adaptive OGD algorithm, AdaOGD, that does not require a priori knowledge of these parameters.
arXiv Detail & Related papers (2023-10-21T18:38:13Z)
- Global Nash Equilibrium in Non-convex Multi-player Game: Theory and Algorithms [66.8634598612777]
We show that a Nash equilibrium (NE) is acceptable to all players in a multi-player game, in the sense that no player can benefit by deviating unilaterally, and we develop the general theory step by step.
arXiv Detail & Related papers (2023-01-19T11:36:50Z)
- Provably Efficient Offline Reinforcement Learning with Trajectory-Wise Reward [66.81579829897392]
We propose a novel offline reinforcement learning algorithm called Pessimistic vAlue iteRaTion with rEward Decomposition (PARTED).
PARTED decomposes the trajectory return into per-step proxy rewards via least-squares-based reward redistribution, and then performs pessimistic value iteration based on the learned proxy reward.
To the best of our knowledge, PARTED is the first offline RL algorithm that is provably efficient in general MDP with trajectory-wise reward.
arXiv Detail & Related papers (2022-06-13T19:11:22Z)
- Learning Stationary Nash Equilibrium Policies in $n$-Player Stochastic Games with Independent Chains [2.132096006921048]
We consider a class of $n$-player games in which players have their own internal state/action spaces while they are coupled through their payoff functions.
For this class of games, we first show that finding a stationary Nash equilibrium (NE) policy without any assumption on the reward functions is intractable.
We develop algorithms based on dual averaging and dual mirror descent, which converge to the set of $\epsilon$-NE policies.
arXiv Detail & Related papers (2022-01-28T16:27:21Z)
- Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss [145.54544979467872]
We consider online learning for episodic constrained Markov decision processes (CMDPs).
We propose a new upper confidence primal-dual algorithm, which only requires the trajectories sampled from the transition model.
Our analysis incorporates a new high-probability drift analysis of Lagrange multiplier processes into the celebrated regret analysis of upper confidence reinforcement learning.
arXiv Detail & Related papers (2020-03-02T05:02:23Z)
- Provably Efficient Safe Exploration via Primal-Dual Policy Optimization [105.7510838453122]
We study the Safe Reinforcement Learning (SRL) problem using the Constrained Markov Decision Process (CMDP) formulation.
We present a provably efficient online policy optimization algorithm for CMDP with safe exploration in the function approximation setting.
arXiv Detail & Related papers (2020-03-01T17:47:03Z)
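The Pontryagin Neural Operator entry above mentions a costate loss defined on the discrepancy between forward and backward costate rollouts. The snippet below is a minimal, hypothetical sketch of such a discrepancy term, assuming the costate trajectories from both rollouts are already available as tensors; it is not that paper's implementation.

```python
import torch


def costate_discrepancy_loss(costate_forward: torch.Tensor,
                             costate_backward: torch.Tensor) -> torch.Tensor:
    """Penalize disagreement between costates from a forward rollout and from
    integrating the adjoint (Pontryagin) equations backward in time.
    Both tensors are assumed to have shape (batch, horizon, state_dim)."""
    return ((costate_forward - costate_backward) ** 2).mean()
```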