Decision-making with Speculative Opponent Models
- URL: http://arxiv.org/abs/2211.11940v3
- Date: Fri, 22 Mar 2024 04:40:11 GMT
- Title: Decision-making with Speculative Opponent Models
- Authors: Jing Sun, Shuo Chen, Cong Zhang, Yining Ma, Jie Zhang,
- Abstract summary: We introduce Distributional Opponent-aided Multi-agent Actor-Critic (DOMAC)
DOMAC is the first speculative opponent modelling algorithm that relies solely on local information (i.e., the controlled agent's observations, actions, and rewards)
- Score: 10.594910251058087
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Opponent modelling has proven effective in enhancing the decision-making of the controlled agent by constructing models of opponent agents. However, existing methods often rely on access to the observations and actions of opponents, a requirement that is infeasible when such information is either unobservable or challenging to obtain. To address this issue, we introduce Distributional Opponent-aided Multi-agent Actor-Critic (DOMAC), the first speculative opponent modelling algorithm that relies solely on local information (i.e., the controlled agent's observations, actions, and rewards). Specifically, the actor maintains a speculated belief about the opponents using the tailored speculative opponent models that predict the opponents' actions using only local information. Moreover, DOMAC features distributional critic models that estimate the return distribution of the actor's policy, yielding a more fine-grained assessment of the actor's quality. This thus more effectively guides the training of the speculative opponent models that the actor depends upon. Furthermore, we formally derive a policy gradient theorem with the proposed opponent models. Extensive experiments under eight different challenging multi-agent benchmark tasks within the MPE, Pommerman and StarCraft Multiagent Challenge (SMAC) demonstrate that our DOMAC successfully models opponents' behaviours and delivers superior performance against state-of-the-art methods with a faster convergence speed.
Related papers
- A Minimaximalist Approach to Reinforcement Learning from Human Feedback [49.45285664482369]
We present Self-Play Preference Optimization (SPO), an algorithm for reinforcement learning from human feedback.
Our approach is minimalist in that it does not require training a reward model nor unstable adversarial training.
We demonstrate that on a suite of continuous control tasks, we are able to learn significantly more efficiently than reward-model based approaches.
arXiv Detail & Related papers (2024-01-08T17:55:02Z) - ShuttleSHAP: A Turn-Based Feature Attribution Approach for Analyzing
Forecasting Models in Badminton [52.21869064818728]
Deep learning approaches for player tactic forecasting in badminton show promising performance partially attributed to effective reasoning about rally-player interactions.
We propose a turn-based feature attribution approach, ShuttleSHAP, for analyzing forecasting models in badminton based on variants of Shapley values.
arXiv Detail & Related papers (2023-12-18T05:37:51Z) - Malicious Agent Detection for Robust Multi-Agent Collaborative Perception [52.261231738242266]
Multi-agent collaborative (MAC) perception is more vulnerable to adversarial attacks than single-agent perception.
We propose Malicious Agent Detection (MADE), a reactive defense specific to MAC perception.
We conduct comprehensive evaluations on a benchmark 3D dataset V2X-sim and a real-road dataset DAIR-V2X.
arXiv Detail & Related papers (2023-10-18T11:36:42Z) - Mimicking To Dominate: Imitation Learning Strategies for Success in
Multiagent Competitive Games [13.060023718506917]
We develop a new multi-agent imitation learning model for predicting next moves of the opponents.
We also present a new multi-agent reinforcement learning algorithm that combines our imitation learning model and policy training into one single training process.
Experimental results show that our approach achieves superior performance compared to existing state-of-the-art multi-agent RL algorithms.
arXiv Detail & Related papers (2023-08-20T07:30:13Z) - Multi-agent Actor-Critic with Time Dynamical Opponent Model [16.820873906787906]
In multi-agent reinforcement learning, multiple agents learn simultaneously while interacting with a common environment and each other.
We propose a novel textitTime Dynamical Opponent Model (TDOM) to encode the knowledge that the opponent policies tend to improve over time.
We show empirically that TDOM achieves superior opponent behavior prediction during test time.
arXiv Detail & Related papers (2022-04-12T07:16:15Z) - Training Meta-Surrogate Model for Transferable Adversarial Attack [98.13178217557193]
We consider adversarial attacks to a black-box model when no queries are allowed.
In this setting, many methods directly attack surrogate models and transfer the obtained adversarial examples to fool the target model.
We show we can obtain a Meta-Surrogate Model (MSM) such that attacks to this model can be easier transferred to other models.
arXiv Detail & Related papers (2021-09-05T03:27:46Z) - Moody Learners -- Explaining Competitive Behaviour of Reinforcement
Learning Agents [65.2200847818153]
In a competitive scenario, the agent does not only have a dynamic environment but also is directly affected by the opponents' actions.
Observing the Q-values of the agent is usually a way of explaining its behavior, however, do not show the temporal-relation between the selected actions.
arXiv Detail & Related papers (2020-07-30T11:30:42Z) - Agent Modelling under Partial Observability for Deep Reinforcement
Learning [12.903487594031276]
Existing methods for agent modelling assume knowledge of the local observations and chosen actions of the modelled agents during execution.
We learn to extract representations about the modelled agents conditioned only on the local observations of the controlled agent.
The representations are used to augment the controlled agent's decision policy which is trained via deep reinforcement learning.
arXiv Detail & Related papers (2020-06-16T18:43:42Z) - Learning to Model Opponent Learning [11.61673411387596]
Multi-Agent Reinforcement Learning (MARL) considers settings in which a set of coexisting agents interact with one another and their environment.
This poses a great challenge for value function-based algorithms whose convergence usually relies on the assumption of a stationary environment.
We develop a novel approach to modelling an opponent's learning dynamics which we term Learning to Model Opponent Learning (LeMOL)
arXiv Detail & Related papers (2020-06-06T17:19:04Z) - Regularizers for Single-step Adversarial Training [49.65499307547198]
We propose three types of regularizers that help to learn robust models using single-step adversarial training methods.
Regularizers mitigate the effect of gradient masking by harnessing on properties that differentiate a robust model from that of a pseudo robust model.
arXiv Detail & Related papers (2020-02-03T09:21:04Z) - Variational Autoencoders for Opponent Modeling in Multi-Agent Systems [9.405879323049659]
Multi-agent systems exhibit complex behaviors that emanate from the interactions of multiple agents in a shared environment.
In this work, we are interested in controlling one agent in a multi-agent system and successfully learn to interact with the other agents that have fixed policies.
Modeling the behavior of other agents (opponents) is essential in understanding the interactions of the agents in the system.
arXiv Detail & Related papers (2020-01-29T13:38:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.