Cooperative-Competitive Reinforcement Learning with History-Dependent
Rewards
- URL: http://arxiv.org/abs/2010.08030v1
- Date: Thu, 15 Oct 2020 21:37:07 GMT
- Title: Cooperative-Competitive Reinforcement Learning with History-Dependent
Rewards
- Authors: Keyang He, Bikramjit Banerjee, Prashant Doshi
- Abstract summary: We show that an agent's decision-making problem can be modeled as an interactive partially observable Markov decision process (I-POMDP).
We present an interactive advantage actor-critic method (IA2C$^+$), which combines the independent advantage actor-critic network with a belief filter.
Empirical results show that IA2C$^+$ learns the optimal policy faster and more robustly than several other baselines.
- Score: 12.41853254173419
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Consider a typical organization whose worker agents seek to collectively
cooperate for its general betterment. However, each individual agent
simultaneously seeks to secure a larger share of the annual increment in
compensation than its co-workers, an increment that usually comes from a fixed pot.
As such, the individual agent in the organization must cooperate and compete.
Another feature of many organizations is that a worker receives a bonus, which
is often a fraction of the previous year's total profit. As such, the agent derives
a reward that is also partly dependent on historical performance. How should
the individual agent decide to act in this context? A few methods for the mixed
cooperative-competitive setting have been presented in recent years, but these
are challenged by problem domains whose reward functions do not depend only on
the current state and action. Recent deep multi-agent reinforcement learning
(MARL) methods using long short-term memory (LSTM) may be used, but these adopt
a joint perspective to the interaction or require explicit exchange of
information among the agents to promote cooperation, which may not be possible
under competition. In this paper, we first show that the agent's
decision-making problem can be modeled as an interactive partially observable
Markov decision process (I-POMDP) that captures the dynamics of a
history-dependent reward. We present an interactive advantage actor-critic
method (IA2C$^+$), which combines the independent advantage actor-critic
network with a belief filter that maintains a belief distribution over other
agents' models. Empirical results show that IA2C$^+$ learns the optimal policy
faster and more robustly than several other baselines, including one that uses
an LSTM, even when the attributed models are incorrect.
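The core mechanism the abstract describes is a belief filter layered on top of an otherwise independent advantage actor-critic learner: the agent maintains a distribution over a finite set of candidate models of the other agent and uses it to predict that agent's behavior. The sketch below illustrates only this filtering step under simplifying assumptions (a small discrete model set and a known observation likelihood); the function and variable names are illustrative, not taken from the authors' implementation.

```python
# Minimal sketch of the belief-filtering idea behind IA2C^+ (illustrative only;
# assumes a finite set of candidate models of the other agent and a known
# observation-likelihood function, neither of which is spelled out here).
import numpy as np

def belief_update(belief, obs, candidate_models, obs_likelihood):
    """One Bayes-filter step over the other agent's candidate models.

    belief           : (M,) prior probability of each candidate model
    obs              : the (noisy) public observation received this step
    candidate_models : list of M callables, each returning a distribution
                       over the other agent's actions, shape (A,)
    obs_likelihood   : callable (obs, other_action) -> P(obs | other_action)
    """
    posterior = np.zeros(len(candidate_models))
    for m, model in enumerate(candidate_models):
        action_dist = model()  # P(a_other | model m)
        # Marginalise over the unobserved action of the other agent.
        likelihood = sum(action_dist[a] * obs_likelihood(obs, a)
                         for a in range(len(action_dist)))
        posterior[m] = belief[m] * likelihood
    return posterior / posterior.sum()

def augmented_input(local_obs, belief, candidate_models):
    """Actor/critic input: local observation concatenated with the
    belief-weighted prediction of the other agent's next action."""
    predicted = sum(b * model() for b, model in zip(belief, candidate_models))
    return np.concatenate([local_obs, predicted])

# Toy usage: two candidate models over three actions of the other agent.
models = [lambda: np.array([0.7, 0.2, 0.1]),
          lambda: np.array([0.1, 0.2, 0.7])]
noisy_channel = lambda obs, a: 0.8 if obs == a else 0.1
belief = np.array([0.5, 0.5])
belief = belief_update(belief, obs=2, candidate_models=models,
                       obs_likelihood=noisy_channel)
features = augmented_input(np.array([0.0, 1.0]), belief, models)
print(belief, features.shape)
```

How the filtered belief enters the actor-critic update in IA2C$^+$ itself is specified in the paper; the sketch only shows the shape of the filtering computation.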
Related papers
- Principal-Agent Reward Shaping in MDPs [50.914110302917756]
Principal-agent problems arise when one party acts on behalf of another, leading to conflicts of interest.
We study a two-player Stackelberg game where the principal and the agent have different reward functions, and the agent chooses an MDP policy for both players.
Our results cover MDPs that are trees and deterministic decision processes with a finite horizon.
arXiv Detail & Related papers (2023-12-30T18:30:44Z) - Byzantine-Resilient Decentralized Multi-Armed Bandits [25.499420566469098]
This framework can be used to model attackers in computer networks, instigators of offensive content in recommender systems, or manipulators of financial markets.
We develop an algorithm that fuses an information mixing step among agents with a truncation of inconsistent and extreme values.
arXiv Detail & Related papers (2023-10-11T09:09:50Z) - Estimating and Incentivizing Imperfect-Knowledge Agents with Hidden
Rewards [4.742123770879715]
In practice, incentive providers often cannot observe the reward realizations of incentivized agents.
This paper explores a repeated adverse selection game between a self-interested learning agent and a learning principal.
We introduce an estimator whose only input is the history of the principal's incentives and the agent's choices.
arXiv Detail & Related papers (2023-08-13T08:12:01Z) - Credit-cognisant reinforcement learning for multi-agent cooperation [0.0]
We introduce the concept of credit-cognisant rewards, which allow an agent to perceive the effect its actions had on the environment as well as on its co-agents.
We show that by manipulating these experiences and constructing the reward contained within them to include the rewards received by all the agents within the same action sequence, we are able to improve significantly on the performance of independent deep Q-learning.
arXiv Detail & Related papers (2022-11-18T09:00:25Z) - Online Learning of Competitive Equilibria in Exchange Economies [94.24357018178867]
In economics, the sharing of scarce resources among multiple rational agents is a classical problem.
We propose an online learning mechanism to learn agent preferences.
We demonstrate the effectiveness of this mechanism through numerical simulations.
arXiv Detail & Related papers (2021-06-11T21:32:17Z) - Cooperative and Competitive Biases for Multi-Agent Reinforcement
Learning [12.676356746752893]
Training a multi-agent reinforcement learning (MARL) algorithm is more challenging than training a single-agent reinforcement learning algorithm.
We propose an algorithm that boosts MARL training using the biased action information of other agents based on a friend-or-foe concept.
We empirically demonstrate that our algorithm outperforms existing algorithms in various mixed cooperative-competitive environments.
arXiv Detail & Related papers (2021-01-18T05:52:22Z) - Multi-agent Policy Optimization with Approximatively Synchronous
Advantage Estimation [55.96893934962757]
In a multi-agent system, the policies of different agents need to be evaluated jointly.
In current methods, value functions or advantage functions use counterfactual joint actions, which are evaluated asynchronously.
In this work, we propose the approximatively synchronous advantage estimation.
arXiv Detail & Related papers (2020-12-07T07:29:19Z) - UneVEn: Universal Value Exploration for Multi-Agent Reinforcement
Learning [53.73686229912562]
We propose a novel MARL approach called Universal Value Exploration (UneVEn).
UneVEn learns a set of related tasks simultaneously with a linear decomposition of universal successor features.
Empirical results on a set of exploration games, challenging cooperative predator-prey tasks requiring significant coordination among agents, and StarCraft II micromanagement benchmarks show that UneVEn can solve tasks where other state-of-the-art MARL methods fail.
arXiv Detail & Related papers (2020-10-06T19:08:47Z) - Learning to Incentivize Other Learning Agents [73.03133692589532]
We show how to equip RL agents with the ability to give rewards directly to other agents, using a learned incentive function.
Such agents significantly outperform standard RL and opponent-shaping agents in challenging general-sum Markov games.
Our work points toward more opportunities and challenges along the path to ensure the common good in a multi-agent future.
arXiv Detail & Related papers (2020-06-10T20:12:38Z) - Scalable Multi-Agent Inverse Reinforcement Learning via
Actor-Attention-Critic [54.2180984002807]
Multi-agent adversarial inverse reinforcement learning (MA-AIRL) is a recent approach that applies single-agent AIRL to multi-agent problems.
We propose a multi-agent inverse RL algorithm that is more sample-efficient and scalable than previous works.
arXiv Detail & Related papers (2020-02-24T20:30:45Z)