Policy Search with Rare Significant Events: Choosing the Right Partner to Cooperate with
- URL: http://arxiv.org/abs/2103.06846v1
- Date: Thu, 11 Mar 2021 18:14:41 GMT
- Title: Policy Search with Rare Significant Events: Choosing the Right Partner to Cooperate with
- Authors: Paul Ecoffet, Nicolas Fontbonne, Jean-Baptiste André, Nicolas Bredeche
- Abstract summary: This paper focuses on a class of reinforcement learning problems where significant events are rare and limited to a single positive reward per episode.
We show that when significant events are rare, gradient information is also scarce, making it difficult for policy gradient search methods to find an optimal policy.
On the other hand, we show that direct policy search methods are invariant to the rarity of significant events, which is yet another confirmation of the unique role evolutionary algorithms have to play as a reinforcement learning method.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper focuses on a class of reinforcement learning problems where
significant events are rare and limited to a single positive reward per
episode. A typical example is that of an agent who has to choose a partner to
cooperate with, while a large number of partners are simply not interested in
cooperating, regardless of what the agent has to offer. We address this problem
in a continuous state and action space with two different kinds of search
methods: a gradient policy search method and a direct policy search method
using an evolution strategy. We show that when significant events are rare,
gradient information is also scarce, making it difficult for policy gradient
search methods to find an optimal policy, with or without a deep neural
architecture. On the other hand, we show that direct policy search methods are
invariant to the rarity of significant events, which is yet another
confirmation of the unique role evolutionary algorithms have to play as a
reinforcement learning method.
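The abstract reports experiments rather than code, but the contrast it draws can be illustrated with a toy sketch. Below is a minimal, hypothetical partner-choice task (the task, names, and parameters are our assumptions, not the authors' setup): a score-function policy gradient sees mostly zero returns and hence mostly noise, while a rank-based evolution strategy needs only the ordering of average returns.

```python
# Toy sketch, NOT the authors' code: partner choice with a rare positive reward.
import numpy as np

rng = np.random.default_rng(0)

def episode_return(threshold, p_coop=0.05, horizon=50):
    """Inspect up to `horizon` partners; accepting one ends the episode.
    Cooperators are rare but tend to make higher offers, so a good
    threshold screens them out from the crowd."""
    for _ in range(horizon):
        coop = rng.uniform() < p_coop
        offer = rng.uniform(0.5, 1.0) if coop else rng.uniform(0.0, 1.0)
        if offer > threshold:              # accept this partner
            return 1.0 if coop else 0.0    # single positive reward, or nothing
    return 0.0                             # rejected everyone

def reinforce_step(theta, lr=0.1, batch=64, sigma=0.1):
    """Score-function (REINFORCE) gradient on a Gaussian policy over the
    threshold: when almost every return is 0, the estimate is mostly noise."""
    grad = 0.0
    for _ in range(batch):
        action = theta + sigma * rng.standard_normal()
        grad += episode_return(action) * (action - theta) / sigma**2
    return theta + lr * grad / batch

def es_step(theta, batch=64, sigma=0.1, evals=20):
    """Simple evolution strategy: keep the best perturbation by *rank* of
    average return, so only the ordering of candidates matters."""
    candidates = theta + sigma * rng.standard_normal(batch)
    fitness = [np.mean([episode_return(c) for _ in range(evals)])
               for c in candidates]
    return float(candidates[int(np.argmax(fitness))])

theta_pg = theta_es = 0.95
for _ in range(200):
    theta_pg = reinforce_step(theta_pg)
    theta_es = es_step(theta_es)
print(f"policy-gradient threshold: {theta_pg:.2f}  ES threshold: {theta_es:.2f}")
```

The rank-based selection in `es_step` is what makes the search insensitive to how rarely nonzero returns occur, as long as they still separate good candidates from bad ones over enough evaluations.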
Related papers
- Joint Intrinsic Motivation for Coordinated Exploration in Multi-Agent Deep Reinforcement Learning [0.0]
We propose an approach for rewarding strategies where agents collectively exhibit novel behaviors.
JIM rewards joint trajectories based on a centralized measure of novelty designed to function in continuous environments.
Results show that joint exploration is crucial for solving tasks where the optimal strategy requires a high level of coordination.
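A rough sketch of one way such a centralized novelty measure could work, assuming k-nearest-neighbor novelty over an archive of joint states (our illustration; the summary does not specify JIM's actual formulation):

```python
# Hypothetical sketch of a centralized novelty bonus over joint states.
import numpy as np

class JointNoveltyBonus:
    """Scores the *joint* state of all agents by its mean distance to the
    k nearest joint states seen so far, pushing the team as a whole toward
    behaviors it has not collectively produced before."""
    def __init__(self, k=10):
        self.archive = []
        self.k = k

    def __call__(self, joint_state):
        s = np.concatenate([np.asarray(o, float) for o in joint_state])
        if len(self.archive) < self.k:
            bonus = 1.0  # everything is novel at the start
        else:
            dists = np.linalg.norm(np.stack(self.archive) - s, axis=1)
            bonus = float(np.sort(dists)[: self.k].mean())  # k-NN novelty
        self.archive.append(s)
        return bonus

bonus = JointNoveltyBonus()
print(bonus([[0.1, 0.2], [0.3, 0.4]]))  # joint observation of two agents
```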
arXiv Detail & Related papers (2024-02-06T13:02:00Z)
- Fast Peer Adaptation with Context-aware Exploration [63.08444527039578]
We propose a peer identification reward for learning agents in multi-agent games.
This reward motivates the agent to learn a context-aware policy for effective exploration and fast adaptation.
We evaluate our method on diverse testbeds that involve competitive (Kuhn Poker), cooperative (PO-Overcooked), or mixed (Predator-Prey-W) games with peer agents.
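A hedged sketch of what a peer-identification reward could look like, assuming it pays the gain in log-probability that a peer classifier assigns to the true peer identity (our reading; the paper's actual reward and architecture may differ):

```python
# Hypothetical peer-identification reward: pay the agent for actions whose
# outcomes make the peer's identity easier to infer from context.
import numpy as np

def peer_id_reward(logits, prev_logits, true_peer):
    """Gain in log-probability assigned to the true peer identity by a
    peer classifier, before vs. after the latest interaction step."""
    def log_softmax_at(z, i):
        z = np.asarray(z, float)
        z = z - z.max()                        # numerical stability
        return z[i] - np.log(np.exp(z).sum())
    return log_softmax_at(logits, true_peer) - log_softmax_at(prev_logits, true_peer)

# After an informative step, the classifier became more certain of peer 2:
print(peer_id_reward([0.1, 0.2, 2.0], [0.3, 0.3, 0.4], true_peer=2))
```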
arXiv Detail & Related papers (2024-02-04T13:02:27Z)
- Conditional Imitation Learning for Multi-Agent Games [89.897635970366]
We study the problem of conditional multi-agent imitation learning, where we have access to joint trajectory demonstrations at training time.
We propose a novel approach to address the difficulties of scalability and data scarcity.
Our model learns a low-rank subspace over ego and partner agent strategies, then infers and adapts to a new partner strategy by interpolating in the subspace.
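A minimal sketch of interpolation in a low-rank strategy subspace, using an SVD over fixed strategy embeddings (hypothetical; the paper learns its subspace end to end rather than from a fixed matrix):

```python
# Hypothetical sketch: a low-rank subspace over partner-strategy embeddings.
import numpy as np

rng = np.random.default_rng(0)
known = rng.normal(size=(20, 16))   # embeddings of 20 known partner strategies
mean = known.mean(axis=0)
_, _, Vt = np.linalg.svd(known - mean, full_matrices=False)
basis = Vt[:4]                      # rank-4 basis of strategy variation

def adapt_to_partner(observed):
    """Infer a new partner by projecting its observed behavior embedding
    into the subspace, i.e. interpolating between known strategies."""
    coords = basis @ (np.asarray(observed, float) - mean)  # subspace coordinates
    return mean + basis.T @ coords                         # strategy estimate

estimate = adapt_to_partner(rng.normal(size=16))
```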
arXiv Detail & Related papers (2022-01-05T04:40:13Z)
- Influence-based Reinforcement Learning for Intrinsically-motivated Agents [0.0]
We present an algorithmic framework of two reinforcement learning agents each with a different objective.
We introduce a novel function approximation approach to assess the influence $F$ of a certain policy on others.
Our method was evaluated on the suite of OpenAI gym tasks as well as cooperative and mixed scenarios.
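The summary does not define $F$; one common way to formalize influence, shown below purely as an illustration, is the mutual information between one agent's action and another's response (the paper instead learns a function approximation of its own measure):

```python
# Illustrative influence measure: mutual information between agent i's action
# and agent j's response (NOT necessarily the paper's definition of F).
import numpy as np

def influence(p_j_given_i, p_i):
    """p_j_given_i[a_i] is j's action distribution after i plays a_i
    (rows sum to 1); p_i is i's own action distribution. Returns the
    expected KL between j's conditional and marginal behavior."""
    p_j_given_i = np.asarray(p_j_given_i, float)
    p_i = np.asarray(p_i, float)
    marginal = p_i @ p_j_given_i                       # j's average behavior
    kls = (p_j_given_i * np.log(p_j_given_i / marginal)).sum(axis=1)
    return float(p_i @ kls)

# j mirrors i -> high influence; j ignores i -> zero influence.
print(influence([[0.9, 0.1], [0.1, 0.9]], [0.5, 0.5]))
print(influence([[0.5, 0.5], [0.5, 0.5]], [0.5, 0.5]))
```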
arXiv Detail & Related papers (2021-08-28T05:36:10Z)
- Q-Mixing Network for Multi-Agent Pathfinding in Partially Observable Grid Environments [62.997667081978825]
We consider the problem of multi-agent navigation in partially observable grid environments.
We suggest a reinforcement learning approach in which the agents first learn policies that map observations to actions, and then follow these policies to reach their goals.
arXiv Detail & Related papers (2021-08-13T09:44:47Z)
- Cooperative Exploration for Multi-Agent Deep Reinforcement Learning [127.4746863307944]
We propose cooperative multi-agent exploration (CMAE) for deep reinforcement learning.
The goal is selected from multiple projected state spaces via a normalized entropy-based technique.
We demonstrate that CMAE consistently outperforms baselines on various tasks.
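A toy sketch of normalized-entropy goal selection over projected state spaces (our illustration, assuming visitation counts are available per projection; CMAE's full procedure involves more machinery):

```python
# Toy sketch of entropy-based goal selection across projected state spaces.
import numpy as np

def select_goal(counts_per_projection):
    """counts_per_projection: one visit-count array per projected (restricted)
    state space. Pick the least uniformly explored space, then the
    least-visited state in it as the shared exploration goal."""
    def normalized_entropy(c):
        p = (c + 1e-8) / (c + 1e-8).sum()
        return float(-(p * np.log(p)).sum() / np.log(p.size))  # in [0, 1]
    counts = [np.asarray(c, float) for c in counts_per_projection]
    space = int(np.argmin([normalized_entropy(c) for c in counts]))
    goal = int(np.argmin(counts[space]))  # rarest state in that space
    return space, goal

# Space 1 is far less evenly visited, so the goal comes from it:
print(select_goal([[5, 4, 6, 5], [30, 0, 1, 0]]))
```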
arXiv Detail & Related papers (2021-07-23T20:06:32Z)
- Curriculum-Driven Multi-Agent Learning and the Role of Implicit Communication in Teamwork [24.92668968807012]
We propose a curriculum-driven learning strategy for solving difficult multi-agent coordination tasks.
We argue that emergent implicit communication plays a large role in enabling superior levels of coordination.
arXiv Detail & Related papers (2021-06-21T14:54:07Z)
- Scalable, Decentralized Multi-Agent Reinforcement Learning Methods Inspired by Stigmergy and Ant Colonies [0.0]
We investigate a novel approach to decentralized multi-agent learning and planning.
In particular, this method is inspired by the cohesion, coordination, and behavior of ant colonies.
The approach combines single-agent RL and an ant-colony-inspired decentralized, stigmergic algorithm for multi-agent path planning and environment modification.
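A minimal sketch of the stigmergic ingredient, assuming a shared pheromone field that agents deposit into and plan against (hypothetical; the paper's algorithm couples this kind of signal with single-agent RL):

```python
# Hypothetical sketch of the stigmergic signal shared by decentralized agents.
import numpy as np

class PheromoneGrid:
    """Indirect communication through the environment: agents deposit
    pheromone, the signal evaporates over time, and each agent's local
    planner scores candidate moves by the pheromone it would encounter."""
    def __init__(self, shape, evaporation=0.95):
        self.level = np.zeros(shape)
        self.evaporation = evaporation

    def deposit(self, cell, amount=1.0):
        self.level[cell] += amount

    def step(self):
        self.level *= self.evaporation  # stale trails fade away

    def scores(self, cells):
        return np.array([self.level[c] for c in cells])

grid = PheromoneGrid((8, 8))
grid.deposit((3, 3))
grid.step()
print(grid.scores([(3, 3), (0, 0)]))  # planner prefers the marked cell
```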
arXiv Detail & Related papers (2021-05-08T01:04:51Z)
- Reannealing of Decaying Exploration Based On Heuristic Measure in Deep Q-Network [82.20059754270302]
We propose an algorithm based on the idea of reannealing that aims to encourage exploration only when it is needed.
We perform an illustrative case study showing that it has potential to both accelerate training and obtain a better policy.
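A hedged sketch of reannealing for an epsilon-greedy schedule, using a simple moving-average plateau check as the trigger (the paper's heuristic measure is more elaborate; everything here is illustrative):

```python
# Illustrative reannealing schedule for epsilon-greedy exploration.
def reanneal_epsilon(eps, recent_returns, eps_min=0.05, decay=0.999,
                     window=50, reset_to=0.5):
    """Decay epsilon as usual, but reset it upward when the moving average
    of returns stops improving, re-enabling exploration only when needed."""
    eps = max(eps_min, eps * decay)
    if len(recent_returns) >= 2 * window:
        older = sum(recent_returns[-2 * window:-window]) / window
        newer = sum(recent_returns[-window:]) / window
        if newer <= older:    # learning has plateaued or regressed
            eps = reset_to    # reanneal: jump exploration back up
    return eps

# Called once per episode inside a DQN-style training loop:
# eps = reanneal_epsilon(eps, returns_history)
```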
arXiv Detail & Related papers (2020-09-29T20:40:00Z)
- Guided Dialog Policy Learning without Adversarial Learning in the Loop [103.20723982440788]
A number of adversarial learning methods have been proposed to learn the reward function together with the dialogue policy.
We propose to decompose the adversarial training into two steps.
First, we train the discriminator with an auxiliary dialogue generator and then incorporate a derived reward model into a common RL method to guide the dialogue policy learning.
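A self-contained toy of the two-step decomposition, with dialogues reduced to feature vectors and the discriminator to a logistic regression (all stand-ins; the actual models are dialogue systems):

```python
# Toy illustration of the two-step decomposition, not the paper's models.
import numpy as np

rng = np.random.default_rng(0)

# Step 1: train the discriminator once, against an auxiliary generator,
# before policy learning begins. Dialogues are stand-in feature vectors.
real = rng.normal(1.0, 0.5, size=(200, 4))   # human-human dialogues
fake = rng.normal(0.0, 0.5, size=(200, 4))   # auxiliary generator's output
X = np.vstack([real, fake])
y = np.concatenate([np.ones(200), np.zeros(200)])
w = np.zeros(4)
for _ in range(500):                          # logistic-regression training
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w += 0.1 * (X.T @ (y - p)) / len(y)

def derived_reward(dialogue_features):
    """Step 2 plugs this frozen model into an ordinary RL loop as the
    reward, so the discriminator never trains against the live policy."""
    return float(1.0 / (1.0 + np.exp(-np.asarray(dialogue_features) @ w)))

print(derived_reward(rng.normal(1.0, 0.5, size=4)))  # human-like -> high reward
```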
arXiv Detail & Related papers (2020-04-07T11:03:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.