Collaborative Training of Heterogeneous Reinforcement Learning Agents in
Environments with Sparse Rewards: What and When to Share?
- URL: http://arxiv.org/abs/2202.12174v1
- Date: Thu, 24 Feb 2022 16:15:51 GMT
- Title: Collaborative Training of Heterogeneous Reinforcement Learning Agents in
Environments with Sparse Rewards: What and When to Share?
- Authors: Alain Andres, Esther Villar-Rodriguez and Javier Del Ser
- Abstract summary: This work focuses on combining information obtained through intrinsic motivation with the aim of achieving more efficient exploration and faster learning.
Our results reveal different ways in which a collaborative framework with little additional computational cost can outperform an independent learning process without knowledge sharing.
- Score: 7.489793155793319
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the early stages of human life, babies develop their skills by exploring
different scenarios motivated by their inherent satisfaction rather than by
extrinsic rewards from the environment. This behavior, referred to as intrinsic
motivation, has emerged as one solution to address the exploration challenge
derived from reinforcement learning environments with sparse rewards. Diverse
exploration approaches have been proposed to accelerate the learning process
over single- and multi-agent problems with homogeneous agents. However, few
studies have examined collaborative learning frameworks among heterogeneous
agents deployed in the same environment, each interacting with its own
instance of it and without any prior knowledge. Beyond this heterogeneity,
each agent's characteristics grant it access only to a subset of the full
state space, which may conceal different exploration strategies and optimal
solutions. In this work we combine ideas from intrinsic motivation and
transfer learning. Specifically, we focus on sharing parameters in actor-critic
model architectures and on combining information obtained through intrinsic
motivation, with the aim of achieving more efficient exploration and faster
learning. We test our strategies through experiments on a modified version of
ViZDoom's My Way Home scenario, which is more challenging than the original
and allows evaluating the heterogeneity between agents. Our results
reveal different ways in which a collaborative framework with little additional
computational cost can outperform an independent learning process without
knowledge sharing. Additionally, we show the need to correctly modulate the
relative importance of extrinsic and intrinsic rewards to avoid undesired
agent behaviors.
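The abstract names two concrete mechanisms: sharing actor-critic parameters between heterogeneous agents, and mixing the sparse extrinsic reward with an intrinsic bonus whose relative weight must be modulated. The sketch below illustrates both in generic form; it is not the authors' implementation, and all names (ActorCritic, share_trunk, beta) are illustrative assumptions.

```python
# Minimal sketch, assuming a discrete-action PyTorch actor-critic.
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        # The shared trunk is the natural unit for parameter sharing
        # between agents; the heads remain agent-specific.
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.actor = nn.Linear(hidden, n_actions)  # policy logits
        self.critic = nn.Linear(hidden, 1)         # state-value estimate

    def forward(self, obs: torch.Tensor):
        h = self.trunk(obs)
        return self.actor(h), self.critic(h)

def share_trunk(source: ActorCritic, target: ActorCritic) -> None:
    """One possible sharing scheme: copy trunk weights across agents
    (assumes both agents use matching trunk shapes)."""
    target.trunk.load_state_dict(source.trunk.state_dict())

def mixed_reward(r_ext: float, r_int: float, beta: float = 0.01) -> float:
    # beta modulates intrinsic vs. extrinsic importance; set too high,
    # curiosity can dominate and produce the undesired behaviors the
    # abstract warns about.
    return r_ext + beta * r_int
```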
Related papers
- Reciprocal Reward Influence Encourages Cooperation From Self-Interested Agents [2.1301560294088318]
Cooperation between self-interested individuals is a widespread phenomenon in the natural world, but remains elusive in interactions between artificially intelligent agents.
We introduce Reciprocators, reinforcement learning agents which are intrinsically motivated to reciprocate the influence of opponents' actions on their returns.
We show that Reciprocators can be used to promote cooperation in temporally extended social dilemmas during simultaneous learning.
arXiv Detail & Related papers (2024-06-03T06:07:27Z)
- Joint Intrinsic Motivation for Coordinated Exploration in Multi-Agent Deep Reinforcement Learning [0.0]
We propose an approach for rewarding strategies where agents collectively exhibit novel behaviors.
JIM rewards joint trajectories based on a centralized measure of novelty designed to function in continuous environments.
Results show that joint exploration is crucial for solving tasks where the optimal strategy requires a high level of coordination.
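As a rough illustration of a centralized, team-level novelty signal (not JIM's actual estimator, which the summary does not specify beyond targeting continuous settings), one could discretize the concatenated joint observation and pay a count-based bonus:

```python
# Illustrative joint-novelty bonus; bin_size and the count-based form
# are assumptions, not the paper's method.
from collections import defaultdict
import numpy as np

class JointNoveltyBonus:
    def __init__(self, bin_size: float = 0.5):
        self.bin_size = bin_size
        self.counts = defaultdict(int)

    def __call__(self, joint_obs: np.ndarray) -> float:
        # Concatenated observations of all agents form one shared key,
        # so the bonus rewards team-level rather than per-agent novelty.
        key = tuple(np.floor(joint_obs / self.bin_size).astype(int))
        self.counts[key] += 1
        return 1.0 / np.sqrt(self.counts[key])
```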
arXiv Detail & Related papers (2024-02-06T13:02:00Z)
- Multi-Agent Interplay in a Competitive Survival Environment [0.0]
This work is part of the author's thesis of the same title for the Master's Degree in Artificial Intelligence and Robotics at Sapienza University of Rome, 2022.
arXiv Detail & Related papers (2023-01-19T12:04:03Z)
- Towards Improving Exploration in Self-Imitation Learning using Intrinsic Motivation [7.489793155793319]
Reinforcement Learning has emerged as a strong alternative for solving optimization tasks efficiently.
The usefulness of these algorithms depends heavily on the feedback signals provided by the environment, which indicate how good (or bad) the decisions made by the learning agent are.
In this work, intrinsic motivation is used to encourage the agent to explore the environment based on its curiosity, whereas imitation learning allows repeating the most promising experiences to accelerate the learning process.
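A common way to repeat the most promising experiences is a self-imitation-style loss that only imitates stored actions whose realized return exceeded the current value estimate. The snippet below is a generic sketch of that idea, not necessarily this paper's exact formulation:

```python
# Hedged sketch of a self-imitation update (discrete actions, PyTorch).
import torch
import torch.nn.functional as F

def self_imitation_loss(logits, actions, returns, values):
    # Positive-part advantage: only transitions that turned out better
    # than expected are imitated, so promising behavior is repeated.
    advantage = torch.clamp(returns - values, min=0.0)
    nll = F.cross_entropy(logits, actions, reduction="none")  # -log pi(a|s)
    policy_loss = (nll * advantage.detach()).mean()
    value_loss = 0.5 * (advantage ** 2).mean()  # pulls V up toward good returns
    return policy_loss + value_loss
```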
arXiv Detail & Related papers (2022-11-30T09:18:59Z)
- Autonomous Reinforcement Learning: Formalism and Benchmarking [106.25788536376007]
Real-world embodied learning, such as that performed by humans and animals, is situated in a continual, non-episodic world.
Common benchmark tasks in RL are episodic, with the environment resetting between trials to provide the agent with multiple attempts.
This discrepancy presents a major challenge when attempting to take RL algorithms developed for episodic simulated environments and run them on real-world platforms.
arXiv Detail & Related papers (2021-12-17T16:28:06Z)
- Seeing Differently, Acting Similarly: Imitation Learning with Heterogeneous Observations [126.78199124026398]
In many real-world imitation learning tasks, the demonstrator and the learner have to act in different but full observation spaces.
In this work, we model the above learning problem as Heterogeneous Observations Imitation Learning (HOIL).
We propose the Importance Weighting with REjection (IWRE) algorithm based on the techniques of importance-weighting, learning with rejection, and active querying to solve the key challenge of occupancy measure matching.
arXiv Detail & Related papers (2021-06-17T05:44:04Z)
- Importance Weighted Policy Learning and Adaptation [89.46467771037054]
We study a complementary approach which is conceptually simple, general, modular and built on top of recent improvements in off-policy learning.
The framework is inspired by ideas from the probabilistic inference literature and combines robust off-policy learning with a behavior prior.
Our approach achieves competitive adaptation performance on hold-out tasks compared to meta reinforcement learning baselines and can scale to complex sparse-reward scenarios.
arXiv Detail & Related papers (2020-09-10T14:16:58Z)
- Bridging the Imitation Gap by Adaptive Insubordination [88.35564081175642]
We show that when the teaching agent makes decisions with access to privileged information, this information is marginalized during imitation learning, resulting in an "imitation gap".
We propose 'Adaptive Insubordination' (ADVISOR) to address this gap.
ADVISOR dynamically weights imitation and reward-based reinforcement learning losses during training, enabling on-the-fly switching between imitation and exploration.
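One plausible shape for such dynamic weighting, sketched under assumptions rather than taken from the paper: down-weight the imitation term in states where an auxiliary imitator, trained without the teacher's privileged inputs, cannot match the teacher anyway:

```python
# Illustrative per-state weighting between imitation and RL losses.
# `aux_logits` comes from a hypothetical auxiliary imitator; alpha and
# the exp(-alpha * gap) form are assumptions, not ADVISOR's estimator.
import torch
import torch.nn.functional as F

def weighted_loss(student_logits, teacher_probs, aux_logits, rl_loss,
                  alpha: float = 10.0):
    # rl_loss: per-sample RL loss tensor.
    # Gap proxy: how badly the non-privileged auxiliary imitator fails.
    gap = F.kl_div(F.log_softmax(aux_logits, dim=-1), teacher_probs,
                   reduction="none").sum(-1)
    w = torch.exp(-alpha * gap).detach()  # near 1: imitate; near 0: explore
    imitation = F.kl_div(F.log_softmax(student_logits, dim=-1),
                         teacher_probs, reduction="none").sum(-1)
    return (w * imitation + (1.0 - w) * rl_loss).mean()
```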
arXiv Detail & Related papers (2020-07-23T17:59:57Z)
- Ecological Reinforcement Learning [76.9893572776141]
We study the kinds of environment properties that can make learning under such conditions easier.
Understanding how properties of the environment impact the performance of reinforcement learning agents can help us structure our tasks in ways that make learning tractable.
arXiv Detail & Related papers (2020-06-22T17:55:03Z)
- Human AI interaction loop training: New approach for interactive reinforcement learning [0.0]
Reinforcement Learning (RL) provides effective results across various machine learning decision-making tasks, with an agent learning from a stand-alone reward function.
However, RL presents unique challenges when environment state and action spaces are large, as well as in the determination of rewards.
Imitation Learning (IL) offers a promising solution to those challenges by using a teacher.
arXiv Detail & Related papers (2020-03-09T15:27:48Z)
- Intrinsic Motivation for Encouraging Synergistic Behavior [55.10275467562764]
We study the role of intrinsic motivation as an exploration bias for reinforcement learning in sparse-reward synergistic tasks.
Our key idea is that a good guiding principle for intrinsic motivation in synergistic tasks is to take actions which affect the world in ways that would not be achieved if the agents were acting on their own.
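That principle suggests one simple instantiation: compare the observed effect of the joint action against the composed predictions of single-agent forward models. The models below are assumed components for illustration, not the paper's architecture:

```python
# Sketch: intrinsic bonus = distance between the real joint outcome and
# what composed single-agent dynamics models would have predicted.
import torch

def synergy_bonus(state, action_a, action_b, next_state,
                  model_a, model_b):
    # Predict agent A acting alone, then agent B acting on that result;
    # a large mismatch with reality signals a synergistic joint effect.
    pred = model_b(model_a(state, action_a), action_b)
    return torch.norm(next_state - pred, dim=-1)
```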
arXiv Detail & Related papers (2020-02-12T19:34:51Z)