Expert-Free Online Transfer Learning in Multi-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2303.01170v3
- Date: Fri, 28 Jul 2023 11:52:56 GMT
- Title: Expert-Free Online Transfer Learning in Multi-Agent Reinforcement Learning
- Authors: Alberto Castagna and Ivana Dusparic
- Abstract summary: Expert-Free Online Transfer Learning (EF-OnTL) is an algorithm that enables expert-free real-time dynamic transfer learning in multi-agent systems.
EF-OnTL achieves performance comparable to advice-based baselines.
- Score: 2.984934409689467
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transfer learning in Reinforcement Learning (RL) has been widely studied as a
way to overcome the training issues of Deep-RL, i.e., exploration cost, data
availability and convergence time, by enhancing the training phase with
external knowledge. Generally, knowledge is transferred from expert agents to
novices. While this addresses the issue for a novice agent, it requires the
expert agent to understand the task well for the transfer to be effective. As
an alternative, in this paper we propose Expert-Free Online Transfer Learning
(EF-OnTL), an algorithm that enables expert-free real-time dynamic transfer
learning in multi-agent systems. No dedicated expert exists; the source agent
and the knowledge to be transferred are selected dynamically at each transfer
step based on agents' performance and uncertainty. To improve uncertainty
estimation, we also propose State Action Reward Next-State Random Network
Distillation (sars-RND), an extension of RND that estimates uncertainty from
the RL agent-environment interaction. We demonstrate EF-OnTL's effectiveness
against a no-transfer scenario and advice-based baselines, with and without
expert agents, in three benchmark tasks: Cart-Pole, a grid-based Multi-Team
Predator-Prey (mt-pp) and Half Field Offense (HFO). Our results show that
EF-OnTL achieves performance comparable to advice-based baselines while
requiring no external input or threshold tuning. EF-OnTL outperforms the
no-transfer baseline, with an improvement that grows with the complexity of
the task.
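The two mechanisms named in the abstract, sars-RND uncertainty estimation and performance/uncertainty-driven transfer, can be sketched compactly. Below is a minimal, hedged PyTorch sketch, not the authors' implementation: the network sizes, the one-hot action encoding, the selection rule in ef_ontl_transfer_step, and helpers such as buffers[i].sample_all(), add_batch() and agents[i].mean_return are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def mlp(in_dim: int, out_dim: int, hidden: int = 64) -> nn.Sequential:
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))


class SarsRND(nn.Module):
    """Random Network Distillation over full (s, a, r, s') transitions:
    uncertainty = prediction error of a trained predictor network against
    a fixed, randomly initialised target network."""

    def __init__(self, state_dim: int, n_actions: int, embed_dim: int = 32):
        super().__init__()
        in_dim = 2 * state_dim + n_actions + 1   # s, one-hot a, r, s'
        self.target = mlp(in_dim, embed_dim)     # stays random and frozen
        self.predictor = mlp(in_dim, embed_dim)  # trained on seen transitions
        for p in self.target.parameters():
            p.requires_grad_(False)
        self.n_actions = n_actions

    def uncertainty(self, s, a, r, s_next) -> torch.Tensor:
        a_1h = F.one_hot(a, self.n_actions).float()
        x = torch.cat([s, a_1h, r.unsqueeze(-1), s_next], dim=-1)
        return (self.predictor(x) - self.target(x)).pow(2).mean(dim=-1)

    def update(self, optimiser, s, a, r, s_next) -> float:
        # Frequently seen transitions end up with low prediction error,
        # i.e. low uncertainty; rarely seen ones stay high.
        loss = self.uncertainty(s, a, r, s_next).mean()
        optimiser.zero_grad()
        loss.backward()
        optimiser.step()
        return loss.item()


def ef_ontl_transfer_step(agents, rnd, buffers, budget: int = 256):
    """One expert-free transfer step. The abstract only says that source and
    knowledge are picked from performance and uncertainty; the concrete rule
    below (best mean return acts as source, each target imports the
    transitions it is most uncertain about) is one plausible instantiation."""
    src = max(range(len(agents)), key=lambda i: agents[i].mean_return)
    s, a, r, s_next = buffers[src].sample_all()   # assumed replay helper
    for tgt in range(len(agents)):
        if tgt == src:
            continue
        unc = rnd[tgt].uncertainty(s, a, r, s_next)
        idx = torch.argsort(unc, descending=True)[:budget]
        buffers[tgt].add_batch(s[idx], a[idx], r[idx], s_next[idx])
```

In EF-OnTL such a step would run periodically during training, alongside each agent's own RL updates, so that knowledge flows toward whichever agents currently need it, without any designated expert.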
Related papers
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z)
- RILe: Reinforced Imitation Learning [60.63173816209543]
RILe is a novel trainer-student system that learns a dynamic reward function based on the student's performance and alignment with expert demonstrations.
RILe enables better performance in complex settings where traditional methods falter, outperforming existing methods by 2x in complex simulated robot-locomotion tasks.
arXiv Detail & Related papers (2024-06-12T17:56:31Z)
- Data-Driven Inverse Reinforcement Learning for Expert-Learner Zero-Sum Games [30.720112378448285]
We formulate inverse reinforcement learning as an expert-learner interaction.
The optimal performance intent of an expert or target agent is unknown to a learner agent.
We develop an off-policy IRL algorithm that does not require knowledge of the expert and learner agent dynamics.
arXiv Detail & Related papers (2023-01-05T10:35:08Z)
- Discriminator-Weighted Offline Imitation Learning from Suboptimal Demonstrations [5.760034336327491]
We study the problem of offline Imitation Learning (IL), where an agent aims to learn an optimal expert behavior policy without additional online environment interactions.
We introduce an additional discriminator to distinguish expert and non-expert data.
Our proposed algorithm achieves higher returns and faster training speed compared to baseline algorithms.
arXiv Detail & Related papers (2022-07-20T17:29:04Z)
- Transferred Q-learning [79.79659145328856]
We consider $Q$-learning with knowledge transfer, using samples from a target reinforcement learning (RL) task as well as source samples from different but related RL tasks.
We propose transfer learning algorithms for both batch and online $Q$-learning with offline source studies.
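As a rough, hypothetical illustration of the sample-reuse idea behind this entry (not the paper's actual estimators, which handle the shift between source and target tasks explicitly), one can warm-start the target task's Q-table from source samples and then refine it online; all names and the gym-like env API below are assumptions.

```python
from collections import defaultdict
import random


def fit_q_from_source(source_samples, actions, gamma=0.99, alpha=0.5, sweeps=50):
    """Batch Q-learning sweeps over fixed (s, a, r, s_next, done) source samples."""
    q = defaultdict(float)
    for _ in range(sweeps):
        for s, a, r, s_next, done in source_samples:
            best_next = 0.0 if done else max(q[(s_next, b)] for b in actions)
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
    return q


def online_q_learning(env, q, actions, episodes=500, gamma=0.99, alpha=0.1, eps=0.1):
    """Standard epsilon-greedy Q-learning on the target task, warm-started
    from the transferred table `q`. `env.reset()` / `env.step()` follow a
    gym-like API (an assumption for this sketch)."""
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = (random.choice(actions) if random.random() < eps
                 else max(actions, key=lambda b: q[(s, b)]))
            s_next, r, done = env.step(a)
            best_next = 0.0 if done else max(q[(s_next, b)] for b in actions)
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s_next
    return q
```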
arXiv Detail & Related papers (2022-02-09T20:08:19Z)
- Self-Supervised Knowledge Transfer via Loosely Supervised Auxiliary Tasks [24.041268664220294]
Knowledge transfer using convolutional neural networks (CNNs) can help efficiently train a CNN with fewer parameters or maximize the generalization performance under limited supervision.
We propose a simple yet powerful knowledge transfer methodology without any restrictions regarding the network structure or dataset used.
We devise a training methodology that transfers previously learned knowledge to the current training process as an auxiliary task for the target task through self-supervision using a soft label.
arXiv Detail & Related papers (2021-10-25T07:18:26Z)
- Targeted Data Acquisition for Evolving Negotiation Agents [6.953246373478702]
Successful negotiators must learn how to balance optimizing for self-interest and cooperation.
Current artificial negotiation agents often heavily depend on the quality of the static datasets they were trained on.
We introduce a targeted data acquisition framework where we guide the exploration of a reinforcement learning agent.
arXiv Detail & Related papers (2021-06-14T19:45:59Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
- PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm, called inverse temporal difference learning (ITD).
We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called $\Psi\Phi$-learning.
arXiv Detail & Related papers (2021-02-24T21:12:09Z)
- Transfer Heterogeneous Knowledge Among Peer-to-Peer Teammates: A Model Distillation Approach [55.83558520598304]
We propose a brand new solution to reuse experiences and transfer value functions among multiple students via model distillation.
We also describe how to design an efficient communication protocol to exploit heterogeneous knowledge.
Our proposed framework, namely Learning and Teaching Categorical Reinforcement, shows promising performance in stabilizing and accelerating learning progress.
arXiv Detail & Related papers (2020-02-06T11:31:04Z)