Expert-Free Online Transfer Learning in Multi-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2303.01170v3
- Date: Fri, 28 Jul 2023 11:52:56 GMT
- Title: Expert-Free Online Transfer Learning in Multi-Agent Reinforcement Learning
- Authors: Alberto Castagna and Ivana Dusparic
- Abstract summary: Expert-Free Online Transfer Learning (EF-OnTL) is an algorithm that enables expert-free real-time dynamic transfer learning in multi-agent systems.
EF-OnTL achieves performance comparable to advice-based baselines.
- Score: 2.984934409689467
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transfer learning in Reinforcement Learning (RL) has been widely studied as a
way to overcome the training issues of Deep-RL, i.e., exploration cost, data
availability and convergence time, by enhancing the training phase with
external knowledge. Generally, knowledge is transferred from expert agents to
novices. While this addresses the issue for a novice agent, it requires the
expert agent to understand the task well for the transfer to be effective. As
an alternative, in this paper we propose Expert-Free Online Transfer Learning
(EF-OnTL), an algorithm that enables expert-free real-time dynamic transfer
learning in multi-agent systems. No dedicated expert exists; the source agent
and the knowledge to be transferred are selected dynamically at each transfer
step based on agents' performance and uncertainty. To improve uncertainty
estimation, we also propose State Action Reward Next-State Random Network
Distillation (sars-RND), an extension of RND that estimates uncertainty from
the RL agent-environment interaction. We demonstrate EF-OnTL's effectiveness
against a no-transfer scenario and advice-based baselines, with and without
expert agents, in three benchmark tasks: Cart-Pole, a grid-based Multi-Team
Predator-Prey (mt-pp) and Half Field Offense (HFO). Our results show that
EF-OnTL achieves performance comparable to advice-based baselines while
requiring no external input or threshold tuning. EF-OnTL outperforms the
no-transfer baseline, with an improvement that grows with the complexity of
the task.
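The two mechanisms named in the abstract, sars-RND uncertainty estimation and performance/uncertainty-driven transfer, can be sketched compactly. Below is a minimal, hedged PyTorch sketch, not the authors' implementation: the network sizes, the one-hot action encoding, the selection rule in ef_ontl_transfer_step, and helpers such as buffers[i].sample_all(), add_batch() and agents[i].mean_return are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def mlp(in_dim: int, out_dim: int, hidden: int = 64) -> nn.Sequential:
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))


class SarsRND(nn.Module):
    """Random Network Distillation over full (s, a, r, s') transitions:
    uncertainty = prediction error of a trained predictor network against
    a fixed, randomly initialised target network."""

    def __init__(self, state_dim: int, n_actions: int, embed_dim: int = 32):
        super().__init__()
        in_dim = 2 * state_dim + n_actions + 1   # s, one-hot a, r, s'
        self.target = mlp(in_dim, embed_dim)     # stays random and frozen
        self.predictor = mlp(in_dim, embed_dim)  # trained on seen transitions
        for p in self.target.parameters():
            p.requires_grad_(False)
        self.n_actions = n_actions

    def uncertainty(self, s, a, r, s_next) -> torch.Tensor:
        a_1h = F.one_hot(a, self.n_actions).float()
        x = torch.cat([s, a_1h, r.unsqueeze(-1), s_next], dim=-1)
        return (self.predictor(x) - self.target(x)).pow(2).mean(dim=-1)

    def update(self, optimiser, s, a, r, s_next) -> float:
        # Frequently seen transitions end up with low prediction error,
        # i.e. low uncertainty; rarely seen ones stay high.
        loss = self.uncertainty(s, a, r, s_next).mean()
        optimiser.zero_grad()
        loss.backward()
        optimiser.step()
        return loss.item()


def ef_ontl_transfer_step(agents, rnd, buffers, budget: int = 256):
    """One expert-free transfer step. The abstract only says that source and
    knowledge are picked from performance and uncertainty; the concrete rule
    below (best mean return acts as source, each target imports the
    transitions it is most uncertain about) is one plausible instantiation."""
    src = max(range(len(agents)), key=lambda i: agents[i].mean_return)
    s, a, r, s_next = buffers[src].sample_all()   # assumed replay helper
    for tgt in range(len(agents)):
        if tgt == src:
            continue
        unc = rnd[tgt].uncertainty(s, a, r, s_next)
        idx = torch.argsort(unc, descending=True)[:budget]
        buffers[tgt].add_batch(s[idx], a[idx], r[idx], s_next[idx])
```

In EF-OnTL such a step would run periodically during training, alongside each agent's own RL updates, so that knowledge flows toward whichever agents currently need it, without any designated expert.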
Related papers
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z)
- RILe: Reinforced Imitation Learning [60.63173816209543]
RILe is a novel trainer-student system that learns a dynamic reward function based on the student's performance and alignment with expert demonstrations.
RILe enables better performance in complex settings where traditional methods falter, outperforming existing methods by 2x in complex simulated robot-locomotion tasks.
arXiv Detail & Related papers (2024-06-12T17:56:31Z)
- Data-Driven Inverse Reinforcement Learning for Expert-Learner Zero-Sum Games [30.720112378448285]
We formulate inverse reinforcement learning as an expert-learner interaction.
The optimal performance intent of an expert or target agent is unknown to a learner agent.
We develop an off-policy IRL algorithm that does not require knowledge of the expert and learner agent dynamics.
arXiv Detail & Related papers (2023-01-05T10:35:08Z)
- Discriminator-Weighted Offline Imitation Learning from Suboptimal Demonstrations [5.760034336327491]
We study the problem of offline Imitation Learning (IL), where an agent aims to learn an optimal expert behavior policy without additional online environment interactions.
We introduce an additional discriminator to distinguish expert and non-expert data.
Our proposed algorithm achieves higher returns and faster training speed compared to baseline algorithms.
arXiv Detail & Related papers (2022-07-20T17:29:04Z)
- Transferred Q-learning [79.79659145328856]
We consider $Q$-learning with knowledge transfer, using samples from a target reinforcement learning (RL) task as well as source samples from different but related RL tasks.
We propose transfer learning algorithms for both batch and online $Q$-learning with offline source studies.
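As a rough, hypothetical illustration of the sample-reuse idea behind this entry (not the paper's actual estimators, which handle the shift between source and target tasks explicitly), one can warm-start the target task's Q-table from source samples and then refine it online; all names and the gym-like env API below are assumptions.

```python
from collections import defaultdict
import random


def fit_q_from_source(source_samples, actions, gamma=0.99, alpha=0.5, sweeps=50):
    """Batch Q-learning sweeps over fixed (s, a, r, s_next, done) source samples."""
    q = defaultdict(float)
    for _ in range(sweeps):
        for s, a, r, s_next, done in source_samples:
            best_next = 0.0 if done else max(q[(s_next, b)] for b in actions)
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
    return q


def online_q_learning(env, q, actions, episodes=500, gamma=0.99, alpha=0.1, eps=0.1):
    """Standard epsilon-greedy Q-learning on the target task, warm-started
    from the transferred table `q`. `env.reset()` / `env.step()` follow a
    gym-like API (an assumption for this sketch)."""
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = (random.choice(actions) if random.random() < eps
                 else max(actions, key=lambda b: q[(s, b)]))
            s_next, r, done = env.step(a)
            best_next = 0.0 if done else max(q[(s_next, b)] for b in actions)
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s_next
    return q
```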
arXiv Detail & Related papers (2022-02-09T20:08:19Z)
- Self-Supervised Knowledge Transfer via Loosely Supervised Auxiliary Tasks [24.041268664220294]
Knowledge transfer using convolutional neural networks (CNNs) can help efficiently train a CNN with fewer parameters or maximize the generalization performance under limited supervision.
We propose a simple yet powerful knowledge transfer methodology without any restrictions regarding the network structure or dataset used.
We devise a training methodology that transfers previously learned knowledge to the current training process as an auxiliary task for the target task through self-supervision using a soft label.
arXiv Detail & Related papers (2021-10-25T07:18:26Z)
- Targeted Data Acquisition for Evolving Negotiation Agents [6.953246373478702]
Successful negotiators must learn how to balance optimizing for self-interest and cooperation.
Current artificial negotiation agents often heavily depend on the quality of the static datasets they were trained on.
We introduce a targeted data acquisition framework where we guide the exploration of a reinforcement learning agent.
arXiv Detail & Related papers (2021-06-14T19:45:59Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
- PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm, called inverse temporal difference learning (ITD).
We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called $\Psi\Phi$-learning.
arXiv Detail & Related papers (2021-02-24T21:12:09Z)
- Transfer Heterogeneous Knowledge Among Peer-to-Peer Teammates: A Model Distillation Approach [55.83558520598304]
We propose a brand new solution to reuse experiences and transfer value functions among multiple students via model distillation.
We also describe how to design an efficient communication protocol to exploit heterogeneous knowledge.
Our proposed framework, namely Learning and Teaching Categorical Reinforcement, shows promising performance in stabilizing and accelerating learning progress.
arXiv Detail & Related papers (2020-02-06T11:31:04Z)