An advantage based policy transfer algorithm for reinforcement learning
with metrics of transferability
- URL: http://arxiv.org/abs/2311.06731v1
- Date: Sun, 12 Nov 2023 04:25:53 GMT
- Title: An advantage based policy transfer algorithm for reinforcement learning
with metrics of transferability
- Authors: Md Ferdous Alam, Parinaz Naghizadeh, David Hoelzle
- Abstract summary: Reinforcement learning (RL) can enable sequential decision-making in complex and high-dimensional environments.
transfer RL algorithms can be used for the transfer of knowledge from one or multiple source environments to a target environment.
This paper proposes an off-policy Advantage-based Policy Transfer algorithm, APT-RL, for fixed domain environments.
- Score: 6.660458629649826
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) can enable sequential decision-making in complex
and high-dimensional environments if the acquisition of a new state-action pair
is efficient, i.e., when interaction with the environment is inexpensive.
However, there are a myriad of real-world applications in which a high number
of interactions are infeasible. In these environments, transfer RL algorithms,
which can be used for the transfer of knowledge from one or multiple source
environments to a target environment, have been shown to increase learning
speed and improve initial and asymptotic performance. However, most existing
transfer RL algorithms are on-policy and sample inefficient, and often require
heuristic choices in algorithm design. This paper proposes an off-policy
Advantage-based Policy Transfer algorithm, APT-RL, for fixed domain
environments. Its novelty is in using the popular notion of ``advantage'' as a
regularizer, to weigh the knowledge that should be transferred from the source,
relative to new knowledge learned in the target, removing the need for
heuristic choices. Further, we propose a new transfer performance metric to
evaluate the performance of our algorithm and unify existing transfer RL
frameworks. Finally, we present a scalable, theoretically-backed task
similarity measurement algorithm to illustrate the alignments between our
proposed transferability metric and similarities between source and target
environments. Numerical experiments on three continuous control benchmark tasks
demonstrate that APT-RL outperforms existing transfer RL algorithms on most
tasks, and is $10\%$ to $75\%$ more sample efficient than learning from
scratch.
Related papers
- Enhancing Spectrum Efficiency in 6G Satellite Networks: A GAIL-Powered Policy Learning via Asynchronous Federated Inverse Reinforcement Learning [67.95280175998792]
A novel adversarial imitation learning (GAIL)-powered policy learning approach is proposed for optimizing beamforming, spectrum allocation, and remote user equipment (RUE) association ins.
We employ inverse RL (IRL) to automatically learn reward functions without manual tuning.
We show that the proposed MA-AL method outperforms traditional RL approaches, achieving a $14.6%$ improvement in convergence and reward value.
arXiv Detail & Related papers (2024-09-27T13:05:02Z) - Multi-Objective Optimization Using Adaptive Distributed Reinforcement Learning [8.471466670802815]
We propose a multi-objective, multi-agent reinforcement learning (MARL) algorithm with high learning efficiency and low computational requirements.
We test our algorithm in an ITS environment with edge cloud computing.
Our algorithm also addresses various practical concerns with its modularized and asynchronous online training method.
arXiv Detail & Related papers (2024-03-13T18:05:16Z) - MARLIN: Soft Actor-Critic based Reinforcement Learning for Congestion
Control in Real Networks [63.24965775030673]
We propose a novel Reinforcement Learning (RL) approach to design generic Congestion Control (CC) algorithms.
Our solution, MARLIN, uses the Soft Actor-Critic algorithm to maximize both entropy and return.
We trained MARLIN on a real network with varying background traffic patterns to overcome the sim-to-real mismatch.
arXiv Detail & Related papers (2023-02-02T18:27:20Z) - Inapplicable Actions Learning for Knowledge Transfer in Reinforcement
Learning [3.194414753332705]
We show that learning inapplicable actions greatly improves the sample efficiency of RL algorithms.
Thanks to the transferability of the knowledge acquired, it can be reused in other tasks and domains to make the learning process more efficient.
arXiv Detail & Related papers (2022-11-28T17:45:39Z) - Lean Evolutionary Reinforcement Learning by Multitasking with Importance
Sampling [20.9680985132322]
We introduce a novel neuroevolutionary multitasking (NuEMT) algorithm to transfer information from a set of auxiliary tasks to the target (full length) RL task.
We demonstrate that the NuEMT algorithm data-lean evolutionary RL, reducing expensive agent-environment interaction data requirements.
arXiv Detail & Related papers (2022-03-21T10:06:16Z) - Adaptive Stochastic ADMM for Decentralized Reinforcement Learning in
Edge Industrial IoT [106.83952081124195]
Reinforcement learning (RL) has been widely investigated and shown to be a promising solution for decision-making and optimal control processes.
We propose an adaptive ADMM (asI-ADMM) algorithm and apply it to decentralized RL with edge-computing-empowered IIoT networks.
Experiment results show that our proposed algorithms outperform the state of the art in terms of communication costs and scalability, and can well adapt to complex IoT environments.
arXiv Detail & Related papers (2021-06-30T16:49:07Z) - Reinforcement Learning for Datacenter Congestion Control [50.225885814524304]
Successful congestion control algorithms can dramatically improve latency and overall network throughput.
Until today, no such learning-based algorithms have shown practical potential in this domain.
We devise an RL-based algorithm with the aim of generalizing to different configurations of real-world datacenter networks.
We show that this scheme outperforms alternative popular RL approaches, and generalizes to scenarios that were not seen during training.
arXiv Detail & Related papers (2021-02-18T13:49:28Z) - Towards Accurate Knowledge Transfer via Target-awareness Representation
Disentanglement [56.40587594647692]
We propose a novel transfer learning algorithm, introducing the idea of Target-awareness REpresentation Disentanglement (TRED)
TRED disentangles the relevant knowledge with respect to the target task from the original source model and used as a regularizer during fine-tuning the target model.
Experiments on various real world datasets show that our method stably improves the standard fine-tuning by more than 2% in average.
arXiv Detail & Related papers (2020-10-16T17:45:08Z) - FOCAL: Efficient Fully-Offline Meta-Reinforcement Learning via Distance
Metric Learning and Behavior Regularization [10.243908145832394]
We study the offline meta-reinforcement learning (OMRL) problem, a paradigm which enables reinforcement learning (RL) algorithms to quickly adapt to unseen tasks.
This problem is still not fully understood, for which two major challenges need to be addressed.
We provide analysis and insight showing that some simple design choices can yield substantial improvements over recent approaches.
arXiv Detail & Related papers (2020-10-02T17:13:39Z) - Efficient Deep Reinforcement Learning via Adaptive Policy Transfer [50.51637231309424]
Policy Transfer Framework (PTF) is proposed to accelerate Reinforcement Learning (RL)
Our framework learns when and which source policy is the best to reuse for the target policy and when to terminate it.
Experimental results show it significantly accelerates the learning process and surpasses state-of-the-art policy transfer methods.
arXiv Detail & Related papers (2020-02-19T07:30:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.