An advantage based policy transfer algorithm for reinforcement learning with measures of transferability
- URL: http://arxiv.org/abs/2311.06731v2
- Date: Sun, 15 Dec 2024 01:27:10 GMT
- Title: An advantage based policy transfer algorithm for reinforcement learning with measures of transferability
- Authors: Md Ferdous Alam, Parinaz Naghizadeh, David Hoelzle,
- Abstract summary: Reinforcement learning (RL) enables sequential decision-making in complex and high-dimensional environments.
This paper proposes an off-policy Advantage-based Policy Transfer algorithm, APT-RL, for fixed domain environments.
- Score: 5.926203312586109
- License:
- Abstract: Reinforcement learning (RL) enables sequential decision-making in complex and high-dimensional environments through interaction with the environment. In most real-world applications, however, a high number of interactions are infeasible. In these environments, transfer RL algorithms, which can be used for the transfer of knowledge from one or multiple source environments to a target environment, have been shown to increase learning speed and improve initial and asymptotic performance. However, most existing transfer RL algorithms are on-policy and sample inefficient, fail in adversarial target tasks, and often require heuristic choices in algorithm design. This paper proposes an off-policy Advantage-based Policy Transfer algorithm, APT-RL, for fixed domain environments. Its novelty is in using the popular notion of ``advantage'' as a regularizer, to weigh the knowledge that should be transferred from the source, relative to new knowledge learned in the target, removing the need for heuristic choices. Further, we propose a new transfer performance measure to evaluate the performance of our algorithm and unify existing transfer RL frameworks. Finally, we present a scalable, theoretically-backed task similarity measurement algorithm to illustrate the alignments between our proposed transferability measure and similarities between source and target environments. We compare APT-RL with several baselines, including existing transfer-RL algorithms, in three high-dimensional continuous control tasks. Our experiments demonstrate that APT-RL outperforms existing transfer RL algorithms and is at least as good as learning from scratch in adversarial tasks.
Related papers
- Enhancing Spectrum Efficiency in 6G Satellite Networks: A GAIL-Powered Policy Learning via Asynchronous Federated Inverse Reinforcement Learning [67.95280175998792]
A novel adversarial imitation learning (GAIL)-powered policy learning approach is proposed for optimizing beamforming, spectrum allocation, and remote user equipment (RUE) association ins.
We employ inverse RL (IRL) to automatically learn reward functions without manual tuning.
We show that the proposed MA-AL method outperforms traditional RL approaches, achieving a $14.6%$ improvement in convergence and reward value.
arXiv Detail & Related papers (2024-09-27T13:05:02Z) - Safe and Accelerated Deep Reinforcement Learning-based O-RAN Slicing: A
Hybrid Transfer Learning Approach [20.344810727033327]
We propose and design a hybrid TL-aided approach to provide safe and accelerated convergence in DRL-based O-RAN slicing.
The proposed hybrid approach shows at least: 7.7% and 20.7% improvements in the average initial reward value and the percentage of converged scenarios.
arXiv Detail & Related papers (2023-09-13T18:58:34Z) - MARLIN: Soft Actor-Critic based Reinforcement Learning for Congestion
Control in Real Networks [63.24965775030673]
We propose a novel Reinforcement Learning (RL) approach to design generic Congestion Control (CC) algorithms.
Our solution, MARLIN, uses the Soft Actor-Critic algorithm to maximize both entropy and return.
We trained MARLIN on a real network with varying background traffic patterns to overcome the sim-to-real mismatch.
arXiv Detail & Related papers (2023-02-02T18:27:20Z) - Lean Evolutionary Reinforcement Learning by Multitasking with Importance
Sampling [20.9680985132322]
We introduce a novel neuroevolutionary multitasking (NuEMT) algorithm to transfer information from a set of auxiliary tasks to the target (full length) RL task.
We demonstrate that the NuEMT algorithm data-lean evolutionary RL, reducing expensive agent-environment interaction data requirements.
arXiv Detail & Related papers (2022-03-21T10:06:16Z) - Semantic-Aware Collaborative Deep Reinforcement Learning Over Wireless
Cellular Networks [82.02891936174221]
Collaborative deep reinforcement learning (CDRL) algorithms in which multiple agents can coordinate over a wireless network is a promising approach.
In this paper, a novel semantic-aware CDRL method is proposed to enable a group of untrained agents with semantically-linked DRL tasks to collaborate efficiently across a resource-constrained wireless cellular network.
arXiv Detail & Related papers (2021-11-23T18:24:47Z) - Reinforcement Learning for Datacenter Congestion Control [50.225885814524304]
Successful congestion control algorithms can dramatically improve latency and overall network throughput.
Until today, no such learning-based algorithms have shown practical potential in this domain.
We devise an RL-based algorithm with the aim of generalizing to different configurations of real-world datacenter networks.
We show that this scheme outperforms alternative popular RL approaches, and generalizes to scenarios that were not seen during training.
arXiv Detail & Related papers (2021-02-18T13:49:28Z) - Pareto Deterministic Policy Gradients and Its Application in 5G Massive
MIMO Networks [32.099949375036495]
We consider jointly optimizing cell load balance and network throughput via a reinforcement learning (RL) approach.
Our rationale behind using RL is to circumvent the challenges of analytically modeling user mobility and network dynamics.
To accomplish this joint optimization, we integrate vector rewards into the RL value network and conduct RL action via a separate policy network.
arXiv Detail & Related papers (2020-12-02T15:35:35Z) - Towards Accurate Knowledge Transfer via Target-awareness Representation
Disentanglement [56.40587594647692]
We propose a novel transfer learning algorithm, introducing the idea of Target-awareness REpresentation Disentanglement (TRED)
TRED disentangles the relevant knowledge with respect to the target task from the original source model and used as a regularizer during fine-tuning the target model.
Experiments on various real world datasets show that our method stably improves the standard fine-tuning by more than 2% in average.
arXiv Detail & Related papers (2020-10-16T17:45:08Z) - FOCAL: Efficient Fully-Offline Meta-Reinforcement Learning via Distance
Metric Learning and Behavior Regularization [10.243908145832394]
We study the offline meta-reinforcement learning (OMRL) problem, a paradigm which enables reinforcement learning (RL) algorithms to quickly adapt to unseen tasks.
This problem is still not fully understood, for which two major challenges need to be addressed.
We provide analysis and insight showing that some simple design choices can yield substantial improvements over recent approaches.
arXiv Detail & Related papers (2020-10-02T17:13:39Z) - Efficient Deep Reinforcement Learning via Adaptive Policy Transfer [50.51637231309424]
Policy Transfer Framework (PTF) is proposed to accelerate Reinforcement Learning (RL)
Our framework learns when and which source policy is the best to reuse for the target policy and when to terminate it.
Experimental results show it significantly accelerates the learning process and surpasses state-of-the-art policy transfer methods.
arXiv Detail & Related papers (2020-02-19T07:30:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.