Related papers: An advantage based policy transfer algorithm for reinforcement learning with measures of transferability

An advantage based policy transfer algorithm for reinforcement learning with measures of transferability

URL: http://arxiv.org/abs/2311.06731v2
Date: Sun, 15 Dec 2024 01:27:10 GMT
Title: An advantage based policy transfer algorithm for reinforcement learning with measures of transferability
Authors: Md Ferdous Alam, Parinaz Naghizadeh, David Hoelzle,
Abstract summary: Reinforcement learning (RL) enables sequential decision-making in complex and high-dimensional environments.<n>This paper proposes an off-policy Advantage-based Policy Transfer algorithm, APT-RL, for fixed domain environments.
Score: 5.926203312586109
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Reinforcement learning (RL) enables sequential decision-making in complex and high-dimensional environments through interaction with the environment. In most real-world applications, however, a high number of interactions are infeasible. In these environments, transfer RL algorithms, which can be used for the transfer of knowledge from one or multiple source environments to a target environment, have been shown to increase learning speed and improve initial and asymptotic performance. However, most existing transfer RL algorithms are on-policy and sample inefficient, fail in adversarial target tasks, and often require heuristic choices in algorithm design. This paper proposes an off-policy Advantage-based Policy Transfer algorithm, APT-RL, for fixed domain environments. Its novelty is in using the popular notion of ``advantage'' as a regularizer, to weigh the knowledge that should be transferred from the source, relative to new knowledge learned in the target, removing the need for heuristic choices. Further, we propose a new transfer performance measure to evaluate the performance of our algorithm and unify existing transfer RL frameworks. Finally, we present a scalable, theoretically-backed task similarity measurement algorithm to illustrate the alignments between our proposed transferability measure and similarities between source and target environments. We compare APT-RL with several baselines, including existing transfer-RL algorithms, in three high-dimensional continuous control tasks. Our experiments demonstrate that APT-RL outperforms existing transfer RL algorithms and is at least as good as learning from scratch in adversarial tasks.

Related papers

Policy Transfer Ensures Fast Learning for Continuous-Time LQR with Entropy Regularization [2.494814157306265]
Reinforcement Learning (RL) enables agents to learn optimal decision-making strategies through interaction with an environment, yet training from scratch on complex tasks can be highly inefficient.<n>This paper investigates policy transfer, a TL approach that initializes learning in a target RL task using a policy from a related source task.<n>We introduce a novel policy learning algorithm for continuous-time LQRs that achieves global linear and local super-linear convergence.
arXiv Detail & Related papers (2025-10-16T21:57:53Z)
Preference Optimization for Combinatorial Optimization Problems [54.87466279363487]
Reinforcement Learning (RL) has emerged as a powerful tool for neural optimization, enabling models learns that solve complex problems without requiring expert knowledge.<n>Despite significant progress, existing RL approaches face challenges such as diminishing reward signals and inefficient exploration in vast action spaces.<n>We propose Preference Optimization, a novel method that transforms quantitative reward signals into qualitative preference signals via statistical comparison modeling.
arXiv Detail & Related papers (2025-05-13T16:47:00Z)
Enhancing Spectrum Efficiency in 6G Satellite Networks: A GAIL-Powered Policy Learning via Asynchronous Federated Inverse Reinforcement Learning [67.95280175998792]
A novel adversarial imitation learning (GAIL)-powered policy learning approach is proposed for optimizing beamforming, spectrum allocation, and remote user equipment (RUE) association ins. We employ inverse RL (IRL) to automatically learn reward functions without manual tuning. We show that the proposed MA-AL method outperforms traditional RL approaches, achieving a $14.6%$ improvement in convergence and reward value.
arXiv Detail & Related papers (2024-09-27T13:05:02Z)
Multi-Objective Optimization Using Adaptive Distributed Reinforcement Learning [8.471466670802815]
We propose a multi-objective, multi-agent reinforcement learning (MARL) algorithm with high learning efficiency and low computational requirements. We test our algorithm in an ITS environment with edge cloud computing. Our algorithm also addresses various practical concerns with its modularized and asynchronous online training method.
arXiv Detail & Related papers (2024-03-13T18:05:16Z)
Safe and Accelerated Deep Reinforcement Learning-based O-RAN Slicing: A Hybrid Transfer Learning Approach [20.344810727033327]
We propose and design a hybrid TL-aided approach to provide safe and accelerated convergence in DRL-based O-RAN slicing. The proposed hybrid approach shows at least: 7.7% and 20.7% improvements in the average initial reward value and the percentage of converged scenarios.
arXiv Detail & Related papers (2023-09-13T18:58:34Z)
MARLIN: Soft Actor-Critic based Reinforcement Learning for Congestion Control in Real Networks [63.24965775030673]
We propose a novel Reinforcement Learning (RL) approach to design generic Congestion Control (CC) algorithms. Our solution, MARLIN, uses the Soft Actor-Critic algorithm to maximize both entropy and return. We trained MARLIN on a real network with varying background traffic patterns to overcome the sim-to-real mismatch.
arXiv Detail & Related papers (2023-02-02T18:27:20Z)
Inapplicable Actions Learning for Knowledge Transfer in Reinforcement Learning [3.194414753332705]
We show that learning inapplicable actions greatly improves the sample efficiency of RL algorithms. Thanks to the transferability of the knowledge acquired, it can be reused in other tasks and domains to make the learning process more efficient.
arXiv Detail & Related papers (2022-11-28T17:45:39Z)
Lean Evolutionary Reinforcement Learning by Multitasking with Importance Sampling [20.9680985132322]
We introduce a novel neuroevolutionary multitasking (NuEMT) algorithm to transfer information from a set of auxiliary tasks to the target (full length) RL task. We demonstrate that the NuEMT algorithm data-lean evolutionary RL, reducing expensive agent-environment interaction data requirements.
arXiv Detail & Related papers (2022-03-21T10:06:16Z)
Semantic-Aware Collaborative Deep Reinforcement Learning Over Wireless Cellular Networks [82.02891936174221]
Collaborative deep reinforcement learning (CDRL) algorithms in which multiple agents can coordinate over a wireless network is a promising approach. In this paper, a novel semantic-aware CDRL method is proposed to enable a group of untrained agents with semantically-linked DRL tasks to collaborate efficiently across a resource-constrained wireless cellular network.
arXiv Detail & Related papers (2021-11-23T18:24:47Z)
Adaptive Stochastic ADMM for Decentralized Reinforcement Learning in Edge Industrial IoT [106.83952081124195]
Reinforcement learning (RL) has been widely investigated and shown to be a promising solution for decision-making and optimal control processes. We propose an adaptive ADMM (asI-ADMM) algorithm and apply it to decentralized RL with edge-computing-empowered IIoT networks. Experiment results show that our proposed algorithms outperform the state of the art in terms of communication costs and scalability, and can well adapt to complex IoT environments.
arXiv Detail & Related papers (2021-06-30T16:49:07Z)
Reinforcement Learning for Datacenter Congestion Control [50.225885814524304]
Successful congestion control algorithms can dramatically improve latency and overall network throughput. Until today, no such learning-based algorithms have shown practical potential in this domain. We devise an RL-based algorithm with the aim of generalizing to different configurations of real-world datacenter networks. We show that this scheme outperforms alternative popular RL approaches, and generalizes to scenarios that were not seen during training.
arXiv Detail & Related papers (2021-02-18T13:49:28Z)
Pareto Deterministic Policy Gradients and Its Application in 5G Massive MIMO Networks [32.099949375036495]
We consider jointly optimizing cell load balance and network throughput via a reinforcement learning (RL) approach. Our rationale behind using RL is to circumvent the challenges of analytically modeling user mobility and network dynamics. To accomplish this joint optimization, we integrate vector rewards into the RL value network and conduct RL action via a separate policy network.
arXiv Detail & Related papers (2020-12-02T15:35:35Z)
Towards Accurate Knowledge Transfer via Target-awareness Representation Disentanglement [56.40587594647692]
We propose a novel transfer learning algorithm, introducing the idea of Target-awareness REpresentation Disentanglement (TRED) TRED disentangles the relevant knowledge with respect to the target task from the original source model and used as a regularizer during fine-tuning the target model. Experiments on various real world datasets show that our method stably improves the standard fine-tuning by more than 2% in average.
arXiv Detail & Related papers (2020-10-16T17:45:08Z)
FOCAL: Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization [10.243908145832394]
We study the offline meta-reinforcement learning (OMRL) problem, a paradigm which enables reinforcement learning (RL) algorithms to quickly adapt to unseen tasks. This problem is still not fully understood, for which two major challenges need to be addressed. We provide analysis and insight showing that some simple design choices can yield substantial improvements over recent approaches.
arXiv Detail & Related papers (2020-10-02T17:13:39Z)
Efficient Deep Reinforcement Learning via Adaptive Policy Transfer [50.51637231309424]
Policy Transfer Framework (PTF) is proposed to accelerate Reinforcement Learning (RL) Our framework learns when and which source policy is the best to reuse for the target policy and when to terminate it. Experimental results show it significantly accelerates the learning process and surpasses state-of-the-art policy transfer methods.
arXiv Detail & Related papers (2020-02-19T07:30:57Z)

This list is automatically generated from the titles and abstracts of the papers in this site.