Faded-Experience Trust Region Policy Optimization for Model-Free Power
Allocation in Interference Channel
- URL: http://arxiv.org/abs/2008.01705v1
- Date: Tue, 4 Aug 2020 17:12:29 GMT
- Title: Faded-Experience Trust Region Policy Optimization for Model-Free Power
Allocation in Interference Channel
- Authors: Mohammad G. Khoshkholgh and Halim Yanikomeroglu
- Abstract summary: Policy gradient reinforcement learning techniques enable an agent to learn an optimal action policy through interactions with the environment.
Inspired by the human decision-making approach, we work toward enhancing its convergence speed by augmenting the agent with the ability to memorize and use recently learned policies.
Results indicate that with FE-TRPO it is possible to almost double the learning speed compared to TRPO.
- Score: 28.618312473850974
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Policy gradient reinforcement learning techniques enable an agent to directly
learn an optimal action policy through interactions with the environment.
Nevertheless, despite their advantages, these methods sometimes suffer from slow
convergence. Inspired by the human decision-making approach, we work toward
enhancing the convergence speed by augmenting the agent with the ability to
memorize and use recently learned policies. We apply our method to trust-region
policy optimization (TRPO), primarily developed for locomotion tasks, and propose
faded-experience (FE) TRPO. To substantiate its effectiveness, we use it to
learn continuous power control in an interference channel when only noisy
location information of the devices is available. Results indicate that with
FE-TRPO it is possible to almost double the learning speed compared to TRPO.
Importantly, our method neither increases the learning complexity nor incurs a
performance loss.
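The abstract describes the faded-experience idea only at a high level, so the snippet below is a minimal sketch of one plausible reading, not the authors' implementation: keep a short buffer of recently learned policy parameter vectors and blend them with exponentially fading weights before the next TRPO update. The class name, buffer size, and fading factor are illustrative assumptions.

```python
import numpy as np

class FadedExperienceMixer:
    """Keep the last few policy parameter vectors and blend them with
    exponentially fading weights (illustrative sketch, not the paper's code)."""

    def __init__(self, buffer_size=3, fade=0.5):
        self.buffer_size = buffer_size  # how many recent policies to remember
        self.fade = fade                # the k-th most recent policy gets weight fade**k
        self.recent_params = []         # newest first

    def remember(self, params):
        """Store a copy of the newly learned policy parameters."""
        self.recent_params.insert(0, np.array(params, dtype=float))
        del self.recent_params[self.buffer_size:]

    def mixed_params(self):
        """Weighted average of the remembered policies; the newest dominates."""
        weights = np.array([self.fade ** k for k in range(len(self.recent_params))])
        weights /= weights.sum()
        return sum(w * p for w, p in zip(weights, self.recent_params))


# Hypothetical usage: after each TRPO iteration, remember the freshly updated
# parameters and warm-start the next iteration from the faded mixture.
mixer = FadedExperienceMixer()
for _ in range(5):
    new_params = np.random.randn(8)      # stand-in for TRPO's updated parameters
    mixer.remember(new_params)
    start_params = mixer.mixed_params()  # blend of the recent policies
```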
Related papers
- Diffusion Policies creating a Trust Region for Offline Reinforcement Learning [66.17291150498276]
We introduce a dual policy approach, Diffusion Trusted Q-Learning (DTQL), which comprises a diffusion policy for pure behavior cloning and a practical one-step policy.
DTQL eliminates the need for iterative denoising sampling during both training and inference, making it remarkably computationally efficient.
We show that DTQL not only outperforms other methods on the majority of the D4RL benchmark tasks but is also efficient in training and inference speed.
arXiv Detail & Related papers (2024-05-30T05:04:33Z)
- Skill or Luck? Return Decomposition via Advantage Functions [15.967056781224102]
Learning from off-policy data is essential for sample-efficient reinforcement learning.
We show that the advantage function can be understood as the causal effect of an action on the return.
This decomposition enables us to naturally extend Direct Advantage Estimation to off-policy settings.
arXiv Detail & Related papers (2024-02-20T10:09:00Z)
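For context on the advantage function the summary just above refers to, the snippet below illustrates the standard definition A(s, a) = Q(s, a) - V(s) with the simplest Monte-Carlo style estimate (sampled return minus a value baseline). It is a generic sketch, not the paper's Direct Advantage Estimation method, and the function name and numbers are made up for illustration.

```python
import numpy as np

def estimate_advantages(returns, values):
    """Generic estimate A(s_t, a_t) ~= G_t - V(s_t): the extra return an action
    earned beyond the state's baseline value (not the paper's DAE estimator)."""
    return np.asarray(returns, dtype=float) - np.asarray(values, dtype=float)

# Toy example: sampled discounted returns and a learned value baseline.
returns = [5.0, 3.2, 1.0]
values = [4.5, 3.0, 1.5]
print(estimate_advantages(returns, values))  # positive entries: the action beat the baseline
```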
- Adversarial Policy Optimization in Deep Reinforcement Learning [16.999444076456268]
A policy represented by a deep neural network can overfit, which hampers a reinforcement learning agent from learning an effective policy.
Data augmentation can provide a performance boost to RL agents by mitigating the effect of overfitting.
We propose a novel RL algorithm to mitigate the above issue and improve the efficiency of the learned policy.
arXiv Detail & Related papers (2023-04-27T21:01:08Z)
- Computationally Efficient Reinforcement Learning: Targeted Exploration leveraging Simple Rules [1.124958340749622]
We propose a simple yet effective modification of continuous actor-critic frameworks to incorporate such rules.
In a room temperature control case study, it allows agents to converge to well-performing policies up to 6-7x faster than classical agents.
arXiv Detail & Related papers (2022-11-30T02:24:42Z)
- Off-policy Reinforcement Learning with Optimistic Exploration and Distribution Correction [73.77593805292194]
We train a separate exploration policy to maximize an approximate upper confidence bound of the critics in an off-policy actor-critic framework.
To mitigate the off-policy-ness, we adapt the recently introduced DICE framework to learn a distribution correction ratio for off-policy actor-critic training.
arXiv Detail & Related papers (2021-10-22T22:07:51Z)
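A minimal sketch of the optimistic-exploration ingredient described just above, assuming an ensemble of critics: candidate actions are ranked by an approximate upper confidence bound (ensemble mean plus a scaled standard deviation). The ensemble size, bonus coefficient, and function names are illustrative assumptions, and the DICE distribution-correction step is not shown.

```python
import numpy as np

def ucb_value(critic_estimates, beta=1.0):
    """Approximate upper confidence bound over an ensemble of critic estimates:
    mean + beta * std (illustrative, not the paper's exact bound)."""
    q = np.asarray(critic_estimates, dtype=float)
    return q.mean() + beta * q.std()

def choose_optimistic_action(candidate_actions, critics, state, beta=1.0):
    """Pick the candidate action with the largest ensemble UCB score."""
    scores = [ucb_value([c(state, a) for c in critics], beta) for a in candidate_actions]
    return candidate_actions[int(np.argmax(scores))]

# Toy usage with two hand-rolled critics over a scalar action space.
critics = [lambda s, a: -(a - 0.4) ** 2, lambda s, a: -(a - 0.6) ** 2 + 0.1]
actions = np.linspace(-1.0, 1.0, 21)
best_action = choose_optimistic_action(actions, critics, state=None, beta=0.5)
```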
- Transferable Deep Reinforcement Learning Framework for Autonomous Vehicles with Joint Radar-Data Communications [69.24726496448713]
We propose an intelligent optimization framework based on the Markov Decision Process (MDP) to help the autonomous vehicle (AV) make optimal decisions.
We then develop an effective learning algorithm leveraging recent advances in deep reinforcement learning to find the optimal policy for the AV.
We show that the proposed transferable deep reinforcement learning framework reduces the AV's obstacle miss-detection probability by up to 67% compared to other conventional deep reinforcement learning approaches.
arXiv Detail & Related papers (2021-05-28T08:45:37Z)
- Path Design and Resource Management for NOMA enhanced Indoor Intelligent Robots [58.980293789967575]
A communication-enabled service framework for indoor intelligent robots (IRs) is proposed.
A Lego modeling method is proposed, which can deterministically describe the indoor layout and channel state.
The investigated radio map is used as a virtual environment to train the reinforcement learning agent.
arXiv Detail & Related papers (2020-11-23T21:45:01Z)
- Optimization-driven Deep Reinforcement Learning for Robust Beamforming in IRS-assisted Wireless Communications [54.610318402371185]
Intelligent reflecting surface (IRS) is a promising technology to assist downlink information transmissions from a multi-antenna access point (AP) to a receiver.
We minimize the AP's transmit power by a joint optimization of the AP's active beamforming and the IRS's passive beamforming.
We propose a deep reinforcement learning (DRL) approach that can adapt the beamforming strategies from past experiences.
arXiv Detail & Related papers (2020-05-25T01:42:55Z)
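To make the one-sentence problem statement just above concrete, a generic power-minimization formulation of this kind (notation assumed here, not taken from the paper) constrains the receiver's SNR while jointly optimizing the AP beamformer w and the IRS phase shifts Θ:

```latex
\begin{aligned}
\min_{\mathbf{w},\,\boldsymbol{\Theta}} \quad & \|\mathbf{w}\|^{2} \\
\text{s.t.} \quad
& \frac{\big|(\mathbf{h}_{r}^{H}\boldsymbol{\Theta}\mathbf{G} + \mathbf{h}_{d}^{H})\,\mathbf{w}\big|^{2}}{\sigma^{2}} \ge \gamma, \\
& \boldsymbol{\Theta} = \mathrm{diag}\big(e^{j\theta_{1}},\dots,e^{j\theta_{N}}\big),
\end{aligned}
```

where, under these assumed symbols, G is the AP-to-IRS channel, h_r and h_d are the IRS-to-receiver and direct channels, σ² is the noise power, and γ is the target SNR.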
- Off-Policy Adversarial Inverse Reinforcement Learning [0.0]
Adversarial Imitation Learning (AIL) is a class of algorithms in Reinforcement Learning (RL).
We propose an Off-Policy Adversarial Inverse Reinforcement Learning (Off-policy-AIRL) algorithm which is sample efficient and gives good imitation performance.
arXiv Detail & Related papers (2020-05-03T16:51:40Z)
- Efficient Deep Reinforcement Learning via Adaptive Policy Transfer [50.51637231309424]
A Policy Transfer Framework (PTF) is proposed to accelerate Reinforcement Learning (RL).
Our framework learns when and which source policy is the best to reuse for the target policy and when to terminate it.
Experimental results show it significantly accelerates the learning process and surpasses state-of-the-art policy transfer methods.
arXiv Detail & Related papers (2020-02-19T07:30:57Z)