Efficient Exploration in Resource-Restricted Reinforcement Learning
- URL: http://arxiv.org/abs/2212.06988v1
- Date: Wed, 14 Dec 2022 02:50:26 GMT
- Title: Efficient Exploration in Resource-Restricted Reinforcement Learning
- Authors: Zhihai Wang, Taoxing Pan, Qi Zhou, Jie Wang
- Abstract summary: In many real-world applications of reinforcement learning, performing actions requires consuming certain types of resources that are non-replenishable in each episode.
We propose a novel resource-aware exploration bonus (RAEB) to make reasonable use of resources.
RAEB significantly outperforms state-of-the-art exploration strategies in resource-restricted reinforcement learning environments.
- Score: 6.463999435780127
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In many real-world applications of reinforcement learning (RL), performing
actions requires consuming certain types of resources that are
non-replenishable in each episode. Typical applications include robotic control
with limited energy and video games with consumable items. In tasks with
non-replenishable resources, we observe that popular RL methods such as soft
actor-critic suffer from poor sample efficiency. The major reason is that they
tend to exhaust resources quickly, so subsequent exploration is severely
restricted by the lack of resources. To address this challenge, we first
formalize the aforementioned problem as resource-restricted reinforcement
learning, and then propose a novel resource-aware exploration bonus (RAEB) to
encourage reasonable use of resources. An appealing feature of RAEB is that it
can significantly reduce unnecessary resource-consuming trials while
effectively encouraging the agent to explore unvisited states. Experiments
demonstrate that the proposed RAEB significantly outperforms state-of-the-art
exploration strategies in resource-restricted reinforcement learning
environments, improving the sample efficiency by up to an order of magnitude.
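The abstract does not reproduce RAEB's exact formulation, but the general idea it describes can be sketched as a novelty bonus that is discounted when an action would spend scarce, non-replenishable resources. The class name, the count-based novelty term, and the discounting form below are illustrative assumptions, not the paper's actual method:

```python
# Hypothetical sketch of a resource-aware exploration bonus.
# Assumed form: a count-based novelty bonus, shrunk in proportion to the
# fraction of remaining resources an action would consume.
from collections import defaultdict
import math

class ResourceAwareBonus:
    def __init__(self, beta=0.1, penalty=0.5):
        self.beta = beta          # scale of the novelty bonus
        self.penalty = penalty    # how strongly resource use discounts the bonus
        self.counts = defaultdict(int)

    def bonus(self, state, resource_cost, resources_left):
        """Novelty bonus for visiting `state`, discounted by resource usage."""
        self.counts[state] += 1
        novelty = self.beta / math.sqrt(self.counts[state])
        # Fraction of the remaining budget this action would consume
        # (assumed discounting form, not the one from the paper).
        if resources_left > 0:
            frac = min(resource_cost / resources_left, 1.0)
        else:
            frac = 1.0
        return novelty * (1.0 - self.penalty * frac)
```

Under this sketch, a free action in a novel state earns the full bonus, while an action that would drain the remaining budget earns only half of it, which matches the abstract's stated goal of discouraging unnecessary resource-consuming trials while still rewarding visits to unvisited states.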
Related papers
- Dynamics of Resource Allocation in O-RANs: An In-depth Exploration of On-Policy and Off-Policy Deep Reinforcement Learning for Real-Time Applications [0.6752538702870792]
This paper investigates the application of two DRL models, on-policy and off-policy, in the field of resource allocation for Open Radio Access Networks (O-RAN)
Motivated by the original work of Nessrine Hammami and Kim Khoa Nguyen, this study is a replication aimed at validating and confirming their findings.
arXiv Detail & Related papers (2024-11-17T17:46:40Z)
- Towards Sample-Efficiency and Generalization of Transfer and Inverse Reinforcement Learning: A Comprehensive Literature Review [50.67937325077047]
This paper is devoted to a comprehensive review of achieving sample efficiency and generalization in RL algorithms through transfer and inverse reinforcement learning (T-IRL).
Our findings indicate that a majority of recent research works have addressed the aforementioned challenges by utilizing human-in-the-loop and sim-to-real strategies.
Under the IRL structure, training schemes that require a low number of experience transitions and extension of such frameworks to multi-agent and multi-intention problems have been the priority of researchers in recent years.
arXiv Detail & Related papers (2024-11-15T15:18:57Z)
- LMGT: Optimizing Exploration-Exploitation Balance in Reinforcement Learning through Language Model Guided Trade-offs [27.014415210732103]
We introduce Language Model Guided Trade-offs (LMGT), a novel, sample-efficient framework for Reinforcement Learning.
arXiv Detail & Related papers (2024-09-07T07:40:43Z)
- Random Latent Exploration for Deep Reinforcement Learning [71.88709402926415]
This paper introduces a new exploration technique called Random Latent Exploration (RLE)
RLE combines the strengths of bonus-based and noise-based exploration strategies, two popular approaches to effective exploration in deep RL.
We evaluate it on the challenging Atari and IsaacGym benchmarks and show that RLE exhibits higher overall scores across all the tasks than other approaches.
arXiv Detail & Related papers (2024-07-18T17:55:22Z)
- Proactive Resource Request for Disaster Response: A Deep Learning-based Optimization Model [0.2580765958706854]
We formulate a new resource management problem that proactively decides optimal quantities of requested resources.
We take salient characteristics of the problem into consideration and develop a novel deep learning method for future demand prediction.
We demonstrate the superior performance of our method over prevalent existing methods using both real-world and simulated data.
arXiv Detail & Related papers (2023-07-31T13:44:01Z)
- Operating critical machine learning models in resource constrained regimes [0.18416014644193066]
We investigate the trade-off between resource consumption and performance of deep learning models.
Deep learning models are used in critical settings, such as clinics.
arXiv Detail & Related papers (2023-03-17T12:02:08Z)
- The Cost of Learning: Efficiency vs. Efficacy of Learning-Based RRM for 6G [10.28841351455586]
Deep Reinforcement Learning (DRL) has become a valuable solution to automatically learn efficient resource management strategies in complex networks.
In many scenarios, the learning task is performed in the Cloud, while experience samples are generated directly by edge nodes or users.
This creates tension, as speeding up convergence towards an effective strategy requires allocating resources to transmit learning samples.
We propose a dynamic balancing strategy between the learning and data planes, which allows the centralized learning agent to quickly converge to an efficient resource allocation strategy.
arXiv Detail & Related papers (2022-11-30T11:26:01Z)
- Efficient Methods for Natural Language Processing: A Survey [76.34572727185896]
This survey synthesizes and relates current methods and findings in efficient NLP.
We aim to provide both guidance for conducting NLP under limited resources, and point towards promising research directions for developing more efficient methods.
arXiv Detail & Related papers (2022-08-31T20:32:35Z)
- Persistent Reinforcement Learning via Subgoal Curricula [114.83989499740193]
Value-accelerated Persistent Reinforcement Learning (VaPRL) generates a curriculum of initial states.
VaPRL reduces the interventions required by three orders of magnitude compared to episodic reinforcement learning.
arXiv Detail & Related papers (2021-07-27T16:39:45Z)
- Reannealing of Decaying Exploration Based On Heuristic Measure in Deep Q-Network [82.20059754270302]
We propose an algorithm based on the idea of reannealing, which encourages exploration only when it is needed.
We perform an illustrative case study showing that it has the potential to both accelerate training and obtain a better policy.
arXiv Detail & Related papers (2020-09-29T20:40:00Z)
- Hierarchical Adaptive Contextual Bandits for Resource Constraint based Recommendation [49.69139684065241]
Contextual multi-armed bandits (MABs) achieve cutting-edge performance on a variety of problems.
In this paper, we propose a hierarchical adaptive contextual bandit method (HATCH) to conduct the policy learning of contextual bandits with a budget constraint.
arXiv Detail & Related papers (2020-04-02T17:04:52Z)
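The budget-constrained bandit setting that HATCH addresses can be illustrated with a minimal baseline: an epsilon-greedy policy that only considers arms whose pull cost still fits in the remaining budget. This is a generic sketch, not HATCH's hierarchical method; the class name, cost model, and parameters are illustrative assumptions:

```python
# Minimal budget-constrained epsilon-greedy bandit (generic baseline,
# not HATCH). Each arm has a fixed pull cost; arms that no longer fit in
# the remaining budget are excluded from selection.
import random

class BudgetedEpsilonGreedy:
    def __init__(self, costs, budget, epsilon=0.1, seed=0):
        self.costs = costs                  # per-arm pull cost
        self.budget = budget                # total budget for the episode
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.values = [0.0] * len(costs)    # running mean reward per arm
        self.pulls = [0] * len(costs)

    def select(self):
        """Pick among arms whose cost still fits in the remaining budget."""
        feasible = [a for a, c in enumerate(self.costs) if c <= self.budget]
        if not feasible:
            return None                     # budget exhausted
        if self.rng.random() < self.epsilon:
            return self.rng.choice(feasible)
        return max(feasible, key=lambda a: self.values[a])

    def update(self, arm, reward):
        """Charge the arm's cost and update its running mean reward."""
        self.budget -= self.costs[arm]
        self.pulls[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.pulls[arm]
```

The feasibility filter in `select` is the part that distinguishes this from a standard bandit: as the budget shrinks, expensive arms drop out of consideration, and selection stops entirely once no arm is affordable.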
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.