Efficient Exploration in Resource-Restricted Reinforcement Learning
- URL: http://arxiv.org/abs/2212.06988v1
- Date: Wed, 14 Dec 2022 02:50:26 GMT
- Title: Efficient Exploration in Resource-Restricted Reinforcement Learning
- Authors: Zhihai Wang, Taoxing Pan, Qi Zhou, Jie Wang
- Abstract summary: In many real-world applications of reinforcement learning, performing actions requires consuming certain types of resources that are non-replenishable in each episode.
We propose a novel resource-aware exploration bonus (RAEB) to make reasonable usage of resources.
RAEB significantly outperforms state-of-the-art exploration strategies in resource-restricted reinforcement learning environments.
- Score: 6.463999435780127
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In many real-world applications of reinforcement learning (RL), performing
actions requires consuming certain types of resources that are
non-replenishable in each episode. Typical applications include robotic control
with limited energy and video games with consumable items. In tasks with
non-replenishable resources, we observe that popular RL methods such as soft
actor critic suffer from poor sample efficiency. The major reason is that they
tend to exhaust resources quickly, so subsequent exploration is severely
restricted by the lack of resources. To address this challenge, we first
formalize this problem as resource-restricted reinforcement
learning, and then propose a novel resource-aware exploration bonus (RAEB) to
make reasonable use of resources. An appealing feature of RAEB is that it
can significantly reduce unnecessary resource-consuming trials while
effectively encouraging the agent to explore unvisited states. Experiments
demonstrate that the proposed RAEB significantly outperforms state-of-the-art
exploration strategies in resource-restricted reinforcement learning
environments, improving the sample efficiency by up to an order of magnitude.
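The abstract describes a bonus that rewards visiting novel states while discouraging unnecessary resource-consuming trials. The paper's exact formulation is not given in this summary, so the sketch below is an illustrative assumption: a count-based novelty bonus that is discounted for resource-consuming actions, more strongly as the remaining budget shrinks. All names (`ResourceAwareBonus`, `consumes_resource`, `resource_fraction_left`) are hypothetical.

```python
from collections import defaultdict
import math

class ResourceAwareBonus:
    """Hedged sketch of a resource-aware exploration bonus.

    Combines a count-based novelty term with a penalty on actions that
    spend a non-replenishable resource; not the paper's exact method.
    """

    def __init__(self, beta=0.1):
        self.beta = beta                      # bonus scale
        self.visit_counts = defaultdict(int)  # novelty proxy: state visit counts

    def bonus(self, state, consumes_resource, resource_fraction_left):
        # state: hashable state representation
        # consumes_resource: whether the action spends a scarce resource
        # resource_fraction_left: remaining resource budget in [0, 1]
        self.visit_counts[state] += 1
        novelty = self.beta / math.sqrt(self.visit_counts[state])
        if consumes_resource:
            # Discount the bonus more as the budget runs low, so the agent
            # avoids exhausting resources on unnecessary trials.
            novelty *= resource_fraction_left
        return novelty

rab = ResourceAwareBonus(beta=0.1)
b1 = rab.bonus("s0", consumes_resource=False, resource_fraction_left=1.0)
b2 = rab.bonus("s0", consumes_resource=True, resource_fraction_left=0.5)
```

Under this toy rule, a resource-consuming revisit (`b2`) receives a strictly smaller bonus than a free first visit (`b1`), which captures the stated goal of reducing wasteful resource use while still rewarding exploration.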
Related papers
- LMGT: Optimizing Exploration-Exploitation Balance in Reinforcement Learning through Language Model Guided Trade-offs [27.014415210732103]
We introduce Language Model Guided Trade-offs (LMGT), a novel, sample-efficient framework for Reinforcement Learning.
arXiv Detail & Related papers (2024-09-07T07:40:43Z)
- Random Latent Exploration for Deep Reinforcement Learning [71.88709402926415]
This paper introduces a new exploration technique called Random Latent Exploration (RLE)
RLE combines the strengths of bonus-based and noise-based (two popular approaches for effective exploration in deep RL) exploration strategies.
We evaluate it on the challenging Atari and IsaacGym benchmarks and show that RLE achieves higher overall scores than other approaches across all tasks.
arXiv Detail & Related papers (2024-07-18T17:55:22Z)
- Proactive Resource Request for Disaster Response: A Deep Learning-based Optimization Model [0.2580765958706854]
We formulate a new resource management problem that proactively decides optimal quantities of requested resources.
We take salient characteristics of the problem into consideration and develop a novel deep learning method for future demand prediction.
We demonstrate the superior performance of our method over prevalent existing methods using both real world and simulated data.
arXiv Detail & Related papers (2023-07-31T13:44:01Z)
- Operating critical machine learning models in resource constrained regimes [0.18416014644193066]
We investigate the trade-off between resource consumption and performance of deep learning models.
Deep learning models are used in critical settings such as in clinics.
arXiv Detail & Related papers (2023-03-17T12:02:08Z)
- The Cost of Learning: Efficiency vs. Efficacy of Learning-Based RRM for 6G [10.28841351455586]
Deep Reinforcement Learning (DRL) has become a valuable solution to automatically learn efficient resource management strategies in complex networks.
In many scenarios, the learning task is performed in the Cloud, while experience samples are generated directly by edge nodes or users.
This creates friction between the need to speed up convergence towards an effective strategy and the need to allocate communication resources to transmit learning samples.
We propose a dynamic balancing strategy between the learning and data planes, which allows the centralized learning agent to quickly converge to an efficient resource allocation strategy.
arXiv Detail & Related papers (2022-11-30T11:26:01Z)
- Actively Learning Costly Reward Functions for Reinforcement Learning [56.34005280792013]
We show that it is possible to train agents in complex real-world environments orders of magnitude faster.
By enabling the application of reinforcement learning methods to new domains, we show that we can find interesting and non-trivial solutions.
arXiv Detail & Related papers (2022-11-23T19:17:20Z)
- Efficient Methods for Natural Language Processing: A Survey [76.34572727185896]
This survey synthesizes and relates current methods and findings in efficient NLP.
We aim to provide both guidance for conducting NLP under limited resources, and point towards promising research directions for developing more efficient methods.
arXiv Detail & Related papers (2022-08-31T20:32:35Z)
- Active Learning for Argument Mining: A Practical Approach [2.535271349350579]
We show that Active Learning considerably decreases the effort necessary to get good deep learning performance on the task of Argument Unit Recognition and Classification (AURC).
Active Learning reduces the amount of data necessary for the training of machine learning models by querying the most informative samples for annotation and therefore is a promising method for resource creation.
arXiv Detail & Related papers (2021-09-28T10:58:47Z)
- Persistent Reinforcement Learning via Subgoal Curricula [114.83989499740193]
Value-accelerated Persistent Reinforcement Learning (VaPRL) generates a curriculum of initial states.
VaPRL reduces the interventions required by three orders of magnitude compared to episodic reinforcement learning.
arXiv Detail & Related papers (2021-07-27T16:39:45Z)
- Reannealing of Decaying Exploration Based On Heuristic Measure in Deep Q-Network [82.20059754270302]
We propose an algorithm based on the idea of reannealing, which aims to encourage exploration only when it is needed.
We perform an illustrative case study showing that it has potential to both accelerate training and obtain a better policy.
arXiv Detail & Related papers (2020-09-29T20:40:00Z)
- Hierarchical Adaptive Contextual Bandits for Resource Constraint based Recommendation [49.69139684065241]
Contextual multi-armed bandits (MAB) achieve cutting-edge performance on a variety of problems.
In this paper, we propose a hierarchical adaptive contextual bandit method (HATCH) to conduct the policy learning of contextual bandits with a budget constraint.
arXiv Detail & Related papers (2020-04-02T17:04:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.