How do Offline Measures for Exploration in Reinforcement Learning behave?
- URL: http://arxiv.org/abs/2010.15533v1
- Date: Thu, 29 Oct 2020 12:58:30 GMT
- Title: How do Offline Measures for Exploration in Reinforcement Learning behave?
- Authors: Jakob J. Hollenstein, Sayantan Auddy, Matteo Saveriano, Erwan Renaudo, Justus Piater
- Abstract summary: We compare the behavior of three data-based, offline exploration metrics and highlight problems to be aware of when using them.
We propose a fourth metric, uniform relative entropy, and implement it using either a k-nearest-neighbor or a nearest-neighbor-ratio estimator.
- Score: 5.573543601558405
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sufficient exploration is paramount for the success of a reinforcement
learning agent. Yet, exploration is rarely assessed in an algorithm-independent
way. We compare the behavior of three data-based, offline exploration metrics
described in the literature on intuitive simple distributions and highlight
problems to be aware of when using them. We propose a fourth metric, uniform
relative entropy, and implement it using either a k-nearest-neighbor or a
nearest-neighbor-ratio estimator, highlighting that the implementation choices
have a profound impact on these measures.
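The abstract does not define either estimator, so what follows is only a minimal sketch under our own assumptions, not the authors' implementation. We assume "uniform relative entropy" means the KL divergence KL(p || u) between the distribution p of visited states and a uniform distribution u over a known bounding box of volume V, so that KL(p || u) = log V - H(p). The k-nearest-neighbor variant plugs in the Kozachenko-Leonenko entropy estimate for H(p); the nearest-neighbor-ratio variant compares each sample's nearest-neighbor distance among the other samples with its distance to reference points drawn from u, in the style of the Wang, Kulkarni and Verdú divergence estimator. All function names, the box, and the toy data are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n: int) -> float:
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def kth_nn_distance(point, others, k):
    """Euclidean distance from `point` to its k-th nearest neighbor in `others`."""
    return sorted(math.dist(point, q) for q in others)[k - 1]


def knn_entropy(samples, k=3):
    """Kozachenko-Leonenko k-NN estimate of differential entropy, in nats.

    Assumes distinct points (a zero k-NN distance would make log() fail)."""
    n, d = len(samples), len(samples[0])
    # log-volume of the d-dimensional Euclidean unit ball
    log_unit_ball = (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1)
    sum_log_r = 0.0
    for i, x in enumerate(samples):
        others = samples[:i] + samples[i + 1:]  # leave the point itself out
        sum_log_r += math.log(kth_nn_distance(x, others, k))
    return digamma_int(n) - digamma_int(k) + log_unit_ball + d * sum_log_r / n


def uniform_relative_entropy_knn(samples, low, high, k=3):
    """KL(p || uniform on box) = log(box volume) - H(p), H(p) from k-NN."""
    log_volume = sum(math.log(h - l) for l, h in zip(low, high))
    return log_volume - knn_entropy(samples, k)


def uniform_relative_entropy_ratio(samples, low, high, k=1, m=None, seed=0):
    """Nearest-neighbor-ratio estimate of KL(p || uniform on box).

    Compares each sample's k-NN distance among the other samples (rho) with
    its k-NN distance among m reference points drawn uniformly from the box (nu).
    """
    rng = random.Random(seed)
    n, d = len(samples), len(samples[0])
    m = m or n
    reference = [[rng.uniform(l, h) for l, h in zip(low, high)] for _ in range(m)]
    sum_log_ratio = 0.0
    for i, x in enumerate(samples):
        rho = kth_nn_distance(x, samples[:i] + samples[i + 1:], k)
        nu = kth_nn_distance(x, reference, k)
        sum_log_ratio += math.log(nu / rho)
    return d * sum_log_ratio / n + math.log(m / (n - 1))


if __name__ == "__main__":
    # Two hypothetical 2-D "visited state" sets inside the unit square.
    rng = random.Random(42)
    spread = [[rng.uniform(0, 1), rng.uniform(0, 1)] for _ in range(300)]
    clustered = [[rng.uniform(0, 0.1), rng.uniform(0, 0.1)] for _ in range(300)]
    low, high = [0.0, 0.0], [1.0, 1.0]
    for name, data in [("spread", spread), ("clustered", clustered)]:
        knn = uniform_relative_entropy_knn(data, low, high)
        ratio = uniform_relative_entropy_ratio(data, low, high)
        print(f"{name}: k-NN {knn:.2f}, NN-ratio {ratio:.2f}")
```

Both variants should give values near zero for the spread set and clearly larger values for the clustered one, but the two numbers generally differ, and they also move with `k`, the number of reference points `m`, and the random seed. This is one concrete way to see how much the implementation choice can matter; estimates are only comparable across runs that share those settings.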
Related papers
- Reinforcement Learning via Implicit Imitation Guidance [49.88208134736617]
A natural approach is to incorporate an imitation learning objective, either as regularization during training or to acquire a reference policy.
We propose to use prior data solely for guiding exploration via noise added to the policy, sidestepping the need for explicit behavior cloning constraints.
Our approach achieves up to 2-3x improvement over prior reinforcement learning from offline methods across seven simulated continuous control tasks.
(2025-06-09T07:32:52Z)
- Discovering and Exploiting Sparse Rewards in a Learned Behavior Space [0.46736439782713946]
Learning optimal policies in sparse rewards settings is difficult as the learning agent has little to no feedback on the quality of its actions.
We introduce STAX, an algorithm designed to learn a behavior space on-the-fly and to explore it while efficiently optimizing any reward discovered.
(2021-11-02T22:21:11Z)
- Residual Overfit Method of Exploration [78.07532520582313]
We propose an approximate exploration methodology based on fitting only two point estimates, one tuned and one overfit.
The approach drives exploration towards actions where the overfit model exhibits the most overfitting compared to the tuned model.
We compare ROME against a set of established contextual bandit methods on three datasets and find it to be one of the best performing.
(2021-10-06T17:05:33Z)
- A Survey of Exploration Methods in Reinforcement Learning [64.01676570654234]
Reinforcement learning agents depend crucially on exploration to obtain informative data for the learning process.
In this article, we provide a survey of modern exploration methods in sequential reinforcement learning, as well as a taxonomy of exploration methods.
(2021-09-01T02:36:14Z)
- Combining Online Learning and Offline Learning for Contextual Bandits with Deficient Support [53.11601029040302]
Current offline-policy learning algorithms are mostly based on inverse propensity score (IPS) weighting.
We propose a novel approach that uses a hybrid of offline learning with online exploration.
Our approach determines an optimal policy with theoretical guarantees using the minimal number of online explorations.
(2021-07-24T05:07:43Z)
- MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning [65.52675802289775]
We show that an uncertainty aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
(2021-07-15T08:19:57Z)
- Self-Supervised Metric Learning in Multi-View Data: A Downstream Task Perspective [2.01243755755303]
We study how self-supervised metric learning can benefit downstream tasks in the context of multi-view data.
We show that the target distance of metric learning satisfies several desired properties for the downstream tasks.
Our analysis characterizes the improvement by self-supervised metric learning on four commonly used downstream tasks.
(2021-06-14T02:34:33Z)
- Metric Learning for Session-based Recommendations [3.706222947143855]
We discuss and compare metric learning approaches to commonly used learning-to-rank methods.
We propose a simple architecture for problem analysis and demonstrate that neither very large nor very deep architectures are necessary.
(2021-01-07T17:51:04Z)
- Active Learning for Bayesian 3D Hand Pose Estimation [53.99104862192055]
We propose a Bayesian approximation to a deep learning architecture for 3D hand pose estimation.
Through this framework, we explore and analyse the two types of uncertainties that are influenced either by data or by the learning capability.
(2020-10-01T21:36:26Z)
- Reannealing of Decaying Exploration Based On Heuristic Measure in Deep Q-Network [82.20059754270302]
We propose an algorithm based on the idea of reannealing, that aims at encouraging exploration only when it is needed.
We perform an illustrative case study showing that it has potential to both accelerate training and obtain a better policy.
(2020-09-29T20:40:00Z)
- SL-DML: Signal Level Deep Metric Learning for Multimodal One-Shot Action Recognition [0.0]
We propose a metric learning approach to reduce the action recognition problem to a nearest neighbor search in embedding space.
We encode signals into images and extract features using a deep residual CNN.
The resulting encoder transforms features into an embedding space in which smaller distances encode similar actions and larger distances encode different actions.
(2020-04-23T11:28:27Z)
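The SL-DML entry in the list reduces one-shot action recognition to a nearest-neighbor search in an embedding space. Only that final lookup step is sketched here, assuming the embeddings have already been produced by some encoder and that Euclidean distance is the metric; the paper's residual-CNN encoder and metric-learning training are not shown, and the function name and toy labels are hypothetical.

```python
import math


def nearest_neighbor_label(query, support):
    """Label of the support embedding closest (Euclidean) to `query`.

    `support` holds (embedding, label) pairs, e.g. one reference embedding
    per action class in the one-shot setting. Raises ValueError if empty.
    """
    _, label = min(support, key=lambda pair: math.dist(query, pair[0]))
    return label


if __name__ == "__main__":
    # Toy 2-D embeddings; real ones would come from a trained encoder.
    support = [([0.0, 0.0], "wave"), ([1.0, 1.0], "jump")]
    print(nearest_neighbor_label([0.9, 0.8], support))  # prints "jump"
```

With one embedding per class, adding a new action class only means adding one (embedding, label) pair to `support`; no retraining of the lookup is involved.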
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.