Learning to Plan Optimistically: Uncertainty-Guided Deep Exploration via Latent Model Ensembles
- URL: http://arxiv.org/abs/2010.14641v3
- Date: Sat, 11 Dec 2021 17:51:25 GMT
- Title: Learning to Plan Optimistically: Uncertainty-Guided Deep Exploration via Latent Model Ensembles
- Authors: Tim Seyde, Wilko Schwarting, Sertac Karaman, Daniela Rus
- Abstract summary: This paper presents Latent Optimistic Value Exploration (LOVE), a strategy that enables deep exploration through optimism in the face of uncertain long-term rewards.
We combine latent world models with value function estimation to predict infinite-horizon returns and recover associated uncertainty via ensembling.
We apply LOVE to visual robot control tasks in continuous action spaces and demonstrate on average more than 20% improved sample efficiency in comparison to state-of-the-art and other exploration objectives.
- Score: 73.15950858151594
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning complex robot behaviors through interaction requires structured
exploration. Planning should target interactions with the potential to optimize
long-term performance, while only reducing uncertainty where conducive to this
objective. This paper presents Latent Optimistic Value Exploration (LOVE), a
strategy that enables deep exploration through optimism in the face of
uncertain long-term rewards. We combine latent world models with value function
estimation to predict infinite-horizon returns and recover associated
uncertainty via ensembling. The policy is then trained on an upper confidence
bound (UCB) objective to identify and select the interactions most promising to
improve long-term performance. We apply LOVE to visual robot control tasks in
continuous action spaces and demonstrate on average more than 20% improved
sample efficiency in comparison to state-of-the-art and other exploration
objectives. In sparse and hard to explore environments we achieve an average
improvement of over 30%.
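As a rough illustration of the abstract's UCB objective (a minimal sketch, not the authors' implementation), the snippet below scores candidate behaviors by the mean of ensemble-predicted returns plus a scaled ensemble standard deviation and picks the most optimistic one. The ensemble of return predictions, the random candidates, and the coefficient `beta` are placeholders assumed for illustration.

```python
import numpy as np

def ucb_score(ensemble_returns: np.ndarray, beta: float = 1.0) -> np.ndarray:
    """Upper confidence bound over an ensemble of predicted returns.

    ensemble_returns has shape (K, N): the infinite-horizon return that each
    of K ensemble members predicts for each of N candidate action sequences.
    """
    mean = ensemble_returns.mean(axis=0)   # expected long-term performance
    std = ensemble_returns.std(axis=0)     # ensemble disagreement as an uncertainty proxy
    return mean + beta * std               # optimism in the face of uncertain returns

# Toy usage with placeholder predictions instead of learned latent models.
rng = np.random.default_rng(0)
K, N = 5, 32                               # ensemble size, number of candidate action sequences
ensemble_returns = rng.normal(size=(K, N))
best = int(np.argmax(ucb_score(ensemble_returns, beta=1.0)))
print("most promising candidate:", best)
```

In LOVE the policy is trained to maximize this kind of objective through the learned latent world model rather than by ranking random candidates; the toy loop above only shows how optimism trades expected return against ensemble disagreement.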
Related papers
- Efficient Model-Based Reinforcement Learning Through Optimistic Thompson Sampling [11.478146371965984]
We propose a practical, theoretically grounded approach to optimistic exploration based on Thompson sampling.
Our experiments demonstrate that optimistic exploration significantly accelerates learning in environments with sparse rewards.
Furthermore, we provide insights into when optimism is beneficial and emphasize the critical role of model uncertainty in guiding exploration.
arXiv Detail & Related papers (2024-10-07T12:42:51Z)
- Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents [49.85633804913796]
We present an exploration-based trajectory optimization approach, referred to as ETO.
This learning method is designed to enhance the performance of open LLM agents.
Our experiments on three complex tasks demonstrate that ETO consistently surpasses baseline performance by a large margin.
arXiv Detail & Related papers (2024-03-04T21:50:29Z)
- Enhancing Robotic Navigation: An Evaluation of Single and Multi-Objective Reinforcement Learning Strategies [0.9208007322096532]
This study presents a comparative analysis between single-objective and multi-objective reinforcement learning methods for training a robot to navigate effectively to an end goal.
By modifying the reward function to return a vector of rewards, each pertaining to a distinct objective, the robot learns a policy that effectively balances the different goals.
arXiv Detail & Related papers (2023-12-13T08:00:26Z)
- Goal-conditioned Offline Planning from Curious Exploration [28.953718733443143]
We consider the challenge of extracting goal-conditioned behavior from the products of unsupervised exploration techniques.
We find that conventional goal-conditioned reinforcement learning approaches for extracting a value function and policy fall short in this difficult offline setting.
To mitigate these shortcomings, we propose to combine model-based planning over learned value landscapes with a graph-based value aggregation scheme.
arXiv Detail & Related papers (2023-11-28T17:48:18Z)
- Landmark Guided Active Exploration with State-specific Balance Coefficient [4.539657469634845]
We design a measure of prospect for sub-goals by planning in the goal space based on the goal-conditioned value function.
We propose a landmark-guided exploration strategy by integrating the measures of prospect and novelty.
arXiv Detail & Related papers (2023-06-30T08:54:47Z)
- Policy Gradient Bayesian Robust Optimization for Imitation Learning [49.881386773269746]
We derive a novel policy gradient-style robust optimization approach, PG-BROIL, to balance expected performance and risk.
Results suggest PG-BROIL can produce a family of behaviors ranging from risk-neutral to risk-averse.
arXiv Detail & Related papers (2021-06-11T16:49:15Z)
- Online reinforcement learning with sparse rewards through an active inference capsule [62.997667081978825]
This paper introduces an active inference agent which minimizes the novel free energy of the expected future.
Our model is capable of solving sparse-reward problems with very high sample efficiency.
We also introduce a novel method for approximating the prior model from the reward function, which simplifies the expression of complex objectives.
arXiv Detail & Related papers (2021-06-04T10:03:36Z)
- Learning Long-term Visual Dynamics with Region Proposal Interaction Networks [75.06423516419862]
We build object representations that can capture inter-object and object-environment interactions over a long range.
Thanks to the simple yet effective object representation, our approach outperforms prior methods by a significant margin.
arXiv Detail & Related papers (2020-08-05T17:48:00Z)
- Maximum Entropy Gain Exploration for Long Horizon Multi-goal Reinforcement Learning [35.44552072132894]
We argue that a learning agent should set its own intrinsic goals that maximize the entropy of the historical achieved goal distribution (a minimal sketch of this goal-selection idea follows this list).
We show that our strategy achieves an order of magnitude better sample efficiency than the prior state of the art on long-horizon multi-goal tasks.
arXiv Detail & Related papers (2020-07-06T15:36:05Z)
- Automatic Curriculum Learning through Value Disagreement [95.19299356298876]
Continually solving new, unsolved tasks is the key to learning diverse behaviors.
In the multi-task domain, where an agent needs to reach multiple goals, the choice of training goals can largely affect sample efficiency.
We propose setting up an automatic curriculum for goals that the agent needs to solve.
We evaluate our method across 13 multi-goal robotic tasks and 5 navigation tasks, and demonstrate performance gains over current state-of-the-art methods.
arXiv Detail & Related papers (2020-06-17T03:58:25Z)
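As a companion to the Maximum Entropy Gain Exploration entry above, here is a minimal sketch of one way to select goals that push the achieved-goal distribution toward higher entropy: fit a density estimate to past achieved goals and choose the rarest candidate as the next intrinsic goal. The kernel density estimator, the candidate pool, and the 2-D goal space are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
from scipy.stats import gaussian_kde

def select_low_density_goal(achieved_goals: np.ndarray,
                            candidate_goals: np.ndarray) -> np.ndarray:
    """Pick the candidate goal that is rarest under the achieved-goal history.

    Repeatedly targeting low-density regions tends to spread out the achieved
    goals, i.e. it raises the entropy of their distribution over time.
    """
    kde = gaussian_kde(achieved_goals.T)        # density model over past achieved goals
    density = kde(candidate_goals.T)            # density at each candidate goal
    return candidate_goals[np.argmin(density)]  # rarest candidate becomes the intrinsic goal

# Toy usage in a 2-D goal space: the history is clustered near the origin,
# so a candidate far from the origin should be selected.
rng = np.random.default_rng(0)
achieved = rng.normal(loc=0.0, scale=0.5, size=(200, 2))
candidates = rng.uniform(low=-2.0, high=2.0, size=(50, 2))
print("next intrinsic goal:", select_low_density_goal(achieved, candidates))
```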
This list is automatically generated from the titles and abstracts of the papers in this site.