Learning to Plan Optimistically: Uncertainty-Guided Deep Exploration via Latent Model Ensembles
- URL: http://arxiv.org/abs/2010.14641v3
- Date: Sat, 11 Dec 2021 17:51:25 GMT
- Title: Learning to Plan Optimistically: Uncertainty-Guided Deep Exploration via Latent Model Ensembles
- Authors: Tim Seyde, Wilko Schwarting, Sertac Karaman, Daniela Rus
- Abstract summary: This paper presents Latent Optimistic Value Exploration (LOVE), a strategy that enables deep exploration through optimism in the face of uncertain long-term rewards.
We combine latent world models with value function estimation to predict infinite-horizon returns and recover associated uncertainty via ensembling.
We apply LOVE to visual robot control tasks in continuous action spaces and demonstrate on average more than 20% improved sample efficiency in comparison to state-of-the-art and other exploration objectives.
- Score: 73.15950858151594
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning complex robot behaviors through interaction requires structured
exploration. Planning should target interactions with the potential to optimize
long-term performance, while only reducing uncertainty where conducive to this
objective. This paper presents Latent Optimistic Value Exploration (LOVE), a
strategy that enables deep exploration through optimism in the face of
uncertain long-term rewards. We combine latent world models with value function
estimation to predict infinite-horizon returns and recover associated
uncertainty via ensembling. The policy is then trained on an upper confidence
bound (UCB) objective to identify and select the interactions most promising to
improve long-term performance. We apply LOVE to visual robot control tasks in
continuous action spaces and demonstrate on average more than 20% improved
sample efficiency in comparison to state-of-the-art and other exploration
objectives. In sparse and hard to explore environments we achieve an average
improvement of over 30%.
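As a rough illustration of the abstract's UCB objective (a minimal sketch, not the authors' implementation), the snippet below scores candidate behaviors by the mean of ensemble-predicted returns plus a scaled ensemble standard deviation and picks the most optimistic one. The ensemble of return predictions, the random candidates, and the coefficient `beta` are placeholders assumed for illustration.

```python
import numpy as np

def ucb_score(ensemble_returns: np.ndarray, beta: float = 1.0) -> np.ndarray:
    """Upper confidence bound over an ensemble of predicted returns.

    ensemble_returns has shape (K, N): the infinite-horizon return that each
    of K ensemble members predicts for each of N candidate action sequences.
    """
    mean = ensemble_returns.mean(axis=0)   # expected long-term performance
    std = ensemble_returns.std(axis=0)     # ensemble disagreement as an uncertainty proxy
    return mean + beta * std               # optimism in the face of uncertain returns

# Toy usage with placeholder predictions instead of learned latent models.
rng = np.random.default_rng(0)
K, N = 5, 32                               # ensemble size, number of candidate action sequences
ensemble_returns = rng.normal(size=(K, N))
best = int(np.argmax(ucb_score(ensemble_returns, beta=1.0)))
print("most promising candidate:", best)
```

In LOVE the policy is trained to maximize this kind of objective through the learned latent world model rather than by ranking random candidates; the toy loop above only shows how optimism trades expected return against ensemble disagreement.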
Related papers
- Efficient Model-Based Reinforcement Learning Through Optimistic Thompson Sampling [11.478146371965984]
We propose a practical, theoretically grounded approach to optimistic exploration based on Thompson sampling.
Our experiments demonstrate that optimistic exploration significantly accelerates learning in environments with sparse rewards.
Furthermore, we provide insights into when optimism is beneficial and emphasize the critical role of model uncertainty in guiding exploration.
arXiv Detail & Related papers (2024-10-07T12:42:51Z)
- Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents [49.85633804913796]
We present an exploration-based trajectory optimization approach, referred to as ETO.
This learning method is designed to enhance the performance of open LLM agents.
Our experiments on three complex tasks demonstrate that ETO consistently surpasses baseline performance by a large margin.
arXiv Detail & Related papers (2024-03-04T21:50:29Z)
- Enhancing Robotic Navigation: An Evaluation of Single and Multi-Objective Reinforcement Learning Strategies [0.9208007322096532]
This study presents a comparative analysis between single-objective and multi-objective reinforcement learning methods for training a robot to navigate effectively to an end goal.
By modifying the reward function to return a vector of rewards, each pertaining to a distinct objective, the robot learns a policy that effectively balances the different goals.
arXiv Detail & Related papers (2023-12-13T08:00:26Z)
- Goal-conditioned Offline Planning from Curious Exploration [28.953718733443143]
We consider the challenge of extracting goal-conditioned behavior from the products of unsupervised exploration techniques.
We find that conventional goal-conditioned reinforcement learning approaches for extracting a value function and policy fall short in this difficult offline setting.
To mitigate these shortcomings, we propose to combine model-based planning over learned value landscapes with a graph-based value aggregation scheme.
arXiv Detail & Related papers (2023-11-28T17:48:18Z)
- Landmark Guided Active Exploration with State-specific Balance Coefficient [4.539657469634845]
We design a measure of prospect for sub-goals by planning in the goal space based on the goal-conditioned value function.
We propose a landmark-guided exploration strategy by integrating the measures of prospect and novelty.
arXiv Detail & Related papers (2023-06-30T08:54:47Z)
- Policy Gradient Bayesian Robust Optimization for Imitation Learning [49.881386773269746]
We derive a novel policy gradient-style robust optimization approach, PG-BROIL, to balance expected performance and risk.
Results suggest PG-BROIL can produce a family of behaviors ranging from risk-neutral to risk-averse.
arXiv Detail & Related papers (2021-06-11T16:49:15Z)
- Online reinforcement learning with sparse rewards through an active inference capsule [62.997667081978825]
This paper introduces an active inference agent which minimizes the novel free energy of the expected future.
Our model is capable of solving sparse-reward problems with very high sample efficiency.
We also introduce a novel method for approximating the prior model from the reward function, which simplifies the expression of complex objectives.
arXiv Detail & Related papers (2021-06-04T10:03:36Z)
- Learning Long-term Visual Dynamics with Region Proposal Interaction Networks [75.06423516419862]
We build object representations that can capture inter-object and object-environment interactions over a long range.
Thanks to the simple yet effective object representation, our approach outperforms prior methods by a significant margin.
arXiv Detail & Related papers (2020-08-05T17:48:00Z)
- Maximum Entropy Gain Exploration for Long Horizon Multi-goal Reinforcement Learning [35.44552072132894]
We argue that a learning agent should set its own intrinsic goals that maximize the entropy of the historical achieved goal distribution (a minimal sketch of this goal-selection idea follows this list).
We show that our strategy achieves an order of magnitude better sample efficiency than the prior state of the art on long-horizon multi-goal tasks.
arXiv Detail & Related papers (2020-07-06T15:36:05Z)
- Automatic Curriculum Learning through Value Disagreement [95.19299356298876]
Continually solving new, unsolved tasks is the key to learning diverse behaviors.
In the multi-task domain, where an agent needs to reach multiple goals, the choice of training goals can largely affect sample efficiency.
We propose setting up an automatic curriculum for goals that the agent needs to solve.
We evaluate our method across 13 multi-goal robotic tasks and 5 navigation tasks, and demonstrate performance gains over current state-of-the-art methods.
arXiv Detail & Related papers (2020-06-17T03:58:25Z)
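As a companion to the Maximum Entropy Gain Exploration entry above, here is a minimal sketch of one way to select goals that push the achieved-goal distribution toward higher entropy: fit a density estimate to past achieved goals and choose the rarest candidate as the next intrinsic goal. The kernel density estimator, the candidate pool, and the 2-D goal space are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
from scipy.stats import gaussian_kde

def select_low_density_goal(achieved_goals: np.ndarray,
                            candidate_goals: np.ndarray) -> np.ndarray:
    """Pick the candidate goal that is rarest under the achieved-goal history.

    Repeatedly targeting low-density regions tends to spread out the achieved
    goals, i.e. it raises the entropy of their distribution over time.
    """
    kde = gaussian_kde(achieved_goals.T)        # density model over past achieved goals
    density = kde(candidate_goals.T)            # density at each candidate goal
    return candidate_goals[np.argmin(density)]  # rarest candidate becomes the intrinsic goal

# Toy usage in a 2-D goal space: the history is clustered near the origin,
# so a candidate far from the origin should be selected.
rng = np.random.default_rng(0)
achieved = rng.normal(loc=0.0, scale=0.5, size=(200, 2))
candidates = rng.uniform(low=-2.0, high=2.0, size=(50, 2))
print("next intrinsic goal:", select_low_density_goal(achieved, candidates))
```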
This list is automatically generated from the titles and abstracts of the papers in this site.