Exploration via Planning for Information about the Optimal Trajectory
- URL: http://arxiv.org/abs/2210.04642v1
- Date: Thu, 6 Oct 2022 20:28:55 GMT
- Title: Exploration via Planning for Information about the Optimal Trajectory
- Authors: Viraj Mehta and Ian Char and Joseph Abbate and Rory Conlin and Mark D. Boyer and Stefano Ermon and Jeff Schneider and Willie Neiswanger
- Abstract summary: We develop a method that allows us to plan for exploration while taking the task and the current knowledge into account.
We demonstrate that our method learns strong policies with 2x fewer samples than strong exploration baselines.
- Score: 67.33886176127578
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many potential applications of reinforcement learning (RL) are stymied by the
large numbers of samples required to learn an effective policy. This is
especially true when applying RL to real-world control tasks, e.g. in the
sciences or robotics, where executing a policy in the environment is costly. In
popular RL algorithms, agents typically explore either by adding stochasticity
to a reward-maximizing policy or by attempting to gather maximal information
about environment dynamics without taking the given task into account. In this
work, we develop a method that allows us to plan for exploration while taking
both the task and the current knowledge about the dynamics into account. The
key insight to our approach is to plan an action sequence that maximizes the
expected information gain about the optimal trajectory for the task at hand. We
demonstrate that our method learns strong policies with 2x fewer samples than
strong exploration baselines and 200x fewer samples than model free methods on
a diverse set of low-to-medium dimensional control tasks in both the open-loop
and closed-loop control settings.
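The key idea in the abstract, scoring candidate action sequences by the expected information gain they provide about the optimal trajectory, can be illustrated with a short sketch. The following is a minimal, hypothetical rendering of that objective, not the authors' implementation: the `posterior`, `model.rollout`, `posterior.condition_on`, and `solve_optimal_traj` interfaces are assumed placeholders, and entropy is approximated crudely by the variance of optimal trajectories sampled across the dynamics posterior.

```python
# Hypothetical sketch (not the authors' code): score candidate action
# sequences by how much they are expected to reduce uncertainty about
# the optimal trajectory under a posterior over dynamics models.
import numpy as np

def trajectory_entropy(optimal_trajs):
    """Proxy for the entropy of the optimal-trajectory distribution:
    mean variance of sampled optimal trajectories across the posterior."""
    stacked = np.stack(optimal_trajs)          # (n_models, horizon, state_dim)
    return float(np.mean(np.var(stacked, axis=0)))

def expected_info_gain(action_seq, posterior, start_state, solve_optimal_traj):
    """Estimate EIG(a_1:H) = H[tau* | D] - E[H[tau* | D + imagined outcomes]]."""
    # Entropy of the optimal trajectory under the current posterior.
    prior_trajs = [solve_optimal_traj(m) for m in posterior.sample_models()]
    h_prior = trajectory_entropy(prior_trajs)

    # For each posterior sample, imagine executing the candidate actions,
    # condition the posterior on the imagined data, and re-measure entropy.
    h_post = []
    for model in posterior.sample_models():
        imagined = model.rollout(start_state, action_seq)   # imagined transitions
        updated = posterior.condition_on(imagined)          # hypothetical update
        trajs = [solve_optimal_traj(m) for m in updated.sample_models()]
        h_post.append(trajectory_entropy(trajs))
    return h_prior - float(np.mean(h_post))

def plan_exploratory_actions(candidates, posterior, start_state, solve_optimal_traj):
    """Pick the candidate action sequence with the highest expected info gain."""
    scores = [expected_info_gain(a, posterior, start_state, solve_optimal_traj)
              for a in candidates]
    return candidates[int(np.argmax(scores))]
```

In this rendering, an action sequence scores highly when conditioning on its imagined outcomes would collapse the posterior's disagreement about what the optimal trajectory looks like, which matches the task-directed exploration described in the abstract.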
Related papers
- Sample-Efficient Reinforcement Learning with Temporal Logic Objectives: Leveraging the Task Specification to Guide Exploration [13.053013407015628]
This paper addresses the problem of learning optimal control policies for systems with uncertain dynamics.
We propose an accelerated RL algorithm that can learn control policies significantly faster than competitive approaches.
arXiv Detail & Related papers (2024-10-16T00:53:41Z)
- PLANRL: A Motion Planning and Imitation Learning Framework to Bootstrap Reinforcement Learning [13.564676246832544]
We introduce PLANRL, a framework that chooses when the robot should use classical motion planning and when it should learn a policy.
PLANRL switches between two modes of operation: reaching a waypoint with classical techniques when far from the objects, and fine-grained manipulation control when about to interact with them (a generic sketch of such a switch follows below).
We evaluate our approach across multiple challenging simulation environments and real-world tasks, demonstrating superior performance in terms of adaptability, efficiency, and generalization compared to existing methods.
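The mode-switching behavior described above lends itself to a simple illustration. The snippet below is a hypothetical sketch in the spirit of that description, not PLANRL's actual interface: the observation keys, the `motion_planner` and `learned_policy` objects, and the distance threshold are all assumed for illustration.

```python
# Hypothetical mode switcher (assumed, not PLANRL's code): use a classical
# motion planner far from the object and a learned policy for fine-grained
# control near it.
import numpy as np

NEAR_OBJECT_THRESHOLD = 0.05  # assumed hand-over distance, in meters

def select_action(obs, motion_planner, learned_policy, threshold=NEAR_OBJECT_THRESHOLD):
    """Route control between the classical planner and the learned policy."""
    dist = np.linalg.norm(obs["ee_pos"] - obs["object_pos"])
    if dist > threshold:
        # Far from the object: follow a waypoint from the classical planner.
        return motion_planner.next_waypoint_action(obs)
    # About to interact: defer to the learned fine-manipulation policy.
    return learned_policy(obs)
```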
arXiv Detail & Related papers (2024-08-07T19:30:08Z)
- Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces.
We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.
We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
arXiv Detail & Related papers (2023-07-20T09:05:46Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
- Multitask Adaptation by Retrospective Exploration with Learned World Models [77.34726150561087]
We propose a meta-learned addressing model called RAMa that provides training samples for the MBRL agent taken from task-agnostic storage.
The model is trained to maximize the agent's expected performance by selecting, from the storage, promising trajectories that solved prior tasks.
arXiv Detail & Related papers (2021-10-25T20:02:57Z)
- Direct Random Search for Fine Tuning of Deep Reinforcement Learning Policies [5.543220407902113]
We show that a direct random search is very effective at fine-tuning DRL policies by directly optimizing them using deterministic rollouts; a generic sketch of such a search loop follows below.
Our results show that this method yields more consistent and higher-performing agents on the environments we tested.
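As a rough illustration of the approach described above, the sketch below perturbs a policy's parameter vector at random and keeps a perturbation whenever a deterministic rollout improves the return. It is an assumed, generic random-search loop written against a Gymnasium-style environment, not the paper's code; the `policy(obs, params)` callable, step size, and rollout seed are placeholders.

```python
# Minimal sketch (assumed, not the paper's code) of fine-tuning policy
# parameters with direct random search and deterministic rollouts.
import numpy as np

def deterministic_return(env, policy, params, horizon=1000, seed=0):
    """Evaluate a parameter vector with a single fixed-seed rollout."""
    obs, _ = env.reset(seed=seed)        # fixed seed -> repeatable evaluation
    total = 0.0
    for _ in range(horizon):
        action = policy(obs, params)
        obs, reward, terminated, truncated, _ = env.step(action)
        total += reward
        if terminated or truncated:
            break
    return total

def random_search_finetune(env, policy, params, iters=200, step=0.02):
    """Keep a random perturbation whenever it improves the rollout return."""
    best_return = deterministic_return(env, policy, params)
    for _ in range(iters):
        candidate = params + step * np.random.randn(*params.shape)
        cand_return = deterministic_return(env, policy, candidate)
        if cand_return > best_return:
            params, best_return = candidate, cand_return
    return params, best_return
```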
arXiv Detail & Related papers (2021-09-12T20:12:46Z)
- Policy Information Capacity: Information-Theoretic Measure for Task Complexity in Deep Reinforcement Learning [83.66080019570461]
We propose two environment-agnostic, algorithm-agnostic quantitative metrics for task difficulty.
We show that these metrics have higher correlations with normalized task solvability scores than a variety of alternatives.
These metrics can also be used for fast and compute-efficient optimizations of key design parameters.
arXiv Detail & Related papers (2021-03-23T17:49:50Z)
- Ready Policy One: World Building Through Active Learning [35.358315617358976]
We introduce Ready Policy One (RP1), a framework that views Model-Based Reinforcement Learning as an active learning problem.
RP1 achieves this by utilizing a hybrid objective function, which crucially adapts during optimization.
We rigorously evaluate our method on a variety of continuous control tasks, and demonstrate statistically significant gains over existing approaches.
arXiv Detail & Related papers (2020-02-07T09:57:53Z)
- Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies [57.27944046925876]
We propose and address a novel few-shot RL problem, where a task is characterized by a subtask graph.
Instead of directly learning a meta-policy, we develop a Meta-learner with Subtask Graph Inference.
Our experimental results on two grid-world domains and StarCraft II environments show that the proposed method can accurately infer the latent task parameter.
arXiv Detail & Related papers (2020-01-01T17:34:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.