Sub-Goal Trees -- a Framework for Goal-Based Reinforcement Learning
- URL: http://arxiv.org/abs/2002.12361v2
- Date: Mon, 21 Dec 2020 15:42:22 GMT
- Title: Sub-Goal Trees -- a Framework for Goal-Based Reinforcement Learning
- Authors: Tom Jurgenson, Or Avner, Edward Groshev, Aviv Tamar
- Abstract summary: Many AI problems, in robotics and other domains, are goal-based, essentially seeking trajectories leading to various goal states.
We propose a new RL framework, derived from a dynamic programming equation for the all pairs shortest path (APSP) problem.
We show that this approach has computational benefits for both standard and approximate dynamic programming.
- Score: 20.499747716864686
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many AI problems, in robotics and other domains, are goal-based, essentially
seeking trajectories leading to various goal states. Reinforcement learning
(RL), building on Bellman's optimality equation, naturally optimizes for a
single goal, yet can be made multi-goal by augmenting the state with the goal.
Instead, we propose a new RL framework, derived from a dynamic programming
equation for the all pairs shortest path (APSP) problem, which naturally solves
multi-goal queries. We show that this approach has computational benefits for
both standard and approximate dynamic programming. Interestingly, our
formulation prescribes a novel protocol for computing a trajectory: instead of
predicting the next state given its predecessor, as in standard RL, a
goal-conditioned trajectory is constructed by first predicting an intermediate
state between start and goal, partitioning the trajectory into two. Then,
recursively, predicting intermediate points on each sub-segment, until a
complete trajectory is obtained. We call this trajectory structure a sub-goal
tree. Building on it, we additionally extend the policy gradient methodology to
recursively predict sub-goals, resulting in novel goal-based algorithms.
Finally, we apply our method to neural motion planning, where we demonstrate
significant improvements compared to standard RL on navigating a 7-DoF robot
arm between obstacles.
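The recursive protocol described above amounts to a divide-and-conquer construction: a learned predictor proposes a midpoint between the current start and goal, each half is then refined in turn until the trajectory has the desired resolution, and trajectory cost composes in an APSP-style recursion of roughly the form V(s, g) = min over m of [ V(s, m) + V(m, g) ]. Below is a minimal Python sketch of that construction under stated assumptions; predict_subgoal stands in for the learned sub-goal predictor, and the averaging example is purely illustrative rather than the paper's trained model.

    # Minimal sketch of sub-goal tree trajectory construction (illustrative only).
    # `predict_subgoal` is a placeholder for a learned model that proposes an
    # intermediate state between a start and a goal; it is an assumption here.
    from typing import Callable, List, Sequence

    State = Sequence[float]

    def build_trajectory(start: State,
                         goal: State,
                         predict_subgoal: Callable[[State, State], State],
                         depth: int) -> List[State]:
        """Recursively bisect (start, goal) into a trajectory of 2**depth segments."""
        if depth == 0:
            # Base case: connect start and goal directly.
            return [start, goal]
        mid = predict_subgoal(start, goal)          # predict an intermediate state
        left = build_trajectory(start, mid, predict_subgoal, depth - 1)
        right = build_trajectory(mid, goal, predict_subgoal, depth - 1)
        return left[:-1] + right                    # drop the duplicated midpoint

    # Toy usage: a "predictor" that simply averages the two states.
    if __name__ == "__main__":
        avg = lambda s, g: [(a + b) / 2.0 for a, b in zip(s, g)]
        print(build_trajectory([0.0, 0.0], [1.0, 1.0], avg, depth=2))

With depth d, the sketch produces a trajectory of 2**d segments after d rounds of prediction, which is the recursive structure the abstract calls a sub-goal tree.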
Related papers
- GOPlan: Goal-conditioned Offline Reinforcement Learning by Planning with Learned Models [31.628341050846768]
Goal-conditioned Offline Planning (GOPlan) is a novel model-based framework that contains two key phases.
GOPlan pretrains a prior policy capable of capturing the multi-modal action distribution within the multi-goal dataset.
The reanalysis method generates high-quality imaginary data by planning with learned models for both intra-trajectory and inter-trajectory goals.
arXiv Detail & Related papers (2023-10-30T21:19:52Z)
- Imitating Graph-Based Planning with Goal-Conditioned Policies [72.61631088613048]
We present a self-imitation scheme which distills a subgoal-conditioned policy into the target-goal-conditioned policy.
We empirically show that our method can significantly boost the sample-efficiency of the existing goal-conditioned RL methods.
arXiv Detail & Related papers (2023-03-20T14:51:10Z)
- Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement Learning [99.38163119531745]
We show that applying a discretizing bottleneck can improve performance in goal-conditioned RL setups.
We also prove a lower bound on the expected return on out-of-distribution goals, while still allowing for specifying goals with expressive structure.
arXiv Detail & Related papers (2022-11-01T03:31:43Z)
- Planning to Practice: Efficient Online Fine-Tuning by Composing Goals in Latent Space [76.46113138484947]
General-purpose robots require diverse repertoires of behaviors to complete challenging tasks in real-world unstructured environments.
To address this issue, goal-conditioned reinforcement learning aims to acquire policies that can reach goals for a wide range of tasks on command.
We propose Planning to Practice, a method that makes it practical to train goal-conditioned policies for long-horizon tasks.
arXiv Detail & Related papers (2022-05-17T06:58:17Z)
- C-Planning: An Automatic Curriculum for Learning Goal-Reaching Tasks [133.40619754674066]
Goal-conditioned reinforcement learning can solve tasks in a wide range of domains, including navigation and manipulation.
We propose to solve the distant goal-reaching task by using search at training time to automatically generate a curriculum of intermediate states.
Casting this as expectation maximization, the E-step corresponds to planning an optimal sequence of waypoints using graph search, while the M-step aims to learn a goal-conditioned policy to reach those waypoints.
arXiv Detail & Related papers (2021-10-22T22:05:31Z)
- Long-Horizon Visual Planning with Goal-Conditioned Hierarchical Predictors [124.30562402952319]
The ability to predict and plan into the future is fundamental for agents acting in the world.
Current learning approaches for visual prediction and planning fail on long-horizon tasks.
We propose a framework for visual prediction and planning that overcomes this limitation.
arXiv Detail & Related papers (2020-06-23T17:58:56Z)
- PlanGAN: Model-based Planning With Sparse Rewards and Multiple Goals [14.315501760755609]
PlanGAN is a model-based algorithm for solving multi-goal tasks in environments with sparse rewards.
Our studies indicate that PlanGAN can achieve comparable performance whilst being around 4-8 times more sample efficient.
arXiv Detail & Related papers (2020-06-01T12:53:09Z)
- Divide-and-Conquer Monte Carlo Tree Search For Goal-Directed Planning [78.65083326918351]
We consider alternatives to the implicit sequential planning assumption, i.e., that a plan must be constructed in the same order in which it is executed.
We propose Divide-and-Conquer Monte Carlo Tree Search (DC-MCTS) for approximating the optimal plan.
We show that this algorithmic flexibility over planning order leads to improved results in navigation tasks in grid-worlds.
arXiv Detail & Related papers (2020-04-23T18:08:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the generated content (including all information) and is not responsible for any consequences of its use.