GrASP: Gradient-Based Affordance Selection for Planning
- URL: http://arxiv.org/abs/2202.04772v1
- Date: Tue, 8 Feb 2022 03:24:36 GMT
- Title: GrASP: Gradient-Based Affordance Selection for Planning
- Authors: Vivek Veeriah, Zeyu Zheng, Richard Lewis, Satinder Singh
- Abstract summary: Planning with a learned model is arguably a key component of intelligence.
We present a method for selecting affordances useful for planning.
We show that it is feasible to learn to select both primitive-action and option affordances.
- Score: 25.548880832898757
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Planning with a learned model is arguably a key component of intelligence.
There are several challenges in realizing such a component in large-scale
reinforcement learning (RL) problems. One such challenge is dealing effectively
with continuous action spaces when using tree-search planning (e.g., it is not
feasible to consider every action even at just the root node of the tree). In
this paper we present a method for selecting affordances useful for planning --
for learning which small number of actions/options from a continuous space of
actions/options to consider in the tree-expansion process during planning. We
consider affordances that are goal-and-state-conditional mappings to
actions/options as well as unconditional affordances that simply select
actions/options available in all states. Our selection method is gradient
based: we compute gradients through the planning procedure to update the
parameters of the function that represents affordances. Our empirical work
shows that it is feasible to learn to select both primitive-action and option
affordances, and that simultaneously learning to select affordances and
planning with a learned value-equivalent model can outperform model-free RL.
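
The gradient-through-planning idea can be made concrete with a small sketch. The following is a minimal illustration, not the GrASP implementation: the `AffordanceNet` and `ValueModel` classes, the dimensions, and the single soft-max backup are all illustrative assumptions that stand in for the paper's tree expansion with a learned value-equivalent model, kept differentiable so the planned value's gradient reaches the affordance parameters.

```python
# Minimal sketch of gradient-based affordance selection (illustrative only,
# not the GrASP implementation). An affordance network proposes k candidate
# actions at the root; a stand-in value model scores them; a differentiable
# soft "plan value" is backpropagated into the affordance parameters.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, K = 8, 2, 4          # hypothetical sizes

class AffordanceNet(nn.Module):
    """State-conditional affordances: maps a state to k candidate actions."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, K * ACTION_DIM))

    def forward(self, s):
        return self.net(s).view(-1, K, ACTION_DIM)      # (batch, k, action_dim)

class ValueModel(nn.Module):
    """Stand-in for a learned value-equivalent model scoring (state, action)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, 1))

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1)).squeeze(-1)

affordances, model = AffordanceNet(), ValueModel()
opt = torch.optim.Adam(affordances.parameters(), lr=1e-3)

s = torch.randn(32, STATE_DIM)                          # batch of root states
acts = affordances(s)                                   # k candidates per state
q = model(s.unsqueeze(1).expand(-1, K, -1), acts)       # value of each candidate
plan_value = torch.logsumexp(q, dim=1)                  # soft, differentiable backup
loss = -plan_value.mean()                               # maximize the planned value
opt.zero_grad(); loss.backward(); opt.step()
```

In the paper the planned value comes from tree search with a learned value-equivalent model; the logsumexp above is only a stand-in that keeps the selection step differentiable for the purpose of the sketch.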
Related papers
- Decision-Focused Learning to Predict Action Costs for Planning [6.729103498871947]
Decision-Focused Learning (DFL) has been successful in learning to predict the parameters of optimization problems.
This paper investigates the challenges of implementing DFL for automated planning in order to learn to predict the action costs.
arXiv Detail & Related papers (2024-08-13T13:14:54Z)
- Tree-Planner: Efficient Close-loop Task Planning with Large Language Models [63.06270302774049]
Tree-Planner reframes task planning with Large Language Models into three distinct phases.
Tree-Planner achieves state-of-the-art performance while maintaining high efficiency.
arXiv Detail & Related papers (2023-10-12T17:59:50Z)
- Learning To Cut By Looking Ahead: Cutting Plane Selection via Imitation Learning [80.45697245527019]
We show that a greedy selection rule explicitly looking ahead to select cuts that yield the best bound improvement delivers strong decisions for cut selection.
We propose a new neural architecture (NeuralCut) for imitation learning on the lookahead expert.
arXiv Detail & Related papers (2022-06-27T16:07:27Z)
- Provably Efficient Lifelong Reinforcement Learning with Linear Function Approximation [41.460894569204065]
We study lifelong reinforcement learning (RL) in a regret-minimization setting for linear contextual Markov decision processes (MDPs).
We propose an algorithm, called UCB Lifelong Value Distillation (UCBlvd), that provably achieves sublinear regret for any sequence of tasks.
arXiv Detail & Related papers (2022-06-01T06:53:28Z)
- Reward-Respecting Subtasks for Model-Based Reinforcement Learning [13.906158484935098]
Reinforcement learning must include planning with a model of the world that is abstract in state and time.
Temporal abstraction via options has nonetheless rarely been used in such planning; one reason is that the space of possible options is immense, and the methods previously proposed for option discovery do not take into account how the option models will be used in planning.
We show that option models obtained from reward-respecting subtasks are much more likely to be useful in planning than eigenoptions, shortest path options based on bottleneck states, or reward-respecting options generated by the option-critic.
arXiv Detail & Related papers (2022-02-07T19:09:27Z)
- The Paradox of Choice: Using Attention in Hierarchical Reinforcement Learning [59.777127897688594]
We present an online, model-free algorithm to learn affordances that can be used to further learn subgoal options.
We investigate the role of hard versus soft attention in training data collection, abstract value learning in long-horizon tasks, and handling a growing number of choices.
arXiv Detail & Related papers (2022-01-24T13:18:02Z)
- Learning Models as Functionals of Signed-Distance Fields for Manipulation Planning [51.74463056899926]
This work proposes an optimization-based manipulation planning framework where the objectives are learned functionals of signed-distance fields that represent objects in the scene.
We show that representing objects as signed-distance fields enables learning and representing a variety of models with higher accuracy than point-cloud and occupancy-measure representations.
arXiv Detail & Related papers (2021-10-02T12:36:58Z)
- Adversarial Option-Aware Hierarchical Imitation Learning [89.92994158193237]
We propose Option-GAIL, a novel method to learn skills at long horizon.
The key idea of Option-GAIL is to model the task hierarchy with options and to train the policy via generative adversarial optimization.
Experiments show that Option-GAIL outperforms other counterparts consistently across a variety of tasks.
arXiv Detail & Related papers (2021-06-10T06:42:05Z)
- Think Too Fast Nor Too Slow: The Computational Trade-off Between Planning And Reinforcement Learning [6.26592851697969]
Planning and reinforcement learning are two key approaches to sequential decision making.
We show that the trade-off between planning and learning is of key importance.
We identify a new spectrum of planning-learning algorithms which ranges from exhaustive search (long planning) to model-free RL (no planning), with optimal performance achieved midway.
arXiv Detail & Related papers (2020-05-15T08:20:08Z)
- Divide-and-Conquer Monte Carlo Tree Search For Goal-Directed Planning [78.65083326918351]
We consider alternatives to an implicit sequential planning assumption.
We propose Divide-and-Conquer Monte Carlo Tree Search (DC-MCTS) for approximating the optimal plan.
We show that this algorithmic flexibility over planning order leads to improved results in navigation tasks in grid-worlds.
arXiv Detail & Related papers (2020-04-23T18:08:58Z)
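
To illustrate the non-sequential planning order that DC-MCTS relaxes, here is a toy sketch of the divide-and-conquer recursion only, not the paper's tree search; the `reachable_one_step` and `candidates` helpers are assumed to be supplied by the environment.

```python
# Toy sketch of divide-and-conquer planning over subgoals (illustrative only,
# not DC-MCTS itself). `reachable_one_step(a, b)` and `candidates(a, b)` are
# assumed helpers: a primitive connectivity check and a subgoal proposer.
def divide_and_conquer_plan(start, goal, reachable_one_step, candidates, depth=6):
    """Return a state sequence from start to goal, or None if no plan is found."""
    if reachable_one_step(start, goal):
        return [start, goal]
    if depth == 0:
        return None
    for mid in candidates(start, goal):                  # try intermediate subgoals
        left = divide_and_conquer_plan(start, mid, reachable_one_step,
                                       candidates, depth - 1)
        if left is None:
            continue
        right = divide_and_conquer_plan(mid, goal, reachable_one_step,
                                        candidates, depth - 1)
        if right is not None:
            return left[:-1] + right                     # splice the two half-plans
    return None
```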