Think Too Fast Nor Too Slow: The Computational Trade-off Between
Planning And Reinforcement Learning
- URL: http://arxiv.org/abs/2005.07404v1
- Date: Fri, 15 May 2020 08:20:08 GMT
- Title: Think Too Fast Nor Too Slow: The Computational Trade-off Between
Planning And Reinforcement Learning
- Authors: Thomas M. Moerland, Anna Deichler, Simone Baldi, Joost Broekens and
Catholijn M. Jonker
- Abstract summary: Planning and reinforcement learning are two key approaches to sequential decision making.
We show that the trade-off between planning and learning is of key importance.
We identify a new spectrum of planning-learning algorithms which ranges from exhaustive search (long planning) to model-free RL (no planning), with optimal performance achieved midway.
- Score: 6.26592851697969
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Planning and reinforcement learning are two key approaches to sequential
decision making. Multi-step approximate real-time dynamic programming, a
recently successful algorithm class of which AlphaZero [Silver et al., 2018] is
an example, combines both by nesting planning within a learning loop. However,
the combination of planning and learning introduces a new question: how should
we balance time spent on planning, learning and acting? The importance of this
trade-off has not been explicitly studied before. We show that it is actually
of key importance, with computational results indicating that we should neither
plan too long nor too short. Conceptually, we identify a new spectrum of
planning-learning algorithms which ranges from exhaustive search (long
planning) to model-free RL (no planning), with optimal performance achieved
midway.
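The nesting of planning within a learning loop under a fixed compute budget can be made concrete with a toy sketch. Everything below is an illustrative assumption, not the paper's experimental setup: a hypothetical chain MDP, random-rollout planning as the "planning" primitive, and TD value learning as the "learning" primitive. The knob `n_sims` moves along the spectrum: `n_sims == 0` is model-free RL, large `n_sims` approaches exhaustive search, and heavier planning buys fewer real environment steps out of the same budget.

```python
import random

# Toy chain MDP: states 0..N-1, actions -1/+1, reward 1 at state N-1 (then reset).
N = 6

def env_step(s, a):
    s2 = max(0, min(N - 1, s + a))
    return s2, float(s2 == N - 1), s2 == N - 1  # next state, reward, done

def rollout_return(s, depth):
    """One random model rollout from s -- the 'planning' primitive here."""
    total = 0.0
    for _ in range(depth):
        s, r, done = env_step(s, random.choice((-1, 1)))
        total += r
        if done:
            break
    return total

def plan_act(s, v, n_sims):
    """Choose an action via n_sims model rollouts per action;
    n_sims == 0 degenerates to model-free greedy selection on learned values."""
    def score(a):
        s2, r, _ = env_step(s, a)
        if n_sims == 0:
            return r + v.get(s2, 0.0)
        return r + sum(rollout_return(s2, N) for _ in range(n_sims)) / n_sims
    return max((+1, -1), key=score)

def train(total_budget, n_sims, alpha=0.5, gamma=0.9):
    """Spend a fixed compute budget: each real step costs 1 unit plus the
    planning rollouts, so heavier planning means fewer real (learning) steps."""
    v, s, real_steps, spent = {}, 0, 0, 0
    cost_per_step = 1 + 2 * n_sims  # one real step + rollouts for both actions
    while spent + cost_per_step <= total_budget:
        a = plan_act(s, v, n_sims)
        s2, r, done = env_step(s, a)
        # TD(0) update on state values -- the 'learning' primitive.
        v[s] = v.get(s, 0.0) + alpha * (r + gamma * v.get(s2, 0.0) - v.get(s, 0.0))
        s = 0 if done else s2
        real_steps += 1
        spent += cost_per_step
    return v, real_steps
```

Under this crude cost model, `train(100, 0)` takes 100 real steps while `train(100, 9)` takes only 5; the paper's claim is that the best policy quality per unit of compute sits somewhere between those extremes.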
Related papers
- A New View on Planning in Online Reinforcement Learning [19.35031543927374]
This paper investigates a new approach to model-based reinforcement learning using background planning.
We show that our GSP algorithm can propagate value from an abstract space in a manner that helps a variety of base learners learn significantly faster in different domains.
arXiv Detail & Related papers (2024-06-03T17:45:19Z) - LLM-Assist: Enhancing Closed-Loop Planning with Language-Based Reasoning [65.86754998249224]
We develop a novel hybrid planner that leverages a conventional rule-based planner in conjunction with an LLM-based planner.
Our approach navigates complex scenarios that existing planners struggle with, producing well-reasoned outputs while remaining grounded through collaboration with the rule-based approach.
arXiv Detail & Related papers (2023-12-30T02:53:45Z) - Planning as In-Painting: A Diffusion-Based Embodied Task Planning
Framework for Environments under Uncertainty [56.30846158280031]
Task planning for embodied AI has been one of the most challenging problems.
We propose a task-agnostic method named 'planning as in-painting'.
The proposed framework achieves promising performance in various embodied AI tasks.
arXiv Detail & Related papers (2023-12-02T10:07:17Z) - PlaSma: Making Small Language Models Better Procedural Knowledge Models for (Counterfactual) Planning [77.03847056008598]
PlaSma is a novel two-pronged approach to endow small language models with procedural knowledge and (constrained) language planning capabilities.
We develop symbolic procedural knowledge distillation to enhance the commonsense knowledge in small language models and an inference-time algorithm to facilitate more structured and accurate reasoning.
arXiv Detail & Related papers (2023-05-31T00:55:40Z) - PALMER: Perception-Action Loop with Memory for Long-Horizon Planning [1.5469452301122177]
We introduce a general-purpose planning algorithm called PALMER.
PALMER combines classical sampling-based planning algorithms with learning-based perceptual representations.
This creates a tight feedback loop between representation learning, memory, reinforcement learning, and sampling-based planning.
arXiv Detail & Related papers (2022-12-08T22:11:49Z) - Goal-Space Planning with Subgoal Models [18.43265820052893]
This paper investigates a new approach to model-based reinforcement learning using background planning.
We show that our GSP algorithm can propagate value from an abstract space in a manner that helps a variety of base learners learn significantly faster in different domains.
arXiv Detail & Related papers (2022-06-06T20:59:07Z) - C-Planning: An Automatic Curriculum for Learning Goal-Reaching Tasks [133.40619754674066]
Goal-conditioned reinforcement learning can solve tasks in a wide range of domains, including navigation and manipulation.
We address distant goal-reaching tasks by using search at training time to automatically generate intermediate states.
The E-step corresponds to planning an optimal sequence of waypoints using graph search, while the M-step learns a goal-conditioned policy to reach those waypoints.
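The E-step's role can be illustrated with a minimal BFS waypoint extractor over a state graph. This is an illustrative sketch under assumed names (`waypoints`, the toy adjacency dict), not the authors' code, and the M-step (learning a goal-conditioned policy to reach each waypoint) is omitted.

```python
from collections import deque

def waypoints(graph, start, goal):
    """BFS over a state graph to recover a shortest sequence of intermediate
    states between start and goal -- the role graph search plays in the E-step.
    Returns the interior states of the path (the waypoints), or None if
    goal is unreachable."""
    parent = {start: None}
    frontier = deque([start])
    while frontier:
        s = frontier.popleft()
        if s == goal:
            path = []
            while s is not None:
                path.append(s)
                s = parent[s]
            return path[::-1][1:-1]  # drop start and goal, keep waypoints
        for nxt in graph.get(s, ()):
            if nxt not in parent:
                parent[nxt] = s
                frontier.append(nxt)
    return None
```

For example, on the toy graph `{"A": ["B"], "B": ["C"], "C": ["D"]}`, the waypoints from `"A"` to `"D"` are `["B", "C"]` -- the intermediate states a goal-conditioned policy would then be trained to reach in sequence.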
arXiv Detail & Related papers (2021-10-22T22:05:31Z) - Learning off-road maneuver plans for autonomous vehicles [0.0]
This thesis explores the benefits machine learning algorithms can bring to online planning and scheduling for autonomous vehicles in off-road situations.
We present a range of learning-based methods to assist different itinerary planners.
In order to synthesize strategies to execute synchronized maneuvers, we propose a novel type of scheduling controllability and a learning-assisted algorithm.
arXiv Detail & Related papers (2021-08-02T16:27:59Z) - Planning with Learned Object Importance in Large Problem Instances using
Graph Neural Networks [28.488201307961624]
Real-world planning problems often involve hundreds or even thousands of objects.
We propose a graph neural network architecture for predicting object importance in a single inference pass.
Our approach treats the planner and transition model as black boxes, and can be used with any off-the-shelf planner.
arXiv Detail & Related papers (2020-09-11T18:55:08Z) - Divide-and-Conquer Monte Carlo Tree Search For Goal-Directed Planning [78.65083326918351]
We consider alternatives to an implicit sequential planning assumption.
We propose Divide-and-Conquer Monte Carlo Tree Search (DC-MCTS) for approximating the optimal plan.
We show that this algorithmic flexibility over planning order leads to improved results in navigation tasks in grid-worlds.
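The divide-and-conquer idea of choosing the planning order freely, rather than extending a plan left-to-right, can be sketched as a recursive subgoal split. This is a hypothetical illustration in the spirit of DC-MCTS: `propose_midpoint` stands in for the learned subgoal proposal, and the MCTS machinery over split choices is omitted.

```python
def dc_plan(start, goal, propose_midpoint, max_depth=8):
    """Divide-and-conquer planning sketch: recursively split (start, goal)
    at a proposed subgoal, then solve the two halves independently and
    concatenate the sub-plans."""
    if max_depth == 0 or goal == start:
        return [start, goal] if goal != start else [start]
    mid = propose_midpoint(start, goal)
    if mid is None or mid in (start, goal):
        return [start, goal]  # no useful split: treat as a primitive segment
    left = dc_plan(start, mid, propose_midpoint, max_depth - 1)
    right = dc_plan(mid, goal, propose_midpoint, max_depth - 1)
    return left + right[1:]  # drop duplicated midpoint when joining halves
```

With a simple arithmetic midpoint proposal on integer states, `dc_plan(0, 8, ...)` fills in the plan out of order (4 first, then 2 and 6, and so on) yet still produces the full sequence `0..8`; in DC-MCTS the split point is instead chosen by tree search over learned proposals.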
arXiv Detail & Related papers (2020-04-23T18:08:58Z) - STRIPS Action Discovery [67.73368413278631]
Recent approaches have shown that classical planning can successfully synthesize action models even when all intermediate states are missing.
We propose a new algorithm that uses a classical planner to synthesize STRIPS action models without supervision when action signatures are unknown.
arXiv Detail & Related papers (2020-01-30T17:08:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.