Think Too Fast Nor Too Slow: The Computational Trade-off Between
Planning And Reinforcement Learning
- URL: http://arxiv.org/abs/2005.07404v1
- Date: Fri, 15 May 2020 08:20:08 GMT
- Title: Think Too Fast Nor Too Slow: The Computational Trade-off Between
Planning And Reinforcement Learning
- Authors: Thomas M. Moerland, Anna Deichler, Simone Baldi, Joost Broekens and
Catholijn M. Jonker
- Abstract summary: Planning and reinforcement learning are two key approaches to sequential decision making.
We show that the trade-off between planning and learning is of key importance.
We identify a new spectrum of planning-learning algorithms which ranges from exhaustive search (long planning) to model-free RL (no planning), with optimal performance achieved midway.
- Score: 6.26592851697969
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Planning and reinforcement learning are two key approaches to sequential
decision making. Multi-step approximate real-time dynamic programming, a
recently successful algorithm class of which AlphaZero [Silver et al., 2018] is
an example, combines both by nesting planning within a learning loop. However,
the combination of planning and learning introduces a new question: how should
we balance time spent on planning, learning, and acting? The importance of this
trade-off has not been explicitly studied before. We show that it is actually
of key importance, with computational results indicating that we should neither
plan too long nor too short. Conceptually, we identify a new spectrum of
planning-learning algorithms which ranges from exhaustive search (long
planning) to model-free RL (no planning), with optimal performance achieved
midway.
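To make the trade-off concrete, here is a minimal toy sketch (not the paper's code; environment, names, and parameters are all illustrative) of nesting planning inside a learning loop: a tabular TD learner acts in a small chain MDP, and a tunable per-step rollout budget controls how much planning is done before each action. Budget 0 recovers model-free action selection; a very large budget approaches exhaustive search.

```python
import random

# Toy chain MDP: states 0..9, reward 1.0 for stepping into the last state.
N_STATES = 10
ACTIONS = (-1, +1)  # step left or step right

def step(state, action):
    """Deterministic toy transition model, clipped to the chain's ends."""
    next_state = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

def rollout_value(state, depth=5):
    """Estimate a state's value with one random rollout of fixed depth."""
    total = 0.0
    for _ in range(depth):
        state, r = step(state, random.choice(ACTIONS))
        total += r
    return total

def select_action(state, values, budget):
    """Score each action with `budget` rollouts plus the learned value table."""
    best_action, best_score = None, float("-inf")
    for a in random.sample(ACTIONS, len(ACTIONS)):  # random tie-breaking
        nxt, r = step(state, a)
        plan_est = (sum(rollout_value(nxt) for _ in range(budget)) / budget
                    if budget else 0.0)
        score = r + values[nxt] + plan_est
        if score > best_score:
            best_action, best_score = a, score
    return best_action

def run_episode(values, budget, alpha=0.1, max_steps=50):
    """Act with planning at each step, learning values from the trajectory."""
    state, ret = 0, 0.0
    for _ in range(max_steps):
        action = select_action(state, values, budget)
        nxt, r = step(state, action)
        values[state] += alpha * (r + values[nxt] - values[state])  # TD(0)
        ret += r
        state = nxt
    return ret
```

Under a fixed total compute budget, raising the per-step rollout count buys better individual decisions but fewer environment interactions to learn from, which is the spectrum the abstract describes: the best setting lies between no planning and exhaustive planning.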
Related papers
- A New View on Planning in Online Reinforcement Learning [19.35031543927374]
This paper investigates a new approach to model-based reinforcement learning using background planning.
We show that our GSP algorithm can propagate value from an abstract space in a manner that helps a variety of base learners learn significantly faster in different domains.
arXiv Detail & Related papers (2024-06-03T17:45:19Z)
- Can Graph Learning Improve Task Planning? [61.47027387839096]
Task planning is emerging as an important research topic alongside the development of large language models (LLMs)
In this paper, we explore graph learning-based methods for task planning.
Our approach complements prompt engineering and fine-tuning techniques, with performance further enhanced by improved prompts or a fine-tuned model.
arXiv Detail & Related papers (2024-05-29T14:26:24Z)
- The Road Less Scheduled [75.09232139131437]
Existing learning rate schedules that do not require specification of the optimization stopping step T are greatly outperformed by schedules that depend on T.
We propose an approach that avoids the need for this stopping time by eschewing the use of schedules entirely.
arXiv Detail & Related papers (2024-05-24T16:20:46Z) - Planning as In-Painting: A Diffusion-Based Embodied Task Planning
Framework for Environments under Uncertainty [56.30846158280031]
Task planning for embodied AI has been one of the most challenging problems.
We propose a task-agnostic method named 'planning as in-painting'
The proposed framework achieves promising performances in various embodied AI tasks.
arXiv Detail & Related papers (2023-12-02T10:07:17Z) - PlaSma: Making Small Language Models Better Procedural Knowledge Models
for (Counterfactual) Planning [72.0564921186518]
PlaSma is a novel two-pronged approach to endow small language models with procedural knowledge and (counterfactual) planning capabilities.
More concretely, we develop symbolic procedural knowledge distillation to enhance the implicit knowledge in small language models.
In addition, we introduce a novel task, Counterfactual Planning, that requires a revision of a plan to cope with a counterfactual situation.
arXiv Detail & Related papers (2023-05-31T00:55:40Z) - PALMER: Perception-Action Loop with Memory for Long-Horizon Planning [1.5469452301122177]
We introduce a general-purpose planning algorithm called PALMER.
PALMER combines classical sampling-based planning algorithms with learning-based perceptual representations.
This creates a tight feedback loop between representation learning, memory, reinforcement learning, and sampling-based planning.
arXiv Detail & Related papers (2022-12-08T22:11:49Z) - Understanding Decision-Time vs. Background Planning in Model-Based
Reinforcement Learning [56.50123642237106]
Two prevalent approaches are decision-time planning and background planning.
This study is interested in understanding under what conditions and in which settings one of these two planning styles will perform better than the other.
Overall, our findings suggest that even though decision-time planning does not perform as well as background planning in their classical instantiations, in their modern instantiations, it can perform on par or better than background planning.
arXiv Detail & Related papers (2022-06-16T20:48:19Z) - Goal-Space Planning with Subgoal Models [18.43265820052893]
This paper investigates a new approach to model-based reinforcement learning using background planning.
We show that our GSP algorithm can propagate value from an abstract space in a manner that helps a variety of base learners learn significantly faster in different domains.
arXiv Detail & Related papers (2022-06-06T20:59:07Z) - Learning off-road maneuver plans for autonomous vehicles [0.0]
This thesis explores the benefits machine learning algorithms can bring to online planning and scheduling for autonomous vehicles in off-road situations.
We present a range of learning-based methods to assist different itinerary planners.
In order to synthesize strategies to execute synchronized maneuvers, we propose a novel type of scheduling controllability and a learning-assisted algorithm.
arXiv Detail & Related papers (2021-08-02T16:27:59Z) - Planning with Learned Object Importance in Large Problem Instances using
Graph Neural Networks [28.488201307961624]
Real-world planning problems often involve hundreds or even thousands of objects.
We propose a graph neural network architecture for predicting object importance in a single inference pass.
Our approach treats the planner and transition model as black boxes, and can be used with any off-the-shelf planner.
arXiv Detail & Related papers (2020-09-11T18:55:08Z) - Divide-and-Conquer Monte Carlo Tree Search For Goal-Directed Planning [78.65083326918351]
We consider alternatives to an implicit sequential planning assumption.
We propose Divide-and-Conquer Monte Carlo Tree Search (DC-MCTS) for approximating the optimal plan.
We show that this algorithmic flexibility over planning order leads to improved results in navigation tasks in grid-worlds.
arXiv Detail & Related papers (2020-04-23T18:08:58Z)
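The divide-and-conquer idea above can be illustrated with a toy sketch (assumptions only, not the DC-MCTS implementation): instead of rolling a plan out left-to-right, pick an intermediate subgoal and recurse on the two halves. Here the state space is a simple 1-D chain and the subgoal proposal is just the midpoint; in DC-MCTS the subgoal choice is instead guided by learned priors and values within a tree search.

```python
def line_neighbors(s):
    """Toy 1-D chain: each state's neighbors are the adjacent integers."""
    return {s - 1, s + 1}

def midpoint_subgoals(start, goal):
    """Hypothetical proposal: just try the midpoint. DC-MCTS learns this."""
    return [(start + goal) // 2]

def dc_plan(start, goal, neighbors=line_neighbors,
            subgoals=midpoint_subgoals, depth=0, max_depth=10):
    """Recursively split (start, goal) at a subgoal; base case: adjacency."""
    if goal in neighbors(start):
        return [start, goal]
    if depth >= max_depth:
        return None  # give up on this split
    for sub in subgoals(start, goal):
        left = dc_plan(start, sub, neighbors, subgoals, depth + 1, max_depth)
        right = dc_plan(sub, goal, neighbors, subgoals, depth + 1, max_depth)
        if left and right:
            return left + right[1:]  # stitch halves, dropping duplicate subgoal
    return None
```

The flexibility the paper exploits is exactly this freedom in planning order: subplans can be constructed in any order and in parallel, rather than strictly from the start state forward.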
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.