System-1.x: Learning to Balance Fast and Slow Planning with Language Models
- URL: http://arxiv.org/abs/2407.14414v1
- Date: Fri, 19 Jul 2024 15:40:59 GMT
- Title: System-1.x: Learning to Balance Fast and Slow Planning with Language Models
- Authors: Swarnadeep Saha, Archiki Prasad, Justin Chih-Yao Chen, Peter Hase, Elias Stengel-Eskin, Mohit Bansal
- Abstract summary: Language models can be used to solve long-horizon planning problems in two distinct modes:
a fast 'System-1' mode that directly generates plans without any explicit search or backtracking, and a slow 'System-2' mode that plans step by step.
We propose the System-1.x Planner, a controllable planning framework with LLMs.
- Score: 68.77277620915143
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Language models can be used to solve long-horizon planning problems in two distinct modes: a fast 'System-1' mode, directly generating plans without any explicit search or backtracking, and a slow 'System-2' mode, planning step-by-step by explicitly searching over possible actions. While System-2 is typically more effective, it is also more computationally expensive, making it infeasible for long plans or large action spaces. Moreover, isolated System-1 or 2 ignores the user's end goals, failing to provide ways to control the model's behavior. To this end, we propose the System-1.x Planner, a controllable planning framework with LLMs that is capable of generating hybrid plans and balancing between the two planning modes based on the difficulty of the problem at hand. System-1.x consists of (i) a controller, (ii) a System-1 Planner, and (iii) a System-2 Planner. Based on a user-specified hybridization factor (x) governing the mixture between System-1 and 2, the controller decomposes a problem into sub-goals, and classifies them as easy or hard to be solved by either System-1 or 2, respectively. We fine-tune all three components on top of a single base LLM, requiring only search traces as supervision. Experiments with two diverse planning tasks -- Maze Navigation and Blocksworld -- show that our System-1.x Planner outperforms a System-1 Planner, a System-2 Planner trained to approximate A* search, and also a symbolic planner (A*). We demonstrate the following key properties of our planner: (1) controllability: increasing the hybridization factor (e.g., System-1.75 vs 1.5) performs more search, improving performance, (2) flexibility: by building a neuro-symbolic variant with a neural System-1 and a symbolic System-2, we can use existing symbolic methods, and (3) generalizability: by being able to learn from different search algorithms, our method is robust to the choice of search algorithm.
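The controller logic described in the abstract can be illustrated with a toy maze example. This is only a sketch under stated assumptions, not the authors' implementation: in the paper all three components are fine-tuned LLMs, whereas here `system1` is a hand-written greedy walk, `system2` is breadth-first search standing in for A*, the sub-goal waypoints are given rather than predicted, `hardness` (a wall count between sub-goal endpoints) is a hypothetical proxy, and x is interpreted as the fraction of sub-goals sent to System-2.

```python
import math
from collections import deque

MOVES = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}

def free(grid, cell):
    """True if the cell is inside the grid and not a wall ('#')."""
    r, c = cell
    return 0 <= r < len(grid) and 0 <= c < len(grid[0]) and grid[r][c] != "#"

def system1(grid, start, goal):
    """Fast mode (assumed stand-in): greedy walk, no search or backtracking."""
    path, cur = [], start
    while cur != goal:
        r, c = cur
        if c != goal[1]:
            step = "R" if goal[1] > c else "L"
        else:
            step = "D" if goal[0] > r else "U"
        nxt = (r + MOVES[step][0], c + MOVES[step][1])
        if not free(grid, nxt):
            return None  # blocked, and this mode cannot backtrack
        path.append(step)
        cur = nxt
    return path

def system2(grid, start, goal):
    """Slow mode (assumed stand-in): breadth-first search over actions."""
    queue, seen = deque([(start, [])]), {start}
    while queue:
        cur, path = queue.popleft()
        if cur == goal:
            return path
        for name, (dr, dc) in MOVES.items():
            nxt = (cur[0] + dr, cur[1] + dc)
            if free(grid, nxt) and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [name]))
    return None

def hardness(grid, a, b):
    """Hypothetical difficulty proxy: walls in the bounding box of a and b."""
    r0, r1 = sorted((a[0], b[0]))
    c0, c1 = sorted((a[1], b[1]))
    return sum(grid[r][c] == "#"
               for r in range(r0, r1 + 1) for c in range(c0, c1 + 1))

def controller(grid, waypoints, x):
    """Label the hardest fraction x of sub-goals as hard, the rest as easy."""
    segments = list(zip(waypoints, waypoints[1:]))
    n_hard = math.ceil(x * len(segments))
    ranked = sorted(range(len(segments)),
                    key=lambda i: hardness(grid, *segments[i]), reverse=True)
    hard = set(ranked[:n_hard])
    return [(seg, i in hard) for i, seg in enumerate(segments)]

def system_1x(grid, waypoints, x):
    """Hybrid plan: hard sub-goals go to system2, easy ones to system1."""
    plan = []
    for (a, b), is_hard in controller(grid, waypoints, x):
        sub = system2(grid, a, b) if is_hard else system1(grid, a, b)
        if sub is None:
            return None
        plan += sub
    return plan

GRID = [".....",
        ".###.",
        "....."]
WAYPOINTS = [(0, 0), (0, 2), (2, 2)]  # start, one sub-goal, final goal
```

On this grid, `system_1x(GRID, WAYPOINTS, 0.0)` returns `None` because the greedy walk runs into the wall on the second sub-goal, while `system_1x(GRID, WAYPOINTS, 0.5)` routes only that sub-goal through search and returns the moves `RRLLDDRR`. This mirrors, in miniature, the controllability property: raising x spends more search to recover plans the fast mode alone misses.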
Related papers
- FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency.
We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs).
We show that our system and method can achieve 1.45 - 9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z)
- When is Tree Search Useful for LLM Planning? It Depends on the Discriminator [15.75807429396126]
Large language models (LLMs) solve multi-step problems under a language agent framework with three components: a generator, a discriminator, and a planning method.
We present a comprehensive analysis of how discrimination accuracy affects the overall performance of agents when using advanced planning methods.
arXiv Detail & Related papers (2024-02-16T18:45:58Z)
- Layered controller synthesis for dynamic multi-agent systems [0.0]
We present a layered approach to the multi-agent control problem, decomposed into three stages.
We use SWA-SMT solutions as the initial training dataset for our last stage, which aims at obtaining a neural network control policy.
arXiv Detail & Related papers (2023-07-13T13:56:27Z)
- Matching Pursuit Based Scheduling for Over-the-Air Federated Learning [67.59503935237676]
This paper develops a class of low-complexity device scheduling algorithms for over-the-air federated learning via the method of matching pursuit.
Compared to the state of the art, the proposed scheme poses a drastically lower computational load on the system.
The efficiency of the proposed scheme is confirmed via experiments on the CIFAR dataset.
arXiv Detail & Related papers (2022-06-14T08:14:14Z)
- Extended Task and Motion Planning of Long-horizon Robot Manipulation [28.951816622135922]
Task and Motion Planning (TAMP) requires integration of symbolic reasoning with metric motion planning.
Most TAMP approaches fail to provide feasible solutions when there is missing knowledge about the environment at the symbolic level.
We propose a novel approach for decision-making on extended decision spaces over plan skeletons and action parameters.
arXiv Detail & Related papers (2021-03-09T14:44:08Z)
- Interleaving Fast and Slow Decision Making [7.41244589428771]
Kahneman proposes that we use two different styles of thinking -- a fast and intuitive System 1 for certain tasks, along with a slower but more analytical System 2 for others.
We propose a novel and general framework which includes a new System 0 to oversee Systems 1 and 2.
We evaluate such a framework on a modified version of the classic Pac-Man game, with an already-trained RL algorithm for System 1, a Monte-Carlo tree search for System 2, and several different possible strategies for System 0.
arXiv Detail & Related papers (2020-10-30T13:16:10Z)
- Exploration in two-stage recommender systems [79.50534282841618]
Two-stage recommender systems are widely adopted in industry due to their scalability and maintainability.
A key challenge of this setup is that optimal performance of each stage in isolation does not imply optimal global performance.
We propose a method of synchronising the exploration strategies between the ranker and the nominators.
arXiv Detail & Related papers (2020-09-01T16:52:51Z)
- Divide-and-Conquer Monte Carlo Tree Search For Goal-Directed Planning [78.65083326918351]
We consider alternatives to an implicit sequential planning assumption.
We propose Divide-and-Conquer Monte Carlo Tree Search (DC-MCTS) for approximating the optimal plan.
We show that this algorithmic flexibility over planning order leads to improved results in navigation tasks in grid-worlds.
arXiv Detail & Related papers (2020-04-23T18:08:58Z)
- STRIPS Action Discovery [67.73368413278631]
Recent approaches have shown the success of classical planning at synthesizing action models even when all intermediate states are missing.
We propose a new algorithm to unsupervisedly synthesize STRIPS action models with a classical planner when action signatures are unknown.
arXiv Detail & Related papers (2020-01-30T17:08:39Z)