Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping
- URL: http://arxiv.org/abs/2402.14083v2
- Date: Fri, 26 Apr 2024 21:05:19 GMT
- Title: Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping
- Authors: Lucas Lehnert, Sainbayar Sukhbaatar, DiJia Su, Qinqing Zheng, Paul Mcvay, Michael Rabbat, Yuandong Tian
- Abstract summary: We train an encoder-decoder Transformer model to predict the search dynamics of the $A^*$ search algorithm.
We fine-tune this model to obtain a Searchformer, a Transformer model that optimally solves previously unseen Sokoban puzzles 93.7% of the time.
- Score: 37.59733248822887
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While Transformers have enabled tremendous progress in various application settings, such architectures still trail behind traditional symbolic planners for solving complex decision making tasks. In this work, we demonstrate how to train Transformers to solve complex planning tasks. This is accomplished by training an encoder-decoder Transformer model to predict the search dynamics of the $A^*$ search algorithm. We fine-tune this model to obtain a Searchformer, a Transformer model that optimally solves previously unseen Sokoban puzzles 93.7% of the time, while using up to 26.8% fewer search steps than the $A^*$ implementation that was used for training initially. In our training method, $A^*$'s search dynamics are expressed as a token sequence outlining when task states are added to and removed from the search tree during symbolic planning. Searchformer significantly outperforms baselines that predict the optimal plan directly, while using a 5-10$\times$ smaller model size and a 10$\times$ smaller training dataset. Lastly, we demonstrate how Searchformer scales to larger and more complex decision making tasks, with an improved percentage of solved tasks and shortened search dynamics.
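As a rough illustration of what "search dynamics as a token sequence" can look like, the sketch below runs $A^*$ on a toy grid maze and emits a token whenever a node enters the frontier or is popped for expansion. The `create`/`close` vocabulary, the trace format, and the maze are illustrative assumptions, not the paper's exact tokenization.

```python
import heapq

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def astar_with_trace(grid, start, goal):
    """A* on a 4-connected grid; walls are '#'. Returns (plan, trace), where
    `trace` tokenizes the search dynamics: a 'create r c g.. h..' token when a
    node enters the frontier, a 'close r c' token when it is expanded
    (illustrative vocabulary, not the paper's)."""
    trace, frontier = [], []
    g_cost, parent = {start: 0}, {start: None}
    heapq.heappush(frontier, (manhattan(start, goal), 0, start))
    trace.append(f"create {start[0]} {start[1]} g0 h{manhattan(start, goal)}")
    while frontier:
        _, g, node = heapq.heappop(frontier)
        if g > g_cost[node]:
            continue  # stale frontier entry
        trace.append(f"close {node[0]} {node[1]}")
        if node == goal:
            plan = []
            while node is not None:
                plan.append(node)
                node = parent[node]
            return plan[::-1], trace
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (node[0] + dx, node[1] + dy)
            if not (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])):
                continue
            if grid[nxt[0]][nxt[1]] == "#" or g + 1 >= g_cost.get(nxt, float("inf")):
                continue
            g_cost[nxt], parent[nxt] = g + 1, node
            h = manhattan(nxt, goal)
            heapq.heappush(frontier, (g + 1 + h, g + 1, nxt))
            trace.append(f"create {nxt[0]} {nxt[1]} g{g + 1} h{h}")
    return None, trace

maze = ["....", ".##.", "...."]
plan, trace = astar_with_trace(maze, (0, 0), (2, 3))
print(" ".join(trace))   # search-dynamics token sequence (the model's target)
print("plan:", plan)     # optimal path appended after the trace during training
```

In the paper's setup, such a trace (followed by the plan) forms the decoder target, and shorter traces discovered during fine-tuning are what "fewer search steps" refers to.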
Related papers
- ETS: Efficient Tree Search for Inference-Time Scaling [61.553681244572914]
One promising approach for test-time compute scaling is search against a process reward model.
The diversity of trajectories in the tree search process affects the accuracy of the search, since increasing diversity promotes more exploration.
We propose Efficient Tree Search (ETS), which promotes KV sharing by pruning redundant trajectories while maintaining necessary diverse trajectories.
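A toy illustration of the trade-off described above: from a beam of scored trajectories, prune candidates that are near-duplicates of something already kept, and penalize the new KV-cache entries a candidate would add. The thresholds, penalty, and greedy policy below are hypothetical stand-ins, not the paper's actual cost model.

```python
def shared_prefix_len(a, b):
    """Length of the common prefix of two token sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def select_trajectories(candidates, budget, kv_penalty=0.05, redundancy=0.75):
    """Keep a small set of trajectories from a search beam (toy heuristic):
    - drop candidates that overlap a kept trajectory almost entirely
      (redundant: they add little new information),
    - otherwise keep them if their reward-model score outweighs a penalty on
      the new KV-cache entries (tokens not covered by a kept prefix)."""
    kept = []
    for tokens, score in sorted(candidates, key=lambda c: -c[1]):
        if len(kept) == budget:
            break
        overlap = max((shared_prefix_len(tokens, t) / len(tokens) for t, _ in kept),
                      default=0.0)
        if overlap >= redundancy:
            continue  # prune: nearly identical to a kept trajectory
        reuse = max((shared_prefix_len(tokens, t) for t, _ in kept), default=0)
        if score - kv_penalty * (len(tokens) - reuse) > 0:
            kept.append((tokens, score))
    return kept

beam = [
    (("a", "b", "c", "d"), 0.90),
    (("a", "b", "c", "e"), 0.88),  # redundant with the first: pruned
    (("a", "x", "y", "z"), 0.75),  # diverse branch: kept for exploration
    (("q", "r", "s", "t"), 0.10),  # low score, no sharing: dropped
]
for tokens, score in select_trajectories(beam, budget=3):
    print(score, tokens)
```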
arXiv Detail & Related papers (2025-02-19T09:30:38Z) - Transformers Struggle to Learn to Search [32.231381064112085]
We use the foundational graph connectivity problem as a testbed to generate effectively limitless high-coverage data to train small transformers.
We find that, when given the right training distribution, the transformer is able to learn to search.
We also find that performing search in-context (i.e., chain-of-thought) does not resolve this inability to learn to search on larger graphs.
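A sketch of how such effectively limitless training data can be generated: sample a random directed graph, compute reachability for a query pair, and serialize everything as tokens. The serialization format and sizes here are assumptions for illustration.

```python
import random

def connectivity_example(n_vertices=8, n_edges=10, rng=random):
    """Sample a random directed graph and a (source, target) query, and
    serialize them as a token sequence with a reachability label."""
    edges = {(rng.randrange(n_vertices), rng.randrange(n_vertices))
             for _ in range(n_edges)}
    src, dst = rng.randrange(n_vertices), rng.randrange(n_vertices)
    # Ground-truth label via a simple reachability search.
    frontier, seen = [src], {src}
    while frontier:
        u = frontier.pop()
        for a, b in edges:
            if a == u and b not in seen:
                seen.add(b)
                frontier.append(b)
    tokens = [f"{a}>{b}" for a, b in sorted(edges)] + ["?", str(src), str(dst)]
    label = "reachable" if dst in seen else "unreachable"
    return tokens, label

random.seed(0)
for _ in range(3):
    tokens, label = connectivity_example()
    print(" ".join(tokens), "->", label)
```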
arXiv Detail & Related papers (2024-12-06T01:29:24Z) - A Large-Scale Exploration of $μ$-Transfer [0.0]
$\mu$-Transfer yields scaling rules for model initialization and learning rates.
$\mu$-Transfer is not yet widely adopted, perhaps due to its higher implementation complexity, many variations, or complex theoretical background.
We study models of up to 10B parameters and training budgets of up to 190B tokens, and find $\mu$-Transfer works as intended for the majority of important cases, yet also identify a few cases where it may not.
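A simplified sketch of the kind of rule being transferred, assuming the commonly cited $\mu$P recipe of scaling the Adam learning rate of hidden (matrix-like) parameters by 1/width relative to a tuned base model; the full parameterization also adjusts initialization scales and output multipliers, which are omitted here, and the constants are assumptions.

```python
def mup_scaled_hparams(base_lr, base_width, width):
    """Scale a learning rate tuned at `base_width` to a wider model, following
    the commonly cited muP rule for hidden (matrix-like) parameters under
    Adam: lr scales as 1/width. Vector-like parameters (embeddings, biases)
    keep the base learning rate. Simplified sketch only."""
    return {
        "hidden_lr": base_lr * base_width / width,
        "vector_lr": base_lr,
    }

# Hyperparameters tuned once on a small proxy model ...
base = {"lr": 3e-3, "width": 256}
# ... then transferred to wider target models without re-tuning.
for width in (256, 1024, 4096):
    print(width, mup_scaled_hparams(base["lr"], base["width"], width))
```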
arXiv Detail & Related papers (2024-04-08T17:59:44Z) - Pre-train and Search: Efficient Embedding Table Sharding with Pre-trained Neural Cost Models [56.65200574282804]
We propose a "pre-train, and search" paradigm for efficient sharding.
NeuroShard pre-trains neural cost models on augmented tables to cover various sharding scenarios.
NeuroShard significantly and consistently outperforms the state-of-the-art on the benchmark sharding dataset.
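A toy sketch of the "pre-train, and search" split: a cost model (here just a stub standing in for the pre-trained neural model) predicts the cost of placing an embedding table on a device, and a simple greedy search balances predicted cost across devices. The greedy policy and field names are illustrative, not NeuroShard's algorithm.

```python
def predicted_cost(table, device_load):
    """Stand-in for a pre-trained neural cost model: here just a simple
    function of table size and current device load. In the real system this
    would be a learned model queried during the search."""
    return table["rows"] * table["dim"] * 1e-6 + 0.05 * device_load

def greedy_shard(tables, n_devices):
    """Greedily place each table on the device with the lowest predicted
    total cost, aiming to balance predicted per-device cost."""
    loads = [0.0] * n_devices
    placement = {}
    for t in sorted(tables, key=lambda t: -(t["rows"] * t["dim"])):
        costs = [predicted_cost(t, loads[d]) for d in range(n_devices)]
        best = min(range(n_devices), key=lambda d: loads[d] + costs[d])
        loads[best] += costs[best]
        placement[t["name"]] = best
    return placement, loads

tables = [{"name": f"t{i}", "rows": r, "dim": 64}
          for i, r in enumerate([10_000, 500_000, 80_000, 250_000, 5_000])]
placement, loads = greedy_shard(tables, n_devices=2)
print(placement)
print([round(x, 2) for x in loads])
```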
arXiv Detail & Related papers (2023-05-03T02:52:03Z) - Transformers Learn Shortcuts to Automata [52.015990420075944]
We find that a low-depth Transformer can represent the computations of any finite-state automaton.
We show that a Transformer with $O(\log T)$ layers can exactly replicate the computation of an automaton on an input sequence of length $T$.
We further investigate the brittleness of these solutions and propose potential mitigations.
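A small, concrete version of the underlying shortcut, using a two-state parity automaton: each input symbol induces a transition map over states, and composing those maps in a balanced tree reproduces the sequential simulation with only $O(\log T)$ levels of composition. The automaton and representation are illustrative; the paper's construction concerns Transformer layers rather than explicit function composition.

```python
# A 2-state parity automaton: each symbol induces a map over states {0, 1}.
# TRANSITIONS[symbol][state] -> next state
TRANSITIONS = {"0": (0, 1), "1": (1, 0)}

def compose(f, g):
    """Composition of transition maps: apply f first, then g."""
    return tuple(g[f[s]] for s in range(len(f)))

def prefix_states(word, start=0):
    """Sequential simulation: state after each prefix (depth O(T))."""
    states, s = [], start
    for ch in word:
        s = TRANSITIONS[ch][s]
        states.append(s)
    return states

def log_depth_state(word, start=0):
    """'Shortcut' simulation: reduce per-symbol transition maps with a
    balanced tree of compositions, i.e. O(log T) composition levels."""
    if not word:
        return start
    maps = [TRANSITIONS[ch] for ch in word]
    while len(maps) > 1:  # one tree level per iteration
        maps = [compose(maps[i], maps[i + 1]) if i + 1 < len(maps) else maps[i]
                for i in range(0, len(maps), 2)]
    return maps[0][start]

word = "1101001"
assert log_depth_state(word) == prefix_states(word)[-1]
print("final parity state:", log_depth_state(word))
```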
arXiv Detail & Related papers (2022-10-19T17:45:48Z) - CorpusBrain: Pre-train a Generative Retrieval Model for Knowledge-Intensive Language Tasks [62.22920673080208]
A single-step generative model can dramatically simplify the search process and be optimized in an end-to-end manner.
We name the pre-trained generative retrieval model CorpusBrain, as all information about the corpus is encoded in its parameters without the need to construct an additional index.
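A toy sketch of the retrieval-by-generation interface: document identifiers are decoded token by token, constrained to a prefix trie of valid identifiers, so no separate inverted index is consulted. The trie constraint and the stub scorer below are illustrative assumptions rather than CorpusBrain's exact procedure.

```python
def build_trie(doc_ids):
    """Prefix trie over valid document-identifier token sequences."""
    root = {}
    for doc in doc_ids:
        node = root
        for tok in doc:
            node = node.setdefault(tok, {})
        node["<end>"] = {}
    return root

def stub_score(query, token):
    """Stand-in for the trained decoder's next-token score: a crude lexical
    match between the query and a candidate identifier token."""
    return 1.0 if token.lower() in query.lower() else 0.0

def generate_doc_id(query, trie):
    """Greedy decoding constrained to identifiers present in the trie, so
    every generated sequence is a valid document id."""
    node, out = trie, []
    while True:
        choices = [t for t in node if t != "<end>"]
        if not choices:
            return out
        best = max(choices, key=lambda t: stub_score(query, t))
        out.append(best)
        node = node[best]

corpus = [("wiki", "planning", "astar"), ("wiki", "transformers", "search"),
          ("news", "chess", "engines")]
trie = build_trie(corpus)
print(generate_doc_id("how do transformers learn to search?", trie))
```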
arXiv Detail & Related papers (2022-08-16T10:22:49Z) - Distilling Transformers for Neural Cross-Domain Search [9.865125804658991]
We argue that sequence-to-sequence models are conceptually ideal, albeit highly impractical, retrievers.
We derive a new distillation objective, implementing it as a data augmentation scheme.
Using natural language source code search as a case study for cross-domain search, we demonstrate the validity of this idea by significantly improving upon the current leader of the CodeSearchNet challenge, a recent natural language code search benchmark.
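A rough sketch of what "distillation as data augmentation" can look like: an expensive sequence-to-sequence teacher (stubbed out below with a token-overlap proxy) soft-labels extra query/code pairs, which then augment the training data of a cheap retriever. The names and the labeling proxy are assumptions, not the paper's objective.

```python
import random
import re

def lexical_tokens(text):
    return set(re.findall(r"[a-z]+", text.lower()))

def teacher_relevance(query, code):
    """Stand-in for the expensive seq2seq teacher that scores how well a code
    snippet answers a natural-language query; here a crude overlap proxy."""
    q = lexical_tokens(query)
    return len(q & lexical_tokens(code)) / max(len(q), 1)

def augment_with_teacher(queries, snippets, n_pairs=4, rng=random):
    """Distillation as data augmentation (sketch): sample extra query/code
    pairs, soft-label them with the teacher, and add them to the training set
    of a cheap dual-encoder retriever."""
    pairs = []
    for _ in range(n_pairs):
        q, c = rng.choice(queries), rng.choice(snippets)
        pairs.append((q, c, teacher_relevance(q, c)))
    return pairs

random.seed(0)
queries = ["parse a json file", "sort a list in descending order"]
snippets = ["def load(path): return json.load(open(path))",
            "items.sort(reverse=True)"]
for q, c, y in augment_with_teacher(queries, snippets):
    print(f"{y:.2f}  {q!r}  {c!r}")
```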
arXiv Detail & Related papers (2021-08-06T22:30:19Z) - Memory-Efficient Differentiable Transformer Architecture Search [59.47253706925725]
We propose a multi-split reversible network and combine it with DARTS.
Specifically, we devise a backpropagation-with-reconstruction algorithm so that we only need to store the last layer's outputs.
We evaluate the searched architecture on three sequence-to-sequence datasets, i.e., WMT'14 English-German, WMT'14 English-French, and WMT'14 English-Czech.
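A minimal numeric illustration of the reversible building block that makes backpropagation-with-reconstruction possible: with additive coupling, a block's inputs can be recomputed exactly from its outputs, so intermediate activations need not be stored. The tiny F and G below stand in for sub-layers and are not the searched architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
W_f = rng.standard_normal((4, 4))
W_g = rng.standard_normal((4, 4))

def F(x):  # stand-in for one sub-layer (e.g. attention)
    return np.tanh(x @ W_f)

def G(x):  # stand-in for another sub-layer (e.g. feed-forward)
    return np.tanh(x @ W_g)

def reversible_forward(x1, x2):
    """Additive-coupling reversible block: outputs fully determine inputs."""
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def reversible_inverse(y1, y2):
    """Reconstruct the inputs from the outputs during the backward pass,
    instead of keeping activations in memory."""
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2

x1, x2 = rng.standard_normal((2, 4)), rng.standard_normal((2, 4))
y1, y2 = reversible_forward(x1, x2)
r1, r2 = reversible_inverse(y1, y2)
print(np.allclose(x1, r1), np.allclose(x2, r2))  # True True
```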
arXiv Detail & Related papers (2021-05-31T01:52:36Z) - AutoTrans: Automating Transformer Design via Reinforced Architecture Search [52.48985245743108]
This paper empirically explores how to set layer normalization, whether to scale, the number of layers, the number of heads, the activation function, and so on, so that one can obtain a Transformer architecture that better suits the task at hand.
Experiments on CoNLL03, Multi-30k, IWSLT14 and WMT-14 show that the searched Transformer model can outperform standard Transformers.
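A sketch of the kind of design space being searched, with random sampling and a stub evaluator standing in for the reinforced (RL-based) search and for actual training; the field names and value ranges are illustrative assumptions.

```python
import random

# Illustrative Transformer design space (layer-norm placement, scaling,
# depth, heads, activation). The paper searches such choices with an RL
# controller rather than the random sampling used here.
SEARCH_SPACE = {
    "layer_norm": ["pre", "post"],
    "scale_attention": [True, False],
    "num_layers": [2, 4, 6, 8],
    "num_heads": [4, 8, 16],
    "activation": ["relu", "gelu", "swish"],
}

def sample_architecture(rng=random):
    return {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}

def evaluate(arch):
    """Stub for 'train the candidate and report dev accuracy'."""
    return random.random()

random.seed(0)
best = max((sample_architecture() for _ in range(20)), key=evaluate)
print(best)
```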
arXiv Detail & Related papers (2020-09-04T08:46:22Z) - Deep-n-Cheap: An Automated Search Framework for Low Complexity Deep Learning [3.479254848034425]
We present Deep-n-Cheap, an open-source AutoML framework to search for deep learning models.
Our framework targets both benchmark and custom datasets.
Deep-n-Cheap includes a user-customizable complexity penalty which trades off performance with training time or number of parameters.
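A sketch of how a user-customizable complexity penalty can trade off performance against training time or parameter count; the functional form and weights below are assumptions for illustration, not Deep-n-Cheap's exact objective.

```python
import math

def search_objective(val_accuracy, train_time_s, n_params, w_time=0.0, w_params=0.0):
    """Score a candidate model (higher is better). The complexity weights let
    the user trade accuracy for cheaper training or smaller models."""
    return (val_accuracy
            - w_time * math.log10(max(train_time_s, 1.0))
            - w_params * math.log10(max(n_params, 1.0)))

candidates = [
    {"name": "big",   "val_accuracy": 0.94, "train_time_s": 5400, "n_params": 24e6},
    {"name": "small", "val_accuracy": 0.91, "train_time_s": 600,  "n_params": 1.2e6},
]
for w_params in (0.0, 0.05):
    best = max(candidates,
               key=lambda c: search_objective(c["val_accuracy"], c["train_time_s"],
                                              c["n_params"], w_params=w_params))
    print(f"w_params={w_params}: pick {best['name']}")
```

With the penalty disabled the larger model wins on accuracy alone; with a modest parameter penalty the smaller model is preferred, which is the trade-off the framework exposes.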
arXiv Detail & Related papers (2020-03-27T13:00:21Z)