Automating Thought of Search: A Journey Towards Soundness and Completeness
- URL: http://arxiv.org/abs/2408.11326v1
- Date: Wed, 21 Aug 2024 04:19:52 GMT
- Title: Automating Thought of Search: A Journey Towards Soundness and Completeness
- Authors: Daniel Cao, Michael Katz, Harsha Kokel, Kavitha Srinivas, Shirin Sohrabi,
- Abstract summary: Planning remains one of the last standing bastions for large language models (LLMs)
We automate Thought of Search (ToS) completely taking the human out of the loop of solving planning problems.
We achieve 100% accuracy, with minimal feedback, using LLMs of various sizes on all evaluated domains.
- Score: 20.944440404347908
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Planning remains one of the last standing bastions for large language models (LLMs), which now turn their attention to search. Most of the literature uses the language models as world models to define the search space, forgoing soundness for the sake of flexibility. A recent work, Thought of Search (ToS), proposed defining the search space with code, having the language models produce that code. ToS requires a human in the loop, collaboratively producing a sound successor function and goal test. The result, however, is worth the effort: all the tested datasets were solved with 100% accuracy. At the same time LLMs have demonstrated significant progress in code generation and refinement for complex reasoning tasks. In this work, we automate ToS (AutoToS), completely taking the human out of the loop of solving planning problems. AutoToS guides the language model step by step towards the generation of sound and complete search components, through feedback from both generic and domain specific unit tests. We achieve 100% accuracy, with minimal feedback iterations, using LLMs of various sizes on all evaluated domains.
Related papers
- Idiosyncrasies in Large Language Models [54.26923012617675]
We unveil and study idiosyncrasies in Large Language Models (LLMs)
We find that fine-tuning existing text embedding models on LLM-generated texts yields excellent classification accuracy.
We leverage LLM as judges to generate detailed, open-ended descriptions of each model's idiosyncrasies.
arXiv Detail & Related papers (2025-02-17T18:59:02Z) - ChaI-TeA: A Benchmark for Evaluating Autocompletion of Interactions with LLM-based Chatbots [37.80024426010599]
We present ChaI-TeA: CHat InTEraction Autocomplete; An autcomplete evaluation framework for LLM-based interactions.
The framework includes a formal definition of the task, coupled with suitable datasets and metrics.
We use the framework to evaluate After formally defining the task along with suitable datasets and metrics, we test 9 models on the defined auto completion task.
arXiv Detail & Related papers (2024-12-24T12:03:36Z) - Stream of Search (SoS): Learning to Search in Language [29.841835308845948]
We show how language models can be taught to search by representing the process of search in language as a flattened string.
We propose a unified language for search that captures an array of different symbolic search strategies.
Our results indicate that language models can learn to solve problems via search, self-improve to flexibly use different search strategies, and potentially discover new ones.
arXiv Detail & Related papers (2024-04-01T06:50:52Z) - Continuously Learning New Words in Automatic Speech Recognition [56.972851337263755]
We propose a self-supervised continual learning approach for Automatic Speech Recognition.
We use a memory-enhanced ASR model from the literature to decode new words from the slides.
We show that with this approach, we obtain increasing performance on the new words when they occur more frequently.
arXiv Detail & Related papers (2024-01-09T10:39:17Z) - Generative Judge for Evaluating Alignment [84.09815387884753]
We propose a generative judge with 13B parameters, Auto-J, designed to address these challenges.
Our model is trained on user queries and LLM-generated responses under massive real-world scenarios.
Experimentally, Auto-J outperforms a series of strong competitors, including both open-source and closed-source models.
arXiv Detail & Related papers (2023-10-09T07:27:15Z) - L2CEval: Evaluating Language-to-Code Generation Capabilities of Large
Language Models [102.00201523306986]
We present L2CEval, a systematic evaluation of the language-to-code generation capabilities of large language models (LLMs)
We analyze the factors that potentially affect their performance, such as model size, pretraining data, instruction tuning, and different prompting methods.
In addition to assessing model performance, we measure confidence calibration for the models and conduct human evaluations of the output programs.
arXiv Detail & Related papers (2023-09-29T17:57:00Z) - Language-Conditioned Observation Models for Visual Object Search [12.498575839909334]
We bridge the gap in realistic object search by posing the problem as a partially observable Markov decision process (POMDP)
We incorporate the neural network's outputs into our language-conditioned observation model (LCOM) to represent dynamically changing sensor noise.
We demonstrate our method on a Boston Dynamics Spot robot, enabling it to handle complex natural language object descriptions and efficiently find objects in a room-scale environment.
arXiv Detail & Related papers (2023-09-13T19:30:53Z) - AutoTAMP: Autoregressive Task and Motion Planning with LLMs as Translators and Checkers [20.857692296678632]
For effective human-robot interaction, robots need to understand, plan, and execute complex, long-horizon tasks.
Recent advances in large language models have shown promise for translating natural language into robot action sequences.
We show that our approach outperforms several methods using LLMs as planners in complex task domains.
arXiv Detail & Related papers (2023-06-10T21:58:29Z) - AutoScrum: Automating Project Planning Using Large Language Models [0.0]
Large language models have made it possible to use language models for advanced reasoning.
In this paper we leverage this ability for designing complex project plans based only on knowing the current state and the desired state.
Two approaches are demonstrated - a scrum based approach and a shortcut plan approach.
arXiv Detail & Related papers (2023-06-05T19:16:37Z) - SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding
Tasks [88.4408774253634]
Spoken language understanding (SLU) tasks have been studied for many decades in the speech research community.
There are not nearly as many SLU task benchmarks, and many of the existing ones use data that is not freely available to all researchers.
Recent work has begun to introduce such benchmark for several tasks.
arXiv Detail & Related papers (2022-12-20T18:39:59Z) - Towards Language Modelling in the Speech Domain Using Sub-word
Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM based on linguistic units including syllables and phonemes.
With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech.
We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
arXiv Detail & Related papers (2021-10-31T22:48:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.