Don't Fine-Tune, Decode: Syntax Error-Free Tool Use via Constrained Decoding
- URL: http://arxiv.org/abs/2310.07075v3
- Date: Tue, 4 Jun 2024 15:50:22 GMT
- Title: Don't Fine-Tune, Decode: Syntax Error-Free Tool Use via Constrained Decoding
- Authors: Kexun Zhang, Hongqiao Chen, Lei Li, William Wang
- Abstract summary: Large language models (LLMs) excel at many tasks but often fail to use external tools due to complicated and unfamiliar syntax constraints.
We propose TOOLDEC, a decoding algorithm using finite state machines to force LLMs to follow tool syntax.
Experiments show that TOOLDEC eliminates all syntax errors, achieving significantly better performance on various base models and benchmarks.
- Score: 11.51687663492722
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Instruction-tuned large language models (LLMs) excel at many tasks but often fail to use external tools due to complicated and unfamiliar syntax constraints. While extensive fine-tuning and prompting can mitigate the issue, these approaches are expensive and hard to generalize. Furthermore, because syntax constraints are only learned implicitly during fine-tuning, models still make frequent syntax errors. Motivated by the fact that these constraints can be better satisfied explicitly with constrained decoding, we propose TOOLDEC, a decoding algorithm using finite state machines to force LLMs to follow tool syntax. Our experiments show that TOOLDEC eliminates all syntax errors, achieving significantly better performance on various base models and benchmarks. More surprisingly, when applied to generalist out-of-the-box LLMs such as Mistral-Instruct, TOOLDEC improves its accuracy in tool use from the initial 0% to an impressive 52%, matching the performance of specialized fine-tuned models such as ToolLLM.
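The idea stated in the abstract, using a finite state machine to restrict which tokens may be decoded at each step, can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' TOOLDEC implementation: the vocabulary, the FSM for a hypothetical two-argument tool call, and the `toy_scores` function standing in for a language model are all invented for the example.

```python
from typing import Callable, Dict, List

# Hypothetical FSM for the toy syntax: name "(" arg "," arg ")" <eos>
# Each state maps the tokens allowed in that state to the next state.
FSM: Dict[str, Dict[str, str]] = {
    "start": {"add": "name", "mul": "name"},
    "name": {"(": "arg1"},
    "arg1": {"1": "sep", "2": "sep"},
    "sep": {",": "arg2"},
    "arg2": {"1": "close", "2": "close"},
    "close": {")": "done"},
    "done": {"<eos>": "end"},
}


def toy_scores(prefix: List[str]) -> Dict[str, float]:
    """Stand-in for a model (assumption): it always prefers free text ("hello")."""
    return {"hello": 5.0, "mul": 2.0, "add": 1.0, "(": 0.5,
            "2": 0.4, "1": 0.3, ",": 0.2, ")": 0.1, "<eos>": 0.0}


def unconstrained_step(score_fn: Callable[[List[str]], Dict[str, float]],
                       prefix: List[str]) -> str:
    """Plain greedy choice over the whole vocabulary, with no syntax check."""
    scores = score_fn(prefix)
    return max(scores, key=scores.get)


def constrained_greedy_decode(score_fn: Callable[[List[str]], Dict[str, float]],
                              fsm: Dict[str, Dict[str, str]],
                              start: str = "start",
                              end: str = "end",
                              max_steps: int = 20) -> List[str]:
    """Greedy decoding in which only tokens with an FSM transition are eligible."""
    state, output = start, []
    while state != end and len(output) < max_steps:
        scores = score_fn(output)
        allowed = fsm[state]  # tokens that keep the output syntactically valid
        # Tokens outside `allowed` are never considered (equivalent to masking).
        token = max(allowed, key=lambda t: scores.get(t, float("-inf")))
        output.append(token)
        state = allowed[token]
    return output


if __name__ == "__main__":
    print(unconstrained_step(toy_scores, []))          # picks "hello"
    print(constrained_greedy_decode(toy_scores, FSM))  # a well-formed tool call
```

In this sketch the unconstrained step picks a token that is not part of any valid tool call, while the constrained loop can only ever emit strings accepted by the FSM, which is why syntax errors are ruled out by construction rather than learned.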
Related papers
- Benchmarking Failures in Tool-Augmented Language Models [41.94295877935867]
Tool-augmented language models (TaLMs) assume 'perfect' information access and tool availability, which may not hold in the real world.
We introduce the FAIL-TALMS benchmark, featuring two major failures: under-specified user queries and non-available tools.
We evaluate top-performing proprietary and open-source models, and find all current models except for Claude struggle to recognize missing tools or information.
arXiv Detail & Related papers (2025-03-18T13:04:55Z) - SpecTool: A Benchmark for Characterizing Errors in Tool-Use LLMs [77.79172008184415]
SpecTool is a new benchmark to identify error patterns in LLM output on tool-use tasks.
We show that even the most prominent LLMs exhibit these error patterns in their outputs.
Researchers can use the analysis and insights from SPECTOOL to guide their error mitigation strategies.
arXiv Detail & Related papers (2024-11-20T18:56:22Z) - Learning to Ask: When LLMs Meet Unclear Instruction [49.256630152684764]
Large language models (LLMs) can leverage external tools for addressing a range of tasks unattainable through language skills alone.
We evaluate the performance of LLMs' tool use under imperfect instructions, analyze the error patterns, and build a challenging tool-use benchmark called Noisy ToolBench.
We propose a novel framework, Ask-when-Needed (AwN), which prompts LLMs to ask questions to users whenever they encounter obstacles due to unclear instructions.
arXiv Detail & Related papers (2024-08-31T23:06:12Z) - Automata-based constraints for language model decoding [9.137697105669142]
Language models (LMs) are often expected to generate strings in some formal language.
Tuning requires significant resources, making it impractical for uncommon or task-specific formats.
We solve these issues through the application of automata theory.
Our system compiles constraints 7,000x faster, is provably correct, and can be extended in a modular fashion.
arXiv Detail & Related papers (2024-07-11T00:25:01Z) - Contrastive Instruction Tuning [61.97704869248903]
We propose Contrastive Instruction Tuning to maximize the similarity between semantically equivalent instruction-instance pairs.
Experiments on the PromptBench benchmark show that CoIN consistently improves LLMs' robustness to unseen instructions with variations across character, word, sentence, and semantic levels by an average of +2.5% in accuracy.
arXiv Detail & Related papers (2024-02-17T00:09:32Z) - Guiding LLMs The Right Way: Fast, Non-Invasive Constrained Generation [7.687678490751105]
We present a novel decoding algorithm, DOMINO, that can enforce constraints in a fully subword-aligned fashion, while leveraging pre-computation and speculative decoding to achieve virtually no overhead and in some cases even almost 2$\times$ speedup over unconstrained decoding -- thereby outperforming existing approaches by a wide margin.
arXiv Detail & Related papers (2024-02-07T13:36:02Z) - ControlLLM: Augment Language Models with Tools by Searching on Graphs [97.62758830255002]
We present ControlLLM, a novel framework that enables large language models (LLMs) to utilize multi-modal tools for solving real-world tasks.
Our framework comprises three key components: (1) a task decomposer that breaks down a complex task into clear subtasks with well-defined inputs and outputs; (2) a Thoughts-on-Graph (ToG) paradigm that searches the optimal solution path on a pre-built tool graph; and (3) an execution engine with a rich toolbox that interprets the solution path and runs the
arXiv Detail & Related papers (2023-10-26T21:57:21Z) - CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets [75.64181719386497]
We present CRAFT, a tool creation and retrieval framework for large language models (LLMs).
It creates toolsets specifically curated for the tasks and equips LLMs with a component that retrieves tools from these sets to enhance their capability to solve complex tasks.
Our method is designed to be flexible and offers a plug-and-play approach to adapt off-the-shelf LLMs to unseen domains and modalities, without any finetuning.
arXiv Detail & Related papers (2023-09-29T17:40:26Z) - Toward Unified Controllable Text Generation via Regular Expression Instruction [56.68753672187368]
Our paper introduces Regular Expression Instruction (REI), which utilizes an instruction-based mechanism to fully exploit regular expressions' advantages to uniformly model diverse constraints.
Our method only requires fine-tuning on medium-scale language models or few-shot, in-context learning on large language models, and requires no further adjustment when applied to various constraint combinations.
arXiv Detail & Related papers (2023-09-19T09:05:14Z) - GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction [41.36474802204914]
GPT4Tools is based on self-instruct to enable open-source LLMs, such as LLaMA and OPT, to use tools.
It generates an instruction-following dataset by prompting an advanced teacher with various multi-modal contexts.
arXiv Detail & Related papers (2023-05-30T05:27:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.