Automata-based constraints for language model decoding
- URL: http://arxiv.org/abs/2407.08103v2
- Date: Fri, 12 Jul 2024 01:02:43 GMT
- Title: Automata-based constraints for language model decoding
- Authors: Terry Koo, Frederick Liu, Luheng He,
- Abstract summary: LMs are often expected to generate strings in some formal language.
tuning requires significant resources, making it impractical for uncommon or task-specific formats.
We solve these issues through application of automata theory.
- Score: 9.137697105669142
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: LMs are often expected to generate strings in some formal language; for example, structured data, API calls, or code snippets. Although LMs can be tuned to improve their adherence to formal syntax, this does not guarantee conformance, especially with smaller LMs suitable for large-scale deployment. In addition, tuning requires significant resources, making it impractical for uncommon or task-specific formats. To prevent downstream parsing errors we would ideally constrain the LM to only produce valid output, but this is severely complicated by tokenization, which is typically both ambiguous and misaligned with the formal grammar. We solve these issues through the application of automata theory, deriving an efficient closed-form solution for the regular languages, a broad class of formal languages with many practical applications, including API calls or schema-guided JSON and YAML. We also discuss pragmatic extensions for coping with the issue of high branching factor. Finally, we extend our techniques to deterministic context-free languages, which similarly admit an efficient closed-form solution. In spite of its flexibility and representative power, our approach only requires access to per-token decoding logits and lowers into simple calculations that are independent of LM size, making it both efficient and easy to apply to almost any LM architecture.
Related papers
- Adaptive Draft-Verification for Efficient Large Language Model Decoding [24.347886232342862]
Large language model (LLM) decoding involves generating a sequence of tokens based on a given context.
The typical autoregressive decoding method requires a separate forward pass through the model for each token generated.
We introduce ADED, which accelerates LLM decoding without requiring fine-tuning.
arXiv Detail & Related papers (2024-06-27T22:20:39Z) - Decoding at the Speed of Thought: Harnessing Parallel Decoding of Lexical Units for LLMs [57.27982780697922]
Large language models have demonstrated exceptional capability in natural language understanding and generation.
However, their generation speed is limited by the inherently sequential nature of their decoding process.
This paper introduces Lexical Unit Decoding, a novel decoding methodology implemented in a data-driven manner.
arXiv Detail & Related papers (2024-05-24T04:35:13Z) - Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs [61.40047491337793]
We present Hierarchical cOntext MERging (HOMER), a new training-free scheme designed to overcome the limitations of large language models.
HomeR uses a divide-and-conquer algorithm, dividing long inputs into manageable chunks.
A token reduction technique precedes each merging, ensuring memory usage efficiency.
arXiv Detail & Related papers (2024-04-16T06:34:08Z) - How Proficient Are Large Language Models in Formal Languages? An In-Depth Insight for Knowledge Base Question Answering [52.86931192259096]
Knowledge Base Question Answering (KBQA) aims to answer natural language questions based on facts in knowledge bases.
Recent works leverage the capabilities of large language models (LLMs) for logical form generation to improve performance.
arXiv Detail & Related papers (2024-01-11T09:27:50Z) - Can Large Language Models Understand Real-World Complex Instructions? [54.86632921036983]
Large language models (LLMs) can understand human instructions, but struggle with complex instructions.
Existing benchmarks are insufficient to assess LLMs' ability to understand complex instructions.
We propose CELLO, a benchmark for evaluating LLMs' ability to follow complex instructions systematically.
arXiv Detail & Related papers (2023-09-17T04:18:39Z) - Simultaneous Machine Translation with Large Language Models [51.470478122113356]
We investigate the possibility of applying Large Language Models to SimulMT tasks.
We conducted experiments using the textttLlama2-7b-chat model on nine different languages from the MUST-C dataset.
The results show that LLM outperforms dedicated MT models in terms of BLEU and LAAL metrics.
arXiv Detail & Related papers (2023-09-13T04:06:47Z) - The potential of LLMs for coding with low-resource and domain-specific
programming languages [0.0]
This study focuses on the econometric scripting language named hansl of the open-source software gretl.
Our findings suggest that LLMs can be a useful tool for writing, understanding, improving, and documenting gretl code.
arXiv Detail & Related papers (2023-07-24T17:17:13Z) - Coupling Large Language Models with Logic Programming for Robust and
General Reasoning from Text [5.532477732693001]
We show that a large language model can serve as a highly effective few-shot semantically.
It can convert natural language sentences into a logical form that serves as input for answer set programs.
We demonstrate that this method achieves state-of-the-art performance on several benchmarks, including bAbI, StepGame, CLUTRR, and gSCAN.
arXiv Detail & Related papers (2023-07-15T03:29:59Z) - Grammar-Constrained Decoding for Structured NLP Tasks without Finetuning [27.59524153097858]
grammar-constrained decoding (GCD) can be used to control the generation of large language models (LMs)
GCD can serve as a unified framework for structured NLP tasks in general.
We show that grammar-constrained LMs substantially outperform unconstrained LMs or even beat task-specific finetuned models.
arXiv Detail & Related papers (2023-05-23T11:54:37Z) - Prompting Is Programming: A Query Language for Large Language Models [5.8010446129208155]
We present the novel idea of Language Model Programming (LMP)
LMP generalizes language model prompting from pure text prompts to an intuitive combination of text prompting and scripting.
We show that LMQL can capture a wide range of state-of-the-art prompting methods in an intuitive way.
arXiv Detail & Related papers (2022-12-12T18:09:09Z) - Language Models of Code are Few-Shot Commonsense Learners [106.1531522893209]
Given a natural language input, the goal is to generate a graph such as an event -- or a reasoning-graph.
Existing approaches serialize the output graph as a flat list of nodes and edges.
We show that when we instead frame structured commonsense reasoning tasks as code generation tasks, pre-trained LMs of code are better structured commonsense reasoners than LMs of natural language.
arXiv Detail & Related papers (2022-10-13T16:09:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.