Grammar-Constrained Decoding for Structured NLP Tasks without Finetuning
- URL: http://arxiv.org/abs/2305.13971v6
- Date: Thu, 18 Jan 2024 13:35:55 GMT
- Title: Grammar-Constrained Decoding for Structured NLP Tasks without Finetuning
- Authors: Saibo Geng, Martin Josifoski, Maxime Peyrard, Robert West
- Abstract summary: grammar-constrained decoding (GCD) can be used to control the generation of large language models (LMs)
GCD can serve as a unified framework for structured NLP tasks in general.
We show that grammar-constrained LMs substantially outperform unconstrained LMs or even beat task-specific finetuned models.
- Score: 27.59524153097858
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite their impressive performance, large language models (LMs) still
struggle with reliably generating complex output structures when not finetuned
to follow the required output format exactly. To address this issue,
grammar-constrained decoding (GCD) can be used to control the generation of
LMs, guaranteeing that the output follows a given structure. Most existing GCD
methods are, however, limited to specific tasks, such as parsing or code
generation. In this work, we demonstrate that formal grammars can describe the
output space for a much wider range of tasks and argue that GCD can serve as a
unified framework for structured NLP tasks in general. For increased
flexibility, we introduce input-dependent grammars, which allow the grammar to
depend on the input and thus enable the generation of different output
structures for different inputs. We then empirically demonstrate the power and
flexibility of GCD-enhanced LMs on (1) information extraction, (2) entity
disambiguation, and (3) constituency parsing. Our results indicate that
grammar-constrained LMs substantially outperform unconstrained LMs or even beat
task-specific finetuned models. Grammar constraints thus hold great promise for
harnessing off-the-shelf LMs for a wide range of structured NLP tasks,
especially where training data is scarce or finetuning is expensive. Code and
data: https://github.com/epfl-dlab/GCD.
Related papers
- Graph-DPEP: Decomposed Plug and Ensemble Play for Few-Shot Document Relation Extraction with Graph-of-Thoughts Reasoning [34.85741925091139]
Graph-DPEP framework is grounded in the reasoning behind triplet explanation thoughts presented in natural language.
We develop "ensemble-play", reapplying generation on the entire type list by leveraging the reasoning thoughts embedded in a sub-graph.
arXiv Detail & Related papers (2024-11-05T07:12:36Z) - Domain-Specific Shorthand for Generation Based on Context-Free Grammar [0.0]
Generation of structured data in formats such as YAML and XML is a critical task in Generative AI (GenAI) applications.
We introduce a domain-specific shorthand (DSS) format, underpinned by a context-free grammar (CFG)
This paper outlines the development of the DSS and the accompanying CFG, and the implications of this approach for GenAI applications.
arXiv Detail & Related papers (2024-06-14T23:26:41Z) - Grammar-Aligned Decoding [30.972850034752884]
Large Language Models (LLMs) struggle with reliably generating highly structured outputs, such as program code, mathematical formulas, or well-formed markup.
Constrained decoding approaches mitigate this problem by greedily restricting what tokens an LLM can output at each step to guarantee that the output matches a given constraint.
In this paper, we demonstrate that GCD techniques can distort the LLM's distribution, leading to outputs that are grammatical but appear with likelihoods that are not proportional to the ones given by the LLM.
arXiv Detail & Related papers (2024-05-31T17:39:15Z) - A Simple but Effective Approach to Improve Structured Language Model
Output for Information Extraction [11.165093163378152]
Large language models (LLMs) have demonstrated impressive abilities in generating unstructured natural language according to instructions.
This paper introduces an efficient method, G&O, to enhance their structured text generation capabilities.
arXiv Detail & Related papers (2024-02-20T20:42:02Z) - Instruction Position Matters in Sequence Generation with Large Language
Models [67.87516654892343]
Large language models (LLMs) are capable of performing conditional sequence generation tasks, such as translation or summarization.
We propose enhancing the instruction-following capability of LLMs by shifting the position of task instructions after the input sentences.
arXiv Detail & Related papers (2023-08-23T12:36:57Z) - Grammar Prompting for Domain-Specific Language Generation with Large
Language Models [40.831045850285776]
Large language models (LLMs) can learn to perform a wide range of natural language tasks from just a handful of in-context examples.
We propose emphgrammar prompting, a simple approach to enable LLMs to use external knowledge and domain-specific constraints.
arXiv Detail & Related papers (2023-05-30T17:26:01Z) - Physics of Language Models: Part 1, Learning Hierarchical Language Structures [51.68385617116854]
Transformer-based language models are effective but complex, and understanding their inner workings is a significant challenge.
We introduce a family of synthetic CFGs that produce hierarchical rules, capable of generating lengthy sentences.
We demonstrate that generative models like GPT can accurately learn this CFG language and generate sentences based on it.
arXiv Detail & Related papers (2023-05-23T04:28:16Z) - LeTI: Learning to Generate from Textual Interactions [60.425769582343506]
We explore LMs' potential to learn from textual interactions (LETI) that not only check their correctness with binary labels but also pinpoint and explain errors in their outputs through textual feedback.
Our focus is the code generation task, where the model produces code based on natural language instructions.
LETI iteratively fine-tunes the model, using the objective LM, on a concatenation of natural language instructions, LM-generated programs, and textual feedback.
arXiv Detail & Related papers (2023-05-17T15:53:31Z) - MURMUR: Modular Multi-Step Reasoning for Semi-Structured Data-to-Text
Generation [102.20036684996248]
We propose MURMUR, a neuro-symbolic modular approach to text generation from semi-structured data with multi-step reasoning.
We conduct experiments on two data-to-text generation tasks like WebNLG and LogicNLG.
arXiv Detail & Related papers (2022-12-16T17:36:23Z) - Language Models of Code are Few-Shot Commonsense Learners [106.1531522893209]
Given a natural language input, the goal is to generate a graph such as an event -- or a reasoning-graph.
Existing approaches serialize the output graph as a flat list of nodes and edges.
We show that when we instead frame structured commonsense reasoning tasks as code generation tasks, pre-trained LMs of code are better structured commonsense reasoners than LMs of natural language.
arXiv Detail & Related papers (2022-10-13T16:09:36Z) - Improving Mandarin End-to-End Speech Recognition with Word N-gram
Language Model [57.92200214957124]
External language models (LMs) are used to improve the recognition performance of end-to-end (E2E) automatic speech recognition (ASR) systems.
We propose a novel decoding algorithm where a word-level lattice is constructed on-the-fly to consider all possible word sequences.
Our method consistently outperforms subword-level LMs, including N-gram LM and neural network LM.
arXiv Detail & Related papers (2022-01-06T10:04:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.