Related papers: Grammar-Constrained Decoding for Structured NLP Tasks without Finetuning

URL: http://arxiv.org/abs/2305.13971v6
Date: Thu, 18 Jan 2024 13:35:55 GMT
Title: Grammar-Constrained Decoding for Structured NLP Tasks without Finetuning
Authors: Saibo Geng, Martin Josifoski, Maxime Peyrard, Robert West
Abstract summary: Grammar-constrained decoding (GCD) can be used to control the generation of large language models (LMs). GCD can serve as a unified framework for structured NLP tasks in general. We show that grammar-constrained LMs substantially outperform unconstrained LMs or even beat task-specific finetuned models.
Score: 27.59524153097858
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Despite their impressive performance, large language models (LMs) still struggle with reliably generating complex output structures when not finetuned to follow the required output format exactly. To address this issue, grammar-constrained decoding (GCD) can be used to control the generation of LMs, guaranteeing that the output follows a given structure. Most existing GCD methods are, however, limited to specific tasks, such as parsing or code generation. In this work, we demonstrate that formal grammars can describe the output space for a much wider range of tasks and argue that GCD can serve as a unified framework for structured NLP tasks in general. For increased flexibility, we introduce input-dependent grammars, which allow the grammar to depend on the input and thus enable the generation of different output structures for different inputs. We then empirically demonstrate the power and flexibility of GCD-enhanced LMs on (1) information extraction, (2) entity disambiguation, and (3) constituency parsing. Our results indicate that grammar-constrained LMs substantially outperform unconstrained LMs or even beat task-specific finetuned models. Grammar constraints thus hold great promise for harnessing off-the-shelf LMs for a wide range of structured NLP tasks, especially where training data is scarce or finetuning is expensive. Code and data: https://github.com/epfl-dlab/GCD.
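
The idea of constraining decoding with an input-dependent grammar can be illustrated with a minimal sketch. It is not the authors' implementation (see the repository linked above for that). It assumes a toy setting in which the "grammar" is just a finite set of valid token sequences built from the input, as in entity disambiguation where only candidate entities for a mention are allowed. The names `allowed_next_tokens`, `constrained_greedy_decode`, `toy_scores`, and the candidate list are all hypothetical.

```python
import math

def allowed_next_tokens(prefix, valid_outputs):
    """Return the tokens that keep `prefix` a prefix of some valid output."""
    n = len(prefix)
    return {
        seq[n]
        for seq in valid_outputs
        if len(seq) > n and tuple(seq[:n]) == tuple(prefix)
    }

def constrained_greedy_decode(score_fn, valid_outputs, eos="<eos>"):
    """Greedy decoding where tokens outside the constraint are masked out."""
    prefix = []
    while True:
        allowed = allowed_next_tokens(prefix, valid_outputs)
        if not allowed:
            raise ValueError("no valid continuation; outputs must end in eos")
        scores = score_fn(prefix)
        # Pick the best-scoring token among the allowed ones only.
        # Sorting makes tie-breaking deterministic.
        token = max(sorted(allowed), key=lambda t: scores.get(t, -math.inf))
        prefix.append(token)
        if token == eos:
            return prefix

def toy_scores(prefix):
    """Hypothetical stand-in for LM scores; ignores the prefix for brevity.

    The top-scoring token ("the") is not part of any valid output, so an
    unconstrained greedy decoder would leave the required format at once.
    """
    return {"the": 3.0, "Paris": 2.0, "(": 1.5, "France": 1.2,
            "Texas": 1.0, ")": 0.8, "<eos>": 0.5, "Hilton": 0.1}

# Input-dependent constraint: candidate entities for the mention "Paris".
candidates = [
    ("Paris", "(", "France", ")", "<eos>"),
    ("Paris", "(", "Texas", ")", "<eos>"),
    ("Paris", "Hilton", "<eos>"),
]

result = constrained_greedy_decode(toy_scores, candidates)
print(" ".join(result))
```

A different input would produce a different candidate set, and therefore a different constraint, without any change to the model. Real systems replace the finite set with a formal grammar and an incremental parser, and apply the mask to the model's logits over its full vocabulary.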

Related papers

WGRAMMAR: Leverage Prior Knowledge to Accelerate Structured Decoding [58.1177179119881]
We introduce wgrammar, a lightweight decoding engine that integrates domain-aware simplification, constraint decomposition, and mask caching. wgrammar achieves up to 250x speedup over existing systems.
arXiv Detail & Related papers (2025-07-22T17:13:47Z)
Align-GRAG: Reasoning-Guided Dual Alignment for Graph Retrieval-Augmented Generation [75.9865035064794]
Large language models (LLMs) have demonstrated remarkable capabilities, but still struggle with issues like hallucinations and outdated information. Retrieval-augmented generation (RAG) addresses these issues by grounding LLM outputs in external knowledge with an Information Retrieval (IR) system. We propose Align-GRAG, a novel reasoning-guided dual alignment framework in the post-retrieval phase.
arXiv Detail & Related papers (2025-05-22T05:15:27Z)
$\texttt{SEM-CTRL}$: Semantically Controlled Decoding [53.86639808659575]
$\texttt{SEM-CTRL}$ is a unified approach that enforces rich context-sensitive constraints and task- and instance-specific semantics directly on an LLM decoder. $\texttt{SEM-CTRL}$ allows small pre-trained LLMs to efficiently outperform larger variants and state-of-the-art reasoning models.
arXiv Detail & Related papers (2025-03-03T18:33:46Z)
Enhancing LLM Character-Level Manipulation via Divide and Conquer [74.55804812450164]
Large Language Models (LLMs) have demonstrated strong generalization capabilities across a wide range of natural language processing (NLP) tasks. They exhibit notable weaknesses in character-level string manipulation, struggling with fundamental operations such as character deletion, insertion, and substitution. We propose Character-Level Manipulation via Divide and Conquer, a novel approach designed to bridge the gap between token-level processing and character-level manipulation.
arXiv Detail & Related papers (2025-02-12T07:37:39Z)
Flexible and Efficient Grammar-Constrained Decoding [5.671312847528642]
Grammar-constrained decoding (GCD) can guarantee that LLM outputs match given syntactic rules. Existing GCD algorithms require tens of minutes to preprocess common grammars. We present a new GCD algorithm together with an implementation that offers 17.71x faster offline preprocessing than existing approaches.
arXiv Detail & Related papers (2025-02-07T17:35:17Z)
Filter-then-Generate: Large Language Models with Structure-Text Adapter for Knowledge Graph Completion [20.973071287301067]
Large Language Models (LLMs) present massive inherent knowledge and superior semantic comprehension capability. Empirical evidence suggests that LLMs consistently perform worse than conventional knowledge graph completion approaches. We propose a novel instruction-tuning-based method, namely FtG, to address these challenges.
arXiv Detail & Related papers (2024-12-12T09:22:04Z)
Graph-DPEP: Decomposed Plug and Ensemble Play for Few-Shot Document Relation Extraction with Graph-of-Thoughts Reasoning [34.85741925091139]
Graph-DPEP framework is grounded in the reasoning behind triplet explanation thoughts presented in natural language. We develop "ensemble-play", reapplying generation on the entire type list by leveraging the reasoning thoughts embedded in a sub-graph.
arXiv Detail & Related papers (2024-11-05T07:12:36Z)
Domain-Specific Shorthand for Generation Based on Context-Free Grammar [0.0]
Generation of structured data in formats such as YAML and XML is a critical task in Generative AI (GenAI) applications. We introduce a domain-specific shorthand (DSS) format, underpinned by a context-free grammar (CFG). This paper outlines the development of the DSS and the accompanying CFG, and the implications of this approach for GenAI applications.
arXiv Detail & Related papers (2024-06-14T23:26:41Z)
Grammar-Aligned Decoding [30.972850034752884]
Large Language Models (LLMs) struggle with reliably generating highly structured outputs, such as program code, mathematical formulas, or well-formed markup. Constrained decoding approaches mitigate this problem by greedily restricting what tokens an LLM can output at each step to guarantee that the output matches a given constraint. In this paper, we demonstrate that GCD techniques can distort the LLM's distribution, leading to outputs that are grammatical but appear with likelihoods that are not proportional to the ones given by the LLM.
arXiv Detail & Related papers (2024-05-31T17:39:15Z)
A Simple but Effective Approach to Improve Structured Language Model Output for Information Extraction [11.165093163378152]
Large language models (LLMs) have demonstrated impressive abilities in generating unstructured natural language according to instructions. This paper introduces an efficient method, G&O, to enhance their structured text generation capabilities.
arXiv Detail & Related papers (2024-02-20T20:42:02Z)
Instruction Position Matters in Sequence Generation with Large Language Models [67.87516654892343]
Large language models (LLMs) are capable of performing conditional sequence generation tasks, such as translation or summarization. We propose enhancing the instruction-following capability of LLMs by shifting the position of task instructions after the input sentences.
arXiv Detail & Related papers (2023-08-23T12:36:57Z)
Grammar Prompting for Domain-Specific Language Generation with Large Language Models [40.831045850285776]
Large language models (LLMs) can learn to perform a wide range of natural language tasks from just a handful of in-context examples. We propose grammar prompting, a simple approach to enable LLMs to use external knowledge and domain-specific constraints.
arXiv Detail & Related papers (2023-05-30T17:26:01Z)
Physics of Language Models: Part 1, Learning Hierarchical Language Structures [51.68385617116854]
Transformer-based language models are effective but complex, and understanding their inner workings is a significant challenge. We introduce a family of synthetic CFGs that produce hierarchical rules, capable of generating lengthy sentences. We demonstrate that generative models like GPT can accurately learn this CFG language and generate sentences based on it.
arXiv Detail & Related papers (2023-05-23T04:28:16Z)
LeTI: Learning to Generate from Textual Interactions [60.425769582343506]
We explore LMs' potential to learn from textual interactions (LETI) that not only check their correctness with binary labels but also pinpoint and explain errors in their outputs through textual feedback. Our focus is the code generation task, where the model produces code based on natural language instructions. LETI iteratively fine-tunes the model, using the LM objective, on a concatenation of natural language instructions, LM-generated programs, and textual feedback.
arXiv Detail & Related papers (2023-05-17T15:53:31Z)
MURMUR: Modular Multi-Step Reasoning for Semi-Structured Data-to-Text Generation [102.20036684996248]
We propose MURMUR, a neuro-symbolic modular approach to text generation from semi-structured data with multi-step reasoning. We conduct experiments on two data-to-text generation tasks, WebNLG and LogicNLG.
arXiv Detail & Related papers (2022-12-16T17:36:23Z)
Language Models of Code are Few-Shot Commonsense Learners [106.1531522893209]
Given a natural language input, the goal is to generate a graph such as an event graph or a reasoning graph. Existing approaches serialize the output graph as a flat list of nodes and edges. We show that when we instead frame structured commonsense reasoning tasks as code generation tasks, pre-trained LMs of code are better structured commonsense reasoners than LMs of natural language.
arXiv Detail & Related papers (2022-10-13T16:09:36Z)
Improving Mandarin End-to-End Speech Recognition with Word N-gram Language Model [57.92200214957124]
External language models (LMs) are used to improve the recognition performance of end-to-end (E2E) automatic speech recognition (ASR) systems. We propose a novel decoding algorithm where a word-level lattice is constructed on-the-fly to consider all possible word sequences. Our method consistently outperforms subword-level LMs, including N-gram LM and neural network LM.
arXiv Detail & Related papers (2022-01-06T10:04:56Z)

This list is automatically generated from the titles and abstracts of the papers on this site.