High-Performance Generation of Constrained Inputs
- URL: http://arxiv.org/abs/2511.05987v2
- Date: Thu, 13 Nov 2025 01:43:11 GMT
- Title: High-Performance Generation of Constrained Inputs
- Authors: Addison Crump, Alexi Turcotte, José Antonio Zamudio Amaya, Andreas Zeller
- Abstract summary: Language-based testing combines context-free grammar definitions with semantic constraints to generate test inputs. We present a novel approach for evolutionary language-based testing that improves performance by 3-4 orders of magnitude over the current state of the art. We demonstrate this with a case study on a C subset, in which FANDANGO-RS generates 401 diverse, complex, and valid test inputs for a C compiler per minute.
- Score: 4.837737516460689
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Language-based testing combines context-free grammar definitions with semantic constraints over grammar elements to generate test inputs. By pairing context-free grammars with constraints, users have the expressiveness of unrestricted grammars while retaining simple structure. However, producing inputs in the presence of such constraints can be challenging. In past approaches, SMT solvers have been found to be very slow at finding string solutions; evolutionary algorithms are faster and more general, but current implementations still struggle with complex constraints that would be required for domains such as compiler testing. In this paper, we present a novel approach for evolutionary language-based testing that improves performance by 3-4 orders of magnitude over the current state of the art, reducing hours of generation and constraint solving time to seconds. We accomplish this by (1) carefully transforming grammar definitions into Rust types and trait implementations, ensuring that the compiler may near-maximally optimize arbitrary operations on arbitrary grammars; and (2) using better evolutionary algorithms that improve the ability of language-based testing to solve complex constraint systems. These performance and algorithmic improvements allow our prototype, FANDANGO-RS, to solve constraints that previous strategies simply cannot handle. We demonstrate this by a case study for a C subset, in which FANDANGO-RS is able to generate 401 diverse, complex, and valid test inputs for a C compiler per minute.
Related papers
- Constrained Decoding of Diffusion LLMs with Context-Free Grammars [1.0923877073891446]
Large language models (LLMs) have shown promising performance across diverse domains. This paper presents the first constrained decoding method for diffusion models. We show that our method achieves near-perfect syntactic correctness while consistently preserving or improving functional correctness.
arXiv Detail & Related papers (2025-08-13T18:09:09Z)
- Fast Controlled Generation from Language Models with Adaptive Weighted Rejection Sampling [90.86991492288487]
Evaluating a constraint on every token can be prohibitively expensive. LCD can distort the global distribution over strings, sampling tokens based only on local information. We show that our approach is superior to state-of-the-art baselines.
arXiv Detail & Related papers (2025-04-07T18:30:18Z)
- Constrained Decoding for Fill-in-the-Middle Code Language Models via Efficient Left and Right Quotienting of Context-Sensitive Grammars [11.279507894576213]
This paper contributes an incremental synthesis that allows early rejection of syntactically incorrect code.
We extend the Earley parsing algorithm to allow for left and right quotients of context-free grammars.
arXiv Detail & Related papers (2024-02-28T02:12:47Z)
- Guiding LLMs The Right Way: Fast, Non-Invasive Constrained Generation [7.687678490751105]
We present a novel decoding algorithm, DOMINO, that enforces constraints in a fully subword-aligned fashion while leveraging pre-computation and speculative decoding to achieve virtually no overhead, and in some cases an almost 2$\times$ speedup over unconstrained decoding, thereby outperforming existing approaches by a wide margin.
arXiv Detail & Related papers (2024-02-07T13:36:02Z)
- Toward Unified Controllable Text Generation via Regular Expression Instruction [56.68753672187368]
Our paper introduces Regular Expression Instruction (REI), which utilizes an instruction-based mechanism to fully exploit regular expressions' advantages to uniformly model diverse constraints.
Our method only requires fine-tuning on medium-scale language models or few-shot, in-context learning on large language models, and requires no further adjustment when applied to various constraint combinations.
arXiv Detail & Related papers (2023-09-19T09:05:14Z)
- COLLIE: Systematic Construction of Constrained Text Generation Tasks [33.300039566331876]
COLLIE is a grammar-based framework that allows the specification of rich, compositional constraints with diverse generation levels.
We develop tools for automatic extraction of task instances given a constraint structure and a raw text corpus.
We perform systematic experiments across five state-of-the-art instruction-tuned language models and analyze their performances to reveal shortcomings.
arXiv Detail & Related papers (2023-07-17T17:48:51Z)
- Controlled Text Generation with Natural Language Instructions [74.88938055638636]
InstructCTG is a controlled text generation framework that incorporates different constraints.
We first extract the underlying constraints of natural texts through a combination of off-the-shelf NLP tools and simple verbalizers.
By prepending natural language descriptions of the constraints and a few demonstrations, we fine-tune a pre-trained language model to incorporate various types of constraints.
arXiv Detail & Related papers (2023-04-27T15:56:34Z)
- A Template-based Method for Constrained Neural Machine Translation [100.02590022551718]
We propose a template-based method that can yield results with high translation quality and match accuracy while maintaining decoding speed.
The generation and derivation of the template can be learned through one sequence-to-sequence training framework.
Experimental results show that the proposed template-based methods can outperform several representative baselines in lexically and structurally constrained translation tasks.
arXiv Detail & Related papers (2022-05-23T12:24:34Z)
- NeuroLogic Decoding: (Un)supervised Neural Text Generation with Predicate Logic Constraints [75.66980495245926]
Conditional text generation often requires lexical constraints, i.e., which words should or shouldn't be included in the output text.
We propose NeuroLogic Decoding, a simple yet effective algorithm that enables neural language models -- supervised or not -- to generate fluent text.
Our results suggest the limit of large-scale neural networks for fine-grained controllable generation and the promise of inference-time algorithms.
arXiv Detail & Related papers (2020-10-24T11:55:22Z)
- POINTER: Constrained Progressive Text Generation via Insertion-based Generative Pre-training [93.79766670391618]
We present POINTER, a novel insertion-based approach for hard-constrained text generation.
The proposed method operates by progressively inserting new tokens between existing tokens in a parallel manner.
The resulting coarse-to-fine hierarchy makes the generation process intuitive and interpretable.
arXiv Detail & Related papers (2020-05-01T18:11:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.