FOREST: An Interactive Multi-tree Synthesizer for Regular Expressions
- URL: http://arxiv.org/abs/2012.14235v1
- Date: Mon, 28 Dec 2020 14:06:01 GMT
- Title: FOREST: An Interactive Multi-tree Synthesizer for Regular Expressions
- Authors: Margarida Ferreira and Miguel Terra-Neves and Miguel Ventura and
Inês Lynce and Ruben Martins
- Abstract summary: We present FOREST, a regular expression synthesizer for digital form validations.
FOREST produces a regular expression that matches the desired pattern for the input values.
We also present a new SMT encoding to synthesize capture conditions for a given regular expression.
- Score: 5.21480688623047
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Form validators based on regular expressions are often used on digital forms
to prevent users from inserting data in the wrong format. However, writing
these validators can pose a challenge to some users. We present FOREST, a
regular expression synthesizer for digital form validations. FOREST produces a
regular expression that matches the desired pattern for the input values and a
set of conditions over capturing groups that ensure the validity of integer
values in the input. Our synthesis procedure is based on enumerative search and
uses a Satisfiability Modulo Theories (SMT) solver to explore and prune the
search space. We propose a novel representation for regular expression
synthesis, multi-tree, which induces patterns in the examples and uses them to
split the problem through a divide-and-conquer approach. We also present a new
SMT encoding to synthesize capture conditions for a given regular expression.
To increase confidence in the synthesized regular expression, we implement user
interaction based on distinguishing inputs. We evaluated FOREST on real-world
form-validation instances using regular expressions. Experimental results show
that FOREST successfully returns the desired regular expression in 72% of the
instances and outperforms REGEL, a state-of-the-art regular expression
synthesizer.
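To make two ideas from the abstract concrete, here is a minimal Python sketch. It is not FOREST's implementation or output. The date format, the regex, the integer bounds, and the names `DATE_REGEX`, `DATE_CONDITIONS`, `is_valid`, and `distinguishing_input` are all assumptions chosen for illustration. The first function shows how a regular expression combined with conditions over capturing groups can reject inputs that the regex alone would accept. The second finds a distinguishing input for two candidate expressions by brute-force enumeration, which stands in for the SMT-based search the paper describes.

```python
import re
from itertools import product

# Hypothetical validator for dates written as DD/MM/YYYY (illustrative only).
DATE_REGEX = re.compile(r"(\d{2})/(\d{2})/(\d{4})")

# Capture conditions as (group index, lower bound, upper bound); assumed values.
DATE_CONDITIONS = [(1, 1, 31), (2, 1, 12)]


def is_valid(text, regex=DATE_REGEX, conditions=DATE_CONDITIONS):
    """Accept text only if the regex matches it fully and every bound holds."""
    match = regex.fullmatch(text)
    if match is None:
        return False
    return all(low <= int(match.group(idx)) <= high
               for idx, low, high in conditions)


def distinguishing_input(regex_a, regex_b, alphabet, max_len):
    """Return the first enumerated string on which the two regexes disagree.

    Strings over `alphabet` are tried in order of increasing length, up to
    `max_len`. Returns None if no disagreement is found within that bound.
    """
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            candidate = "".join(chars)
            accepts_a = regex_a.fullmatch(candidate) is not None
            accepts_b = regex_b.fullmatch(candidate) is not None
            if accepts_a != accepts_b:
                return candidate
    return None
```

In an interactive setting, the string returned by `distinguishing_input` would be shown to the user, whose answer (valid or invalid) rules out one of the two candidates. The brute-force enumeration is only practical for small alphabets and short strings.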
Related papers
- Coinductive Proofs of Regular Expression Equivalence in Zero Knowledge [4.215558175939218]
Crepe is the first protocol for encoding regular expression equivalence proofs.
We also introduce the first ZK protocol to target a PSPACE-complete problem.
Crepe can validate large proofs in only a few seconds each.
arXiv Detail & Related papers (2025-04-01T21:25:34Z) - Is Reuse All You Need? A Systematic Comparison of Regular Expression Composition Strategies [5.503553586086489]
Are composition tasks unique enough to merit dedicated machinery, or is reuse all we need?
We collect a novel dataset of composition tasks mined from GitHub and RegExLib.
Our evaluation uses multiple dimensions, including a novel metric, to compare reuse-by-example against two synthesis approaches.
arXiv Detail & Related papers (2025-03-26T14:25:27Z) - Handling Numeric Expressions in Automatic Speech Recognition [56.972851337263755]
We compare cascaded and end-to-end approaches to recognize and format numeric expressions.
Results show that adapted end-to-end models offer competitive performance with the advantage of lower latency and inference cost.
arXiv Detail & Related papers (2024-07-18T09:46:19Z) - Token Alignment via Character Matching for Subword Completion [34.76794239097628]
This paper examines a technique to alleviate the tokenization artifact on text completion in generative models.
The method, termed token alignment, involves backtracking to the last complete tokens and ensuring the model's generation aligns with the prompt.
arXiv Detail & Related papers (2024-03-13T16:44:39Z) - Real-time Regular Expression Matching [65.268245109828]
This paper is devoted to finite state automata, regular expression matching, pattern recognition, and the exponential blow-up problem.
This paper presents a theoretical and hardware solution to the exponential blow-up problem for some complicated classes of regular languages.
arXiv Detail & Related papers (2023-08-20T09:25:40Z) - Don't Prompt, Search! Mining-based Zero-Shot Learning with Language
Models [37.8952605358518]
Masked language models like BERT can perform text classification in a zero-shot fashion.
We propose an alternative mining-based approach for zero-shot learning.
arXiv Detail & Related papers (2022-10-26T15:52:30Z) - Neuro-Symbolic Regex Synthesis Framework via Neural Example Splitting [8.076841611508488]
We tackle the problem of learning regexes faster from positive and negative strings by relying on a novel approach called 'neural example splitting'.
Our approach essentially splits up each example string into multiple parts using a neural network trained to group similar strings from positive strings.
We propose an effective synthesis framework called 'SplitRegex' that synthesizes subregexes from 'split' positives and produces the final regex by concatenating the synthesized subregexes.
arXiv Detail & Related papers (2022-05-20T05:55:24Z) - Improving Structured Text Recognition with Regular Expression Biasing [13.801707647700727]
We study the problem of recognizing structured text that follows certain formats.
We propose to improve the recognition accuracy of structured text by specifying regular expressions (regexes) for biasing.
arXiv Detail & Related papers (2021-11-10T23:12:05Z) - Explicitly Modeling Syntax in Language Models with Incremental Parsing
and a Dynamic Oracle [88.65264818967489]
We propose a new syntax-aware language model: Syntactic Ordered Memory (SOM)
The model explicitly models the structure with an incremental parser and maintains the conditional probability setting of a standard language model.
Experiments show that SOM can achieve strong results in language modeling, incremental parsing and syntactic generalization tests.
arXiv Detail & Related papers (2020-10-21T17:39:15Z) - Benchmarking Multimodal Regex Synthesis with Complex Structures [45.35689345004124]
Existing datasets for regular expression (regex) generation from natural language are limited in complexity.
We introduce StructuredRegex, a new synthesis dataset differing from prior ones in three aspects.
arXiv Detail & Related papers (2020-05-02T00:16:09Z) - ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification
Models with Multiple Rewriting Transformations [97.27005783856285]
This paper introduces ASSET, a new dataset for assessing sentence simplification in English.
We show that simplifications in ASSET are better at capturing characteristics of simplicity when compared to other standard evaluation datasets for the task.
arXiv Detail & Related papers (2020-05-01T16:44:54Z) - Multi-level Head-wise Match and Aggregation in Transformer for Textual
Sequence Matching [87.97265483696613]
We propose a new approach to sequence pair matching with Transformer, by learning head-wise matching representations on multiple levels.
Experiments show that our proposed approach can achieve new state-of-the-art performance on multiple tasks.
arXiv Detail & Related papers (2020-01-20T20:02:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.