Incremental Context-free Grammar Inference in Black Box Settings
- URL: http://arxiv.org/abs/2408.16706v1
- Date: Thu, 29 Aug 2024 17:00:38 GMT
- Title: Incremental Context-free Grammar Inference in Black Box Settings
- Authors: Feifei Li, Xiao Chen, Xi Xiao, Xiaoyu Sun, Chuan Chen, Shaohua Wang, Jitao Han
- Abstract summary: Black-box context-free grammar inference is a significant challenge in many practical settings.
We propose a novel method that segments example strings into smaller units and incrementally infers the grammar.
Our approach, named Kedavra, demonstrates better grammar quality (higher precision and recall), faster runtime, and more readable grammars in empirical comparisons.
- Score: 17.601446198181048
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Black-box context-free grammar inference presents a significant challenge in many practical settings due to limited access to example programs. The state-of-the-art methods, Arvada and Treevada, employ heuristic approaches to generalize grammar rules, starting from flat parse trees and exploring diverse generalization sequences. We have observed that these approaches produce grammars of low quality and readability, primarily because they process entire example strings, which adds complexity and substantially slows down computation. To overcome these limitations, we propose a novel method that segments example strings into smaller units and incrementally infers the grammar. Our approach, named Kedavra, demonstrates better grammar quality (higher precision and recall), faster runtime, and more readable grammars in empirical comparisons.
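To make the black-box setting concrete, here is a minimal sketch of grammar inference under oracle access. The toy comma-separated-integers language, the oracle, the comma-based segmentation, and the single recursion probe are all illustrative assumptions, not Kedavra's actual algorithm:

```python
# Minimal sketch: the only feedback is a yes/no parser (the oracle), so every
# candidate rule must be validated by querying it. Toy example, not Kedavra.

def oracle(s: str) -> bool:
    """Stand-in black-box parser: non-empty comma-separated list of integers."""
    return all(p.isdigit() for p in s.split(","))

def segments(example: str) -> list[str]:
    """Segment an example string into smaller units (here: split on ',')."""
    return example.split(",")

def infer(examples: list[str]) -> dict[str, list[str]]:
    """Incrementally grow a grammar one segment at a time."""
    items: list[str] = []
    for ex in examples:
        for seg in segments(ex):
            # Keep a segment as an <item> only if the oracle accepts it alone.
            if seg not in items and oracle(seg):
                items.append(seg)
    grammar = {"<start>": ["<item>"], "<item>": items}
    # Probe the oracle to decide whether <start> may recurse: do two known
    # items joined with ',' still parse, in either order?
    if (len(items) >= 2 and oracle(items[0] + "," + items[1])
            and oracle(items[1] + "," + items[0])):
        grammar["<start>"].append("<item> , <start>")
    return grammar

if __name__ == "__main__":
    print(infer(["1", "2,3", "4,5,6"]))
    # {'<start>': ['<item>', '<item> , <start>'], '<item>': ['1', ..., '6']}
```

The point is the interaction pattern: every generalization must be validated through oracle queries, which is why working over entire example strings, as Arvada and Treevada do, quickly becomes expensive.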
Related papers
- Context-aware Prompt Tuning: Advancing In-Context Learning with Adversarial Methods [69.36397993451742]
This work introduces Context-aware Prompt Tuning (CPT), a method inspired by ICL, PT, and adversarial attacks.
We modify specific context tokens, considering the unique structure of input and output formats.
Inspired by adversarial attacks, we adjust the input based on the labels present in the context, focusing on minimizing, rather than maximizing, the loss (a toy version of this input adjustment is sketched after this list).
arXiv Detail & Related papers (2024-10-22T17:45:47Z)
- Detecting and explaining (in)equivalence of context-free grammars [0.6282171844772422]
We propose a scalable framework for deciding, proving, and explaining (in)equivalence of context-free grammars (a naive bounded check appears after this list for contrast).
We present an implementation of the framework and evaluate it on large data sets collected within educational support systems.
arXiv Detail & Related papers (2024-07-25T17:36:18Z)
- Understanding and Mitigating Classification Errors Through Interpretable Token Patterns [58.91023283103762]
Characterizing errors in easily interpretable terms gives insight into whether a classifier is prone to making systematic errors.
We propose to discover those patterns of tokens that distinguish correct and erroneous predictions.
We show that our method, Premise, performs well in practice (a simplified token-contrast stand-in is sketched after this list).
arXiv Detail & Related papers (2023-11-18T00:24:26Z)
- Fast Deterministic Black-box Context-free Grammar Inference [7.637155559284357]
The state-of-the-art approach generalizes grammar rules starting from flat parse trees.
We observe that many of Arvada's generalizations violate common language concept nesting rules.
The resulting approach, TreeVada, yielded faster runtimes and higher-quality grammars in an empirical comparison.
arXiv Detail & Related papers (2023-08-11T14:45:26Z)
- Free Lunch for Efficient Textual Commonsense Integration in Language Models [20.02647320786556]
We group training samples with similar commonsense descriptions into a single batch, thus reusing the encoded description across multiple samples (see the sketch after this list).
Extensive experiments illustrate that the proposed batch partitioning approach effectively reduces the computational cost while preserving performance.
The efficiency improvement is more pronounced on larger datasets and on devices with more memory capacity, attesting to its practical utility for large-scale applications.
arXiv Detail & Related papers (2023-05-24T19:14:57Z)
- Alleviating Over-smoothing for Unsupervised Sentence Representation [96.19497378628594]
We present a simple method named Self-Contrastive Learning (SSCL) to alleviate the over-smoothing issue.
Our proposed method is quite simple and can be easily extended to various state-of-the-art models for performance boosting.
arXiv Detail & Related papers (2023-05-09T11:00:02Z)
- Structured Prompting: Scaling In-Context Learning to 1,000 Examples [78.41281805608081]
We introduce structured prompting that breaks the length limit and scales in-context learning to thousands of examples.
Specifically, demonstration examples are separately encoded with well-designed position embeddings, and then they are jointly attended by the test example using a rescaled attention mechanism.
arXiv Detail & Related papers (2022-12-13T16:31:21Z)
- On Parsing as Tagging [66.31276017088477]
We show how to reduce tetratagging, a state-of-the-art constituency tagger, to shift-reduce parsing.
We empirically evaluate our taxonomy of tagging pipelines with different choices of linearizers, learners, and decoders.
arXiv Detail & Related papers (2022-11-14T13:37:07Z)
- A Neural Model for Regular Grammar Induction [8.873449722727026]
We treat grammars as a model of computation and propose a novel neural approach to induction of regular grammars from positive and negative examples.
Our model is fully explainable, its intermediate results are directly interpretable as partial parses, and it can be used to learn arbitrary regular grammars when provided with sufficient data.
arXiv Detail & Related papers (2022-09-23T14:53:23Z)
- Learning grammar with a divide-and-concur neural network [4.111899441919164]
We implement a divide-and-concur iterative projection approach to context-free grammar inference.
Our method requires a relatively small number of discrete parameters, making the inferred grammar directly interpretable.
arXiv Detail & Related papers (2022-01-18T22:42:43Z)
- Obtaining Better Static Word Embeddings Using Contextual Embedding Models [53.86080627007695]
Our proposed distillation method is a simple extension of CBOW-based training.
As a side-effect, our approach also allows a fair comparison of both contextual and static embeddings.
arXiv Detail & Related papers (2021-06-08T12:59:32Z)
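As referenced in the Context-aware Prompt Tuning entry above, here is a toy version of loss-minimizing input adjustment. The tiny linear stand-in model, dimensions, and training loop are illustrative assumptions, not the paper's implementation; only the direction of optimization (minimizing, rather than maximizing, the loss with respect to the input) reflects the summary:

```python
# Toy sketch: treat context-token embeddings as trainable parameters, freeze
# the model, and run gradient descent on the *input* to minimize task loss
# (a classic adversarial attack would instead maximize it).
import torch

torch.manual_seed(0)
model = torch.nn.Linear(8, 2)          # frozen stand-in for a language model
for p in model.parameters():
    p.requires_grad_(False)

context = torch.randn(4, 8, requires_grad=True)  # trainable context tokens
labels = torch.tensor([0, 1, 0, 1])              # labels present in the context
opt = torch.optim.SGD([context], lr=0.1)

for _ in range(100):
    opt.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(context), labels)
    loss.backward()                    # gradient w.r.t. the input, not weights
    opt.step()
print(loss.item())                     # loss decreases as the context adapts
```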
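For contrast with the CFG (in)equivalence entry above, here is a naive bounded check: enumerate each grammar's strings up to a length bound and report a distinguishing string if the bounded languages differ. The paper's framework is far more sophisticated; this sketch only illustrates the problem statement:

```python
# Naive bounded (in)equivalence check for context-free grammars.
# A grammar maps each nonterminal to a list of right-hand sides (symbol lists).

def language(grammar, start, max_len):
    """All terminal strings derivable from `start` with length <= max_len."""
    seen, done = {(start,)}, set()
    frontier = [(start,)]
    while frontier:
        form = frontier.pop()
        nts = [i for i, sym in enumerate(form) if sym in grammar]
        if not nts:
            if len(form) <= max_len:
                done.add("".join(form))
            continue
        i = nts[0]
        for rhs in grammar[form[i]]:
            new = form[:i] + tuple(rhs) + form[i + 1:]
            # Prune forms whose terminals already exceed the bound.
            if sum(1 for s in new if s not in grammar) <= max_len and new not in seen:
                seen.add(new)
                frontier.append(new)
    return done

def bounded_equiv(g1, g2, start="S", max_len=6):
    diff = language(g1, start, max_len) ^ language(g2, start, max_len)
    return (True, None) if not diff else (False, min(diff, key=len))

# Two grammars for non-empty strings of 'a': equivalent up to the bound.
g1 = {"S": [["a"], ["a", "S"]]}
g2 = {"S": [["a"], ["S", "a"]]}
print(bounded_equiv(g1, g2))  # (True, None)

g3 = {"S": [["a"], ["a", "a", "S"]]}  # odd-length strings only
print(bounded_equiv(g1, g3))  # (False, 'aa')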
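As a simplified stand-in for the Premise entry above: contrast token frequencies between correct and erroneous predictions to surface tokens that characterize systematic errors. Premise itself mines conjunctive token patterns with an MDL-style criterion; this single-token frequency contrast is only an illustrative approximation:

```python
# Rank tokens by how over-represented they are in erroneous predictions.
from collections import Counter

def contrast_tokens(correct, wrong, k=5):
    """Tokens most over-represented in erroneous predictions."""
    c, w = Counter(), Counter()
    for text in correct:
        c.update(set(text.split()))
    for text in wrong:
        w.update(set(text.split()))
    # Laplace-smoothed ratio of document frequencies.
    score = {t: ((w[t] + 1) / (len(wrong) + 2)) / ((c[t] + 1) / (len(correct) + 2))
             for t in set(c) | set(w)}
    return sorted(score, key=score.get, reverse=True)[:k]

correct = ["the movie was great", "a great film"]
wrong = ["not a great movie", "not good at all"]
print(contrast_tokens(correct, wrong))  # 'not' ranks first
```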
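Finally, the batch-partitioning sketch promised in the "Free Lunch" entry above. Grouping by exact description equality is a simplification of the paper's grouping of *similar* descriptions, and the field names are illustrative assumptions:

```python
# Group samples sharing a commonsense description into one batch, so the
# description is encoded once per batch and reused for every sample in it.
from collections import defaultdict

def partition_batches(samples, batch_size):
    """Group samples by their commonsense description, then chunk."""
    by_desc = defaultdict(list)
    for sample in samples:
        by_desc[sample["description"]].append(sample)
    batches = []
    for desc, group in by_desc.items():
        for i in range(0, len(group), batch_size):
            # A real pipeline would encode `desc` once here and share the
            # encoding across the whole batch instead of per sample.
            batches.append({"description": desc,
                            "samples": group[i:i + batch_size]})
    return batches

samples = [
    {"text": "t1", "description": "birds can fly"},
    {"text": "t2", "description": "birds can fly"},
    {"text": "t3", "description": "water is wet"},
]
print(partition_batches(samples, batch_size=2))
```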
This list is automatically generated from the titles and abstracts of the papers in this site.