Inferring Attributed Grammars from Parser Implementations
- URL: http://arxiv.org/abs/2507.13117v1
- Date: Thu, 17 Jul 2025 13:32:59 GMT
- Title: Inferring Attributed Grammars from Parser Implementations
- Authors: Andreas Pointner, Josef Pichler, Herbert Prähofer
- Abstract summary: We introduce a novel approach for inferring attributed grammars from parser implementations. By observing runtime executions and mapping the program's behavior to the grammar, we systematically extract and embed semantic actions into the grammar rules. We demonstrate the feasibility of our approach using an initial set of programs, showing that it can accurately reproduce program behavior through the generated attributed grammars.
- Score: 1.0217990949413291
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Software systems that process structured inputs often lack complete and up-to-date specifications, which specify the input syntax and the semantics of input processing. While grammar mining techniques have focused on recovering syntactic structures, the semantics of input processing remains largely unexplored. In this work, we introduce a novel approach for inferring attributed grammars from parser implementations. Given an input grammar, our technique dynamically analyzes the implementation of recursive descent parsers to reconstruct the semantic aspects of input handling, resulting in specifications in the form of attributed grammars. By observing program executions and mapping the program's runtime behavior to the grammar, we systematically extract and embed semantic actions into the grammar rules. This enables comprehensive specification recovery. We demonstrate the feasibility of our approach using an initial set of programs, showing that it can accurately reproduce program behavior through the generated attributed grammars.
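To make the idea concrete, here is a minimal, hypothetical sketch (all names invented; this is not the paper's implementation) of a recursive descent parser rule and the attributed grammar rule that observing its executions might recover:

```python
# Hypothetical recursive descent parser for:  Expr -> Term ('+' Term)*
# Observing its runtime behavior suggests the semantic action
# "accumulate the sum of the Term values" attached to the Expr rule.

def parse_term(tokens):
    """Rule: Term -> NUMBER. Observed action: convert the lexeme to int."""
    return int(tokens.pop(0))

def parse_expr(tokens):
    """Rule: Expr -> Term ('+' Term)*."""
    val = parse_term(tokens)
    while tokens and tokens[0] == '+':
        tokens.pop(0)              # consume '+'
        val += parse_term(tokens)  # observed action: accumulate the sum
    return val

# Observing executions such as parse_expr(['1', '+', '2']) -> 3 would let
# the approach embed semantic actions into the grammar, yielding an
# attributed grammar rule along the lines of:
#   Expr -> Term { Expr.val = Term.val }
#           ('+' Term { Expr.val = Expr.val + Term.val })*
```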
Related papers
- Leveraging Grammar Induction for Language Understanding and Generation [7.459693992079273]
We introduce an unsupervised grammar induction method for language understanding and generation.
We construct a grammar to induce constituency structures and dependency relations, which is simultaneously trained on downstream tasks.
We evaluate and apply our method to multiple machine translation and natural language understanding tasks.
arXiv Detail & Related papers (2024-10-07T09:57:59Z) - Semantic Parsing with Candidate Expressions for Knowledge Base Question Answering [4.795837146925278]
Semantic parsers convert natural language to logical forms, which can be evaluated on knowledge bases (KBs) to produce denotations. Recent semantic parsers have been developed with sequence-to-sequence (seq2seq) pre-trained language models (PLMs). We propose a grammar augmented with candidate expressions for semantic parsing on a large KB with a seq2seq PLM.
arXiv Detail & Related papers (2024-10-01T05:46:22Z) - Grammar Induction from Visual, Speech and Text [91.98797120799227]
This work introduces a novel visual-audio-text grammar induction task (VAT-GI). Inspired by the fact that language grammar exists beyond text, we argue that text need not be the predominant modality in grammar induction. We propose a visual-audio-text inside-outside autoencoder (VaTiora) framework, which leverages rich modality-specific and complementary features for effective grammar parsing.
arXiv Detail & Related papers (2024-10-01T02:24:18Z) - Reverse Engineering Structure and Semantics of Input of a Binary Executable [0.0]
This paper presents an algorithm to recover the structure and semantic relations between fields of the input of binary executables.
The algorithm was implemented in a prototype system named ByteRI 2.0.
arXiv Detail & Related papers (2024-05-22T22:47:33Z) - Weakly Supervised Semantic Parsing with Execution-based Spurious Program Filtering [19.96076749160955]
We propose a domain-agnostic filtering mechanism based on program execution results.
We run a majority vote on these execution results to identify and filter out programs whose semantics differ significantly from those of the other programs.
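As a hedged illustration of such execution-based filtering, here is a minimal sketch (all names hypothetical, not the paper's code) of majority voting over program execution results:

```python
from collections import Counter

def filter_spurious(programs, inputs, execute):
    """Keep programs whose execution results agree with the majority.

    `programs` is a list of candidate programs, `inputs` a list of test
    inputs, and `execute(program, x)` runs one program on one input.
    This only sketches the idea of execution-based majority-vote filtering.
    """
    # Represent each program by its tuple of outputs on all inputs.
    signatures = [tuple(execute(p, x) for x in inputs) for p in programs]
    # The most common output signature is taken as the intended semantics.
    majority, _ = Counter(signatures).most_common(1)[0]
    # Discard programs whose outputs deviate from the majority.
    return [p for p, sig in zip(programs, signatures) if sig == majority]
```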
arXiv Detail & Related papers (2023-11-02T11:45:40Z) - Compositional Program Generation for Few-Shot Systematic Generalization [59.57656559816271]
This study presents a neuro-symbolic architecture called the Compositional Program Generator (CPG).
CPG has three key features: modularity, composition, and abstraction, in the form of grammar rules.
It achieves perfect generalization on both the SCAN and COGS benchmarks using just 14 examples for SCAN and 22 examples for COGS.
arXiv Detail & Related papers (2023-09-28T14:33:20Z) - Improving Generalization in Language Model-Based Text-to-SQL Semantic Parsing: Two Simple Semantic Boundary-Based Techniques [14.634536051274468]
We introduce a token preprocessing method to preserve the semantic boundaries of tokens produced by LM tokenizers.
At the sequence level, we propose to use special tokens to mark the boundaries of components aligned between input and output.
Our experimental results on two text-to-SQL semantic parsing datasets show that our token preprocessing, although simple, can substantially improve the LM performance on both types of generalization.
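A minimal sketch of the sequence-level idea, with a hypothetical boundary token and helper name (not the paper's actual implementation):

```python
def mark_component_boundaries(components, sep="<c>"):
    """Join aligned input/output components with a special boundary token.

    Special tokens make the semantic boundaries between components
    explicit in the sequence the language model sees. The token `<c>`
    and this helper are hypothetical placeholders.
    """
    return f" {sep} ".join(components)

# Example usage:
# mark_component_boundaries(["SELECT name", "FROM users", "WHERE age > 21"])
# -> "SELECT name <c> FROM users <c> WHERE age > 21"
```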
arXiv Detail & Related papers (2023-05-27T06:09:03Z) - Benchmarking Language Models for Code Syntax Understanding [79.11525961219591]
Pre-trained language models have demonstrated impressive performance in both natural language processing and program understanding.
In this work, we perform the first thorough benchmarking of the state-of-the-art pre-trained models for identifying the syntactic structures of programs.
Our findings point out key limitations of existing pre-training methods for programming languages, and suggest the importance of modeling code syntactic structures.
arXiv Detail & Related papers (2022-10-26T04:47:18Z) - BenchCLAMP: A Benchmark for Evaluating Language Models on Syntactic and Semantic Parsing [55.058258437125524]
We introduce BenchCLAMP, a Benchmark to evaluate Constrained LAnguage Model Parsing.
We benchmark eight language models, including two GPT-3 variants available only through an API.
Our experiments show that encoder-decoder pretrained language models can achieve similar performance or surpass state-of-the-art methods for syntactic and semantic parsing when the model output is constrained to be valid.
arXiv Detail & Related papers (2022-06-21T18:34:11Z) - Syntax-Aware Network for Handwritten Mathematical Expression Recognition [53.130826547287626]
Handwritten mathematical expression recognition (HMER) is a challenging task that has many potential applications.
Recent methods for HMER have achieved outstanding performance with an encoder-decoder architecture.
We propose a simple and efficient method for HMER, which is the first to incorporate syntax information into an encoder-decoder network.
arXiv Detail & Related papers (2022-03-03T09:57:19Z) - Enforcing Consistency in Weakly Supervised Semantic Parsing [68.2211621631765]
We explore the use of consistency between the output programs for related inputs to reduce the impact of spurious programs.
We find that a more consistent formalism leads to improved model performance even without consistency-based training.
arXiv Detail & Related papers (2021-07-13T03:48:04Z) - Representing Partial Programs with Blended Abstract Semantics [62.20775388513027]
We introduce a technique for representing partially written programs in a program synthesis engine.
We learn an approximate execution model implemented as a modular neural network.
We show that these hybrid neuro-symbolic representations enable execution-guided synthesizers to use more powerful language constructs.
arXiv Detail & Related papers (2020-12-23T20:40:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.