Related papers: Data types as a more ergonomic frontend for Grammar-Guided Genetic Programming

URL: http://arxiv.org/abs/2210.04826v1
Date: Mon, 10 Oct 2022 16:38:16 GMT
Title: Data types as a more ergonomic frontend for Grammar-Guided Genetic Programming
Authors: Guilherme Espada, Leon Ingelse, Paulo Canelas, Pedro Barbosa, Alcides Fonseca
Abstract summary: We propose to embed the grammar as an internal Domain-Specific Language in the host language of the framework. This approach has the same expressive power as BNF and EBNF while using the host language type-system. We also present Meta-Handlers, user-defined overrides of the tree-generation system.
Score: 0.0
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: Genetic Programming (GP) is a heuristic method that can be applied to many Machine Learning, Optimization and Engineering problems. In particular, it has been widely used in Software Engineering for Test-case generation, Program Synthesis and Improvement of Software (GI). Grammar-Guided Genetic Programming (GGGP) approaches allow the user to refine the domain of valid program solutions. Backus Normal Form (BNF) is the most popular interface for describing Context-Free Grammars (CFG) for GGGP. BNF and its derivatives have the disadvantage of interleaving the grammar language and the target language of the program. We propose to embed the grammar as an internal Domain-Specific Language in the host language of the framework. This approach has the same expressive power as BNF and EBNF while using the host language type-system to take advantage of all the existing tooling: linters, formatters, type-checkers, autocomplete, and legacy code support. These tools have a practical utility in designing software in general, and GP systems in particular. We also present Meta-Handlers, user-defined overrides of the tree-generation system. This technique extends our object-oriented encoding with more practicability and expressive power than existing CFG approaches, achieving the same expressive power as Attribute Grammars, but without the grammar vs target language duality. Furthermore, we show that this approach is feasible by presenting an example Python implementation. We also compare our approach against textual BNF representations w.r.t. expressive power and ergonomics. These advantages do not come at the cost of performance, as shown by our empirical evaluation on 5 benchmarks of our example implementation against PonyGE2. We conclude that our approach has better ergonomics with the same expressive power and performance as textual BNF-based grammar encodings.
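
To make the central idea of the abstract concrete, the following is a minimal sketch of a grammar written as host-language data types, assuming Python dataclasses as the encoding. It is not the paper's implementation or its API: the names Expr, Lit, Add, Mul, random_tree and evaluate are hypothetical, and Meta-Handlers are not modelled. An abstract class plays the role of a non-terminal, each subclass is one production, and the field annotations give the symbols on the right-hand side.

```python
import random
from abc import ABC
from dataclasses import dataclass
from typing import get_type_hints


class Expr(ABC):
    """Non-terminal: every subclass below is one production of Expr."""


@dataclass
class Lit(Expr):
    value: int  # terminal symbol, filled with a random digit


@dataclass
class Add(Expr):
    left: Expr
    right: Expr


@dataclass
class Mul(Expr):
    left: Expr
    right: Expr


def random_tree(symbol: type, rng: random.Random, depth: int):
    """Grow a random derivation tree for `symbol`, at most `depth` nodes deep."""
    if symbol is int:
        return rng.randint(0, 9)
    if symbol is Expr:
        # Pick a production; only the non-recursive one once the budget runs out.
        options = [Lit] if depth <= 1 else [Lit, Add, Mul]
        return random_tree(rng.choice(options), rng, depth)
    # Concrete production: recurse on the annotated type of each field.
    hints = get_type_hints(symbol)
    return symbol(**{name: random_tree(t, rng, depth - 1) for name, t in hints.items()})


def evaluate(node: Expr) -> int:
    """Interpret a derivation tree as an arithmetic expression."""
    if isinstance(node, Lit):
        return node.value
    if isinstance(node, Add):
        return evaluate(node.left) + evaluate(node.right)
    if isinstance(node, Mul):
        return evaluate(node.left) * evaluate(node.right)
    raise TypeError(f"unknown node type: {type(node).__name__}")


if __name__ == "__main__":
    tree = random_tree(Expr, random.Random(0), depth=4)
    print(tree, "=", evaluate(tree))
```

Because the grammar is ordinary Python, a type checker or an editor's autocomplete can inspect the productions directly, which is the kind of ergonomic benefit the abstract attributes to this encoding.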

Related papers

Guided Tensor Lifting [54.10411390218929]
Domain-specific languages (DSLs) for machine learning are revolutionizing the speed and efficiency of machine learning workloads. To take advantage of these capabilities, a user must first translate their legacy code from the language it is currently written in into the new DSL. The process of automatically lifting code into these DSLs has been identified by several recent works, which propose program synthesis as a solution.
arXiv Detail & Related papers (2025-04-28T12:00:10Z)
CodeGRAG: Bridging the Gap between Natural Language and Programming Language via Graphical Retrieval Augmented Generation [58.84212778960507]
We propose CodeGRAG, a Graphical Retrieval Augmented Code Generation framework to enhance the performance of LLMs. CodeGRAG builds a graphical view of code blocks based on their control flow and data flow to fill the gap between programming languages and natural language. Various experiments and ablations are done on four datasets, covering both C++ and Python, to validate the hard meta-graph prompt, the soft prompting technique, and the effectiveness of the objectives for the pretrained GNN expert.
arXiv Detail & Related papers (2024-05-03T02:48:55Z)
Constrained Decoding for Fill-in-the-Middle Code Language Models via Efficient Left and Right Quotienting of Context-Sensitive Grammars [11.279507894576213]
This paper contributes an incremental synthesis that allows early rejection of syntactically incorrect code. We extend the Earley parsing algorithm to allow for left and right quotients of context-free grammars.
arXiv Detail & Related papers (2024-02-28T02:12:47Z)
Compositional Program Generation for Few-Shot Systematic Generalization [59.57656559816271]
This study concerns a neuro-symbolic architecture called the Compositional Program Generator (CPG). CPG has three key features: modularity, composition, and abstraction, in the form of grammar rules. It achieves perfect generalization on both the SCAN and COGS benchmarks using just 14 examples for SCAN and 22 examples for COGS.
arXiv Detail & Related papers (2023-09-28T14:33:20Z)
AI2: The next leap toward native language based and explainable machine learning framework [1.827510863075184]
The proposed framework, named AI2, uses a natural language interface that allows a non-specialist to benefit from machine learning algorithms. The primary contribution of the AI2 framework is that it allows a user to call the machine learning algorithms in English, making its interface easier to use. Another contribution is a preprocessing module that helps to describe and to load data properly.
arXiv Detail & Related papers (2023-01-09T14:48:35Z)
GraphQ IR: Unifying Semantic Parsing of Graph Query Language with Intermediate Representation [91.27083732371453]
We propose a unified intermediate representation (IR) for graph query languages, namely GraphQ IR. With the IR's natural-language-like representation that bridges the semantic gap and its formally defined syntax that maintains the graph structure, neural semantic parsing can more effectively convert user queries into GraphQ IR. Our approach can consistently achieve state-of-the-art performance on KQA Pro, Overnight and MetaQA.
arXiv Detail & Related papers (2022-05-24T13:59:53Z)
CUGE: A Chinese Language Understanding and Generation Evaluation Benchmark [144.05723617401674]
General-purpose language intelligence evaluation has been a longstanding goal for natural language processing. We argue that for general-purpose language intelligence evaluation, the benchmark itself needs to be comprehensive and systematic. We propose CUGE, a Chinese Language Understanding and Generation Evaluation benchmark with the following features.
arXiv Detail & Related papers (2021-12-27T11:08:58Z)
Learning to Synthesize Data for Semantic Parsing [57.190817162674875]
We propose a generative model which models the composition of programs and maps a program to an utterance. Due to the simplicity of PCFG and pre-trained BART, our generative model can be efficiently learned from existing data at hand. We evaluate our method in both in-domain and out-of-domain settings of text-to-Query parsing on the standard benchmarks of GeoQuery and Spider.
arXiv Detail & Related papers (2021-04-12T21:24:02Z)
Infusing Finetuning with Semantic Dependencies [62.37697048781823]
We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models. We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
arXiv Detail & Related papers (2020-12-10T01:27:24Z)
Code Building Genetic Programming [0.0]
We introduce Code Building Genetic Programming (CBGP) as a framework within which this can be done. CBGP produces a computational graph that can be executed or translated into source code of a host language.
arXiv Detail & Related papers (2020-08-09T04:33:04Z)
ProGraML: Graph-based Deep Learning for Program Optimization and Analysis [16.520971531754018]
We introduce ProGraML, a graph-based program representation for machine learning. ProGraML achieves an average 94.0 F1 score, significantly outperforming the state-of-the-art approaches. We then apply our approach to two high-level tasks - heterogeneous device mapping and program classification - setting new state-of-the-art performance in both.
arXiv Detail & Related papers (2020-03-23T20:27:00Z)
Traduction des Grammaires Catégorielles de Lambek dans les Grammaires Catégorielles Abstraites [0.0]
This internship report aims to demonstrate that every Lambek Grammar (LG) can be, not entirely but efficiently, expressed in Abstract Categorial Grammars (ACG). The main idea is to transform the type rewriting system of LGs into that of Context-Free Grammars (CFG) by erasing introduction and elimination rules and generating enough axioms so that the cut rule suffices. Although the underlying algorithm was not fully implemented, this proof provides another argument in favour of the relevance of ACGs in Natural Language Processing.
arXiv Detail & Related papers (2020-01-23T18:23:03Z)

This list is automatically generated from the titles and abstracts of the papers in this site.