Pattern Matching in AI Compilers and its Formalization (Extended Version)
- URL: http://arxiv.org/abs/2412.13398v1
- Date: Wed, 18 Dec 2024 00:29:09 GMT
- Title: Pattern Matching in AI Compilers and its Formalization (Extended Version)
- Authors: Joseph W. Cutler, Alex Collins, Bin Fan, Mahesh Ravishankar, Vinod Grover,
- Abstract summary: PyPM is a Python-based domain specific language for building rewrite-based optimization passes on machine learning computation graphs.
We present our work on building PyPM, as well as on formalizing and distilling this complexity down to an understandable mathematical core.
- Score: 5.025922465392978
- License:
- Abstract: PyPM is a Python-based domain specific language (DSL) for building rewrite-based optimization passes on machine learning computation graphs. Users define individual optimizations by writing (a) patterns that match subgraphs of a computation graph and (b) corresponding rules which replace a matched subgraph with an optimized kernel. PyPM is distinguished from the many other DSLs for defining rewriting passes by its complex and novel pattern language, which borrows concepts from logic programming. PyPM patterns can be recursive, nondeterministic, and can require checking domain-specific constraints such as the shapes of tensors. The PyPM implementation is thus similarly complicated, consisting of thousands of lines of C++ code. In this paper, we present our work on building PyPM, as well as on formalizing and distilling this complexity down to an understandable mathematical core. We have developed a formal core calculus expressing the main operations of the PyPM pattern language. We define both a declarative semantics - describing which patterns match which terms - and an algorithmic semantics - an idealized version of the PyPM pattern interpreter - and prove their equivalence. The development is fully mechanized in the Coq proof assistant.
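The pattern-and-rule idea the abstract describes can be illustrated with a minimal term-rewriting sketch: patterns with variables match subterms of an expression tree, and a rule replaces a match with a simplified form. This is an illustrative sketch of the general technique only, not PyPM's actual API or representation.

```python
# Minimal pattern-match-and-rewrite sketch (NOT PyPM's API).
# Terms are tuples like ("mul", "y", 1); pattern variables start with "?".

def match(pattern, term, bindings=None):
    """Return variable bindings if `pattern` matches `term`, else None."""
    if bindings is None:
        bindings = {}
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in bindings:
            return bindings if bindings[pattern] == term else None
        bindings[pattern] = term
        return bindings
    if isinstance(pattern, tuple) and isinstance(term, tuple):
        if len(pattern) != len(term):
            return None
        for p, t in zip(pattern, term):
            bindings = match(p, t, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == term else None

def rewrite(term, rules):
    """Rewrite bottom-up, applying the first matching rule at each node."""
    if isinstance(term, tuple):
        term = tuple(rewrite(t, rules) for t in term)
    for pattern, build in rules:
        b = match(pattern, term)
        if b is not None:
            return build(b)
    return term

# Example rule: x * 1 -> x, a classic algebraic simplification.
rules = [(("mul", "?x", 1), lambda b: b["?x"])]
print(rewrite(("add", ("mul", "y", 1), "z"), rules))  # ('add', 'y', 'z')
```

A real system like PyPM adds recursion, nondeterminism, and side constraints (e.g. tensor shapes) on top of this basic matching core.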
Related papers
- Training Neural Networks as Recognizers of Formal Languages [87.06906286950438]
Formal language theory pertains specifically to recognizers.
It is common to instead use proxy tasks that are similar in only an informal sense.
We correct this mismatch by training and evaluating neural networks directly as binary classifiers of strings.
arXiv Detail & Related papers (2024-11-11T16:33:25Z) - Automata-based constraints for language model decoding [9.137697105669142]
Language models (LMs) are often expected to generate strings in some formal language.
Fine-tuning requires significant resources, making it impractical for uncommon or task-specific formats.
We solve these issues through the application of automata theory.
Our system compiles constraints 7,000x faster, is provably correct, and can be extended in a modular fashion.
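The core idea of automata-based decoding constraints can be sketched in a few lines: a DFA over the vocabulary determines which tokens are legal in each state, and illegal tokens are masked out before the decoder picks one. The tiny DFA and scores below are illustrative only, not the paper's system.

```python
# Automata-constrained decoding sketch: only tokens with a DFA
# transition from the current state may be emitted.
# This toy DFA accepts strings matching the pattern a b+ .

DFA = {0: {"a": 1}, 1: {"b": 2}, 2: {"b": 2}}  # state -> {token: next}
ACCEPTING = {2}
VOCAB = ["a", "b", "c"]

def allowed_tokens(state):
    """Tokens with a defined transition from `state`."""
    return [t for t in VOCAB if t in DFA.get(state, {})]

def constrained_greedy(scores_per_step):
    """Pick the highest-scoring *legal* token at each step.
    `scores_per_step` plays the role of the LM's per-step logits."""
    state, out = 0, []
    for scores in scores_per_step:
        legal = allowed_tokens(state)
        tok = max(legal, key=lambda t: scores[t])
        out.append(tok)
        state = DFA[state][tok]
    assert state in ACCEPTING, "ended in a non-accepting state"
    return "".join(out)

# Even though 'c' scores highest at step 2, the DFA forbids it.
print(constrained_greedy([
    {"a": 0.9, "b": 0.05, "c": 0.05},
    {"a": 0.1, "b": 0.2, "c": 0.7},
]))  # ab
```

The paper's contribution is doing this kind of compilation fast and provably correctly for realistic formal languages, not the masking loop itself.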
arXiv Detail & Related papers (2024-07-11T00:25:01Z) - AI Coders Are Among Us: Rethinking Programming Language Grammar Towards Efficient Code Generation [14.831115535710692]
We propose the concept of AI-oriented grammar.
This aims to represent code in a way that better suits the working mechanism of AI models.
Code written with AI-oriented grammar discards formats and uses a minimum number of tokens.
arXiv Detail & Related papers (2024-04-25T04:46:02Z) - Compositional Program Generation for Few-Shot Systematic Generalization [59.57656559816271]
This study presents a neuro-symbolic architecture called the Compositional Program Generator (CPG).
CPG has three key features: modularity, composition, and abstraction, in the form of grammar rules.
It achieves perfect generalization on both the SCAN and COGS benchmarks using just 14 examples for SCAN and 22 examples for COGS.
arXiv Detail & Related papers (2023-09-28T14:33:20Z) - Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning [84.12154024070024]
We propose natural language embedded programs (NLEP) as a unifying framework for addressing math/symbolic reasoning, natural language understanding, and instruction following tasks.
Our approach prompts a language model to generate full Python programs that define functions over data structures which contain natural language representations of structured knowledge.
A Python interpreter then executes the generated code and prints the output.
arXiv Detail & Related papers (2023-09-19T17:54:21Z) - Top-Down Knowledge Compilation for Counting Modulo Theories [11.086759883832505]
Propositional model counting can be solved efficiently when the input formula is in deterministic decomposable negation normal form (d-DNNF).
Top-down knowledge compilation is a state-of-the-art technique for solving #SAT problems.
We advocate for a top-down compiler based on the traces of exhaustive DPLL(T) search.
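Why d-DNNF makes counting easy: decomposability means an AND node's children share no variables (counts multiply), and determinism means an OR node's children have disjoint models (counts add), so the model count falls out of one bottom-up pass. A minimal sketch, using an illustrative tuple encoding of a smooth d-DNNF circuit rather than any real #SAT tool's format:

```python
# Model counting over a smooth, deterministic, decomposable NNF circuit.
# Nodes: ("lit", name), ("and", [children]), ("or", [children]).

def count(node):
    kind = node[0]
    if kind == "lit":   # a single literal has exactly one model
        return 1
    if kind == "and":   # decomposability: children share no variables
        c = 1
        for child in node[1]:
            c *= count(child)
        return c
    if kind == "or":    # determinism: children's model sets are disjoint
        return sum(count(child) for child in node[1])
    raise ValueError(f"unknown node kind: {kind}")

# x XOR y as a smooth d-DNNF: (x AND NOT y) OR (NOT x AND y) -- 2 models.
xor = ("or", [("and", [("lit", "x"), ("lit", "-y")]),
              ("and", [("lit", "-x"), ("lit", "y")])])
print(count(xor))  # 2
```

The hard part, which the paper addresses, is compiling an arbitrary formula (modulo theories) into this form in the first place; counting afterwards is the easy linear pass shown here.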
arXiv Detail & Related papers (2023-06-07T15:46:28Z) - GraphQ IR: Unifying Semantic Parsing of Graph Query Language with Intermediate Representation [91.27083732371453]
We propose a unified intermediate representation (IR) for graph query languages, namely GraphQ IR.
With the IR's natural-language-like representation that bridges the semantic gap and its formally defined syntax that maintains the graph structure, neural semantic parsing can more effectively convert user queries into GraphQ IR.
Our approach can consistently achieve state-of-the-art performance on KQA Pro, Overnight and MetaQA.
arXiv Detail & Related papers (2022-05-24T13:59:53Z) - Discovering Non-monotonic Autoregressive Orderings with Variational Inference [67.27561153666211]
We develop an unsupervised parallelizable learner that discovers high-quality generation orders purely from training data.
We implement the encoder as a Transformer with non-causal attention that outputs permutations in one forward pass.
Empirical results in language modeling tasks demonstrate that our method is context-aware and discovers orderings that are competitive with or even better than fixed orders.
arXiv Detail & Related papers (2021-10-27T16:08:09Z) - pygrank: A Python Package for Graph Node Ranking [13.492381728793612]
We introduce pygrank, an open source Python package to define, run and evaluate node ranking algorithms.
We provide object-oriented and extensively unit-tested algorithm components, such as graph filters, post-processors, measures, benchmarks and online tuning.
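The kind of node-ranking algorithm such a package wraps can be sketched with plain power iteration on a PageRank-style recurrence. This is an illustrative sketch of the underlying idea, not pygrank's actual API:

```python
# PageRank via power iteration (illustrative; not pygrank's API).

def pagerank(adj, damping=0.85, iters=50):
    """adj: {node: [out-neighbors]}; returns approximate PageRank scores."""
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1 - damping) / n for v in nodes}
        for v in nodes:
            out = adj[v]
            if not out:  # dangling node: spread its mass uniformly
                for u in nodes:
                    new[u] += damping * rank[v] / n
            else:
                for u in out:
                    new[u] += damping * rank[v] / len(out)
        rank = new
    return rank

# On a symmetric 3-cycle every node ends up with the same score (1/3).
r = pagerank({"a": ["b"], "b": ["c"], "c": ["a"]})
print(r)
```

Graph filters, post-processors, and benchmarks as described in the abstract are layered around this kind of iterative core.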
arXiv Detail & Related papers (2021-10-18T13:13:21Z) - PyMatching: A Python package for decoding quantum codes with minimum-weight perfect matching [0.0]
PyMatching is a package for decoding quantum error-correcting codes with the minimum-weight perfect matching (MWPM) algorithm.
PyMatching supports the use of weighted edges, hook errors, boundaries and measurement errors, enabling fast decoding and simulation of fault-tolerant quantum computing.
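The MWPM primitive itself can be illustrated by brute force on a tiny complete graph: pair up all nodes so the total edge weight is minimized. Production decoders like PyMatching use Blossom-style algorithms, not the exponential enumeration sketched here; the graph and weights below are illustrative only.

```python
# Brute-force minimum-weight perfect matching on a small complete graph
# (illustrative only; real MWPM decoders use Blossom-style algorithms).

from itertools import permutations

def min_weight_perfect_matching(weights):
    """weights: {(u, v): w} with u < v, over an even-sized node set.
    Returns (total weight, list of matched pairs)."""
    nodes = sorted({u for edge in weights for u in edge})
    best, best_pairs = float("inf"), None
    for perm in permutations(nodes):
        pairs = [tuple(sorted(perm[i:i + 2]))
                 for i in range(0, len(perm), 2)]
        w = sum(weights[p] for p in pairs)
        if w < best:
            best, best_pairs = w, pairs
    return best, sorted(best_pairs)

weights = {("a", "b"): 1, ("a", "c"): 2, ("a", "d"): 3,
           ("b", "c"): 3, ("b", "d"): 2, ("c", "d"): 1}
print(min_weight_perfect_matching(weights))  # (2, [('a', 'b'), ('c', 'd')])
```

In the decoding setting, nodes are syndrome defects and edge weights encode error likelihoods, so the minimum-weight matching picks the most probable error explanation.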
arXiv Detail & Related papers (2021-05-27T12:10:37Z) - Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks [133.93803565077337]
Retrieval-augmented generation (RAG) models combine pre-trained parametric and non-parametric memory for language generation.
We show that RAG models generate more specific, diverse and factual language than a state-of-the-art parametric-only seq2seq baseline.
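The retrieve-then-generate pattern can be sketched with a toy word-overlap retriever; real RAG systems use dense neural retrievers and a seq2seq generator, so everything below (names, passages, scoring) is illustrative only.

```python
# Toy retrieve-then-generate sketch (illustrative; real RAG uses
# dense retrieval and a neural generator, not word overlap).

def retrieve(query, passages):
    """Return the passage sharing the most words with the query."""
    q = set(query.lower().split())
    return max(passages, key=lambda p: len(q & set(p.lower().split())))

passages = [
    "PyPM is a DSL for graph rewriting",
    "Paris is the capital of France",
]
best = retrieve("what is the capital of France", passages)
print(best)  # Paris is the capital of France
```

The generator would then condition on both the query and the retrieved passage, which is what lets RAG outputs stay more specific and factual than a parametric-only baseline.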
arXiv Detail & Related papers (2020-05-22T21:34:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.