$O_2$ is a multiple context-free grammar: an implementation-, formalisation-friendly proof
- URL: http://arxiv.org/abs/2405.09396v1
- Date: Wed, 15 May 2024 14:51:11 GMT
- Title: $O_2$ is a multiple context-free grammar: an implementation-, formalisation-friendly proof
- Authors: Marco B. Caminati
- Abstract summary: Classifying languages according to the expressiveness of grammars able to generate them is a fundamental problem in computational linguistics.
This paper analyses the existing proofs from the computational and the proof-theoretical points of view, systematically studying whether each proof can lead to a verified (i.e., checked by a proof assistant) algorithm parsing balanced languages via MCFGs.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Classifying formal languages according to the expressiveness of grammars able to generate them is a fundamental problem in computational linguistics and, therefore, in the theory of computation. Furthermore, such analysis can give insight into the classification of abstract algebraic structures such as groups, for example through the correspondence given by the word problem. While many such classification problems remain open, others have been settled. Recently, it was proved that $n$-balanced languages (i.e., those whose strings contain the same number of occurrences of the letters $a_i$ and $A_i$ for $1\leq i \leq n$) can be generated by multiple context-free grammars (MCFGs), which are one of several slight extensions of context-free grammars added to the classical Chomsky hierarchy to make the mentioned classification more precise. This paper analyses the existing proofs from the computational and the proof-theoretical points of view, systematically studying whether each proof can lead to a verified (i.e., checked by a proof assistant) algorithm parsing balanced languages via MCFGs. We conclude that none of the existing proofs is realistically suitable for this practical goal, and proceed to provide a radically new, elementary, extremely short proof for the crucial case $n \leq 2$. A comparative analysis with respect to the existing proofs is finally performed to justify why the proposed proof is a substantial step towards concretely obtaining a verified parsing algorithm for $O_2$.
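As a minimal illustration of the language under study (this is only the membership condition from the abstract's definition, not the paper's MCFG construction or its verified parser), a 2-balanced word over the alphabet {a, A, b, B} must contain equally many a's as A's and equally many b's as B's:

```python
def is_o2(word: str) -> bool:
    """Naive membership check for O_2: every letter must come from
    {a, A, b, B}, with #a == #A and #b == #B."""
    if any(c not in "aAbB" for c in word):
        return False
    return (word.count("a") == word.count("A")
            and word.count("b") == word.count("B"))
```

For example, `is_o2("abBA")` holds while `is_o2("aab")` does not; the point of the paper is that a grammar-based (MCFG) characterisation of this language exists, which this counting check does not provide.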
Related papers
- MathGAP: Out-of-Distribution Evaluation on Problems with Arbitrarily Complex Proofs [80.96119560172224]
Large language models (LLMs) can solve arithmetic word problems with high accuracy, but little is known about how well they generalize to problems that are more complex than the ones on which they have been trained.
We present a framework for evaluating LLMs on problems with arbitrarily complex arithmetic proofs, called MathGAP.
arXiv Detail & Related papers (2024-10-17T12:48:14Z) - Detecting and explaining (in)equivalence of context-free grammars [0.6282171844772422]
We propose a scalable framework for deciding, proving, and explaining (in)equivalence of context-free grammars.
We present an implementation of the framework and evaluate it on large data sets collected within educational support systems.
arXiv Detail & Related papers (2024-07-25T17:36:18Z) - Authorship Verification based on the Likelihood Ratio of Grammar Models [0.8749675983608172]
Authorship Verification (AV) is the process of analyzing a set of documents to determine whether they were written by a specific author.
We propose a method relying on calculating a quantity we call $\lambda_G$ (LambdaG).
Despite not needing large amounts of data for training, LambdaG still outperforms other established AV methods with higher computational complexity.
arXiv Detail & Related papers (2024-03-13T12:25:47Z) - MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data [85.50740598523818]
MUSTARD is a framework that masters uniform synthesis of theorem and proof data of high quality and diversity.
We present a theorem-and-proof benchmark MUSTARDSAUCE with 5,866 valid data points.
We perform extensive analysis and demonstrate that MUSTARD generates validated high-quality step-by-step data.
arXiv Detail & Related papers (2024-02-14T05:57:58Z) - "You Are An Expert Linguistic Annotator": Limits of LLMs as Analyzers of
Abstract Meaning Representation [60.863629647985526]
We examine the successes and limitations of the GPT-3, ChatGPT, and GPT-4 models in analysis of sentence meaning structure.
We find that models can reliably reproduce the basic format of AMR, and can often capture core event, argument, and modifier structure.
Overall, our findings indicate that these models out-of-the-box can capture aspects of semantic structure, but there remain key limitations in their ability to support fully accurate semantic analyses or parses.
arXiv Detail & Related papers (2023-10-26T21:47:59Z) - LINC: A Neurosymbolic Approach for Logical Reasoning by Combining
Language Models with First-Order Logic Provers [60.009969929857704]
Logical reasoning is an important task for artificial intelligence with potential impacts on science, mathematics, and society.
In this work, we reformulate such tasks as modular neurosymbolic programming, which we call LINC.
We observe significant performance gains on FOLIO and a balanced subset of ProofWriter for three different models in nearly all experimental conditions we evaluate.
arXiv Detail & Related papers (2023-10-23T17:58:40Z) - Language Models Are Greedy Reasoners: A Systematic Formal Analysis of
Chain-of-Thought [10.524051272257614]
Large language models (LLMs) have shown remarkable reasoning capabilities given chain-of-thought prompts.
We present a new synthetic question-answering dataset called PrOntoQA, where each example is generated as a synthetic world model.
This allows us to parse the generated chain-of-thought into symbolic proofs for formal analysis.
arXiv Detail & Related papers (2022-10-03T21:34:32Z) - multiPRover: Generating Multiple Proofs for Improved Interpretability in
Rule Reasoning [73.09791959325204]
We focus on a type of linguistic formal reasoning where the goal is to reason over explicit knowledge in the form of natural language facts and rules.
A recent work, named PRover, performs such reasoning by answering a question and also generating a proof graph that explains the answer.
In our work, we address a new and challenging problem of generating multiple proof graphs for reasoning over natural language rule-bases.
arXiv Detail & Related papers (2021-06-02T17:58:35Z) - The Logic for a Mildly Context-Sensitive Fragment of the Lambek-Grishin
Calculus [0.0]
We present a proof-theoretic characterization of tree-adjoining languages based on the Lambek-Grishin calculus.
Several new techniques are introduced for the proofs, such as purely structural connectives, usefulness, and a graph-theoretic argument on proof nets for HLG.
arXiv Detail & Related papers (2021-01-10T22:28:05Z) - Proof-theoretic aspects of NL$\lambda$ [0.0]
We present a proof-theoretic analysis of the logic NL$\lambda$.
We introduce a novel calculus of proof nets and prove it is sound and complete with respect to the sequent calculus for the logic.
We show there is an unexpected convergence of the natural language analyses proposed in the two formalisms.
arXiv Detail & Related papers (2020-10-23T08:13:39Z) - The Return of Lexical Dependencies: Neural Lexicalized PCFGs [103.41187595153652]
We present novel neural models of lexicalized PCFGs which allow us to overcome sparsity problems.
Experiments demonstrate that this unified framework yields stronger results on both representations than either formalism achieves alone.
arXiv Detail & Related papers (2020-07-29T22:12:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.