Finding Good Proofs for Description Logic Entailments Using Recursive
Quality Measures (Extended Technical Report)
- URL: http://arxiv.org/abs/2104.13138v1
- Date: Tue, 27 Apr 2021 12:34:13 GMT
- Title: Finding Good Proofs for Description Logic Entailments Using Recursive
Quality Measures (Extended Technical Report)
- Authors: Christian Alrabbaa and Franz Baader and Stefan Borgwardt and Patrick
Koopmann and Alisa Kovtunova
- Abstract summary: How comprehensible a proof is depends not only on the employed calculus, but also on the properties of the particular proof.
We aim for general results that hold for wide classes of calculi and measures.
- Score: 15.150938933215906
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Logic-based approaches to AI have the advantage that their behavior can in
principle be explained to a user. If, for instance, a Description Logic
reasoner derives a consequence that triggers some action of the overall system,
then one can explain such an entailment by presenting a proof of the
consequence in an appropriate calculus. How comprehensible such a proof is
depends not only on the employed calculus, but also on the properties of the
particular proof, such as its overall size, its depth, the complexity of the
employed sentences and proof steps, etc. For this reason, we want to determine
the complexity of generating proofs that are below a certain threshold w.r.t. a
given measure of proof quality. Rather than investigating this problem for a
fixed proof calculus and a fixed measure, we aim for general results that hold
for wide classes of calculi and measures. In previous work, we first restricted
the attention to a setting where proof size is used to measure the quality of a
proof. We then extended the approach to a more general setting, but important
measures such as proof depth were not covered. In the present paper, we provide
results for a class of measures called recursive, which yields lower
complexities and also encompasses proof depth. In addition, we close some gaps
left open in our previous work, thus providing a comprehensive picture of the
complexity landscape.
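The idea of a recursive quality measure can be illustrated with a small sketch: the measure of a proof is computed from the measures of the subproofs of its final step, using some combining function. Proof size combines child values by summation, while proof depth combines them by taking the maximum. The class and type names below are illustrative, not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ProofStep:
    """A node in a proof tree: a conclusion derived from subproofs."""
    conclusion: str
    premises: List["ProofStep"] = field(default_factory=list)

def proof_size(p: ProofStep) -> int:
    # Recursive measure with summation: 1 for this step plus sizes of subproofs.
    return 1 + sum(proof_size(q) for q in p.premises)

def proof_depth(p: ProofStep) -> int:
    # Recursive measure with max: length of the longest branch of the proof tree.
    return 1 + max((proof_depth(q) for q in p.premises), default=0)

# A small proof: C is derived from A and B, where B is itself derived from A.
a = ProofStep("A")
b = ProofStep("B", [ProofStep("A")])
c = ProofStep("C", [a, b])

print(proof_size(c))   # 4 steps in total
print(proof_depth(c))  # longest branch C -> B -> A has length 3
```

Both measures are recursive in this sense; the paper's question is then how hard it is to find a proof whose measure stays below a given threshold.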
Related papers
- Next-Token Prediction Task Assumes Optimal Data Ordering for LLM Training in Proof Generation [27.60611509339328]
We argue that the optimal order for one training data sample occurs when the relevant intermediate supervision for a particular proof step is always positioned to the left of that proof step.
We demonstrate that training is most effective when the proof is in the intuitively sequential order.
arXiv Detail & Related papers (2024-10-30T18:00:04Z) - MathGAP: Out-of-Distribution Evaluation on Problems with Arbitrarily Complex Proofs [80.96119560172224]
Large language models (LLMs) can solve arithmetic word problems with high accuracy, but little is known about how well they generalize to problems that are more complex than the ones on which they have been trained.
We present a framework for evaluating LLMs on problems with arbitrarily complex arithmetic proofs, called MathGAP.
arXiv Detail & Related papers (2024-10-17T12:48:14Z) - Proving Theorems Recursively [80.42431358105482]
We propose POETRY, which proves theorems in a level-by-level manner.
Unlike previous step-by-step methods, POETRY searches for a sketch of the proof at each level.
We observe a substantial increase in the maximum proof length found by POETRY, from 10 to 26.
arXiv Detail & Related papers (2024-05-23T10:35:08Z) - Efficient and Universal Merkle Tree Inclusion Proofs via OR Aggregation [27.541105686358378]
We propose a novel proof aggregation approach based on OR logic for Merkle tree inclusion proofs.
We achieve a proof size independent of the number of leaves in the tree, and verification can be performed using any single valid leaf hash.
The proposed techniques have the potential to significantly enhance the scalability, efficiency, and flexibility of zero-knowledge proof systems.
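For context on what the OR-aggregation approach improves, the sketch below shows a standard (non-aggregated) Merkle inclusion proof, where each leaf requires a separate proof of O(log n) sibling hashes. This is the classical construction, not the paper's scheme; the helper names are illustrative.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Fold already-hashed leaves up to a single root (power-of-two count assumed)."""
    level = leaves
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def inclusion_proof(leaves, index):
    """Collect sibling hashes from the leaf to the root: O(log n) per leaf."""
    proof, level = [], leaves
    while len(level) > 1:
        sibling = index ^ 1  # the other child of this node's parent
        proof.append((level[sibling], sibling < index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify(root, leaf_hash, proof):
    # Recompute the path to the root using the recorded sibling positions.
    acc = leaf_hash
    for sibling, sibling_is_left in proof:
        acc = h(sibling + acc) if sibling_is_left else h(acc + sibling)
    return acc == root

leaves = [h(bytes([i])) for i in range(8)]
root = merkle_root(leaves)
proof = inclusion_proof(leaves, 5)
print(verify(root, leaves[5], proof))  # True
```

The aggregation result summarized above replaces these per-leaf logarithmic proofs with a single proof whose size does not grow with the number of leaves.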
arXiv Detail & Related papers (2024-05-13T17:15:38Z) - Testing the General Deductive Reasoning Capacity of Large Language
Models Using OOD Examples [36.63316546586304]
Large language models (LLMs) possess some abstract deductive reasoning ability given chain-of-thought prompts.
We test on a broad set of deduction rules and measure their ability to generalize to more complex proofs from simpler demonstrations.
Experiments on four LLMs of various sizes and training objectives show that they are able to generalize to compositional proofs.
arXiv Detail & Related papers (2023-05-24T15:55:51Z) - ReCEval: Evaluating Reasoning Chains via Correctness and Informativeness [67.49087159888298]
ReCEval is a framework that evaluates reasoning chains via two key properties: correctness and informativeness.
We show that ReCEval effectively identifies various error types and yields notable improvements compared to prior methods.
arXiv Detail & Related papers (2023-04-21T02:19:06Z) - Generating Natural Language Proofs with Verifier-Guided Search [74.9614610172561]
We present a novel stepwise method, NLProofS (Natural Language Proof Search).
NLProofS learns to generate relevant steps conditioning on the hypothesis.
It achieves state-of-the-art performance on EntailmentBank and RuleTaker.
arXiv Detail & Related papers (2022-05-25T02:22:30Z) - The General Adversary Bound: A Survey [2.3986080077861787]
Ben Reichardt showed in a series of results that the general adversary bound of a function characterizes its quantum query complexity.
This survey seeks to aggregate the background and definitions necessary to understand the proof.
arXiv Detail & Related papers (2021-04-13T17:35:16Z) - A simple geometric proof for the benefit of depth in ReLU networks [57.815699322370826]
We present a simple proof for the benefit of depth in multi-layer feedforward networks with rectified activations ("depth separation").
We present a concrete neural network with linear depth (in $m$) and small constant width ($\leq 4$) that classifies the problem with zero error.
arXiv Detail & Related papers (2021-01-18T15:40:27Z) - Causal Expectation-Maximisation [70.45873402967297]
We show that causal inference is NP-hard even in models characterised by polytree-shaped graphs.
We introduce the causal EM algorithm to reconstruct the uncertainty about the latent variables from data about categorical manifest variables.
We argue that there appears to be an unnoticed limitation to the trending idea that counterfactual bounds can often be computed without knowledge of the structural equations.
arXiv Detail & Related papers (2020-11-04T10:25:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.