BAIT: Benchmarking (Embedding) Architectures for Interactive
Theorem-Proving
- URL: http://arxiv.org/abs/2403.03401v1
- Date: Wed, 6 Mar 2024 01:56:17 GMT
- Title: BAIT: Benchmarking (Embedding) Architectures for Interactive
Theorem-Proving
- Authors: Sean Lamont, Michael Norrish, Amir Dezfouli, Christian Walder, Paul
Montague
- Abstract summary: We present BAIT, a framework for fair and streamlined comparison of learning approaches in Interactive Theorem Proving.
We demonstrate BAIT's capabilities with an in-depth comparison, across several ITP benchmarks.
BAIT also allows us to assess the end-to-end proving performance of systems built on interactive environments.
- Score: 13.374504717801061
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Artificial Intelligence for Theorem Proving has given rise to a plethora of
benchmarks and methodologies, particularly in Interactive Theorem Proving
(ITP). Research in the area is fragmented, with a diverse set of approaches
being spread across several ITP systems. This presents a significant challenge
to the comparison of methods, which are often complex and difficult to
replicate. Addressing this, we present BAIT, a framework for fair and
streamlined comparison of learning approaches in ITP. We demonstrate BAIT's
capabilities with an in-depth comparison, across several ITP benchmarks, of
state-of-the-art architectures applied to the problem of formula embedding. We
find that Structure Aware Transformers perform particularly well, improving on
techniques associated with the original problem sets. BAIT also allows us to
assess the end-to-end proving performance of systems built on interactive
environments. This unified perspective reveals a novel end-to-end system that
improves on prior work. We also provide a qualitative analysis, illustrating
that improved performance is associated with more semantically-aware
embeddings. By streamlining the implementation and comparison of Machine
Learning algorithms in the ITP context, we anticipate BAIT will be a
springboard for future research.
Related papers
- Boosting CNN-based Handwriting Recognition Systems with Learnable Relaxation Labeling [48.78361527873024]
We propose a novel approach to handwriting recognition that integrates the strengths of two distinct methodologies.
We introduce a sparsification technique that accelerates the convergence of the algorithm and enhances the overall system's performance.
arXiv Detail & Related papers (2024-09-09T15:12:28Z) - POGEMA: A Benchmark Platform for Cooperative Multi-Agent Navigation [76.67608003501479]
We introduce and specify an evaluation protocol defining a range of domain-related metrics computed on the basics of the primary evaluation indicators.
The results of such a comparison, which involves a variety of state-of-the-art MARL, search-based, and hybrid methods, are presented.
arXiv Detail & Related papers (2024-07-20T16:37:21Z) - LInK: Learning Joint Representations of Design and Performance Spaces through Contrastive Learning for Mechanism Synthesis [15.793704096341523]
In this paper, we introduce LInK, a novel framework that integrates contrastive learning of performance and design space with optimization techniques.
By leveraging a multimodal and transformation-invariant contrastive learning framework, LInK learns a joint representation that captures complex physics and design representations of mechanisms.
Our results demonstrate that LInK not only advances the field of mechanism design but also broadens the applicability of contrastive learning and optimization to other areas of engineering.
arXiv Detail & Related papers (2024-05-31T03:04:57Z) - Learning Interpretable Models Through Multi-Objective Neural
Architecture Search [0.9990687944474739]
We propose a framework to optimize for both task performance and "introspectability," a surrogate metric for aspects of interpretability.
We demonstrate that jointly optimizing for task error and introspectability leads to more disentangled and debuggable architectures that perform within error.
arXiv Detail & Related papers (2021-12-16T05:50:55Z) - Harnessing Heterogeneity: Learning from Decomposed Feedback in Bayesian
Modeling [68.69431580852535]
We introduce a novel GP regression to incorporate the subgroup feedback.
Our modified regression has provably lower variance -- and thus a more accurate posterior -- compared to previous approaches.
We execute our algorithm on two disparate social problems.
arXiv Detail & Related papers (2021-07-07T03:57:22Z) - Towards Large Scale Automated Algorithm Design by Integrating Modular
Benchmarking Frameworks [0.9281671380673306]
We present a first proof-of-concept use-case that demonstrates the efficiency of the algorithm framework ParadisEO with the automated algorithm configuration tool irace and the experimental platform IOHprofiler.
Key advantages of our pipeline are fast evaluation times, the possibility to generate rich data sets, and a standardized interface that can be used to benchmark very broad classes of sampling-based optimizations.
arXiv Detail & Related papers (2021-02-12T10:47:00Z) - Investigating Bi-Level Optimization for Learning and Vision from a
Unified Perspective: A Survey and Beyond [114.39616146985001]
In machine learning and computer vision fields, despite the different motivations and mechanisms, a lot of complex problems contain a series of closely related subproblms.
In this paper, we first uniformly express these complex learning and vision problems from the perspective of Bi-Level Optimization (BLO)
Then we construct a value-function-based single-level reformulation and establish a unified algorithmic framework to understand and formulate mainstream gradient-based BLO methodologies.
arXiv Detail & Related papers (2021-01-27T16:20:23Z) - Improving the Effectiveness of Traceability Link Recovery using
Hierarchical Bayesian Networks [21.15456830607455]
We implement a HierarchiCal PrObabilistic Model for SoftwarE Traceability (Comet)
Comet is capable of modeling relationships between artifacts by combining the complementary observational prowess of multiple measures of textual similarity.
We conduct a comprehensive empirical evaluation of Comet that illustrates an improvement over a set of optimally configured baselines.
arXiv Detail & Related papers (2020-05-18T19:38:29Z) - A Dependency Syntactic Knowledge Augmented Interactive Architecture for
End-to-End Aspect-based Sentiment Analysis [73.74885246830611]
We propose a novel dependency syntactic knowledge augmented interactive architecture with multi-task learning for end-to-end ABSA.
This model is capable of fully exploiting the syntactic knowledge (dependency relations and types) by leveraging a well-designed Dependency Relation Embedded Graph Convolutional Network (DreGcn)
Extensive experimental results on three benchmark datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-04T14:59:32Z) - Image Matching across Wide Baselines: From Paper to Practice [80.9424750998559]
We introduce a comprehensive benchmark for local features and robust estimation algorithms.
Our pipeline's modular structure allows easy integration, configuration, and combination of different methods.
We show that with proper settings, classical solutions may still outperform the perceived state of the art.
arXiv Detail & Related papers (2020-03-03T15:20:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.