Related papers: A Minimal Agent for Automated Theorem Proving

A Minimal Agent for Automated Theorem Proving

URL: http://arxiv.org/abs/2602.24273v1
Date: Fri, 27 Feb 2026 18:43:47 GMT
Title: A Minimal Agent for Automated Theorem Proving
Authors: Borja Requena Pozo, Austin Letson, Krystian Nowakowski, Izan Beltran Ferreiro, Leopoldo Sarra,
Abstract summary: This design implements the core features shared among state-of-the-art systems: iterative proof refinement, library search and context management.<n>We evaluate our baseline using qualitatively different benchmarks and compare various popular models and design choices.
Score: 0.3262230127283452
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We propose a minimal agentic baseline that enables systematic comparison across different AI-based theorem prover architectures. This design implements the core features shared among state-of-the-art systems: iterative proof refinement, library search and context management. We evaluate our baseline using qualitatively different benchmarks and compare various popular models and design choices, and demonstrate competitive performance compared to state-of-the-art approaches, while using a significantly simpler architecture. Our results demonstrate consistent advantages of an iterative approach over multiple single-shot generations, especially in terms of sample efficiency and cost effectiveness. The implementation is released open-source as a candidate reference for future research and as an accessible prover for the community.

Related papers

HEAS: Hierarchical Evolutionary Agent Simulation Framework for Cross-Scale Modeling and Multi-Objective Search [4.807104001943257]
Hierarchical Simulation Agent (HEAS) is a Python framework that unifies layered agent-based modeling with evolutionary optimization and tournament evaluation.<n>HEAS represents models as hierarchies of lightweight processes ("streams") scheduled in deterministic layers that read and write a shared context.<n> compact API and CLI-simulate, optimize, evaluate-expose single- and multi-objective evolution.
arXiv Detail & Related papers (2025-08-21T13:35:46Z)
Comparative Analysis of AI Agent Architectures for Entity Relationship Classification [1.6887793771613606]
In this study, we conduct a comparative analysis of three distinct AI agent architectures to perform relation classification.<n>The agentic architectures explored include (1) reflective self-evaluation, (2) hierarchical task decomposition, and (3) a novel multi-agent dynamic example generation mechanism.<n>Our experiments demonstrate that multi-agent coordination consistently outperforms standard few-shot prompting.
arXiv Detail & Related papers (2025-06-03T04:19:47Z)
EVA-MILP: Towards Standardized Evaluation of MILP Instance Generation [13.49043811341421]
Mixed-Integer Linear Programming (MILP) is fundamental to solving complex decision-making problems.<n>The proliferation of MILP instance generation methods, driven by machine learning's demand for diverse datasets, has significantly outpaced standardized evaluation techniques.<n>This paper introduces a comprehensive benchmark framework designed for the systematic and objective evaluation of MILP instance generation methods.
arXiv Detail & Related papers (2025-05-30T16:42:15Z)
POGEMA: A Benchmark Platform for Cooperative Multi-Agent Pathfinding [76.67608003501479]
We introduce POGEMA, a comprehensive set of tools that includes a fast environment for learning, a problem instance generator, and a visualization toolkit.<n>We also introduce and define an evaluation protocol that specifies a range of domain-related metrics, computed based on primary evaluation indicators.<n>The results of this comparison, which involves a variety of state-of-the-art MARL, search-based, and hybrid methods, are presented.
arXiv Detail & Related papers (2024-07-20T16:37:21Z)
Explaining Modern Gated-Linear RNNs via a Unified Implicit Attention Formulation [54.50526986788175]
Recent advances in efficient sequence modeling have led to attention-free layers, such as Mamba, RWKV, and various gated RNNs. We present a unified view of these models, formulating such layers as implicit causal self-attention layers. Our framework compares the underlying mechanisms on similar grounds for different layers and provides a direct means for applying explainability methods.
arXiv Detail & Related papers (2024-05-26T09:57:45Z)
BAIT: Benchmarking (Embedding) Architectures for Interactive Theorem-Proving [13.374504717801061]
We present BAIT, a framework for fair and streamlined comparison of learning approaches in Interactive Theorem Proving. We demonstrate BAIT's capabilities with an in-depth comparison, across several ITP benchmarks. BAIT also allows us to assess the end-to-end proving performance of systems built on interactive environments.
arXiv Detail & Related papers (2024-03-06T01:56:17Z)
Hierarchical Ensemble-Based Feature Selection for Time Series Forecasting [0.0]
We introduce a novel ensemble approach for feature selection based on hierarchical stacking for non-stationarity. Our approach exploits the co-dependency between features using a hierarchical structure. The effectiveness of the approach is demonstrated on synthetic and well-known real-life datasets.
arXiv Detail & Related papers (2023-10-26T16:40:09Z)
Revisiting the Evaluation of Image Synthesis with GANs [55.72247435112475]
This study presents an empirical investigation into the evaluation of synthesis performance, with generative adversarial networks (GANs) as a representative of generative models. In particular, we make in-depth analyses of various factors, including how to represent a data point in the representation space, how to calculate a fair distance using selected samples, and how many instances to use from each set.
arXiv Detail & Related papers (2023-04-04T17:54:32Z)
Exploiting Temporal Structures of Cyclostationary Signals for Data-Driven Single-Channel Source Separation [98.95383921866096]
We study the problem of single-channel source separation (SCSS) We focus on cyclostationary signals, which are particularly suitable in a variety of application domains. We propose a deep learning approach using a U-Net architecture, which is competitive with the minimum MSE estimator.
arXiv Detail & Related papers (2022-08-22T14:04:56Z)
HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models. We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings. We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data. We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
Probabilistic Case-based Reasoning for Open-World Knowledge Graph Completion [59.549664231655726]
A case-based reasoning (CBR) system solves a new problem by retrieving cases' that are similar to the given problem. In this paper, we demonstrate that such a system is achievable for reasoning in knowledge-bases (KBs) Our approach predicts attributes for an entity by gathering reasoning paths from similar entities in the KB.
arXiv Detail & Related papers (2020-10-07T17:48:12Z)

This list is automatically generated from the titles and abstracts of the papers in this site.