Related papers: RocqSmith: Can Automatic Optimization Forge Better Proof Agents?

URL: http://arxiv.org/abs/2602.05762v1
Date: Thu, 05 Feb 2026 15:28:26 GMT
Title: RocqSmith: Can Automatic Optimization Forge Better Proof Agents?
Authors: Andrei Kozyrev, Nikita Khramov, Denis Lochmelis, Valerio Morelli, Gleb Solovev, Anton Podkopaev
Abstract summary: We evaluate how different automatic agent optimizers perform when applied to the task of optimizing a Rocq proof-generation agent. Our results show that while several optimizers yield measurable improvements, simple few-shot bootstrapping is the most consistently effective.
Score: 0.07696728525672149
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This work studies the applicability of automatic AI agent optimization methods to real-world agents in formal verification settings, focusing on automated theorem proving in Rocq as a representative and challenging domain. We evaluate how different automatic agent optimizers perform when applied to the task of optimizing a Rocq proof-generation agent, and assess whether parts of the fine-grained tuning of agentic systems, such as prompt design, contextual knowledge, and control strategies, can be automated. Our results show that while several optimizers yield measurable improvements, simple few-shot bootstrapping is the most consistently effective; however, none of the studied methods matches the performance of a carefully engineered state-of-the-art proof agent.

Related papers

AgentArk: Distilling Multi-Agent Intelligence into a Single LLM Agent [57.10083973844841]
AgentArk is a novel framework to distill multi-agent dynamics into the weights of a single model. We investigate three hierarchical distillation strategies across various models, tasks, scaling, and scenarios. By shifting the burden of computation from inference to training, the distilled models preserve the efficiency of one agent while exhibiting the strong reasoning and self-correction performance of multiple agents.
arXiv Detail & Related papers (2026-02-03T19:18:28Z)
Simple Agents Outperform Experts in Biomedical Imaging Workflow Optimization [69.36509281190662]
Adapting production-level computer vision tools to bespoke scientific datasets is a critical "last mile" bottleneck. We consider using AI agents to automate this manual coding, and focus on the open question of optimal agent design. We demonstrate that a simple agent framework consistently generates adaptation code that outperforms human-expert solutions.
arXiv Detail & Related papers (2025-12-02T18:42:26Z)
Alita-G: Self-Evolving Generative Agent for Agent Generation [54.49365835457433]
We present ALITA-G, a framework that transforms a general-purpose agent into a domain expert. In this framework, a generalist agent executes a curated suite of target-domain tasks. It attains strong gains while reducing computation costs.
arXiv Detail & Related papers (2025-10-27T17:59:14Z)
Are Agents Just Automata? On the Formal Equivalence Between Agentic AI and the Chomsky Hierarchy [4.245979127318219]
This paper establishes a formal equivalence between the architectural classes of modern agentic AI systems and the abstract machines of the Chomsky hierarchy. We demonstrate that simple reflex agents are equivalent to Finite Automata, hierarchical task-decomposition agents are equivalent to Pushdown Automata, and agents employing readable/writable memory for reflection are equivalent to Turing Machines (TMs).
arXiv Detail & Related papers (2025-10-27T16:22:02Z)
InfiAgent: Self-Evolving Pyramid Agent Framework for Infinite Scenarios [28.65914611521654]
InfiAgent is a Pyramid-like DAG-based Multi-Agent Framework that can be applied to infinite scenarios. InfiAgent achieves 9.9% higher performance compared to ADAS (a similar auto-generated agent framework).
arXiv Detail & Related papers (2025-09-26T15:44:09Z)
$Agent^2$: An Agent-Generates-Agent Framework for Reinforcement Learning Automation [5.325886106098561]
Reinforcement learning (RL) agent development traditionally requires substantial expertise and iterative effort. This paper introduces $Agent^2$, an LLM-driven agent-generates-agent framework for fully automated RL agent design. $Agent^2$ translates natural language task descriptions and environment code into executable RL solutions without human intervention.
arXiv Detail & Related papers (2025-09-16T02:14:39Z)
Agentic Predictor: Performance Prediction for Agentic Workflows via Multi-View Encoding [56.565200973244146]
Agentic Predictor is a lightweight predictor for efficient agentic workflow evaluation. By learning to approximate task success rates, Agentic Predictor enables fast and accurate selection of optimal agentic workflow configurations.
arXiv Detail & Related papers (2025-05-26T09:46:50Z)
A Multi-AI Agent System for Autonomous Optimization of Agentic AI Solutions via Iterative Refinement and LLM-Driven Feedback Loops [3.729242965449096]
This paper introduces a framework for autonomously optimizing Agentic AI solutions across industries. The framework achieves optimal performance without human input by autonomously generating and testing hypotheses. Case studies show significant improvements in output quality, relevance, and actionability.
arXiv Detail & Related papers (2024-12-22T20:08:04Z)
Agent-as-a-Judge: Evaluate Agents with Agents [61.33974108405561]
We introduce the Agent-as-a-Judge framework, wherein agentic systems are used to evaluate agentic systems. This is an organic extension of the LLM-as-a-Judge framework, incorporating agentic features that enable intermediate feedback for the entire task-solving process. We present DevAI, a new benchmark of 55 realistic automated AI development tasks.
arXiv Detail & Related papers (2024-10-14T17:57:02Z)
Gödel Agent: A Self-Referential Agent Framework for Recursive Self-Improvement [112.04307762405669]
Gödel Agent is a self-evolving framework inspired by the Gödel machine. Gödel Agent can achieve continuous self-improvement, surpassing manually crafted agents in performance, efficiency, and generalizability.
arXiv Detail & Related papers (2024-10-06T10:49:40Z)
