Towards Automated Functional Equation Proving: A Benchmark Dataset and A Domain-Specific In-Context Agent
- URL: http://arxiv.org/abs/2407.14521v1
- Date: Fri, 5 Jul 2024 15:59:16 GMT
- Title: Towards Automated Functional Equation Proving: A Benchmark Dataset and A Domain-Specific In-Context Agent
- Authors: Mahdi Buali, Robert Hoehndorf,
- Abstract summary: Automated Theorem Proving (ATP) faces challenges due to its complexity and computational demands.
Recent work has explored using Large Language Models (LLMs) for ATP action selection, but these methods can be resource-intensive.
This study introduces FEAS, an agent that enhances the COPRA in-context learning framework within Lean.
- Score: 1.006303657343407
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automated Theorem Proving (ATP) faces challenges due to its complexity and computational demands. Recent work has explored using Large Language Models (LLMs) for ATP action selection, but these methods can be resource-intensive. This study introduces FEAS, an agent that enhances the COPRA in-context learning framework within Lean. FEAS refines prompt generation, response parsing, and incorporates domain-specific heuristics for functional equations. It introduces FunEq, a curated dataset of functional equation problems with varying difficulty. FEAS outperforms baselines on FunEq, particularly with the integration of domain-specific heuristics. The results demonstrate FEAS's effectiveness in generating and formalizing high-level proof strategies into Lean proofs, showcasing the potential of tailored approaches for specific ATP challenges.
Related papers
- PEA: Enhancing LLM Performance on Computational-Reasoning Tasks [21.13926189404758]
This study introduces a formal approach to describe and solve a class of important reasoning tasks termed computational reasoning problems.
The framework decomposes these problems into predicate and enumeration components, using LLMs to synthesize programs based on specified predicates, enumeration, and aggregation rules.
Empirical evaluation reveals that PEA substantially enhances the performance of underlying models on benchmark computational problems, yielding an average accuracy improvement of approximately $50%$, coupled with increased efficiency.
arXiv Detail & Related papers (2025-02-16T00:27:05Z) - Learning Task Representations from In-Context Learning [73.72066284711462]
Large language models (LLMs) have demonstrated remarkable proficiency in in-context learning.
We introduce an automated formulation for encoding task information in ICL prompts as a function of attention heads.
We show that our method's effectiveness stems from aligning the distribution of the last hidden state with that of an optimally performing in-context-learned model.
arXiv Detail & Related papers (2025-02-08T00:16:44Z) - Self-test loss functions for learning weak-form operators and gradient flows [5.9739929525316064]
We introduce self-test loss functions, which employ test functions that depend on the unknown parameters.
The proposed self-test loss function conserves energy gradient flows and coincides with the expected log-likelihood ratio for differential equations.
arXiv Detail & Related papers (2024-12-04T17:48:38Z) - Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks.
We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level.
We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
arXiv Detail & Related papers (2024-02-09T07:45:26Z) - TRIGO: Benchmarking Formal Mathematical Proof Reduction for Generative
Language Models [68.65075559137608]
We propose TRIGO, an ATP benchmark that not only requires a model to reduce a trigonometric expression with step-by-step proofs but also evaluates a generative LM's reasoning ability on formulas.
We gather trigonometric expressions and their reduced forms from the web, annotate the simplification process manually, and translate it into the Lean formal language system.
We develop an automatic generator based on Lean-Gym to create dataset splits of varying difficulties and distributions in order to thoroughly analyze the model's generalization ability.
arXiv Detail & Related papers (2023-10-16T08:42:39Z) - Transport Equation based Physics Informed Neural Network to predict the
Yield Strength of Architected Materials [0.0]
The PINN model showcases exceptional generalization capabilities, indicating its capacity to avoid overfitting with the provided dataset.
The research underscores the importance of striking a balance between performance and computational efficiency while selecting an activation function for specific real-world applications.
arXiv Detail & Related papers (2023-07-29T12:42:03Z) - On the Integration of Physics-Based Machine Learning with Hierarchical
Bayesian Modeling Techniques [0.0]
This paper proposes to embed mechanics-based models into the mean function of a Gaussian Process (GP) model and characterize potential discrepancies through kernel machines.
The stationarity of the kernel function is a difficult hurdle in the sequential processing of long data sets, resolved through hierarchical Bayesian techniques.
Using numerical and experimental examples, potential applications of the proposed method to structural dynamics inverse problems are demonstrated.
arXiv Detail & Related papers (2023-03-01T02:29:41Z) - Offline Reinforcement Learning with Differentiable Function
Approximation is Provably Efficient [65.08966446962845]
offline reinforcement learning, which aims at optimizing decision-making strategies with historical data, has been extensively applied in real-life applications.
We take a step by considering offline reinforcement learning with differentiable function class approximation (DFA)
Most importantly, we show offline differentiable function approximation is provably efficient by analyzing the pessimistic fitted Q-learning algorithm.
arXiv Detail & Related papers (2022-10-03T07:59:42Z) - Scalable Gaussian Processes for Data-Driven Design using Big Data with
Categorical Factors [14.337297795182181]
Gaussian processes (GP) have difficulties in accommodating big datasets, categorical inputs, and multiple responses.
We propose a GP model that utilizes latent variables and functions obtained through variational inference to address the aforementioned challenges simultaneously.
Our approach is demonstrated for machine learning of ternary oxide materials and topology optimization of a multiscale compliant mechanism.
arXiv Detail & Related papers (2021-06-26T02:17:23Z) - Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning [92.05556163518999]
MARL exacerbates matters by imposing various constraints on communication and observability.
For value-based methods, it poses challenges in accurately representing the optimal value function.
For policy gradient methods, it makes training the critic difficult and exacerbates the problem of the lagging critic.
We show that from a learning theory perspective, both problems can be addressed by accurately representing the associated action-value function.
arXiv Detail & Related papers (2021-05-31T23:08:05Z) - Removing Bias in Multi-modal Classifiers: Regularization by Maximizing
Functional Entropies [88.0813215220342]
Some modalities can more easily contribute to the classification results than others.
We develop a method based on the log-Sobolev inequality, which bounds the functional entropy with the functional-Fisher-information.
On the two challenging multi-modal datasets VQA-CPv2 and SocialIQ, we obtain state-of-the-art results while more uniformly exploiting the modalities.
arXiv Detail & Related papers (2020-10-21T07:40:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.