Generating Mathematical Derivations with Large Language Models
- URL: http://arxiv.org/abs/2307.09998v3
- Date: Tue, 8 Aug 2023 12:23:49 GMT
- Title: Generating Mathematical Derivations with Large Language Models
- Authors: Jordan Meadows, Marco Valentino, Andre Freitas
- Abstract summary: We leverage a symbolic engine to generate derivations of equations at scale.
We investigate the capabilities of Large Language Models when deriving goal equations from premises.
- Score: 2.363388546004777
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The derivation of mathematical results in specialised fields, using Large
Language Models (LLMs), is an emerging research direction that can help
identify models' limitations, and potentially support mathematical discovery.
In this paper, we leverage a symbolic engine to generate derivations of
equations at scale, and investigate the capabilities of LLMs when deriving goal
equations from premises. Specifically, we employ in-context learning for GPT
and fine-tune a range of T5 models to compare the robustness and generalisation
of pre-training strategies to specialised models. Empirical results show that
fine-tuned FLAN-T5-large (MathT5) outperforms GPT models on all static and
out-of-distribution test sets in conventional scores. However, an in-depth
analysis reveals that the fine-tuned models are more sensitive to perturbations
involving unseen symbols and (to a lesser extent) changes to equation
structure. In addition, we analyse 1.7K equations and over 200 derivations to
highlight common reasoning errors such as the inclusion of incorrect,
irrelevant, and redundant equations. Finally, we explore the suitability of
existing metrics for evaluating mathematical derivations and find evidence
that, while they can capture general properties such as sensitivity to
perturbations, they fail to highlight fine-grained reasoning errors and
essential differences between models. Overall, this work demonstrates that
training models on synthetic data may improve their mathematical capabilities
beyond those of much larger LLMs, but that current metrics do not appropriately
assess the quality of generated mathematical text.
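The abstract does not specify how the symbolic engine constructs derivations; the following is a minimal sketch, assuming SymPy as the engine, of how premise-goal pairs might be produced by applying symbolic operations step by step. The operations and naming are illustrative, not the paper's actual pipeline.

```python
# A minimal sketch of symbolic derivation generation, assuming SymPy as the
# symbolic engine; the actual generation procedure is not described in the
# abstract, so the operations chosen here are illustrative.
import sympy as sp

x = sp.Symbol("x")
f = sp.Function("f")

# Premise: an equation relating f(x) to an expression in x.
premise = sp.Eq(f(x), x**2 * sp.sin(x))

# Derivation step 1: differentiate both sides with respect to x.
step1 = sp.Eq(premise.lhs.diff(x), sp.diff(premise.rhs, x))

# Derivation step 2: substitute a specific point into the result.
step2 = sp.Eq(step1.lhs.subs(x, sp.pi), step1.rhs.subs(x, sp.pi))

# Each (premise, goal) pair, rendered as LaTeX, could serve as a training
# or evaluation instance for an LLM.
for i, eq in enumerate([premise, step1, step2]):
    print(f"step {i}: {sp.latex(eq)}")
```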
Related papers
- SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models [54.78329741186446]
We propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation.
Experiments across both in-domain and out-of-domain benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.
arXiv Detail & Related papers (2024-08-28T06:33:03Z)
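The summary above gives no implementation details; as a rough, hypothetical sketch of what execution-based quality control in a question-code pipeline can look like, the snippet below runs a candidate program and keeps it only when its output matches a reference answer. `generate_candidate_program` is a made-up placeholder for an LLM call, not an API from the paper.

```python
# Hypothetical sketch of execution-based verification in the spirit of a
# code-based critic; not the paper's actual pipeline.
import subprocess
import sys

def generate_candidate_program(question: str) -> str:
    # Placeholder: an LLM would return Python code here; we hard-code a
    # toy solution for illustration.
    return "print(2 + 3)"

def critic_accepts(program: str, expected_output: str) -> bool:
    """Run the candidate program and accept it only if its output matches."""
    result = subprocess.run(
        [sys.executable, "-c", program],
        capture_output=True, text=True, timeout=10,
    )
    return result.returncode == 0 and result.stdout.strip() == expected_output

question = "What is 2 + 3?"
program = generate_candidate_program(question)
print("accepted:", critic_accepts(program, "5"))  # accepted: True
```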
- Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z)
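As a purely numerical illustration of the quantities such derivations characterise (not the paper's analytical treatment), the sketch below estimates the training and generalisation error of a ridge estimator under an isotropic Gaussian design.

```python
# Numerical sketch only: train vs. test error of ridge regression under an
# isotropic Gaussian design, in the overparameterised regime d > n.
import numpy as np

rng = np.random.default_rng(0)
n, d, lam, sigma = 200, 400, 1.0, 0.5

beta = rng.normal(size=d) / np.sqrt(d)          # true signal
X = rng.normal(size=(n, d))
y = X @ beta + sigma * rng.normal(size=n)

# Ridge estimator: beta_hat = (X^T X + lam I)^{-1} X^T y
beta_hat = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

train_err = np.mean((y - X @ beta_hat) ** 2)
X_test = rng.normal(size=(5000, d))
y_test = X_test @ beta + sigma * rng.normal(size=5000)
test_err = np.mean((y_test - X_test @ beta_hat) ** 2)
print(f"train MSE: {train_err:.3f}  test MSE: {test_err:.3f}")
```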
- Shape Arithmetic Expressions: Advancing Scientific Discovery Beyond Closed-Form Equations [56.78271181959529]
Generalized Additive Models (GAMs) can capture non-linear relationships between variables and targets, but they cannot capture intricate feature interactions.
We propose Shape Arithmetic Expressions (SHAREs), which fuse GAMs' flexible shape functions with the complex feature interactions found in mathematical expressions.
We also design a set of rules for constructing SHAREs that guarantee transparency of the found expressions beyond the standard constraints.
arXiv Detail & Related papers (2024-04-15T13:44:01Z)
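For context, a plain GAM-style additive fit with spline shape functions is sketched below; this is a generic baseline illustrating what GAMs can and cannot capture, not the SHARE construction itself.

```python
# A plain additive (GAM-style) baseline with spline shape functions, for
# context only; this is NOT the SHARE construction from the paper.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 2))
# Target with a multiplicative interaction an additive model cannot capture.
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=500)

# One spline shape function per feature, combined linearly: f1(x1) + f2(x2).
gam = make_pipeline(SplineTransformer(degree=3, n_knots=8), LinearRegression())
gam.fit(X, y)
print("additive R^2:", round(gam.score(X, y), 3))  # limited by the x1*x2 term
```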
- Wasserstein proximal operators describe score-based generative models and resolve memorization [12.321631823103894]
We first formulate score-based generative models (SGMs) in terms of the Wasserstein proximal operator (WPO).
We show that WPO describes the inductive bias of diffusion and score-based models.
We present an interpretable kernel-based model for the score function which dramatically improves the performance of SGMs.
arXiv Detail & Related papers (2024-02-09T03:33:13Z)
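The summary does not describe the kernel model itself; as a generic illustration of a kernel-based score estimate (not the WPO-derived model from the paper), the sketch below computes the score of a Gaussian kernel density estimate.

```python
# Generic kernel density score estimator grad_x log p_hat(x); illustrative
# only, not the WPO-based model from the paper.
import numpy as np

def kde_score(x, data, h):
    """Score of a Gaussian KDE: grad_x log( mean_i N(x; x_i, h^2) )."""
    diffs = data - x                               # x_i - x for each sample
    w = np.exp(-diffs ** 2 / (2 * h ** 2))         # unnormalised kernel weights
    return (w * diffs).sum() / (h ** 2 * w.sum())  # weighted mean of (x_i - x)/h^2

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=5000)

# For N(2, 1), the true score at x is -(x - 2); the estimate should be close.
for x in [0.0, 2.0, 3.0]:
    print(x, kde_score(x, data, h=0.3), -(x - 2.0))
```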
- On Least Square Estimation in Softmax Gating Mixture of Experts [78.3687645289918]
We investigate the performance of least squares estimators (LSE) under a deterministic mixture-of-experts (MoE) model.
We establish a condition called strong identifiability to characterize the convergence behavior of various types of expert functions.
Our findings have important practical implications for expert selection.
arXiv Detail & Related papers (2024-02-05T12:31:18Z)
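One concrete instance of least squares estimation in such a model: if the gating parameters are held fixed and the experts are linear, the regression function is linear in the expert parameters and the LSE has a closed form. The toy sketch below uses this simplification; the paper analyses the general estimation problem.

```python
# Toy least-squares fit of linear experts in a softmax-gated MoE with the
# gating weights held fixed; with fixed gates the model is linear in the
# expert parameters. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n, K = 1000, 3
x = rng.uniform(-2, 2, size=n)

W_gate = np.array([-1.0, 0.0, 1.0])                 # fixed gating weights
logits = np.outer(x, W_gate)                        # (n, K)
gates = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Ground truth: y = sum_k gate_k(x) * (a_k x + b_k) + noise
a_true, b_true = np.array([2.0, -1.0, 0.5]), np.array([0.0, 1.0, -0.5])
y = (gates * (np.outer(x, a_true) + b_true)).sum(axis=1) + 0.05 * rng.normal(size=n)

# Design matrix: features gate_k(x)*x and gate_k(x) for each expert k.
Phi = np.hstack([gates * x[:, None], gates])        # (n, 2K)
theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print("a_hat:", theta[:K].round(2), "b_hat:", theta[K:].round(2))
```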
- Discovering Interpretable Physical Models using Symbolic Regression and Discrete Exterior Calculus [55.2480439325792]
We propose a framework that combines Symbolic Regression (SR) and Discrete Exterior Calculus (DEC) for the automated discovery of physical models.
DEC provides building blocks for the discrete analogue of field theories, which are beyond the state-of-the-art applications of SR to physical problems.
We demonstrate the effectiveness of our methodology by re-discovering three models of Continuum Physics from synthetic experimental data.
arXiv Detail & Related papers (2023-10-10T13:23:05Z)
- SLEM: Machine Learning for Path Modeling and Causal Inference with Super Learner Equation Modeling [3.988614978933934]
Causal inference is a crucial goal of science, enabling researchers to arrive at meaningful conclusions using observational data.
Path models, Structural Equation Models (SEMs) and Directed Acyclic Graphs (DAGs) provide a means to unambiguously specify assumptions regarding the causal structure underlying a phenomenon.
We propose Super Learner Equation Modeling, a path modeling technique integrating machine learning Super Learner ensembles.
arXiv Detail & Related papers (2023-08-08T16:04:42Z)
- Lorentz group equivariant autoencoders [6.858459233149096]
We develop the Lorentz group autoencoder (LGAE), an autoencoder model equivariant with respect to the proper, orthochronous Lorentz group $\mathrm{SO}^+(3,1)$, with a latent space living in the representations of the group.
We present our architecture and several experimental results on jets at the LHC and find it outperforms graph and convolutional neural network baseline models on several compression, reconstruction, and anomaly detection metrics.
arXiv Detail & Related papers (2022-12-14T17:19:46Z)
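As a quick sanity check of the underlying symmetry (a property check, not the LGAE architecture): a Lorentz boost $\Lambda$ preserves the Minkowski metric, i.e. $\Lambda^\top \eta \Lambda = \eta$, which the sketch below verifies numerically.

```python
# Property check, not the LGAE architecture: a Lorentz boost along z
# preserves the Minkowski metric, i.e. Lambda^T eta Lambda = eta.
import numpy as np

eta = np.diag([1.0, -1.0, -1.0, -1.0])   # Minkowski metric, signature (+,-,-,-)

def boost_z(beta):
    """Boost with velocity beta (in units of c) along the z-axis."""
    gamma = 1.0 / np.sqrt(1.0 - beta ** 2)
    L = np.eye(4)
    L[0, 0] = L[3, 3] = gamma
    L[0, 3] = L[3, 0] = -gamma * beta
    return L

L = boost_z(0.6)
print(np.allclose(L.T @ eta @ L, eta))    # True: the metric is invariant
```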
- Learning Graphical Factor Models with Riemannian Optimization [70.13748170371889]
This paper proposes a flexible algorithmic framework for graph learning under low-rank structural constraints.
The problem is expressed as penalized maximum likelihood estimation of an elliptical distribution.
We leverage geometries of positive definite matrices and positive semi-definite matrices of fixed rank that are well suited to elliptical models.
arXiv Detail & Related papers (2022-10-21T13:19:45Z)
- GAM(e) changer or not? An evaluation of interpretable machine learning models based on additive model constraints [5.783415024516947]
This paper investigates a series of intrinsically interpretable machine learning models.
We evaluate the prediction quality of five GAMs compared with six traditional ML models.
arXiv Detail & Related papers (2022-04-19T20:37:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.