Related papers: Using Large Language Models to Study Mathematical Practice

Using Large Language Models to Study Mathematical Practice

URL: http://arxiv.org/abs/2507.02873v1
Date: Mon, 16 Jun 2025 20:22:50 GMT
Title: Using Large Language Models to Study Mathematical Practice
Authors: William D'Alessandro,
Abstract summary: This paper reports the results of a corpus study facilitated by Google's Gemini 2.5 Pro.<n>Based on a sample of 5000 mathematics papers from arXiv.org, the experiments yielded a dataset of hundreds of useful annotated examples.<n>It also seeks to begin a conversation about these methods as research tools in practice-oriented philosophy.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The philosophy of mathematical practice (PMP) looks to evidence from working mathematics to help settle philosophical questions. One prominent program under the PMP banner is the study of explanation in mathematics, which aims to understand what sorts of proofs mathematicians consider explanatory and what role the pursuit of explanation plays in mathematical practice. In an effort to address worries about cherry-picked examples and file-drawer problems in PMP, a handful of authors have recently turned to corpus analysis methods as a promising alternative to small-scale case studies. This paper reports the results from such a corpus study facilitated by Google's Gemini 2.5 Pro, a model whose reasoning capabilities, advances in hallucination control and large context window allow for the accurate analysis of hundreds of pages of text per query. Based on a sample of 5000 mathematics papers from arXiv.org, the experiments yielded a dataset of hundreds of useful annotated examples. Its aim was to gain insight on questions like the following: How often do mathematicians make claims about explanation in the relevant sense? Do mathematicians' explanatory practices vary in any noticeable way by subject matter? Which philosophical theories of explanation are most consistent with a large body of non-cherry-picked examples? How might philosophers make further use of AI tools to gain insights from large datasets of this kind? As the first PMP study making extensive use of LLM methods, it also seeks to begin a conversation about these methods as research tools in practice-oriented philosophy and to evaluate the strengths and weaknesses of current models for such work.

Related papers

Hypothesis-Driven Theory-of-Mind Reasoning for Large Language Models [76.6028674686018]
We introduce thought-tracing, an inference-time reasoning algorithm to trace the mental states of agents.<n>Our algorithm is modeled after the Bayesian theory-of-mind framework.<n>We evaluate thought-tracing on diverse theory-of-mind benchmarks, demonstrating significant performance improvements.
arXiv Detail & Related papers (2025-02-17T15:08:50Z)
One Example Shown, Many Concepts Known! Counterexample-Driven Conceptual Reasoning in Mathematical LLMs [57.48325300739872]
Leveraging mathematical Large Language Models for proof generation is a fundamental topic in LLMs research.<n>We argue that the ability of current LLMs to prove statements largely depends on whether they have encountered the relevant proof process during training.<n>Inspired by the pedagogical method of "proof by counterexamples" commonly used in human mathematics education, our work aims to enhance LLMs' ability to conduct mathematical reasoning and proof through counterexamples.
arXiv Detail & Related papers (2025-02-12T02:01:10Z)
Learning Formal Mathematics From Intrinsic Motivation [34.986025832497255]
Minimo is an agent that learns to pose problems for itself (conjecturing) and solve them (theorem proving) We combine methods for constrained decoding and type-directed synthesis to sample valid conjectures from a language model. Our agent targets generating hard but provable conjectures - a moving target, since its own theorem proving ability also improves as it trains.
arXiv Detail & Related papers (2024-06-30T13:34:54Z)
Machine learning and information theory concepts towards an AI Mathematician [77.63761356203105]
The current state-of-the-art in artificial intelligence is impressive, especially in terms of mastery of language, but not so much in terms of mathematical reasoning. This essay builds on the idea that current deep learning mostly succeeds at system 1 abilities. It takes an information-theoretical posture to ask questions about what constitutes an interesting mathematical statement.
arXiv Detail & Related papers (2024-03-07T15:12:06Z)
MATHSENSEI: A Tool-Augmented Large Language Model for Mathematical Reasoning [2.9104279358536647]
We present MathSensei, a tool-augmented large language model for mathematical reasoning. We study the complementary benefits of the tools - knowledge retriever (Bing Web Search), program generator + executor (Python), and symbolic equation solver (Wolfram-Alpha API)
arXiv Detail & Related papers (2024-02-27T05:50:35Z)
math-PVS: A Large Language Model Framework to Map Scientific Publications to PVS Theories [10.416375584563728]
This work investigates the applicability of large language models (LLMs) in formalizing advanced mathematical concepts. We envision an automated process, called emphmath-PVS, to extract and formalize mathematical theorems from research papers.
arXiv Detail & Related papers (2023-10-25T23:54:04Z)
Towards a Mechanistic Interpretation of Multi-Step Reasoning Capabilities of Language Models [107.07851578154242]
Language models (LMs) have strong multi-step (i.e., procedural) reasoning capabilities. It is unclear whether LMs perform tasks by cheating with answers memorized from pretraining corpus, or, via a multi-step reasoning mechanism. We show that MechanisticProbe is able to detect the information of the reasoning tree from the model's attentions for most examples.
arXiv Detail & Related papers (2023-10-23T01:47:29Z)
Towards a Holistic Understanding of Mathematical Questions with Contrastive Pre-training [65.10741459705739]
We propose a novel contrastive pre-training approach for mathematical question representations, namely QuesCo. We first design two-level question augmentations, including content-level and structure-level, which generate literally diverse question pairs with similar purposes. Then, to fully exploit hierarchical information of knowledge concepts, we propose a knowledge hierarchy-aware rank strategy.
arXiv Detail & Related papers (2023-01-18T14:23:29Z)
JiuZhang: A Chinese Pre-trained Language Model for Mathematical Problem Understanding [74.12405417718054]
This paper aims to advance the mathematical intelligence of machines by presenting the first Chinese mathematical pre-trained language model(PLM) Unlike other standard NLP tasks, mathematical texts are difficult to understand, since they involve mathematical terminology, symbols and formulas in the problem statement. We design a novel curriculum pre-training approach for improving the learning of mathematical PLMs, consisting of both basic and advanced courses.
arXiv Detail & Related papers (2022-06-13T17:03:52Z)
Noisy Deductive Reasoning: How Humans Construct Math, and How Math Constructs Universes [0.5874142059884521]
We present a computational model of mathematical reasoning according to which mathematics is a fundamentally process. We show that this framework gives a compelling account of several aspects of mathematical practice.
arXiv Detail & Related papers (2020-10-28T19:43:14Z)

This list is automatically generated from the titles and abstracts of the papers in this site.