Extracting Mathematical Concepts with Large Language Models
- URL: http://arxiv.org/abs/2309.00642v1
- Date: Tue, 29 Aug 2023 20:54:50 GMT
- Title: Extracting Mathematical Concepts with Large Language Models
- Authors: Valeria de Paiva, Qiyue Gao, Pavel Kovalev, and Lawrence S. Moss
- Abstract summary: We aim for automatic extraction of terms in one mathematical field, category theory, using as a corpus the 755 abstracts from a snapshot of the online journal "Theory and Applications of Categories", circa 2020.
We provide a more thorough analysis of what makes mathematical term extraction a difficult problem to begin with.
We introduce a new annotation tool to help humans with ATE, applicable to any mathematical field and even beyond mathematics.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We extract mathematical concepts from mathematical text using generative
large language models (LLMs) like ChatGPT, contributing to the field of
automatic term extraction (ATE) and mathematical text processing, and also to
the study of LLMs themselves. Our work builds on that of others in that we aim
for automatic extraction of terms (keywords) in one mathematical field,
category theory, using as a corpus the 755 abstracts from a snapshot of the
online journal "Theory and Applications of Categories", circa 2020. Where our
study diverges from previous work is in (1) providing a more thorough analysis
of what makes mathematical term extraction a difficult problem to begin with;
(2) paying close attention to inter-annotator disagreements; (3) providing a
set of guidelines which both human and machine annotators could use to
standardize the extraction process; (4) introducing a new annotation tool to
help humans with ATE, applicable to any mathematical field and even beyond
mathematics; (5) using prompts to ChatGPT as part of the extraction process,
and proposing best practices for such prompts; and (6) raising the question of
whether ChatGPT could be used as an annotator on the same level as human
experts. Our overall finding is that mathematical ATE is an interesting problem which can benefit from participation by LLMs, but LLMs themselves cannot at this time surpass human performance on it.
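As a concrete illustration of items (5) and (6), the sketch below shows how prompting an LLM for term extraction and scoring its output against a human annotator might look. It is a minimal sketch under stated assumptions: the prompt wording, the model name, and the Jaccard-overlap agreement measure are illustrative choices, not the authors' exact protocol.

```python
# A minimal sketch, assuming the OpenAI Python client (openai>=1.0); the prompt
# wording, model choice, and agreement measure are illustrative, not the
# authors' exact setup.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def extract_terms(abstract: str) -> set[str]:
    """Ask the model for category-theory terms, one per line."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # stand-in for the ChatGPT version used in the paper
        temperature=0,        # deterministic output aids reproducibility
        messages=[
            {"role": "system",
             "content": "You extract technical terms from mathematics abstracts."},
            {"role": "user",
             "content": ("List the category-theory terms (keywords) in the "
                         "following abstract, one per line, with no other text:\n\n"
                         + abstract)},
        ],
    )
    lines = response.choices[0].message.content.splitlines()
    return {ln.strip().lower() for ln in lines if ln.strip()}

def jaccard(a: set[str], b: set[str]) -> float:
    """Set overlap: one simple way to quantify annotator (dis)agreement."""
    return len(a & b) / len(a | b) if (a | b) else 1.0
```

Comparing `extract_terms(abstract)` against each human annotator's term set with `jaccard` gives one crude handle on question (6), whether the model annotates on the same level as human experts.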
Related papers
- ReasonAgain: Using Extractable Symbolic Programs to Evaluate Mathematical Reasoning (arXiv, 2024-10-24)
Existing math datasets evaluate the reasoning abilities of large language models (LLMs) using either the final answer or the intermediate reasoning steps derived from static examples.
We seek to use symbolic programs for automated evaluation, checking whether a model can consistently produce correct final answers across various inputs to the program.
We observe significant accuracy drops using our proposed evaluation compared with original static examples, suggesting the fragility of math reasoning in state-of-the-art LLMs.
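The core idea, as the summary describes it, is that a question template plus a program for the ground truth lets the same problem be re-asked with perturbed inputs. A hypothetical toy version (the template, number ranges, and harness below are invented for illustration):

```python
# Hypothetical toy version of symbolic-program evaluation: the question is a
# template over (a, b) and a Python expression supplies the ground truth, so
# the model can be re-tested on fresh inputs to the same program.
import random

def make_problem(a: int, b: int) -> tuple[str, int]:
    question = (f"A shelf holds {a} books. After {b} more books are added, "
                f"how many books are on the shelf?")
    return question, a + b  # symbolic ground truth for any (a, b)

def consistency(model_answer, trials: int = 20) -> float:
    """Fraction of perturbed instances answered correctly.
    `model_answer(question: str) -> int` wraps whatever LLM is under test."""
    correct = 0
    for _ in range(trials):
        question, truth = make_problem(random.randint(2, 99), random.randint(2, 99))
        correct += int(model_answer(question) == truth)
    return correct / trials
```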
- MIND: Math Informed syNthetic Dialogues for Pretraining LLMs (arXiv, 2024-10-15)
We propose a novel large-scale and diverse Math Informed syNthetic Dialogue (MIND) generation method.
MIND generates synthetic conversations based on OpenWebMath (OWM), resulting in a new math corpus, MIND-OWM.
Our experiments with different conversational settings reveal that incorporating knowledge gaps between dialog participants is essential for generating high-quality math data.
- MathHay: An Automated Benchmark for Long-Context Mathematical Reasoning in LLMs (arXiv, 2024-10-07)
MathHay is an automated benchmark designed to assess the long-context mathematical reasoning capabilities of LLMs.
We conduct extensive experiments on MathHay to assess the long-context mathematical reasoning abilities of eight top-performing models.
- Mathematical Entities: Corpora and Benchmarks (arXiv, 2024-06-17)
There has been relatively little research on natural language processing for mathematical texts.
We provide annotated corpora that can be used to study the language of mathematics in different contexts.
- MATHSENSEI: A Tool-Augmented Large Language Model for Mathematical Reasoning (arXiv, 2024-02-27)
We present MathSensei, a tool-augmented large language model for mathematical reasoning.
We study the complementary benefits of the tools: a knowledge retriever (Bing Web Search), a program generator and executor (Python), and a symbolic equation solver (Wolfram-Alpha API).
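To make the division of labor concrete, here is a rough sketch of the tool-routing pattern that entry describes; the keyword-based routing heuristic and the tool callables are placeholders, not MathSensei's actual interface.

```python
# Rough sketch of tool routing, with placeholder callables standing in for
# Bing Web Search and the Wolfram-Alpha API; not MathSensei's actual code.
from typing import Callable

def run_python(code: str) -> str:
    """Program generator + executor leg: run model-written code, read `answer`."""
    scope: dict = {}
    exec(code, scope)  # a real system would sandbox this
    return str(scope.get("answer"))

def solve(query: str,
          retrieve: Callable[[str], str],   # knowledge retriever (web search)
          symbolic: Callable[[str], str]) -> str:
    """Toy router: pick a tool by keyword; real systems let the LLM decide."""
    if any(k in query for k in ("integrate", "solve", "simplify")):
        return symbolic(query)                # symbolic equation solver
    if any(k in query for k in ("who", "when", "what year")):
        return retrieve(query)                # factual lookup
    return run_python(f"answer = {query}")    # plain arithmetic expressions
```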
- Exploring the Potential of Large Language Models in Computational Argumentation (arXiv, 2023-11-15)
Large language models (LLMs) have demonstrated impressive capabilities in understanding context and generating natural language.
This work assesses LLMs such as ChatGPT, Flan models, and LLaMA2 models in both zero-shot and few-shot settings.
- WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct (arXiv, 2023-08-18)
We present WizardMath, which enhances the mathematical reasoning abilities of Llama-2 by applying our proposed Reinforcement Learning from Evol-Instruct Feedback (RLEIF) method to the domain of math.
Our model surpasses ChatGPT-3.5, Claude Instant-1, PaLM-2, and Minerva on GSM8k, and simultaneously surpasses Text-davinci, PaLM-1, and GPT-3 on MATH.
- Evaluating Language Models for Mathematics through Interactions (arXiv, 2023-06-02)
We introduce CheckMate, a prototype platform for humans to interact with and evaluate large language models (LLMs).
We conduct a study with CheckMate to evaluate three language models (InstructGPT, ChatGPT, and GPT-4) as assistants in proving undergraduate-level mathematics.
We derive a taxonomy of human behaviours and uncover that despite a generally positive correlation, there are notable instances of divergence between correctness and perceived helpfulness.
- JiuZhang: A Chinese Pre-trained Language Model for Mathematical Problem Understanding (arXiv, 2022-06-13)
This paper aims to advance the mathematical intelligence of machines by presenting the first Chinese mathematical pre-trained language model (PLM).
Unlike other standard NLP tasks, mathematical texts are difficult to understand, since they involve mathematical terminology, symbols and formulas in the problem statement.
We design a novel curriculum pre-training approach for improving the learning of mathematical PLMs, consisting of both basic and advanced courses.
- Learning to Match Mathematical Statements with Proofs (arXiv, 2021-02-03)
The task is designed to improve the processing of research-level mathematical texts.
We release a dataset for the task, consisting of over 180k statement-proof pairs.
We show that considering the assignment problem globally and using weighted bipartite matching algorithms substantially improves performance on the task.
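A minimal sketch of that global assignment step, assuming a pairwise scoring function; the learned similarity model is abstracted away here.

```python
# Minimal sketch of global statement-proof matching via the Hungarian
# algorithm; `score` stands in for whatever learned pairwise model is used.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match(statements: list[str], proofs: list[str],
          score) -> list[tuple[int, int]]:
    """Return (statement_index, proof_index) pairs maximizing total score.
    `score(statement, proof) -> float`, higher meaning a likelier pair."""
    cost = np.array([[-score(s, p) for p in proofs] for s in statements])
    rows, cols = linear_sum_assignment(cost)  # minimizing -score maximizes score
    return list(zip(rows.tolist(), cols.tolist()))
```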