Mathematical Data Science
- URL: http://arxiv.org/abs/2502.08620v1
- Date: Wed, 12 Feb 2025 18:15:35 GMT
- Title: Mathematical Data Science
- Authors: Michael R. Douglas, Kyu-Hwan Lee
- Abstract summary: We discuss an approach to doing this which one can call "mathematical data science". In this paradigm, one studies mathematical objects collectively rather than individually, by creating datasets and doing machine learning experiments and interpretations.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Can machine learning help discover new mathematical structures? In this article we discuss an approach to doing this which one can call "mathematical data science". In this paradigm, one studies mathematical objects collectively rather than individually, by creating datasets and doing machine learning experiments and interpretations. After an overview, we present two case studies: murmurations in number theory and loadings of partitions related to Kronecker coefficients in representation theory and combinatorics.
Related papers
- Machine Learning meets Algebraic Combinatorics: A Suite of Datasets Capturing Research-level Conjecturing Ability in Pure Mathematics [4.229995708813431]
We introduce a new collection of datasets, the Algebraic Combinatorics dataset Repository (ACD Repo).
Each dataset includes an open-ended research-level question and a large collection of examples.
We describe all nine datasets and the different ways machine learning models can be applied to them.
arXiv Detail & Related papers (2025-03-09T00:11:40Z)
- MAMUT: A Novel Framework for Modifying Mathematical Formulas for the Generation of Specialized Datasets for Language Model Training [7.164697875838552]
This study focuses on the development of specialized training datasets to enhance the encoding of mathematical content.
We introduce Math Mutator (MAMUT), a framework capable of generating equivalent and falsified versions of a given mathematical formula in notation.
Experiments show that models trained on these datasets exhibit new SoTA performance on mathematical retrieval tasks.
arXiv Detail & Related papers (2025-02-28T08:53:42Z)
- Data for Mathematical Copilots: Better Ways of Presenting Proofs for Machine Learning [85.635988711588]
We argue that enhancing the capabilities of large language models requires a paradigm shift in the design of mathematical datasets.
We advocate for mathematical dataset developers to consider the concept of "motivated proof", introduced by G. Pólya in 1949, which can serve as a blueprint for datasets that offer a better proof learning signal.
We provide a questionnaire designed specifically for math datasets that we urge creators to include with their datasets.
arXiv Detail & Related papers (2024-12-19T18:55:17Z)
- Review and Prospect of Algebraic Research in Equivalent Framework between Statistical Mechanics and Machine Learning Theory [0.0]
This paper is devoted to the memory of Professor Huzihiro Araki, a pioneering founder of algebraic research in both statistical mechanics and quantum field theory.
arXiv Detail & Related papers (2024-05-31T11:04:13Z)
- Machine learning and information theory concepts towards an AI Mathematician [77.63761356203105]
The current state-of-the-art in artificial intelligence is impressive, especially in terms of mastery of language, but not so much in terms of mathematical reasoning.
This essay builds on the idea that current deep learning mostly succeeds at system 1 abilities.
It takes an information-theoretical posture to ask questions about what constitutes an interesting mathematical statement.
arXiv Detail & Related papers (2024-03-07T15:12:06Z)
- OntoMath${}^{\mathbf{PRO}}$ 2.0 Ontology: Updates of the Formal Model [68.8204255655161]
The main attention is paid to the development of a formal model for the representation of mathematical statements in the Open Linked Data cloud.
The proposed model is intended for applications that extract mathematical facts from natural language mathematical texts and represent these facts as Linked Open Data.
The model is used in the development of a new version of the OntoMath${}^{\mathrm{PRO}}$ ontology of professional mathematics.
arXiv Detail & Related papers (2023-03-17T20:29:17Z)
- How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding [56.222097640468306]
We provide a mechanistic understanding of how transformers learn "semantic structure".
We show, through a combination of mathematical analysis and experiments on Wikipedia data, that the embedding layer and the self-attention layer encode the topical structure.
arXiv Detail & Related papers (2023-03-07T21:42:17Z)
- Tree-Based Representation and Generation of Natural and Mathematical Language [77.34726150561087]
Mathematical language in scientific communications and educational scenarios is important yet relatively understudied.
Recent works on mathematical language focus either on representing stand-alone mathematical expressions, or mathematical reasoning in pre-trained natural language models.
We propose a series of modifications to existing language models to jointly represent and generate text and math.
arXiv Detail & Related papers (2023-02-15T22:38:34Z)
- A Survey of Deep Learning for Mathematical Reasoning [71.88150173381153]
We review the key tasks, datasets, and methods at the intersection of mathematical reasoning and deep learning over the past decade.
Recent advances in large-scale neural language models have opened up new benchmarks and opportunities to use deep learning for mathematical reasoning.
arXiv Detail & Related papers (2022-12-20T18:46:16Z)
- Self-Supervised Pretraining of Graph Neural Network for the Retrieval of Related Mathematical Expressions in Scientific Articles [8.942112181408156]
We propose a new approach for retrieval of mathematical expressions based on machine learning.
We design an unsupervised representation learning task that combines embedding learning with self-supervised learning.
We collect a huge dataset with over 29 million mathematical expressions from over 900,000 publications published on arXiv.org.
arXiv Detail & Related papers (2022-08-22T12:11:30Z)
- Machine-Learning Mathematical Structures [0.0]
We present a comparative study of the accuracies on different problems.
The paradigm should be useful for conjecture formulation, finding more efficient methods of computation, as well as probing into a certain hierarchy of structures in mathematics.
arXiv Detail & Related papers (2021-01-15T22:48:19Z)
- Noisy Deductive Reasoning: How Humans Construct Math, and How Math Constructs Universes [0.5874142059884521]
We present a computational model of mathematical reasoning according to which mathematics is a fundamentally stochastic process.
We show that this framework gives a compelling account of several aspects of mathematical practice.
arXiv Detail & Related papers (2020-10-28T19:43:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.