Tree-Based Representation and Generation of Natural and Mathematical
Language
- URL: http://arxiv.org/abs/2302.07974v1
- Date: Wed, 15 Feb 2023 22:38:34 GMT
- Title: Tree-Based Representation and Generation of Natural and Mathematical
Language
- Authors: Alexander Scarlatos and Andrew Lan
- Abstract summary: Mathematical language in scientific communications and educational scenarios is important yet relatively understudied.
Recent works on mathematical language focus either on representing stand-alone mathematical expressions, or mathematical reasoning in pre-trained natural language models.
We propose a series of modifications to existing language models to jointly represent and generate text and math.
- Score: 77.34726150561087
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mathematical language in scientific communications and educational scenarios
is important yet relatively understudied compared to natural languages. Recent
works on mathematical language focus either on representing stand-alone
mathematical expressions, especially in their natural tree format, or
mathematical reasoning in pre-trained natural language models. Existing works
on jointly modeling and generating natural and mathematical languages simply
treat mathematical expressions as text, without accounting for the rigid
structural properties of mathematical expressions. In this paper, we propose a
series of modifications to existing language models to jointly represent and
generate text and math: representing mathematical expressions as sequences of
node tokens in their operator tree format, using math symbol and tree position
embeddings to preserve the semantic and structural properties of mathematical
expressions, and using a constrained decoding method to generate mathematically
valid expressions. We ground our modifications in GPT-2, resulting in a model
MathGPT, and demonstrate that it outperforms baselines on mathematical
expression generation tasks.
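The abstract's core ideas, serializing an expression's operator tree into a depth-first sequence of node tokens with tree positions, and constraining decoding so only arity-consistent sequences count as complete, can be sketched as follows. This is an illustrative assumption-laden sketch, not the authors' MathGPT implementation: the `Node` class, the `serialize` traversal, the path-of-child-indices position encoding, and the `prefix_is_valid` check are all hypothetical names invented here.

```python
# Sketch: operator-tree serialization and an arity-based validity check,
# in the spirit of the paper's approach (all names are illustrative).

OP_ARITY = {"+": 2, "*": 2, "-": 2, "/": 2}  # binary operators for this sketch


class Node:
    def __init__(self, token, children=()):
        self.token = token
        self.children = list(children)


def serialize(node, pos=()):
    """Depth-first traversal: yields (token, tree_position) pairs, where the
    tree position is the path of child indices from the root (root = ())."""
    yield node.token, pos
    for i, child in enumerate(node.children):
        yield from serialize(child, pos + (i,))


def prefix_is_valid(tokens):
    """Constrained-decoding style check: a pre-order token sequence is a
    complete expression iff every operator's arity is exactly satisfied."""
    need = 1  # open slots still to fill, starting with the root
    for t in tokens:
        if need == 0:
            return False  # tokens continue past a complete tree
        need = need - 1 + OP_ARITY.get(t, 0)
    return need == 0


# (a + b) * 2 as an operator tree
tree = Node("*", [Node("+", [Node("a"), Node("b")]), Node("2")])
seq = list(serialize(tree))
print(seq)
# [('*', ()), ('+', (0,)), ('a', (0, 0)), ('b', (0, 1)), ('2', (1,))]
print(prefix_is_valid([t for t, _ in seq]))  # True
```

A decoder could use a check like `prefix_is_valid` to mask out tokens that can never extend the current prefix into a valid expression, which is one simple way to realize the "mathematically valid expressions" constraint the abstract describes.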
Related papers
- Large Language Models for Mathematicians [53.27302720305432]
Large language models (LLMs) have received immense interest for their general-purpose language understanding and, in particular, their ability to generate high-quality text or computer code.
In this note, we discuss to what extent they can aid professional mathematicians.
arXiv Detail & Related papers (2023-12-07T18:59:29Z)
- Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning [84.12154024070024]
We propose natural language embedded programs (NLEP) as a unifying framework for addressing math/symbolic reasoning, natural language understanding, and instruction following tasks.
Our approach prompts a language model to generate full Python programs that define functions over data structures which contain natural language representations of structured knowledge.
A Python interpreter then executes the generated code and prints the output.
arXiv Detail & Related papers (2023-09-19T17:54:21Z)
- OntoMath${}^{\mathbf{PRO}}$ 2.0 Ontology: Updates of the Formal Model [68.8204255655161]
The main attention is paid to the development of a formal model for the representation of mathematical statements in the Open Linked Data cloud.
The proposed model is intended for applications that extract mathematical facts from natural language mathematical texts and represent these facts as Linked Open Data.
The model is used in the development of a new version of the OntoMath${}^{\mathrm{PRO}}$ ontology of professional mathematics.
arXiv Detail & Related papers (2023-03-17T20:29:17Z)
- Semantic Representations of Mathematical Expressions in a Continuous Vector Space [0.0]
This work describes an approach for representing mathematical expressions in a continuous vector space.
We use the encoder of a sequence-to-sequence architecture, trained on visually different but mathematically equivalent expressions, to generate vector representations.
arXiv Detail & Related papers (2022-10-08T22:33:39Z)
- NaturalProver: Grounded Mathematical Proof Generation with Language Models [84.2064569475095]
Theorem proving in natural mathematical language plays a central role in mathematical advances and education.
We develop NaturalProver, a language model that generates proofs by conditioning on background references.
NaturalProver is capable of proving some theorems that require short (2-6 step) proofs, and providing next-step suggestions that are rated as correct and useful over 40% of the time.
arXiv Detail & Related papers (2022-05-25T17:01:18Z)
- Polynomial Graph Parsing with Non-Structural Reentrancies [0.2867517731896504]
Graph-based semantic representations are valuable in natural language processing.
We introduce graph extension grammar, which generates graphs with non-structural reentrancies.
We provide a parsing algorithm for graph extension grammars, which is proved to be correct and to run in polynomial time.
arXiv Detail & Related papers (2021-05-05T13:05:01Z)
- NaturalProofs: Mathematical Theorem Proving in Natural Language [132.99913141409968]
We develop NaturalProofs, a multi-domain corpus of mathematical statements and their proofs.
NaturalProofs unifies broad coverage, deep coverage, and low-resource mathematical sources.
We benchmark strong neural methods on mathematical reference retrieval and generation tasks.
arXiv Detail & Related papers (2021-03-24T03:14:48Z)
- Natural Language Premise Selection: Finding Supporting Statements for Mathematical Text [3.42658286826597]
We propose a new NLP task, natural language premise selection, which aims to retrieve the supporting definitions and propositions for a given mathematical statement.
We also make available a dataset, NL-PS, which can be used to evaluate different approaches for the natural premise selection task.
arXiv Detail & Related papers (2020-04-30T17:08:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.