Tree-Based Representation and Generation of Natural and Mathematical
Language
- URL: http://arxiv.org/abs/2302.07974v1
- Date: Wed, 15 Feb 2023 22:38:34 GMT
- Title: Tree-Based Representation and Generation of Natural and Mathematical
Language
- Authors: Alexander Scarlatos and Andrew Lan
- Abstract summary: Mathematical language in scientific communications and educational scenarios is important yet relatively understudied.
Recent works on mathematical language focus either on representing stand-alone mathematical expressions, or mathematical reasoning in pre-trained natural language models.
We propose a series of modifications to existing language models to jointly represent and generate text and math.
- Score: 77.34726150561087
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mathematical language in scientific communications and educational scenarios
is important yet relatively understudied compared to natural languages. Recent
works on mathematical language focus either on representing stand-alone
mathematical expressions, especially in their natural tree format, or
mathematical reasoning in pre-trained natural language models. Existing works
on jointly modeling and generating natural and mathematical languages simply
treat mathematical expressions as text, without accounting for the rigid
structural properties of mathematical expressions. In this paper, we propose a
series of modifications to existing language models to jointly represent and
generate text and math: representing mathematical expressions as sequences of
node tokens in their operator tree format, using math symbol and tree position
embeddings to preserve the semantic and structural properties of mathematical
expressions, and using a constrained decoding method to generate mathematically
valid expressions. We ground our modifications in GPT-2, resulting in a model
MathGPT, and demonstrate that it outperforms baselines on mathematical
expression generation tasks.
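The abstract's core ideas, serializing an expression's operator tree into a depth-first sequence of node tokens with tree positions, and constraining decoding so only arity-consistent sequences count as complete, can be sketched as follows. This is an illustrative assumption-laden sketch, not the authors' MathGPT implementation: the `Node` class, the `serialize` traversal, the path-of-child-indices position encoding, and the `prefix_is_valid` check are all hypothetical names invented here.

```python
# Sketch: operator-tree serialization and an arity-based validity check,
# in the spirit of the paper's approach (all names are illustrative).

OP_ARITY = {"+": 2, "*": 2, "-": 2, "/": 2}  # binary operators for this sketch


class Node:
    def __init__(self, token, children=()):
        self.token = token
        self.children = list(children)


def serialize(node, pos=()):
    """Depth-first traversal: yields (token, tree_position) pairs, where the
    tree position is the path of child indices from the root (root = ())."""
    yield node.token, pos
    for i, child in enumerate(node.children):
        yield from serialize(child, pos + (i,))


def prefix_is_valid(tokens):
    """Constrained-decoding style check: a pre-order token sequence is a
    complete expression iff every operator's arity is exactly satisfied."""
    need = 1  # open slots still to fill, starting with the root
    for t in tokens:
        if need == 0:
            return False  # tokens continue past a complete tree
        need = need - 1 + OP_ARITY.get(t, 0)
    return need == 0


# (a + b) * 2 as an operator tree
tree = Node("*", [Node("+", [Node("a"), Node("b")]), Node("2")])
seq = list(serialize(tree))
print(seq)
# [('*', ()), ('+', (0,)), ('a', (0, 0)), ('b', (0, 1)), ('2', (1,))]
print(prefix_is_valid([t for t, _ in seq]))  # True
```

A decoder could use a check like `prefix_is_valid` to mask out tokens that can never extend the current prefix into a valid expression, which is one simple way to realize the "mathematically valid expressions" constraint the abstract describes.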
Related papers
- Large Language Models for Mathematicians [53.27302720305432]
Large language models (LLMs) have received immense interest for their general-purpose language understanding and, in particular, their ability to generate high-quality text or computer code.
In this note, we discuss to what extent they can aid professional mathematicians.
arXiv Detail & Related papers (2023-12-07T18:59:29Z)
- Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning [84.12154024070024]
We propose natural language embedded programs (NLEP) as a unifying framework for addressing math/symbolic reasoning, natural language understanding, and instruction following tasks.
Our approach prompts a language model to generate full Python programs that define functions over data structures which contain natural language representations of structured knowledge.
A Python interpreter then executes the generated code and prints the output.
arXiv Detail & Related papers (2023-09-19T17:54:21Z)
- OntoMath${}^{\mathbf{PRO}}$ 2.0 Ontology: Updates of the Formal Model [68.8204255655161]
The main attention is paid to the development of a formal model for the representation of mathematical statements in the Open Linked Data cloud.
The proposed model is intended for applications that extract mathematical facts from natural language mathematical texts and represent these facts as Linked Open Data.
The model is used in the development of a new version of the OntoMath${}^{\mathrm{PRO}}$ ontology of professional mathematics.
arXiv Detail & Related papers (2023-03-17T20:29:17Z)
- Semantic Representations of Mathematical Expressions in a Continuous Vector Space [0.0]
This work describes an approach for representing mathematical expressions in a continuous vector space.
We use the encoder of a sequence-to-sequence architecture, trained on visually different but mathematically equivalent expressions, to generate vector representations.
arXiv Detail & Related papers (2022-10-08T22:33:39Z)
- NaturalProver: Grounded Mathematical Proof Generation with Language Models [84.2064569475095]
Theorem proving in natural mathematical language plays a central role in mathematical advances and education.
We develop NaturalProver, a language model that generates proofs by conditioning on background references.
NaturalProver is capable of proving some theorems that require short (2-6 step) proofs, and providing next-step suggestions that are rated as correct and useful over 40% of the time.
arXiv Detail & Related papers (2022-05-25T17:01:18Z)
- Polynomial Graph Parsing with Non-Structural Reentrancies [0.2867517731896504]
Graph-based semantic representations are valuable in natural language processing.
We introduce graph extension grammar, which generates graphs with non-structural reentrancies.
We provide a parsing algorithm for graph extension grammars, which is proved to be correct and to run in polynomial time.
arXiv Detail & Related papers (2021-05-05T13:05:01Z)
- NaturalProofs: Mathematical Theorem Proving in Natural Language [132.99913141409968]
We develop NaturalProofs, a multi-domain corpus of mathematical statements and their proofs.
NaturalProofs unifies broad coverage, deep coverage, and low-resource mathematical sources.
We benchmark strong neural methods on mathematical reference retrieval and generation tasks.
arXiv Detail & Related papers (2021-03-24T03:14:48Z)
- Natural Language Premise Selection: Finding Supporting Statements for Mathematical Text [3.42658286826597]
We propose a new NLP task, natural language premise selection, which aims to retrieve the supporting definitions and propositions for a given mathematical statement.
We also make available a dataset, NL-PS, which can be used to evaluate different approaches for the natural premise selection task.
arXiv Detail & Related papers (2020-04-30T17:08:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.