Semantic Representations of Mathematical Expressions in a Continuous
Vector Space
- URL: http://arxiv.org/abs/2211.08142v3
- Date: Sat, 2 Sep 2023 20:35:23 GMT
- Title: Semantic Representations of Mathematical Expressions in a Continuous
Vector Space
- Authors: Neeraj Gangwar, Nickvash Kani
- Abstract summary: This work describes an approach for representing mathematical expressions in a continuous vector space.
We use the encoder of a sequence-to-sequence architecture, trained on visually different but mathematically equivalent expressions, to generate vector representations.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Mathematical notation makes up a large portion of STEM literature, yet
finding semantic representations for formulae remains a challenging problem.
Because mathematical notation is precise, and its meaning changes significantly
with small character shifts, the methods that work for natural text do not
necessarily work well for mathematical expressions. This work describes an
approach for representing mathematical expressions in a continuous vector
space. We use the encoder of a sequence-to-sequence architecture, trained on
visually different but mathematically equivalent expressions, to generate
vector representations (or embeddings). We compare this approach with a
structural approach that considers visual layout to embed an expression and
show that our proposed approach is better at capturing mathematical semantics.
Finally, to expedite future research, we publish a corpus of equivalent
transcendental and algebraic expression pairs.
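As a concrete (hypothetical) illustration of the setup the abstract describes, the sketch below embeds a tokenized expression with a transformer encoder and mean-pools the token states into a single vector; in the paper, this encoder would be trained as part of a sequence-to-sequence model that maps an expression to a mathematically equivalent one. The vocabulary, tokenization, and hyperparameters here are assumptions, not the authors' code.
```python
# Minimal sketch (not the authors' implementation): embedding math
# expressions with the encoder half of a seq2seq model.
import torch
import torch.nn as nn

class ExpressionEncoder(nn.Module):
    def __init__(self, vocab_size, d_model=256, nhead=4, num_layers=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer-encoded expression tokens
        h = self.encoder(self.embed(token_ids))
        # Mean-pool token states into one fixed-size expression embedding
        return h.mean(dim=1)

# Toy usage: after seq2seq training on equivalent expression pairs,
# visually different but equivalent expressions should embed nearby.
vocab = {tok: i for i, tok in enumerate(["<pad>", "x", "+", "*", "2", "(", ")"])}
encoder = ExpressionEncoder(vocab_size=len(vocab))
expr = torch.tensor([[vocab[t] for t in ["2", "*", "x"]]])
print(encoder(expr).shape)  # torch.Size([1, 256])
```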
Related papers
- PosFormer: Recognizing Complex Handwritten Mathematical Expression with Position Forest Transformer [51.260384040953326]
Handwritten Mathematical Expression Recognition (HMER) has wide applications in human-machine interaction scenarios.
We propose a position forest transformer (PosFormer) for HMER, which jointly optimizes two tasks: expression recognition and position recognition.
PosFormer consistently outperforms state-of-the-art methods, with gains of 2.03%/1.22%/2.00%, 1.83%, and 4.62% on the CROHME 2014/2016/2019, M2E, and MNE datasets, respectively.
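The joint optimization mentioned above is commonly realized as a shared representation feeding two classification heads with a weighted sum of losses; the sketch below illustrates that generic pattern. All names, shapes, and the loss weighting are assumptions, not PosFormer's actual code.
```python
# Hedged sketch: two recognition heads over shared decoder states,
# trained with a weighted sum of cross-entropy losses.
import torch
import torch.nn as nn

class JointHeads(nn.Module):
    def __init__(self, d_model, n_symbols, n_positions):
        super().__init__()
        self.symbol_head = nn.Linear(d_model, n_symbols)      # expression recognition
        self.position_head = nn.Linear(d_model, n_positions)  # position recognition

    def forward(self, h, symbol_tgt, position_tgt, pos_weight=1.0):
        # h: (batch, steps, d_model) shared decoder states
        ce = nn.CrossEntropyLoss()
        loss_sym = ce(self.symbol_head(h).flatten(0, 1), symbol_tgt.flatten())
        loss_pos = ce(self.position_head(h).flatten(0, 1), position_tgt.flatten())
        return loss_sym + pos_weight * loss_pos

heads = JointHeads(256, n_symbols=100, n_positions=8)
h = torch.randn(2, 10, 256)
loss = heads(h, torch.randint(0, 100, (2, 10)), torch.randint(0, 8, (2, 10)))
```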
arXiv Detail & Related papers (2024-07-10T15:42:58Z)
- Learning Visual-Semantic Subspace Representations for Propositional Reasoning [49.17165360280794]
We propose a novel approach for learning visual representations that conform to a specified semantic structure.
Our approach is based on a new nuclear norm-based loss.
We show that its minimum encodes the spectral geometry of the semantics in a subspace lattice.
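The summary above centers on a nuclear norm-based loss; the sketch below shows one plausible form of such a penalty in PyTorch, assuming it is applied to a batch of embeddings. The exact formulation, inputs, and weighting in the paper may differ.
```python
# Hedged sketch: a nuclear norm (sum of singular values) penalty,
# one plausible reading of the loss described above.
import torch

def nuclear_norm_loss(features: torch.Tensor) -> torch.Tensor:
    # features: (batch, dim) matrix of visual-semantic embeddings.
    # The nuclear norm encourages the batch to span a low-dimensional
    # subspace, which is what ties it to subspace (lattice) structure.
    return torch.linalg.svdvals(features).sum()

feats = torch.randn(32, 128, requires_grad=True)
loss = nuclear_norm_loss(feats)
loss.backward()  # differentiable, so usable as a training term
```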
arXiv Detail & Related papers (2024-05-25T12:51:38Z)
- BERT is not The Count: Learning to Match Mathematical Statements with Proofs [34.61792250254876]
The task of matching mathematical statements with their proofs fits well within current research on Mathematical Information Retrieval and, more generally, mathematical article analysis.
We present a dataset consisting of over 180k statement-proof pairs extracted from modern mathematical research articles.
We propose a bilinear similarity model and two decoding methods to match statements to proofs effectively.
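A bilinear similarity model has a simple canonical form, score(s, p) = s^T W p; the sketch below is a minimal PyTorch version, assuming BERT-sized (768-dimensional) statement and proof embeddings. The paper's decoding methods are not shown.
```python
# Minimal sketch of a bilinear similarity between a statement
# embedding s and a proof embedding p. Dimensions are assumptions.
import torch
import torch.nn as nn

class BilinearMatcher(nn.Module):
    def __init__(self, d_stmt: int, d_proof: int):
        super().__init__()
        self.W = nn.Parameter(torch.empty(d_stmt, d_proof))
        nn.init.xavier_uniform_(self.W)

    def forward(self, s: torch.Tensor, p: torch.Tensor) -> torch.Tensor:
        # s: (n_statements, d_stmt), p: (n_proofs, d_proof).
        # Returns an (n_statements, n_proofs) score matrix; a decoding
        # step would then assign proofs to statements from these scores.
        return s @ self.W @ p.T

matcher = BilinearMatcher(768, 768)
scores = matcher(torch.randn(5, 768), torch.randn(7, 768))
print(scores.shape)  # torch.Size([5, 7])
```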
arXiv Detail & Related papers (2023-02-18T14:48:20Z)
- Tree-Based Representation and Generation of Natural and Mathematical Language [77.34726150561087]
Mathematical language in scientific communications and educational scenarios is important yet relatively understudied.
Recent works on mathematical language focus either on representing stand-alone mathematical expressions or on mathematical reasoning in pre-trained natural language models.
We propose a series of modifications to existing language models to jointly represent and generate text and math.
arXiv Detail & Related papers (2023-02-15T22:38:34Z)
- On the Complexity of Representation Learning in Contextual Linear Bandits [110.84649234726442]
We show that representation learning in contextual linear bandits is fundamentally more complex than standard linear bandits.
In particular, learning with a given set of representations is never simpler than learning with the worst realizable representation in the set.
arXiv Detail & Related papers (2022-12-19T13:08:58Z)
- Self-Supervised Pretraining of Graph Neural Network for the Retrieval of Related Mathematical Expressions in Scientific Articles [8.942112181408156]
We propose a new approach for retrieval of mathematical expressions based on machine learning.
We design an unsupervised representation learning task that combines embedding learning with self-supervised learning.
We collect a large dataset of over 29 million mathematical expressions from more than 900,000 publications on arXiv.org.
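As a rough illustration of combining a graph neural network with self-supervised embedding learning, the sketch below pools node states of an expression graph into one vector and trains with a triplet-style contrastive loss; the architecture and objective are assumptions, not the paper's design.
```python
# Hedged sketch: a tiny GNN over an expression graph plus a
# triplet-style self-supervised objective.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyGNN(nn.Module):
    def __init__(self, d_in: int, d_hid: int):
        super().__init__()
        self.lin1 = nn.Linear(d_in, d_hid)
        self.lin2 = nn.Linear(d_hid, d_hid)

    def forward(self, x, adj):
        # x: (n_nodes, d_in) node features; adj: (n_nodes, n_nodes).
        # Two rounds of neighbourhood aggregation, then mean-pool the
        # node states into one embedding for the whole expression.
        x = F.relu(self.lin1(adj @ x))
        x = F.relu(self.lin2(adj @ x))
        return x.mean(dim=0)

def contrastive_loss(anchor, positive, negative, margin=1.0):
    # Pull embeddings of related expressions together, push unrelated
    # ones apart (a standard triplet objective, assumed here).
    d_pos = F.pairwise_distance(anchor[None], positive[None])
    d_neg = F.pairwise_distance(anchor[None], negative[None])
    return F.relu(d_pos - d_neg + margin).mean()

gnn = TinyGNN(16, 64)
adj = torch.eye(5) + torch.rand(5, 5).round()  # toy adjacency with self-loops
emb = gnn(torch.randn(5, 16), adj)
```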
arXiv Detail & Related papers (2022-08-22T12:11:30Z)
- Syntax-Aware Network for Handwritten Mathematical Expression Recognition [53.130826547287626]
Handwritten mathematical expression recognition (HMER) is a challenging task that has many potential applications.
Recent methods for HMER have achieved outstanding performance with an encoder-decoder architecture.
We propose a simple and efficient method for HMER, which is the first to incorporate syntax information into an encoder-decoder network.
arXiv Detail & Related papers (2022-03-03T09:57:19Z)
- Learning Algebraic Representation for Systematic Generalization in Abstract Reasoning [109.21780441933164]
We propose a hybrid approach to improve systematic generalization in reasoning.
We showcase a prototype with algebraic representation for the abstract spatial-temporal task of Raven's Progressive Matrices (RPM).
We show that the algebraic representation learned can be decoded by isomorphism to generate an answer.
arXiv Detail & Related papers (2021-11-25T09:56:30Z)
- Polynomial Graph Parsing with Non-Structural Reentrancies [0.2867517731896504]
Graph-based semantic representations are valuable in natural language processing.
We introduce graph extension grammar, which generates graphs with non-structural reentrancies.
We provide a parsing algorithm for graph extension grammars, which is proved to be correct and to run in polynomial time.
arXiv Detail & Related papers (2021-05-05T13:05:01Z)
- A Study of Continuous Vector Representations for Theorem Proving [2.0518509649405106]
We develop an encoding that allows for logical properties to be preserved and is additionally reversible.
This means that the tree shape of a formula including all symbols can be reconstructed from the dense vector representation.
We propose datasets that can be used to train models to capture these syntactic and semantic properties.
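Reversibility here means the full formula tree is recoverable from the representation. The discrete toy sketch below only illustrates why fixing each symbol's arity makes a prefix serialization exactly decodable; the paper's encoding targets a dense continuous vector, which this sketch does not reproduce.
```python
# Toy illustration of reversibility: with known arities, a prefix
# serialization of a formula tree decodes back to the same tree.
ARITY = {"forall": 2, "->": 2, "P": 1, "x": 0}  # assumed toy signature

def encode(tree):
    # tree: (symbol, [children]); emit symbols in prefix order
    sym, children = tree
    return [sym] + [s for c in children for s in encode(c)]

def decode(symbols):
    def parse(i):
        sym = symbols[i]
        children, j = [], i + 1
        for _ in range(ARITY[sym]):
            child, j = parse(j)
            children.append(child)
        return (sym, children), j
    tree, _ = parse(0)
    return tree

t = ("forall", [("x", []), ("->", [("P", [("x", [])]), ("P", [("x", [])])])])
assert decode(encode(t)) == t  # round trip recovers shape and symbols
```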
arXiv Detail & Related papers (2021-01-22T15:04:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.