Towards Semantic Markup of Mathematical Documents via User Interaction
- URL: http://arxiv.org/abs/2408.04656v1
- Date: Mon, 5 Aug 2024 12:36:40 GMT
- Title: Towards Semantic Markup of Mathematical Documents via User Interaction
- Authors: Luka Vrečar, Joe Wells, Fairouz Kamareddine
- Abstract summary: We present an approach to semantic markup of formulas by (semi-)automatically generating grammars from existing sTeX macro definitions and parsing formulas with them.
We also present a GUI-based tool for the disambiguation of parse results and showcase its potential using a grammar for parsing untyped $\lambda$-terms.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mathematical documents written in LaTeX often contain ambiguities. We can resolve some of them via semantic markup using, e.g., sTeX, which also has other potential benefits, such as interoperability with computer algebra systems, proof systems, and increased accessibility. However, semantic markup is more involved than "regular" typesetting and presents a challenge for authors of mathematical documents. We aim to smooth out the transition from plain LaTeX to semantic markup by developing semi-automatic tools for authors. In this paper we present an approach to semantic markup of formulas by (semi-)automatically generating grammars from existing sTeX macro definitions and parsing mathematical formulas with them. We also present a GUI-based tool for the disambiguation of parse results and showcase its functionality and potential using a grammar for parsing untyped $\lambda$-terms.
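The pipeline the abstract describes (grammar rules derived from macro definitions, then parsing that can return several results for a user to disambiguate) can be illustrated with a short Python sketch. This is a minimal sketch under stated assumptions, not the authors' tool: the `MACROS` table is a hypothetical, simplified stand-in for sTeX macro definitions, the argument-kind letters `v`/`t` and the helpers `template_to_rhs`, `parse_all` and `render` are invented for illustration, and the parser is a naive memoized search over token spans rather than whatever parsing technique the paper uses. It returns every reading of an untyped $\lambda$-term, which is exactly the ambiguity a disambiguation step has to resolve.

```python
import re
from functools import lru_cache

# Hypothetical, simplified stand-ins for sTeX macro definitions:
# name -> (argument kinds, notation template). 'v' marks a bound variable,
# 't' a subterm. Real sTeX declarations carry more information than this.
MACROS = {
    "lambda": ("vt", r"\lambda #1 . #2"),
    "apply": ("tt", r"#1 #2"),
}

# Control sequences, template parameters, brackets, dots and identifiers.
TOKEN_RE = re.compile(r"\\[A-Za-z]+|#\d|[().]|[A-Za-z]\w*")


def tokenize(text):
    return TOKEN_RE.findall(text)


def is_var(tok):
    return re.fullmatch(r"[A-Za-z]\w*", tok) is not None


def template_to_rhs(kinds, template):
    """Turn a notation template into the right-hand side of a rule T -> rhs."""
    rhs = []
    for tok in tokenize(template):
        if tok.startswith("#"):
            rhs.append("VAR" if kinds[int(tok[1:]) - 1] == "v" else "T")
        else:
            rhs.append(tok)  # literal terminal such as \lambda or "."
    return tuple(rhs)


# Generated rules plus two built-in ones for variables and parentheses.
RULES = [("var", ("VAR",)), ("paren", ("(", "T", ")"))] + [
    (name, template_to_rhs(kinds, tpl)) for name, (kinds, tpl) in MACROS.items()
]


def parse_all(tokens, rules=RULES):
    """Return every parse tree of the tokens as a term T; no precedence rules."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def term(i, j):
        return tuple((name, kids) for name, rhs in rules for kids in match(rhs, i, j))

    def match(rhs, i, j):
        """Yield the child tuples for which rhs derives tokens[i:j]."""
        if not rhs:
            if i == j:
                yield ()
            return
        sym, rest = rhs[0], rhs[1:]
        if sym == "T":
            # Every symbol consumes at least one token, so with these rules
            # each recursive term() call covers a strictly shorter span.
            if rest:
                ends = range(i + 1, j - len(rest) + 1)
            else:
                ends = [j] if j > i else []
            for k in ends:
                for tree in term(i, k):
                    for tail in match(rest, k, j):
                        yield (tree,) + tail
        elif i < j and (tokens[i] == sym or (sym == "VAR" and is_var(tokens[i]))):
            leaf = (tokens[i],) if sym == "VAR" else ()
            for tail in match(rest, i + 1, j):
                yield leaf + tail

    return list(term(0, len(tokens)))


def render(tree):
    """Fully parenthesised rendering, so each reading is visibly distinct."""
    name, children = tree
    if name == "var":
        return children[0]
    if name == "paren":
        return render(children[0])
    queue, parts = list(children), []
    for tok in tokenize(MACROS[name][1]):
        if tok.startswith("#"):
            child = queue.pop(0)
            parts.append(child if isinstance(child, str) else render(child))
        else:
            parts.append(tok)
    return "(" + " ".join(parts) + ")"


if __name__ == "__main__":
    for n, tree in enumerate(parse_all(tokenize(r"\lambda x. x y")), start=1):
        print(n, render(tree))
    # 1 (\lambda x . (x y))
    # 2 ((\lambda x . x) y)
```

With no precedence or associativity conventions, `\lambda x. x y` has two readings, and `x y z` also has two (left- and right-nested application), while `(x y) z` has only one. A disambiguation interface would present such a list of readings and record the author's choice.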
Related papers
- TeXBLEU: Automatic Metric for Evaluate LaTeX Format [4.337656290539519]
We propose TeXBLEU, a metric for evaluating mathematical expressions in LaTeX format, built on the n-gram-based BLEU metric.
The proposed TeXBLEU consists of a tokenizer trained on the arXiv paper dataset and a fine-tuned embedding model with positional encoding.
(2024-09-10T16:54:32Z)
- MathBridge: A Large Corpus Dataset for Translating Spoken Mathematical Expressions into LaTeX Formulas for Improved Readability [10.757551947236879]
We introduce MathBridge, the first extensive dataset for translating mathematical spoken sentences into LaTeX formulas.
MathBridge significantly enhances the capabilities of pretrained language models for converting mathematical spoken sentences into LaTeX formulas.
(2024-08-07T18:07:15Z)
- PosFormer: Recognizing Complex Handwritten Mathematical Expression with Position Forest Transformer [51.260384040953326]
Handwritten Mathematical Expression Recognition (HMER) has wide applications in human-machine interaction scenarios.
We propose a position forest transformer (PosFormer) for HMER, which jointly optimizes two tasks: expression recognition and position recognition.
PosFormer consistently outperforms state-of-the-art methods, with gains of 2.03%/1.22%/2, 1.83%, and 4.62% on the evaluated datasets.
(2024-07-10T15:42:58Z)
- A General and Flexible Multi-concept Parsing Framework for Multilingual Semantic Matching [60.51839859852572]
We propose resolving the text into multiple concepts for multilingual semantic matching, freeing the model from its reliance on NER models.
We conduct comprehensive experiments on English datasets QQP and MRPC, and Chinese dataset Medical-SM.
(2024-03-05T13:55:16Z)
- Syntax-Aware Network for Handwritten Mathematical Expression Recognition [53.130826547287626]
Handwritten mathematical expression recognition (HMER) is a challenging task that has many potential applications.
Recent methods for HMER have achieved outstanding performance with an encoder-decoder architecture.
We propose a simple and efficient method for HMER, which is the first to incorporate syntax information into an encoder-decoder network.
(2022-03-03T09:57:19Z)
- Towards Math-Aware Automated Classification and Similarity Search of Scientific Publications: Methods of Mathematical Content Representations [0.456877715768796]
We investigate mathematical content representations suitable for automated classification of, and similarity search in, STEM documents.
The methods are evaluated on a subset of arXiv.org papers with the Mathematics Subject Classification (MSC) as a reference classification.
(2021-10-08T11:27:40Z)
- Disambiguating Symbolic Expressions in Informal Documents [2.423990103106667]
We present a dataset with roughly 33,000 entries.
We describe a methodology using a transformer language model pre-trained on sources obtained from arxiv.org.
We evaluate our model using a plurality of dedicated techniques, taking the syntax and semantics of symbolic expressions into account.
(2021-01-25T10:14:37Z)
- TexSmart: A Text Understanding System for Fine-Grained NER and Enhanced Semantic Analysis [61.28407236720969]
This technical report introduces TexSmart, a text understanding system that supports fine-grained named entity recognition (NER) and enhanced semantic analysis.
TexSmart has some unique features. First, its NER function supports over 1,000 entity types, while most other public tools support from several up to (at most) dozens of entity types.
Second, TexSmart introduces new semantic analysis functions, such as semantic expansion and deep semantic representation, which are absent from most previous systems.
(2020-12-31T14:58:01Z)
- Reproducible Science with LaTeX [4.09920839425892]
This paper proposes a procedure to execute external source code from a LaTeX document and automatically include the calculation outputs in the resulting Portable Document Format (PDF) file.
(2020-10-04T04:04:07Z)
- Automatic Extraction of Rules Governing Morphological Agreement [103.78033184221373]
We develop an automated framework for extracting a first-pass grammatical specification from raw text.
We focus on extracting rules describing agreement, a morphosyntactic phenomenon at the core of the grammars of many of the world's languages.
We apply our framework to all languages included in the Universal Dependencies project, with promising results.
(2020-10-02T18:31:45Z)
- Generative Language Modeling for Automated Theorem Proving [94.01137612934842]
This work is motivated by the possibility that a major limitation of automated theorem provers compared to humans might be addressable via generation from language models.
We present an automated prover and proof assistant, GPT-f, for the Metamath formalization language, and analyze its performance.
(2020-09-07T19:50:10Z)
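The TeXBLEU entry above builds on the n-gram-based BLEU metric. For reference, plain sentence-level BLEU over LaTeX tokens can be sketched as follows. This is standard BLEU (clipped n-gram precision, geometric mean, brevity penalty), not TeXBLEU itself: the regex tokenizer, the helper names and the `max_n=2` default are assumptions made for this sketch, whereas TeXBLEU uses a tokenizer trained on arXiv papers and a fine-tuned embedding model.

```python
import math
import re
from collections import Counter

# Naive LaTeX tokenizer (an assumption for this sketch): control sequences such
# as \frac stay whole, every other non-space character is its own token.
LATEX_TOKEN = re.compile(r"\\[A-Za-z]+|\S")


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision: each candidate n-gram counts at most as often
    as it occurs in the reference."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total


def bleu(candidate_tex, reference_tex, max_n=2):
    """Sentence-level BLEU with uniform weights over 1..max_n-grams, no smoothing."""
    cand = LATEX_TOKEN.findall(candidate_tex)
    ref = LATEX_TOKEN.findall(reference_tex)
    precisions = [modified_precision(cand, ref, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty: candidates shorter than the reference are penalised.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(log_avg)


if __name__ == "__main__":
    print(bleu(r"\frac{a}{b}", r"\frac{a}{b}"))  # 1.0
    print(bleu("a/b", r"\frac{a}{b}"))  # 0.0
```

Exact token matching gives `a/b` a score of 0.0 against `\frac{a}{b}`, although both denote the same quotient. This is the kind of notational variation that purely surface-level n-gram matching cannot credit.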