MathNet: A Data-Centric Approach for Printed Mathematical Expression Recognition
- URL: http://arxiv.org/abs/2404.13667v1
- Date: Sun, 21 Apr 2024 14:03:34 GMT
- Title: MathNet: A Data-Centric Approach for Printed Mathematical Expression Recognition
- Authors: Felix M. Schmitt-Koopmann, Elaine M. Huang, Hans-Peter Hutter, Thilo Stadelmann, Alireza Darvishy
- Abstract summary: We present an improved version of the benchmark dataset im2latex-100k, featuring 30 fonts instead of one.
Second, we introduce the real-world dataset realFormula, with MEs extracted from papers.
Third, we developed a MER model, MathNet, based on a convolutional vision transformer, with superior results on all four test sets.
- Score: 2.325171167252542
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Printed mathematical expression recognition (MER) models are usually trained and tested using LaTeX-generated mathematical expressions (MEs) as input and the LaTeX source code as ground truth. As the same ME can be generated by many different LaTeX source codes, this leads to unwanted variations in the ground truth data that bias test performance results and hinder efficient learning. In addition, the use of only one font to generate the MEs heavily limits the generalization of the reported results to realistic scenarios. We propose a data-centric approach to overcome this problem and present convincing experimental results: our main contribution is an enhanced LaTeX normalization that maps any LaTeX ME to a canonical form. Based on this process, we developed an improved version of the benchmark dataset im2latex-100k, featuring 30 fonts instead of one. Second, we introduce the real-world dataset realFormula, with MEs extracted from papers. Third, we developed a MER model, MathNet, based on a convolutional vision transformer, with superior results on all four test sets (im2latex-100k, im2latexv2, realFormula, and InftyMDB-1), outperforming the previous state of the art by up to 88.3%.
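To make the normalization idea concrete: the paper's actual rule set is not reproduced here, but a minimal sketch of the kind of canonicalization such a pipeline performs could look as follows (the three rules shown are illustrative assumptions, not the authors' rules):

```python
import re

def normalize_latex(src: str) -> str:
    """Map a LaTeX ME to one canonical spelling (illustrative only).

    The paper's normalization covers far more cases; these three
    rules merely show how equivalent sources can be unified.
    """
    s = src.strip()
    # Rule 1: collapse whitespace, so "a + b" and "a+b" agree.
    s = re.sub(r"\s+", "", s)
    # Rule 2: pick one spelling per command synonym (assumed rule).
    s = s.replace(r"\dfrac", r"\frac")
    # Rule 3: always brace single-character scripts: x^2 -> x^{2}.
    s = re.sub(r"([\^_])([A-Za-z0-9])", r"\1{\2}", s)
    return s

# Two different sources of the same ME map to one ground truth:
assert normalize_latex(r"x^2 + \dfrac{1}{2}") == \
       normalize_latex(r"x^{2}+\frac{ 1 }{ 2 }")
```

With such a canonical form, two training pairs whose images render the same expression also share one label, which is exactly the ground-truth variation the paper sets out to remove.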
Related papers
- LATTE: Improving Latex Recognition for Tables and Formulae with Iterative Refinement [11.931911831112357]
This paper proposes LATTE, the first iterative refinement framework for LaTeX recognition. LATTE improves the source extraction accuracy for both formulae and tables, outperforming existing techniques as well as GPT-4V.
arXiv Detail & Related papers (2024-09-21T17:18:49Z)
- InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning [58.7966588457529]
InfiMM-WebMath-40B is a high-quality dataset of interleaved image-text documents.
It comprises 24 million web pages, 85 million associated image URLs, and 40 billion text tokens, all meticulously extracted and filtered from CommonCrawl.
Our evaluations on text-only benchmarks show that, despite utilizing only 40 billion tokens, our dataset significantly enhances the performance of our 1.3B model.
Our models set a new state-of-the-art among open-source models on multi-modal math benchmarks such as MathVerse and We-Math.
arXiv Detail & Related papers (2024-09-19T08:41:21Z)
- TeXBLEU: Automatic Metric for Evaluate LaTeX Format [4.337656290539519]
We propose TeXBLEU, a metric for evaluating mathematical expressions in LaTeX format, built on the n-gram-based BLEU metric. TeXBLEU consists of a tokenizer trained on an arXiv paper dataset and a fine-tuned embedding model with positional encoding; a simplified sketch of the underlying n-gram idea follows this entry.
arXiv Detail & Related papers (2024-09-10T16:54:32Z)
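TeXBLEU's trained tokenizer and embedding model are not reproduced here; purely as an illustration of the n-gram idea it builds on, the sketch below scores a hypothesis against a reference with modified bigram precision over naively tokenized LaTeX (the tokenizer regex is an assumption, not TeXBLEU's):

```python
import re
from collections import Counter

def latex_tokens(src: str) -> list[str]:
    # Naive tokenizer: LaTeX commands as units, else single symbols.
    # TeXBLEU instead uses a tokenizer trained on arXiv sources.
    return re.findall(r"\\[A-Za-z]+|\S", src)

def ngram_precision(hyp: str, ref: str, n: int = 2) -> float:
    # Modified n-gram precision, the core quantity behind BLEU.
    h, r = latex_tokens(hyp), latex_tokens(ref)
    h_grams = Counter(tuple(h[i:i + n]) for i in range(len(h) - n + 1))
    r_grams = Counter(tuple(r[i:i + n]) for i in range(len(r) - n + 1))
    overlap = sum(min(c, r_grams[g]) for g, c in h_grams.items())
    return overlap / max(sum(h_grams.values()), 1)

# One wrong symbol costs only the bigrams that touch it:
print(ngram_precision(r"\frac{a}{b}", r"\frac{a}{c}"))  # ~0.67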
- MathBridge: A Large Corpus Dataset for Translating Spoken Mathematical Expressions into $LaTeX$ Formulas for Improved Readability [10.757551947236879]
We introduce MathBridge, the first extensive dataset for translating mathematical spoken sentences into formulas.
MathBridge significantly enhances the capabilities of pretrained language models for converting to formulas from mathematical spoken sentences.
arXiv Detail & Related papers (2024-08-07T18:07:15Z)
- ICAL: Implicit Character-Aided Learning for Enhanced Handwritten Mathematical Expression Recognition [9.389169879626428]
This paper introduces a novel approach, Implicit Character-Aided Learning (ICAL), to mine the global expression information.
By modeling and utilizing implicit character information, ICAL achieves a more accurate and context-aware interpretation of handwritten mathematical expressions.
arXiv Detail & Related papers (2024-05-15T02:03:44Z)
- MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning [52.97768001837269]
We present a method to fine-tune open-source language models, enabling them to use code for modeling and deriving math equations.
We propose a method of generating novel and high-quality datasets with math problems and their code-based solutions.
This approach yields the MathCoder models, a family of models capable of generating code-based solutions for solving challenging math problems; a hypothetical example of such a solution follows this entry.
arXiv Detail & Related papers (2023-10-05T17:52:09Z)
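To illustrate what "code-based solutions" means in practice, a model trained this way answers a word problem by emitting executable code rather than free-form arithmetic (a hypothetical example, not taken from the paper's dataset):

```python
# Problem (hypothetical): "Pens cost $3 each. With a 20% discount,
# what do 15 pens cost?"  A MathCoder-style solution expresses the
# reasoning as code and lets the interpreter do the arithmetic.
price, discount, count = 3.0, 0.20, 15
total = count * price * (1 - discount)
print(total)  # 36.0
```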
- MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models [91.66694225955872]
We propose MetaMath, a fine-tuned language model that specializes in mathematical reasoning.
Specifically, we start by bootstrapping mathematical questions by rewriting the question from multiple perspectives without extra knowledge.
We release the MetaMathQA dataset, the MetaMath models in different sizes, and the training code for public use.
arXiv Detail & Related papers (2023-09-21T17:45:42Z)
- WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct [128.89645483139236]
We present WizardMath, which enhances the mathematical reasoning abilities of Llama-2, by applying our proposed Reinforcement Learning from Evol-Instruct Feedback (RLEIF) method to the domain of math.
Our model even surpasses ChatGPT-3.5, Claude Instant-1, PaLM-2, and Minerva on GSM8k, and simultaneously surpasses Text-davinci, PaLM-1, and GPT-3 on MATH.
arXiv Detail & Related papers (2023-08-18T14:23:21Z)
- GENIUS: Sketch-based Language Model Pre-training via Extreme and Selective Masking for Text Generation and Augmentation [76.7772833556714]
We introduce GENIUS: a conditional text generation model using sketches as input.
GENIUS is pre-trained on a large-scale textual corpus with a novel reconstruction-from-sketch objective.
We show that GENIUS can be used as a strong and ready-to-use data augmentation tool for various natural language processing (NLP) tasks.
arXiv Detail & Related papers (2022-11-18T16:39:45Z)
- Syntax-Aware Network for Handwritten Mathematical Expression Recognition [53.130826547287626]
Handwritten mathematical expression recognition (HMER) is a challenging task that has many potential applications.
Recent methods for HMER have achieved outstanding performance with an encoder-decoder architecture.
We propose a simple and efficient method for HMER, which is the first to incorporate syntax information into an encoder-decoder network.
arXiv Detail & Related papers (2022-03-03T09:57:19Z)
- Machine Translation of Mathematical Text [0.0]
We have implemented a machine translation system, the PolyMath Translator, for documents containing mathematical text.
The current implementation translates English to French, attaining a BLEU score of 53.5 on a held-out test corpus of mathematical sentences.
It produces documents that can be compiled to PDF without further editing.
arXiv Detail & Related papers (2020-10-11T11:59:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.