A Transformer-based Math Language Model for Handwritten Math Expression
Recognition
- URL: http://arxiv.org/abs/2108.05002v1
- Date: Wed, 11 Aug 2021 03:03:48 GMT
- Title: A Transformer-based Math Language Model for Handwritten Math Expression
Recognition
- Authors: Huy Quang Ung, Cuong Tuan Nguyen, Hung Tuan Nguyen, Thanh-Nghia Truong
and Masaki Nakagawa
- Abstract summary: Several math symbols are very similar in writing style, such as the dot and comma, or 0, O, and o.
This paper presents a Transformer-based Math Language Model (TMLM).
TMLM achieved a perplexity of 4.42, outperforming previous math language models.
- Score: 7.202733269706245
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Handwritten mathematical expressions (HMEs) contain ambiguities in their
interpretations, sometimes even for humans. Several math symbols are very
similar in writing style, such as the dot and comma, or 0, O, and o, which is a
challenge for HME recognition systems to handle without contextual
information. To address this problem, this paper presents a Transformer-based
Math Language Model (TMLM). Based on the self-attention mechanism, the
high-level representation of an input token in a sequence of tokens is computed
from how it relates to the previous tokens. Thus, TMLM can capture long-range
dependencies and correlations among symbols and relations in a mathematical
expression (ME). We trained the proposed language model on a corpus of
approximately 70,000 LaTeX sequences provided in CROHME 2016. TMLM achieved a
perplexity of 4.42, outperforming the previous math language models, i.e.,
N-gram and recurrent neural network-based language models. In addition, we
combined TMLM with a stochastic context-free grammar-based HME recognition
system, using a weighting parameter to re-rank the top-10 candidates. The
expression recognition rates on the test sets of CROHME 2016 and CROHME 2019
improved by 2.97 and 0.83 percentage points, respectively.
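As a rough illustration of the perplexity figure quoted above, here is a minimal Python sketch, not the authors' code: a causal language model predicts each token from the tokens before it, and perplexity is the exponentiated average negative log-likelihood. The `next_token_prob` function is a hypothetical stand-in for the Transformer's softmax output.

```python
import math

def next_token_prob(prefix, token):
    """Placeholder for p(token | prefix) from a trained causal LM.
    A real TMLM would run masked self-attention over `prefix` here."""
    vocab = ["\\frac", "{", "}", "x", "2", "<eos>"]
    return 1.0 / len(vocab) if token in vocab else 1e-9  # toy uniform model

def perplexity(tokens):
    """PPL = exp(-(1/N) * sum_i log p(x_i | x_<i)); lower is better."""
    total = sum(math.log(next_token_prob(tokens[:i], t))
                for i, t in enumerate(tokens))
    return math.exp(-total / len(tokens))

# A LaTeX sequence tokenized the way CROHME-style corpora are:
print(perplexity(["\\frac", "{", "x", "}", "{", "2", "}", "<eos>"]))  # 6.0
```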
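The integration step can be sketched in the same spirit. The following is a hedged illustration, assuming a linear combination and names (`LAMBDA`, `lm_log_prob`) that are not from the paper; the abstract only states that a weighting parameter is used to re-rank the recognizer's top-10 candidates.

```python
LAMBDA = 0.7  # hypothetical weight; in practice tuned on a validation set

def lm_log_prob(tokens):
    """Placeholder for TMLM's sequence log-probability (see sketch above)."""
    return -1.5 * len(tokens)  # toy constant-rate score so the example runs

def rerank(candidates):
    """candidates: list of (token_list, recognition_log_score) pairs
    from the SCFG-based recognizer; returns the best combined candidate."""
    return max(
        candidates,
        key=lambda c: LAMBDA * c[1] + (1.0 - LAMBDA) * lm_log_prob(c[0]),
    )

top10 = [(["x", "+", "y"], -2.1), (["x", "+", "v"], -2.0)]
print(rerank(top10))  # the weighted sum decides between close candidates
```

Blending the two scores lets contextual likelihood from the language model break ties between visually confusable symbols such as 0, O, and o.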
Related papers
- NAMER: Non-Autoregressive Modeling for Handwritten Mathematical Expression Recognition [80.22784377150465] (2024-07-16)
Handwritten Mathematical Expression Recognition (HMER) has gained considerable attention in pattern recognition for its diverse applications in document understanding.
This paper makes the first attempt to build a novel bottom-up Non-AutoRegressive Modeling approach for HMER, called NAMER.
NAMER comprises a Visual Aware Tokenizer (VAT) and a Parallel Graph Decoder (PGD).
- PosFormer: Recognizing Complex Handwritten Mathematical Expression with Position Forest Transformer [51.260384040953326] (2024-07-10)
Handwritten Mathematical Expression Recognition (HMER) has wide applications in human-machine interaction scenarios.
We propose a position forest transformer (PosFormer) for HMER, which jointly optimizes two tasks: expression recognition and position recognition.
PosFormer consistently outperforms state-of-the-art methods, with gains of 2.03%/1.22%/2.00%, 1.83%, and 4.62% across the evaluated datasets.
- CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens [49.569695524535454] (2024-07-07)
We propose to represent speech with supervised semantic tokens, which are derived from a multilingual speech recognition model by inserting vector quantization into the encoder.
Based on the tokens, we further propose a scalable zero-shot TTS synthesizer, CosyVoice, which consists of an LLM for text-to-token generation and a conditional flow matching model for token-to-speech synthesis.
- ICAL: Implicit Character-Aided Learning for Enhanced Handwritten Mathematical Expression Recognition [9.389169879626428] (2024-05-15)
This paper introduces a novel approach, Implicit Character-Aided Learning (ICAL), to mine the global expression information.
By modeling and utilizing implicit character information, ICAL achieves a more accurate and context-aware interpretation of handwritten mathematical expressions.
- Which Syntactic Capabilities Are Statistically Learned by Masked Language Models for Code? [51.29970742152668] (2024-01-03)
We highlight that relying on accuracy-based measurements may lead to an overestimation of models' capabilities.
To address these issues, we introduce a technique called SyntaxEval for probing the syntactic capabilities of such models.
- Exploring Equation as a Better Intermediate Meaning Representation for Numerical Reasoning [53.2491163874712] (2023-08-21)
We use equations as IMRs to solve the numerical reasoning task.
We present a method called Boosting Numerical Reasoning by Decomposing the Generation of Equations (Bridge).
Our method improves performance by 2.2%, 0.9%, and 1.7% on the GSM8K, SVAMP, and Algebra datasets.
- Offline Handwritten Mathematical Recognition using Adversarial Learning and Transformers [3.9220281834178463] (2022-08-20)
Offline HMER is often viewed as a much harder problem than online HMER.
In this paper, we propose an encoder-decoder model that uses paired adversarial learning.
We improve the latest CROHME 2019 test set results by approximately 4%.
- Syntax-Aware Network for Handwritten Mathematical Expression Recognition [53.130826547287626] (2022-03-03)
Handwritten mathematical expression recognition (HMER) is a challenging task that has many potential applications.
Recent methods for HMER have achieved outstanding performance with an encoder-decoder architecture.
We propose a simple and efficient method for HMER, which is the first to incorporate syntax information into an encoder-decoder network.
- Mathematical Word Problem Generation from Commonsense Knowledge Graph and Equations [27.063577644162358] (2020-10-13)
We develop an end-to-end neural model to generate diverse math word problems (MWPs) in real-world scenarios from a commonsense knowledge graph and equations.
The proposed model learns both representations from edge-enhanced Levi graphs of symbolic equations and commonsense knowledge.
Experiments on an educational gold-standard set and a large-scale generated MWP set show that our approach is superior on the MWP generation task.
This list is automatically generated from the titles and abstracts of the papers on this site.