Improving Attention-Based Handwritten Mathematical Expression
Recognition with Scale Augmentation and Drop Attention
- URL: http://arxiv.org/abs/2007.10092v1
- Date: Mon, 20 Jul 2020 13:35:09 GMT
- Title: Improving Attention-Based Handwritten Mathematical Expression
Recognition with Scale Augmentation and Drop Attention
- Authors: Zhe Li, Lianwen Jin, Songxuan Lai, Yecheng Zhu
- Abstract summary: Handwritten mathematical expression recognition (HMER) is an important research direction in handwriting recognition.
The performance of HMER suffers from the two-dimensional structure of mathematical expressions (MEs).
We propose a high-performance HMER model with scale augmentation and drop attention.
- Score: 35.82648516972362
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Handwritten mathematical expression recognition (HMER) is an important
research direction in handwriting recognition. The performance of HMER suffers
from the two-dimensional structure of mathematical expressions (MEs). To
address this issue, in this paper, we propose a high-performance HMER model
with scale augmentation and drop attention. Specifically, to handle MEs whose
scale is unstable in both the horizontal and vertical directions, scale
augmentation improves the performance of the model on MEs of various scales. An
attention-based encoder-decoder network is used to extract features and
generate predictions. In addition, drop attention is proposed to further
improve performance when the attention distribution of the decoder is
imprecise. Compared with previous methods, our method achieves state-of-the-art
performance on two public datasets, CROHME 2014 and CROHME 2016.
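The two techniques named in the abstract can be illustrated in a few lines. The following is a minimal NumPy sketch, not the authors' implementation: `scale_augment` randomly rescales an input image (nearest-neighbor resampling and the scale bounds are assumptions), and `drop_attention` randomly suppresses the most-attended position and renormalizes, which is one plausible reading of making the decoder robust to imprecise attention distributions.

```python
import numpy as np

rng = np.random.default_rng(0)

def scale_augment(image: np.ndarray, lo: float = 0.7, hi: float = 1.4) -> np.ndarray:
    """Randomly rescale a 2-D image by a factor drawn from [lo, hi],
    using nearest-neighbor resampling (bounds are illustrative)."""
    s = rng.uniform(lo, hi)
    h, w = image.shape
    nh = max(1, int(round(h * s)))
    nw = max(1, int(round(w * s)))
    rows = (np.arange(nh) * h / nh).astype(int)  # source row for each output row
    cols = (np.arange(nw) * w / nw).astype(int)  # source column for each output column
    return image[rows][:, cols]

def drop_attention(alpha: np.ndarray, p: float = 0.3) -> np.ndarray:
    """With probability p, zero out the most-attended position and
    renormalize, so the decoder cannot over-rely on a single region."""
    if rng.random() < p:
        alpha = alpha.copy()
        alpha[np.argmax(alpha)] = 0.0
        alpha = alpha / alpha.sum()
    return alpha
```

In training, `scale_augment` would be applied to each input image before encoding, and `drop_attention` to the decoder's attention weights at each step; both are disabled at inference time.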
Related papers
- Adaptive Masking Enhances Visual Grounding [12.793586888511978]
We propose IMAGE, Interpretative MAsking with Gaussian radiation modEling, to enhance vocabulary grounding in low-shot learning scenarios.
We evaluate the efficacy of our approach on benchmark datasets, including COCO and ODinW, demonstrating its superior performance in zero-shot and few-shot tasks.
arXiv Detail & Related papers (2024-10-04T05:48:02Z)
- Localized Gaussians as Self-Attention Weights for Point Clouds Correspondence [92.07601770031236]
We investigate semantically meaningful patterns in the attention heads of an encoder-only Transformer architecture.
We find that fixing the attention weights not only accelerates the training process but also enhances the stability of the optimization.
arXiv Detail & Related papers (2024-09-20T07:41:47Z)
- Attention Guidance Mechanism for Handwritten Mathematical Expression Recognition [20.67011291281534]
Handwritten mathematical expression recognition (HMER) is challenging among image-to-text tasks due to the complex layouts of mathematical expressions.
We propose an attention guidance mechanism to explicitly suppress attention weights in irrelevant areas and enhance the appropriate ones.
Our method outperforms existing state-of-the-art methods, achieving expression recognition rates of 60.75% / 61.81% / 63.30% on the CROHME 2014 / 2016 / 2019 datasets.
arXiv Detail & Related papers (2024-03-04T06:22:17Z)
- Bidirectional Trained Tree-Structured Decoder for Handwritten Mathematical Expression Recognition [51.66383337087724]
The Handwritten Mathematical Expression Recognition (HMER) task is a critical branch in the field of OCR.
Recent studies have demonstrated that incorporating bidirectional context information significantly improves the performance of HMER models.
We propose the Mirror-Flipped Symbol Layout Tree (MF-SLT) and Bidirectional Asynchronous Training (BAT) structure.
arXiv Detail & Related papers (2023-12-31T09:24:21Z)
- Offline Detection of Misspelled Handwritten Words by Convolving Recognition Model Features with Text Labels [0.0]
We introduce the task of comparing a handwriting image to text.
Our model's classification head is trained entirely on synthetic data created using a state-of-the-art generative adversarial network.
Such massive performance gains can lead to significant productivity increases in applications utilizing human-in-the-loop automation.
arXiv Detail & Related papers (2023-09-18T21:13:42Z)
- Semantic Graph Representation Learning for Handwritten Mathematical Expression Recognition [57.60390958736775]
We propose a simple but efficient method to enhance semantic interaction learning (SIL).
We first construct a semantic graph based on the statistical symbol co-occurrence probabilities.
Then we design a semantic-aware module (SAM), which projects the visual and classification features into a semantic space.
Our method achieves better recognition performance than prior arts on both the CROHME and HME100K datasets.
arXiv Detail & Related papers (2023-08-21T06:23:41Z)
- DenseBAM-GI: Attention Augmented DenseNet with momentum aided GRU for HMER [4.518012967046983]
It is difficult to accurately determine the length of, and the complex spatial relationships among, symbols in handwritten mathematical expressions.
In this study, we present a novel encoder-decoder architecture (DenseBAM-GI) for HMER.
The proposed model is an efficient and lightweight architecture with performance equivalent to state-of-the-art models in terms of Expression Recognition Rate (ExpRate).
arXiv Detail & Related papers (2023-06-28T18:12:23Z)
- When Counting Meets HMER: Counting-Aware Network for Handwritten Mathematical Expression Recognition [57.51793420986745]
We propose an unconventional network for handwritten mathematical expression recognition (HMER) named Counting-Aware Network (CAN).
We design a weakly supervised counting module that can predict the number of instances of each symbol class without symbol-level position annotations.
Experiments on the benchmark datasets for HMER validate that both joint optimization and counting results are beneficial for correcting the prediction errors of encoder-decoder models.
arXiv Detail & Related papers (2022-07-23T08:39:32Z)
- GraphCoCo: Graph Complementary Contrastive Learning [65.89743197355722]
Graph Contrastive Learning (GCL) has shown promising performance in graph representation learning (GRL) without the supervision of manual annotations.
This paper proposes an effective graph complementary contrastive learning approach named GraphCoCo to tackle the above issue.
arXiv Detail & Related papers (2022-03-24T02:58:36Z)
- Transferring Dual Stochastic Graph Convolutional Network for Facial Micro-expression Recognition [7.62031665958404]
This paper presents a transferring dual Graph Convolutional Network (GCN) model.
We propose a graph construction method and a dual graph convolutional network to extract more discriminative features from micro-expression images.
Our proposed method achieves state-of-the-art performance on the recently released MMEW benchmarks.
arXiv Detail & Related papers (2022-03-10T07:41:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.