Attention Guidance Mechanism for Handwritten Mathematical Expression Recognition
- URL: http://arxiv.org/abs/2403.01756v2
- Date: Tue, 5 Mar 2024 15:02:00 GMT
- Title: Attention Guidance Mechanism for Handwritten Mathematical Expression Recognition
- Authors: Yutian Liu, Wenjun Ke, Jianguo Wei
- Abstract summary: Handwritten mathematical expression recognition (HMER) is challenging in image-to-text tasks due to the complex layouts of mathematical expressions.
We propose an attention guidance mechanism to explicitly suppress attention weights in irrelevant areas and enhance the appropriate ones.
Our method outperforms existing state-of-the-art methods, achieving expression recognition rates of 60.75% / 61.81% / 63.30% on the CROHME 2014/2016/2019 datasets.
- Score: 20.67011291281534
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Handwritten mathematical expression recognition (HMER) is challenging in
image-to-text tasks due to the complex layouts of mathematical expressions and
suffers from problems including over-parsing and under-parsing. To solve these,
previous HMER methods improve the attention mechanism by utilizing historical
alignment information. However, this approach has limitations in addressing
under-parsing since it cannot correct the erroneous attention on image areas
that should be parsed at subsequent decoding steps. This faulty attention
causes the attention module to incorporate future context into the current
decoding step, thereby confusing the alignment process. To address this issue,
we propose an attention guidance mechanism to explicitly suppress attention
weights in irrelevant areas and enhance the appropriate ones, thereby
inhibiting access to information outside the intended context. Depending on the
type of attention guidance, we devise two complementary approaches to refine
attention weights: self-guidance that coordinates attention of multiple heads
and neighbor-guidance that integrates attention from adjacent time steps.
Experiments show that our method outperforms existing state-of-the-art methods,
achieving expression recognition rates of 60.75% / 61.81% / 63.30% on the
CROHME 2014/2016/2019 datasets.
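The two guidance schemes described in the abstract can be illustrated with a minimal NumPy sketch. The formulation below (an elementwise consensus product for self-guidance, a linear blend for neighbor-guidance) is a simplification for illustration only, not the paper's exact equations:

```python
import numpy as np

def self_guidance(attn):
    """Refine per-head attention maps toward their cross-head consensus.

    attn: (heads, positions) attention weights for one decoding step.
    Weights in areas the other heads consider irrelevant are suppressed,
    and areas of agreement are enhanced. (Illustrative formulation, not
    the paper's exact equations.)
    """
    consensus = attn.mean(axis=0, keepdims=True)       # (1, positions)
    guided = attn * consensus                          # suppress disagreement
    return guided / guided.sum(axis=1, keepdims=True)  # renormalize per head

def neighbor_guidance(attn_t, attn_prev, beta=0.3):
    """Blend the current decoding step's attention with the previous
    step's, using the neighboring step as a soft prior on where to look."""
    guided = (1 - beta) * attn_t + beta * attn_prev
    return guided / guided.sum()

rng = np.random.default_rng(0)
attn = rng.random((4, 16))
attn /= attn.sum(axis=1, keepdims=True)                # 4 heads, 16 positions
refined = self_guidance(attn)

prev_step = rng.random(16)
prev_step /= prev_step.sum()
cur_step = rng.random(16)
cur_step /= cur_step.sum()
blended = neighbor_guidance(cur_step, prev_step)
```

Both refinements keep the attention weights non-negative and normalized, so they can be dropped in wherever the decoder consumes its attention maps.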
Related papers
- Elliptical Attention [1.7597562616011944]
Pairwise dot-product self-attention is key to the success of transformers that achieve state-of-the-art performance across a variety of applications in language and vision.
We propose using a Mahalanobis distance metric for computing the attention weights to stretch the underlying feature space in directions of high contextual relevance.
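The Mahalanobis-distance scoring described above can be sketched as follows; this is an illustrative stand-in, not the paper's exact formulation, and `M` here is simply a user-supplied positive semi-definite matrix:

```python
import numpy as np

def mahalanobis_attention(Q, K, M):
    """Attention weights from negative Mahalanobis distances
    d(q, k)^2 = (q - k)^T M (q - k).

    A positive semi-definite M stretches the feature space along chosen
    directions; M = I reduces to (negative squared) Euclidean distance.
    """
    diff = Q[:, None, :] - K[None, :, :]            # (n_q, n_k, d)
    d2 = np.einsum('qkd,de,qke->qk', diff, M, diff)
    scores = -d2                                    # closer keys score higher
    scores -= scores.max(axis=1, keepdims=True)     # numerically stable softmax
    w = np.exp(scores)
    return w / w.sum(axis=1, keepdims=True)

rng = np.random.default_rng(1)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(5, 4))
W = mahalanobis_attention(Q, K, np.eye(4))
```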
arXiv Detail & Related papers (2024-06-19T18:38:11Z)
- Guiding Visual Question Answering with Attention Priors [76.21671164766073]
We propose to guide the attention mechanism using explicit linguistic-visual grounding.
This grounding is derived by connecting structured linguistic concepts in the query to their referents among the visual objects.
The resultant algorithm is capable of probing attention-based reasoning models, injecting relevant associative knowledge, and regulating the core reasoning process.
arXiv Detail & Related papers (2022-05-25T09:53:47Z)
- Self-supervised Implicit Glyph Attention for Text Recognition [52.68772018871633]
We propose self-supervised implicit glyph attention (SIGA), a novel attention mechanism for scene text recognition (STR) methods.
SIGA delineates the glyph structures of text images through joint self-supervised text segmentation and implicit attention alignment.
Experimental results demonstrate that SIGA performs consistently and significantly better than previous attention-based STR methods.
arXiv Detail & Related papers (2022-03-07T13:40:33Z)
- Learning to ignore: rethinking attention in CNNs [87.01305532842878]
We propose to reformulate the attention mechanism in CNNs to learn to ignore instead of learning to attend.
Specifically, we propose to explicitly learn irrelevant information in the scene and suppress it in the produced representation.
arXiv Detail & Related papers (2021-11-10T13:47:37Z)
- Alignment Attention by Matching Key and Query Distributions [48.93793773929006]
This paper introduces alignment attention that explicitly encourages self-attention to match the distributions of the key and query within each head.
It is simple to convert any models with self-attention, including pre-trained ones, to the proposed alignment attention.
On a variety of language understanding tasks, we show the effectiveness of our method in accuracy, uncertainty estimation, generalization across domains, and robustness to adversarial attacks.
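The key/query distribution matching described above can be approximated by a simple moment-matching regularizer; the sketch below matches only the first two moments, which is a simplified stand-in for the distribution matching used in the paper:

```python
import numpy as np

def kq_alignment_penalty(Q, K):
    """Moment-matching penalty encouraging the keys and queries of one
    head to follow a common distribution.

    Q, K: (n, d) query and key matrices for a single head. Only the
    mean and covariance are matched here (illustrative simplification).
    """
    mean_gap = np.sum((Q.mean(axis=0) - K.mean(axis=0)) ** 2)
    cov_gap = np.sum((np.cov(Q, rowvar=False) - np.cov(K, rowvar=False)) ** 2)
    return mean_gap + cov_gap

rng = np.random.default_rng(2)
Q = rng.normal(size=(8, 4))
aligned = kq_alignment_penalty(Q, Q.copy())   # identical distributions
shifted = kq_alignment_penalty(Q, Q + 3.0)    # mean-shifted keys
```

Added to the task loss, such a penalty nudges each head's keys and queries toward a shared distribution without any attention annotations.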
arXiv Detail & Related papers (2021-10-25T00:54:57Z)
- Progressively Guide to Attend: An Iterative Alignment Framework for Temporal Sentence Grounding [53.377028000325424]
We propose an Iterative Alignment Network (IA-Net) for temporal sentence grounding task.
We pad multi-modal features with learnable parameters to alleviate the nowhere-to-attend problem of non-matched frame-word pairs.
We also devise a calibration module following each attention module to refine the alignment knowledge.
arXiv Detail & Related papers (2021-09-14T02:08:23Z)
- More Than Just Attention: Learning Cross-Modal Attentions with Contrastive Constraints [63.08768589044052]
We propose Contrastive Content Re-sourcing (CCR) and Contrastive Content Swapping (CCS) constraints to address this limitation.
CCR and CCS constraints supervise the training of attention models in a contrastive learning manner without requiring explicit attention annotations.
Experiments on both Flickr30k and MS-COCO datasets demonstrate that integrating these attention constraints into two state-of-the-art attention-based models improves the model performance.
arXiv Detail & Related papers (2021-05-20T08:48:10Z)
- Boost Image Captioning with Knowledge Reasoning [10.733743535624509]
We propose word attention to improve the correctness of visual attention when generating sequential descriptions word-by-word.
We introduce a new strategy to inject external knowledge extracted from knowledge graph into the encoder-decoder framework to facilitate meaningful captioning.
arXiv Detail & Related papers (2020-11-02T12:19:46Z)
- Gaussian Constrained Attention Network for Scene Text Recognition [16.485898019983797]
We argue that the existing attention mechanism faces the problem of attention diffusion, in which the model may not focus on a certain character area.
We propose a 2D attention-based method integrated with a novel Gaussian Constrained Refinement Module.
In this way, the attention weights will be more concentrated and the attention-based recognition network achieves better performance.
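The concentration effect of a Gaussian constraint can be illustrated with a rough, training-free sketch; the paper's refinement module is learned, whereas the version below simply masks a normalized 2D attention map with a Gaussian centered at its center of mass:

```python
import numpy as np

def gaussian_refine(attn2d, sigma=1.5):
    """Concentrate a normalized 2D attention map around its center of
    mass by multiplying with a Gaussian mask and renormalizing.

    attn2d: (H, W) non-negative attention map summing to 1.
    (Illustrative sketch of the 'Gaussian constraint' idea only.)
    """
    H, W = attn2d.shape
    ys, xs = np.mgrid[0:H, 0:W]
    cy = (attn2d * ys).sum()                  # attention center of mass
    cx = (attn2d * xs).sum()
    mask = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))
    refined = attn2d * mask                   # suppress diffuse tails
    return refined / refined.sum()

attn = np.full((6, 8), 1.0 / 48)              # uniform (maximally diffuse) map
refined = gaussian_refine(attn)
```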
arXiv Detail & Related papers (2020-10-19T01:55:30Z)
- Improving Attention-Based Handwritten Mathematical Expression Recognition with Scale Augmentation and Drop Attention [35.82648516972362]
Handwritten mathematical expression recognition (HMER) is an important research direction in handwriting recognition.
The performance of HMER suffers from the two-dimensional structure of mathematical expressions (MEs).
We propose a high-performance HMER model with scale augmentation and drop attention.
arXiv Detail & Related papers (2020-07-20T13:35:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.