Handwritten Mathematical Expression Recognition via Attention
Aggregation based Bi-directional Mutual Learning
- URL: http://arxiv.org/abs/2112.03603v1
- Date: Tue, 7 Dec 2021 09:53:40 GMT
- Title: Handwritten Mathematical Expression Recognition via Attention
Aggregation based Bi-directional Mutual Learning
- Authors: Xiaohang Bian, Bo Qin, Xiaozhe Xin, Jianwu Li, Xuefeng Su, Yanfeng
Wang
- Abstract summary: We propose an Attention aggregation based Bi-directional Mutual learning Network (ABM)
In the inference phase, given that the model already learns knowledge from two inverse directions, we only use the L2R branch for inference.
Our proposed approach achieves the recognition accuracy of 56.85 % on CROHME 2014, 52.92 % on CROHME 2016, and 53.96 % on CROHME 2019 without data augmentation and model ensembling.
- Score: 13.696706205837234
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Handwritten mathematical expression recognition aims to automatically
generate LaTeX sequences from given images. Currently, attention-based
encoder-decoder models are widely used in this task. They typically generate
target sequences in a left-to-right (L2R) manner, leaving the right-to-left
(R2L) contexts unexploited. In this paper, we propose an Attention aggregation
based Bi-directional Mutual learning Network (ABM) which consists of one shared
encoder and two parallel inverse decoders (L2R and R2L). The two decoders are
enhanced via mutual distillation, which involves one-to-one knowledge transfer
at each training step, making full use of the complementary information from
two inverse directions. Moreover, in order to deal with mathematical symbols in
diverse scales, an Attention Aggregation Module (AAM) is proposed to
effectively integrate multi-scale coverage attentions. Notably, in the
inference phase, given that the model already learns knowledge from two inverse
directions, we only use the L2R branch for inference, keeping the original
parameter size and inference speed. Extensive experiments demonstrate that our
proposed approach achieves recognition accuracies of 56.85 % on CROHME 2014,
52.92 % on CROHME 2016, and 53.96 % on CROHME 2019 without data augmentation
or model ensembling, substantially outperforming the state-of-the-art methods.
The source code is available in the supplementary materials.
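The mutual distillation between the two decoders can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: it assumes each branch emits a per-step softmax distribution over the symbol vocabulary, that the R2L sequence is reversed so both branches predict the same target symbol at each step, and that each branch treats the other's prediction as a fixed teacher signal.

```python
import math

def kl_div(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions over the same symbol set."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def mutual_distillation_loss(l2r_probs, r2l_probs):
    """Symmetric step-wise KL between the two decoding directions.

    l2r_probs: per-step softmax outputs of the L2R decoder.
    r2l_probs: per-step softmax outputs of the R2L decoder, which are
    reversed so that step t of both branches targets the same symbol.
    """
    aligned = list(reversed(r2l_probs))
    loss = 0.0
    for p_l2r, p_r2l in zip(l2r_probs, aligned):
        # Each branch is pulled toward the other's prediction; in a real
        # training loop the teacher term would be detached from gradients.
        loss += kl_div(p_r2l, p_l2r) + kl_div(p_l2r, p_r2l)
    return loss / len(l2r_probs)
```

In practice this distillation term would be added to each branch's cross-entropy objective with a weighting coefficient, and only the L2R branch would be kept at inference time, as described above.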
Related papers
- Deep Boosting Learning: A Brand-new Cooperative Approach for Image-Text Matching [53.05954114863596]
We propose a brand-new Deep Boosting Learning (DBL) algorithm for image-text matching.
An anchor branch is first trained to provide insights into the data properties.
A target branch is concurrently tasked with more adaptive margin constraints to further enlarge the relative distance between matched and unmatched samples.
arXiv Detail & Related papers (2024-04-28T08:44:28Z)
- Bidirectional Trained Tree-Structured Decoder for Handwritten Mathematical Expression Recognition [51.66383337087724]
The Handwritten Mathematical Expression Recognition (HMER) task is a critical branch in the field of OCR.
Recent studies have demonstrated that incorporating bidirectional context information significantly improves the performance of HMER models.
We propose the Mirror-Flipped Symbol Layout Tree (MF-SLT) and Bidirectional Asynchronous Training (BAT) structure.
arXiv Detail & Related papers (2023-12-31T09:24:21Z)
- Symmetrical Bidirectional Knowledge Alignment for Zero-Shot Sketch-Based Image Retrieval [69.46139774646308]
This paper studies the problem of zero-shot sketch-based image retrieval (ZS-SBIR)
It aims to use sketches from unseen categories as queries to match the images of the same category.
We propose a novel Symmetrical Bidirectional Knowledge Alignment for zero-shot sketch-based image retrieval (SBKA)
arXiv Detail & Related papers (2023-12-16T04:50:34Z)
- Enhancing Motor Imagery Decoding in Brain Computer Interfaces using Riemann Tangent Space Mapping and Cross Frequency Coupling [5.860347939369221]
Motor Imagery (MI) serves as a crucial experimental paradigm within the realm of Brain Computer Interfaces (BCIs)
This paper introduces a novel approach to enhance the representation quality and decoding capability pertaining to MI features.
A lightweight convolutional neural network is employed for further feature extraction and classification, operating under the joint supervision of cross-entropy and center loss.
arXiv Detail & Related papers (2023-10-29T23:37:47Z)
- Handwritten Mathematical Expression Recognition with Bidirectionally Trained Transformer [2.952085248753861]
A transformer decoder is employed to replace RNN-based decoders.
Experiments demonstrate that our model improves the ExpRate of current state-of-the-art methods on CROHME 2014 by 2.23%.
arXiv Detail & Related papers (2021-05-06T03:11:54Z)
- Towards Accurate Knowledge Transfer via Target-awareness Representation Disentanglement [56.40587594647692]
We propose a novel transfer learning algorithm, introducing the idea of Target-awareness REpresentation Disentanglement (TRED)
TRED disentangles the knowledge relevant to the target task from the original source model and uses it as a regularizer when fine-tuning the target model.
Experiments on various real-world datasets show that our method stably improves standard fine-tuning by more than 2% on average.
arXiv Detail & Related papers (2020-10-16T17:45:08Z)
- Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGBD images for providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels and model the problem as cross-modal feature fusion.
In this paper, we propose a unified and efficient Cross-modality Guided Encoder that not only effectively recalibrates RGB feature responses, but also distills accurate depth information via multiple stages and aggregates the two recalibrated representations alternately.
arXiv Detail & Related papers (2020-07-17T18:35:24Z)
- Attentive WaveBlock: Complementarity-enhanced Mutual Networks for Unsupervised Domain Adaptation in Person Re-identification and Beyond [97.25179345878443]
This paper proposes a novel light-weight module, the Attentive WaveBlock (AWB)
AWB can be integrated into the dual networks of mutual learning to enhance the complementarity and further depress noise in the pseudo-labels.
Experiments demonstrate that the proposed method achieves state-of-the-art performance with significant improvements on multiple UDA person re-identification tasks.
arXiv Detail & Related papers (2020-06-11T15:40:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.