Bidirectional Trained Tree-Structured Decoder for Handwritten
Mathematical Expression Recognition
- URL: http://arxiv.org/abs/2401.00435v1
- Date: Sun, 31 Dec 2023 09:24:21 GMT
- Title: Bidirectional Trained Tree-Structured Decoder for Handwritten
Mathematical Expression Recognition
- Authors: Hanbo Cheng, Chenyu Liu, Pengfei Hu, Zhenrong Zhang, Jiefeng Ma, Jun
Du
- Abstract summary: The Handwritten Mathematical Expression Recognition (HMER) task is a critical branch in the field of OCR.
Recent studies have demonstrated that incorporating bidirectional context information significantly improves the performance of HMER models.
We propose the Mirror-Flipped Symbol Layout Tree (MF-SLT) and Bidirectional Asynchronous Training (BAT) structure.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Handwritten Mathematical Expression Recognition (HMER) task is a critical
branch in the field of OCR. Recent studies have demonstrated that incorporating
bidirectional context information significantly improves the performance of
HMER models. However, existing methods fail to effectively utilize
bidirectional context information during the inference stage. Furthermore,
current bidirectional training methods are primarily designed for string
decoders and cannot adequately generalize to tree decoders, which offer
superior generalization capabilities and structural analysis capacity. In order
to overcome these limitations, we propose the Mirror-Flipped Symbol Layout Tree
(MF-SLT) and Bidirectional Asynchronous Training (BAT) structure. Our method
extends the bidirectional training strategy to the tree decoder, allowing for
more effective training by leveraging bidirectional information. Additionally,
we separately analyze the impact of the HMER model's visual and linguistic
perception and introduce the Shared Language Modeling (SLM) mechanism. Through
the SLM, we enhance the model's robustness and generalization when dealing with
visual ambiguity, particularly in scenarios with abundant training data. Our
approach has been validated through extensive experiments, demonstrating its
ability to achieve new state-of-the-art results on the CROHME 2014, 2016, and
2019 datasets, as well as the HME100K dataset. The code used in our experiments
will be publicly available.
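To make the MF-SLT idea concrete, here is a minimal, hypothetical Python sketch (names such as SLTNode and mirror_flip are our own, not from the paper's code). It mirror-flips a toy Symbol Layout Tree by reversing the horizontal "right" chain so the last symbol becomes the new root, the tree analogue of reversing a token string for a right-to-left string decoder. A full MF-SLT construction would also have to handle non-horizontal relations (superscript, subscript, above, below, inside), which this toy omits.

    from dataclasses import dataclass, field
    from typing import Dict

    @dataclass
    class SLTNode:
        """One symbol in a toy Symbol Layout Tree; children are keyed
        by spatial relation, e.g. "right", "sup", "sub"."""
        symbol: str
        children: Dict[str, "SLTNode"] = field(default_factory=dict)

    def mirror_flip(root: SLTNode) -> SLTNode:
        """Hypothetical mirror flip: reverse the horizontal "right"
        spine so the last symbol becomes the new root (in place)."""
        spine = []
        cur = root
        while cur is not None:
            spine.append(cur)
            cur = cur.children.pop("right", None)  # detach old link
        # Relink the spine in the opposite direction: a->b->c becomes c->b->a.
        rev = spine[::-1]
        for parent, child in zip(rev, rev[1:]):
            parent.children["right"] = child
        return rev[0]

    # "1 + 2" as a toy SLT: 1 -(right)-> + -(right)-> 2
    expr = SLTNode("1", {"right": SLTNode("+", {"right": SLTNode("2")})})
    flipped = mirror_flip(expr)
    print(flipped.symbol)  # "2": the mirrored tree is rooted at the last symbol

Roughly speaking, training a second tree-decoder branch on such mirrored trees is what exposes the decoder to the reverse direction; the BAT scheme then coordinates the two directions during training.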
Related papers
- Contextualization Distillation from Large Language Model for Knowledge Graph Completion (2024-01-28)
  We introduce the Contextualization Distillation strategy, a plug-and-play approach compatible with both discriminative and generative KGC frameworks.
  Our method begins by instructing large language models to transform compact, structural triplets into context-rich segments.
  Comprehensive evaluations across diverse datasets and KGC techniques highlight the efficacy and adaptability of our approach.
- Visual Explanations of Image-Text Representations via Multi-Modal Information Bottleneck Attribution (2023-12-28)
  We propose a multi-modal information bottleneck (M2IB) approach to improve the interpretability of vision-language models.
  We demonstrate how M2IB can be applied to attribution analysis of vision-language pretrained models.
- Unifying Structure and Language Semantic for Efficient Contrastive Knowledge Graph Completion with Structured Entity Anchors (2023-11-07)
  The goal of knowledge graph completion (KGC) is to predict missing links in a KG using facts that are already known.
  We propose a novel method to effectively unify structure information and language semantics without losing the power of inductive reasoning.
- RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation (2023-07-03)
  This paper presents the RefSAM model, which explores the potential of SAM for referring video object segmentation.
  Our proposed approach adapts the original SAM model to enhance cross-modality learning by employing a lightweight Cross-Modal MLP.
  We employ a parameter-efficient tuning strategy to align and fuse the language and vision features effectively.
- Scalable Learning of Latent Language Structure With Logical Offline Cycle Consistency (2023-05-31)
  Conceptually, LOCCO can be viewed as a form of self-learning where the semantic parser being trained is used to generate annotations for unlabeled text.
  As an added bonus, the annotations produced by LOCCO can be trivially repurposed to train a neural text generation model.
- USER: Unified Semantic Enhancement with Momentum Contrast for Image-Text Retrieval (2023-01-17)
  Image-Text Retrieval (ITR) aims at searching for the target instances that are semantically relevant to a given query from the other modality.
  Existing approaches typically suffer from two major limitations.
- Handwritten Mathematical Expression Recognition via Attention Aggregation based Bi-directional Mutual Learning (2021-12-07)
  We propose an Attention aggregation based Bi-directional Mutual learning Network (ABM).
  In the inference phase, given that the model has already learned from two inverse directions, we use only the L2R branch (see the sketch after this list).
  Our approach achieves recognition accuracies of 56.85% on CROHME 2014, 52.92% on CROHME 2016, and 53.96% on CROHME 2019 without data augmentation or model ensembling.
- Incorporating Linguistic Knowledge for Abstractive Multi-document Summarization (2021-09-23)
  We develop a neural network based abstractive multi-document summarization (MDS) model.
  We incorporate dependency information into a linguistic-guided attention mechanism.
  With the help of linguistic signals, sentence-level relations can be correctly captured.
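As referenced in the ABM entry above, here is a minimal, hypothetical sketch of how bidirectional targets are commonly built for string decoders: the same token sequence is used once left-to-right (L2R) and once reversed (R2L), so two branches can learn from each other during training while only the L2R branch is kept at inference. The function name and special tokens below are illustrative, not taken from the ABM code.

    def make_bidirectional_targets(tokens, sos="<sos>", eos="<eos>"):
        """Build L2R and R2L target sequences for two decoder branches
        trained jointly (mutual learning) on the same input image."""
        l2r = [sos] + tokens + [eos]
        r2l = [sos] + tokens[::-1] + [eos]
        return l2r, r2l

    # LaTeX tokens for "x^{2}":
    l2r, r2l = make_bidirectional_targets(["x", "^", "{", "2", "}"])
    # l2r: ['<sos>', 'x', '^', '{', '2', '}', '<eos>']
    # r2l: ['<sos>', '}', '2', '{', '^', 'x', '<eos>']

Note that naive sequence reversal like this is exactly what does not carry over to tree decoders, since a tree has no single linear order; that gap is what the MF-SLT construction in the main paper is designed to fill.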
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.