Semantic Graph Representation Learning for Handwritten Mathematical Expression Recognition
- URL: http://arxiv.org/abs/2308.10493v1
- Date: Mon, 21 Aug 2023 06:23:41 GMT
- Authors: Zhuang Liu and Ye Yuan and Zhilong Ji and Jingfeng Bai and Xiang Bai
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Handwritten mathematical expression recognition (HMER) has attracted
extensive attention recently. However, current methods cannot explicitly model
the interactions between different symbols, so they may fail when faced with
similar symbols. To alleviate this issue, we propose a simple but efficient
method to enhance semantic interaction learning (SIL). Specifically, we first
construct a semantic graph based on statistical symbol co-occurrence
probabilities. Then we design a semantic aware module (SAM), which projects the
visual and classification features into a semantic space. The cosine distance
between different projected vectors indicates the correlation between symbols.
Jointly optimizing HMER and SIL explicitly enhances the model's understanding of
symbol relationships. In addition, SAM can be easily plugged into existing
attention-based models for HMER and consistently brings improvement. Extensive
experiments on public benchmark datasets demonstrate that our proposed module
effectively enhances recognition performance. Our method achieves better
recognition performance than prior art on both the CROHME and HME100K datasets.
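The abstract names three ingredients (a co-occurrence semantic graph, a projection of visual and classification features into a semantic space, and a cosine-based correlation signal) but gives no formulas. The NumPy sketch below illustrates one plausible reading under stated assumptions, not the authors' implementation: the graph is assumed to hold conditional probabilities P(j | i) counted per expression; SAM is assumed to be two linear maps whose sum is L2-normalised; and the SIL term is assumed to be a squared error between pairwise cosine similarity and the symmetrised graph. All function names, the weights `w_vis`/`w_cls`, the weight `lam`, and the toy data are hypothetical.

```python
import numpy as np
from itertools import combinations


def build_semantic_graph(label_sequences, vocab):
    """Estimate G[i, j] = P(j appears in an expression | i appears in it)."""
    index = {s: k for k, s in enumerate(vocab)}
    n = len(vocab)
    occur = np.zeros(n)        # number of expressions containing symbol i
    co = np.zeros((n, n))      # number of expressions containing both i and j
    for seq in label_sequences:
        present = sorted({index[s] for s in seq if s in index})
        for i in present:
            occur[i] += 1
        for i, j in combinations(present, 2):
            co[i, j] += 1
            co[j, i] += 1
    # Row-normalise by occurrence counts; rows of never-seen symbols stay zero.
    graph = np.divide(co, occur[:, None], out=np.zeros((n, n)),
                      where=occur[:, None] > 0)
    seen = np.flatnonzero(occur)
    graph[seen, seen] = 1.0    # a seen symbol trivially co-occurs with itself
    return graph


def semantic_aware_projection(visual_feat, class_prob, w_vis, w_cls):
    """Hypothetical SAM: linear maps into a shared space, then L2-normalise each step."""
    z = visual_feat @ w_vis + class_prob @ w_cls
    norms = np.linalg.norm(z, axis=-1, keepdims=True)
    return z / np.maximum(norms, 1e-8)


def semantic_interaction_loss(z, labels, graph):
    """Squared gap between pairwise cosine similarity and symmetrised co-occurrence."""
    sim = z @ z.T                        # unit-norm rows, so entries are cosine similarities
    sym = 0.5 * (graph + graph.T)        # cosine is symmetric, P(j | i) is not
    target = sym[np.ix_(labels, labels)]
    return float(np.mean((sim - target) ** 2))


vocab = ["x", "^", "2", "+", "y", "="]
train_labels = [["x", "^", "2", "+", "y"], ["x", "+", "y", "=", "2"], ["y", "^", "2"]]
G = build_semantic_graph(train_labels, vocab)
print(G[0, 4], G[4, 0])   # P(y | x) = 1.0 and P(x | y) = 2/3 on this toy set

rng = np.random.default_rng(0)
steps, d_vis, d_sem = 4, 16, 8
visual = rng.normal(size=(steps, d_vis))                   # per-step visual context (toy)
probs = rng.dirichlet(np.ones(len(vocab)), size=steps)     # per-step class probabilities (toy)
z = semantic_aware_projection(visual, probs,
                              rng.normal(size=(d_vis, d_sem)),
                              rng.normal(size=(len(vocab), d_sem)))
labels = [0, 1, 2, 4]                                      # ground-truth symbols x ^ 2 y
sil = semantic_interaction_loss(z, labels, G)
# A joint objective would add this to the recognition loss, e.g. ce + lam * sil,
# where lam is a hypothetical weighting hyperparameter.
```

Because P(j | i) is asymmetric while cosine similarity is symmetric, the sketch averages the graph with its transpose before using it as a target. The paper may resolve this differently.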
Related papers
- Semantic-aware Representation Learning for Homography Estimation
We propose SRMatcher, a detector-free feature matching method, which encourages the network to learn integrated semantic feature representations.
By reducing errors stemming from semantic inconsistencies in matching pairs, our proposed SRMatcher is able to deliver more accurate and realistic outcomes.
Date: 2024-07-18T08:36:28Z
- Dual Relation Mining Network for Zero-Shot Learning
We propose a Dual Relation Mining Network (DRMN) to enable effective visual-semantic interactions and learn semantic relationships among attributes for knowledge transfer.
Specifically, we introduce a Dual Attention Block (DAB) for visual-semantic relationship mining, which enriches visual information by multi-level feature fusion.
For semantic relationship modeling, we utilize a Semantic Interaction Transformer (SIT) to enhance the generalization of attribute representations among images.
Date: 2024-05-06T16:31:19Z
- Auxiliary Tasks Enhanced Dual-affinity Learning for Weakly Supervised Semantic Segmentation
We propose AuxSegNet+, a weakly supervised auxiliary learning framework to explore the rich information from saliency maps.
We also propose a cross-task affinity learning mechanism to learn pixel-level affinities from the saliency and segmentation feature maps.
Date: 2024-03-02T10:03:21Z
- Improving Deep Representation Learning via Auxiliary Learnable Target Coding
This paper introduces a novel learnable target coding as an auxiliary regularization of deep representation learning.
Specifically, a margin-based triplet loss and a correlation consistency loss on the proposed target codes are designed to encourage more discriminative representations.
Date: 2023-05-30T01:38:54Z
- Cross-modal Representation Learning for Zero-shot Action Recognition
We present a cross-modal Transformer-based framework, which jointly encodes video data and text labels for zero-shot action recognition (ZSAR).
Our model employs a conceptually new pipeline by which visual representations are learned in conjunction with visual-semantic associations in an end-to-end manner.
Experimental results show that our model considerably improves upon the state of the art in ZSAR, reaching encouraging top-1 accuracy on the UCF101, HMDB51, and ActivityNet benchmark datasets.
Date: 2022-05-03T17:39:27Z
- Learning with Holographic Reduced Representations
Holographic Reduced Representations (HRR) are a method for performing symbolic AI on top of real-valued vectors.
This paper revisits this approach to assess whether it is viable for enabling a hybrid neural-symbolic approach to learning.
Date: 2021-09-05T19:37:34Z
- Imposing Relation Structure in Language-Model Embeddings Using Contrastive Learning
We propose a novel contrastive learning framework that trains sentence embeddings to encode the relations in a graph structure.
The resulting relation-aware sentence embeddings achieve state-of-the-art results on the relation extraction task.
Date: 2021-09-02T10:58:27Z
- Joint Graph Learning and Matching for Semantic Feature Correspondence
We propose a joint graph learning and matching network, named GLAM, to explore reliable graph structures for boosting graph matching.
The proposed method is evaluated on three popular visual matching benchmarks (Pascal VOC, Willow Object and SPair-71k).
It outperforms previous state-of-the-art graph matching methods by significant margins on all benchmarks.
Date: 2021-09-01T08:24:02Z
- Facial Action Unit Intensity Estimation via Semantic Correspondence Learning with Dynamic Graph Convolution
We present a new learning framework that automatically learns the latent relationships of AUs via establishing semantic correspondences between feature maps.
In the heatmap regression-based network, feature maps preserve rich semantic information associated with AU intensities and locations.
This motivates us to model the correlation among feature channels, which implicitly represents the co-occurrence relationship of AU intensity levels.
Date: 2020-04-20T23:55:30Z