CoMER: Modeling Coverage for Transformer-based Handwritten Mathematical Expression Recognition
- URL: http://arxiv.org/abs/2207.04410v1
- Date: Sun, 10 Jul 2022 07:59:23 GMT
- Title: CoMER: Modeling Coverage for Transformer-based Handwritten Mathematical Expression Recognition
- Authors: Wenqi Zhao, Liangcai Gao
- Abstract summary: Transformer-based encoder-decoder architecture has recently made significant advances in recognizing handwritten mathematical expressions.
Coverage information, which records the alignment information of the past steps, has proven effective in the RNN models.
We propose CoMER, a model that adopts the coverage information in the transformer decoder.
- Score: 4.812445272764651
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Transformer-based encoder-decoder architecture has recently made
significant advances in recognizing handwritten mathematical expressions.
However, the transformer model still suffers from the lack of coverage problem,
making its expression recognition rate (ExpRate) inferior to its RNN
counterpart. Coverage information, which records the alignment information of
the past steps, has proven effective in the RNN models. In this paper, we
propose CoMER, a model that adopts the coverage information in the transformer
decoder. Specifically, we propose a novel Attention Refinement Module (ARM) to
refine the attention weights with past alignment information without hurting
its parallelism. Furthermore, we take coverage information to the extreme by
proposing self-coverage and cross-coverage, which utilize the past alignment
information from the current and previous layers. Experiments show that CoMER
improves the ExpRate by 0.61%/2.09%/1.59% compared to the current
state-of-the-art model, and reaches 59.33%/59.81%/62.97% on the CROHME
2014/2016/2019 test sets.
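The coverage idea described in the abstract can be illustrated with a short sketch. The snippet below is a simplified, hypothetical rendering of coverage-based attention refinement (the self-coverage flavour), not the paper's implementation: the attention weights of previous decoding steps are accumulated with a cumulative sum, the resulting coverage is turned into a refinement term by a small learned projection (the paper's ARM uses a convolution-based module instead), and the term is subtracted from the raw attention logits before the softmax so that already-attended image regions are down-weighted. All names, shapes, and the linear projection `cov_proj` are illustrative assumptions.

```python
import torch
import torch.nn as nn


def coverage_refined_attention(scores: torch.Tensor, cov_proj: nn.Linear) -> torch.Tensor:
    """Coverage-style refinement of decoder cross-attention weights (simplified sketch).

    scores: raw attention logits of shape (batch, T, L) for T decoding steps
    attending over L flattened image-feature positions, before the softmax.
    """
    attn = scores.softmax(dim=-1)             # (B, T, L): per-step attention weights
    # Exclusive cumulative sum along the decoding axis: coverage at step t
    # aggregates the attention of steps 0..t-1, computed in parallel for all steps.
    coverage = attn.cumsum(dim=1) - attn      # (B, T, L)
    # Learned refinement term from the coverage (a scalar linear map here; the
    # paper's Attention Refinement Module uses a convolution over attention maps).
    refine = cov_proj(coverage.unsqueeze(-1)).squeeze(-1)  # (B, T, L)
    # Suppress positions that earlier steps already attended to, then renormalize.
    return (scores - refine).softmax(dim=-1)


# Illustrative usage with made-up shapes: 2 expressions, 5 decoding steps,
# 7x7 = 49 flattened image-feature positions.
proj = nn.Linear(1, 1)
weights = coverage_refined_attention(torch.randn(2, 5, 49), proj)
```

In the paper's terminology, using the current decoder layer's accumulated attention corresponds to self-coverage, while cross-coverage would instead feed in the accumulated attention of a previous decoder layer.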
Related papers
- SDPose: Tokenized Pose Estimation via Circulation-Guide Self-Distillation [53.675725490807615]
We introduce SDPose, a new self-distillation method for improving the performance of small transformer-based models.
SDPose-T obtains 69.7% mAP with 4.4M parameters and 1.8 GFLOPs, while SDPose-S-V2 obtains 73.5% mAP on the MSCOCO validation dataset.
arXiv Detail & Related papers (2024-04-04T15:23:14Z)
- Consensus-Adaptive RANSAC [104.87576373187426]
We propose a new RANSAC framework that learns to explore the parameter space by considering the residuals seen so far via a novel attention layer.
The attention mechanism operates on a batch of point-to-model residuals, and updates a per-point estimation state to take into account the consensus found through a lightweight one-step transformer.
arXiv Detail & Related papers (2023-07-26T08:25:46Z)
- Disentanglement via Latent Quantization [60.37109712033694]
In this work, we construct an inductive bias towards encoding to and decoding from an organized latent space.
We demonstrate the broad applicability of this approach by adding it to both basic data-reconstructing (vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models.
arXiv Detail & Related papers (2023-05-28T06:30:29Z)
- Robust representations of oil wells' intervals via sparse attention mechanism [2.604557228169423]
We introduce the class of efficient Transformers named Regularized Transformers (Reguformers).
Our experiments focus on oil & gas data, namely well logs.
To evaluate our models for such problems, we work with an industry-scale open dataset consisting of well logs of more than 20 wells.
arXiv Detail & Related papers (2022-12-29T09:56:33Z)
- Laplacian Autoencoders for Learning Stochastic Representations [0.6999740786886537]
We present a Bayesian autoencoder for unsupervised representation learning, which is trained using a novel variational lower-bound of the autoencoder evidence.
We show that our Laplacian autoencoder estimates well-calibrated uncertainties in both latent and output space.
arXiv Detail & Related papers (2022-06-30T07:23:16Z)
- DepthFormer: Exploiting Long-Range Correlation and Local Information for Accurate Monocular Depth Estimation [50.08080424613603]
Long-range correlation is essential for accurate monocular depth estimation.
We propose to leverage the Transformer to model this global context with an effective attention mechanism.
Our proposed model, termed DepthFormer, surpasses state-of-the-art monocular depth estimation methods with prominent margins.
arXiv Detail & Related papers (2022-03-27T05:03:56Z)
- Handwritten Mathematical Expression Recognition with Bidirectionally Trained Transformer [2.952085248753861]
A transformer decoder is employed to replace RNN-based decoders.
Experiments demonstrate that our model improves the ExpRate of current state-of-the-art methods on CROHME 2014 by 2.23%.
arXiv Detail & Related papers (2021-05-06T03:11:54Z)
- Autoencoding Variational Autoencoder [56.05008520271406]
We study the implications of a VAE failing to consistently encode samples generated from its own decoder on the learned representations, and also the consequences of fixing it by introducing a notion of self consistency.
We show that encoders trained with our self-consistency approach lead to representations that are robust (insensitive) to perturbations in the input introduced by adversarial attacks.
arXiv Detail & Related papers (2020-12-07T14:16:14Z)
- Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift [81.74795324629712]
We evaluate a simple method, which we call prediction-time batch normalization, that significantly improves model accuracy and calibration under covariate shift.
We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness.
The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
arXiv Detail & Related papers (2020-06-19T05:08:43Z)
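The idea behind the last entry, prediction-time batch normalization, is simple enough to sketch. The snippet below is a hypothetical illustration, not the paper's code, and it assumes standard PyTorch BatchNorm layers that track running statistics: at prediction time the layers are temporarily switched to batch statistics so that normalization adapts to the shifted test batch, and the stored running statistics are restored afterwards.

```python
import torch
import torch.nn as nn


@torch.no_grad()
def predict_with_batch_stats(model: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Run inference while BatchNorm layers normalize with the statistics of the
    current test batch instead of the running statistics stored during training.
    Simplified sketch: assumes BN layers with track_running_stats=True."""
    bn_types = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)
    bn_layers = [m for m in model.modules() if isinstance(m, bn_types)]
    # Remember the stored running statistics so the forward pass has no lasting side effects.
    saved = [(bn.running_mean.clone(), bn.running_var.clone()) for bn in bn_layers]
    model.eval()
    for bn in bn_layers:
        bn.train()                       # train mode -> normalize with batch statistics
    out = model(x)
    for bn, (mean, var) in zip(bn_layers, saved):
        bn.running_mean.copy_(mean)      # undo the running-stat update made in train mode
        bn.running_var.copy_(var)
        bn.eval()
    return out


# Illustrative usage on a toy network and a batch drawn from a shifted distribution.
net = nn.Sequential(nn.Linear(8, 16), nn.BatchNorm1d(16), nn.ReLU(), nn.Linear(16, 2))
shifted_batch = torch.randn(32, 8) + 3.0
logits = predict_with_batch_stats(net, shifted_batch)
```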