When Counting Meets HMER: Counting-Aware Network for Handwritten
Mathematical Expression Recognition
- URL: http://arxiv.org/abs/2207.11463v1
- Date: Sat, 23 Jul 2022 08:39:32 GMT
- Title: When Counting Meets HMER: Counting-Aware Network for Handwritten
Mathematical Expression Recognition
- Authors: Bohan Li, Ye Yuan, Dingkang Liang, Xiao Liu, Zhilong Ji, Jinfeng Bai,
Wenyu Liu, Xiang Bai
- Abstract summary: We propose an unconventional network for handwritten mathematical expression recognition (HMER) named Counting-Aware Network (CAN)
We design a weakly-supervised counting module that can predict the number of each symbol class without the symbol-level position annotations.
Experiments on the benchmark datasets for HMER validate that both joint optimization and counting results are beneficial for correcting the prediction errors of encoder-decoder models.
- Score: 57.51793420986745
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, most handwritten mathematical expression recognition (HMER) methods
adopt the encoder-decoder networks, which directly predict the markup sequences
from formula images with the attention mechanism. However, such methods may
fail to accurately read formulas with complicated structure or generate long
markup sequences, as the attention results are often inaccurate due to the
large variance of writing styles or spatial layouts. To alleviate this problem,
we propose an unconventional network for HMER named Counting-Aware Network
(CAN), which jointly optimizes two tasks: HMER and symbol counting.
Specifically, we design a weakly-supervised counting module that can predict
the number of each symbol class without the symbol-level position annotations,
and then plug it into a typical attention-based encoder-decoder model for HMER.
Experiments on the benchmark datasets for HMER validate that both joint
optimization and counting results are beneficial for correcting the prediction
errors of encoder-decoder models, and CAN consistently outperforms the
state-of-the-art methods. In particular, compared with an encoder-decoder model
for HMER, the extra time cost caused by the proposed counting module is
marginal. The source code is available at https://github.com/LBH1024/CAN.
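The key idea behind the weak supervision is that the per-class symbol counts can be derived directly from the LaTeX markup label, with no symbol-level position annotations. A minimal sketch of that label-to-count step (function and vocabulary names are illustrative, not taken from the paper's code):

```python
import re
from collections import Counter

def symbol_counts(latex: str, vocab: list) -> list:
    """Derive per-class symbol counts from a LaTeX markup string.

    Tokens are LaTeX commands (\\frac, \\sqrt, ...) or single
    non-space characters. This count vector is the only supervision
    the counting branch needs: no bounding boxes, no positions.
    """
    tokens = re.findall(r"\\[a-zA-Z]+|\S", latex)
    counts = Counter(tokens)
    return [counts.get(sym, 0) for sym in vocab]

vocab = ["x", "+", "2", "=", "\\frac", "{", "}"]
print(symbol_counts(r"\frac{x+2}{x}=2", vocab))  # [2, 1, 2, 1, 1, 2, 2]
```

During training, the counting module's predicted count vector would be regressed against this target while the decoder is trained on the markup sequence, giving the joint optimization the abstract describes.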
Related papers
- Benchmarking Large Language Models with Integer Sequence Generation Tasks [1.3108652488669736]
This paper presents a novel benchmark where the large language model (LLM) must write code that computes integer sequences from the Online Encyclopedia of Integer Sequences (OEIS).
Our benchmark reveals that the o1 series of models outperform other frontier models from OpenAI, Anthropic, Meta, and Google in accuracy and cheating rates across both easy and hard integer sequences.
arXiv Detail & Related papers (2024-11-07T02:05:43Z) - NAMER: Non-Autoregressive Modeling for Handwritten Mathematical Expression Recognition [80.22784377150465]
Handwritten Mathematical Expression Recognition (HMER) has gained considerable attention in pattern recognition for its diverse applications in document understanding.
This paper makes the first attempt to build a novel bottom-up Non-AutoRegressive Modeling approach for HMER, called NAMER.
NAMER comprises a Visual Aware Tokenizer (VAT) and a Parallel Graph Decoder (PGD).
arXiv Detail & Related papers (2024-07-16T04:52:39Z) - PosFormer: Recognizing Complex Handwritten Mathematical Expression with Position Forest Transformer [51.260384040953326]
Handwritten Mathematical Expression Recognition (HMER) has wide applications in human-machine interaction scenarios.
We propose a position forest transformer (PosFormer) for HMER, which jointly optimizes two tasks: expression recognition and position recognition.
PosFormer consistently outperforms state-of-the-art methods, with 2.03%/1.22%/2.00%, 1.83%, and 4.62% gains on the benchmark datasets.
arXiv Detail & Related papers (2024-07-10T15:42:58Z) - An Intelligent-Detection Network for Handwritten Mathematical Expression
Recognition [0.9790236766474201]
The proposed Intelligent-Detection Network (IDN) for HMER differs from traditional encoder-decoder methods by utilizing object detection techniques.
Specifically, we have developed an enhanced YOLOv7 network that can accurately detect both digital and symbolic objects.
The experiments demonstrate that the proposed method outperforms those encoder-decoder networks in recognizing complex handwritten mathematical expressions.
arXiv Detail & Related papers (2023-11-26T12:01:50Z) - Context Perception Parallel Decoder for Scene Text Recognition [52.620841341333524]
Scene text recognition methods have struggled to attain high accuracy and fast inference speed.
We present an empirical study of AR decoding in STR, and discover that the AR decoder not only models linguistic context, but also provides guidance on visual context perception.
We construct a series of CPPD models and also plug the proposed modules into existing STR decoders. Experiments on both English and Chinese benchmarks demonstrate that the CPPD models achieve highly competitive accuracy while running approximately 8x faster than their AR-based counterparts.
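The speedup comes from the number of model invocations: an autoregressive decoder needs one forward pass per output token, while a parallel decoder predicts all positions in a single pass. A toy sketch of that difference (the stub "model" stands in for a real decoder and is not the CPPD architecture):

```python
# Count forward passes for autoregressive vs. parallel decoding.
calls = {"ar": 0, "parallel": 0}

def forward(prefix, mode):
    """Stand-in for one decoder forward pass (dummy next-token id)."""
    calls[mode] += 1
    return len(prefix) % 10

def ar_decode(length):
    seq = []
    for _ in range(length):      # one forward pass per output token
        seq.append(forward(seq, "ar"))
    return seq

def parallel_decode(length):
    calls["parallel"] += 1       # all positions in a single pass
    return [i % 10 for i in range(length)]

ar_decode(8)
parallel_decode(8)
print(calls)  # {'ar': 8, 'parallel': 1}
```

For a sequence of length T, the autoregressive loop costs T sequential passes against one for the parallel decoder, which is where the reported ~8x wall-clock advantage originates.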
arXiv Detail & Related papers (2023-07-23T09:04:13Z) - DenseBAM-GI: Attention Augmented DenseNet with momentum aided GRU for
HMER [4.518012967046983]
It is difficult to accurately determine the length and complex spatial relationships among symbols in handwritten mathematical expressions.
In this study, we present a novel encoder-decoder architecture (DenseBAM-GI) for HMER.
The proposed model is an efficient and lightweight architecture with performance equivalent to state-of-the-art models in terms of Expression Recognition Rate (ExpRate).
arXiv Detail & Related papers (2023-06-28T18:12:23Z) - Decoder-Only or Encoder-Decoder? Interpreting Language Model as a
Regularized Encoder-Decoder [75.03283861464365]
The seq2seq task aims at generating the target sequence based on the given input source sequence.
Traditionally, most seq2seq tasks are resolved with an encoder that encodes the source sequence and a decoder that generates the target text.
Recently, a number of new approaches have emerged that apply decoder-only language models directly to seq2seq tasks.
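The two framings differ mainly in how the source and target are presented to the model. A minimal sketch (separator token and function names are illustrative):

```python
SEP = "<sep>"  # illustrative separator token

def encoder_decoder_inputs(src, tgt):
    """Classic seq2seq: separate streams for encoder and decoder."""
    return {"encoder_input": src, "decoder_target": tgt}

def decoder_only_input(src, tgt):
    """Decoder-only: source and target concatenated into one sequence;
    the model learns to continue the source prefix with the target."""
    return src + [SEP] + tgt

src, tgt = ["2", "+", "3"], ["5"]
print(decoder_only_input(src, tgt))  # ['2', '+', '3', '<sep>', '5']
```

Under the decoder-only framing, the same stack of layers plays both roles, which is what motivates interpreting it as a regularized encoder-decoder.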
arXiv Detail & Related papers (2023-04-08T15:44:29Z) - Syntax-Aware Network for Handwritten Mathematical Expression Recognition [53.130826547287626]
Handwritten mathematical expression recognition (HMER) is a challenging task that has many potential applications.
Recent methods for HMER have achieved outstanding performance with an encoder-decoder architecture.
We propose a simple and efficient method for HMER, which is the first to incorporate syntax information into an encoder-decoder network.
arXiv Detail & Related papers (2022-03-03T09:57:19Z) - ConvMath: A Convolutional Sequence Network for Mathematical Expression
Recognition [11.645568743440087]
The performance of ConvMath is evaluated on an open dataset named IM2LATEX-100K, including 103556 samples.
The proposed network achieves state-of-the-art accuracy and much better efficiency than previous methods.
arXiv Detail & Related papers (2020-12-23T12:08:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.