ConvMath: A Convolutional Sequence Network for Mathematical Expression Recognition
- URL: http://arxiv.org/abs/2012.12619v1
- Date: Wed, 23 Dec 2020 12:08:18 GMT
- Title: ConvMath: A Convolutional Sequence Network for Mathematical Expression Recognition
- Authors: Zuoyu Yan, Xiaode Zhang, Liangcai Gao, Ke Yuan and Zhi Tang
- Abstract summary: The performance of ConvMath is evaluated on an open dataset named IM2LATEX-100K, comprising 103,556 samples.
The proposed network achieves state-of-the-art accuracy and much better efficiency than previous methods.
- Score: 11.645568743440087
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the recent advances in optical character recognition (OCR),
mathematical expressions remain hard to recognize due to their
two-dimensional graphical layout. In this paper, we propose a convolutional
sequence modeling network, ConvMath, which converts the mathematical
expression depicted in an image into a LaTeX sequence in an end-to-end way.
The network combines an image encoder for feature extraction with a
convolutional decoder for sequence generation. Compared with other Long
Short-Term Memory (LSTM) based encoder-decoder models, ConvMath is entirely
based on convolution, so it parallelizes easily. Besides, the network adopts
a multi-layer attention mechanism in the decoder, which allows the model to
align output symbols with source feature vectors automatically and
alleviates the lack-of-coverage problem during training. The performance of
ConvMath is evaluated on an open dataset named IM2LATEX-100K, comprising
103,556 samples. The experimental results demonstrate that the proposed
network achieves state-of-the-art accuracy and much better efficiency than
previous methods.
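To make the decoder concrete, below is a minimal PyTorch sketch of one convolutional decoder layer with attention over the encoder's feature vectors. The causal convolution, GLU activation, and single-head dot-product attention follow the generic convolutional seq2seq recipe; all dimensions and details are assumptions, not ConvMath's exact configuration.

```python
# Sketch of a convolutional decoder layer with attention, in the spirit of
# convolutional seq2seq models (an assumption; ConvMath's exact layer differs).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvAttnDecoderLayer(nn.Module):
    def __init__(self, dim, kernel_size=3):
        super().__init__()
        # Causal convolution: pad only on the left so position t never
        # sees future target symbols.
        self.pad = kernel_size - 1
        self.conv = nn.Conv1d(dim, 2 * dim, kernel_size)

    def forward(self, y, enc):
        # y:   (B, T, D) embeddings of the LaTeX tokens generated so far
        # enc: (B, S, D) feature vectors from the image encoder
        h = F.pad(y.transpose(1, 2), (self.pad, 0))
        h = F.glu(self.conv(h), dim=1).transpose(1, 2)   # (B, T, D)
        # Dot-product attention: align each output position with the
        # source feature vectors automatically.
        scores = torch.bmm(h, enc.transpose(1, 2))       # (B, T, S)
        ctx = torch.bmm(F.softmax(scores, dim=-1), enc)  # (B, T, D)
        return y + h + ctx                               # residual connection

# Stacking several such layers gives multi-layer attention: each layer
# re-attends to the encoder output, which helps cover all source regions.
layers = nn.ModuleList(ConvAttnDecoderLayer(256) for _ in range(4))
y, enc = torch.randn(2, 10, 256), torch.randn(2, 40, 256)
for layer in layers:
    y = layer(y, enc)
print(y.shape)  # torch.Size([2, 10, 256])
```

Because every position of the causal convolution can be computed at once during training, the whole decoder parallelizes over the target sequence, unlike an LSTM.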
Related papers
- PottsMGNet: A Mathematical Explanation of Encoder-Decoder Based Neural Networks [7.668812831777923]
We study the encoder-decoder-based network architecture from the algorithmic perspective.
We use the two-phase Potts model for image segmentation as an example for our explanations.
We show that the resulting discrete PottsMGNet is equivalent to an encoder-decoder-based network.
arXiv Detail & Related papers (2023-07-18T07:48:48Z)
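For reference, a standard continuous form of the two-phase Potts model that PottsMGNet uses as its running example is given below (an assumed textbook variant; the paper's precise formulation may differ):

```latex
% Two-phase Potts segmentation: u is a binary label field on the image
% domain Omega, f_1 and f_2 are the data-fidelity costs of the two phases,
% and the total-variation term penalizes the length of the phase boundary.
\min_{u : \Omega \to \{0,1\}} \;
  \int_\Omega |\nabla u| \, dx
  + \lambda \int_\Omega \bigl( u \, f_1 + (1 - u) \, f_2 \bigr) \, dx
```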
- T-former: An Efficient Transformer for Image Inpainting [50.43302925662507]
A class of attention-based network architectures, called transformers, has shown significant performance in natural language processing.
In this paper, we design a novel attention mechanism whose cost is linearly related to the image resolution, derived via a Taylor expansion; based on this attention, a network called $T$-former is designed for image inpainting.
Experiments on several benchmark datasets demonstrate that our proposed method achieves state-of-the-art accuracy while maintaining a relatively low number of parameters and computational complexity.
arXiv Detail & Related papers (2023-05-12T04:10:42Z)
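The point of T-former's Taylor expansion is that replacing $\exp(q \cdot k)$ in softmax attention with a low-order polynomial lets attention be computed without ever forming the $N \times N$ attention matrix, so the cost is linear in the number of pixels $N$. A minimal sketch using the first-order expansion $\exp(q \cdot k) \approx 1 + q \cdot k$ (an assumption; the paper's exact expansion and normalization may differ):

```python
import torch
import torch.nn.functional as F

def linear_taylor_attention(q, k, v):
    """Attention with exp(q.k) approximated by its first-order Taylor
    expansion 1 + q.k, so the (N x N) attention matrix is never formed
    and the cost is linear in sequence length N.
    q, k, v: (B, N, D); q and k are assumed normalized so the weights
    1 + q.k stay non-negative (an assumption of this sketch)."""
    B, N, D = q.shape
    kv = torch.einsum('bnd,bne->bde', k, v)        # K^T V: (B, D, D), O(N D^2)
    num = v.sum(dim=1, keepdim=True) + torch.einsum('bnd,bde->bne', q, kv)
    den = N + q @ k.sum(dim=1).unsqueeze(-1)       # (B, N, 1)
    return num / den

q = F.normalize(torch.randn(2, 1024, 64), dim=-1)
k = F.normalize(torch.randn(2, 1024, 64), dim=-1)
v = torch.randn(2, 1024, 64)
out = linear_taylor_attention(q, k, v)             # (2, 1024, 64)
```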
- Closed-Loop Transcription via Convolutional Sparse Coding [29.75613581643052]
Autoencoders often use generic deep networks as the encoder or decoder, which are difficult to interpret.
In this work, we make the explicit assumption that the image distribution is generated by a multistage convolutional sparse coding (CSC) model.
Our method enjoys several side benefits, including more structured and interpretable representations, more stable convergence, and scalability to large datasets.
arXiv Detail & Related papers (2023-02-18T14:40:07Z)
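Convolutional sparse coding models an image as a sum of learned filters convolved with sparse feature maps. A generic sketch of inferring those maps with ISTA follows; this is textbook CSC inference under assumed hyperparameters, not the paper's closed-loop transcription procedure:

```python
import torch
import torch.nn.functional as F

def csc_ista(x, filters, n_steps=100, lam=0.05, lr=0.005):
    """Infer sparse feature maps z with x ~= conv_transpose2d(z, filters),
    i.e. the image is a sum of filters placed at the active code sites.
    Plain ISTA: a gradient step on the reconstruction error, then
    soft-thresholding. Step size and lambda are assumptions; lr must stay
    below 2/L, where L is the Lipschitz constant of the gradient.
    x: (B, C, H, W) image; filters: (M, C, k, k)."""
    M, C, k, _ = filters.shape
    B, _, H, W = x.shape
    z = torch.zeros(B, M, H - k + 1, W - k + 1)
    for _ in range(n_steps):
        recon = F.conv_transpose2d(z, filters)            # D z
        grad = F.conv2d(recon - x, filters)               # D^T (D z - x)
        z = z - lr * grad
        z = z.sign() * (z.abs() - lr * lam).clamp(min=0)  # soft-threshold
    return z

x = torch.randn(1, 1, 32, 32)
filters = torch.randn(8, 1, 5, 5)
filters = filters / filters.flatten(1).norm(dim=1).view(-1, 1, 1, 1)  # unit-norm atoms
z = csc_ista(x, filters)
print(z.shape)  # torch.Size([1, 8, 28, 28]); most entries are zero
```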
- Low PAPR MIMO-OFDM Design Based on Convolutional Autoencoder [20.544993155126967]
A new framework for peak-to-average power ratio ($\mathsf{PAPR}$) reduction and waveform design is presented.
A convolutional autoencoder ($\mathsf{CAE}$) architecture is presented.
We show that a single trained model covers the tasks of $\mathsf{PAPR}$ reduction, spectrum design, and $\mathsf{MIMO}$ detection together over a wide range of SNR levels.
arXiv Detail & Related papers (2023-01-11T11:35:10Z)
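For context, the quantity the autoencoder above is trained to reduce is defined, for a discrete-time transmit frame $x[n]$, $n = 0, \dots, N-1$, as:

```latex
% Peak-to-average power ratio of one transmitted frame.
\mathsf{PAPR}(x) \;=\;
  \frac{\max_{0 \le n < N} |x[n]|^2}
       {\tfrac{1}{N} \sum_{n=0}^{N-1} |x[n]|^2}
```

High peaks force the power amplifier to back off, which is why shaping the waveform toward lower $\mathsf{PAPR}$ is valuable.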
- Multi-scale Transformer Network with Edge-aware Pre-training for Cross-Modality MR Image Synthesis [52.41439725865149]
Cross-modality magnetic resonance (MR) image synthesis can be used to generate missing modalities from given ones.
Existing (supervised learning) methods often require a large amount of paired multi-modal data to train an effective synthesis model.
We propose a Multi-scale Transformer Network (MT-Net) with edge-aware pre-training for cross-modality MR image synthesis.
arXiv Detail & Related papers (2022-12-02T11:40:40Z)
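MT-Net's "edge-aware pre-training" suggests a self-supervised stage whose targets include edge structure. A minimal sketch, assuming Sobel-filtered edge maps as the auxiliary target (the paper's actual pre-training objective may differ):

```python
import torch
import torch.nn.functional as F

def sobel_edges(img):
    """Edge-magnitude map of a grayscale batch (B, 1, H, W) via Sobel
    filters; one simple choice of edge target (an assumption, not
    necessarily the paper's definition)."""
    gx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
    w = torch.stack([gx, gx.t()]).unsqueeze(1)   # (2, 1, 3, 3): d/dx, d/dy
    g = F.conv2d(img, w, padding=1)              # (B, 2, H, W)
    return g.pow(2).sum(dim=1, keepdim=True).sqrt()

# Pre-training loss: reconstruct both the image and its edge map, pushing
# the network to learn boundary-aware features before synthesis.
img = torch.rand(4, 1, 64, 64)
pred_img = torch.rand(4, 1, 64, 64)   # stand-ins for the network outputs
pred_edge = torch.rand(4, 1, 64, 64)
loss = F.l1_loss(pred_img, img) + F.l1_loss(pred_edge, sobel_edges(img))
```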
- When Counting Meets HMER: Counting-Aware Network for Handwritten Mathematical Expression Recognition [57.51793420986745]
We propose an unconventional network for handwritten mathematical expression recognition (HMER) named Counting-Aware Network (CAN).
We design a weakly-supervised counting module that can predict the number of each symbol class without the symbol-level position annotations.
Experiments on the benchmark datasets for HMER validate that both joint optimization and counting results are beneficial for correcting the prediction errors of encoder-decoder models.
arXiv Detail & Related papers (2022-07-23T08:39:32Z)
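CAN's counting module can be pictured as predicting one non-negative density map per symbol class and sum-pooling it to a count, so that only per-class totals, recoverable from the LaTeX transcript, are needed as supervision. A sketch of that idea (channel sizes and the sigmoid bounding are assumptions, not the paper's exact module):

```python
import torch
import torch.nn as nn

class CountingModule(nn.Module):
    """Weakly-supervised symbol counting: a 1x1 convolution turns encoder
    features into one density map per symbol class, and spatial sum-pooling
    yields the predicted counts. Supervised only by per-class totals, never
    by symbol-level position annotations."""
    def __init__(self, in_channels, num_classes):
        super().__init__()
        self.density = nn.Sequential(
            nn.Conv2d(in_channels, num_classes, kernel_size=1),
            nn.Sigmoid(),  # keep densities non-negative and bounded
        )

    def forward(self, feats):                  # feats: (B, C, H, W)
        maps = self.density(feats)             # (B, K, H, W)
        return maps.sum(dim=(2, 3))            # (B, K) predicted counts

feats = torch.randn(2, 256, 16, 64)
counter = CountingModule(256, num_classes=111)
pred_counts = counter(feats)
true_counts = torch.randint(0, 5, (2, 111)).float()  # parsed from the LaTeX label
loss = nn.functional.smooth_l1_loss(pred_counts, true_counts)
```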
- UniXcoder: Unified Cross-Modal Pre-training for Code Representation [65.6846553962117]
We present UniXcoder, a unified cross-modal pre-trained model for programming languages.
We propose a one-to-one mapping method to transform an AST into a sequence structure that retains all structural information from the tree.
We evaluate UniXcoder on five code-related tasks over nine datasets.
arXiv Detail & Related papers (2022-03-08T04:48:07Z)
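The idea of a one-to-one tree-to-sequence mapping can be illustrated with Python's own ast module: emitting explicit bracket tokens around each node's children makes the original tree recoverable from the flat token list. This is a sketch of the idea, not UniXcoder's actual token scheme:

```python
import ast

def flatten_ast(node):
    """Serialize an AST into a token sequence with explicit bracket
    tokens, so the tree shape is recoverable from the sequence
    (a one-to-one mapping)."""
    name = type(node).__name__
    children = list(ast.iter_child_nodes(node))
    if not children:
        return [name]
    tokens = [name, '<left>']
    for child in children:
        tokens += flatten_ast(child)
    tokens.append('<right>')
    return tokens

tree = ast.parse("def add(a, b):\n    return a + b")
print(flatten_ast(tree)[:12])
# ['Module', '<left>', 'FunctionDef', '<left>', 'arguments', ...]
```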
- Syntax-Aware Network for Handwritten Mathematical Expression Recognition [53.130826547287626]
Handwritten mathematical expression recognition (HMER) is a challenging task that has many potential applications.
Recent methods for HMER have achieved outstanding performance with an encoder-decoder architecture.
We propose a simple and efficient method for HMER, which is the first to incorporate syntax information into an encoder-decoder network.
arXiv Detail & Related papers (2022-03-03T09:57:19Z)
- Masked Autoencoders Are Scalable Vision Learners [60.97703494764904]
Masked autoencoders (MAE) are scalable self-supervised learners for computer vision.
Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels.
Coupling two core designs, an asymmetric encoder-decoder architecture and a high masking ratio, enables us to train large models efficiently and effectively.
arXiv Detail & Related papers (2021-11-11T18:46:40Z)
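MAE's masking step can be sketched in a few lines: sample a random score per patch, keep the lowest-scoring 25%, and feed only those to the encoder. The 75% ratio follows the published recipe; the helper below is illustrative, not the reference implementation:

```python
import torch

def random_masking(patches, mask_ratio=0.75):
    """Keep a random subset of patch embeddings and drop the rest; the
    encoder sees only the visible patches and the decoder reconstructs
    the missing pixels. patches: (B, N, D)."""
    B, N, D = patches.shape
    n_keep = int(N * (1 - mask_ratio))
    noise = torch.rand(B, N)                 # one random score per patch
    keep = noise.argsort(dim=1)[:, :n_keep]  # indices of visible patches
    visible = torch.gather(patches, 1, keep.unsqueeze(-1).expand(-1, -1, D))
    return visible, keep

patches = torch.randn(8, 196, 768)  # e.g. 14x14 patches of a 224px image
visible, keep = random_masking(patches)
print(visible.shape)                # torch.Size([8, 49, 768])
```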
- EDSL: An Encoder-Decoder Architecture with Symbol-Level Features for Printed Mathematical Expression Recognition [23.658113675853546]
We propose a new method named EDSL, short for encoder-decoder with symbol-level features, to identify printed mathematical expressions from images.
EDSL achieves 92.7% and 89.0% in evaluation, which are 3.47% and 4.04% higher than the state-of-the-art method.
arXiv Detail & Related papers (2020-07-06T03:53:52Z)