ConvMath: A Convolutional Sequence Network for Mathematical Expression Recognition
- URL: http://arxiv.org/abs/2012.12619v1
- Date: Wed, 23 Dec 2020 12:08:18 GMT
- Title: ConvMath: A Convolutional Sequence Network for Mathematical Expression Recognition
- Authors: Zuoyu Yan, Xiaode Zhang, Liangcai Gao, Ke Yuan and Zhi Tang
- Abstract summary: The performance of ConvMath is evaluated on an open dataset named IM2LATEX-100K, comprising 103,556 samples.
The proposed network achieves state-of-the-art accuracy and much better efficiency than previous methods.
- Score: 11.645568743440087
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the recent advances in optical character recognition (OCR),
mathematical expressions remain hard to recognize due to their
two-dimensional graphical layout. In this paper, we propose a convolutional
sequence modeling network, ConvMath, which converts the mathematical
expression depicted in an image into a LaTeX sequence in an end-to-end way.
The network combines an image encoder for feature extraction with a
convolutional decoder for sequence generation. Compared with other Long
Short-Term Memory (LSTM) based encoder-decoder models, ConvMath is entirely
based on convolution, so it parallelizes easily. Besides, the network adopts
a multi-layer attention mechanism in the decoder, which allows the model to
align output symbols with source feature vectors automatically and
alleviates the lack-of-coverage problem during training. The performance of
ConvMath is evaluated on an open dataset named IM2LATEX-100K, comprising
103,556 samples. The experimental results demonstrate that the proposed
network achieves state-of-the-art accuracy and much better efficiency than
previous methods.
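To make the decoder concrete, below is a minimal PyTorch sketch of one convolutional decoder layer with attention over the encoder's feature vectors. The causal convolution, GLU activation, and single-head dot-product attention follow the generic convolutional seq2seq recipe; all dimensions and details are assumptions, not ConvMath's exact configuration.

```python
# Sketch of a convolutional decoder layer with attention, in the spirit of
# convolutional seq2seq models (an assumption; ConvMath's exact layer differs).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvAttnDecoderLayer(nn.Module):
    def __init__(self, dim, kernel_size=3):
        super().__init__()
        # Causal convolution: pad only on the left so position t never
        # sees future target symbols.
        self.pad = kernel_size - 1
        self.conv = nn.Conv1d(dim, 2 * dim, kernel_size)

    def forward(self, y, enc):
        # y:   (B, T, D) embeddings of the LaTeX tokens generated so far
        # enc: (B, S, D) feature vectors from the image encoder
        h = F.pad(y.transpose(1, 2), (self.pad, 0))
        h = F.glu(self.conv(h), dim=1).transpose(1, 2)   # (B, T, D)
        # Dot-product attention: align each output position with the
        # source feature vectors automatically.
        scores = torch.bmm(h, enc.transpose(1, 2))       # (B, T, S)
        ctx = torch.bmm(F.softmax(scores, dim=-1), enc)  # (B, T, D)
        return y + h + ctx                               # residual connection

# Stacking several such layers gives multi-layer attention: each layer
# re-attends to the encoder output, which helps cover all source regions.
layers = nn.ModuleList(ConvAttnDecoderLayer(256) for _ in range(4))
y, enc = torch.randn(2, 10, 256), torch.randn(2, 40, 256)
for layer in layers:
    y = layer(y, enc)
print(y.shape)  # torch.Size([2, 10, 256])
```

Because every position of the causal convolution can be computed at once during training, the whole decoder parallelizes over the target sequence, unlike an LSTM.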
Related papers
- PottsMGNet: A Mathematical Explanation of Encoder-Decoder Based Neural Networks [7.668812831777923]
We study the encoder-decoder-based network architecture from the algorithmic perspective.
We use the two-phase Potts model for image segmentation as an example for our explanations.
We show that the resulting discrete PottsMGNet is equivalent to an encoder-decoder-based network.
arXiv Detail & Related papers (2023-07-18T07:48:48Z)
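For reference, a standard continuous form of the two-phase Potts model that PottsMGNet uses as its running example is given below (an assumed textbook variant; the paper's precise formulation may differ):

```latex
% Two-phase Potts segmentation: u is a binary label field on the image
% domain Omega, f_1 and f_2 are the data-fidelity costs of the two phases,
% and the total-variation term penalizes the length of the phase boundary.
\min_{u : \Omega \to \{0,1\}} \;
  \int_\Omega |\nabla u| \, dx
  + \lambda \int_\Omega \bigl( u \, f_1 + (1 - u) \, f_2 \bigr) \, dx
```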
- T-former: An Efficient Transformer for Image Inpainting [50.43302925662507]
A class of attention-based network architectures, called transformers, has shown significant performance in natural language processing.
In this paper, we design a novel attention mechanism whose cost is linearly related to the image resolution, derived via a Taylor expansion; based on this attention, a network called $T$-former is designed for image inpainting.
Experiments on several benchmark datasets demonstrate that our proposed method achieves state-of-the-art accuracy while maintaining a relatively low number of parameters and computational complexity.
arXiv Detail & Related papers (2023-05-12T04:10:42Z)
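The point of T-former's Taylor expansion is that replacing $\exp(q \cdot k)$ in softmax attention with a low-order polynomial lets attention be computed without ever forming the $N \times N$ attention matrix, so the cost is linear in the number of pixels $N$. A minimal sketch using the first-order expansion $\exp(q \cdot k) \approx 1 + q \cdot k$ (an assumption; the paper's exact expansion and normalization may differ):

```python
import torch
import torch.nn.functional as F

def linear_taylor_attention(q, k, v):
    """Attention with exp(q.k) approximated by its first-order Taylor
    expansion 1 + q.k, so the (N x N) attention matrix is never formed
    and the cost is linear in sequence length N.
    q, k, v: (B, N, D); q and k are assumed normalized so the weights
    1 + q.k stay non-negative (an assumption of this sketch)."""
    B, N, D = q.shape
    kv = torch.einsum('bnd,bne->bde', k, v)        # K^T V: (B, D, D), O(N D^2)
    num = v.sum(dim=1, keepdim=True) + torch.einsum('bnd,bde->bne', q, kv)
    den = N + q @ k.sum(dim=1).unsqueeze(-1)       # (B, N, 1)
    return num / den

q = F.normalize(torch.randn(2, 1024, 64), dim=-1)
k = F.normalize(torch.randn(2, 1024, 64), dim=-1)
v = torch.randn(2, 1024, 64)
out = linear_taylor_attention(q, k, v)             # (2, 1024, 64)
```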
- Closed-Loop Transcription via Convolutional Sparse Coding [29.75613581643052]
Autoencoders often use generic deep networks as the encoder or decoder, which are difficult to interpret.
In this work, we make the explicit assumption that the image distribution is generated by a multistage convolutional sparse coding (CSC) model.
Our method enjoys several side benefits, including more structured and interpretable representations, more stable convergence, and scalability to large datasets.
arXiv Detail & Related papers (2023-02-18T14:40:07Z)
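Convolutional sparse coding models an image as a sum of learned filters convolved with sparse feature maps. A generic sketch of inferring those maps with ISTA follows; this is textbook CSC inference under assumed hyperparameters, not the paper's closed-loop transcription procedure:

```python
import torch
import torch.nn.functional as F

def csc_ista(x, filters, n_steps=100, lam=0.05, lr=0.005):
    """Infer sparse feature maps z with x ~= conv_transpose2d(z, filters),
    i.e. the image is a sum of filters placed at the active code sites.
    Plain ISTA: a gradient step on the reconstruction error, then
    soft-thresholding. Step size and lambda are assumptions; lr must stay
    below 2/L, where L is the Lipschitz constant of the gradient.
    x: (B, C, H, W) image; filters: (M, C, k, k)."""
    M, C, k, _ = filters.shape
    B, _, H, W = x.shape
    z = torch.zeros(B, M, H - k + 1, W - k + 1)
    for _ in range(n_steps):
        recon = F.conv_transpose2d(z, filters)            # D z
        grad = F.conv2d(recon - x, filters)               # D^T (D z - x)
        z = z - lr * grad
        z = z.sign() * (z.abs() - lr * lam).clamp(min=0)  # soft-threshold
    return z

x = torch.randn(1, 1, 32, 32)
filters = torch.randn(8, 1, 5, 5)
filters = filters / filters.flatten(1).norm(dim=1).view(-1, 1, 1, 1)  # unit-norm atoms
z = csc_ista(x, filters)
print(z.shape)  # torch.Size([1, 8, 28, 28]); most entries are zero
```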
- Low PAPR MIMO-OFDM Design Based on Convolutional Autoencoder [20.544993155126967]
A new framework for peak-to-average power ratio ($\mathsf{PAPR}$) reduction and waveform design is presented.
A convolutional autoencoder ($\mathsf{CAE}$) architecture is presented.
We show that a single trained model covers the tasks of $\mathsf{PAPR}$ reduction, spectrum design, and $\mathsf{MIMO}$ detection together over a wide range of SNR levels.
arXiv Detail & Related papers (2023-01-11T11:35:10Z)
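For context, the quantity the autoencoder above is trained to reduce is defined, for a discrete-time transmit frame $x[n]$, $n = 0, \dots, N-1$, as:

```latex
% Peak-to-average power ratio of one transmitted frame.
\mathsf{PAPR}(x) \;=\;
  \frac{\max_{0 \le n < N} |x[n]|^2}
       {\tfrac{1}{N} \sum_{n=0}^{N-1} |x[n]|^2}
```

High peaks force the power amplifier to back off, which is why shaping the waveform toward lower $\mathsf{PAPR}$ is valuable.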
- Multi-scale Transformer Network with Edge-aware Pre-training for Cross-Modality MR Image Synthesis [52.41439725865149]
Cross-modality magnetic resonance (MR) image synthesis can be used to generate missing modalities from given ones.
Existing (supervised learning) methods often require a large amount of paired multi-modal data to train an effective synthesis model.
We propose a Multi-scale Transformer Network (MT-Net) with edge-aware pre-training for cross-modality MR image synthesis.
arXiv Detail & Related papers (2022-12-02T11:40:40Z)
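MT-Net's "edge-aware pre-training" suggests a self-supervised stage whose targets include edge structure. A minimal sketch, assuming Sobel-filtered edge maps as the auxiliary target (the paper's actual pre-training objective may differ):

```python
import torch
import torch.nn.functional as F

def sobel_edges(img):
    """Edge-magnitude map of a grayscale batch (B, 1, H, W) via Sobel
    filters; one simple choice of edge target (an assumption, not
    necessarily the paper's definition)."""
    gx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
    w = torch.stack([gx, gx.t()]).unsqueeze(1)   # (2, 1, 3, 3): d/dx, d/dy
    g = F.conv2d(img, w, padding=1)              # (B, 2, H, W)
    return g.pow(2).sum(dim=1, keepdim=True).sqrt()

# Pre-training loss: reconstruct both the image and its edge map, pushing
# the network to learn boundary-aware features before synthesis.
img = torch.rand(4, 1, 64, 64)
pred_img = torch.rand(4, 1, 64, 64)   # stand-ins for the network outputs
pred_edge = torch.rand(4, 1, 64, 64)
loss = F.l1_loss(pred_img, img) + F.l1_loss(pred_edge, sobel_edges(img))
```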
- When Counting Meets HMER: Counting-Aware Network for Handwritten Mathematical Expression Recognition [57.51793420986745]
We propose an unconventional network for handwritten mathematical expression recognition (HMER) named Counting-Aware Network (CAN).
We design a weakly-supervised counting module that can predict the number of each symbol class without the symbol-level position annotations.
Experiments on the benchmark datasets for HMER validate that both joint optimization and counting results are beneficial for correcting the prediction errors of encoder-decoder models.
arXiv Detail & Related papers (2022-07-23T08:39:32Z)
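CAN's counting module can be pictured as predicting one non-negative density map per symbol class and sum-pooling it to a count, so that only per-class totals, recoverable from the LaTeX transcript, are needed as supervision. A sketch of that idea (channel sizes and the sigmoid bounding are assumptions, not the paper's exact module):

```python
import torch
import torch.nn as nn

class CountingModule(nn.Module):
    """Weakly-supervised symbol counting: a 1x1 convolution turns encoder
    features into one density map per symbol class, and spatial sum-pooling
    yields the predicted counts. Supervised only by per-class totals, never
    by symbol-level position annotations."""
    def __init__(self, in_channels, num_classes):
        super().__init__()
        self.density = nn.Sequential(
            nn.Conv2d(in_channels, num_classes, kernel_size=1),
            nn.Sigmoid(),  # keep densities non-negative and bounded
        )

    def forward(self, feats):                  # feats: (B, C, H, W)
        maps = self.density(feats)             # (B, K, H, W)
        return maps.sum(dim=(2, 3))            # (B, K) predicted counts

feats = torch.randn(2, 256, 16, 64)
counter = CountingModule(256, num_classes=111)
pred_counts = counter(feats)
true_counts = torch.randint(0, 5, (2, 111)).float()  # parsed from the LaTeX label
loss = nn.functional.smooth_l1_loss(pred_counts, true_counts)
```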
- UniXcoder: Unified Cross-Modal Pre-training for Code Representation [65.6846553962117]
We present UniXcoder, a unified cross-modal pre-trained model for programming languages.
We propose a one-to-one mapping method to transform an AST into a sequence structure that retains all structural information from the tree.
We evaluate UniXcoder on five code-related tasks over nine datasets.
arXiv Detail & Related papers (2022-03-08T04:48:07Z)
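The idea of a one-to-one tree-to-sequence mapping can be illustrated with Python's own ast module: emitting explicit bracket tokens around each node's children makes the original tree recoverable from the flat token list. This is a sketch of the idea, not UniXcoder's actual token scheme:

```python
import ast

def flatten_ast(node):
    """Serialize an AST into a token sequence with explicit bracket
    tokens, so the tree shape is recoverable from the sequence
    (a one-to-one mapping)."""
    name = type(node).__name__
    children = list(ast.iter_child_nodes(node))
    if not children:
        return [name]
    tokens = [name, '<left>']
    for child in children:
        tokens += flatten_ast(child)
    tokens.append('<right>')
    return tokens

tree = ast.parse("def add(a, b):\n    return a + b")
print(flatten_ast(tree)[:12])
# ['Module', '<left>', 'FunctionDef', '<left>', 'arguments', ...]
```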
- Syntax-Aware Network for Handwritten Mathematical Expression Recognition [53.130826547287626]
Handwritten mathematical expression recognition (HMER) is a challenging task that has many potential applications.
Recent methods for HMER have achieved outstanding performance with an encoder-decoder architecture.
We propose a simple and efficient method for HMER, which is the first to incorporate syntax information into an encoder-decoder network.
arXiv Detail & Related papers (2022-03-03T09:57:19Z)
- Masked Autoencoders Are Scalable Vision Learners [60.97703494764904]
Masked autoencoders (MAE) are scalable self-supervised learners for computer vision.
Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels.
Coupling two core designs, an asymmetric encoder-decoder architecture and a high masking ratio, enables us to train large models efficiently and effectively.
arXiv Detail & Related papers (2021-11-11T18:46:40Z)
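MAE's masking step can be sketched in a few lines: sample a random score per patch, keep the lowest-scoring 25%, and feed only those to the encoder. The 75% ratio follows the published recipe; the helper below is illustrative, not the reference implementation:

```python
import torch

def random_masking(patches, mask_ratio=0.75):
    """Keep a random subset of patch embeddings and drop the rest; the
    encoder sees only the visible patches and the decoder reconstructs
    the missing pixels. patches: (B, N, D)."""
    B, N, D = patches.shape
    n_keep = int(N * (1 - mask_ratio))
    noise = torch.rand(B, N)                 # one random score per patch
    keep = noise.argsort(dim=1)[:, :n_keep]  # indices of visible patches
    visible = torch.gather(patches, 1, keep.unsqueeze(-1).expand(-1, -1, D))
    return visible, keep

patches = torch.randn(8, 196, 768)  # e.g. 14x14 patches of a 224px image
visible, keep = random_masking(patches)
print(visible.shape)                # torch.Size([8, 49, 768])
```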
- EDSL: An Encoder-Decoder Architecture with Symbol-Level Features for Printed Mathematical Expression Recognition [23.658113675853546]
We propose a new method named EDSL, short for encoder-decoder with symbol-level features, to identify printed mathematical expressions from images.
EDSL achieves 92.7% and 89.0% in evaluation, which are 3.47% and 4.04% higher than the state-of-the-art method.
arXiv Detail & Related papers (2020-07-06T03:53:52Z)