EDSL: An Encoder-Decoder Architecture with Symbol-Level Features for
Printed Mathematical Expression Recognition
- URL: http://arxiv.org/abs/2007.02517v1
- Date: Mon, 6 Jul 2020 03:53:52 GMT
- Title: EDSL: An Encoder-Decoder Architecture with Symbol-Level Features for
Printed Mathematical Expression Recognition
- Authors: Yingnan Fu, Tingting Liu, Ming Gao, Aoying Zhou
- Abstract summary: We propose a new method named EDSL, short for encoder-decoder with symbol-level features, to recognize printed mathematical expressions from images.
EDSL achieves 92.7% and 89.0% on the Match evaluation metric, which are 3.47% and 4.04% higher than the state-of-the-art method.
- Score: 23.658113675853546
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Printed Mathematical Expression Recognition (PMER) aims to transcribe a
printed mathematical expression image into a structural expression, such as a
LaTeX expression. It is a crucial task for many applications, including
automatic question recommendation, automatic problem solving, and student
analysis. Currently, the mainstream solutions treat PMER as an image
captioning task, which addresses image summarization. As such, these methods
can be suboptimal for the PMER problem.
In this paper, we propose a new method named EDSL, short for
encoder-decoder with symbol-level features, to recognize printed
mathematical expressions from images. The symbol-level image encoder of EDSL
consists of a segmentation module and a reconstruction module. The
segmentation module identifies all the symbols and their spatial information
from images in an unsupervised manner. We then design a novel reconstruction
module to recover the symbol dependencies after segmentation.
In particular, we employ a position correction attention mechanism to capture the
spatial relationships between symbols. To alleviate the negative impact of
long outputs, we apply a transformer model to transcribe the encoded image
into sequential and structural output. We conduct extensive experiments on
two real datasets to verify the effectiveness and rationality of the proposed
EDSL method. The experimental results show that EDSL achieves
92.7% and 89.0% on the evaluation metric Match, which are 3.47% and 4.04%
higher than the state-of-the-art method. Our code and datasets are available at
https://github.com/abcAnonymous/EDSL .
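The abstract does not specify how the unsupervised segmentation works; a common way to obtain symbols and their spatial information from a binarized expression image is connected-component labeling. A minimal, hypothetical sketch (pure Python, 8-connected flood fill; `segment_symbols` and the 0/1 pixel-grid input are illustrative, not the authors' API):

```python
from collections import deque

def segment_symbols(img):
    """Unsupervised symbol segmentation via connected-component labeling.

    img: 2-D list of 0/1 values (1 = ink pixel).
    Returns bounding boxes (top, left, bottom, right), one per component,
    sorted left-to-right as a rough reading order.
    """
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                # BFS flood fill over 8-connected ink pixels
                q = deque([(y, x)])
                seen[y][x] = True
                top, left, bottom, right = y, x, y, x
                while q:
                    cy, cx = q.popleft()
                    top, bottom = min(top, cy), max(bottom, cy)
                    left, right = min(left, cx), max(right, cx)
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if 0 <= ny < h and 0 <= nx < w \
                                    and img[ny][nx] and not seen[ny][nx]:
                                seen[ny][nx] = True
                                q.append((ny, nx))
                boxes.append((top, left, bottom, right))
    boxes.sort(key=lambda b: b[1])
    return boxes
```

A real pipeline would binarize the grayscale image first and merge components belonging to multi-part symbols such as "=" or "i"; the paper's reconstruction module would then recover the dependencies between the extracted boxes.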
Related papers
- PosFormer: Recognizing Complex Handwritten Mathematical Expression with Position Forest Transformer [51.260384040953326]
Handwritten Mathematical Expression Recognition (HMER) has wide applications in human-machine interaction scenarios.
We propose a position forest transformer (PosFormer) for HMER, which jointly optimizes two tasks: expression recognition and position recognition.
PosFormer consistently outperforms state-of-the-art methods, with gains of 2.03%/1.22%, 1.83%, and 4.62% on the benchmark datasets.
arXiv Detail & Related papers (2024-07-10T15:42:58Z)
- Translatotron-V(ison): An End-to-End Model for In-Image Machine Translation [81.45400849638347]
In-image machine translation (IIMT) aims to translate an image containing text in a source language into an image containing the translation in a target language.
In this paper, we propose an end-to-end IIMT model consisting of four modules.
Our model achieves competitive performance compared to cascaded models with only 70.9% of their parameters, and significantly outperforms the pixel-level end-to-end IIMT model.
arXiv Detail & Related papers (2024-07-03T08:15:39Z)
- Composing Object Relations and Attributes for Image-Text Matching [70.47747937665987]
This work introduces a dual-encoder image-text matching model, leveraging a scene graph to represent captions with nodes for objects and attributes interconnected by relational edges.
Our model efficiently encodes object-attribute and object-object semantic relations, resulting in a robust and fast-performing system.
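The node-and-edge representation described above can be made concrete with a small data structure; all names below are illustrative and not taken from the paper:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str   # e.g. "dog" or "brown"
    kind: str    # "object" or "attribute"

@dataclass
class SceneGraph:
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)  # (src_idx, relation, dst_idx)

    def add_node(self, label, kind="object"):
        self.nodes.append(Node(label, kind))
        return len(self.nodes) - 1

    def add_edge(self, src, relation, dst):
        self.edges.append((src, relation, dst))

# Caption: "a brown dog chases a ball"
g = SceneGraph()
dog = g.add_node("dog")
brown = g.add_node("brown", kind="attribute")
ball = g.add_node("ball")
g.add_edge(dog, "has_attribute", brown)  # object-attribute relation
g.add_edge(dog, "chases", ball)          # object-object relation
```

A dual-encoder model would embed such a graph on one side and the image on the other, scoring matches between the two embeddings.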
arXiv Detail & Related papers (2024-06-17T17:56:01Z)
- ICAL: Implicit Character-Aided Learning for Enhanced Handwritten Mathematical Expression Recognition [9.389169879626428]
This paper introduces a novel approach, Implicit Character-Aided Learning (ICAL), to mine the global expression information.
By modeling and utilizing implicit character information, ICAL achieves a more accurate and context-aware interpretation of handwritten mathematical expressions.
arXiv Detail & Related papers (2024-05-15T02:03:44Z)
- T-former: An Efficient Transformer for Image Inpainting [50.43302925662507]
A class of attention-based network architectures, called the transformer, has shown significant performance in natural language processing.
In this paper, we design a novel attention mechanism whose cost is linearly related to the resolution, derived via Taylor expansion, and based on this attention we build a network called $T$-former for image inpainting.
Experiments on several benchmark datasets demonstrate that our proposed method achieves state-of-the-art accuracy while maintaining a relatively low number of parameters and computational complexity.
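$T$-former's exact Taylor-expanded attention is not reproduced here, but the general trick behind resolution-linear attention — reassociating the product so that $\phi(Q)(\phi(K)^\top V)$ is computed instead of $\mathrm{softmax}(QK^\top)V$ — can be sketched as follows (pure Python; the elu+1 feature map is a common generic choice for $\phi$, not the paper's):

```python
import math

def phi(x):
    # Positive feature map (elu(x) + 1), a common choice in linearized attention
    return x + 1.0 if x > 0 else math.exp(x)

def linear_attention(Q, K, V):
    """O(n * d * dv) attention: compute S = phi(K)^T V once, then phi(Q) @ S.

    Q, K: n x d lists of floats; V: n x dv lists of floats.
    Cost grows linearly in sequence length n instead of quadratically.
    """
    n, d = len(Q), len(Q[0])
    dv = len(V[0])
    fQ = [[phi(q) for q in row] for row in Q]
    fK = [[phi(k) for k in row] for row in K]
    # S[j][m] = sum_i fK[i][j] * V[i][m]  -- a d x dv summary, computed once
    S = [[sum(fK[i][j] * V[i][m] for i in range(n)) for m in range(dv)]
         for j in range(d)]
    # z[j] = sum_i fK[i][j]  -- normalizer replacing the softmax denominator
    z = [sum(fK[i][j] for i in range(n)) for j in range(d)]
    out = []
    for row in fQ:
        denom = sum(row[j] * z[j] for j in range(d))
        out.append([sum(row[j] * S[j][m] for j in range(d)) / denom
                    for m in range(dv)])
    return out
```

For image inpainting, n is the number of pixels or patches, so this reassociation is what makes attention tractable at high resolution.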
arXiv Detail & Related papers (2023-05-12T04:10:42Z)
- Generalized Decoding for Pixel, Image, and Language [197.85760901840177]
We present X-Decoder, a generalized decoding model that can predict pixel-level segmentation and language tokens seamlessly.
X-Decoder is the first work that provides a unified way to support all types of image segmentation and a variety of vision-language (VL) tasks.
arXiv Detail & Related papers (2022-12-21T18:58:41Z)
- Accurate Image Restoration with Attention Retractable Transformer [50.05204240159985]
We propose Attention Retractable Transformer (ART) for image restoration.
ART presents both dense and sparse attention modules in the network.
We conduct extensive experiments on image super-resolution, denoising, and JPEG compression artifact reduction tasks.
arXiv Detail & Related papers (2022-10-04T07:35:01Z)
- Offline Handwritten Mathematical Recognition using Adversarial Learning and Transformers [3.9220281834178463]
Offline HMER is often viewed as a much harder problem than online HMER.
In this paper, we propose an encoder-decoder model that uses paired adversarial learning.
We improve on the latest CROHME 2019 test set results by approximately 4%.
arXiv Detail & Related papers (2022-08-20T11:45:02Z)
- Diffusion Autoencoders: Toward a Meaningful and Decodable Representation [1.471992435706872]
Diffusion models (DPMs) have achieved remarkable quality in image generation that rivals GANs'.
Unlike GANs, DPMs use a set of latent variables that lack semantic meaning and cannot serve as a useful representation for other tasks.
This paper explores the possibility of using DPMs for representation learning and seeks to extract a meaningful and decodable representation of an input image via autoencoding.
arXiv Detail & Related papers (2021-11-30T18:24:04Z)
- ConvMath: A Convolutional Sequence Network for Mathematical Expression Recognition [11.645568743440087]
The performance of ConvMath is evaluated on an open dataset named IM2LATEX-100K, which contains 103,556 samples.
The proposed network achieves state-of-the-art accuracy and much better efficiency than previous methods.
arXiv Detail & Related papers (2020-12-23T12:08:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.