Offline Handwritten Mathematical Recognition using Adversarial Learning and Transformers
- URL: http://arxiv.org/abs/2208.09662v1
- Date: Sat, 20 Aug 2022 11:45:02 GMT
- Title: Offline Handwritten Mathematical Recognition using Adversarial Learning and Transformers
- Authors: Ujjwal Thakur and Anuj Sharma
- Abstract summary: Offline HMER is often viewed as a much harder problem than online HMER.
In this paper, we propose an encoder-decoder model that uses paired adversarial learning.
We improve the latest CROHME 2019 test set results by approximately 4%.
- Score: 3.9220281834178463
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Offline Handwritten Mathematical Expression Recognition (HMER) is a major
area in the field of mathematical expression recognition. Offline HMER is often
viewed as a much harder problem than online HMER due to the lack of temporal
information and the variability in writing style. In this paper, we propose an
encoder-decoder model that uses paired adversarial learning. Semantic-invariant
features are extracted in the encoder from handwritten mathematical expression
images and their printed mathematical expression counterparts. Learning
semantic-invariant features, combined with the DenseNet encoder and transformer
decoder, helped us improve the expression recognition rate over previous
studies. Evaluated on the CROHME dataset, we improve the latest CROHME 2019
test set results by approximately 4%.
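The paired adversarial objective described in the abstract can be illustrated with a minimal NumPy sketch. This is an illustrative toy, not the authors' implementation: the paper uses a DenseNet encoder and a transformer decoder, whereas here `encode` is a single linear layer and `discriminate` a logistic score, both hypothetical stand-ins. The idea shown is only the feature-alignment game: a discriminator tries to tell handwritten features from printed ones, while the shared encoder is trained to make the paired features indistinguishable, i.e. semantic-invariant.

```python
# Toy sketch of paired adversarial feature learning (assumed names, not the paper's code).
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W):
    """Stand-in for the DenseNet encoder: one linear layer + ReLU."""
    return np.maximum(0.0, x @ W)

def discriminate(f, v):
    """Stand-in discriminator: logistic score that f is from the handwritten domain."""
    return 1.0 / (1.0 + np.exp(-(f @ v)))

# Paired inputs: handwritten images and their printed renderings
# (random vectors here, standing in for image tensors).
x_hw = rng.normal(size=(8, 16))       # batch of handwritten samples
x_pr = rng.normal(size=(8, 16))       # their printed counterparts

W = rng.normal(size=(16, 32)) * 0.1   # encoder weights, shared across domains
v = rng.normal(size=32) * 0.1         # discriminator weights

f_hw = encode(x_hw, W)
f_pr = encode(x_pr, W)
p_hw = discriminate(f_hw, v)
p_pr = discriminate(f_pr, v)

eps = 1e-9
# Discriminator loss: label handwritten features 1, printed features 0.
d_loss = -np.mean(np.log(p_hw + eps)) - np.mean(np.log(1.0 - p_pr + eps))
# Adversarial loss for the encoder: fool the discriminator on both domains,
# pushing the paired features toward a common, domain-invariant space.
g_loss = -np.mean(np.log(1.0 - p_hw + eps)) - np.mean(np.log(p_pr + eps))

print(f_hw.shape, round(float(d_loss), 3), round(float(g_loss), 3))
```

In a full model these two losses would be minimized alternately (discriminator step, then encoder step), alongside the transformer decoder's LaTeX-sequence loss.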
Related papers
- PosFormer: Recognizing Complex Handwritten Mathematical Expression with Position Forest Transformer [51.260384040953326]
Handwritten Mathematical Expression Recognition (HMER) has wide applications in human-machine interaction scenarios.
We propose a position forest transformer (PosFormer) for HMER, which jointly optimizes two tasks: expression recognition and position recognition.
PosFormer consistently outperforms the state-of-the-art methods, with 2.03%/1.22%/2, 1.83%, and 4.62% gains on the benchmark datasets.
arXiv Detail & Related papers (2024-07-10T15:42:58Z)
- ICAL: Implicit Character-Aided Learning for Enhanced Handwritten Mathematical Expression Recognition [9.389169879626428]
This paper introduces a novel approach, Implicit Character-Aided Learning (ICAL), to mine the global expression information.
By modeling and utilizing implicit character information, ICAL achieves a more accurate and context-aware interpretation of handwritten mathematical expressions.
arXiv Detail & Related papers (2024-05-15T02:03:44Z)
- An Intelligent-Detection Network for Handwritten Mathematical Expression Recognition [0.9790236766474201]
The proposed Intelligent-Detection Network (IDN) for HMER differs from traditional encoder-decoder methods by utilizing object detection techniques.
Specifically, we have developed an enhanced YOLOv7 network that can accurately detect both digital and symbolic objects.
The experiments demonstrate that the proposed method outperforms encoder-decoder networks in recognizing complex handwritten mathematical expressions.
arXiv Detail & Related papers (2023-11-26T12:01:50Z)
- A Transformer Architecture for Online Gesture Recognition of Mathematical Expressions [0.0]
A Transformer architecture is shown to provide an end-to-end model for building expression trees from online handwritten gestures corresponding to glyph strokes.
The attention mechanism was successfully used to encode, learn and enforce the underlying syntax of expressions.
For the first time, the encoder is fed with unseen online-temporal data tokens potentially forming an infinitely large vocabulary.
arXiv Detail & Related papers (2022-11-04T17:55:55Z)
- ConTextual Mask Auto-Encoder for Dense Passage Retrieval [49.49460769701308]
CoT-MAE is a simple yet effective generative pre-training method for dense passage retrieval.
It learns to compress the sentence semantics into a dense vector through self-supervised and context-supervised masked auto-encoding.
We conduct experiments on large-scale passage retrieval benchmarks and show considerable improvements over strong baselines.
arXiv Detail & Related papers (2022-08-16T11:17:22Z)
- When Counting Meets HMER: Counting-Aware Network for Handwritten Mathematical Expression Recognition [57.51793420986745]
We propose an unconventional network for handwritten mathematical expression recognition (HMER) named the Counting-Aware Network (CAN).
We design a weakly-supervised counting module that can predict the number of each symbol class without the symbol-level position annotations.
Experiments on the benchmark datasets for HMER validate that both joint optimization and counting results are beneficial for correcting the prediction errors of encoder-decoder models.
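The weakly-supervised counting idea above can be sketched briefly: because the ground-truth LaTeX string already tells you how many times each symbol occurs, a per-class count vector can be derived without any symbol-level position annotations. The snippet below is an illustrative assumption, not the CAN authors' code; the vocabulary and tokenization are made up for the example.

```python
# Illustrative sketch (not the CAN implementation): derive counting
# supervision from an expression-level LaTeX label alone.
from collections import Counter

def count_labels(latex_tokens, vocab):
    """Count vector over the symbol vocabulary for one expression."""
    counts = Counter(t for t in latex_tokens if t in vocab)
    return [counts[s] for s in vocab]

# Hypothetical symbol vocabulary and a pre-tokenized ground-truth label
# for the expression x^{2}+y=\frac{x}{2} (tokenization assumed).
vocab = ["x", "y", "2", "+", "=", "\\frac", "{", "}"]
tokens = ["x", "^", "{", "2", "}", "+", "y", "=",
          "\\frac", "{", "x", "}", "{", "2", "}"]

print(count_labels(tokens, vocab))  # → [2, 1, 2, 1, 1, 1, 3, 3]
```

A counting module's predicted counts can then be regressed against this vector, jointly with the usual sequence-decoding loss.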
arXiv Detail & Related papers (2022-07-23T08:39:32Z)
- Syntax-Aware Network for Handwritten Mathematical Expression Recognition [53.130826547287626]
Handwritten mathematical expression recognition (HMER) is a challenging task that has many potential applications.
Recent methods for HMER have achieved outstanding performance with an encoder-decoder architecture.
We propose a simple and efficient method for HMER, which is the first to incorporate syntax information into an encoder-decoder network.
arXiv Detail & Related papers (2022-03-03T09:57:19Z)
- A Transformer-based Math Language Model for Handwritten Math Expression Recognition [7.202733269706245]
Many math symbols look very similar when handwritten, such as the dot and comma, or 0, O, and o.
This paper presents a Transformer-based Math Language Model (TMLM).
TMLM achieved a perplexity of 4.42, outperforming previous math language models.
arXiv Detail & Related papers (2021-08-11T03:03:48Z)
- TEACHTEXT: CrossModal Generalized Distillation for Text-Video Retrieval [103.85002875155551]
We propose a novel generalized distillation method, TeachText, for exploiting large-scale language pretraining.
We extend our method to video side modalities and show that we can effectively reduce the number of used modalities at test time.
Our approach advances the state of the art on several video retrieval benchmarks by a significant margin and adds no computational overhead at test time.
arXiv Detail & Related papers (2021-04-16T17:55:28Z)
- Adversarial Watermarking Transformer: Towards Tracing Text Provenance with Data Hiding [80.3811072650087]
We study natural language watermarking as a defense to help better mark and trace the provenance of text.
We introduce the Adversarial Watermarking Transformer (AWT) with a jointly trained encoder-decoder and adversarial training.
AWT is the first end-to-end model to hide data in text by automatically learning -- without ground truth -- word substitutions along with their locations.
arXiv Detail & Related papers (2020-09-07T11:01:24Z)
- EDSL: An Encoder-Decoder Architecture with Symbol-Level Features for Printed Mathematical Expression Recognition [23.658113675853546]
We propose a new method named EDSL, short for encoder-decoder with symbol-level features, to identify printed mathematical expressions from images.
EDSL achieves 92.7% and 89.0% in evaluation, which are 3.47% and 4.04% higher than the state-of-the-art method.
arXiv Detail & Related papers (2020-07-06T03:53:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.