Attention-based Fully Gated CNN-BGRU for Russian Handwritten Text
- URL: http://arxiv.org/abs/2008.05373v5
- Date: Thu, 20 Aug 2020 13:59:37 GMT
- Title: Attention-based Fully Gated CNN-BGRU for Russian Handwritten Text
- Authors: Abdelrahman Abdallah, Mohamed Hamada and Daniyar Nurseitov
- Abstract summary: This research approaches the task of handwritten text recognition with attention encoder-decoder networks trained on the Kazakh and Russian languages.
We developed a novel deep neural network model based on a fully gated CNN, supported by multiple bidirectional GRU layers and attention mechanisms.
Our research is the first work on the HKR dataset and demonstrates state-of-the-art results compared with most existing models.
- Score: 0.5371337604556311
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This research approaches the task of handwritten text recognition
with attention encoder-decoder networks trained on the Kazakh and Russian
languages. We developed a novel deep neural network model based on a fully
gated CNN, supported by multiple bidirectional GRU layers and attention
mechanisms to handle sophisticated features. The model achieves a 0.045
Character Error Rate (CER), 0.192 Word Error Rate (WER) and 0.253 Sequence
Error Rate (SER) on the first test dataset, and 0.064 CER, 0.24 WER and 0.361
SER on the second test dataset. We also propose fully gated layers that take
advantage of multiplying the output feature of the Tanh activation with the
input feature; this design achieves better results. We evaluated our model on
the Handwritten Kazakh & Russian Database (HKR). Our research is the first
work on the HKR dataset and demonstrates state-of-the-art results compared
with most existing models.
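To make the gating idea concrete, here is a minimal PyTorch sketch of a fully gated convolution in which the Tanh-activated convolution output multiplies the layer input element-wise, as the abstract describes. The class name `FullyGatedConv2d` and all layer sizes are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class FullyGatedConv2d(nn.Module):
    """Sketch of a fully gated convolution: the Tanh-activated convolution
    output multiplies the layer input element-wise, as the abstract
    describes. Hypothetical illustration; the paper's layer may differ."""

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size,
                              padding=kernel_size // 2)  # keep spatial size

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate = torch.tanh(self.conv(x))  # values in (-1, 1) act as a soft gate
        return gate * x                  # modulate the input features


# Example: a batch of feature maps from a text-line image (N, C, H, W)
x = torch.randn(4, 32, 64, 256)
out = FullyGatedConv2d(32)(x)
print(out.shape)  # torch.Size([4, 32, 64, 256])
```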
Related papers
- Classification of Non-native Handwritten Characters Using Convolutional Neural Network [0.0]
English characters written by non-native users are classified using a custom-tailored CNN model.
We train this CNN with a new dataset called the handwritten isolated English character dataset.
The proposed model with five convolutional layers and one hidden layer outperforms state-of-the-art models in terms of character recognition accuracy.
arXiv Detail & Related papers (2024-06-06T21:08:07Z)
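As a rough illustration of the five-convolutional-layer network topped by a single hidden layer described above, here is a minimal PyTorch sketch; channel widths, kernel sizes, and the 26-class output are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class CharCNN(nn.Module):
    """Hypothetical sketch: five convolutional layers plus one hidden
    (fully connected) layer; all sizes are illustrative assumptions."""

    def __init__(self, num_classes: int = 26):
        super().__init__()
        chans = [1, 16, 32, 64, 64, 128]  # five conv stages
        blocks = []
        for cin, cout in zip(chans[:-1], chans[1:]):
            blocks += [nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(),
                       nn.MaxPool2d(2)]
        self.features = nn.Sequential(*blocks)
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128, 64),  # the single hidden layer
            nn.ReLU(),
            nn.Linear(64, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))


# Example: 32x32 grayscale character crops
logits = CharCNN()(torch.randn(8, 1, 32, 32))
print(logits.shape)  # torch.Size([8, 26])
```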
- On the Possibilities of AI-Generated Text Detection [76.55825911221434]
We argue that as machine-generated text approaches human-like quality, the sample size needed for reliable detection increases.
We test various state-of-the-art text generators, including GPT-2, GPT-3.5-Turbo, Llama, Llama-2-13B-Chat-HF, and Llama-2-70B-Chat-HF, against detectors including RoBERTa-Large/Base-Detector and GPTZero.
arXiv Detail & Related papers (2023-04-10T17:47:39Z)
- AttentionHTR: Handwritten Text Recognition Based on Attention Encoder-Decoder Networks [0.0]
This work proposes an attention-based sequence-to-sequence model for handwritten word recognition.
It exploits models pre-trained on scene text images as a starting point for tailoring handwriting recognition models.
The effectiveness of the proposed end-to-end HTR system has been empirically evaluated on a novel multi-writer dataset.
arXiv Detail & Related papers (2022-01-23T22:48:36Z)
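The attention encoder-decoder idea behind AttentionHTR can be sketched with a generic Bahdanau-style additive attention step; the dimensions and module layout below are assumptions for illustration, not the authors' exact model.

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Bahdanau-style attention step under assumed dimensions; an
    illustrative sketch, not AttentionHTR's exact implementation."""

    def __init__(self, enc_dim: int, dec_dim: int, attn_dim: int = 128):
        super().__init__()
        self.W_enc = nn.Linear(enc_dim, attn_dim)
        self.W_dec = nn.Linear(dec_dim, attn_dim)
        self.v = nn.Linear(attn_dim, 1)

    def forward(self, enc_states, dec_state):
        # enc_states: (N, T, enc_dim); dec_state: (N, dec_dim)
        scores = self.v(torch.tanh(self.W_enc(enc_states)
                                   + self.W_dec(dec_state).unsqueeze(1)))
        weights = torch.softmax(scores, dim=1)       # attention over T positions
        context = (weights * enc_states).sum(dim=1)  # weighted sum of encodings
        return context, weights.squeeze(-1)


# Example: 50 encoder time steps from a text-line image
context, w = AdditiveAttention(256, 256)(torch.randn(2, 50, 256),
                                         torch.randn(2, 256))
print(context.shape, w.shape)  # torch.Size([2, 256]) torch.Size([2, 50])
```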
- Detecting Handwritten Mathematical Terms with Sensor Based Data [71.84852429039881]
We propose a solution to the UbiComp 2021 Challenge by Stabilo, in which handwritten mathematical terms are to be automatically classified.
The input data set contains data of different writers, with label strings constructed from a total of 15 different possible characters.
arXiv Detail & Related papers (2021-09-12T19:33:34Z)
- Classification of Handwritten Names of Cities and Handwritten Text Recognition using Various Deep Learning Models [0.0]
We describe various approaches and recent achievements in the development of handwriting recognition models.
The first model uses deep convolutional neural networks (CNNs) for feature extraction and a fully connected multilayer perceptron neural network (MLP) for word classification.
The second model, called SimpleHTR, uses CNN and recurrent neural network (RNN) layers to extract information from images.
arXiv Detail & Related papers (2021-02-09T13:34:16Z)
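A SimpleHTR-style CNN-plus-RNN pipeline, like the second model above, can be sketched as follows: a convolutional stack collapses the image height into a feature sequence that a recurrent layer then reads. All layer sizes here are illustrative assumptions rather than the original repository's configuration.

```python
import torch
import torch.nn as nn

class SimpleHTRSketch(nn.Module):
    """Illustrative CNN -> RNN pipeline in the SimpleHTR spirit; layer
    sizes are assumptions, not the original model's configuration."""

    def __init__(self, num_chars: int = 80):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),  # squeeze height faster than width
        )
        self.rnn = nn.LSTM(input_size=128 * 4, hidden_size=256,
                           bidirectional=True, batch_first=True)
        self.head = nn.Linear(512, num_chars + 1)  # +1 for the CTC blank

    def forward(self, x):                     # x: (N, 1, 32, W)
        f = self.cnn(x)                       # (N, 128, 4, W/4)
        f = f.permute(0, 3, 1, 2).flatten(2)  # (N, W/4, 512) sequence
        out, _ = self.rnn(f)
        return self.head(out)                 # per-time-step char logits


logits = SimpleHTRSketch()(torch.randn(2, 1, 32, 128))
print(logits.shape)  # torch.Size([2, 32, 81])
```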
- Train your classifier first: Cascade Neural Networks Training from upper layers to lower layers [54.47911829539919]
We develop a novel top-down training method which can be viewed as an algorithm for searching for high-quality classifiers.
We tested this method on automatic speech recognition (ASR) tasks and language modelling tasks.
The proposed method consistently improves recurrent neural network ASR models on Wall Street Journal, self-attention ASR models on Switchboard, and AWD-LSTM language models on WikiText-2.
arXiv Detail & Related papers (2021-02-09T08:19:49Z)
- TextGNN: Improving Text Encoder via Graph Neural Network in Sponsored Search [11.203006652211075]
We propose a TextGNN model that naturally extends strong twin-tower structured encoders with complementary graph information from user historical behaviors.
In offline experiments, the model achieves a 0.14% overall increase in ROC-AUC and a 1% accuracy increase on long-tail, low-frequency ads.
In online A/B testing, the model shows a 2.03% increase in Revenue Per Mille with a 2.32% decrease in Ad defect rate.
arXiv Detail & Related papers (2021-01-15T23:12:47Z)
- Explicit Alignment Objectives for Multilingual Bidirectional Encoders [111.65322283420805]
We present a new method for learning multilingual encoders, AMBER (Aligned Multilingual Bi-directional EncodeR).
AMBER is trained on additional parallel data using two explicit alignment objectives that align the multilingual representations at different granularities.
Experimental results show that AMBER obtains gains of up to 1.1 average F1 score on sequence tagging and up to 27.3 average accuracy on retrieval over the XLMR-large model.
arXiv Detail & Related papers (2020-10-15T18:34:13Z)
- Accuracy Prediction with Non-neural Model for Neural Architecture Search [185.0651567642238]
We study an alternative approach that uses a non-neural model for accuracy prediction.
We leverage a gradient boosting decision tree (GBDT) as the predictor for neural architecture search (NAS).
Experiments on NASBench-101 and ImageNet demonstrate the effectiveness of using GBDT as a predictor for NAS.
arXiv Detail & Related papers (2020-07-09T13:28:49Z)
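The GBDT-as-predictor idea above can be sketched with scikit-learn: fit a gradient boosting regressor on (architecture encoding, accuracy) pairs, then rank unseen candidates by predicted accuracy. The feature encoding and data below are synthetic placeholders, not NASBench-101 data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Synthetic placeholder: each architecture is encoded as a fixed-length
# vector (e.g. flattened adjacency matrix plus one-hot op choices), and
# each has a measured validation accuracy. Real NAS would use benchmark
# data such as NASBench-101.
X_train = rng.random((500, 28))   # 500 already-evaluated architectures
y_train = rng.random(500)         # their validation accuracies

predictor = GradientBoostingRegressor(n_estimators=200, max_depth=4)
predictor.fit(X_train, y_train)

# Rank a pool of unseen candidate architectures by predicted accuracy
# and pick the most promising ones to actually train.
candidates = rng.random((10_000, 28))
pred_acc = predictor.predict(candidates)
top_k = np.argsort(pred_acc)[::-1][:10]
print("indices of the 10 most promising candidates:", top_k)
```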
- Offline Handwritten Chinese Text Recognition with Convolutional Neural Networks [5.984124397831814]
In this paper, we build the models using only convolutional neural networks and use CTC as the loss function.
We achieve a 6.81% character error rate (CER) on the ICDAR 2013 competition set, which is the best published result without language-model correction.
arXiv Detail & Related papers (2020-06-28T14:34:38Z)
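Training a convolutional recognizer with CTC, as the paper above does, can be illustrated with PyTorch's built-in `nn.CTCLoss`; the alphabet size, sequence lengths, and random logits below are placeholders for illustration.

```python
import torch
import torch.nn as nn

# Illustrative CTC training step; alphabet size, sequence lengths, and
# logits are placeholders, not the paper's actual model outputs.
T, N, C = 50, 4, 3756 + 1  # time steps, batch size, characters + CTC blank
logits = torch.randn(T, N, C, requires_grad=True)  # stand-in for CNN output
log_probs = logits.log_softmax(2)                  # (T, N, C) as CTCLoss expects

targets = torch.randint(1, C, (N, 20), dtype=torch.long)  # labels (0 = blank)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 20, dtype=torch.long)

ctc = nn.CTCLoss(blank=0)  # aligns unsegmented label strings to frames
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()            # gradients flow back into the producing network
print(loss.item())
```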
- Structured Multimodal Attentions for TextVQA [57.71060302874151]
We propose an end-to-end structured multimodal attention (SMA) neural network, mainly to solve the first two issues above.
SMA first uses a structural graph representation to encode the object-object, object-text and text-text relationships appearing in the image, and then designs a multimodal graph attention network to reason over it.
Our proposed model outperforms the state-of-the-art models on the TextVQA dataset and on two tasks of the ST-VQA dataset, among all models except the pre-training-based TAP.
arXiv Detail & Related papers (2020-06-01T07:07:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.