EASTER: Efficient and Scalable Text Recognizer
- URL: http://arxiv.org/abs/2008.07839v2
- Date: Wed, 19 Aug 2020 14:02:50 GMT
- Title: EASTER: Efficient and Scalable Text Recognizer
- Authors: Kartik Chaudhary and Raghav Bali
- Abstract summary: We present an Efficient And Scalable TExt Recognizer (EASTER) to perform optical character recognition on both machine-printed and handwritten text.
Our model utilises 1-D convolutional layers without any recurrence, which enables parallel training with a considerably smaller volume of data.
We also showcase improvements over the current best results on the offline handwritten text recognition task.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recent progress in deep learning has led to the development of Optical
Character Recognition (OCR) systems which perform remarkably well. Most
research has centred on recurrent networks and complex gated layers, which
make the overall solution complex and difficult to scale. In this paper,
we present an Efficient And Scalable TExt Recognizer (EASTER) to perform
optical character recognition on both machine-printed and handwritten text. Our
model utilises 1-D convolutional layers without any recurrence, which enables
parallel training with a considerably smaller volume of data. We experimented
with multiple variations of our architecture, and one of the smallest variants
(in terms of depth and parameter count) performs comparably to complex
RNN-based models. Our 20-layered deepest variant outperforms RNN architectures
by a good margin on benchmark datasets like IIIT-5k and SVT. We also showcase
improvements over the current best results on the offline handwritten text
recognition task. We also present data generation pipelines with an
augmentation setup to generate synthetic datasets for both handwritten and
machine-printed text.
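Since the abstract's central design choice is a recognizer built purely from 1-D convolutions, a minimal sketch may help make that concrete. The PyTorch block below is a hypothetical illustration, not the paper's exact architecture: the layer widths, kernel sizes, and the CTC-style head are assumptions (CTC is a common output choice for recurrence-free recognizers).

```python
# Hypothetical sketch of a recurrence-free recognizer in the spirit of EASTER:
# stacked 1-D convolutions over the horizontal axis of an image feature
# sequence. Layer sizes are illustrative only.
import torch
import torch.nn as nn

def conv_block(cin, cout, kernel, dropout=0.2):
    # Conv1d -> BatchNorm -> ReLU -> Dropout, the basic repeating unit.
    return nn.Sequential(
        nn.Conv1d(cin, cout, kernel, padding=kernel // 2),
        nn.BatchNorm1d(cout),
        nn.ReLU(),
        nn.Dropout(dropout),
    )

class Conv1DRecognizer(nn.Module):
    def __init__(self, in_features, num_classes):
        super().__init__()
        self.body = nn.Sequential(
            conv_block(in_features, 128, 5),
            conv_block(128, 128, 5),
            conv_block(128, 256, 7),
            conv_block(256, 256, 7),
        )
        # One extra output class for the CTC blank symbol.
        self.head = nn.Conv1d(256, num_classes + 1, 1)

    def forward(self, x):                    # x: (batch, in_features, width)
        return self.head(self.body(x))       # (batch, num_classes + 1, width)

model = Conv1DRecognizer(in_features=64, num_classes=80)
x = torch.randn(2, 64, 200)                  # image columns as 64-d features
log_probs = model(x).permute(2, 0, 1).log_softmax(-1)  # (T, N, C) for nn.CTCLoss
```

Because every layer is convolutional, all time steps are computed in parallel during training, which is the scalability argument the abstract makes against recurrent alternatives.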
Related papers
- (PASS) Visual Prompt Locates Good Structure Sparsity through a Recurrent HyperNetwork
Large-scale neural networks have demonstrated remarkable performance in different domains like vision and language processing.
One of the key questions of structural pruning is how to estimate the channel significance.
We propose a novel algorithmic framework, namely PASS.
It is a tailored hyper-network to take both visual prompts and network weight statistics as input, and output layer-wise channel sparsity in a recurrent manner.
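As a rough illustration of the recurrent hyper-network idea only: the sketch below feeds a prompt embedding plus per-layer weight statistics through a GRU cell and emits one sparsity ratio per layer. The dimensions, the choice of GRU, and the statistics used are all assumptions, not the paper's design.

```python
# Heavily simplified sketch: a recurrent hyper-network that outputs
# layer-wise channel keep-ratios, one layer at a time.
import torch
import torch.nn as nn

class SparsityHyperNet(nn.Module):
    def __init__(self, prompt_dim=16, stat_dim=4, hidden=32):
        super().__init__()
        self.cell = nn.GRUCell(prompt_dim + stat_dim, hidden)
        self.head = nn.Linear(hidden, 1)

    def forward(self, prompt, layer_stats):
        # prompt: (prompt_dim,); layer_stats: one (stat_dim,) vector per layer.
        h = torch.zeros(1, self.cell.hidden_size)
        ratios = []
        for stats in layer_stats:
            inp = torch.cat([prompt, stats]).unsqueeze(0)
            h = self.cell(inp, h)                       # recurrence across layers
            ratios.append(torch.sigmoid(self.head(h)))  # keep-ratio in (0, 1)
        return torch.cat(ratios)

net = SparsityHyperNet()
prompt = torch.randn(16)                      # stand-in visual-prompt embedding
stats = [torch.randn(4) for _ in range(10)]   # e.g. mean/std/L1/L2 of weights
print(net(prompt, stats).shape)               # one sparsity value per layer
```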
arXiv Detail & Related papers (2024-07-24T16:47:45Z)
- Best Practices for a Handwritten Text Recognition System
Handwritten text recognition has developed rapidly in recent years.
Non-trivial deviations in performance can be detected even when small pre-processing elements are changed.
This work highlights simple yet effective empirical practices that can further help training and provide well-performing handwritten text recognition systems.
arXiv Detail & Related papers (2024-04-17T13:00:05Z)
- Contrastive Transformer Learning with Proximity Data Generation for Text-Based Person Search
Given a descriptive text query, text-based person search aims to retrieve the best-matched target person from an image gallery.
Such a cross-modal retrieval task is quite challenging due to the significant modality gap, fine-grained differences, and the insufficiency of annotated data.
In this paper, we propose a simple yet effective dual Transformer model for text-based person search.
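For orientation on the task setup only, here is a generic dual-encoder retrieval sketch; the paper's dual Transformer and its proximity data generation are not reproduced, and the embedding dimension and encoders are placeholders.

```python
# Generic text-to-image retrieval: rank gallery images by cosine similarity
# between a text-query embedding and precomputed image embeddings.
import torch
import torch.nn.functional as F

def rank_gallery(text_emb, image_embs):
    # text_emb: (d,); image_embs: (n, d); higher similarity = better match.
    sims = F.cosine_similarity(text_emb.unsqueeze(0), image_embs, dim=-1)
    return sims.argsort(descending=True)

text_emb = torch.randn(256)                    # embedding of the text query
image_embs = torch.randn(1000, 256)            # embeddings of the gallery
print(rank_gallery(text_emb, image_embs)[:5])  # indices of top-5 candidates
```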
arXiv Detail & Related papers (2023-11-15T16:26:49Z)
- Easter2.0: Improving convolutional models for handwritten text recognition
We propose a CNN-based architecture that bridges this gap.
Easter2.0 is composed of multiple layers of 1-D convolution, batch normalization, ReLU, dropout, dense residual connections, and a squeeze-and-excitation module.
Our work achieves state-of-the-art results on the IAM handwriting database when trained using only publicly available training data.
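A hypothetical PyTorch rendering of one such block, combining the components listed above; the channel counts, the reduction ratio, and the exact dense-residual wiring are assumptions rather than the paper's configuration.

```python
# Sketch of an Easter2.0-style block: 1-D conv + BN + ReLU + dropout,
# squeeze-and-excitation, and a dense residual over all earlier outputs.
import torch
import torch.nn as nn

class SqueezeExcite1d(nn.Module):
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                 # x: (batch, channels, width)
        w = self.fc(x.mean(dim=-1))       # squeeze over width, excite channels
        return x * w.unsqueeze(-1)

class EasterBlock(nn.Module):
    def __init__(self, channels, kernel=5, dropout=0.2):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(channels, channels, kernel, padding=kernel // 2),
            nn.BatchNorm1d(channels),
            nn.ReLU(),
            nn.Dropout(dropout),
        )
        self.se = SqueezeExcite1d(channels)

    def forward(self, x, residuals):
        # Dense residual: add the input plus all earlier block outputs.
        return self.se(self.conv(x)) + sum(residuals, x)

x = torch.randn(2, 128, 200)
outs = []
for block in [EasterBlock(128) for _ in range(3)]:
    x = block(x, outs)
    outs.append(x)
```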
arXiv Detail & Related papers (2022-05-30T06:33:15Z)
- RTMV: A Ray-Traced Multi-View Synthetic Dataset for Novel View Synthesis
We present a large-scale synthetic dataset for novel view synthesis consisting of 300k images rendered from nearly 2000 complex scenes.
The dataset is orders of magnitude larger than existing synthetic datasets for novel view synthesis.
Using 4 distinct sources of high-quality 3D meshes, the scenes of our dataset exhibit challenging variations in camera views, lighting, shape, materials, and textures.
arXiv Detail & Related papers (2022-05-14T13:15:32Z)
- Hierarchical Neural Network Approaches for Long Document Classification
We employ pre-trained Universal Sentence Encoder (USE) and Bidirectional Encoder Representations from Transformers (BERT) in a hierarchical setup to capture better representations efficiently.
Our proposed models are conceptually simple: we divide the input data into chunks and pass them through the BERT and USE base models.
We show that USE + CNN/LSTM performs better than its stand-alone baseline, whereas BERT + CNN/LSTM performs on par with its stand-alone counterpart.
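A minimal sketch of this chunk-then-aggregate pipeline: the `embed_chunk` stand-in is a placeholder for a real USE or BERT encoder, and all sizes are assumptions.

```python
# Split a long document into chunks, embed each chunk with a (placeholder)
# sentence encoder, then classify from an LSTM over the chunk embeddings.
import torch
import torch.nn as nn

def embed_chunk(text: str, dim: int = 512) -> torch.Tensor:
    # Placeholder for a frozen USE/BERT encoder (assumption).
    torch.manual_seed(abs(hash(text)) % (2**31))
    return torch.randn(dim)

class ChunkLSTMClassifier(nn.Module):
    def __init__(self, dim=512, hidden=128, num_classes=5):
        super().__init__()
        self.lstm = nn.LSTM(dim, hidden, batch_first=True)
        self.cls = nn.Linear(hidden, num_classes)

    def forward(self, chunk_embs):        # (batch, num_chunks, dim)
        _, (h, _) = self.lstm(chunk_embs)
        return self.cls(h[-1])            # classify from final hidden state

doc = "some very long document " * 100
chunks = [doc[i:i + 200] for i in range(0, len(doc), 200)]
embs = torch.stack([embed_chunk(c) for c in chunks]).unsqueeze(0)
logits = ChunkLSTMClassifier()(embs)
```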
arXiv Detail & Related papers (2022-01-18T07:17:40Z)
- TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models
We propose an end-to-end text recognition approach with pre-trained image Transformer and text Transformer models, namely TrOCR.
TrOCR is simple but effective, and can be pre-trained with large-scale synthetic data and fine-tuned with human-labeled datasets.
Experiments show that the TrOCR model outperforms the current state-of-the-art models on both printed and handwritten text recognition tasks.
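TrOCR checkpoints are published on the Hugging Face hub, so the approach can be tried directly; the snippet below follows the transformers library's documented usage (the input file name is a placeholder).

```python
# Run a released TrOCR checkpoint on a single handwritten text-line image.
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-handwritten")

image = Image.open("line.png").convert("RGB")   # placeholder text-line image
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)    # autoregressive text decoding
text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(text)
```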
arXiv Detail & Related papers (2021-09-21T16:01:56Z)
- Enhanced Seq2Seq Autoencoder via Contrastive Learning for Abstractive Text Summarization
We present a sequence-to-sequence (seq2seq) autoencoder via contrastive learning for abstractive text summarization.
Our model adopts a standard Transformer-based architecture with a multi-layer bi-directional encoder and an auto-regressive decoder.
We conduct experiments on two datasets and demonstrate that our model outperforms many existing benchmarks.
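The summary does not spell out the contrastive objective, so the sketch below shows a generic InfoNCE-style loss over paired sequence embeddings, offered only as an illustration of the general technique.

```python
# Generic InfoNCE: pull embeddings of a sequence and its augmented view
# together; other in-batch sequences act as negatives.
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, temperature=0.1):
    # anchor, positive: (batch, dim) embeddings of two views per sequence.
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    logits = a @ p.t() / temperature       # (batch, batch) similarity matrix
    targets = torch.arange(a.size(0))      # matching pairs lie on the diagonal
    return F.cross_entropy(logits, targets)

loss = info_nce(torch.randn(8, 256), torch.randn(8, 256))
```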
arXiv Detail & Related papers (2021-08-26T18:45:13Z)
- Rethinking Text Line Recognition Models
We consider two decoder families (Connectionist Temporal Classification and Transformer) and three encoder modules (Bidirectional LSTMs, Self-Attention, and GRCLs).
We compare their accuracy and performance on widely used public datasets of scene and handwritten text.
Unlike the more common Transformer-based models, this architecture can handle inputs of arbitrary length.
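For the CTC decoder family mentioned here, inference typically uses greedy decoding: take the arg-max class per frame, collapse repeats, then drop blanks. A self-contained sketch follows (the alphabet and blank index are illustrative).

```python
# Greedy CTC decoding: arg-max per time step, collapse consecutive
# duplicates, then remove blank symbols.
import torch

def ctc_greedy_decode(log_probs, alphabet, blank=0):
    # log_probs: (time, num_classes); class 0 is the CTC blank here.
    best = log_probs.argmax(dim=-1).tolist()
    out, prev = [], blank
    for idx in best:
        if idx != blank and idx != prev:   # skip blanks, collapse repeats
            out.append(alphabet[idx - 1])  # alphabet excludes the blank
        prev = idx
    return "".join(out)

alphabet = "abcdefghijklmnopqrstuvwxyz "
print(ctc_greedy_decode(torch.randn(50, len(alphabet) + 1), alphabet))
```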
arXiv Detail & Related papers (2021-04-15T21:43:13Z)
- Depth-Adaptive Graph Recurrent Network for Text Classification
Sentence-State LSTM (S-LSTM) is a powerful and highly efficient graph recurrent network.
We propose a depth-adaptive mechanism for the S-LSTM, which allows the model to learn how many computational steps to conduct for different words as required.
arXiv Detail & Related papers (2020-02-29T03:09:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.