Hamming OCR: A Locality Sensitive Hashing Neural Network for Scene Text
Recognition
- URL: http://arxiv.org/abs/2009.10874v1
- Date: Wed, 23 Sep 2020 01:20:19 GMT
- Title: Hamming OCR: A Locality Sensitive Hashing Neural Network for Scene Text
Recognition
- Authors: Bingcong Li, Xin Tang, Xianbiao Qi, Yihao Chen, Rong Xiao
- Abstract summary: Self-attention-based scene text recognition approaches have achieved outstanding performance.
Number of parameters in both classification and embedding layers is independent on the size of vocabulary.
Hamming OCR achieves competitive results.
- Score: 14.250874536962366
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, inspired by Transformer, self-attention-based scene text
recognition approaches have achieved outstanding performance. However, we find
that the size of model expands rapidly with the lexicon increasing.
Specifically, the number of parameters for softmax classification layer and
output embedding layer are proportional to the vocabulary size. It hinders the
development of a lightweight text recognition model especially applied for
Chinese and multiple languages. Thus, we propose a lightweight scene text
recognition model named Hamming OCR. In this model, a novel Hamming classifier,
which adopts locality sensitive hashing (LSH) algorithm to encode each
character, is proposed to replace the softmax regression and the generated LSH
code is directly employed to replace the output embedding. We also present a
simplified transformer decoder to reduce the number of parameters by removing
the feed-forward network and using cross-layer parameter sharing technique.
Compared with traditional methods, the number of parameters in both
classification and embedding layers is independent on the size of vocabulary,
which significantly reduces the storage requirement without loss of accuracy.
Experimental results on several datasets, including four public benchmaks and a
Chinese text dataset synthesized by SynthText with more than 20,000 characters,
shows that Hamming OCR achieves competitive results.
Related papers
- Autoregressive Speech Synthesis without Vector Quantization [135.4776759536272]
We present MELLE, a novel continuous-valued tokens based language modeling approach for text to speech synthesis (TTS)
MELLE autoregressively generates continuous mel-spectrogram frames directly from text condition.
arXiv Detail & Related papers (2024-07-11T14:36:53Z) - VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment [101.2489492032816]
VALL-E R is a robust and efficient zero-shot Text-to-Speech system.
This research has the potential to be applied to meaningful projects, including the creation of speech for those affected by aphasia.
arXiv Detail & Related papers (2024-06-12T04:09:44Z) - DLoRA-TrOCR: Mixed Text Mode Optical Character Recognition Based On Transformer [12.966765239586994]
Multi- fonts, mixed scenes and complex layouts seriously affect the recognition accuracy of traditional OCR models.
We propose a parameter-efficient mixed text recognition method based on pre-trained OCR Transformer, namely DLoRA-TrOCR.
arXiv Detail & Related papers (2024-04-19T09:28:16Z) - A multi-model-based deep learning framework for short text multiclass
classification with the imbalanced and extremely small data set [0.6875312133832077]
This paper proposes a multimodel-based deep learning framework for short-text multiclass classification with an imbalanced and extremely small data set.
It retains the state-of-the-art baseline performance in terms of precision, recall, accuracy, and F1 score.
arXiv Detail & Related papers (2022-06-24T00:51:02Z) - SVTR: Scene Text Recognition with a Single Visual Model [44.26135584093631]
We propose a Single Visual model for Scene Text recognition within the patch-wise image tokenization framework.
The method, termed SVTR, firstly decomposes an image text into small patches named character components.
Experimental results on both English and Chinese scene text recognition tasks demonstrate the effectiveness of SVTR.
arXiv Detail & Related papers (2022-04-30T04:37:01Z) - Lexically Aware Semi-Supervised Learning for OCR Post-Correction [90.54336622024299]
Much of the existing linguistic data in many languages of the world is locked away in non-digitized books and documents.
Previous work has demonstrated the utility of neural post-correction methods on recognition of less-well-resourced languages.
We present a semi-supervised learning method that makes it possible to utilize raw images to improve performance.
arXiv Detail & Related papers (2021-11-04T04:39:02Z) - Dynamic Neural Representational Decoders for High-Resolution Semantic
Segmentation [98.05643473345474]
We propose a novel decoder, termed dynamic neural representational decoder (NRD)
As each location on the encoder's output corresponds to a local patch of the semantic labels, in this work, we represent these local patches of labels with compact neural networks.
This neural representation enables our decoder to leverage the smoothness prior in the semantic label space, and thus makes our decoder more efficient.
arXiv Detail & Related papers (2021-07-30T04:50:56Z) - Rethinking Text Line Recognition Models [57.47147190119394]
We consider two decoder families (Connectionist Temporal Classification and Transformer) and three encoder modules (Bidirectional LSTMs, Self-Attention, and GRCLs)
We compare their accuracy and performance on widely used public datasets of scene and handwritten text.
Unlike the more common Transformer-based models, this architecture can handle inputs of arbitrary length.
arXiv Detail & Related papers (2021-04-15T21:43:13Z) - Offline Handwritten Chinese Text Recognition with Convolutional Neural
Networks [5.984124397831814]
In this paper, we build the models using only the convolutional neural networks and use CTC as the loss function.
We achieve 6.81% character error rate (CER) on the ICDAR 2013 competition set, which is the best published result without language model correction.
arXiv Detail & Related papers (2020-06-28T14:34:38Z) - All Word Embeddings from One Embedding [23.643059189673473]
In neural network-based models for natural language processing, the largest part of the parameters often consists of word embeddings.
In this study, to reduce the total number of parameters, the embeddings for all words are represented by transforming a shared embedding.
The proposed method, ALONE, constructs the embedding of a word by modifying the shared embedding with a filter vector, which is word-specific but non-trainable.
arXiv Detail & Related papers (2020-04-25T07:38:08Z) - Depth-Adaptive Graph Recurrent Network for Text Classification [71.20237659479703]
Sentence-State LSTM (S-LSTM) is a powerful and high efficient graph recurrent network.
We propose a depth-adaptive mechanism for the S-LSTM, which allows the model to learn how many computational steps to conduct for different words as required.
arXiv Detail & Related papers (2020-02-29T03:09:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.