Related papers: Hamming OCR: A Locality Sensitive Hashing Neural Network for Scene Text Recognition

Hamming OCR: A Locality Sensitive Hashing Neural Network for Scene Text Recognition

URL: http://arxiv.org/abs/2009.10874v1
Date: Wed, 23 Sep 2020 01:20:19 GMT
Title: Hamming OCR: A Locality Sensitive Hashing Neural Network for Scene Text Recognition
Authors: Bingcong Li, Xin Tang, Xianbiao Qi, Yihao Chen, Rong Xiao
Abstract summary: Self-attention-based scene text recognition approaches have achieved outstanding performance. Number of parameters in both classification and embedding layers is independent on the size of vocabulary. Hamming OCR achieves competitive results.
Score: 14.250874536962366
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recently, inspired by Transformer, self-attention-based scene text recognition approaches have achieved outstanding performance. However, we find that the size of model expands rapidly with the lexicon increasing. Specifically, the number of parameters for softmax classification layer and output embedding layer are proportional to the vocabulary size. It hinders the development of a lightweight text recognition model especially applied for Chinese and multiple languages. Thus, we propose a lightweight scene text recognition model named Hamming OCR. In this model, a novel Hamming classifier, which adopts locality sensitive hashing (LSH) algorithm to encode each character, is proposed to replace the softmax regression and the generated LSH code is directly employed to replace the output embedding. We also present a simplified transformer decoder to reduce the number of parameters by removing the feed-forward network and using cross-layer parameter sharing technique. Compared with traditional methods, the number of parameters in both classification and embedding layers is independent on the size of vocabulary, which significantly reduces the storage requirement without loss of accuracy. Experimental results on several datasets, including four public benchmaks and a Chinese text dataset synthesized by SynthText with more than 20,000 characters, shows that Hamming OCR achieves competitive results.

Related papers

Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models [92.18057318458528]
Token-Shuffle is a novel method that reduces the number of image tokens in Transformer. Our strategy requires no additional pretrained text-encoder and enables MLLMs to support extremely high-resolution image synthesis. In GenAI-benchmark, our 2.7B model achieves 0.77 overall score on hard prompts, outperforming AR models LlamaGen by 0.18 and diffusion models LDM by 0.15.
arXiv Detail & Related papers (2025-04-24T17:59:56Z)
KAP: MLLM-assisted OCR Text Enhancement for Hybrid Retrieval in Chinese Non-Narrative Documents [0.0]
We propose Knowledge-Aware Preprocessing (KAP), a novel framework that transforms noisy OCR outputs into retrieval-optimized text. KAP adopts a two-stage approach: it first extracts text using OCR, then employs Multimodal Large Language Models to refine the output. Empirical results demonstrate that KAP consistently and significantly outperforms conventional preprocessing approaches.
arXiv Detail & Related papers (2025-03-11T14:01:03Z)
Autoregressive Speech Synthesis without Vector Quantization [135.4776759536272]
We present MELLE, a novel continuous-valued tokens based language modeling approach for text to speech synthesis (TTS) MELLE autoregressively generates continuous mel-spectrogram frames directly from text condition.
arXiv Detail & Related papers (2024-07-11T14:36:53Z)
VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment [101.2489492032816]
VALL-E R is a robust and efficient zero-shot Text-to-Speech system. This research has the potential to be applied to meaningful projects, including the creation of speech for those affected by aphasia.
arXiv Detail & Related papers (2024-06-12T04:09:44Z)
DLoRA-TrOCR: Mixed Text Mode Optical Character Recognition Based On Transformer [12.966765239586994]
Multi- fonts, mixed scenes and complex layouts seriously affect the recognition accuracy of traditional OCR models. We propose a parameter-efficient mixed text recognition method based on pre-trained OCR Transformer, namely DLoRA-TrOCR.
arXiv Detail & Related papers (2024-04-19T09:28:16Z)
A multi-model-based deep learning framework for short text multiclass classification with the imbalanced and extremely small data set [0.6875312133832077]
This paper proposes a multimodel-based deep learning framework for short-text multiclass classification with an imbalanced and extremely small data set. It retains the state-of-the-art baseline performance in terms of precision, recall, accuracy, and F1 score.
arXiv Detail & Related papers (2022-06-24T00:51:02Z)
SVTR: Scene Text Recognition with a Single Visual Model [44.26135584093631]
We propose a Single Visual model for Scene Text recognition within the patch-wise image tokenization framework. The method, termed SVTR, firstly decomposes an image text into small patches named character components. Experimental results on both English and Chinese scene text recognition tasks demonstrate the effectiveness of SVTR.
arXiv Detail & Related papers (2022-04-30T04:37:01Z)
Lexically Aware Semi-Supervised Learning for OCR Post-Correction [90.54336622024299]
Much of the existing linguistic data in many languages of the world is locked away in non-digitized books and documents. Previous work has demonstrated the utility of neural post-correction methods on recognition of less-well-resourced languages. We present a semi-supervised learning method that makes it possible to utilize raw images to improve performance.
arXiv Detail & Related papers (2021-11-04T04:39:02Z)
Dynamic Neural Representational Decoders for High-Resolution Semantic Segmentation [98.05643473345474]
We propose a novel decoder, termed dynamic neural representational decoder (NRD) As each location on the encoder's output corresponds to a local patch of the semantic labels, in this work, we represent these local patches of labels with compact neural networks. This neural representation enables our decoder to leverage the smoothness prior in the semantic label space, and thus makes our decoder more efficient.
arXiv Detail & Related papers (2021-07-30T04:50:56Z)
Rethinking Text Line Recognition Models [57.47147190119394]
We consider two decoder families (Connectionist Temporal Classification and Transformer) and three encoder modules (Bidirectional LSTMs, Self-Attention, and GRCLs) We compare their accuracy and performance on widely used public datasets of scene and handwritten text. Unlike the more common Transformer-based models, this architecture can handle inputs of arbitrary length.
arXiv Detail & Related papers (2021-04-15T21:43:13Z)
Offline Handwritten Chinese Text Recognition with Convolutional Neural Networks [5.984124397831814]
In this paper, we build the models using only the convolutional neural networks and use CTC as the loss function. We achieve 6.81% character error rate (CER) on the ICDAR 2013 competition set, which is the best published result without language model correction.
arXiv Detail & Related papers (2020-06-28T14:34:38Z)
All Word Embeddings from One Embedding [23.643059189673473]
In neural network-based models for natural language processing, the largest part of the parameters often consists of word embeddings. In this study, to reduce the total number of parameters, the embeddings for all words are represented by transforming a shared embedding. The proposed method, ALONE, constructs the embedding of a word by modifying the shared embedding with a filter vector, which is word-specific but non-trainable.
arXiv Detail & Related papers (2020-04-25T07:38:08Z)
Depth-Adaptive Graph Recurrent Network for Text Classification [71.20237659479703]
Sentence-State LSTM (S-LSTM) is a powerful and high efficient graph recurrent network. We propose a depth-adaptive mechanism for the S-LSTM, which allows the model to learn how many computational steps to conduct for different words as required.
arXiv Detail & Related papers (2020-02-29T03:09:55Z)

This list is automatically generated from the titles and abstracts of the papers in this site.