Towards Detecting, Recognizing, and Parsing the Address Information from
Bangla Signboard: A Deep Learning-based Approach
- URL: http://arxiv.org/abs/2311.13222v1
- Date: Wed, 22 Nov 2023 08:25:15 GMT
- Title: Towards Detecting, Recognizing, and Parsing the Address Information from
Bangla Signboard: A Deep Learning-based Approach
- Authors: Hasan Murad, Mohammed Eunus Ali
- Abstract summary: We have proposed an end-to-end system with deep learning-based models for detecting, recognizing, correcting, and parsing address information from Bangla signboards.
We have created manually annotated and synthetic datasets to train signboard detection, address text detection, address text recognition, address text correction, and address text parser models.
Finally, we have developed a Bangla address text parser using a state-of-the-art transformer-based pre-trained language model.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Retrieving textual information from natural scene images is an active
research area in the field of computer vision with numerous practical
applications. Detecting text regions and extracting text from signboards is a
challenging problem due to characteristics of real-life natural scene images
such as reflected light, uneven illumination, and shadows. With
the advent of deep learning-based methods, different sophisticated techniques
have been proposed for text detection and text recognition in natural scene
images. Though a significant amount of effort has been devoted to extracting
natural scene text for resource-rich languages like English, little has been done
for low-resource languages like Bangla. In this research work, we have proposed
an end-to-end system with deep learning-based models for efficiently detecting,
recognizing, correcting, and parsing address information from Bangla
signboards. We have created manually annotated datasets and synthetic datasets
to train signboard detection, address text detection, address text recognition,
address text correction, and address text parser models. We have conducted a
comparative study among different CTC-based and Encoder-Decoder model
architectures for Bangla address text recognition. Moreover, we have designed a
novel address text correction model using a sequence-to-sequence
transformer-based network to improve the performance of the Bangla address
text recognition model through post-correction. Finally, we have developed a Bangla
address text parser using a state-of-the-art transformer-based pre-trained
language model.
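The four stages the abstract names can be laid out as a minimal pipeline skeleton. This is a hypothetical sketch for illustration only: every stage body below is a toy placeholder (edit-distance snapping in place of the seq2seq corrector, positional tagging in place of the transformer parser), not the authors' actual models.

```python
import difflib

# Hypothetical skeleton of the four-stage pipeline the abstract describes:
# detection -> recognition -> post-correction -> address parsing.
# Every stage body is a toy placeholder, not the paper's trained models.

def detect_address_text(image):
    # A real system would localize the signboard and its address text with
    # detection networks; here the "image" is already the text crop.
    return image

def recognize_text(crop):
    # Stand-in for a CTC-based or encoder-decoder text recognizer.
    return crop

def correct_text(text, lexicon):
    # Toy post-correction: snap each token to its closest lexicon entry,
    # loosely imitating the role of the seq2seq transformer corrector.
    def snap(token):
        hits = difflib.get_close_matches(token, lexicon, n=1, cutoff=0.6)
        return hits[0] if hits else token
    return " ".join(snap(t) for t in text.split())

def parse_address(text):
    # Toy parser: tag tokens by position; the paper instead uses a
    # pre-trained transformer-based language model for this step.
    labels = ("ROAD", "AREA", "CITY")
    return dict(zip(labels, text.split()))

def run_pipeline(image, lexicon):
    crop = detect_address_text(image)
    raw = recognize_text(crop)
    corrected = correct_text(raw, lexicon)
    return parse_address(corrected)

if __name__ == "__main__":
    # "Raod" simulates a recognition error that the corrector fixes.
    print(run_pipeline("Raod Mirpur Dhaka", ["Road", "Mirpur", "Dhaka"]))
```

In the actual system, each placeholder would be replaced by the corresponding trained model, with the correction stage sitting between recognition and parsing exactly as in this skeleton.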
Related papers
- The First Swahili Language Scene Text Detection and Recognition Dataset [55.83178123785643]
There is a significant gap in scene text resources for low-resource languages, especially Swahili.
Swahili is widely spoken in East African countries but is still an under-explored language in scene text recognition.
We propose a comprehensive dataset of Swahili scene text images and evaluate the dataset on different scene text detection and recognition models.
arXiv Detail & Related papers (2024-05-19T03:55:02Z)
- Efficiently Leveraging Linguistic Priors for Scene Text Spotting [63.22351047545888]
This paper proposes a method that leverages linguistic knowledge from a large text corpus to replace the traditional one-hot encoding used in auto-regressive scene text spotting and recognition models.
We generate text distributions that align well with scene text datasets, removing the need for in-domain fine-tuning.
Experimental results show that our method not only improves recognition accuracy but also enables more accurate localization of words.
arXiv Detail & Related papers (2024-02-27T01:57:09Z)
- Research on Multilingual Natural Scene Text Detection Algorithm [4.514028820667202]
We propose a multilingual text detection model to address the issues of low accuracy and high difficulty in detecting multilingual text in natural scenes.
We introduce the SFM Swin Transformer feature extraction network to enhance the model's robustness in detecting characters and fonts across different languages.
In addition, we propose a Global Semantic Branch that extracts and preserves global features for more effective text detection.
arXiv Detail & Related papers (2023-12-18T12:46:35Z)
- Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using Diffusion Models [63.99110667987318]
We present DiffText, a pipeline that seamlessly blends foreground text with the background's intrinsic features.
With fewer text instances, our produced text images consistently surpass other synthetic data in aiding text detectors.
arXiv Detail & Related papers (2023-11-28T06:51:28Z)
- Orientation-Independent Chinese Text Recognition in Scene Images [61.34060587461462]
We make the first attempt to extract orientation-independent visual features by disentangling the content and orientation information of text images.
Specifically, we introduce a Character Image Reconstruction Network (CIRN) to recover corresponding printed character images with disentangled content and orientation information.
arXiv Detail & Related papers (2023-09-03T05:30:21Z)
- Weakly Supervised Scene Text Generation for Low-resource Languages [19.243705770491577]
A large number of annotated training images are crucial for training successful scene text recognition models.
Existing scene text generation methods typically rely on a large amount of paired data, which is difficult to obtain for low-resource languages.
We propose a novel weakly supervised scene text generation method that leverages a few recognition-level labels as weak supervision.
arXiv Detail & Related papers (2023-06-25T15:26:06Z)
- A Scene-Text Synthesis Engine Achieved Through Learning from Decomposed Real-World Data [4.096453902709292]
Scene-text image synthesis techniques aim to naturally compose text instances on background scene images.
We propose a Learning-Based Text Synthesis engine (LBTS) that includes a text location proposal network (TLPNet) and a text appearance adaptation network (TAANet).
After training, those networks can be integrated and utilized to generate the synthetic dataset for scene text analysis tasks.
arXiv Detail & Related papers (2022-09-06T11:15:58Z)
- MaskOCR: Text Recognition with Masked Encoder-Decoder Pretraining [68.05105411320842]
We propose a novel approach MaskOCR to unify vision and language pre-training in the classical encoder-decoder recognition framework.
We adopt the masked image modeling approach to pre-train the feature encoder using a large set of unlabeled real text images.
We transform text data into synthesized text images to unify the data modalities of vision and language, and enhance the language modeling capability of the sequence decoder.
arXiv Detail & Related papers (2022-06-01T08:27:19Z)
- Vision-Language Pre-Training for Boosting Scene Text Detectors [57.08046351495244]
We specifically adapt vision-language joint learning for scene text detection.
We propose to learn contextualized, joint representations through vision-language pre-training.
The pre-trained model is able to produce more informative representations with richer semantics.
arXiv Detail & Related papers (2022-04-29T03:53:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.