TLGAN: document Text Localization using Generative Adversarial Nets
- URL: http://arxiv.org/abs/2010.11547v1
- Date: Thu, 22 Oct 2020 09:19:13 GMT
- Title: TLGAN: document Text Localization using Generative Adversarial Nets
- Authors: Dongyoung Kim, Myungsung Kwak, Eunji Won, Sejung Shin, Jeongyeon Nam
- Abstract summary: Text localization from a digital image is the first step for optical character recognition.
Deep neural networks are used to perform text localization from digital images.
Trained on only ten labeled receipt images from the Robust Reading Challenge on Scanned Receipts OCR and Information Extraction (SROIE),
TLGAN achieved 99.83% precision and 99.64% recall on the SROIE test data.
- Score: 2.1378501793514277
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Text localization from a digital image is the first step for the optical
character recognition task. Conventional image-processing-based text
localization performs adequately on specific examples, but general text
localization has only been achieved by recent deep-learning-based methods. Here
we present document Text Localization Generative Adversarial Nets (TLGAN),
deep neural networks that perform text localization from digital images.
TLGAN is a versatile and easy-to-train text localization model requiring only a small
amount of data. Trained on only ten labeled receipt images from the Robust Reading
Challenge on Scanned Receipts OCR and Information Extraction (SROIE), TLGAN
achieved 99.83% precision and 99.64% recall on the SROIE test data. TLGAN is a
practical text localization solution requiring minimal effort for data labeling
and model training while producing state-of-the-art performance.
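The precision and recall figures above are typically computed by matching predicted text boxes against ground-truth boxes. The following is a minimal sketch of that evaluation, assuming axis-aligned boxes and the common 0.5 IoU matching threshold; the box format and threshold are standard detection-benchmark conventions, not details taken from the TLGAN paper itself.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def precision_recall(pred_boxes, gt_boxes, thresh=0.5):
    """Greedily match each prediction to an unmatched ground-truth box."""
    matched = set()
    tp = 0
    for p in pred_boxes:
        best, best_iou = None, thresh
        for i, g in enumerate(gt_boxes):
            if i in matched:
                continue
            v = iou(p, g)
            if v >= best_iou:
                best, best_iou = i, v
        if best is not None:
            matched.add(best)
            tp += 1
    precision = tp / len(pred_boxes) if pred_boxes else 0.0
    recall = tp / len(gt_boxes) if gt_boxes else 0.0
    return precision, recall
```

A prediction counts as a true positive only if it overlaps an unmatched ground-truth box above the threshold, so duplicate detections of the same box reduce precision without raising recall.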
Related papers
- Efficiently Leveraging Linguistic Priors for Scene Text Spotting [63.22351047545888]
This paper proposes a method that leverages linguistic knowledge from a large text corpus to replace the traditional one-hot encoding used in auto-regressive scene text spotting and recognition models.
We generate text distributions that align well with scene text datasets, removing the need for in-domain fine-tuning.
Experimental results show that our method not only improves recognition accuracy but also enables more accurate localization of words.
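The replacement of one-hot targets with corpus-derived distributions can be sketched as a simple mixing step; the vocabulary, counts, and mixing weight below are illustrative assumptions in the spirit of the linguistic-prior idea, not the paper's actual procedure.

```python
def one_hot(index, size):
    """A standard one-hot target vector."""
    v = [0.0] * size
    v[index] = 1.0
    return v

def prior_smoothed_target(index, prior_counts, alpha=0.9):
    """Mix a one-hot target with a normalized corpus-frequency prior.

    prior_counts: hypothetical per-symbol occurrence counts from a text corpus.
    alpha: weight kept on the true label; (1 - alpha) is spread by the prior.
    """
    total = sum(prior_counts)
    prior = [c / total for c in prior_counts]
    hot = one_hot(index, len(prior_counts))
    return [alpha * h + (1 - alpha) * p for h, p in zip(hot, prior)]
```

The resulting vector still sums to one and can be used as a soft target wherever the one-hot encoding was used before.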
arXiv Detail & Related papers (2024-02-27T01:57:09Z)
- Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using Diffusion Models [63.99110667987318]
We present DiffText, a pipeline that seamlessly blends foreground text with the background's intrinsic features.
With fewer text instances, our produced text images consistently surpass other synthetic data in aiding text detectors.
arXiv Detail & Related papers (2023-11-28T06:51:28Z)
- Towards Detecting, Recognizing, and Parsing the Address Information from Bangla Signboard: A Deep Learning-based Approach [1.3778851745408136]
We have proposed an end-to-end system with deep learning-based models for detecting, recognizing, correcting, and parsing information from Bangla signboards.
We have created manually annotated and synthetic datasets to train signboard detection, address text detection, address text recognition, and address text parsing models.
Finally, we have developed a Bangla address text parser using a state-of-the-art transformer-based pre-trained language model.
arXiv Detail & Related papers (2023-11-22T08:25:15Z)
- Orientation-Independent Chinese Text Recognition in Scene Images [61.34060587461462]
We take the first attempt to extract orientation-independent visual features by disentangling content and orientation information of text images.
Specifically, we introduce a Character Image Reconstruction Network (CIRN) to recover corresponding printed character images with disentangled content and orientation information.
arXiv Detail & Related papers (2023-09-03T05:30:21Z)
- Language Guided Local Infiltration for Interactive Image Retrieval [12.324893780690918]
Interactive Image Retrieval (IIR) aims to retrieve images that are generally similar to a reference image but modified according to a requested text description.
We propose a Language Guided Local Infiltration (LGLI) system, which fully utilizes the text information and penetrates text features into image features.
Our method outperforms most state-of-the-art IIR approaches.
arXiv Detail & Related papers (2023-04-16T10:33:08Z)
- Geometric Perception based Efficient Text Recognition [0.0]
In real-world applications with fixed camera positions, the underlying data tends to be regular scene text.
This paper introduces the underlying concepts, theory, implementation, and experiment results to develop specialized models.
We introduce a novel deep learning architecture (GeoTRNet), trained to identify digits in a regular scene image, only using the geometrical features present.
arXiv Detail & Related papers (2023-02-08T04:19:24Z)
- SpaText: Spatio-Textual Representation for Controllable Image Generation [61.89548017729586]
SpaText is a new method for text-to-image generation using open-vocabulary scene control.
In addition to a global text prompt that describes the entire scene, the user provides a segmentation map.
We show its effectiveness on two state-of-the-art diffusion models: pixel-based and latent-conditional-based.
arXiv Detail & Related papers (2022-11-25T18:59:10Z)
- A Scene-Text Synthesis Engine Achieved Through Learning from Decomposed Real-World Data [4.096453902709292]
Scene-text image synthesis techniques aim to naturally compose text instances on background scene images.
We propose a Learning-Based Text Synthesis engine (LBTS) that includes a text location proposal network (TLPNet) and a text appearance adaptation network (TAANet).
After training, those networks can be integrated and utilized to generate the synthetic dataset for scene text analysis tasks.
arXiv Detail & Related papers (2022-09-06T11:15:58Z)
- Language Matters: A Weakly Supervised Pre-training Approach for Scene Text Detection and Spotting [69.77701325270047]
This paper presents a weakly supervised pre-training method that can acquire effective scene text representations.
Our network consists of an image encoder and a character-aware text encoder that extract visual and textual features.
Experiments show that our pre-trained model improves F-score by +2.5% and +4.8% when transferring its weights to other text detection and spotting networks.
arXiv Detail & Related papers (2022-03-08T08:10:45Z)
- Scene Text Image Super-Resolution in the Wild [112.90416737357141]
Low-resolution text images are often seen in natural scenes such as documents captured by mobile phones.
Previous single image super-resolution (SISR) methods are trained on synthetic low-resolution images.
We propose a real scene-text SR dataset, termed TextZoom.
It contains paired real low-resolution and high-resolution images captured by cameras with different focal lengths in the wild.
arXiv Detail & Related papers (2020-05-07T09:18:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.