DUET: Detection Utilizing Enhancement for Text in Scanned or Captured Documents
- URL: http://arxiv.org/abs/2106.05542v1
- Date: Thu, 10 Jun 2021 07:08:31 GMT
- Authors: Eun-Soo Jung, HyeongGwan Son, Kyusam Oh, Yongkeun Yun, Soonhwan Kwon, Min Soo Kim
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel deep neural model for text detection in document images.
For robust text detection in noisy scanned documents, the model adopts
multi-task learning by adding an auxiliary text enhancement task. Namely, the
proposed model performs noise reduction and text region enhancement in addition
to text detection. Moreover, we enrich the training data with synthesized
document images that are fully labeled for both text detection and enhancement,
thus overcoming the scarcity of labeled document image data. To exploit the
synthetic and real data effectively, training is separated into two phases. The
first phase trains on synthetic data only, in a fully supervised manner. In the
second phase, real data with only detection labels are added, and the
enhancement task for the real data is weakly supervised using information from
their detection labels. Our methods are evaluated on a real document dataset,
where they outperform other text detection methods. Ablation studies further
confirm the effectiveness of the synthetic data, the auxiliary task, and the
weak supervision. Whereas existing text detection studies mostly focus on text
in natural scenes, our proposed method is optimized for text in scanned
documents.
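The abstract describes the training recipe (an auxiliary enhancement task, two training phases, and weak supervision of enhancement on real data) but not the exact losses. The Python sketch below shows one way such a recipe could be wired up; it is not the DUET authors' code. All names (`Sample`, `box_mask`, `enhancement_loss`, `phase_batches`, `total_loss`, the weight `lam`) are hypothetical, images are assumed to be grayscale in [0, 1] with 1.0 as clean white paper, and the weak-supervision rule (push pixels outside every detection box toward white, leave box interiors unsupervised) is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Image = List[List[float]]        # grayscale, values in [0, 1], 1.0 = white paper (assumed)
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), half-open pixel ranges (assumed)


@dataclass
class Sample:
    image: Image                       # network input
    det_boxes: List[Box]               # detection labels (synthetic and real data)
    enh_target: Optional[Image] = None  # clean target: synthetic data only


def box_mask(h: int, w: int, boxes: List[Box]) -> List[List[bool]]:
    """Rasterize detection boxes into a per-pixel text-region mask."""
    mask = [[False] * w for _ in range(h)]
    for x0, y0, x1, y1 in boxes:
        for y in range(max(0, y0), min(h, y1)):
            for x in range(max(0, x0), min(w, x1)):
                mask[y][x] = True
    return mask


def enhancement_loss(pred: Image, sample: Sample) -> float:
    """Mean squared error for the auxiliary enhancement task.

    Full supervision (synthetic): every pixel is compared with the clean target.
    Weak supervision (real, ASSUMED rule): pixels outside all detection boxes
    are pushed toward clean white (1.0); pixels inside boxes are ignored
    because their clean value is unknown.
    """
    h, w = len(pred), len(pred[0])
    if sample.enh_target is not None:
        errs = [(pred[y][x] - sample.enh_target[y][x]) ** 2
                for y in range(h) for x in range(w)]
    else:
        mask = box_mask(h, w, sample.det_boxes)
        errs = [(pred[y][x] - 1.0) ** 2
                for y in range(h) for x in range(w) if not mask[y][x]]
    return sum(errs) / len(errs) if errs else 0.0


def phase_batches(synthetic: list, real: list, phase: int) -> list:
    """Phase 1 trains on synthetic data only; phase 2 adds the real data."""
    if phase == 1:
        return list(synthetic)
    if phase == 2:
        return list(synthetic) + list(real)
    raise ValueError("phase must be 1 or 2")


def total_loss(det_loss: float, enh_loss: float, lam: float = 1.0) -> float:
    """Multi-task objective: detection loss plus weighted enhancement loss."""
    return det_loss + lam * enh_loss
```

Ignoring box interiors keeps the weak target from claiming a clean value for text pixels it cannot know, while the background constraint still rewards noise removal. A real model would compute these losses on tensors, with a separate detection loss supplying `det_loss`.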
Related papers
- Efficiently Leveraging Linguistic Priors for Scene Text Spotting
This paper proposes a method that leverages linguistic knowledge from a large text corpus to replace the traditional one-hot encoding used in auto-regressive scene text spotting and recognition models.
We generate text distributions that align well with scene text datasets, removing the need for in-domain fine-tuning.
Experimental results show that our method not only improves recognition accuracy but also enables more accurate localization of words.
Date: 2024-02-27
- Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using Diffusion Models
We present DiffText, a pipeline that seamlessly blends foreground text with the background's intrinsic features.
With fewer text instances, the text images produced by DiffText consistently surpass other synthetic data in aiding text detectors.
Date: 2023-11-28
- Reading and Writing: Discriminative and Generative Modeling for Self-Supervised Text Recognition
We introduce contrastive learning and masked image modeling to learn discrimination and generation of text images.
Our method outperforms previous self-supervised text recognition methods by 10.2%-20.2% on irregular scene text recognition datasets.
Our proposed text recognizer exceeds previous state-of-the-art text recognition methods by 5.3% on average across 11 benchmarks, with a similar model size.
Date: 2022-07-01
- Towards End-to-End Unified Scene Text Detection and Layout Analysis
We introduce the task of unified scene text detection and layout analysis.
The first hierarchical scene text dataset is introduced to enable this novel research task.
We also propose a novel method that is able to simultaneously detect scene text and form text clusters in a unified way.
Date: 2022-03-28
- Text-DIAE: Degradation Invariant Autoencoders for Text Recognition and Document Enhancement
Text-DIAE aims to solve two tasks: text recognition (handwritten or scene text) and document image enhancement.
We define three pretext tasks as learning objectives to be optimized during pre-training without using labeled data.
Our method significantly surpasses the state of the art in existing supervised and self-supervised settings.
Date: 2022-03-09
- Language Matters: A Weakly Supervised Pre-training Approach for Scene Text Detection and Spotting
This paper presents a weakly supervised pre-training method that can acquire effective scene text representations.
Our network consists of an image encoder and a character-aware text encoder that extract visual and textual features.
Experiments show that our pre-trained model improves F-score by +2.5% and +4.8% when its weights are transferred to other text detection and spotting networks.
Date: 2022-03-08
- Stroke-Based Scene Text Erasing Using Synthetic Data
Scene text erasing can replace text regions with reasonable content in natural images.
The lack of a large-scale real-world scene text removal dataset prevents existing methods from working at full strength.
We improve the synthetic text engine, make full use of the synthetic text, and train our model only on the dataset generated by the improved engine.
This model can partially erase text instances in a scene image with a bounding box provided or work with an existing scene text detector for automatic scene text erasing.
Date: 2021-04-23
- Scene text removal via cascaded text stroke detection and erasing
Recent learning-based approaches show promising performance improvements on the scene text removal task.
We propose a novel "end-to-end" framework based on accurate text stroke detection.
Date: 2020-11-19
- TRIE: End-to-End Text Reading and Information Extraction for Document Understanding
We propose a unified end-to-end text reading and information extraction network.
Multimodal visual and textual features from text reading are fused for information extraction.
Our proposed method significantly outperforms the state-of-the-art methods in both efficiency and accuracy.
Date: 2020-05-27