Text-DIAE: Degradation Invariant Autoencoders for Text Recognition and
Document Enhancement
- URL: http://arxiv.org/abs/2203.04814v2
- Date: Thu, 10 Mar 2022 17:39:02 GMT
- Title: Text-DIAE: Degradation Invariant Autoencoders for Text Recognition and
Document Enhancement
- Authors: Mohamed Ali Souibgui, Sanket Biswas, Andres Mafla, Ali Furkan Biten,
Alicia Fornés, Yousri Kessentini, Josep Lladós, Lluis Gomez, Dimosthenis
Karatzas
- Abstract summary: Text-DIAE aims to solve two tasks: text recognition (handwritten or scene text) and document image enhancement.
We define three pretext tasks as learning objectives to be optimized during pre-training without labelled data.
Our method significantly surpasses the state of the art in existing supervised and self-supervised settings.
- Score: 8.428866479825736
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we propose the Text-Degradation Invariant Auto Encoder
(Text-DIAE), aimed at solving two tasks: text recognition (handwritten or scene
text) and document image enhancement. We define three pretext tasks as learning
objectives to be optimized during pre-training without the use of labelled
data. Each pretext objective is specifically tailored to the final downstream
tasks. We conduct several ablation experiments that show the importance of each
degradation for a specific domain. Exhaustive experimentation shows that our
method does not share the limitations of previous state-of-the-art approaches
based on contrastive losses, while requiring substantially fewer data samples
to converge. Finally, we demonstrate that our method significantly surpasses
the state of the art in existing supervised and self-supervised settings for
handwritten and scene text recognition and document image enhancement. Our code
and trained models will be made publicly available at http://Upon_Acceptance.
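The abstract does not enumerate the three pretext tasks here, but the general recipe can be sketched: degrade an unlabelled image, encode it, and train the autoencoder to reconstruct the clean original, so the learned representation becomes invariant to the degradation. Below is a minimal sketch assuming masking, blurring, and additive noise as stand-in degradations and a toy encoder/decoder rather than the paper's architecture; summing the three reconstruction losses is only one possible way to combine the objectives.

```python
# Hedged sketch of degradation-invariant pre-training in the spirit of Text-DIAE.
# The three degradations (masking, blur, noise) and the tiny autoencoder are
# illustrative stand-ins, not the paper's actual pretext tasks or architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

def degrade(x: torch.Tensor, kind: str) -> torch.Tensor:
    """Apply one self-supervised degradation to a batch of images (B, C, H, W)."""
    if kind == "mask":    # randomly zero out pixels
        return x * (torch.rand_like(x[:, :1]) > 0.3).float()
    if kind == "blur":    # cheap blur: average-pool then upsample back
        return F.interpolate(F.avg_pool2d(x, 4), size=x.shape[-2:],
                             mode="bilinear", align_corners=False)
    if kind == "noise":   # additive Gaussian noise
        return (x + 0.2 * torch.randn_like(x)).clamp(0, 1)
    raise ValueError(kind)

class TinyAutoencoder(nn.Module):
    def __init__(self, ch: int = 1):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(ch, 32, 3, 2, 1), nn.ReLU(),
                                 nn.Conv2d(32, 64, 3, 2, 1), nn.ReLU())
        self.dec = nn.Sequential(nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),
                                 nn.ConvTranspose2d(32, ch, 4, 2, 1), nn.Sigmoid())

    def forward(self, x):
        return self.dec(self.enc(x))

# One pre-training step: reconstruct the clean image from each degraded view.
model = TinyAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
clean = torch.rand(8, 1, 64, 64)  # stand-in for unlabelled document/word crops
loss = sum(F.l1_loss(model(degrade(clean, k)), clean)
           for k in ("mask", "blur", "noise"))
opt.zero_grad()
loss.backward()
opt.step()
```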
Related papers
- TextDestroyer: A Training- and Annotation-Free Diffusion Method for Destroying Anomal Text from Images [84.08181780666698]
TextDestroyer is the first training- and annotation-free method for scene text destruction.
Our method scrambles text areas in the latent start code using a Gaussian distribution before reconstruction.
The advantages of TextDestroyer include: (1) it eliminates labor-intensive data annotation and resource-intensive training; (2) it achieves more thorough text destruction, preventing recognizable traces; and (3) it demonstrates better generalization capabilities, performing well on both real-world scenes and generated images.
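A minimal sketch of the latent-scrambling step described above, assuming a latent start code and a binary text-region mask are already available; where the mask comes from and how the diffusion model then reconstructs the image are outside this snippet.

```python
# Hedged sketch: replace latent values under a text-region mask with Gaussian
# samples before reconstruction. Latent shape, mask source, and the subsequent
# denoising/decoding step are assumptions, not the paper's code.
import torch

def scramble_text_latents(latent: torch.Tensor, text_mask: torch.Tensor) -> torch.Tensor:
    """latent: (B, C, h, w) start code; text_mask: (B, 1, h, w), 1 over text regions."""
    noise = torch.randn_like(latent)  # N(0, 1), matching the usual diffusion prior
    return latent * (1 - text_mask) + noise * text_mask

latent = torch.randn(1, 4, 64, 64)               # e.g. a latent-diffusion start code
mask = torch.zeros(1, 1, 64, 64)
mask[..., 20:30, 10:50] = 1.0                    # hypothetical text box
scrambled = scramble_text_latents(latent, mask)  # then fed to the denoiser/decoder
```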
arXiv Detail & Related papers (2024-11-01T04:41:00Z)
- Leveraging Structure Knowledge and Deep Models for the Detection of Abnormal Handwritten Text [19.05500901000957]
We propose a two-stage detection algorithm that combines structure knowledge and deep models for handwritten text.
A shape regression network trained with a novel semi-supervised contrastive training strategy is introduced, and the positional relationship between characters is fully exploited.
Experiments on two handwritten text datasets show that the proposed method can greatly improve the detection performance.
arXiv Detail & Related papers (2024-10-15T14:57:10Z)
- Efficiently Leveraging Linguistic Priors for Scene Text Spotting [63.22351047545888]
This paper proposes a method that leverages linguistic knowledge from a large text corpus to replace the traditional one-hot encoding used in auto-regressive scene text spotting and recognition models.
We generate text distributions that align well with scene text datasets, removing the need for in-domain fine-tuning.
Experimental results show that our method not only improves recognition accuracy but also enables more accurate localization of words.
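The summary describes replacing one-hot targets with text distributions derived from a large corpus. A hedged toy sketch of that idea follows, using a character-bigram model and a fixed mixing weight as illustrative assumptions rather than the paper's actual recipe.

```python
# Hedged sketch: build soft next-character targets from a (toy) corpus and use
# them in place of one-hot labels when training an auto-regressive recognizer.
import torch
import torch.nn.functional as F

vocab = list("abcdefghijklmnopqrstuvwxyz ")
idx = {c: i for i, c in enumerate(vocab)}
corpus = "the quick brown fox jumps over the lazy dog "

# Character-bigram counts -> smoothed P(next char | current char).
counts = torch.ones(len(vocab), len(vocab))
for a, b in zip(corpus, corpus[1:]):
    counts[idx[a], idx[b]] += 1
bigram = counts / counts.sum(dim=1, keepdim=True)

def soft_target(prev_char: str, gold_char: str, alpha: float = 0.9) -> torch.Tensor:
    """Mix the one-hot gold label with the corpus-derived distribution."""
    one_hot = F.one_hot(torch.tensor(idx[gold_char]), len(vocab)).float()
    return alpha * one_hot + (1 - alpha) * bigram[idx[prev_char]]

logits = torch.randn(len(vocab))             # stand-in for one decoder step's logits
target = soft_target(prev_char="t", gold_char="h")
loss = -(target * F.log_softmax(logits, dim=-1)).sum()  # soft-label cross-entropy
```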
arXiv Detail & Related papers (2024-02-27T01:57:09Z)
- Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using Diffusion Models [63.99110667987318]
We present DiffText, a pipeline that seamlessly blends foreground text with the background's intrinsic features.
With fewer text instances, our produced text images consistently surpass other synthetic data in aiding text detectors.
arXiv Detail & Related papers (2023-11-28T06:51:28Z)
- Handwritten and Printed Text Segmentation: A Signature Case Study [0.0]
We develop novel approaches to address the challenges of handwritten and printed text segmentation.
Our objective is to recover text from different classes in their entirety, especially enhancing the segmentation performance on overlapping sections.
Our best configuration outperforms prior work on two different datasets by 17.9% and 7.3% on IoU scores.
arXiv Detail & Related papers (2023-07-15T21:49:22Z)
- Copy Is All You Need [66.00852205068327]
We formulate text generation as progressively copying text segments from an existing text collection.
Our approach achieves better generation quality according to both automatic and human evaluations.
Our approach attains additional performance gains by simply scaling up to larger text collections.
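A toy sketch of the copy-based generation loop described above, using greedy string matching over a tiny collection in place of the dense segment retrieval a real system would use.

```python
# Hedged toy sketch: at each step, find a collection document whose words contain
# the current suffix token and copy the next few words from it. Real systems
# retrieve segments with learned phrase embeddings over large corpora.
collection = [
    "text recognition models benefit from large scale pre-training",
    "document image enhancement removes noise and degradation",
    "pre-training without labels uses pretext tasks",
]

def next_segment(prefix: str, max_len: int = 4) -> str:
    """Return up to max_len words that follow the prefix's last word in the collection."""
    last = prefix.split()[-1] if prefix.split() else ""
    for doc in collection:
        words = doc.split()
        if last in words:
            i = words.index(last)
            return " ".join(words[i + 1:i + 1 + max_len])
    return ""

prefix = "document image"
for _ in range(3):                 # progressively copy segments
    segment = next_segment(prefix)
    if not segment:
        break
    prefix += " " + segment
print(prefix)  # "document image enhancement removes noise and degradation"
```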
arXiv Detail & Related papers (2023-07-13T05:03:26Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
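A hedged sketch of the shared-encoder, multi-branch layout the summary describes; module sizes are placeholders and the query mixing is a naive stand-in for the Transformer-based, query-driven decoding of the actual model.

```python
# Hedged sketch: one image encoder feeds classification, segmentation, and
# recognition heads, so all branches train the same shared features.
import torch
import torch.nn as nn

class MultiTaskSpotter(nn.Module):
    def __init__(self, vocab_size: int = 97, num_queries: int = 25):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(3, 64, 3, 2, 1), nn.ReLU(),
                                     nn.Conv2d(64, 128, 3, 2, 1), nn.ReLU())
        self.queries = nn.Parameter(torch.randn(num_queries, 128))
        self.cls_head = nn.Linear(128, 2)           # text / no-text per query
        self.seg_head = nn.Conv2d(128, 1, 1)        # text-region mask
        self.rec_head = nn.Linear(128, vocab_size)  # per-query character logits

    def forward(self, images):
        feats = self.encoder(images)                          # (B, 128, H/4, W/4)
        context = feats.flatten(2).mean(-1)                   # (B, 128) global context
        q = self.queries.unsqueeze(0) + context.unsqueeze(1)  # (B, Q, 128), naive mixing
        return self.cls_head(q), self.seg_head(feats), self.rec_head(q)

model = MultiTaskSpotter()
cls_logits, seg_logits, rec_logits = model(torch.rand(2, 3, 128, 128))
# A joint objective would sum the three branch losses so gradients share the encoder.
```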
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- Progressive Scene Text Erasing with Self-Supervision [7.118419154170154]
Scene text erasing seeks to erase text contents from scene images.
Current state-of-the-art text erasing models are trained on large-scale synthetic data.
We employ self-supervision for feature representation on unlabeled real-world scene text images.
arXiv Detail & Related papers (2022-07-23T09:05:13Z)
- Vision-Language Pre-Training for Boosting Scene Text Detectors [57.08046351495244]
We specifically adapt vision-language joint learning for scene text detection.
We propose to learn contextualized, joint representations through vision-language pre-training.
The pre-trained model is able to produce more informative representations with richer semantics.
arXiv Detail & Related papers (2022-04-29T03:53:54Z)
- DUET: Detection Utilizing Enhancement for Text in Scanned or Captured Documents [1.4866448722906016]
Our proposed model is designed to perform noise reduction and text region enhancement as well as text detection.
We enrich the training data for the model with synthesized document images that are fully labeled for text detection and enhancement.
Our methods are demonstrated in a real document dataset with performances exceeding those of other text detection methods.
arXiv Detail & Related papers (2021-06-10T07:08:31Z)
- Scene text removal via cascaded text stroke detection and erasing [19.306751704904705]
Recent learning-based approaches show promising performance improvements on the scene text removal task.
We propose a novel end-to-end framework based on accurate text stroke detection.
arXiv Detail & Related papers (2020-11-19T11:05:13Z)
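A hedged sketch of a cascaded detect-then-erase loop consistent with the title and summary above, using toy networks in place of the paper's stroke-detection and erasing models.

```python
# Hedged sketch: a stroke-mask network predicts where text strokes are, an erasing
# network fills those pixels, and repeating the pair refines the result.
import torch
import torch.nn as nn

stroke_net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                           nn.Conv2d(16, 1, 1), nn.Sigmoid())   # per-pixel stroke prob
erase_net = nn.Sequential(nn.Conv2d(4, 16, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(16, 3, 3, padding=1), nn.Sigmoid())

def cascade_erase(image: torch.Tensor, stages: int = 2) -> torch.Tensor:
    """image: (B, 3, H, W) in [0, 1]; returns an image with stroke pixels re-synthesized."""
    out = image
    for _ in range(stages):
        mask = stroke_net(out)                             # (B, 1, H, W) stroke mask
        filled = erase_net(torch.cat([out, mask], dim=1))  # predicted replacement content
        out = out * (1 - mask) + filled * mask             # only touch detected strokes
    return out

result = cascade_erase(torch.rand(1, 3, 64, 64))
```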