UNITS: Unsupervised Intermediate Training Stage for Scene Text Detection
- URL: http://arxiv.org/abs/2205.04683v1
- Date: Tue, 10 May 2022 05:34:58 GMT
- Title: UNITS: Unsupervised Intermediate Training Stage for Scene Text Detection
- Authors: Youhui Guo, Yu Zhou, Xugong Qin, Enze Xie, Weiping Wang
- Abstract summary: We propose a new training paradigm for scene text detection, which introduces an UNsupervised Intermediate Training Stage (UNITS).
UNITS builds a buffer path to real-world data and alleviates the gap between the pre-training and fine-tuning stages.
Three training strategies are further explored to extract information from real-world data in an unsupervised way.
- Score: 16.925048424113463
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent scene text detection methods are almost all based on deep learning and
are data-driven. Synthetic data is commonly adopted for pre-training because
annotating real data is expensive. However, there are obvious domain
discrepancies between synthetic data and real-world data, so directly adopting
a model initialized on synthetic data in the fine-tuning stage may lead to
sub-optimal performance. In this paper, we propose a new training paradigm for
scene text detection, which introduces an \textbf{UN}supervised
\textbf{I}ntermediate \textbf{T}raining \textbf{S}tage (UNITS) that builds a
buffer path to real-world data and alleviates the gap between the pre-training
and fine-tuning stages. Three training strategies are further explored to
extract information from real-world data in an unsupervised way. With UNITS,
scene text detectors are improved without introducing any extra parameters or
computation during inference. Extensive experimental results show consistent
performance improvements on three public datasets.
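The paradigm above runs three stages over one set of detector weights: supervised pre-training on synthetic data, an unsupervised intermediate stage on unlabeled real images, and supervised fine-tuning on labeled real data. The sketch below illustrates only that schedule; the tiny linear "detector", the random tensors, and the consistency objective standing in for the intermediate stage are placeholder assumptions, since the abstract does not spell out the three unsupervised strategies.

```python
# A minimal, runnable sketch of the three-stage schedule, assuming PyTorch.
# The toy model, data, and the consistency loss used for the unsupervised
# intermediate stage are illustrative placeholders, not the paper's method.
import torch
import torch.nn as nn
import torch.nn.functional as F

def train_stage(model, batches, loss_fn, lr=1e-3, epochs=1):
    """Generic loop reused by all three stages; the same weights flow through,
    so no parameters or computation are added at inference time."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for batch in batches:
            loss = loss_fn(model, batch)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model

def supervised_loss(model, batch):
    x, y = batch                                # labeled images
    return F.mse_loss(model(x), y)              # stand-in for a detection loss

def consistency_loss(model, batch):
    x = batch                                   # unlabeled real images
    noisy = x + 0.1 * torch.randn_like(x)       # weak perturbation
    return F.mse_loss(model(noisy), model(x).detach())

if __name__ == "__main__":
    detector = nn.Linear(8, 2)                  # stand-in for a text detector
    synth = [(torch.randn(4, 8), torch.randn(4, 2)) for _ in range(10)]
    real_unlabeled = [torch.randn(4, 8) for _ in range(10)]
    real_labeled = [(torch.randn(4, 8), torch.randn(4, 2)) for _ in range(10)]

    train_stage(detector, synth, supervised_loss)            # 1. pre-training
    train_stage(detector, real_unlabeled, consistency_loss)  # 2. UNITS-style stage
    train_stage(detector, real_labeled, supervised_loss)     # 3. fine-tuning
```

Because only the training schedule changes, the detector's architecture, parameter count, and inference cost stay exactly the same, matching the claim in the abstract.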
Related papers
- Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments consistently demonstrates our method's superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z)
- Sequential Visual and Semantic Consistency for Semi-supervised Text Recognition [56.968108142307976]
Scene text recognition (STR) is a challenging task that requires large-scale annotated data for training.
Most existing STR methods resort to synthetic data, which may introduce domain discrepancy and degrade the performance of STR models.
This paper proposes a novel semi-supervised learning method for STR that incorporates word-level consistency regularization from both visual and semantic aspects.
arXiv Detail & Related papers (2024-02-24T13:00:54Z)
- Text2Data: Low-Resource Data Generation with Textual Control [104.38011760992637]
Natural language serves as a common and straightforward control signal for humans to interact seamlessly with machines.
We propose Text2Data, a novel approach that utilizes unlabeled data to understand the underlying data distribution through an unsupervised diffusion model.
It undergoes controllable finetuning via a novel constraint optimization-based learning objective that ensures controllability and effectively counteracts catastrophic forgetting.
arXiv Detail & Related papers (2024-02-08T03:41:39Z)
- Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using Diffusion Models [63.99110667987318]
We present DiffText, a pipeline that seamlessly blends foreground text with the background's intrinsic features.
With fewer text instances, the text images we produce consistently surpass other synthetic data in aiding text detectors.
arXiv Detail & Related papers (2023-11-28T06:51:28Z)
- A New Benchmark: On the Utility of Synthetic Data with Blender for Bare Supervised Learning and Downstream Domain Adaptation [42.2398858786125]
Deep learning in computer vision has achieved great success at the price of large-scale labeled training data.
The uncontrollable data collection process produces non-IID training and test data, where undesired duplication may exist.
To circumvent these issues, an alternative is to generate synthetic data via 3D rendering with domain randomization.
arXiv Detail & Related papers (2023-03-16T09:03:52Z)
- ASDOT: Any-Shot Data-to-Text Generation with Pretrained Language Models [82.63962107729994]
Any-Shot Data-to-Text (ASDOT) is a new approach flexibly applicable to diverse settings.
It consists of two steps: data disambiguation and sentence fusion.
Experimental results show that ASDOT consistently achieves significant improvement over baselines.
arXiv Detail & Related papers (2022-10-09T19:17:43Z)
- A Scene-Text Synthesis Engine Achieved Through Learning from Decomposed Real-World Data [4.096453902709292]
Scene-text image synthesis techniques aim to naturally compose text instances on background scene images.
We propose a Learning-Based Text Synthesis engine (LBTS) that includes a text location proposal network (TLPNet) and a text appearance adaptation network (TAANet).
After training, these networks can be integrated and used to generate synthetic datasets for scene text analysis tasks.
arXiv Detail & Related papers (2022-09-06T11:15:58Z)
- Non-Parametric Domain Adaptation for End-to-End Speech Translation [72.37869362559212]
End-to-End Speech Translation (E2E-ST) has received increasing attention due to its potential for less error propagation, lower latency, and fewer parameters.
We propose a novel non-parametric method that leverages a domain-specific text translation corpus to achieve domain adaptation for the E2E-ST system.
arXiv Detail & Related papers (2022-05-23T11:41:02Z)
- Weakly Supervised Scene Text Detection using Deep Reinforcement Learning [6.918282834668529]
We propose a weak supervision method for scene text detection that makes use of reinforcement learning (RL).
The reward received by the RL agent is estimated by a neural network, instead of being inferred from ground-truth labels.
We then use our proposed system for weakly- and semi-supervised training on real-world data.
arXiv Detail & Related papers (2022-01-13T10:15:42Z)
- On Exploring and Improving Robustness of Scene Text Detection Models [20.15225372544634]
We evaluate scene text detection models on ICDAR2015-C (IC15-C) and CTW1500-C (CTW-C).
We perform a robustness analysis of six key components: pre-training data, backbone, feature fusion module, multi-scale predictions, representation of text instances and loss function.
We present a simple yet effective data-based method to destroy the smoothness of text regions by merging background and foreground.
arXiv Detail & Related papers (2021-10-12T02:36:48Z)
- Synthetic-to-Real Unsupervised Domain Adaptation for Scene Text Detection in the Wild [11.045516338817132]
We propose a synthetic-to-real domain adaptation method for scene text detection.
A text self-training (TST) method and adversarial text instance alignment (ATA) for domain adaptive scene text detection are introduced.
Results demonstrate the effectiveness of the proposed method, with up to a 10% improvement.
arXiv Detail & Related papers (2020-09-03T16:16:34Z)
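The last entry above combines text self-training (TST) with adversarial text instance alignment (ATA) for synthetic-to-real adaptation. As a rough illustration of that family of methods, the sketch below pairs confidence-thresholded pseudo-labels with a domain classifier trained through a gradient-reversal layer; the toy networks, the threshold, and the losses are generic stand-ins, not the paper's actual TST/ATA formulation.

```python
# A generic, heavily simplified synthetic-to-real adaptation step, assuming
# PyTorch: pseudo-label self-training plus adversarial feature alignment.
# All modules and hyperparameters here are illustrative stand-ins only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; reversed, scaled gradient in the backward pass."""
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lamb * grad_out, None

features = nn.Linear(8, 16)       # stand-in backbone
text_head = nn.Linear(16, 2)      # stand-in text/non-text classifier
domain_head = nn.Linear(16, 2)    # synthetic-vs-real discriminator
params = (list(features.parameters()) + list(text_head.parameters())
          + list(domain_head.parameters()))
opt = torch.optim.SGD(params, lr=1e-3)

def train_step(x_syn, y_syn, x_real, lamb=0.1, tau=0.9):
    f_syn, f_real = features(x_syn), features(x_real)

    # Supervised loss on labeled synthetic data.
    loss = F.cross_entropy(text_head(f_syn), y_syn)

    # Self-training: keep only confident predictions on real data as pseudo-labels.
    with torch.no_grad():
        probs = F.softmax(text_head(f_real), dim=1)
        conf, pseudo = probs.max(dim=1)
    mask = conf > tau
    if mask.any():
        loss = loss + F.cross_entropy(text_head(f_real[mask]), pseudo[mask])

    # Adversarial alignment: the discriminator sees gradient-reversed features,
    # pushing the backbone toward domain-invariant representations.
    feats = GradReverse.apply(torch.cat([f_syn, f_real]), lamb)
    dom = torch.cat([torch.zeros(len(f_syn), dtype=torch.long),
                     torch.ones(len(f_real), dtype=torch.long)])
    loss = loss + F.cross_entropy(domain_head(feats), dom)

    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# One step on random toy batches.
print(train_step(torch.randn(4, 8), torch.randint(0, 2, (4,)), torch.randn(4, 8)))
```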