Domain Adaptive Scene Text Detection via Subcategorization
- URL: http://arxiv.org/abs/2212.00377v1
- Date: Thu, 1 Dec 2022 09:15:43 GMT
- Title: Domain Adaptive Scene Text Detection via Subcategorization
- Authors: Zichen Tian, Chuhui Xue, Jingyi Zhang, Shijian Lu
- Abstract summary: We study domain adaptive scene text detection, a largely neglected yet very meaningful task.
We design SCAST, a subcategory-aware self-training technique that mitigates network overfitting and noisy pseudo labels.
SCAST achieves superior detection performance consistently across multiple public benchmarks.
- Score: 45.580559833129165
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most existing scene text detectors require large-scale training data which
cannot scale well due to two major factors: 1) scene text images often have
domain-specific distributions; 2) collecting large-scale annotated scene text
images is laborious. We study domain adaptive scene text detection, a largely
neglected yet very meaningful task that aims for optimal transfer of labelled
scene text images while handling unlabelled images in various new domains.
Specifically, we design SCAST, a subcategory-aware self-training technique that
effectively mitigates network overfitting and noisy pseudo labels in domain
adaptive scene text detection. SCAST consists of two novel designs. For
labelled source data, it introduces pseudo subcategories for both foreground
texts and background stuff, which helps train more generalizable source models
with multi-class detection objectives. For unlabelled target data, it mitigates
network overfitting by co-regularizing the binary and subcategory
classifiers trained in the source domain. Extensive experiments show that SCAST
achieves superior detection performance consistently across multiple public
benchmarks, and it also generalizes well to other domain adaptive detection
tasks such as vehicle detection.
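The co-regularization idea in the abstract, namely keeping a pseudo label only when the binary text/background classifier and the subcategory classifier agree with high confidence, can be illustrated with a minimal sketch. This is not the paper's actual implementation; the function name, the threshold, and the convention that text subcategories come first are all illustrative assumptions.

```python
import numpy as np

def coregularized_pseudo_labels(binary_probs, subcat_probs, n_text_subcats, thresh=0.9):
    """Keep a sample's pseudo label only when the binary (text/background)
    head and the subcategory head agree with high confidence.

    binary_probs : (N, 2) softmax over {background, text}
    subcat_probs : (N, K) softmax over K subcategories; by assumption the
                   first n_text_subcats indices are text subcategories.
    Returns an int array: 1 = text, 0 = background, -1 = ignored.
    """
    binary_pred = binary_probs.argmax(axis=1)
    subcat_pred = subcat_probs.argmax(axis=1)
    # Collapse the subcategory prediction back to a binary decision.
    subcat_as_binary = (subcat_pred < n_text_subcats).astype(int)
    confident = (binary_probs.max(axis=1) > thresh) & (subcat_probs.max(axis=1) > thresh)
    agree = binary_pred == subcat_as_binary
    # Samples where the two classifiers disagree, or either is unsure,
    # are excluded from self-training.
    return np.where(confident & agree, binary_pred, -1)
```

Filtering out samples where the two heads disagree is one plausible way to suppress the noisy pseudo labels that plague plain self-training.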
Related papers
- Harnessing the Power of Multi-Lingual Datasets for Pre-training: Towards Enhancing Text Spotting Performance [15.513912470752041]
The adaptation capability to a wide range of domains is crucial for scene text spotting models when deployed to real-world conditions.
Here, we investigate the problem of domain-adaptive scene text spotting, i.e., training a model on multi-domain source data.
The results clearly demonstrate the potential of intermediate representations to achieve significant performance on text spotting benchmarks across multiple domains.
arXiv Detail & Related papers (2023-10-02T06:08:01Z)
- Diving into the Depths of Spotting Text in Multi-Domain Noisy Scenes [11.478236584340255]
We present a text spotting validation benchmark called Under-Water Text (UWT) for noisy underwater scenes.
We also design an efficient super-resolution based end-to-end transformer baseline called DA-TextSpotter.
The dataset, code and pre-trained models will be released upon acceptance.
arXiv Detail & Related papers (2023-10-01T03:27:41Z)
- Location-Aware Self-Supervised Transformers [74.76585889813207]
We propose to pretrain networks for semantic segmentation by predicting the relative location of image parts.
We control the difficulty of the task by masking a subset of the reference patch features visible to those of the query.
Our experiments show that this location-aware pretraining leads to representations that transfer competitively to several challenging semantic segmentation benchmarks.
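The pretext task described above, predicting the relative location of image parts, can be sketched as a simple data-generation step: carve the image into a grid, fix a reference cell, and ask the model to classify where a randomly drawn query cell sits. This is only an illustrative sketch, not the paper's pipeline; the grid size, the centre-cell reference, and the function name are assumptions, and the masking of reference features is omitted.

```python
import numpy as np

def relative_location_task(image, grid=3, rng=None):
    """Build one training example for relative-location prediction.

    Splits the image into grid x grid cells, takes the centre cell as the
    reference, and samples a query cell; the target is the query's grid
    index, which a classifier would predict from (reference, query).
    """
    rng = np.random.default_rng(rng)
    h, w = image.shape[:2]
    ph, pw = h // grid, w // grid
    cells = [image[i * ph:(i + 1) * ph, j * pw:(j + 1) * pw]
             for i in range(grid) for j in range(grid)]
    ref_idx = grid * grid // 2  # centre cell as the reference patch
    query_idx = rng.choice([k for k in range(grid * grid) if k != ref_idx])
    return cells[ref_idx], cells[query_idx], int(query_idx)
```

Because the target is derived from the image itself, no annotation is needed, which is what makes this usable for pretraining.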
arXiv Detail & Related papers (2022-12-05T16:24:29Z)
- Towards End-to-End Unified Scene Text Detection and Layout Analysis [60.68100769639923]
We introduce the task of unified scene text detection and layout analysis.
The first hierarchical scene text dataset is introduced to enable this novel research task.
We also propose a novel method that is able to simultaneously detect scene text and form text clusters in a unified way.
arXiv Detail & Related papers (2022-03-28T23:35:45Z)
- Text Recognition in Real Scenarios with a Few Labeled Samples [55.07859517380136]
Scene text recognition (STR) remains a hot research topic in the computer vision field.
This paper proposes a few-shot adversarial sequence domain adaptation (FASDA) approach to build sequence adaptation.
Our approach can maximize the character-level confusion between the source domain and the target domain.
arXiv Detail & Related papers (2020-06-22T13:03:01Z)
- Phase Consistent Ecological Domain Adaptation [76.75730500201536]
We focus on the task of semantic segmentation, where annotated synthetic data are plentiful but annotating real data is laborious.
The first criterion, inspired by visual psychophysics, is that the map between the two image domains be phase-preserving.
The second criterion aims to leverage ecological statistics, or regularities in the scene which are manifest in any image of it, regardless of the characteristics of the illuminant or the imaging sensor.
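The phase-preserving criterion can be made concrete with a small sketch: compare the Fourier phase of a source image and its translated version, leaving the amplitude spectrum free to change. This is a minimal illustration of the idea, not the paper's loss; the function name, the normalization, and the epsilon are assumptions.

```python
import numpy as np

def phase_consistency_loss(src, translated, eps=1e-8):
    """Penalise changes in Fourier phase between a source image and its
    translated version; amplitude (style) changes incur no penalty."""
    F_src = np.fft.fft2(src)
    F_tr = np.fft.fft2(translated)
    # Unit-magnitude spectra carry only phase information.
    p_src = F_src / (np.abs(F_src) + eps)
    p_tr = F_tr / (np.abs(F_tr) + eps)
    # 1 - cos(phase difference), averaged over all frequencies.
    return float(np.mean(1.0 - np.real(p_src * np.conj(p_tr))))
```

A pure contrast change (e.g. scaling all intensities) leaves the phase untouched and yields a near-zero loss, while a geometric change such as a shift alters the phase and is penalised, which matches the intuition that phase encodes structure and amplitude encodes style.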
arXiv Detail & Related papers (2020-04-10T06:58:03Z)
- iFAN: Image-Instance Full Alignment Networks for Adaptive Object Detection [48.83883375118966]
iFAN aims to precisely align feature distributions on both image and instance levels.
It outperforms state-of-the-art methods with a boost of 10%+ AP over the source-only baseline.
arXiv Detail & Related papers (2020-03-09T13:27:06Z)
- DGST: Discriminator Guided Scene Text Detector [11.817428636084305]
This paper proposes a detector framework based on the conditional generative adversarial networks to improve the segmentation effect of scene text detection.
Experiments on standard datasets demonstrate that the proposed DGST brings noticeable gains and outperforms state-of-the-art methods.
arXiv Detail & Related papers (2020-02-28T01:47:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.