Self-Training for Domain Adaptive Scene Text Detection
- URL: http://arxiv.org/abs/2005.11487v1
- Date: Sat, 23 May 2020 07:36:23 GMT
- Title: Self-Training for Domain Adaptive Scene Text Detection
- Authors: Yudi Chen, Wei Wang, Yu Zhou, Fei Yang, Dongbao Yang, Weiping Wang
- Abstract summary: We propose a self-training framework to automatically mine hard examples with pseudo-labels from unannotated videos or images.
Experimental results on standard benchmarks, including ICDAR2015, MSRA-TD500, and ICDAR2017 MLT, demonstrate the effectiveness of our self-training method.
A simple Mask R-CNN adapted with self-training and fine-tuned on real data achieves results comparable or even superior to state-of-the-art methods.
- Score: 16.42511044274265
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Though deep learning based scene text detection has achieved great progress,
well-trained detectors suffer from severe performance degradation when applied to
different domains. In general, a large amount of data is indispensable to train a
detector in the target domain, but data collection and annotation are expensive and
time-consuming. To address this problem, we propose a self-training framework to
automatically mine hard examples with pseudo-labels from unannotated videos or images.
To reduce the noise in these hard examples, a novel text mining module is implemented
based on the fusion of detection and tracking results. Then, an image-to-video
generation method is designed for cases where videos are unavailable and only images
can be used. Experimental results on standard benchmarks, including ICDAR2015,
MSRA-TD500, and ICDAR2017 MLT, demonstrate the effectiveness of our self-training
method. A simple Mask R-CNN adapted with self-training and fine-tuned on real data
achieves results comparable or even superior to state-of-the-art methods.
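The text mining module above fuses detection and tracking outputs to keep only reliable pseudo-labels and to surface hard examples the detector missed. A minimal Python sketch of one such fusion filter is given below; the IoU threshold, box format, and function names are illustrative assumptions rather than the paper's actual implementation.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) format."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def mine_pseudo_labels(det_boxes, track_boxes, iou_thr=0.5):
    """Keep detections corroborated by tracking (reliable pseudo-labels) and
    return tracked boxes the detector missed as candidate hard positives."""
    confirmed, hard_positives = [], []
    for t in track_boxes:
        matches = [d for d in det_boxes if iou(d, t) >= iou_thr]
        if matches:
            confirmed.extend(matches)    # detection and tracking agree
        else:
            hard_positives.append(t)     # tracker found text the detector missed
    return confirmed, hard_positives
```

In practice the confirmed boxes would serve as pseudo-labels for self-training, while the hard positives are the examples most worth adding to the training pool.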
Related papers
- Cluster-level pseudo-labelling for source-free cross-domain facial
expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
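Cluster-level pseudo-labelling assigns one label per cluster of target features instead of per sample, which smooths out noisy per-image predictions. The sketch below illustrates the general idea; using k-means and a majority vote from a hypothetical source-pretrained classifier `classify_fn` are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_pseudo_labels(target_features, classify_fn, n_clusters=7):
    """Assign one pseudo-label per cluster: cluster target features, then label
    every sample in a cluster with the majority prediction of a source-pretrained
    classifier (classify_fn returns integer class predictions)."""
    clusters = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(target_features)
    pseudo = np.empty(len(target_features), dtype=int)
    for c in range(n_clusters):
        idx = np.where(clusters == c)[0]
        preds = classify_fn(target_features[idx])    # per-sample predictions
        pseudo[idx] = np.bincount(preds).argmax()    # majority vote for the cluster
    return pseudo
```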
arXiv Detail & Related papers (2022-10-11T08:24:50Z) - Weakly Supervised Two-Stage Training Scheme for Deep Video Fight
Detection Model [0.0]
Fight detection in videos is an emerging deep learning application, given today's prevalence of surveillance systems and streaming media.
Previous work has largely relied on action recognition techniques to tackle this problem.
We design the fight detection model as a composition of an action-aware feature extractor and an anomaly score generator.
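The model is described as a composition of an action-aware feature extractor and an anomaly score generator. The PyTorch sketch below shows one plausible way to wire such a composition; the layer sizes and the `FightDetector` name are arbitrary illustrative choices.

```python
import torch
import torch.nn as nn

class FightDetector(nn.Module):
    """Action-aware feature extractor followed by an anomaly score generator."""
    def __init__(self, feature_extractor: nn.Module, feat_dim: int = 512):
        super().__init__()
        self.feature_extractor = feature_extractor   # e.g. a pretrained action-recognition backbone
        self.score_generator = nn.Sequential(        # maps clip features to a fight/anomaly score
            nn.Linear(feat_dim, 128), nn.ReLU(),
            nn.Linear(128, 1), nn.Sigmoid(),
        )

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        feats = self.feature_extractor(clip)         # (batch, feat_dim) clip-level features
        return self.score_generator(feats).squeeze(-1)
```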
arXiv Detail & Related papers (2022-09-23T08:29:16Z) - SISL:Self-Supervised Image Signature Learning for Splicing Detection and
Localization [11.437760125881049]
We propose a self-supervised approach for training splicing detection/localization models from frequency transforms of images.
Our proposed model can yield similar or better performance on standard datasets without relying on labels or metadata.
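Training from frequency transforms of images can be illustrated with a simple per-patch 2D FFT signature; the patch size and log-magnitude normalization below are assumptions for illustration, not the paper's feature design.

```python
import numpy as np

def frequency_signature(patch: np.ndarray) -> np.ndarray:
    """Log-magnitude of the centered 2D FFT of a grayscale patch,
    a simple frequency-domain representation a signature model could consume."""
    spectrum = np.fft.fftshift(np.fft.fft2(patch.astype(np.float64)))
    log_mag = np.log1p(np.abs(spectrum))
    return (log_mag - log_mag.mean()) / (log_mag.std() + 1e-9)

def patch_signatures(image: np.ndarray, size: int = 64) -> np.ndarray:
    """Split a grayscale image into non-overlapping patches and stack their signatures."""
    h, w = image.shape[:2]
    sigs = [frequency_signature(image[y:y + size, x:x + size])
            for y in range(0, h - size + 1, size)
            for x in range(0, w - size + 1, size)]
    return np.stack(sigs)
```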
arXiv Detail & Related papers (2022-03-15T12:26:29Z) - Weakly Supervised Scene Text Detection using Deep Reinforcement Learning [6.918282834668529]
We propose a weak supervision method for scene text detection, which makes use of reinforcement learning (RL).
The reward received by the RL agent is estimated by a neural network, instead of being inferred from ground-truth labels.
We then use our proposed system in weakly- and semi-supervised training on real-world data.
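The distinctive point is that the RL agent's reward is predicted by a neural network rather than derived from ground-truth boxes. The sketch below shows one way such a reward estimator could look; the input encoding (features of the crop under the agent's current box) and layer sizes are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class RewardEstimator(nn.Module):
    """Predicts a scalar reward for an RL text-detection agent from features of
    the image region covered by the agent's current bounding box."""
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, crop_features: torch.Tensor) -> torch.Tensor:
        # crop_features: (batch, feat_dim) features of the region proposed by the agent
        return self.head(crop_features).squeeze(-1)   # estimated reward, no labels needed
```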
arXiv Detail & Related papers (2022-01-13T10:15:42Z) - Self-Supervision & Meta-Learning for One-Shot Unsupervised Cross-Domain
Detection [0.0]
We present an object detection algorithm able to perform unsupervised adaptation across domains by using only one target sample, seen at test time.
We exploit meta-learning to simulate single-sample cross-domain learning episodes and better align to the test condition.
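A simulated single-sample episode can be illustrated as: clone the model, adapt it with one gradient step on a single sample, then evaluate the adapted clone on the rest of the episode. The first-order sketch below only conveys that episode structure, not the paper's actual meta-optimization.

```python
import copy
import torch

def one_shot_episode(model, loss_fn, adapt_sample, eval_batch, inner_lr=1e-3):
    """Simulate a single-sample adaptation episode: adapt a clone of the model on
    one sample, then measure how well the adapted clone does on held-out data.
    A real meta-learner would feed this episode loss back into the shared initialization."""
    adapted = copy.deepcopy(model)
    inner_opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)

    inner_loss = loss_fn(adapted(adapt_sample["image"]), adapt_sample["label"])
    inner_opt.zero_grad(); inner_loss.backward(); inner_opt.step()

    with torch.no_grad():
        episode_loss = loss_fn(adapted(eval_batch["images"]), eval_batch["labels"])
    return episode_loss
```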
arXiv Detail & Related papers (2021-06-07T10:33:04Z) - Enhanced Few-shot Learning for Intrusion Detection in Railway Video
Surveillance [16.220077781635748]
An enhanced model-agnostic meta-learner is trained using both the original video frames and segmented masks of the track area extracted from the video.
Numerical results show that the enhanced meta-learner successfully adapts to unseen scenes with only a few newly collected video frame samples.
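Feeding both the raw frames and the segmented track-area masks to the meta-learner can be illustrated by stacking the mask as an extra input channel; the channel layout below is an assumption for illustration, not necessarily the paper's input design.

```python
import torch

def frame_with_track_mask(frame: torch.Tensor, track_mask: torch.Tensor) -> torch.Tensor:
    """Stack a binary track-area mask onto an RGB frame as a 4th input channel.

    frame:      (3, H, W) float tensor in [0, 1]
    track_mask: (H, W) binary mask of the segmented track area
    returns:    (4, H, W) tensor fed to the meta-learner's backbone
    """
    return torch.cat([frame, track_mask.unsqueeze(0).float()], dim=0)
```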
arXiv Detail & Related papers (2020-11-09T08:59:15Z) - Deep Traffic Sign Detection and Recognition Without Target Domain Real
Images [52.079665469286496]
We propose a novel database generation method that requires no real images from the target domain, only templates of the traffic signs.
The method does not aim to outperform training with real data, but to be a viable alternative when real data is not available.
On large data sets, training with a fully synthetic data set almost matches the performance of training with a real one.
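Generating a detection database from sign templates without target-domain photographs can be sketched as pasting randomly transformed templates onto arbitrary background images, with bounding boxes known by construction. The transform ranges and helper name below are illustrative assumptions.

```python
import random
from PIL import Image

def synthesize_sample(background: Image.Image, template: Image.Image):
    """Paste a randomly scaled/rotated sign template onto a background image
    and return the image together with its bounding-box annotation."""
    scale = random.uniform(0.1, 0.3)
    size = int(min(background.size) * scale)
    sign = template.convert("RGBA").resize((size, size)).rotate(
        random.uniform(-15, 15), expand=True)

    x = random.randint(0, background.width - sign.width)
    y = random.randint(0, background.height - sign.height)
    out = background.copy()
    out.paste(sign, (x, y), sign)                     # alpha channel masks the rotation padding
    box = (x, y, x + sign.width, y + sign.height)     # ground truth comes for free
    return out, box
```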
arXiv Detail & Related papers (2020-07-30T21:06:47Z) - Uncertainty-Aware Weakly Supervised Action Detection from Untrimmed
Videos [82.02074241700728]
In this paper, we present an action recognition model that is trained with only video-frame labels.
Our method uses per-person detectors trained on large image datasets within a Multiple Instance Learning framework.
We show how we can apply our method in cases where the standard Multiple Instance Learning assumption, that each bag contains at least one instance with the specified label, is invalid.
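The standard Multiple Instance Learning assumption is that a positive bag contains at least one positive instance, so the bag score is typically the maximum over instance scores. The sketch below shows that standard form, i.e. the assumption the paper argues can be invalid in its setting.

```python
import torch
import torch.nn.functional as F

def mil_bag_loss(instance_logits: torch.Tensor, bag_label: torch.Tensor) -> torch.Tensor:
    """Standard MIL objective: a bag is positive if its best-scoring instance is positive.

    instance_logits: (num_instances,) scores, e.g. for per-person detections in a clip
    bag_label:       scalar 0/1 video-level label
    """
    bag_logit = instance_logits.max()                # 'at least one positive instance' assumption
    return F.binary_cross_entropy_with_logits(bag_logit, bag_label.float())
```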
arXiv Detail & Related papers (2020-07-21T10:45:05Z) - Syn2Real Transfer Learning for Image Deraining using Gaussian Processes [92.15895515035795]
CNN-based methods for image deraining have achieved excellent performance in terms of reconstruction error as well as visual quality.
Due to challenges in obtaining real world fully-labeled image deraining datasets, existing methods are trained only on synthetically generated data.
We propose a Gaussian Process-based semi-supervised learning framework which enables the network to learn to derain using a synthetic dataset.
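The Gaussian Process component provides pseudo-supervision for unlabeled real rainy images from labeled synthetic ones. A rough sketch of that idea on latent features, using scikit-learn's GP regressor purely for illustration rather than the paper's formulation, is shown below.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def gp_pseudo_targets(synthetic_feats, synthetic_targets, real_feats):
    """Fit a GP on latent features of labeled synthetic data and predict
    pseudo-targets (with uncertainty) for latent features of unlabeled real data."""
    gp = GaussianProcessRegressor(normalize_y=True)
    gp.fit(synthetic_feats, synthetic_targets)
    pseudo, std = gp.predict(real_feats, return_std=True)
    confident = std < np.percentile(std, 50)     # keep only the more confident half
    return pseudo[confident], confident
```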
arXiv Detail & Related papers (2020-06-10T00:33:18Z) - Auto-Rectify Network for Unsupervised Indoor Depth Estimation [119.82412041164372]
We establish that the complex ego-motions exhibited in handheld settings are a critical obstacle for learning depth.
We propose a data pre-processing method that rectifies training images by removing their relative rotations for effective learning.
Our results outperform the previous unsupervised SOTA method by a large margin on the challenging NYUv2 dataset.
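Rectifying training images by removing relative rotation can be illustrated with a simple in-plane warp given an estimated roll angle; where that angle comes from (e.g. an IMU reading or an estimated gravity direction) is an assumption here, not the paper's specific estimator.

```python
import cv2
import numpy as np

def rectify_rotation(image: np.ndarray, roll_deg: float) -> np.ndarray:
    """Undo an estimated in-plane (roll) rotation so training views are upright.

    image:    HxWx3 uint8 frame
    roll_deg: estimated camera roll in degrees (e.g. from IMU or gravity estimate)
    """
    h, w = image.shape[:2]
    rot = cv2.getRotationMatrix2D((w / 2, h / 2), -roll_deg, 1.0)  # rotate by -roll to cancel it
    return cv2.warpAffine(image, rot, (w, h), flags=cv2.INTER_LINEAR)
```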
arXiv Detail & Related papers (2020-06-04T08:59:17Z) - One-Shot Object Detection without Fine-Tuning [62.39210447209698]
We introduce a two-stage model consisting of a first stage Matching-FCOS network and a second stage Structure-Aware Relation Module.
We also propose novel training strategies that effectively improve detection performance.
Our method exceeds the state-of-the-art one-shot performance consistently on multiple datasets.
arXiv Detail & Related papers (2020-05-08T01:59:23Z)