Weakly Supervised Scene Text Detection using Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2201.04866v1
- Date: Thu, 13 Jan 2022 10:15:42 GMT
- Title: Weakly Supervised Scene Text Detection using Deep Reinforcement Learning
- Authors: Emanuel Metzenthin, Christian Bartz, Christoph Meinel
- Abstract summary: We propose a weak supervision method for scene text detection, which makes use of reinforcement learning (RL)
The reward received by the RL agent is estimated by a neural network, instead of being inferred from ground-truth labels.
We then use our proposed system in a weakly- and semi-supervised training on real-world data.
- Score: 6.918282834668529
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The challenging field of scene text detection requires complex data
annotation, which is time-consuming and expensive. Techniques, such as weak
supervision, can reduce the amount of data needed. In this paper we propose a
weak supervision method for scene text detection, which makes use of
reinforcement learning (RL). The reward received by the RL agent is estimated
by a neural network, instead of being inferred from ground-truth labels. First,
we enhance an existing supervised RL approach to text detection with several
training optimizations, allowing us to close the performance gap to
regression-based algorithms. We then use our proposed system in a weakly- and
semi-supervised training on real-world data. Our results show that training in
a weakly supervised setting is feasible. However, we find that using our model
in a semi-supervised setting, e.g., when combining labeled synthetic data with
unannotated real-world data, produces the best results.
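The core idea of the abstract, replacing a ground-truth-derived RL reward with one predicted by a neural network, can be sketched as follows. This is a minimal illustrative example, not the authors' code: `reward_net` stands in for their learned reward estimator, and boxes are simple `(x1, y1, x2, y2)` tuples.

```python
# Sketch: reward for a box-adjusting RL agent in text detection.
# The supervised variant scores actions by change in IoU with the
# ground-truth box; the weakly supervised variant replaces that with
# a learned scorer, so no box-level labels are needed at reward time.

def iou(a, b):
    # boxes as (x1, y1, x2, y2)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def supervised_reward(prev_box, new_box, gt_box):
    # reward = improvement in overlap with the ground-truth label
    return iou(new_box, gt_box) - iou(prev_box, gt_box)

def weak_reward(prev_box, new_box, reward_net):
    # no labels: a trained network scores how "text-like" each crop is,
    # and the reward is the improvement in that score
    return reward_net(new_box) - reward_net(prev_box)
```

Under this formulation the agent's training loop is unchanged; only the reward source differs, which is what lets the method run on unannotated images.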
Related papers
- Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration [54.8229698058649]
We study how unlabeled prior trajectory data can be leveraged to learn efficient exploration strategies.
Our method SUPE (Skills from Unlabeled Prior data for Exploration) demonstrates that a careful combination of these ideas compounds their benefits.
We empirically show that SUPE reliably outperforms prior strategies, successfully solving a suite of long-horizon, sparse-reward tasks.
arXiv Detail & Related papers (2024-10-23T17:58:45Z) - Text2Data: Low-Resource Data Generation with Textual Control [104.38011760992637]
Natural language serves as a common and straightforward control signal for humans to interact seamlessly with machines.
We propose Text2Data, a novel approach that utilizes unlabeled data to understand the underlying data distribution through an unsupervised diffusion model.
It undergoes controllable finetuning via a novel constraint optimization-based learning objective that ensures controllability and effectively counteracts catastrophic forgetting.
arXiv Detail & Related papers (2024-02-08T03:41:39Z) - Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large amounts of interactions between the agent and the environment.
We propose a new method to solve it, using unsupervised model-based RL, for pre-training the agent.
We show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z) - Semi-WTC: A Practical Semi-supervised Framework for Attack Categorization through Weight-Task Consistency [19.97236038722335]
Supervised learning has been widely used for attack detection, which requires large amounts of high-quality data and labels.
We propose a semi-supervised fine-grained attack categorization framework consisting of an encoder and a two-branch structure.
We show that our model outperforms the state-of-the-art semi-supervised attack detection methods with a general 5% improvement in classification accuracy and a 90% reduction in training time.
arXiv Detail & Related papers (2022-05-19T16:30:31Z) - UNITS: Unsupervised Intermediate Training Stage for Scene Text Detection [16.925048424113463]
We propose a new training paradigm for scene text detection, which introduces an UNsupervised Intermediate Training Stage (UNITS).
UNITS builds a buffer path to real-world data and can alleviate the gap between the pre-training stage and fine-tuning stage.
Three training strategies are further explored to perceive information from real-world data in an unsupervised way.
arXiv Detail & Related papers (2022-05-10T05:34:58Z) - Weakly-Supervised Arbitrary-Shaped Text Detection with Expectation-Maximization Algorithm [35.0126313032923]
We study weakly-supervised arbitrary-shaped text detection for combining various weak supervision forms.
We propose an Expectation-Maximization (EM) based weakly-supervised learning framework to train an accurate arbitrary-shaped text detector.
Our method yields comparable performance to state-of-the-art methods on three benchmarks.
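The EM-style alternation behind such weakly supervised frameworks can be illustrated with a toy example. This is not the paper's algorithm, just the generic pattern: the E-step infers pseudo ground truth from weak labels and the current model, and the M-step refits the model (here reduced to a single score threshold) on those pseudo labels.

```python
# Toy EM-style weak supervision: alternate between inferring pseudo
# labels (E-step) and refitting the model on them (M-step).

def e_step(scores, weak_positive, threshold):
    # infer per-sample pseudo labels; a weakly negative image
    # (known to contain no text) would yield all zeros
    return [1 if weak_positive and s >= threshold else 0 for s in scores]

def m_step(scores, pseudo):
    # refit the "model" (a score threshold) to the pseudo labels:
    # place it midway between the positive and negative means
    pos = [s for s, p in zip(scores, pseudo) if p]
    neg = [s for s, p in zip(scores, pseudo) if not p]
    if not pos or not neg:
        return 0.5
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def em(scores, weak_positive, threshold=0.5, iters=10):
    for _ in range(iters):
        pseudo = e_step(scores, weak_positive, threshold)
        threshold = m_step(scores, pseudo)
    return threshold, pseudo
```

In the real framework the "model" is a full text detector and the weak labels can take various forms (image-level tags, coarse boxes), but the alternation structure is the same.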
arXiv Detail & Related papers (2020-12-01T11:45:39Z) - Learning to Count in the Crowd from Limited Labeled Data [109.2954525909007]
We focus on reducing the annotation efforts by learning to count in the crowd from limited number of labeled samples.
Specifically, we propose a Gaussian Process-based iterative learning mechanism that involves estimation of pseudo-ground truth for the unlabeled data.
arXiv Detail & Related papers (2020-07-07T04:17:01Z) - Provably Efficient Causal Reinforcement Learning with Confounded Observational Data [135.64775986546505]
We study how to incorporate the dataset (observational data) collected offline, which is often abundantly available in practice, to improve the sample efficiency in the online setting.
We propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner.
arXiv Detail & Related papers (2020-06-22T14:49:33Z) - Self-Training for Domain Adaptive Scene Text Detection [16.42511044274265]
We propose a self-training framework to automatically mine hard examples with pseudo-labels from unannotated videos or images.
Experimental results on standard benchmarks, including ICDAR2015, MSRA-TD500, ICDAR 2017 MLT, demonstrate the effectiveness of our self-training method.
The simple Mask R-CNN adapted with self-training and fine-tuned on real data can achieve comparable or even superior results with the state-of-the-art methods.
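The pseudo-label mining step at the heart of such self-training can be sketched in a few lines. The interfaces below are assumed for illustration (a `detector` callable returning `(box, score)` pairs), not taken from the paper's code:

```python
# Minimal self-training sketch: run the current detector over
# unannotated images and keep only confident predictions as
# pseudo-labels for the next fine-tuning round.

def mine_pseudo_labels(detector, unlabeled_images, keep_above=0.9):
    pseudo = []
    for image in unlabeled_images:
        for box, score in detector(image):
            if score >= keep_above:  # confident detections become labels
                pseudo.append((image, box))
    return pseudo
```

The confidence threshold trades label noise against coverage; self-training pipelines typically start strict and may relax it as the detector improves.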
arXiv Detail & Related papers (2020-05-23T07:36:23Z) - Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
To keep training on the resulting large dataset tractable, we apply a dataset distillation strategy that compresses it into several informative class-wise images.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.