Weakly-Supervised Arbitrary-Shaped Text Detection with
Expectation-Maximization Algorithm
- URL: http://arxiv.org/abs/2012.00424v1
- Date: Tue, 1 Dec 2020 11:45:39 GMT
- Title: Weakly-Supervised Arbitrary-Shaped Text Detection with
Expectation-Maximization Algorithm
- Authors: Mengbiao Zhao, Wei Feng, Fei Yin, Xu-Yao Zhang, Cheng-Lin Liu
- Abstract summary: We study weakly-supervised arbitrary-shaped text detection, which combines various forms of weak supervision.
We propose an Expectation-Maximization (EM) based weakly-supervised learning framework to train an accurate arbitrary-shaped text detector.
Our method yields comparable performance to state-of-the-art methods on three benchmarks.
- Score: 35.0126313032923
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Arbitrary-shaped text detection is an important and challenging task in
computer vision. Most existing methods require heavy data labeling efforts to
produce polygon-level text region labels for supervised training. In order to
reduce the cost in data labeling, we study weakly-supervised arbitrary-shaped
text detection for combining various weak supervision forms (e.g., image-level
tags, coarse, loose and tight bounding boxes), which are far easier for
annotation. We propose an Expectation-Maximization (EM) based weakly-supervised
learning framework to train an accurate arbitrary-shaped text detector using
only a small amount of polygon-level annotated data combined with a large
amount of weakly annotated data. Meanwhile, we propose a contour-based
arbitrary-shaped text detector, which is suitable for incorporating
weakly-supervised learning. Extensive experiments on three arbitrary-shaped
text benchmarks (CTW1500, Total-Text and ICDAR-ArT) show that (1) using only
10% strongly annotated data and 90% weakly annotated data, our method yields
comparable performance to state-of-the-art methods, (2) with 100% strongly
annotated data, our method outperforms existing methods on all three
benchmarks. We will make the weakly annotated datasets publicly available in
the future.
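The abstract describes alternating between estimating pseudo polygon labels for the weakly annotated images and retraining the detector on the strong plus pseudo labels. Below is a minimal, hypothetical sketch of such an EM-style training loop; the names (Sample, e_step, m_step, consistent, detector.predict/fit) and the consistency check are illustrative assumptions, not the paper's implementation.
```python
# Hypothetical sketch of an EM-style loop over mixed strong/weak annotations.
# All names and update rules are illustrative assumptions, not the paper's code.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    image: object                  # input image (placeholder type)
    polygons: Optional[list]       # polygon-level labels (strong) or pseudo labels
    weak_label: Optional[object]   # e.g. image-level tag or coarse/loose/tight box


def consistent(prediction, weak_label) -> bool:
    """Placeholder consistency check between a predicted contour and the weak
    annotation (e.g. sufficient overlap with a weak bounding box)."""
    return True


def e_step(detector, weak_samples: List[Sample]) -> List[Sample]:
    """E-step: run the current detector on weakly annotated images and keep only
    predictions consistent with each sample's weak label as pseudo labels."""
    pseudo = []
    for s in weak_samples:
        preds = detector.predict(s.image)          # candidate text contours
        kept = [p for p in preds if consistent(p, s.weak_label)]
        pseudo.append(Sample(s.image, kept, s.weak_label))
    return pseudo


def m_step(detector, strong_samples: List[Sample], pseudo_samples: List[Sample]) -> None:
    """M-step: update detector parameters on strong labels plus pseudo labels."""
    detector.fit(strong_samples + pseudo_samples)


def train_em(detector, strong_samples, weak_samples, rounds: int = 5):
    # Warm up on the small strongly annotated subset, then alternate E/M steps.
    detector.fit(strong_samples)
    for _ in range(rounds):
        pseudo = e_step(detector, weak_samples)
        m_step(detector, strong_samples, pseudo)
    return detector
```
In practice, the consistency check in the E-step would depend on the weak supervision form, e.g., discarding contours that fall outside a loose box or keeping only those that sufficiently overlap a tight box, so that weak labels constrain the pseudo labels rather than serving as training targets directly.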
Related papers
- Looking at words and points with attention: a benchmark for
text-to-shape coherence [17.340484439401894]
The evaluation of coherence between generated 3D shapes and input textual descriptions lacks a clear benchmark.
We employ large language models to automatically refine descriptions associated with shapes.
To validate our approach, we conduct a user study and compare quantitatively our metric with existing ones.
The refined dataset, the new metric and a set of text-shape pairs validated by the user study comprise a novel, fine-grained benchmark.
arXiv Detail & Related papers (2023-09-14T17:59:48Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- Collaborative Propagation on Multiple Instance Graphs for 3D Instance Segmentation with Single-point Supervision [63.429704654271475]
We propose a novel weakly supervised method RWSeg that only requires labeling one object with one point.
With these sparse weak labels, we introduce a unified framework with two branches to propagate semantic and instance information.
Specifically, we propose a Cross-graph Competing Random Walks (CRW) algorithm that encourages competition among different instance graphs.
arXiv Detail & Related papers (2022-08-10T02:14:39Z)
- Weakly Supervised Scene Text Detection using Deep Reinforcement Learning [6.918282834668529]
We propose a weak supervision method for scene text detection that makes use of reinforcement learning (RL).
The reward received by the RL agent is estimated by a neural network, instead of being inferred from ground-truth labels.
We then use our proposed system in a weakly- and semi-supervised training on real-world data.
arXiv Detail & Related papers (2022-01-13T10:15:42Z)
- Which and Where to Focus: A Simple yet Accurate Framework for Arbitrary-Shaped Nearby Text Detection in Scene Images [8.180563824325086]
We propose a simple yet effective method for accurate arbitrary-shaped nearby scene text detection.
A One-to-Many Training Scheme (OMTS) is designed to eliminate confusion and enable the proposals to learn more appropriate ground truths.
We also propose a Proposal Feature Attention Module (PFAM) to exploit more effective features for each proposal.
arXiv Detail & Related papers (2021-09-08T06:25:37Z)
- Scene Text Detection with Scribble Lines [59.698806258671105]
We propose to annotate texts by scribble lines instead of polygons for text detection.
It is a general labeling method for texts with various shapes and requires low labeling costs.
Experiments show that the proposed method bridges the performance gap between the weakly labeling method and the original polygon-based labeling methods.
arXiv Detail & Related papers (2020-12-09T13:14:53Z)
- Text Recognition -- Real World Data and Where to Find Them [36.10220484561196]
We present a method for exploiting weakly annotated images to improve text extraction pipelines.
The approach uses an arbitrary end-to-end text recognition system to obtain text region proposals and their, possibly erroneous, transcriptions.
It produces nearly error-free, localised instances of scene text, which we treat as "pseudo ground truth" (PGT).
arXiv Detail & Related papers (2020-07-06T22:23:27Z)
- ContourNet: Taking a Further Step toward Accurate Arbitrary-shaped Scene Text Detection [147.10751375922035]
We propose the ContourNet, which effectively handles false positives and large scale variance of scene texts.
Our method effectively suppresses these false positives by only outputting predictions with high response value in both directions.
arXiv Detail & Related papers (2020-04-10T08:15:23Z)
- Weakly-Supervised Salient Object Detection via Scribble Annotations [54.40518383782725]
We propose a weakly-supervised salient object detection model to learn saliency from scribble labels.
We present a new metric, termed saliency structure measure, to measure the structure alignment of the predicted saliency maps.
Our method not only outperforms existing weakly-supervised/unsupervised methods, but is also on par with several fully-supervised state-of-the-art models.
arXiv Detail & Related papers (2020-03-17T12:59:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality or accuracy of the information presented and is not responsible for any consequences of its use.