Polygon-free: Unconstrained Scene Text Detection with Box Annotations
- URL: http://arxiv.org/abs/2011.13307v3
- Date: Thu, 26 May 2022 10:47:26 GMT
- Title: Polygon-free: Unconstrained Scene Text Detection with Box Annotations
- Authors: Weijia Wu, Enze Xie, Ruimao Zhang, Wenhai Wang, Hong Zhou, Ping Luo
- Abstract summary: This study proposes an unconstrained text detection system termed Polygon-free (PF).
PF is trained with only upright bounding box annotations.
Experiments demonstrate that PF can combine general detectors to yield surprisingly high-quality pixel-level results.
- Score: 39.74109294551322
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Although a polygon is a more accurate representation than an upright bounding
box for text detection, the annotations of polygons are extremely expensive and
challenging. Unlike existing works that employ fully-supervised training with
polygon annotations, this study proposes an unconstrained text detection system
termed Polygon-free (PF), in which most existing polygon-based text detectors
(e.g., PSENet [33], DB [16]) are trained with only upright bounding box
annotations. Our core idea is to transfer knowledge from synthetic data to real
data to enhance the supervision information of upright bounding boxes. This is
made possible with a simple segmentation network, namely Skeleton Attention
Segmentation Network (SASN), that includes three vital components (i.e.,
channel attention, spatial attention and skeleton attention map) and one soft
cross-entropy loss. Experiments demonstrate that the proposed Polygon-free
system can combine general detectors (e.g., EAST, PSENet, DB) to yield
surprisingly high-quality pixel-level results with only upright bounding box
annotations on a variety of datasets (e.g., ICDAR2019-Art, TotalText,
ICDAR2015). For example, without using polygon annotations, PSENet achieves an
80.5% F-score on TotalText [3] (vs. 80.9% of fully supervised counterpart),
31.1% better than training directly with upright bounding box annotations, and
saves 80%+ labeling costs. We hope that PF can provide a new perspective for
text detection to reduce the labeling costs. The code can be found at
https://github.com/weijiawu/Unconstrained-Text-Detection-with-Box-Supervision-and-Dynamic-Self-Training.
Related papers
- Progressive Evolution from Single-Point to Polygon for Scene Text [79.29097971932529]
We introduce Point2Polygon, which can efficiently transform single-points into compact polygons.
Our method uses a coarse-to-fine process, starting with creating anchor points based on recognition confidence, then vertically and horizontally refining the polygon.
In training detectors with polygons generated by our method, we attained 86% of the accuracy relative to training with ground truth (GT). Additionally, the proposed Point2Polygon can be seamlessly integrated to empower single-point spotters to generate polygons.
arXiv Detail & Related papers (2023-12-21T12:08:27Z)
- LP-OVOD: Open-Vocabulary Object Detection by Linear Probing [8.202076059391315]
An object detector must identify both seen and unseen classes in test images without labeled examples of the unseen classes in training.
A typical approach for OVOD is to use joint text-image embeddings of CLIP to assign box proposals to their closest text label.
This method has a critical issue: many low-quality boxes, such as over- and under-covered-object boxes, have the same similarity score as high-quality boxes since CLIP is not trained on exact object location information.
We propose a novel method, LP-OVOD, that discards low-quality boxes by training a
arXiv Detail & Related papers (2023-10-26T02:37:08Z)
- PBFormer: Capturing Complex Scene Text Shape with Polynomial Band Transformer [28.52028534365144]
We present PBFormer, an efficient yet powerful scene text detector.
It unifies a transformer with a novel text shape representation, Polynomial Band (PB).
The simple operation can help detect small-scale texts.
arXiv Detail & Related papers (2023-08-29T03:41:27Z)
- DPText-DETR: Towards Better Scene Text Detection with Dynamic Points in Transformer [94.35116535588332]
Transformer-based methods, which predict polygon points or Bezier curve control points to localize texts, are quite popular in scene text detection.
However, the point label form they use implies the human reading order, which affects the robustness of the Transformer model.
We propose DPText-DETR, which directly uses point coordinates as queries and dynamically updates them between decoder layers.
arXiv Detail & Related papers (2022-07-10T15:45:16Z)
- Grasp-Oriented Fine-grained Cloth Segmentation without Real Supervision [66.56535902642085]
This paper tackles the problem of fine-grained region detection in deformed clothes using only a depth image.
We define up to 6 semantic regions of varying extent, including edges on the neckline, sleeve cuffs, and hem, plus top and bottom grasping points.
We introduce a U-net based network to segment and label these parts.
We show that training our network solely with synthetic data and the proposed DA yields results competitive with models trained on real data.
arXiv Detail & Related papers (2021-10-06T16:31:20Z)
- Dense Supervision Propagation for Weakly Supervised Semantic Segmentation on 3D Point Clouds [59.63231842439687]
We train a semantic point cloud segmentation network with only a small portion of points being labeled.
We propose a cross-sample feature reallocating module to transfer similar features and therefore re-route the gradients across two samples.
Our weakly supervised method, with only 10% and 1% of labels, can produce results comparable to those of its fully supervised counterpart.
arXiv Detail & Related papers (2021-07-23T14:34:57Z)
- Inter Extreme Points Geodesics for Weakly Supervised Segmentation [2.5772212255258777]
InExtremIS is a weakly supervised 3D approach to train a deep image segmentation network.
Our fully-automatic method is trained end-to-end and does not require any test-time annotations.
arXiv Detail & Related papers (2021-07-01T16:16:50Z)
- Scene Text Detection with Scribble Lines [59.698806258671105]
We propose to annotate texts by scribble lines instead of polygons for text detection.
It is a general labeling method for texts with various shapes and requires low labeling costs.
Experiments show that the proposed method bridges the performance gap between the weak labeling method and the original polygon-based labeling methods.
arXiv Detail & Related papers (2020-12-09T13:14:53Z)
- Weakly-Supervised Arbitrary-Shaped Text Detection with Expectation-Maximization Algorithm [35.0126313032923]
We study weakly-supervised arbitrary-shaped text detection for combining various weak supervision forms.
We propose an Expectation-Maximization (EM) based weakly-supervised learning framework to train an accurate arbitrary-shaped text detector.
Our method yields comparable performance to state-of-the-art methods on three benchmarks.
arXiv Detail & Related papers (2020-12-01T11:45:39Z)