Real-Time Scene Text Detection with Differentiable Binarization and
Adaptive Scale Fusion
- URL: http://arxiv.org/abs/2202.10304v1
- Date: Mon, 21 Feb 2022 15:30:14 GMT
- Title: Real-Time Scene Text Detection with Differentiable Binarization and
Adaptive Scale Fusion
- Authors: Minghui Liao, Zhisheng Zou, Zhaoyi Wan, Cong Yao, Xiang Bai
- Abstract summary: segmentation-based scene text detection methods have drawn extensive attention in the scene text detection field.
We propose a Differentiable Binarization (DB) module that integrates the binarization process into a segmentation network.
An efficient Adaptive Scale Fusion (ASF) module is proposed to improve the scale robustness by fusing features of different scales adaptively.
- Score: 62.269219152425556
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, segmentation-based scene text detection methods have drawn
extensive attention in the scene text detection field, because of their
superiority in detecting the text instances of arbitrary shapes and extreme
aspect ratios, profiting from the pixel-level descriptions. However, the vast
majority of the existing segmentation-based approaches are limited to their
complex post-processing algorithms and the scale robustness of their
segmentation models, where the post-processing algorithms are not only isolated
to the model optimization but also time-consuming and the scale robustness is
usually strengthened by fusing multi-scale feature maps directly. In this
paper, we propose a Differentiable Binarization (DB) module that integrates the
binarization process, one of the most important steps in the post-processing
procedure, into a segmentation network. Optimized along with the proposed DB
module, the segmentation network can produce more accurate results, which
enhances the accuracy of text detection with a simple pipeline. Furthermore, an
efficient Adaptive Scale Fusion (ASF) module is proposed to improve the scale
robustness by fusing features of different scales adaptively. By incorporating
the proposed DB and ASF with the segmentation network, our proposed scene text
detector consistently achieves state-of-the-art results, in terms of both
detection accuracy and speed, on five standard benchmarks.
Related papers
- Adaptive Segmentation Network for Scene Text Detection [0.0]
We propose to automatically learn the discriminate segmentation threshold, which distinguishes text pixels from background pixels for segmentation-based scene text detectors.
Besides, we design a Global-information Enhanced Feature Pyramid Network (GE-FPN) for capturing text instances with macro size and extreme aspect ratios.
Finally, together with the proposed threshold learning strategy and text detection structure, we design an Adaptive Network (ASNet) for scene text detection.
arXiv Detail & Related papers (2023-07-27T17:37:56Z) - LRANet: Towards Accurate and Efficient Scene Text Detection with
Low-Rank Approximation Network [63.554061288184165]
We propose a novel parameterized text shape method based on low-rank approximation.
By exploring the shape correlation among different text contours, our method achieves consistency, compactness, simplicity, and robustness in shape representation.
We implement an accurate and efficient arbitrary-shaped text detector named LRANet.
arXiv Detail & Related papers (2023-06-27T02:03:46Z) - TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z) - RSCA: Real-time Segmentation-based Context-Aware Scene Text Detection [14.125634725954848]
We propose RSCA: a Real-time-based Context-Aware model for arbitrary-shaped scene text detection.
Based on these strategies, RSCA achieves state-of-the-art performance in both speed and accuracy, without complex label assignments or repeated feature aggregations.
arXiv Detail & Related papers (2021-05-26T18:43:17Z) - ABCNet v2: Adaptive Bezier-Curve Network for Real-time End-to-end Text
Spotting [108.93803186429017]
End-to-end text-spotting aims to integrate detection and recognition in a unified framework.
Here, we tackle end-to-end text spotting by presenting Adaptive Bezier Curve Network v2 (ABCNet v2)
Our main contributions are four-fold: 1) For the first time, we adaptively fit arbitrarily-shaped text by a parameterized Bezier curve, which, compared with segmentation-based methods, can not only provide structured output but also controllable representation.
Comprehensive experiments conducted on various bilingual (English and Chinese) benchmark datasets demonstrate that ABCNet v2 can achieve state-of-the
arXiv Detail & Related papers (2021-05-08T07:46:55Z) - Learning Robust Feature Representations for Scene Text Detection [0.0]
We present a network architecture derived from the loss to maximize conditional log-likelihood.
By extending the layer of latent variables to multiple layers, the network is able to learn robust features on scale.
In experiments, the proposed algorithm significantly outperforms state-of-the-art methods in terms of both recall and precision.
arXiv Detail & Related papers (2020-05-26T01:06:47Z) - MetricUNet: Synergistic Image- and Voxel-Level Learning for Precise CT
Prostate Segmentation via Online Sampling [66.01558025094333]
We propose a two-stage framework, with the first stage to quickly localize the prostate region and the second stage to precisely segment the prostate.
We introduce a novel online metric learning module through voxel-wise sampling in the multi-task network.
Our method can effectively learn more representative voxel-level features compared with the conventional learning methods with cross-entropy or Dice loss.
arXiv Detail & Related papers (2020-05-15T10:37:02Z) - FarSee-Net: Real-Time Semantic Segmentation by Efficient Multi-scale
Context Aggregation and Feature Space Super-resolution [14.226301825772174]
We introduce a novel and efficient module called Cascaded Factorized Atrous Spatial Pyramid Pooling (CF-ASPP)
It is a lightweight cascaded structure for Convolutional Neural Networks (CNNs) to efficiently leverage context information.
We achieve 68.4% mIoU at 84 fps on the Cityscapes test set with a single Nivida Titan X (Maxwell) GPU card.
arXiv Detail & Related papers (2020-03-09T03:53:57Z) - Depthwise Non-local Module for Fast Salient Object Detection Using a
Single Thread [136.2224792151324]
We propose a new deep learning algorithm for fast salient object detection.
The proposed algorithm achieves competitive accuracy and high inference efficiency simultaneously with a single CPU thread.
arXiv Detail & Related papers (2020-01-22T15:23:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.