Conceptual Text Region Network: Cognition-Inspired Accurate Scene Text Detection
- URL: http://arxiv.org/abs/2103.09179v1
- Date: Tue, 16 Mar 2021 16:28:33 GMT
- Title: Conceptual Text Region Network: Cognition-Inspired Accurate Scene Text Detection
- Authors: Chenwei Cui, Liangfu Lu, Zhiyuan Tan, Amir Hussain
- Abstract summary: We propose a human cognition-inspired framework, termed Conceptual Text Region Network (CTRNet).
CTRNet utilizes Conceptual Text Regions (CTRs), which are a class of cognition-based tools inheriting good mathematical properties, allowing for sophisticated label design.
CTRNet achieves state-of-the-art performance on benchmark CTW1500, Total-Text, MSRA-TD500, and ICDAR 2015 datasets, yielding performance gains of up to 2.0%.
- Score: 7.716899861923764
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Segmentation-based methods are widely used for scene text detection due to
their superiority in describing arbitrary-shaped text instances. However, two
major problems still exist: 1) current label generation techniques are mostly
empirical and lack theoretical support, discouraging elaborate label design; 2)
as a result, most methods rely heavily on text kernel segmentation, which is
unstable and requires deliberate tuning. To address these challenges, we
propose a human cognition-inspired framework, termed Conceptual Text Region
Network (CTRNet). The framework utilizes Conceptual Text Regions (CTRs), which
are a class of cognition-based tools inheriting good mathematical properties,
allowing for sophisticated label design. Another component of CTRNet is an
inference pipeline that, with the help of CTRs, completely eliminates the need
for text kernel segmentation. Compared with previous segmentation-based methods,
our approach is not only more interpretable but also more accurate.
Experimental results show that CTRNet achieves state-of-the-art performance on
benchmark CTW1500, Total-Text, MSRA-TD500, and ICDAR 2015 datasets, yielding
performance gains of up to 2.0%. Notably, to the best of our knowledge, CTRNet
is among the first detection models to achieve F-measures higher than 85.0% on
all four of the benchmarks, with remarkable consistency and stability.
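To make the "empirical" label generation criticized above concrete: many kernel-based detectors (PSENet and DB, for example; this is not CTRNet's CTR construction, which the abstract does not specify) derive a text-kernel label by shrinking each ground-truth polygon inward by an offset D = A(1 - r^2) / L, where A is the polygon area, L its perimeter, and r a hand-picked shrink ratio. The minimal sketch below computes only that offset; the polygon clipping itself (commonly the Vatti algorithm, e.g. via pyclipper) is omitted, and the default r = 0.4 is an assumption taken from DB's reported setting.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_area(poly: List[Point]) -> float:
    """Absolute polygon area via the shoelace formula."""
    s = 0.0
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def polygon_perimeter(poly: List[Point]) -> float:
    """Sum of edge lengths, closing the polygon back to its first vertex."""
    n = len(poly)
    return sum(math.dist(poly[i], poly[(i + 1) % n]) for i in range(n))


def shrink_offset(poly: List[Point], r: float = 0.4) -> float:
    """Inward offset D = A * (1 - r^2) / L used to derive a kernel label.

    r is a hand-tuned shrink ratio; 0.4 is an assumed default.
    """
    perim = polygon_perimeter(poly)
    if perim == 0:
        return 0.0
    return polygon_area(poly) * (1.0 - r * r) / perim


if __name__ == "__main__":
    # A 100 x 20 axis-aligned text box: A = 2000, L = 240.
    box = [(0, 0), (100, 0), (100, 20), (0, 20)]
    print(round(shrink_offset(box), 4))
```

For the 100 x 20 box, D = 2000 x 0.84 / 240 = 7.0 pixels. Because r is chosen by hand, every kernel label inherits that choice, which is the kind of empirical, tuning-dependent step the abstract refers to.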
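The F-measure quoted above (the 85.0% threshold) is conventionally the harmonic mean of precision P and recall R, F = 2PR / (P + R), where P and R come from matching detections to ground-truth regions under each benchmark's own protocol. The sketch below assumes one-to-one matching, in which case F also equals 2 x matched / (detections + ground truth); the precision, recall, and count values in the example are hypothetical.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1 / H-mean)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


def f_measure_from_counts(matched: int, num_detections: int, num_ground_truth: int) -> float:
    """Same score from counts, assuming one-to-one matching.

    With P = matched / detections and R = matched / ground truth,
    2PR / (P + R) simplifies to 2 * matched / (detections + ground truth).
    """
    if num_detections + num_ground_truth == 0:
        return 0.0
    return 2.0 * matched / (num_detections + num_ground_truth)


if __name__ == "__main__":
    # Hypothetical detector: 90% precision, 80% recall.
    print(round(f_measure(0.9, 0.8), 4))  # 0.8471, just under 85.0%
    # Hypothetical counts: 80 matches, 90 detections, 100 ground-truth regions.
    print(round(f_measure_from_counts(80, 90, 100), 4))
```

Because the harmonic mean is pulled toward the smaller of P and R, clearing 85.0% on every benchmark requires both to be high at once.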
Related papers
- Towards Robust Real-Time Scene Text Detection: From Semantic to Instance Representation Learning [19.856492291263102]
We propose representation learning for real-time scene text detection.
For semantic representation learning, we propose global-dense semantic contrast (GDSC) and top-down modeling (TDM).
With the proposed GDSC and TDM, the encoder network learns stronger representations without introducing any additional parameters or computation at inference time.
The proposed method achieves 87.2% F-measure with 48.2 FPS on Total-Text and 89.6% F-measure with 36.9 FPS on MSRA-TD500.
(2023-08-14T15:14:37Z)
- CBNet: A Plug-and-Play Network for Segmentation-Based Scene Text Detection [13.679267531492062]
We propose a Context-aware and Boundary-guided Network (CBN) to tackle these problems.
In CBN, a basic text detector is first used to predict initial segmentation results.
Finally, we introduce a boundary-guided module to expand enhanced text kernels adaptively with only the pixels on the contours.
(2022-12-05T15:15:27Z)
- Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion [62.269219152425556]
Segmentation-based methods have drawn extensive attention in the scene text detection field.
We propose a Differentiable Binarization (DB) module that integrates the binarization process into a segmentation network.
An efficient Adaptive Scale Fusion (ASF) module is proposed to improve the scale robustness by fusing features of different scales adaptively.
(2022-02-21T15:30:14Z)
- On Exploring and Improving Robustness of Scene Text Detection Models [20.15225372544634]
We evaluate scene text detection models on ICDAR2015-C (IC15-C) and CTW1500-C (CTW-C).
We perform a robustness analysis of six key components: pre-training data, backbone, feature fusion module, multi-scale predictions, representation of text instances and loss function.
We present a simple yet effective data-based method to destroy the smoothness of text regions by merging background and foreground.
(2021-10-12T02:36:48Z)
- K-Net: Towards Unified Image Segmentation [78.32096542571257]
The framework, named K-Net, segments both instances and semantic categories consistently by a group of learnable kernels.
K-Net can be trained in an end-to-end manner with bipartite matching, and its training and inference are naturally NMS-free and box-free.
(2021-06-28T17:18:21Z)
- PAN++: Towards Efficient and Accurate End-to-End Spotting of Arbitrarily-Shaped Text [85.7020597476857]
We propose an end-to-end text spotting framework, termed PAN++, which can efficiently detect and recognize text of arbitrary shapes in natural scenes.
PAN++ is based on the kernel representation that reformulates a text line as a text kernel (central region) surrounded by peripheral pixels.
As a pixel-based representation, the kernel representation can be predicted by a single fully convolutional network, which is very friendly to real-time applications.
(2021-05-02T07:04:30Z)
- PGNet: Real-time Arbitrarily-Shaped Text Spotting with Point Gathering Network [54.03560668182197]
We propose a novel fully convolutional Point Gathering Network (PGNet) for reading arbitrarily-shaped text in real-time.
With a PG-CTC decoder, we gather high-level character classification vectors from two-dimensional space and decode them into text symbols without NMS and RoI operations.
Experiments show that the proposed method achieves competitive accuracy while significantly improving running speed.
(2021-04-12T13:27:34Z)
- Towards Accurate Scene Text Recognition with Semantic Reasoning Networks [52.86058031919856]
We propose a novel end-to-end trainable framework named semantic reasoning network (SRN) for accurate scene text recognition.
A global semantic reasoning module (GSRM) is introduced to capture global semantic context through multi-way parallel transmission.
Results on 7 public benchmarks, including regular text, irregular text and non-Latin long text, verify the effectiveness and robustness of the proposed method.
(2020-03-27T09:19:25Z)
- Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting [49.768327669098674]
We propose an end-to-end trainable text spotting approach named Text Perceptron.
It first employs an efficient segmentation-based text detector that learns the latent text reading order and boundary information.
Then a novel Shape Transform Module (STM) is designed to transform the detected feature regions into regular morphologies.
(2020-02-17T08:07:19Z)
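Several entries in the list (CBNet, PAN++) describe text as a central kernel that is later grown into the full text region; this is also the kernel-segmentation step CTRNet's inference pipeline is said to avoid. As a simplified illustration only, the sketch below grows labeled kernels into a binary text mask with breadth-first search, in the spirit of PSENet's progressive scale expansion. It is not PAN++'s similarity-vector-guided pixel aggregation or CBNet's boundary-guided module, and the function name and grid layout are hypothetical.

```python
from collections import deque
from typing import List


def expand_kernels(kernel_labels: List[List[int]], text_mask: List[List[int]]) -> List[List[int]]:
    """Grow labeled kernels into text pixels with 4-connected BFS.

    kernel_labels: 0 = no kernel, k > 0 = pixel belongs to kernel of instance k.
    text_mask: 1 = predicted text pixel, 0 = background.
    A pixel claimed by two kernels goes to whichever reaches it first
    (ties resolved by queue order), as in first-come-first-served expansion.
    """
    h, w = len(kernel_labels), len(kernel_labels[0])
    out = [row[:] for row in kernel_labels]
    # Seed the queue with every kernel pixel.
    queue = deque((y, x) for y in range(h) for x in range(w) if out[y][x] > 0)
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            # Only unlabeled pixels inside the text mask can be claimed.
            if 0 <= ny < h and 0 <= nx < w and out[ny][nx] == 0 and text_mask[ny][nx]:
                out[ny][nx] = out[y][x]
                queue.append((ny, nx))
    return out


if __name__ == "__main__":
    # Two kernels on one row; the background gap at index 3 keeps them apart.
    kernels = [[0, 1, 0, 0, 0, 2, 0]]
    mask = [[1, 1, 1, 0, 1, 1, 1]]
    print(expand_kernels(kernels, mask))
```

The kernels keep adjacent instances separated during segmentation, and expansion restores each instance's full extent; the sensitivity of this two-stage scheme to the kernel prediction is what the CTRNet abstract calls unstable.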
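The Differentiable Binarization entry replaces the hard thresholding step with an approximate step function, B = 1 / (1 + exp(-k (P - T))), where P is the predicted probability map, T the learned threshold map, and k an amplifying factor (reported as 50 in the DB paper). A minimal pure-Python sketch on nested lists, assuming probabilities and thresholds in [0, 1]; a real implementation would operate on framework tensors.

```python
import math
from typing import List

Map = List[List[float]]


def standard_binarize(p: float, t: float) -> int:
    """Hard step: 1 if p >= t else 0. Its gradient is zero almost everywhere."""
    return 1 if p >= t else 0


def differentiable_binarize(p: float, t: float, k: float = 50.0) -> float:
    """Approximate step B = 1 / (1 + exp(-k * (p - t)))."""
    return 1.0 / (1.0 + math.exp(-k * (p - t)))


def db_grad(p: float, t: float, k: float = 50.0) -> float:
    """dB/dp = k * B * (1 - B): largest near the threshold, so training
    signal concentrates on pixels close to the text boundary."""
    b = differentiable_binarize(p, t, k)
    return k * b * (1.0 - b)


def binarize_map(prob: Map, thresh: Map, k: float = 50.0) -> Map:
    """Apply the approximate step element-wise to aligned maps."""
    return [
        [differentiable_binarize(p, t, k) for p, t in zip(prow, trow)]
        for prow, trow in zip(prob, thresh)
    ]


if __name__ == "__main__":
    prob = [[0.9, 0.3], [0.55, 0.1]]
    thresh = [[0.5, 0.5], [0.5, 0.5]]
    print([[round(v, 3) for v in row] for row in binarize_map(prob, thresh)])
```

With k = 50 the output is already close to 0 or 1 a short distance from the threshold, yet it stays differentiable, which is what lets the threshold map be learned jointly with the segmentation network.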
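PGNet's PG-CTC decoder builds on connectionist temporal classification (CTC). As a generic illustration, not PGNet's implementation (which, per its summary, gathers character classification vectors from the 2-D feature map), greedy CTC decoding takes the highest-scoring class at each step, merges consecutive repeats, and drops the blank symbol. The class layout here (blank at index 0, class c > 0 mapping to alphabet[c - 1]) is an assumption.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank class


def ctc_greedy_decode(scores: Sequence[Sequence[float]], alphabet: str) -> str:
    """Best-path CTC decoding.

    scores[t][c] is the score of class c at step t. Class 0 is the blank;
    class c > 0 maps to alphabet[c - 1] (assumed layout).
    """
    # Highest-scoring class at each step (first one wins on ties).
    best = [max(range(len(row)), key=row.__getitem__) for row in scores]
    chars = []
    prev = None
    for c in best:
        # Emit only when the class changes and is not the blank.
        if c != prev and c != BLANK:
            chars.append(alphabet[c - 1])
        prev = c
    return "".join(chars)


def one_hot_path(path: List[int], num_classes: int) -> List[List[float]]:
    """Build one-hot score rows from a class path (for the example only)."""
    return [[1.0 if c == k else 0.0 for k in range(num_classes)] for c in path]


if __name__ == "__main__":
    alphabet = "ehlo"  # classes: 0 blank, 1 e, 2 h, 3 l, 4 o
    # Path h h e l l <blank> l o: the blank separates the two l's.
    scores = one_hot_path([2, 2, 1, 3, 3, 0, 3, 4], num_classes=5)
    print(ctc_greedy_decode(scores, alphabet))
```

The blank between the two "l" steps is what allows a genuine double letter to survive the repeat-merging rule; without it, "ll" would collapse to a single "l".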