Conceptual Text Region Network: Cognition-Inspired Accurate Scene Text
Detection
- URL: http://arxiv.org/abs/2103.09179v1
- Date: Tue, 16 Mar 2021 16:28:33 GMT
- Title: Conceptual Text Region Network: Cognition-Inspired Accurate Scene Text
Detection
- Authors: Chenwei Cui, Liangfu Lu, Zhiyuan Tan, Amir Hussain
- Abstract summary: We propose a human cognition-inspired framework, termed Conceptual Text Region Network (CTRNet).
CTRNet utilizes Conceptual Text Regions (CTRs), a class of cognition-based tools that inherit good mathematical properties and allow for sophisticated label design.
CTRNet achieves state-of-the-art performance on benchmark CTW1500, Total-Text, MSRA-TD500, and ICDAR 2015 datasets, yielding performance gains of up to 2.0%.
- Score: 7.716899861923764
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Segmentation-based methods are widely used for scene text detection due to
their superiority in describing arbitrary-shaped text instances. However, two
major problems still exist: 1) current label generation techniques are mostly
empirical and lack theoretical support, discouraging elaborate label design; 2)
as a result, most methods rely heavily on text kernel segmentation which is
unstable and requires deliberate tuning. To address these challenges, we
propose a human cognition-inspired framework, termed Conceptual Text Region
Network (CTRNet). The framework utilizes Conceptual Text Regions (CTRs), which
are a class of cognition-based tools that inherit good mathematical properties,
allowing for sophisticated label design. Another component of CTRNet is an
inference pipeline that, with the help of CTRs, completely omits the need for
text kernel segmentation. Compared with previous segmentation-based methods,
our approach is not only more interpretable but also more accurate.
Experimental results show that CTRNet achieves state-of-the-art performance on
benchmark CTW1500, Total-Text, MSRA-TD500, and ICDAR 2015 datasets, yielding
performance gains of up to 2.0%. Notably, to the best of our knowledge, CTRNet
is among the first detection models to achieve F-measures higher than 85.0% on
all four of the benchmarks, with remarkable consistency and stability.
Related papers
- SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition [77.28814034644287]
We propose SVTRv2, a CTC model that beats leading EDTRs in both accuracy and inference speed.
SVTRv2 introduces novel upgrades to handle text irregularity and utilize linguistic context.
We evaluate SVTRv2 in both standard and recent challenging benchmarks.
arXiv Detail & Related papers (2024-11-24T14:21:35Z)
- Towards Robust Real-Time Scene Text Detection: From Semantic to Instance Representation Learning [19.856492291263102]
We propose representation learning for real-time scene text detection.
For semantic representation learning, we propose global-dense semantic contrast (GDSC) and top-down modeling (TDM)
With the proposed GDSC and TDM, the encoder network learns stronger representation without introducing any parameters and computations during inference.
The proposed method achieves 87.2% F-measure with 48.2 FPS on Total-Text and 89.6% F-measure with 36.9 FPS on MSRA-TD500.
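The F-measures quoted throughout this listing are the standard scene-text detection metric: the harmonic mean of detection precision and recall. A minimal sketch (the precision/recall values below are illustrative, not taken from any of the papers):

```python
# F-measure: harmonic mean of precision and recall, the standard
# detection metric reported by the papers in this listing.

def f_measure(precision: float, recall: float) -> float:
    return 2 * precision * recall / (precision + recall)

# Illustrative values only:
# f_measure(0.9, 0.8) ≈ 0.847, i.e. an 84.7% F-measure
```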
arXiv Detail & Related papers (2023-08-14T15:14:37Z)
- CBNet: A Plug-and-Play Network for Segmentation-Based Scene Text Detection [13.679267531492062]
We propose a Context-aware and Boundary-guided Network (CBN) to tackle these problems.
In CBN, a basic text detector is first used to predict initial segmentation results.
Finally, we introduce a boundary-guided module to expand enhanced text kernels adaptively with only the pixels on the contours.
arXiv Detail & Related papers (2022-12-05T15:15:27Z)
- Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion [62.269219152425556]
Segmentation-based methods have drawn extensive attention in the scene text detection field.
We propose a Differentiable Binarization (DB) module that integrates the binarization process into a segmentation network.
An efficient Adaptive Scale Fusion (ASF) module is proposed to improve the scale robustness by fusing features of different scales adaptively.
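The core idea of the DB module is to replace the hard thresholding step with a steep sigmoid over the difference between the probability map and a learned threshold map, so that gradients can flow through binarization. A minimal sketch of that formula (the toy map values are illustrative; only the function itself follows the DB formulation):

```python
import numpy as np

def differentiable_binarization(P, T, k=50.0):
    """Approximate binary map: B_hat = 1 / (1 + exp(-k * (P - T))).

    P: probability map, T: learned threshold map,
    k: amplification factor controlling sigmoid steepness.
    """
    return 1.0 / (1.0 + np.exp(-k * (P - T)))

# Toy 2x2 probability and threshold maps (illustrative values).
P = np.array([[0.9, 0.4], [0.6, 0.1]])
T = np.array([[0.5, 0.5], [0.5, 0.5]])
B = differentiable_binarization(P, T)
# Pixels well above the threshold saturate toward 1, those below toward 0.
```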
arXiv Detail & Related papers (2022-02-21T15:30:14Z)
- On Exploring and Improving Robustness of Scene Text Detection Models [20.15225372544634]
We evaluate scene text detection models on ICDAR2015-C (IC15-C) and CTW1500-C (CTW-C).
We perform a robustness analysis of six key components: pre-training data, backbone, feature fusion module, multi-scale predictions, representation of text instances and loss function.
We present a simple yet effective data-based method to destroy the smoothness of text regions by merging background and foreground.
arXiv Detail & Related papers (2021-10-12T02:36:48Z)
- K-Net: Towards Unified Image Segmentation [78.32096542571257]
The framework, named K-Net, segments both instances and semantic categories consistently by a group of learnable kernels.
K-Net can be trained in an end-to-end manner with bipartite matching, and its training and inference are naturally NMS-free and box-free.
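Bipartite matching here means each kernel's prediction is assigned one-to-one to a ground-truth mask by minimizing a total matching cost, which is what removes the need for NMS and boxes. A brute-force toy version of that assignment step (real implementations use the Hungarian algorithm; the cost values are illustrative, not from the paper):

```python
from itertools import permutations

def bipartite_match(cost):
    """Return the prediction -> ground-truth assignment with minimal total cost.

    cost[i][j] is the (illustrative) matching cost between prediction i
    and ground-truth mask j. Brute force is fine for toy sizes only.
    """
    n = len(cost)
    best, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best:
            best, best_perm = total, perm
    return best_perm

cost = [[0.1, 0.9, 0.8],
        [0.7, 0.2, 0.9],
        [0.8, 0.6, 0.3]]
# bipartite_match(cost) -> (0, 1, 2): each prediction paired with
# its lowest-cost ground truth under a one-to-one constraint.
```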
arXiv Detail & Related papers (2021-06-28T17:18:21Z)
- PGNet: Real-time Arbitrarily-Shaped Text Spotting with Point Gathering Network [54.03560668182197]
We propose a novel fully convolutional Point Gathering Network (PGNet) for reading arbitrarily-shaped text in real-time.
With a PG-CTC decoder, we gather high-level character classification vectors from two-dimensional space and decode them into text symbols without NMS and RoI operations.
Experiments prove that the proposed method achieves competitive accuracy, meanwhile significantly improving the running speed.
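The point-gathering step is specific to PGNet, but the final decoding it feeds is standard CTC greedy decoding: collapse consecutive repeats, then drop blanks, with no NMS or RoI operations needed. A generic sketch of that last step (the blank index and the example sequence are assumptions for illustration):

```python
BLANK = 0  # assumed index of the CTC blank symbol

def ctc_greedy_decode(argmax_ids):
    """Collapse consecutive repeated ids, then remove blank tokens."""
    out, prev = [], None
    for i in argmax_ids:
        if i != prev and i != BLANK:
            out.append(i)
        prev = i
    return out

# Framewise argmax [0, 3, 3, 0, 3, 5, 5, 0] decodes to [3, 3, 5]:
# the blank between the two 3s keeps them as separate characters.
```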
arXiv Detail & Related papers (2021-04-12T13:27:34Z)
- Towards Accurate Scene Text Recognition with Semantic Reasoning Networks [52.86058031919856]
We propose a novel end-to-end trainable framework named semantic reasoning network (SRN) for accurate scene text recognition.
GSRM is introduced to capture global semantic context through multi-way parallel transmission.
Results on 7 public benchmarks, including regular text, irregular text and non-Latin long text, verify the effectiveness and robustness of the proposed method.
arXiv Detail & Related papers (2020-03-27T09:19:25Z)
- Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting [49.768327669098674]
We propose an end-to-end trainable text spotting approach named Text Perceptron.
It first employs an efficient segmentation-based text detector that learns the latent text reading order and boundary information.
Then a novel Shape Transform Module (abbr. STM) is designed to transform the detected feature regions into regular morphologies.
arXiv Detail & Related papers (2020-02-17T08:07:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.