Arbitrary-Shaped Text Detection with Adaptive Text Region Representation
- URL: http://arxiv.org/abs/2104.00297v1
- Date: Thu, 1 Apr 2021 07:06:34 GMT
- Title: Arbitrary-Shaped Text Detection with Adaptive Text Region Representation
- Authors: Xiufeng Jiang, Shugong Xu (Fellow, IEEE), Shunqing Zhang (Senior
Member, IEEE), and Shan Cao
- Abstract summary: We propose a novel text region representation method, with a robust pipeline, which can precisely detect dense adjacent text instances.
We demonstrate that our new text region representation is effective, and that the pipeline can precisely detect closely adjacent text instances of arbitrary shapes.
- Score: 1.4546816913520362
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text detection/localization, as an important task in computer vision, has
witnessed substantial advancements in methodology and performance with
convolutional neural networks. However, the vast majority of popular methods use
rectangles or quadrangles to describe text regions. These representations have
inherent drawbacks, especially relating to dense adjacent text and loose
regional text boundaries, which usually cause difficulty detecting arbitrarily
shaped text. In this paper, we propose a novel text region representation
method, with a robust pipeline, which can precisely detect dense adjacent text
instances with arbitrary shapes. We consider a text instance to be composed of
an adaptive central text region mask and a corresponding expanding ratio between
the central text region and the full text region. More specifically, our
pipeline generates adaptive central text regions and corresponding expanding
ratios with a proposed training strategy, followed by a new proposed
post-processing algorithm which expands central text regions to the complete
text instance with the corresponding expanding ratios. We demonstrated that our
new text region representation is effective, and that the pipeline can precisely
detect closely adjacent text instances of arbitrary shapes. Experimental results
on common datasets demonstrate superior performance o
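The representation described in the abstract (a central text region plus an expanding ratio) can be illustrated with a toy sketch. The abstract does not specify the post-processing algorithm, so the example below substitutes a simple assumption: the central region is a polygon, the ratio is a linear scale factor, and expansion is uniform scaling about the mean of the vertices. The names `polygon_area` and `expand_region` are hypothetical and do not come from the paper.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_area(poly: List[Point]) -> float:
    """Absolute polygon area via the shoelace formula."""
    total = 0.0
    for i, (x1, y1) in enumerate(poly):
        x2, y2 = poly[(i + 1) % len(poly)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def expand_region(central: List[Point], ratio: float) -> List[Point]:
    """Scale a central-region polygon about the mean of its vertices.

    Assumption: `ratio` is a linear scale factor (full size / central size).
    This stands in for the paper's post-processing step, which the
    abstract does not describe in detail.
    """
    cx = sum(x for x, _ in central) / len(central)
    cy = sum(y for _, y in central) / len(central)
    return [(cx + ratio * (x - cx), cy + ratio * (y - cy)) for x, y in central]


# A 2x2 central region expanded with a ratio of 2 becomes a 4x4 region.
central = [(1.0, 1.0), (3.0, 1.0), (3.0, 3.0), (1.0, 3.0)]
full = expand_region(central, ratio=2.0)
print(full)
print(polygon_area(central), polygon_area(full))
```

Because each instance carries its own ratio, two shrunken central regions that do not touch can each be expanded separately, which is the intuition behind separating closely adjacent text. Uniform scaling only behaves well for roughly convex shapes; a curved text line would need a boundary-offset style expansion instead.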
Related papers
- Region Prompt Tuning: Fine-grained Scene Text Detection Utilizing Region Text Prompt [10.17947324152468]
Region prompt tuning method decomposes region text prompt into individual characters and splits visual feature map into region visual tokens.
This allows a character to match the local features of a token, thereby avoiding the omission of detailed features and fine-grained text.
Our proposed method combines a general score map from the image-text process with a region score map derived from character-token matching.
arXiv Detail & Related papers (2024-09-20T15:24:26Z)
- RTGen: Generating Region-Text Pairs for Open-Vocabulary Object Detection [20.630629383286262]
Open-vocabulary object detection requires solid modeling of the region-semantic relationship.
We propose RTGen to generate scalable open-vocabulary region-text pairs.
arXiv Detail & Related papers (2024-05-30T09:03:23Z)
- Text Region Multiple Information Perception Network for Scene Text Detection [19.574306663095243]
This paper proposes a plug-and-play module called the Region Multiple Information Perception Module (RMIPM) to enhance the detection performance of segmentation-based algorithms.
Specifically, we design an improved module that can perceive various types of information about scene text regions, such as text foreground classification maps, distance maps, direction maps, etc.
arXiv Detail & Related papers (2024-01-18T14:36:51Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- SpaText: Spatio-Textual Representation for Controllable Image Generation [61.89548017729586]
SpaText is a new method for text-to-image generation using open-vocabulary scene control.
In addition to a global text prompt that describes the entire scene, the user provides a segmentation map.
We show its effectiveness on two state-of-the-art diffusion models: pixel-based and latent-conditional-based.
arXiv Detail & Related papers (2022-11-25T18:59:10Z)
- CORE-Text: Improving Scene Text Detection with Contrastive Relational Reasoning [65.57338873921168]
Localizing text instances in natural scenes is regarded as a fundamental challenge in computer vision.
In this work, we quantitatively analyze the sub-text problem and present a simple yet effective design, COntrastive RElation (CORE) module.
We integrate the CORE module into a two-stage text detector of Mask R-CNN and devise our text detector CORE-Text.
arXiv Detail & Related papers (2021-12-14T16:22:25Z)
- RayNet: Real-time Scene Arbitrary-shape Text Detection with Multiple Rays [84.15123599963239]
We propose a novel detection framework for arbitrary-shape text detection, termed RayNet.
RayNet uses a Center Point Set (CPS) and Ray Distance (RD) to fit text: the CPS determines the general position of the text, and the RD is combined with the CPS to compute Ray Points (RP) that localize the text's exact shape.
RayNet achieves impressive performance on an existing curved text dataset (CTW1500) and a quadrangle text dataset (ICDAR2015).
arXiv Detail & Related papers (2021-04-11T03:03:23Z)
- ContourNet: Taking a Further Step toward Accurate Arbitrary-shaped Scene Text Detection [147.10751375922035]
We propose ContourNet, which effectively handles false positives and large scale variance of scene texts.
Our method effectively suppresses these false positives by only outputting predictions with high response value in both directions.
arXiv Detail & Related papers (2020-04-10T08:15:23Z)
- Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting [49.768327669098674]
We propose an end-to-end trainable text spotting approach named Text Perceptron.
It first employs an efficient segmentation-based text detector that learns the latent text reading order and boundary information.
Then a novel Shape Transform Module (abbr. STM) is designed to transform the detected feature regions into regular morphologies.
arXiv Detail & Related papers (2020-02-17T08:07:19Z)