ReLaText: Exploiting Visual Relationships for Arbitrary-Shaped Scene
Text Detection with Graph Convolutional Networks
- URL: http://arxiv.org/abs/2003.06999v1
- Date: Mon, 16 Mar 2020 03:33:48 GMT
- Title: ReLaText: Exploiting Visual Relationships for Arbitrary-Shaped Scene
Text Detection with Graph Convolutional Networks
- Authors: Chixiang Ma, Lei Sun, Zhuoyao Zhong, Qiang Huo
- Abstract summary: We introduce a new arbitrary-shaped text detection approach named ReLaText.
To demonstrate the effectiveness of this new formulation, we start by using a "link" relationship to address the challenging text-line grouping problem.
Our GCN-based text-line grouping approach achieves better text detection accuracy than previous text-line grouping methods.
- Score: 6.533254660400229
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a new arbitrary-shaped text detection approach named ReLaText by
formulating text detection as a visual relationship detection problem. To
demonstrate the effectiveness of this new formulation, we first use a "link"
relationship to address the challenging text-line grouping problem. The key
idea is to decompose text detection into two subproblems,
namely detection of text primitives and prediction of link relationships
between nearby text primitive pairs. Specifically, an anchor-free region
proposal network based text detector is first used to detect text primitives of
different scales from different feature maps of a feature pyramid network, from
which a text primitive graph is constructed by linking each pair of nearby text
primitives detected from the same feature map with an edge. Then, a Graph
Convolutional Network (GCN) based link relationship prediction module is used
to prune wrongly-linked edges in the text primitive graph to generate a number
of disjoint subgraphs, each representing a detected text instance. Because the
GCN can effectively leverage context information to improve link prediction
accuracy, our GCN-based text-line grouping approach achieves better text
detection accuracy than previous text-line grouping methods, especially when
dealing with text instances with large inter-character spacings or very small
inter-line spacings.
Consequently, the proposed ReLaText achieves state-of-the-art performance on
five public text detection benchmarks, namely RCTW-17, MSRA-TD500, Total-Text,
CTW1500 and DAST1500.
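To make the decomposition described above concrete, the following is a minimal, illustrative sketch of the grouping stage (not the authors' implementation): primitives detected on the same feature-pyramid level are linked into a candidate graph, a small GCN-style module scores each candidate link, weak links are pruned, and the remaining connected components are read out as text instances. All names (TextPrimitive, LinkGCN, group_primitives), the neighborhood radius, and the 0.5 pruning threshold are assumptions chosen for illustration only.

```python
# Hypothetical sketch of the text-primitive-graph grouping step; names and
# hyperparameters are illustrative, not taken from the ReLaText code.
from dataclasses import dataclass
from itertools import combinations
import torch
import torch.nn as nn


@dataclass
class TextPrimitive:
    center: tuple          # (x, y) center of the detected primitive box
    level: int             # FPN level the primitive was detected on
    feature: torch.Tensor  # node feature vector (e.g. pooled RoI feature)


class LinkGCN(nn.Module):
    """Minimal graph-convolution link predictor: two rounds of neighbor
    aggregation followed by an MLP that scores each candidate edge."""

    def __init__(self, dim: int):
        super().__init__()
        self.gcn1 = nn.Linear(dim, dim)
        self.gcn2 = nn.Linear(dim, dim)
        self.edge_mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, x: torch.Tensor, adj: torch.Tensor, edges: list) -> torch.Tensor:
        # Symmetrically normalized adjacency with self-loops (standard GCN propagation).
        a = adj + torch.eye(adj.size(0))
        d = a.sum(1).clamp(min=1.0).rsqrt()
        a_norm = d.unsqueeze(1) * a * d.unsqueeze(0)
        h = torch.relu(self.gcn1(a_norm @ x))
        h = torch.relu(self.gcn2(a_norm @ h))
        # Score each candidate edge from the embeddings of its two endpoints.
        pairs = torch.stack([torch.cat([h[i], h[j]]) for i, j in edges])
        return torch.sigmoid(self.edge_mlp(pairs)).squeeze(-1)


def group_primitives(prims: list, model: LinkGCN, radius: float = 64.0, thresh: float = 0.5):
    """Build the primitive graph, prune weak links, return connected components."""
    n = len(prims)
    # Candidate edges: nearby primitives detected on the same FPN level.
    edges = [(i, j) for i, j in combinations(range(n), 2)
             if prims[i].level == prims[j].level
             and (prims[i].center[0] - prims[j].center[0]) ** 2
             + (prims[i].center[1] - prims[j].center[1]) ** 2 <= radius ** 2]
    x = torch.stack([p.feature for p in prims])
    adj = torch.zeros(n, n)
    for i, j in edges:
        adj[i, j] = adj[j, i] = 1.0
    keep = edges
    if edges:
        scores = model(x, adj, edges)
        keep = [e for e, s in zip(edges, scores) if s.item() >= thresh]  # prune weak links
    # Connected components of the pruned graph = detected text instances.
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in keep:
        parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

In the paper, the link predictor is trained with supervision derived from ground-truth text-instance grouping; the radius-based neighborhood and fixed threshold above are stand-ins for whatever candidate-edge construction and pruning rule one actually trains and tunes.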
Related papers
- Leveraging Structure Knowledge and Deep Models for the Detection of Abnormal Handwritten Text [19.05500901000957]
We propose a two-stage detection algorithm that combines structure knowledge and deep models for handwritten text.
A shape regression network trained with a novel semi-supervised contrastive training strategy is introduced, and the positional relationships between characters are fully exploited.
Experiments on two handwritten text datasets show that the proposed method can greatly improve the detection performance.
arXiv Detail & Related papers (2024-10-15T14:57:10Z)
- Text Grouping Adapter: Adapting Pre-trained Text Detector for Layout Analysis [52.34110239735265]
We present Text Grouping Adapter (TGA), a module that can enable the utilization of various pre-trained text detectors to learn layout analysis.
Our comprehensive experiments demonstrate that, even with frozen pre-trained models, incorporating our TGA into various pre-trained text detectors and text spotters can achieve superior layout analysis performance.
arXiv Detail & Related papers (2024-05-13T05:48:35Z)
- SwinTextSpotter v2: Towards Better Synergy for Scene Text Spotting [126.01629300244001]
We propose a new end-to-end scene text spotting framework termed SwinTextSpotter v2.
We enhance the relationship between the two tasks using novel Recognition Conversion and Recognition Alignment modules.
SwinTextSpotter v2 achieves state-of-the-art performance on various multilingual (English, Chinese, and Vietnamese) benchmarks.
arXiv Detail & Related papers (2024-01-15T12:33:00Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition [73.61592015908353]
We propose a new end-to-end scene text spotting framework termed SwinTextSpotter.
Using a transformer with dynamic head as the detector, we unify the two tasks with a novel Recognition Conversion mechanism.
The design results in a concise framework that requires neither an additional rectification module nor character-level annotation.
arXiv Detail & Related papers (2022-03-19T01:14:42Z)
- CORE-Text: Improving Scene Text Detection with Contrastive Relational Reasoning [65.57338873921168]
Localizing text instances in natural scenes is regarded as a fundamental challenge in computer vision.
In this work, we quantitatively analyze the sub-text problem and present a simple yet effective design, COntrastive RElation (CORE) module.
We integrate the CORE module into a two-stage text detector of Mask R-CNN and devise our text detector CORE-Text.
arXiv Detail & Related papers (2021-12-14T16:22:25Z)
- StrokeNet: Stroke Assisted and Hierarchical Graph Reasoning Networks [31.76016966100244]
StrokeNet is proposed to detect text effectively by capturing fine-grained strokes.
Different from existing approaches that represent the text area by a series of points or rectangular boxes, we directly localize strokes of each text instance.
arXiv Detail & Related papers (2021-11-23T08:26:42Z)
- Bidirectional Regression for Arbitrary-Shaped Text Detection [16.30976392505236]
This paper presents a novel text instance expression which integrates both foreground and background information into the pipeline.
A corresponding post-processing algorithm is also designed to sequentially combine the four prediction results and reconstruct the text instance accurately.
We evaluate our method on several challenging scene text benchmarks, including both curved and multi-oriented text datasets.
arXiv Detail & Related papers (2021-07-13T14:29:09Z)
- Rethinking Text Segmentation: A Novel Dataset and A Text-Specific Refinement Approach [34.63444886780274]
Text segmentation is a prerequisite in real-world text-related tasks.
We introduce Text Refinement Network (TexRNet), a novel text segmentation approach.
TexRNet consistently improves text segmentation performance by nearly 2% compared to other state-of-the-art segmentation methods.
arXiv Detail & Related papers (2020-11-27T22:50:09Z)
- Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting [49.768327669098674]
We propose an end-to-end trainable text spotting approach named Text Perceptron.
It first employs an efficient segmentation-based text detector that learns the latent text reading order and boundary information.
Then a novel Shape Transform Module (abbr. STM) is designed to transform the detected feature regions into regular morphologies.
arXiv Detail & Related papers (2020-02-17T08:07:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.