PuzzleNet: Scene Text Detection by Segment Context Graph Learning
- URL: http://arxiv.org/abs/2002.11371v1
- Date: Wed, 26 Feb 2020 09:21:05 GMT
- Title: PuzzleNet: Scene Text Detection by Segment Context Graph Learning
- Authors: Hao Liu, Antai Guo, Deqiang Jiang, Yiqing Hu, Bo Ren
- Abstract summary: We propose a novel decomposition-based method, termed Puzzle Networks (PuzzleNet), to address the challenging scene text detection task.
By building segments as context graphs, MSGCN effectively employs segment context to predict combinations of segments.
- Our method achieves performance better than or comparable to current state-of-the-art methods, benefiting from the exploitation of the segment context graph.
- Score: 9.701699882807251
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, a series of decomposition-based scene text detection methods has
achieved impressive progress by decomposing challenging text regions into
pieces and linking them in a bottom-up manner. However, most of them merely
focus on linking independent text pieces while the context information is
underestimated. In a puzzle game, the solver often puts pieces together in a
logical way according to the contextual information of each piece, in order to
arrive at the correct solution. Inspired by this, we propose a novel
decomposition-based method, termed Puzzle Networks (PuzzleNet), to address the
challenging scene text detection task in this work. PuzzleNet consists of the
Segment Proposal Network (SPN) that predicts the candidate text segments
fitting arbitrary shape of text region, and the two-branch Multiple-Similarity
Graph Convolutional Network (MSGCN) that models both appearance and geometry
correlations between each segment to its contextual ones. By building segments
as context graphs, MSGCN effectively employs segment context to predict
combinations of segments. Final detections of polygon shape are produced by
merging segments according to the predicted combinations. Evaluations on three
benchmark datasets, ICDAR15, MSRA-TD500 and SCUT-CTW1500, have demonstrated
that our method achieves performance better than or comparable to current
state-of-the-art methods, benefiting from the exploitation of the segment
context graph.
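As a rough, hypothetical sketch of the segment-context idea described above (not the authors' implementation: the k-nearest-neighbour graph construction, the cosine/inverse-distance similarities, the equal-weight edge fusion, and all names below are assumptions), a context graph can connect each candidate segment to its nearest segments, with each edge carrying an appearance similarity and a geometry similarity before a plain GCN layer propagates context:

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def build_context_graph(centers, feats, k=3):
    """Connect each segment to its k nearest segments (by center distance).

    Each edge carries an appearance similarity (cosine of segment features)
    and a geometry similarity (inverse center distance) -- stand-ins for the
    multiple similarities modeled by the two MSGCN branches.
    """
    n = len(centers)
    edges = {}
    for i in range(n):
        d = np.linalg.norm(centers - centers[i], axis=1)
        nbrs = np.argsort(d)[1:k + 1]            # skip self (distance 0)
        for j in nbrs:
            app = cosine(feats[i], feats[j])      # appearance branch
            geo = 1.0 / (1.0 + d[j])              # geometry branch
            edges[(i, int(j))] = (app, geo)
    return edges

def gcn_layer(adj, x, w):
    """One plain GCN propagation step with row-normalised adjacency."""
    a = adj + np.eye(len(adj))                    # add self-loops
    a /= a.sum(axis=1, keepdims=True)
    return np.maximum(a @ x @ w, 0.0)             # ReLU

# toy run: 4 candidate segments with 2-D centers and 8-D features
rng = np.random.default_rng(0)
centers = rng.random((4, 2))
feats = rng.random((4, 8))
edges = build_context_graph(centers, feats, k=2)
adj = np.zeros((4, 4))
for (i, j), (app, geo) in edges.items():
    adj[i, j] = 0.5 * app + 0.5 * geo             # fused edge weight (assumed)
h = gcn_layer(adj, feats, rng.random((8, 4)))
```

In the paper, the GCN output would score segment combinations, and segments in the same predicted combination would be merged into a polygon detection; this sketch stops at the context-propagation step.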
Related papers
- TextPSG: Panoptic Scene Graph Generation from Textual Descriptions [78.1140391134517]
We study a new problem of Panoptic Scene Graph Generation from Purely Textual Descriptions (Caption-to-PSG).
The key idea is to leverage the large collection of free image-caption data on the Web alone to generate panoptic scene graphs.
We propose a new framework TextPSG consisting of four modules, i.e., a region grouper, an entity grounder, a segment merger, and a label generator.
arXiv Detail & Related papers (2023-10-10T22:36:15Z)
- Self-supervised Scene Text Segmentation with Object-centric Layered Representations Augmented by Text Regions [22.090074821554754]
We propose a self-supervised scene text segmentation algorithm with layered decoupling of representations, derived in an object-centric manner, to segment images into text and background.
On several public scene text datasets, our method outperforms the state-of-the-art unsupervised segmentation algorithms.
arXiv Detail & Related papers (2023-08-25T05:00:05Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- Text Reading Order in Uncontrolled Conditions by Sparse Graph Segmentation [71.40119152422295]
We propose a lightweight, scalable and generalizable approach to identify text reading order.
The model is language-agnostic and runs effectively across multi-language datasets.
It is small enough to be deployed on virtually any platform including mobile devices.
arXiv Detail & Related papers (2023-05-04T06:21:00Z)
- Towards End-to-End Unified Scene Text Detection and Layout Analysis [60.68100769639923]
We introduce the task of unified scene text detection and layout analysis.
The first hierarchical scene text dataset is introduced to enable this novel research task.
We also propose a novel method that is able to simultaneously detect scene text and form text clusters in a unified way.
arXiv Detail & Related papers (2022-03-28T23:35:45Z)
- Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion [62.269219152425556]
Segmentation-based scene text detection methods have drawn extensive attention in the scene text detection field.
We propose a Differentiable Binarization (DB) module that integrates the binarization process into a segmentation network.
An efficient Adaptive Scale Fusion (ASF) module is proposed to improve the scale robustness by fusing features of different scales adaptively.
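The core of the DB module is replacing the hard binarization step with a smooth, sigmoid-like approximation so the threshold map can be trained jointly with the segmentation network. A minimal sketch of that formula (the amplifying factor k = 50 follows the DB paper; the surrounding toy maps are illustrative only):

```python
import numpy as np

def differentiable_binarization(prob_map, thresh_map, k=50.0):
    """Approximate binarization from the DB paper:
    B = 1 / (1 + exp(-k * (P - T))), where P is the probability map,
    T the learned threshold map, and k an amplifying factor (50 in the
    paper). Unlike a hard step function, this is differentiable, so
    gradients flow through the binarization during training.
    """
    return 1.0 / (1.0 + np.exp(-k * (prob_map - thresh_map)))

# toy 2x2 probability map against a uniform threshold of 0.5
p = np.array([[0.9, 0.3], [0.5, 0.1]])
t = np.full_like(p, 0.5)
b = differentiable_binarization(p, t)
```

Pixels well above the threshold saturate near 1, pixels well below saturate near 0, and a pixel exactly at the threshold maps to 0.5, mimicking a step function while staying smooth.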
arXiv Detail & Related papers (2022-02-21T15:30:14Z)
- StrokeNet: Stroke Assisted and Hierarchical Graph Reasoning Networks [31.76016966100244]
StrokeNet is proposed to effectively detect the texts by capturing the fine-grained strokes.
Different from existing approaches that represent the text area by a series of points or rectangular boxes, we directly localize strokes of each text instance.
arXiv Detail & Related papers (2021-11-23T08:26:42Z)
- Comprehensive Studies for Arbitrary-shape Scene Text Detection [78.50639779134944]
We propose a unified framework for bottom-up scene text detection methods.
Under the unified framework, we ensure the consistent settings for non-core modules.
Through comprehensive investigations and elaborate analyses, our study reveals the advantages and disadvantages of previous models.
arXiv Detail & Related papers (2021-07-25T13:18:55Z)
- Parts2Words: Learning Joint Embedding of Point Clouds and Texts by Bidirectional Matching between Parts and Words [32.47815081044594]
We propose to learn joint embedding of point clouds and texts by bidirectional matching between parts from shapes and words from texts.
Specifically, we first segment the point clouds into parts, and then leverage optimal transport method to match parts and words in an optimized feature space.
Experiments demonstrate that our method achieves a significant improvement in accuracy over the SOTAs on multi-modal retrieval tasks.
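The part-word matching step above can be illustrated with a generic entropy-regularised optimal transport solver (Sinkhorn iterations). This is a stand-in under assumed uniform marginals, not the authors' exact solver; the cost matrix, regularisation strength, and iteration count are all illustrative:

```python
import numpy as np

def sinkhorn(cost, n_iters=200, eps=0.5):
    """Entropy-regularised optimal transport via Sinkhorn iterations,
    with uniform marginals over parts and words."""
    n, m = cost.shape
    k = np.exp(-cost / eps)                  # Gibbs kernel
    a = np.full(n, 1.0 / n)                  # uniform mass over parts
    b = np.full(m, 1.0 / m)                  # uniform mass over words
    v = np.ones(m)
    for _ in range(n_iters):                 # alternating scaling updates
        u = a / (k @ v)
        v = b / (k.T @ u)
    return np.diag(u) @ k @ np.diag(v)       # dense transport plan

# toy example: 3 shape parts vs 4 caption words, cost = 1 - similarity
rng = np.random.default_rng(1)
cost = rng.random((3, 4))
plan = sinkhorn(cost)
```

The resulting plan assigns high mass to low-cost (high-similarity) part-word pairs while respecting both marginals, which is the mechanism a matching-based retrieval objective can exploit.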
arXiv Detail & Related papers (2021-07-05T08:55:34Z)
- All You Need is a Second Look: Towards Arbitrary-Shaped Text Detection [39.17648241471479]
In this paper, we propose a two-stage segmentation-based detector, termed as NASK (Need A Second looK), for arbitrary-shaped text detection.
arXiv Detail & Related papers (2021-06-24T01:44:10Z)
- Deep Relational Reasoning Graph Network for Arbitrary Shape Text Detection [20.244378408779554]
We propose a novel unified relational reasoning graph network for arbitrary shape text detection.
An innovative local graph bridges a CNN-based text proposal model and a deep relational reasoning network based on a Graph Convolutional Network (GCN).
Experiments on public available datasets demonstrate the state-of-the-art performance of our method.
arXiv Detail & Related papers (2020-03-17T01:50:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.