StrokeNet: Stroke Assisted and Hierarchical Graph Reasoning Networks
- URL: http://arxiv.org/abs/2111.11718v1
- Date: Tue, 23 Nov 2021 08:26:42 GMT
- Title: StrokeNet: Stroke Assisted and Hierarchical Graph Reasoning Networks
- Authors: Lei Li, Kai Fan and Chun Yuan
- Abstract summary: StrokeNet is proposed to effectively detect the texts by capturing the fine-grained strokes.
Different from existing approaches that represent the text area by a series of points or rectangular boxes, we directly localize strokes of each text instance.
- Score: 31.76016966100244
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scene text detection is still a challenging task, as there may be extremely
small or low-resolution strokes, and close or arbitrary-shaped texts. In this
paper, StrokeNet is proposed to effectively detect text by capturing
fine-grained strokes and to infer structural relations between the hierarchical
representations in a graph. Different from existing approaches that represent
the text area by a series of points or rectangular boxes, we directly localize
the strokes of each text instance through a Stroke Assisted Prediction Network
(SAPN). In addition, a Hierarchical Relation Graph Network (HRGN) is adopted to
perform relational reasoning and predict the likelihood of linkages,
effectively splitting close text instances and grouping node classification
results into arbitrary-shaped text regions. We introduce a novel dataset with
stroke-level annotations, namely SynthStroke, for offline pre-training of our
model. Experiments on wide-ranging benchmarks verify the state-of-the-art
performance of our method. Our dataset and code will be available.
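The grouping step mentioned in the abstract (predicting linkage likelihoods, then merging linked nodes into text regions) can be illustrated with a minimal sketch. This is not the authors' HRGN. It assumes the link likelihoods have already been predicted and are given as a dictionary, and it uses a generic union-find pass to merge nodes. The function name `group_by_links` and the 0.5 threshold are hypothetical choices made for illustration only.

```python
from collections import defaultdict


def group_by_links(num_nodes, link_scores, threshold=0.5):
    """Group nodes into instances from pairwise link likelihoods.

    link_scores maps (node_a, node_b) -> likelihood in [0, 1].
    Nodes connected by a chain of links scoring >= threshold share a group.
    NOTE: illustrative assumption, not the method from the paper.
    """
    parent = list(range(num_nodes))

    def find(i):
        # Walk up to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for (a, b), score in link_scores.items():
        if score >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = defaultdict(list)
    for node in range(num_nodes):
        groups[find(node)].append(node)
    return sorted(groups.values())


# Five nodes; the weak link between 2 and 3 splits two close instances.
links = {(0, 1): 0.9, (1, 2): 0.8, (2, 3): 0.1, (3, 4): 0.7}
print(group_by_links(5, links))  # [[0, 1, 2], [3, 4]]
```

In this toy example, the low-likelihood link between nodes 2 and 3 keeps two adjacent instances apart, which is the effect the abstract attributes to linkage prediction.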
Related papers
- Pretraining Language Models with Text-Attributed Heterogeneous Graphs [28.579509154284448]
We present a new pretraining framework for Language Models (LMs) that explicitly considers the topological and heterogeneous information in Text-Attributed Heterogeneous Graphs (TAHGs).
We propose a topology-aware pretraining task to predict nodes involved in the context graph by jointly optimizing an LM and an auxiliary heterogeneous graph neural network.
We conduct link prediction and node classification tasks on three datasets from various domains.
arXiv Detail & Related papers (2023-10-19T08:41:21Z) - ConGraT: Self-Supervised Contrastive Pretraining for Joint Graph and Text Embeddings [20.25180279903009]
We propose Contrastive Graph-Text pretraining (ConGraT) for jointly learning separate representations of texts and nodes in a text-attributed graph (TAG).
Our method trains a language model (LM) and a graph neural network (GNN) to align their representations in a common latent space using a batch-wise contrastive learning objective inspired by CLIP.
Experiments demonstrate that ConGraT outperforms baselines on various downstream tasks, including node and text category classification, link prediction, and language modeling.
arXiv Detail & Related papers (2023-05-23T17:53:30Z) - Text Reading Order in Uncontrolled Conditions by Sparse Graph Segmentation [71.40119152422295]
We propose a lightweight, scalable and generalizable approach to identify text reading order.
The model is language-agnostic and runs effectively across multi-language datasets.
It is small enough to be deployed on virtually any platform including mobile devices.
arXiv Detail & Related papers (2023-05-04T06:21:00Z) - SpaText: Spatio-Textual Representation for Controllable Image Generation [61.89548017729586]
SpaText is a new method for text-to-image generation using open-vocabulary scene control.
In addition to a global text prompt that describes the entire scene, the user provides a segmentation map.
We show its effectiveness on two state-of-the-art diffusion models: pixel-based and latent-conditional-based.
arXiv Detail & Related papers (2022-11-25T18:59:10Z) - A Robust Stacking Framework for Training Deep Graph Models with Multifaceted Node Features [61.92791503017341]
Graph Neural Networks (GNNs) with numerical node features and graph structure as inputs have demonstrated superior performance on various supervised learning tasks with graph data.
The best models for such data types in most standard supervised learning settings with IID (non-graph) data are not easily incorporated into a GNN.
Here we propose a robust stacking framework that fuses graph-aware propagation with arbitrary models intended for IID data.
arXiv Detail & Related papers (2022-06-16T22:46:33Z) - Hierarchical Heterogeneous Graph Representation Learning for Short Text Classification [60.233529926965836]
We propose a new method called SHINE, which is based on a graph neural network (GNN), for short text classification.
First, we model the short text dataset as a hierarchical heterogeneous graph consisting of word-level component graphs.
Then, we dynamically learn a short document graph that facilitates effective label propagation among similar short texts.
arXiv Detail & Related papers (2021-10-30T05:33:05Z) - Towards Efficient Scene Understanding via Squeeze Reasoning [71.1139549949694]
We propose a novel framework called Squeeze Reasoning.
Instead of propagating information on the spatial map, we first learn to squeeze the input feature into a channel-wise global vector.
We show that our approach can be modularized as an end-to-end trained block and can be easily plugged into existing networks.
arXiv Detail & Related papers (2020-11-06T12:17:01Z) - Deep Relational Reasoning Graph Network for Arbitrary Shape Text Detection [20.244378408779554]
We propose a novel unified relational reasoning graph network for arbitrary shape text detection.
An innovative local graph bridges a text proposal model via CNN and a deep relational reasoning network via Graph Convolutional Network (GCN).
Experiments on publicly available datasets demonstrate the state-of-the-art performance of our method.
arXiv Detail & Related papers (2020-03-17T01:50:07Z) - ReLaText: Exploiting Visual Relationships for Arbitrary-Shaped Scene Text Detection with Graph Convolutional Networks [6.533254660400229]
We introduce a new arbitrary-shaped text detection approach named ReLaText.
To demonstrate the effectiveness of this new formulation, we start by using a "link" relationship to address the challenging text-line grouping problem.
Our GCN based text-line grouping approach can achieve better text detection accuracy than previous text-line grouping methods.
arXiv Detail & Related papers (2020-03-16T03:33:48Z) - Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning [72.52804406378023]
Cross-modal retrieval between videos and texts has attracted growing attention due to the rapid emergence of videos on the web.
To improve fine-grained video-text retrieval, we propose a Hierarchical Graph Reasoning model, which decomposes video-text matching into global-to-local levels.
arXiv Detail & Related papers (2020-03-01T03:44:19Z) - PuzzleNet: Scene Text Detection by Segment Context Graph Learning [9.701699882807251]
We propose a novel decomposition-based method, termed Puzzle Networks (PuzzleNet), to address the challenging scene text detection task.
By building segments as context graphs, MSGCN effectively employs segment context to predict combinations of segments.
Our method achieves performance better than or comparable to current state-of-the-art methods, benefiting from the exploitation of the segment context graph.
arXiv Detail & Related papers (2020-02-26T09:21:05Z)