A Survey on Graph Neural Networks and Graph Transformers in Computer
Vision: A Task-Oriented Perspective
- URL: http://arxiv.org/abs/2209.13232v1
- Date: Tue, 27 Sep 2022 08:10:14 GMT
- Title: A Survey on Graph Neural Networks and Graph Transformers in Computer
Vision: A Task-Oriented Perspective
- Authors: Chaoqi Chen, Yushuang Wu, Qiyuan Dai, Hong-Yu Zhou, Mutian Xu, Sibei
Yang, Xiaoguang Han, Yizhou Yu
- Abstract summary: Graph Neural Networks (GNNs) have gained momentum in graph representation learning.
Graph Transformers embed a graph structure into the Transformer architecture to overcome the limitations of local neighborhood aggregation.
This paper presents a comprehensive review of GNNs and graph Transformers in computer vision from a task-oriented perspective.
- Score: 62.30794059878963
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Graph Neural Networks (GNNs) have gained momentum in graph representation
learning and boosted the state of the art in a variety of areas, such as data
mining (\emph{e.g.,} social network analysis and recommender systems), computer
vision (\emph{e.g.,} object detection and point cloud learning), and natural
language processing (\emph{e.g.,} relation extraction and sequence learning),
to name a few. With the emergence of Transformers in natural language
processing and computer vision, graph Transformers embed a graph structure into
the Transformer architecture to overcome the limitations of local neighborhood
aggregation while avoiding strict structural inductive biases. In this paper,
we present a comprehensive review of GNNs and graph Transformers in computer
vision from a task-oriented perspective. Specifically, we divide their
applications in computer vision into five categories according to the modality
of input data, \emph{i.e.,} 2D natural images, videos, 3D data, vision +
language, and medical images. In each category, we further divide the
applications according to a set of vision tasks. Such a task-oriented taxonomy
allows us to examine how each task is tackled by different GNN-based approaches
and how well these approaches perform. Based on the necessary preliminaries, we
provide the definitions and challenges of the tasks, in-depth coverage of the
representative approaches, as well as discussions regarding insights,
limitations, and future directions.
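As an illustration of the contrast the abstract draws (not code from the survey itself), the following minimal NumPy sketch compares local neighborhood aggregation, where each node only mixes features with its graph neighbors, against the global self-attention used by graph Transformers, where every node attends to every other node. The layers, graph, and features are hypothetical toy examples; real graph Transformers additionally inject structural encodings, which are omitted here.

```python
import numpy as np

def gnn_layer(X, A):
    """One mean-aggregation message-passing layer (GCN-style):
    each node averages features over its local neighborhood,
    self-loop included. A shared linear map would normally follow."""
    A_hat = A + np.eye(A.shape[0])              # add self-loops
    deg_inv = 1.0 / A_hat.sum(axis=1, keepdims=True)
    return deg_inv * (A_hat @ X)                # mean over neighbors

def graph_transformer_layer(X):
    """Global self-attention over all nodes: every node attends to
    every other node, bypassing the local-neighborhood restriction."""
    scores = X @ X.T / np.sqrt(X.shape[1])      # scaled dot-product
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)     # row-wise softmax
    return attn @ X

# Toy 3-node path graph: 0 - 1 - 2
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])

H_local = gnn_layer(X, A)               # node 0 only sees nodes 0 and 1
H_global = graph_transformer_layer(X)   # node 0 attends to node 2 directly
```

In one local layer, node 0's new feature is the mean of nodes 0 and 1 only; information from node 2 would need a second layer to reach it, whereas the attention layer gives node 0 a direct, nonzero weight on every node in a single step.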
Related papers
- Graph Transformers: A Survey [15.68583521879617]
Graph transformers are a recent advancement in machine learning, offering a new class of neural network models for graph-structured data.
This survey provides an in-depth review of recent progress and challenges in graph transformer research.
arXiv Detail & Related papers (2024-07-13T05:15:24Z)
- A Survey on Structure-Preserving Graph Transformers [2.5252594834159643]
We provide a comprehensive overview of structure-preserving graph transformers and generalize these methods from the perspective of their design objective.
We also discuss challenges and future directions for graph transformer models to preserve the graph structure and understand the nature of graphs.
arXiv Detail & Related papers (2024-01-29T14:18:09Z)
- Graph Neural Networks in Vision-Language Image Understanding: A Survey [6.813036707969848]
2D image understanding is a complex problem within computer vision.
It holds the key to providing human-level scene comprehension.
In recent years graph neural networks (GNNs) have become a standard component of many 2D image understanding pipelines.
arXiv Detail & Related papers (2023-03-07T09:56:23Z)
- SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning [61.57887011165744]
Multimodal Transformers have made great progress in the task of Visual Commonsense Reasoning.
We propose a Scene Graph Enhanced Image-Text Learning framework to incorporate visual scene graphs in commonsense reasoning.
arXiv Detail & Related papers (2021-12-16T03:16:30Z)
- Graph Neural Networks: Methods, Applications, and Opportunities [1.2183405753834562]
This article provides a comprehensive survey of graph neural networks (GNNs) in each learning setting.
The approaches for each learning task are analyzed from both theoretical as well as empirical standpoints.
Various applications and benchmark datasets are also provided, along with open challenges still plaguing the general applicability of GNNs.
arXiv Detail & Related papers (2021-08-24T13:46:19Z)
- Transformers in Vision: A Survey [101.07348618962111]
Transformers enable modeling long dependencies between input sequence elements and support parallel processing of sequences.
Transformers require minimal inductive biases for their design and are naturally suited as set-functions.
This survey aims to provide a comprehensive overview of the Transformer models in the computer vision discipline.
arXiv Detail & Related papers (2021-01-04T18:57:24Z)
- Learning Physical Graph Representations from Visual Scenes [56.7938395379406]
Physical Scene Graphs (PSGs) represent scenes as hierarchical graphs with nodes corresponding intuitively to object parts at different scales, and edges to physical connections between parts.
PSGNet augments standard CNNs with recurrent feedback connections that combine low- and high-level image information, and with graph pooling and vectorization operations that convert spatially uniform feature maps into object-centric graph structures.
We show that PSGNet outperforms alternative self-supervised scene representation algorithms at scene segmentation tasks.
arXiv Detail & Related papers (2020-06-22T16:10:26Z)
- GCC: Graph Contrastive Coding for Graph Neural Network Pre-Training [62.73470368851127]
Graph representation learning has emerged as a powerful technique for addressing real-world problems.
We design Graph Contrastive Coding -- a self-supervised graph neural network pre-training framework.
We conduct experiments on three graph learning tasks and ten graph datasets.
arXiv Detail & Related papers (2020-06-17T16:18:35Z)
- Graph Representation Learning via Graphical Mutual Information Maximization [86.32278001019854]
We propose a novel concept, Graphical Mutual Information (GMI), to measure the correlation between input graphs and high-level hidden representations.
We develop an unsupervised learning model trained by maximizing GMI between the input and output of a graph neural encoder.
arXiv Detail & Related papers (2020-02-04T08:33:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.