PVG: Progressive Vision Graph for Vision Recognition
- URL: http://arxiv.org/abs/2308.00574v1
- Date: Tue, 1 Aug 2023 14:35:29 GMT
- Title: PVG: Progressive Vision Graph for Vision Recognition
- Authors: Jiafu Wu, Jian Li, Jiangning Zhang, Boshen Zhang, Mingmin Chi, Yabiao Wang, Chengjie Wang
- Abstract summary: We propose a Progressive Vision Graph (PVG) architecture for vision recognition tasks.
PVG contains three main components: 1) Progressively Separated Graph Construction (PSGC), 2) a neighbor-node information aggregation and update module, and 3) the Graph error Linear Unit (GraphLU).
- Score: 25.752613030302534
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Convolution-based and Transformer-based vision backbone networks process
images into grid or sequence structures, respectively, which are inflexible
for capturing irregular objects. Although Vision GNN (ViG) adopts graph-level
features for complex images, it has several issues: inaccurate neighbor
node selection, computationally expensive node information aggregation, and
over-smoothing in deep layers. To address these problems, we propose a
Progressive Vision Graph (PVG) architecture for vision recognition tasks.
Unlike previous works, PVG contains three main components: 1)
Progressively Separated Graph Construction (PSGC), which introduces second-order
similarity by gradually increasing the channels of the global graph branch and
decreasing the channels of the local branch as the layers deepen; 2) a neighbor-node
information aggregation and update module that uses Max pooling and mathematical
Expectation (MaxE) to aggregate rich neighbor information; and 3) the Graph error
Linear Unit (GraphLU), which enhances low-value information in a relaxed form to
reduce the compression of image detail information and thereby alleviate
over-smoothing. Extensive experiments on mainstream benchmarks demonstrate the
superiority of PVG over state-of-the-art methods. For example, our PVG-S obtains 83.0%
Top-1 accuracy on ImageNet-1K, surpassing the GNN-based ViG-S by +0.9 with
18.5% fewer parameters, while the largest model, PVG-B, obtains 84.2%, a
+0.5 improvement over ViG-B. Furthermore, our PVG-S obtains +1.3 box AP and
+0.4 mask AP gains over ViG-S on the COCO dataset.
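The abstract contrasts ViG-style neighbor selection with a MaxE module that aggregates neighbor information using max pooling and an expectation (mean). The plain-Python sketch below illustrates both ideas under stated assumptions: neighbors are picked by brute-force Euclidean k-nearest-neighbor search over node features (as in ViG), and MaxE is read as concatenating each node feature x_i with the element-wise max and the element-wise mean of the relative features x_j - x_i. That MaxE formula and the helper names `knn_neighbors` and `maxe_aggregate` are hypothetical, not the authors' code; PSGC's global/local channel split and GraphLU are not modeled.

```python
import math


def knn_neighbors(features, k):
    """Return, for each node, the indices of its k nearest other nodes.

    Brute-force Euclidean search over node feature vectors (O(N^2) distances),
    in the spirit of ViG's k-nearest-neighbor graph construction.
    """
    neighbors = []
    for i, x_i in enumerate(features):
        # (distance, index) pairs sort by distance, then by index for ties.
        dists = sorted(
            (math.dist(x_i, x_j), j) for j, x_j in enumerate(features) if j != i
        )
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


def maxe_aggregate(features, neighbors):
    """Hypothetical MaxE-style aggregation (assumed form, not the paper's code).

    For node i with neighbors N(i), output concat(x_i, max_j(x_j - x_i),
    mean_j(x_j - x_i)), with max and mean taken element-wise over N(i).
    Assumes every node has at least one neighbor.
    """
    out = []
    for i, nbrs in enumerate(neighbors):
        x_i = features[i]
        # Relative features x_j - x_i for each neighbor j.
        rel = [[a - b for a, b in zip(features[j], x_i)] for j in nbrs]
        rel_max = [max(col) for col in zip(*rel)]              # max pooling
        rel_mean = [sum(col) / len(col) for col in zip(*rel)]  # expectation
        out.append(list(x_i) + rel_max + rel_mean)
    return out


if __name__ == "__main__":
    feats = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]]
    nbrs = knn_neighbors(feats, k=2)
    print(nbrs)                            # [[1, 2], [0, 2], [0, 1], [2, 1]]
    print(maxe_aggregate(feats, nbrs)[0])  # [0.0, 0.0, 1.0, 2.0, 0.5, 1.0]
```

With C-dimensional node features, each aggregated row has 3C values. The quadratic distance loop is the cost that motivates cheaper graph construction schemes such as the one in the GreedyViG entry below.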
Related papers
- GreedyViG: Dynamic Axial Graph Construction for Efficient Vision GNNs [5.895049552752008]
Vision graph neural networks (ViG) offer a new avenue for exploration in computer vision.
A major bottleneck in ViGs is the inefficient k-nearest neighbor (KNN) operation used for graph construction.
We propose a new method for designing ViGs, Dynamic Axial Graph Construction (DAGC), which is more efficient than KNN.
We also propose a novel CNN-GNN architecture, GreedyViG, which uses DAGC.
Date: 2024-05-10T23:21:16Z
- Graph Transformers for Large Graphs [57.19338459218758]
This work advances representation learning on single large-scale graphs with a focus on identifying model characteristics and critical design constraints.
A key innovation of this work lies in the creation of a fast neighborhood sampling technique coupled with a local attention mechanism.
We report a 3x speedup and a 16.8% performance gain on ogbn-products and snap-patents, and we also scale LargeGT to ogbn-100M with a 5.9% performance improvement.
Date: 2023-12-18T11:19:23Z
- GKGNet: Group K-Nearest Neighbor based Graph Convolutional Network for Multi-Label Image Recognition [37.02054260449195]
Multi-Label Image Recognition (MLIR) is a challenging task that aims to predict multiple object labels in a single image.
We present the first fully graph convolutional model, the Group K-Nearest Neighbor based Graph Convolutional Network (GKGNet).
Our experiments demonstrate that GKGNet achieves state-of-the-art performance with significantly lower computational costs.
Date: 2023-08-28T07:50:04Z
- Global Context Vision Transformers [78.5346173956383]
We propose global context vision transformer (GC ViT), a novel architecture that enhances parameter and compute utilization for computer vision.
We address the lack of inductive bias in ViTs and propose leveraging modified fused inverted residual blocks in our architecture.
Our proposed GC ViT achieves state-of-the-art results across image classification, object detection and semantic segmentation tasks.
Date: 2022-06-20T18:42:44Z
- Vision GNN: An Image is Worth Graph of Nodes [49.3335689216822]
We propose to represent the image as a graph structure and introduce a new Vision GNN (ViG) architecture to extract graph-level features for visual tasks.
Based on the graph representation of images, we build our ViG model to transform and exchange information among all the nodes.
Extensive experiments on image recognition and object detection tasks demonstrate the superiority of our ViG architecture.
Date: 2022-06-01T07:01:04Z
- Exploiting Neighbor Effect: Conv-Agnostic GNNs Framework for Graphs with Heterophily [58.76759997223951]
We propose a new metric based on von Neumann entropy to re-examine the heterophily problem of GNNs.
We also propose a Conv-Agnostic GNN framework (CAGNNs) to enhance the performance of most GNNs on heterophily datasets.
Date: 2022-03-19T14:26:43Z
- SoGCN: Second-Order Graph Convolutional Networks [20.840026487716404]
We show that multi-layer second-order graph convolution (SoGC) is sufficient to express spectral filters with arbitrary coefficients.
We build our Second-Order Graph Convolutional Networks (SoGCN) with SoGC and design a synthetic dataset to verify its filter fitting capability.
Date: 2021-10-14T03:56:34Z
- Semi-supervised Hyperspectral Image Classification with Graph Clustering Convolutional Networks [41.78245271989529]
We propose a graph convolution network (GCN) based framework for HSI classification.
In particular, we first cluster the pixels with similar spectral features into a superpixel and build the graph based on the superpixels of the input HSI.
We then partition it into several sub-graphs by pruning the edges with weak weights, so as to strengthen the correlations of nodes with high similarity.
Date: 2020-12-20T14:16:59Z
- Scaling Graph Neural Networks with Approximate PageRank [64.92311737049054]
We present the PPRGo model which utilizes an efficient approximation of information diffusion in GNNs.
In addition to being faster, PPRGo is inherently scalable, and can be trivially parallelized for large datasets like those found in industry settings.
We show that training PPRGo and predicting labels for all nodes in this graph takes under 2 minutes on a single machine, far outpacing other baselines on the same graph.
Date: 2020-07-03T09:30:07Z
- GPS-Net: Graph Property Sensing Network for Scene Graph Generation [91.60326359082408]
Scene graph generation (SGG) aims to detect objects in an image along with their pairwise relationships.
GPS-Net fully explores three properties for SGG: edge direction information, the difference in priority between nodes, and the long-tailed distribution of relationships.
GPS-Net achieves state-of-the-art performance on three popular databases: VG, OI, and VRD by significant gains under various settings and metrics.
Date: 2020-03-29T07:22:31Z
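The PPRGo entry relies on an efficient approximation of information diffusion based on approximate PageRank. As a hedged illustration of what approximating personalized PageRank (PPR) usually involves, the sketch below implements the generic forward-push approximation in plain Python. It is not PPRGo's implementation, and the function name `approx_ppr`, the adjacency-dict graph format, and the defaults `alpha=0.15` and `eps=1e-4` are assumptions chosen for illustration.

```python
from collections import deque


def approx_ppr(adj, source, alpha=0.15, eps=1e-4):
    """Approximate personalized PageRank from `source` by forward push.

    adj:   dict mapping each node to a list of neighbors (undirected graph,
           no isolated nodes and no self-loops assumed).
    alpha: teleport (restart) probability.
    eps:   push threshold; a node is pushed while residual >= eps * degree.
    Returns (p, r): approximate PPR scores and leftover residuals.
    """
    p = {}               # settled PPR mass per node
    r = {source: 1.0}    # residual mass still to be distributed
    queue = deque([source])
    while queue:
        u = queue.popleft()
        deg_u = len(adj[u])
        r_u = r.get(u, 0.0)
        if r_u < eps * deg_u:
            continue  # below threshold; nothing to push
        # Keep an alpha fraction at u and spread the rest evenly over neighbors.
        p[u] = p.get(u, 0.0) + alpha * r_u
        r[u] = 0.0
        share = (1.0 - alpha) * r_u / deg_u
        for v in adj[u]:
            old = r.get(v, 0.0)
            r[v] = old + share
            # Enqueue v only when its residual first crosses the threshold.
            if old < eps * len(adj[v]) <= r[v]:
                queue.append(v)
    return p, r


if __name__ == "__main__":
    adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
    p, r = approx_ppr(adj, source=0, eps=1e-6)
    print(sorted(p.items()))
```

Each push moves an alpha fraction of a node's residual into `p` and the rest to its neighbors, so `sum(p) + sum(r)` stays equal to 1. Lowering `eps` trades more pushes for a closer approximation, and only nodes actually reached by pushes appear in `p`, which keeps the computation local to the source.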
This list is automatically generated from the titles and abstracts of the papers on this site.