Related papers: PVG: Progressive Vision Graph for Vision Recognition

PVG: Progressive Vision Graph for Vision Recognition

URL: http://arxiv.org/abs/2308.00574v1
Date: Tue, 1 Aug 2023 14:35:29 GMT
Title: PVG: Progressive Vision Graph for Vision Recognition
Authors: Jiafu Wu, Jian Li, Jiangning Zhang, Boshen Zhang, Mingmin Chi, Yabiao Wang, Chengjie Wang
Abstract summary: We propose a Progressive Vision Graph (PVG) architecture for vision recognition task. PVG contains three main components: 1) Progressively Separated Graph Construction (PSGC), 2) Neighbor nodes information aggregation and update module, and 3) Graph error Linear Unit (GraphLU)
Score: 25.752613030302534
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Convolution-based and Transformer-based vision backbone networks process images into the grid or sequence structures, respectively, which are inflexible for capturing irregular objects. Though Vision GNN (ViG) adopts graph-level features for complex images, it has some issues, such as inaccurate neighbor node selection, expensive node information aggregation calculation, and over-smoothing in the deep layers. To address the above problems, we propose a Progressive Vision Graph (PVG) architecture for vision recognition task. Compared with previous works, PVG contains three main components: 1) Progressively Separated Graph Construction (PSGC) to introduce second-order similarity by gradually increasing the channel of the global graph branch and decreasing the channel of local branch as the layer deepens; 2) Neighbor nodes information aggregation and update module by using Max pooling and mathematical Expectation (MaxE) to aggregate rich neighbor information; 3) Graph error Linear Unit (GraphLU) to enhance low-value information in a relaxed form to reduce the compression of image detail information for alleviating the over-smoothing. Extensive experiments on mainstream benchmarks demonstrate the superiority of PVG over state-of-the-art methods, e.g., our PVG-S obtains 83.0% Top-1 accuracy on ImageNet-1K that surpasses GNN-based ViG-S by +0.9 with the parameters reduced by 18.5%, while the largest PVG-B obtains 84.2% that has +0.5 improvement than ViG-B. Furthermore, our PVG-S obtains +1.3 box AP and +0.4 mask AP gains than ViG-S on COCO dataset.

Related papers

How Learnable Grids Recover Fine Detail in Low Dimensions: A Neural Tangent Kernel Analysis of Multigrid Parametric Encodings [106.3726679697804]
We compare the two most common techniques for mitigating this spectral bias: Fourier feature encodings (FFE) and multigrid parametric encodings (MPE) MPEs are seen as the standard for low dimensional mappings, but MPEs often outperform them and learn representations with higher resolution and finer detail. We prove that MPEs improve a network's performance through the structure of their grid and not their learnable embedding.
arXiv Detail & Related papers (2025-04-18T02:18:08Z)
ClusterViG: Efficient Globally Aware Vision GNNs via Image Partitioning [7.325055402812975]
Convolutional Neural Networks (CNN) and Vision Transformers (ViT) have dominated the field of Computer Vision (CV) Recent works addressing this bottleneck impose constraints on the flexibility of GNNs to build unstructured graphs. We propose a novel method called Dynamic Efficient Graph Convolution (DEGC) for designing efficient and globally aware ViGs.
arXiv Detail & Related papers (2025-01-18T02:59:10Z)
No-Reference Point Cloud Quality Assessment via Graph Convolutional Network [89.12589881881082]
Three-dimensional (3D) point cloud, as an emerging visual media format, is increasingly favored by consumers. Point clouds inevitably suffer from quality degradation and information loss through multimedia communication systems. We propose a novel no-reference PCQA method by using a graph convolutional network (GCN) to characterize the mutual dependencies of multi-view 2D projected image contents.
arXiv Detail & Related papers (2024-11-12T11:39:05Z)
GreedyViG: Dynamic Axial Graph Construction for Efficient Vision GNNs [5.895049552752008]
Vision graph neural networks (ViG) offer a new avenue for exploration in computer vision. A major bottleneck in ViGs is the inefficient k-nearest neighbor (KNN) operation used for graph construction. We propose a new method for designing ViGs, Dynamic Axial Graph Construction (DAGC), which is more efficient than KNN. We also propose a novel CNN-GNN architecture, GreedyViG, which uses DAGC.
arXiv Detail & Related papers (2024-05-10T23:21:16Z)
Graph Transformers for Large Graphs [57.19338459218758]
This work advances representation learning on single large-scale graphs with a focus on identifying model characteristics and critical design constraints. A key innovation of this work lies in the creation of a fast neighborhood sampling technique coupled with a local attention mechanism. We report a 3x speedup and 16.8% performance gain on ogbn-products and snap-patents, while we also scale LargeGT on ogbn-100M with a 5.9% performance improvement.
arXiv Detail & Related papers (2023-12-18T11:19:23Z)
GKGNet: Group K-Nearest Neighbor based Graph Convolutional Network for Multi-Label Image Recognition [37.02054260449195]
Multi-Label Image Recognition (MLIR) is a challenging task that aims to predict multiple object labels in a single image. We present the first fully graph convolutional model, Group K-nearest neighbor based Graph convolutional Network (GKGNet) Our experiments demonstrate that GKGNet achieves state-of-the-art performance with significantly lower computational costs.
arXiv Detail & Related papers (2023-08-28T07:50:04Z)
Global Context Vision Transformers [78.5346173956383]
We propose global context vision transformer (GC ViT), a novel architecture that enhances parameter and compute utilization for computer vision. We address the lack of the inductive bias in ViTs, and propose to leverage a modified fused inverted residual blocks in our architecture. Our proposed GC ViT achieves state-of-the-art results across image classification, object detection and semantic segmentation tasks.
arXiv Detail & Related papers (2022-06-20T18:42:44Z)
Vision GNN: An Image is Worth Graph of Nodes [49.3335689216822]
We propose to represent the image as a graph structure and introduce a new Vision GNN (ViG) architecture to extract graph-level feature for visual tasks. Based on the graph representation of images, we build our ViG model to transform and exchange information among all the nodes. Extensive experiments on image recognition and object detection tasks demonstrate the superiority of our ViG architecture.
arXiv Detail & Related papers (2022-06-01T07:01:04Z)
Exploiting Neighbor Effect: Conv-Agnostic GNNs Framework for Graphs with Heterophily [58.76759997223951]
We propose a new metric based on von Neumann entropy to re-examine the heterophily problem of GNNs. We also propose a Conv-Agnostic GNN framework (CAGNNs) to enhance the performance of most GNNs on heterophily datasets.
arXiv Detail & Related papers (2022-03-19T14:26:43Z)
SoGCN: Second-Order Graph Convolutional Networks [20.840026487716404]
We show that multi-layer second-order graph convolution (SoGC) is sufficient to attain the ability of expressing spectral filters with arbitrary coefficients. We build our Second-Order Graph Convolutional Networks (SoGCN) with SoGC and design a synthetic dataset to verify its filter fitting capability.
arXiv Detail & Related papers (2021-10-14T03:56:34Z)
Semi-supervised Hyperspectral Image Classification with Graph Clustering Convolutional Networks [41.78245271989529]
We propose a graph convolution network (GCN) based framework for HSI classification. In particular, we first cluster the pixels with similar spectral features into a superpixel and build the graph based on the superpixels of the input HSI. We then partition it into several sub-graphs by pruning the edges with weak weights, so as to strengthen the correlations of nodes with high similarity.
arXiv Detail & Related papers (2020-12-20T14:16:59Z)
GPS-Net: Graph Property Sensing Network for Scene Graph Generation [91.60326359082408]
Scene graph generation (SGG) aims to detect objects in an image along with their pairwise relationships. GPS-Net fully explores three properties for SGG: edge direction information, the difference in priority between nodes, and the long-tailed distribution of relationships. GPS-Net achieves state-of-the-art performance on three popular databases: VG, OI, and VRD by significant gains under various settings and metrics.
arXiv Detail & Related papers (2020-03-29T07:22:31Z)

This list is automatically generated from the titles and abstracts of the papers in this site.