Hypergraph Vision Transformers: Images are More than Nodes, More than Edges
- URL: http://arxiv.org/abs/2504.08710v1
- Date: Fri, 11 Apr 2025 17:20:26 GMT
- Title: Hypergraph Vision Transformers: Images are More than Nodes, More than Edges
- Authors: Joshua Fixelle
- Abstract summary: We propose the Hypergraph Vision Transformer (HgVT), which incorporates a hierarchical bipartite hypergraph structure into the vision transformer framework. HgVT achieves strong performance on image classification and retrieval, positioning it as an efficient framework for semantic-based vision tasks.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Recent advancements in computer vision have highlighted the scalability of Vision Transformers (ViTs) across various tasks, yet challenges remain in balancing adaptability, computational efficiency, and the ability to model higher-order relationships. Vision Graph Neural Networks (ViGs) offer an alternative by leveraging graph-based methodologies but are hindered by the computational bottlenecks of clustering algorithms used for edge generation. To address these issues, we propose the Hypergraph Vision Transformer (HgVT), which incorporates a hierarchical bipartite hypergraph structure into the vision transformer framework to capture higher-order semantic relationships while maintaining computational efficiency. HgVT leverages population and diversity regularization for dynamic hypergraph construction without clustering, and expert edge pooling to enhance semantic extraction and facilitate graph-based image retrieval. Empirical results demonstrate that HgVT achieves strong performance on image classification and retrieval, positioning it as an efficient framework for semantic-based vision tasks.
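The abstract describes HgVT's bipartite vertex-hyperedge structure only at a high level. As a rough illustration of the general pattern, the sketch below alternates two cross-attention passes between patch (vertex) tokens and learned hyperedge tokens; the class, layer choices, and dimensions are assumptions for exposition, not the authors' implementation.

```python
import torch
import torch.nn as nn

class BipartiteHypergraphLayer(nn.Module):
    """Illustrative sketch: patch (vertex) tokens and hyperedge tokens
    exchange information via two cross-attention passes, mimicking one
    bipartite message-passing step. Not the HgVT implementation."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.edge_from_vertex = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.vertex_from_edge = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_e = nn.LayerNorm(dim)
        self.norm_v = nn.LayerNorm(dim)

    def forward(self, vertices: torch.Tensor, edges: torch.Tensor):
        # vertices: (B, Nv, D) patch tokens; edges: (B, Ne, D) hyperedge tokens.
        # Step 1: each hyperedge softly aggregates the patches it covers.
        e_upd, _ = self.edge_from_vertex(edges, vertices, vertices)
        edges = self.norm_e(edges + e_upd)
        # Step 2: each patch reads back from the updated hyperedges.
        v_upd, _ = self.vertex_from_edge(vertices, edges, edges)
        vertices = self.norm_v(vertices + v_upd)
        return vertices, edges

# Example: 196 patch tokens, 16 hyperedge tokens, width 64.
layer = BipartiteHypergraphLayer(dim=64)
v, e = layer(torch.randn(2, 196, 64), torch.randn(2, 16, 64))
```

The abstract's population and diversity regularization would act on the implied patch-to-hyperedge memberships, and expert edge pooling would select hyperedges for retrieval; both are omitted from this sketch.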
Related papers
- DVHGNN: Multi-Scale Dilated Vision HGNN for Efficient Vision Recognition [7.762533819978473]
We propose a novel vision architecture, termed Dilated Vision HyperGraph Neural Network (DVHGNN). DVHGNN is designed to leverage a multi-scale hypergraph to efficiently capture high-order correlations among objects. Our DVHGNN-S achieves an impressive top-1 accuracy of 83.1% on ImageNet-1K, surpassing ViG-S by +1.0% and ViHGNN-S by +0.6%.
arXiv Detail & Related papers (2025-03-19T03:45:23Z)
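The DVHGNN summary does not spell out how the multi-scale dilated hypergraph is formed. One plausible reading, sketched below, is that each patch spawns a hyperedge from every d-th of its k·d nearest neighbors in feature space, for several dilation rates d; the function and its parameters are hypothetical, not the paper's recipe.

```python
import torch

def dilated_knn_hyperedges(x: torch.Tensor, k: int = 4, dilations=(1, 2, 4)):
    """Hypothetical multi-scale dilated hyperedge construction.

    x: (N, D) patch features, with N > k * max(dilations).
    For each dilation d, every patch gets a hyperedge holding every
    d-th neighbor among its k*d nearest neighbors in feature space,
    so larger d reaches farther while keeping hyperedge size k.
    """
    dist = torch.cdist(x, x)  # (N, N) pairwise feature distances
    hyperedges = []
    for d in dilations:
        idx = dist.topk(k * d + 1, largest=False).indices[:, 1:]  # drop self
        hyperedges.append(idx[:, ::d])  # (N, k) dilated member indices
    return hyperedges  # one (N, k) tensor of hyperedge members per scale
```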
- ClusterViG: Efficient Globally Aware Vision GNNs via Image Partitioning [7.325055402812975]
Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) have dominated the field of Computer Vision (CV). Recent works addressing the graph-construction bottleneck of Vision GNNs impose constraints on their flexibility to build unstructured graphs. We propose a novel method called Dynamic Efficient Graph Convolution (DEGC) for designing efficient and globally aware ViGs.
arXiv Detail & Related papers (2025-01-18T02:59:10Z)
- GIMS: Image Matching System Based on Adaptive Graph Construction and Graph Neural Network [7.711922592226936]
We introduce an innovative adaptive graph construction method that uses a filtering mechanism based on distance and a dynamic similarity threshold. We also incorporate the global awareness capabilities of Transformers to enhance the model's representation of graph structures. Our system achieves an average improvement of 3.8x-40.3x in overall matching performance.
arXiv Detail & Related papers (2024-12-24T07:05:55Z)
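GIMS's summary names a distance filter and a dynamic similarity threshold without giving their exact form. A hedged sketch under one reasonable interpretation follows, where keypoints are linked when they are spatially close and their descriptor similarity clears a per-node mean-plus-deviation threshold; the threshold rule is my assumption, not the paper's.

```python
import numpy as np

def adaptive_graph(desc: np.ndarray, coords: np.ndarray,
                   max_dist: float = 64.0, alpha: float = 1.0):
    """Hypothetical adaptive graph construction for image matching.

    desc:   (N, D) L2-normalized keypoint descriptors.
    coords: (N, 2) keypoint pixel coordinates.
    Connect keypoints that are spatially close AND whose cosine
    similarity exceeds a per-node dynamic threshold (mean + alpha*std).
    """
    sim = desc @ desc.T  # (N, N) cosine similarities
    dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    thresh = sim.mean(axis=1, keepdims=True) + alpha * sim.std(axis=1, keepdims=True)
    adj = (sim >= thresh) & (dist <= max_dist)
    np.fill_diagonal(adj, False)  # no self-loops
    return adj  # (N, N) boolean adjacency matrix
```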
- SAG-ViT: A Scale-Aware, High-Fidelity Patching Approach with Graph Attention for Vision Transformers [0.0]
Vision Transformers (ViTs) have redefined image classification by leveraging self-attention to capture complex patterns and long-range dependencies between image patches. A key challenge for ViTs is efficiently incorporating multi-scale feature representations, which convolutional neural networks (CNNs) obtain inherently through their hierarchical structure. We propose SAG-ViT, a Scale-Aware Graph Attention ViT that integrates the multi-scale feature capabilities of CNNs, the representational power of ViTs, and graph-attended patching to enable richer contextual representations.
arXiv Detail & Related papers (2024-11-14T13:15:27Z)
- UnSeGArmaNet: Unsupervised Image Segmentation using Graph Neural Networks with Convolutional ARMA Filters [10.940349832919699]
We propose an unsupervised segmentation framework with a pre-trained ViT.
By harnessing the graph structure inherent within the image, the proposed method achieves notable segmentation performance.
The proposed method provides state-of-the-art performance (even comparable to supervised methods) on benchmark image segmentation datasets.
arXiv Detail & Related papers (2024-10-08T15:10:09Z)
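UnSeGArmaNet's summary does not say how the image's inherent graph structure is extracted. A common recipe in this line of work, shown as a sketch below, builds an affinity graph over pre-trained ViT patch features and cuts it spectrally; the ARMA-filter GNN that gives the method its name is omitted, so this is background rather than the paper's pipeline.

```python
import numpy as np

def spectral_bipartition(patch_feats: np.ndarray) -> np.ndarray:
    """Foreground/background split from ViT patch features (sketch).

    patch_feats: (N, D) features from a pre-trained ViT, one per patch.
    Builds a positive cosine-affinity graph and bipartitions it with
    the Fiedler vector (second-smallest eigenvector) of its Laplacian.
    """
    f = patch_feats / np.linalg.norm(patch_feats, axis=1, keepdims=True)
    W = np.clip(f @ f.T, 0.0, None)  # keep only positive affinities
    L = np.diag(W.sum(axis=1)) - W   # unnormalized graph Laplacian
    _, vecs = np.linalg.eigh(L)      # eigenvectors, ascending eigenvalues
    return vecs[:, 1] > 0            # (N,) boolean patch mask
```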
- SGFormer: Single-Layer Graph Transformers with Approximation-Free Linear Complexity [74.51827323742506]
We evaluate the necessity of adopting multi-layer attention in Transformers on graphs.
We show that multi-layer attention can be reduced to one-layer propagation, with the same capability for representation learning.
This suggests a new technical path for building powerful and efficient Transformers on graphs.
arXiv Detail & Related papers (2024-09-13T17:37:34Z)
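In the spirit of SGFormer's one-layer claim, the sketch below blends a single layer of all-pair attention with one step of graph propagation; the softmax attention and the fixed blending weight are my simplifications, not necessarily the form the paper uses.

```python
import torch
import torch.nn as nn

class OneLayerGraphAttention(nn.Module):
    """Sketch of a single-layer attention + graph-propagation hybrid.
    Follows the one-layer spirit of SGFormer; the attention form and
    blending scheme here are assumptions, not the paper's design."""

    def __init__(self, dim: int, alpha: float = 0.5):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.alpha = alpha  # blend between global attention and graph prop

    def forward(self, x: torch.Tensor, adj_norm: torch.Tensor):
        # x: (N, D) node features; adj_norm: (N, N) normalized adjacency.
        scores = self.q(x) @ self.k(x).T / x.shape[-1] ** 0.5
        global_term = torch.softmax(scores, dim=-1) @ self.v(x)
        local_term = adj_norm @ x  # one step of neighborhood propagation
        return self.alpha * global_term + (1 - self.alpha) * local_term
```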
- A Pure Transformer Pretraining Framework on Text-attributed Graphs [50.833130854272774]
We introduce a feature-centric pretraining perspective by treating graph structure as a prior.
Our framework, Graph Sequence Pretraining with Transformer (GSPT), samples node contexts through random walks.
GSPT can be easily adapted to both node classification and link prediction, demonstrating promising empirical success on various datasets.
arXiv Detail & Related papers (2024-06-19T22:30:08Z)
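GSPT's sampling of node contexts through random walks is concrete enough to illustrate; the sketch below returns a walk that could be fed to a Transformer as a token sequence, with the walk length and dead-end handling left as assumptions.

```python
import random

def sample_context(adj: dict, start: int, walk_len: int = 8) -> list:
    """Random-walk context sampling (sketch, in the spirit of GSPT).

    adj: adjacency list {node: [neighbors]}. Returns a node sequence
    usable as a Transformer token context for the start node.
    """
    walk = [start]
    for _ in range(walk_len - 1):
        nbrs = adj.get(walk[-1], [])
        if not nbrs:        # dead end: stop the walk early
            break
        walk.append(random.choice(nbrs))
    return walk

# Example on a 4-node path graph 0-1-2-3:
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(sample_context(adj, start=0))  # e.g. [0, 1, 2, 1, 0, 1, 2, 3]
```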
- Hypergraph Transformer for Semi-Supervised Classification [50.92027313775934]
We propose a novel hypergraph learning framework, the HyperGraph Transformer (HyperGT).
HyperGT uses a Transformer-based neural network architecture to effectively consider global correlations among all nodes and hyperedges.
It achieves comprehensive hypergraph representation learning by effectively incorporating global interactions while preserving local connectivity patterns.
arXiv Detail & Related papers (2023-12-18T17:50:52Z)
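One simple way to realize HyperGT's global correlations among all nodes and hyperedges is to run self-attention over the concatenated node and hyperedge token sequences, as sketched below; the incidence-aware positional encodings that would preserve local connectivity are omitted, and the generic layer shown is an assumption.

```python
import torch
import torch.nn as nn

def joint_hypergraph_attention(nodes: torch.Tensor, hyperedges: torch.Tensor,
                               layer: nn.TransformerEncoderLayer):
    """Global self-attention over nodes AND hyperedges jointly (sketch).

    nodes: (B, Nv, D); hyperedges: (B, Ne, D). Every node attends to
    every node and hyperedge, and vice versa; incidence information
    would enter via positional encodings, omitted here.
    """
    tokens = torch.cat([nodes, hyperedges], dim=1)  # (B, Nv + Ne, D)
    out = layer(tokens)
    nv = nodes.shape[1]
    return out[:, :nv], out[:, nv:]  # updated nodes, updated hyperedges

# Example with a generic Transformer layer (trained once in practice):
layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
n, e = joint_hypergraph_attention(torch.randn(2, 100, 64),
                                  torch.randn(2, 10, 64), layer)
```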
- T-GAE: Transferable Graph Autoencoder for Network Alignment [79.89704126746204]
T-GAE is a graph autoencoder framework that leverages the transferability and stability of GNNs to achieve efficient network alignment without retraining.
Our experiments demonstrate that T-GAE outperforms the state-of-the-art optimization method and the best GNN approach by up to 38.7% and 50.8%, respectively.
arXiv Detail & Related papers (2023-10-05T02:58:29Z)
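The summary gives T-GAE's results but not its architecture. As background only, a plain graph autoencoder encodes nodes with a message-passing encoder and reconstructs edges with an inner-product decoder; alignment then matches nodes across graphs by embedding similarity with the same trained encoder, which is where the no-retraining angle comes from. A generic sketch, not T-GAE's specific model:

```python
import torch
import torch.nn as nn

class GraphAutoencoder(nn.Module):
    """Generic graph autoencoder (sketch): GCN-style encoder plus an
    inner-product decoder. Background for T-GAE, not its actual model."""

    def __init__(self, in_dim: int, hid_dim: int = 64):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hid_dim)
        self.w2 = nn.Linear(hid_dim, hid_dim)

    def encode(self, x: torch.Tensor, adj_norm: torch.Tensor):
        h = torch.relu(adj_norm @ self.w1(x))  # one propagation step
        return adj_norm @ self.w2(h)           # (N, hid_dim) embeddings

    def decode(self, z: torch.Tensor):
        return torch.sigmoid(z @ z.T)          # reconstructed adjacency

# Alignment idea: encode two graphs with the SAME trained encoder and
# match nodes by similarity of their embeddings (no retraining).
```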
- Graph Reasoning Transformer for Image Parsing [67.76633142645284]
We propose a novel Graph Reasoning Transformer (GReaT) for image parsing that enables image patches to interact following a relation-reasoning pattern.
Compared to the conventional transformer, GReaT has higher interaction efficiency and a more purposeful interaction pattern.
Results show that GReaT achieves consistent performance gains with slight computational overhead over state-of-the-art transformer baselines.
arXiv Detail & Related papers (2022-09-20T08:21:37Z)
- Spectral Graph Convolutional Networks With Lifting-based Adaptive Graph Wavelets [81.63035727821145]
Spectral graph convolutional networks (SGCNs) have been attracting increasing attention in graph representation learning.
We propose a novel class of spectral graph convolutional networks that implement graph convolutions with adaptive graph wavelets.
arXiv Detail & Related papers (2021-08-03T17:57:53Z)
- Analogous to Evolutionary Algorithm: Designing a Unified Sequence Model [58.17021225930069]
We explain the rationale of the Vision Transformer by analogy with the proven, practical Evolutionary Algorithm (EA).
We propose a more efficient EAT model and design task-related heads to handle different tasks more flexibly.
Our approach achieves state-of-the-art results on the ImageNet classification task compared with recent vision transformer works.
arXiv Detail & Related papers (2021-05-31T16:20:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.