HVT: A Comprehensive Vision Framework for Learning in Non-Euclidean Space
- URL: http://arxiv.org/abs/2409.16897v2
- Date: Thu, 26 Sep 2024 02:39:02 GMT
- Title: HVT: A Comprehensive Vision Framework for Learning in Non-Euclidean Space
- Authors: Jacob Fein-Ashley, Ethan Feng, Minh Pham,
- Abstract summary: This paper introduces the Hyperbolic Vision Transformer (HVT), a novel extension of the Vision Transformer (ViT) that integrates hyperbolic geometry.
While traditional ViTs operate in Euclidean space, our method enhances the self-attention mechanism by leveraging hyperbolic distance and M"obius transformations.
We present rigorous mathematical formulations, showing how hyperbolic geometry can be incorporated into attention layers, feed-forward networks, and optimization.
- Score: 1.1858475445768824
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data representation in non-Euclidean spaces has proven effective for capturing hierarchical and complex relationships in real-world datasets. Hyperbolic spaces, in particular, provide efficient embeddings for hierarchical structures. This paper introduces the Hyperbolic Vision Transformer (HVT), a novel extension of the Vision Transformer (ViT) that integrates hyperbolic geometry. While traditional ViTs operate in Euclidean space, our method enhances the self-attention mechanism by leveraging hyperbolic distance and M\"obius transformations. This enables more effective modeling of hierarchical and relational dependencies in image data. We present rigorous mathematical formulations, showing how hyperbolic geometry can be incorporated into attention layers, feed-forward networks, and optimization. We offer improved performance for image classification using the ImageNet dataset.
Related papers
- AdS-GNN -- a Conformally Equivariant Graph Neural Network [9.96018310438305]
We build a neural network that is equivariant under general conformal transformations.<n>We validate our model on tasks from computer vision and statistical physics.
arXiv Detail & Related papers (2025-05-19T09:08:52Z) - Cross Paradigm Representation and Alignment Transformer for Image Deraining [40.66823807648992]
We propose a novel Cross Paradigm Representation and Alignment Transformer (CPRAformer)
Its core idea is the hierarchical representation and alignment, leveraging the strengths of both paradigms to aid image reconstruction.
We use two types of self-attention in the Transformer blocks: sparse prompt channel self-attention (SPC-SA) and spatial pixel refinement self-attention (SPR-SA)
arXiv Detail & Related papers (2025-04-23T06:44:46Z) - Hypergraph Vision Transformers: Images are More than Nodes, More than Edges [0.0]
We propose the Hypergraph Vision Transformer (HgVT), which incorporates a hierarchical bipartite hypergraph structure into the vision transformer framework.
HgVT achieves strong performance on image classification and retrieval, positioning it as an efficient framework for semantic-based vision tasks.
arXiv Detail & Related papers (2025-04-11T17:20:26Z) - Parallel Sequence Modeling via Generalized Spatial Propagation Network [80.66202109995726]
Generalized Spatial Propagation Network (GSPN) is a new attention mechanism for optimized vision tasks that inherently captures 2D spatial structures.
GSPN overcomes limitations by directly operating on spatially coherent image data and forming dense pairwise connections through a line-scan approach.
GSPN achieves superior spatial fidelity and state-of-the-art performance in vision tasks, including ImageNet classification, class-guided image generation, and text-to-image generation.
arXiv Detail & Related papers (2025-01-21T18:56:19Z) - Efficient Visual State Space Model for Image Deblurring [83.57239834238035]
Convolutional neural networks (CNNs) and Vision Transformers (ViTs) have achieved excellent performance in image restoration.
We propose a simple yet effective visual state space model (EVSSM) for image deblurring.
arXiv Detail & Related papers (2024-05-23T09:13:36Z) - Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein [56.62376364594194]
Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets.
In this work, we revisit these approaches under the lens of optimal transport and exhibit relationships with the Gromov-Wasserstein problem.
This unveils a new general framework, called distributional reduction, that recovers DR and clustering as special cases and allows addressing them jointly within a single optimization problem.
arXiv Detail & Related papers (2024-02-03T19:00:19Z) - Dynamic Hyperbolic Attention Network for Fine Hand-object Reconstruction [76.5549647815413]
We propose the first precise hand-object reconstruction method in hyperbolic space, namely Dynamic Hyperbolic Attention Network (DHANet)
Our method learns mesh features with rich geometry-image multi-modal information and models better hand-object interaction.
arXiv Detail & Related papers (2023-09-06T13:00:10Z) - VTAE: Variational Transformer Autoencoder with Manifolds Learning [144.0546653941249]
Deep generative models have demonstrated successful applications in learning non-linear data distributions through a number of latent variables.
The nonlinearity of the generator implies that the latent space shows an unsatisfactory projection of the data space, which results in poor representation learning.
We show that geodesics and accurate computation can substantially improve the performance of deep generative models.
arXiv Detail & Related papers (2023-04-03T13:13:19Z) - Complex Hyperbolic Knowledge Graph Embeddings with Fast Fourier
Transform [29.205221688430733]
The choice of geometric space for knowledge graph (KG) embeddings can have significant effects on the performance of KG completion tasks.
Recent explorations of the complex hyperbolic geometry further improved the hyperbolic embeddings for capturing a variety of hierarchical structures.
This paper aims to utilize the representation capacity of the complex hyperbolic geometry in multi-relational KG embeddings.
arXiv Detail & Related papers (2022-11-07T15:46:00Z) - AMCAD: Adaptive Mixed-Curvature Representation based Advertisement
Retrieval System [18.07821800367287]
We present a web-scale Adaptive Mixed-Curvature ADvertisement retrieval system (AMCAD) to automatically capture the complex and heterogeneous graph structures in non-Euclidean spaces.
To deploy AMCAD in Taobao, one of the largest ecommerce platforms with hundreds of million users, we design an efficient two-layer online retrieval framework.
arXiv Detail & Related papers (2022-03-28T12:29:30Z) - Enhancing Hyperbolic Graph Embeddings via Contrastive Learning [7.901082408569372]
We propose a novel Hyperbolic Graph Contrastive Learning (HGCL) framework which learns node representations through multiple hyperbolic spaces.
Experimental results on multiple real-world datasets demonstrate the superiority of the proposed HGCL.
arXiv Detail & Related papers (2022-01-21T06:10:05Z) - Nested Hyperbolic Spaces for Dimensionality Reduction and Hyperbolic NN
Design [8.250374560598493]
Hyperbolic neural networks have been popular in the recent past due to their ability to represent hierarchical data sets effectively and efficiently.
The challenge in developing these networks lies in the nonlinearity of the embedding space namely, the Hyperbolic space.
We present a novel fully hyperbolic neural network which uses the concept of projections (embeddings) followed by an intrinsic aggregation and a nonlinearity all within the hyperbolic space.
arXiv Detail & Related papers (2021-12-03T03:20:27Z) - Analogous to Evolutionary Algorithm: Designing a Unified Sequence Model [58.17021225930069]
We explain the rationality of Vision Transformer by analogy with the proven practical Evolutionary Algorithm (EA)
We propose a more efficient EAT model, and design task-related heads to deal with different tasks more flexibly.
Our approach achieves state-of-the-art results on the ImageNet classification task compared with recent vision transformer works.
arXiv Detail & Related papers (2021-05-31T16:20:03Z) - Spatial-Spectral Clustering with Anchor Graph for Hyperspectral Image [88.60285937702304]
This paper proposes a novel unsupervised approach called spatial-spectral clustering with anchor graph (SSCAG) for HSI data clustering.
The proposed SSCAG is competitive against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-04-24T08:09:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.