Related papers: HVT: A Comprehensive Vision Framework for Learning in Non-Euclidean Space

Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models [84.78794648147608]
A persistent geometric anomaly, the Modality Gap, remains. Prior approaches to bridge this gap are largely limited by oversimplified isotropic assumptions. We propose the Fixed-frame Modality Gap Theory, which decomposes the modality gap into stable biases and anisotropic residuals. We then introduce ReAlign, a training-free modality alignment strategy.
arXiv Detail & Related papers (2026-02-02T13:59:39Z)
HexFormer: Hyperbolic Vision Transformer with Exponential Map Aggregation [12.198535149754058]
Hyperbolic geometry provides a natural framework for representing hierarchical and relational structures. HexFormer is a hyperbolic vision transformer for image classification that incorporates exponential map aggregation. HexFormer incorporates a novel attention mechanism based on exponential map aggregation, which yields more accurate and stable aggregated representations.
arXiv Detail & Related papers (2026-01-27T17:56:49Z)
Training-Free Dual Hyperbolic Adapters for Better Cross-Modal Reasoning [38.464005168841986]
We develop a new adaptation method for large vision-language models, called Training-free Dual Hyperbolic Adapters (T-DHA). We characterize the vision-language relationship between semantic concepts, which typically has a hierarchical tree structure, in the hyperbolic space instead of the traditional Euclidean space. Our extensive experimental results on various datasets demonstrate that the T-DHA method significantly outperforms existing state-of-the-art methods in few-shot image recognition and domain generalization tasks.
arXiv Detail & Related papers (2025-12-09T17:12:22Z)
Modality Alignment across Trees on Heterogeneous Hyperbolic Manifolds [49.95082206008502]
Alignment across Trees is a method that constructs and aligns tree-like hierarchical features for both image and text modalities. We introduce a semantic-aware visual feature extraction framework that applies a cross-attention mechanism to visual class tokens from intermediate Transformer layers.
arXiv Detail & Related papers (2025-10-31T11:32:15Z)
Proximal Vision Transformer: Enhancing Feature Representation through Two-Stage Manifold Geometry [7.3623134099785155]
Vision Transformer (ViT) has become widely recognized in computer vision, leveraging its self-attention mechanism to achieve remarkable success across various tasks. This paper proposes a novel framework that integrates ViT with the proximal tools, enabling a unified geometric optimization approach. Experimental results confirm that the proposed method outperforms traditional ViT in terms of classification accuracy and data distribution.
arXiv Detail & Related papers (2025-08-23T16:39:09Z)
AdS-GNN -- a Conformally Equivariant Graph Neural Network [9.96018310438305]
We build a neural network that is equivariant under general conformal transformations. We validate our model on tasks from computer vision and statistical physics.
arXiv Detail & Related papers (2025-05-19T09:08:52Z)
Cross Paradigm Representation and Alignment Transformer for Image Deraining [40.66823807648992]
We propose a novel Cross Paradigm Representation and Alignment Transformer (CPRAformer). Its core idea is the hierarchical representation and alignment, leveraging the strengths of both paradigms to aid image reconstruction. We use two types of self-attention in the Transformer blocks: sparse prompt channel self-attention (SPC-SA) and spatial pixel refinement self-attention (SPR-SA).
arXiv Detail & Related papers (2025-04-23T06:44:46Z)
Hypergraph Vision Transformers: Images are More than Nodes, More than Edges [0.0]
We propose the Hypergraph Vision Transformer (HgVT), which incorporates a hierarchical bipartite hypergraph structure into the vision transformer framework. HgVT achieves strong performance on image classification and retrieval, positioning it as an efficient framework for semantic-based vision tasks.
arXiv Detail & Related papers (2025-04-11T17:20:26Z)
Parallel Sequence Modeling via Generalized Spatial Propagation Network [80.66202109995726]
Generalized Spatial Propagation Network (GSPN) is a new attention mechanism for optimized vision tasks that inherently captures 2D spatial structures. GSPN overcomes limitations by directly operating on spatially coherent image data and forming dense pairwise connections through a line-scan approach. GSPN achieves superior spatial fidelity and state-of-the-art performance in vision tasks, including ImageNet classification, class-guided image generation, and text-to-image generation.
arXiv Detail & Related papers (2025-01-21T18:56:19Z)
SMLNet: A SPD Manifold Learning Network for Infrared and Visible Image Fusion [60.18614468818683]
We propose a novel SPD (symmetric positive definite) manifold learning for multi-modal image fusion. Our framework exhibits superior performance compared to the current state-of-the-art methods.
arXiv Detail & Related papers (2024-11-16T03:09:49Z)
Efficient Visual State Space Model for Image Deblurring [83.57239834238035]
Convolutional neural networks (CNNs) and Vision Transformers (ViTs) have achieved excellent performance in image restoration. We propose a simple yet effective visual state space model (EVSSM) for image deblurring.
arXiv Detail & Related papers (2024-05-23T09:13:36Z)
Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein [56.62376364594194]
Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets. In this work, we revisit these approaches under the lens of optimal transport and exhibit relationships with the Gromov-Wasserstein problem. This unveils a new general framework, called distributional reduction, that recovers DR and clustering as special cases and allows addressing them jointly within a single optimization problem.
arXiv Detail & Related papers (2024-02-03T19:00:19Z)
Dynamic Hyperbolic Attention Network for Fine Hand-object Reconstruction [76.5549647815413]
We propose the first precise hand-object reconstruction method in hyperbolic space, namely Dynamic Hyperbolic Attention Network (DHANet). Our method learns mesh features with rich geometry-image multi-modal information and models better hand-object interaction.
arXiv Detail & Related papers (2023-09-06T13:00:10Z)
VTAE: Variational Transformer Autoencoder with Manifolds Learning [144.0546653941249]
Deep generative models have demonstrated successful applications in learning non-linear data distributions through a number of latent variables. The nonlinearity of the generator implies that the latent space shows an unsatisfactory projection of the data space, which results in poor representation learning. We show that geodesics and accurate computation can substantially improve the performance of deep generative models.
arXiv Detail & Related papers (2023-04-03T13:13:19Z)
Complex Hyperbolic Knowledge Graph Embeddings with Fast Fourier Transform [29.205221688430733]
The choice of geometric space for knowledge graph (KG) embeddings can have significant effects on the performance of KG completion tasks. Recent explorations of the complex hyperbolic geometry further improved the hyperbolic embeddings for capturing a variety of hierarchical structures. This paper aims to utilize the representation capacity of the complex hyperbolic geometry in multi-relational KG embeddings.
arXiv Detail & Related papers (2022-11-07T15:46:00Z)
AMCAD: Adaptive Mixed-Curvature Representation based Advertisement Retrieval System [18.07821800367287]
We present a web-scale Adaptive Mixed-Curvature ADvertisement retrieval system (AMCAD) to automatically capture the complex and heterogeneous graph structures in non-Euclidean spaces. To deploy AMCAD in Taobao, one of the largest ecommerce platforms with hundreds of million users, we design an efficient two-layer online retrieval framework.
arXiv Detail & Related papers (2022-03-28T12:29:30Z)
Enhancing Hyperbolic Graph Embeddings via Contrastive Learning [7.901082408569372]
We propose a novel Hyperbolic Graph Contrastive Learning (HGCL) framework which learns node representations through multiple hyperbolic spaces. Experimental results on multiple real-world datasets demonstrate the superiority of the proposed HGCL.
arXiv Detail & Related papers (2022-01-21T06:10:05Z)
Nested Hyperbolic Spaces for Dimensionality Reduction and Hyperbolic NN Design [8.250374560598493]
Hyperbolic neural networks have been popular in the recent past due to their ability to represent hierarchical data sets effectively and efficiently. The challenge in developing these networks lies in the nonlinearity of the embedding space namely, the Hyperbolic space. We present a novel fully hyperbolic neural network which uses the concept of projections (embeddings) followed by an intrinsic aggregation and a nonlinearity all within the hyperbolic space.
arXiv Detail & Related papers (2021-12-03T03:20:27Z)
Analogous to Evolutionary Algorithm: Designing a Unified Sequence Model [58.17021225930069]
We explain the rationality of Vision Transformer by analogy with the proven practical Evolutionary Algorithm (EA). We propose a more efficient EAT model, and design task-related heads to deal with different tasks more flexibly. Our approach achieves state-of-the-art results on the ImageNet classification task compared with recent vision transformer works.
arXiv Detail & Related papers (2021-05-31T16:20:03Z)
Spatial-Spectral Clustering with Anchor Graph for Hyperspectral Image [88.60285937702304]
This paper proposes a novel unsupervised approach called spatial-spectral clustering with anchor graph (SSCAG) for HSI data clustering. The proposed SSCAG is competitive against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-04-24T08:09:27Z)

This list is automatically generated from the titles and abstracts of the papers in this site.