H3Former: Hypergraph-based Semantic-Aware Aggregation via Hyperbolic Hierarchical Contrastive Loss for Fine-Grained Visual Classification
- URL: http://arxiv.org/abs/2511.10260v1
- Date: Fri, 14 Nov 2025 01:42:11 GMT
- Title: H3Former: Hypergraph-based Semantic-Aware Aggregation via Hyperbolic Hierarchical Contrastive Loss for Fine-Grained Visual Classification
- Authors: Yongji Zhang, Siqi Li, Kuiyang Huang, Yue Gao, Yu Jiang
- Abstract summary: H3Former is a novel token-to-region framework for fine-grained visual classification. SAAM exploits multi-scale contextual cues to dynamically construct a weighted hypergraph among tokens. HHCL enforces hierarchical semantic constraints in a non-Euclidean embedding space.
- Score: 17.431440244641585
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fine-Grained Visual Classification (FGVC) remains a challenging task due to subtle inter-class differences and large intra-class variations. Existing approaches typically rely on feature-selection mechanisms or region-proposal strategies to localize discriminative regions for semantic analysis. However, these methods often fail to capture discriminative cues comprehensively while introducing substantial category-agnostic redundancy. To address these limitations, we propose H3Former, a novel token-to-region framework that leverages high-order semantic relations to aggregate local fine-grained representations with structured region-level modeling. Specifically, we propose the Semantic-Aware Aggregation Module (SAAM), which exploits multi-scale contextual cues to dynamically construct a weighted hypergraph among tokens. By applying hypergraph convolution, SAAM captures high-order semantic dependencies and progressively aggregates token features into compact region-level representations. Furthermore, we introduce the Hyperbolic Hierarchical Contrastive Loss (HHCL), which enforces hierarchical semantic constraints in a non-Euclidean embedding space. The HHCL enhances inter-class separability and intra-class consistency while preserving the intrinsic hierarchical relationships among fine-grained categories. Comprehensive experiments conducted on four standard FGVC benchmarks validate the superiority of our H3Former framework.
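The abstract names two building blocks, hypergraph convolution over tokens and a distance in a non-Euclidean (hyperbolic) space, without giving formulas. The sketch below is not the authors' implementation. It assumes the commonly used normalised hypergraph propagation X' = Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} X (with the learnable projection omitted) and the Poincaré-ball distance as one possible hyperbolic metric. The function names `hypergraph_conv` and `poincare_distance` are hypothetical.

```python
import math


def hypergraph_conv(X, H, w):
    """One parameter-free hypergraph propagation step (assumed form, see lead-in).

    X: N x C token features, H: N x E incidence matrix, w: E hyperedge weights.
    """
    n, e, c = len(X), len(w), len(X[0])
    # Vertex degree: weighted count of hyperedges each token belongs to.
    dv = [sum(H[i][j] * w[j] for j in range(e)) for i in range(n)]
    # Hyperedge degree: number (or soft mass) of tokens in each hyperedge.
    de = [sum(H[i][j] for i in range(n)) for j in range(e)]
    s = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in dv]

    # Step 1: gather token features into each hyperedge (a region-level summary).
    edge = []
    for j in range(e):
        scale = w[j] / de[j] if de[j] > 0 else 0.0
        edge.append([scale * sum(H[i][j] * s[i] * X[i][k] for i in range(n))
                     for k in range(c)])

    # Step 2: scatter the hyperedge summaries back to the tokens.
    return [[s[i] * sum(H[i][j] * edge[j][k] for j in range(e)) for k in range(c)]
            for i in range(n)]


def poincare_distance(u, v, eps=1e-7):
    """Geodesic distance between two points inside the unit Poincare ball."""
    sq = sum((a - b) ** 2 for a, b in zip(u, v))
    du = max(1.0 - sum(a * a for a in u), eps)  # clamp to stay inside the ball
    dv = max(1.0 - sum(b * b for b in v), eps)
    return math.acosh(1.0 + 2.0 * sq / (du * dv))


# Three tokens sharing a single hyperedge: every token receives the group mean.
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
H = [[1.0], [1.0], [1.0]]
out = hypergraph_conv(X, H, [1.0])
d = poincare_distance([0.0, 0.0], [0.5, 0.0])
```

With a single shared hyperedge the propagation reduces to averaging, which illustrates how token features collapse toward a compact region-level representation. A contrastive loss in hyperbolic space would use a distance such as `poincare_distance` in place of a Euclidean or cosine similarity.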
Related papers
- AG-VAS: Anchor-Guided Zero-Shot Visual Anomaly Segmentation with Large Multimodal Models [21.682989096955467]
AG-VAS (Anchor-Guided Visual Anomaly Segmentation) is a new framework that expands the LMM vocabulary with three learnable semantic anchor tokens. AG-VAS achieves consistent state-of-the-art performance in the zero-shot setting.
arXiv Detail & Related papers (2026-03-01T22:25:23Z)
- SSVP: Synergistic Semantic-Visual Prompting for Industrial Zero-Shot Anomaly Detection [55.54007781679915]
We propose Synergistic Semantic-Visual Prompting (SSVP), which efficiently fuses diverse visual encodings to elevate the model's fine-grained perception. SSVP achieves state-of-the-art performance with 93.0% Image-AUROC and 92.2% Pixel-AUROC on MVTec-AD, significantly outperforming existing zero-shot approaches.
arXiv Detail & Related papers (2026-01-14T04:42:19Z)
- Multi-label Classification with Panoptic Context Aggregation Networks [61.82285737410154]
This paper introduces the Deep Panoptic Context Aggregation Network (PanCAN), a novel approach that hierarchically integrates multi-order geometric contexts. PanCAN learns multi-order neighborhood relationships at each scale by combining random walks with an attention mechanism. Experiments on NUS-WIDE, PASCAL VOC 2007, and MS-COCO benchmarks demonstrate that PanCAN consistently achieves competitive results.
arXiv Detail & Related papers (2025-12-29T14:16:21Z)
- HyperbolicRAG: Enhancing Retrieval-Augmented Generation with Hyperbolic Representations [11.678218711095269]
Graph-based RAG enables large language models to access external knowledge. We propose HyperbolicRAG, a retrieval framework that integrates hyperbolic geometry into graph-based RAG.
arXiv Detail & Related papers (2025-11-24T06:27:58Z)
- The Finer the Better: Towards Granular-aware Open-set Domain Generalization [31.197204515055756]
Open-Set Domain Generalization tackles the realistic scenario where deployed models encounter both domain shifts and novel object categories. Existing methods still face a dilemma between the structural risk of known classes and the open-space risk from unknown classes. We propose a Semantic-enhanced CLIP framework that explicitly addresses this dilemma through fine-grained semantic enhancement.
arXiv Detail & Related papers (2025-11-21T06:19:19Z)
- EnGraf-Net: Multiple Granularity Branch Network with Fine-Coarse Graft Grained for Classification Task [0.8299692647308321]
Fine-grained classification models are designed to focus on the relevant details necessary to distinguish highly similar classes. Part-based approaches, including automatic cropping methods, suffer from an incomplete representation of local features. We leverage semantic associations structured as a hierarchy (taxonomy) as supervised signals within an end-to-end deep neural network model, termed EnGraf-Net.
arXiv Detail & Related papers (2025-09-25T12:11:42Z)
- Contrastive Prompt Clustering for Weakly Supervised Semantic Segmentation [41.065931555596975]
We propose Contrastive Prompt Clustering (CPC), a novel WSSS framework. CPC exploits Large Language Models (LLMs) to derive category clusters that encode intrinsic inter-class relationships. Experiments on PASCAL VOC 2012 and MS 2014 demonstrate that CPC surpasses existing state-of-the-art methods in WSSS.
arXiv Detail & Related papers (2025-08-23T12:49:08Z)
- HVL: Semi-Supervised Segmentation leveraging Hierarchical Vision-Language Synergy with Dynamic Text-Spatial Query Alignment [16.926158907882012]
We propose a unified Vision-Language framework that integrates domain-invariant text embeddings as object queries in a transformer-based segmentation network. Our results show that language-guided segmentation bridges the label efficiency gap and enables new levels of fine-grained generalization.
arXiv Detail & Related papers (2025-06-16T19:05:33Z)
- Split Matching for Inductive Zero-shot Semantic Segmentation [56.47556212515178]
Zero-shot Semantic Segmentation (ZSS) aims to segment categories that are not annotated during training. We propose Split Matching (SM), a novel assignment strategy that decouples Hungarian matching into two components. SM is the first to introduce decoupled Hungarian matching under the inductive ZSS setting, and achieves state-of-the-art performance on two standard benchmarks.
arXiv Detail & Related papers (2025-05-08T07:56:30Z)
- Category-Adaptive Cross-Modal Semantic Refinement and Transfer for Open-Vocabulary Multi-Label Recognition [59.203152078315235]
We propose a novel category-adaptive cross-modal semantic refinement and transfer (C$^2$SRT) framework to explore the semantic correlation. The proposed framework consists of two complementary modules, i.e., an intra-category semantic refinement (ISR) module and an inter-category semantic transfer (IST) module. Experiments on OV-MLR benchmarks clearly demonstrate that the proposed C$^2$SRT framework outperforms current state-of-the-art algorithms.
arXiv Detail & Related papers (2024-12-09T04:00:18Z)
- Data-free Knowledge Distillation for Fine-grained Visual Categorization [9.969720644789781]
We propose an approach called DFKD-FGVC that extends DFKD to fine-grained visual categorization (FGVC) tasks.
We evaluate our approach on three widely used FGVC benchmarks (Aircraft, Cars196, and CUB200) and demonstrate its superior performance.
arXiv Detail & Related papers (2024-04-18T09:44:56Z)
- Channel DropBlock: An Improved Regularization Method for Fine-Grained Visual Classification [58.07257910065007]
Existing approaches mainly tackle this problem by introducing attention mechanisms to locate the discriminative parts or feature encoding approaches to extract the highly parameterized features in a weakly-supervised fashion.
In this work, we propose a lightweight yet effective regularization method named Channel DropBlock (CDB) in combination with two alternative correlation metrics, to address this problem.
arXiv Detail & Related papers (2021-06-07T09:03:02Z)
- Towards Uncovering the Intrinsic Data Structures for Unsupervised Domain Adaptation using Structurally Regularized Deep Clustering [119.88565565454378]
Unsupervised domain adaptation (UDA) aims to learn classification models that make predictions for unlabeled data on a target domain.
We propose a hybrid model of Structurally Regularized Deep Clustering, which integrates the regularized discriminative clustering of target data with a generative one.
Our proposed H-SRDC outperforms all the existing methods under both the inductive and transductive settings.
arXiv Detail & Related papers (2020-12-08T08:52:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.