Related papers: H3Former: Hypergraph-based Semantic-Aware Aggregation via Hyperbolic Hierarchical Contrastive Loss for Fine-Grained Visual Classification

URL: http://arxiv.org/abs/2511.10260v1
Date: Fri, 14 Nov 2025 01:42:11 GMT
Title: H3Former: Hypergraph-based Semantic-Aware Aggregation via Hyperbolic Hierarchical Contrastive Loss for Fine-Grained Visual Classification
Authors: Yongji Zhang, Siqi Li, Kuiyang Huang, Yue Gao, Yu Jiang
Abstract summary: H3Former is a novel token-to-region framework for fine-grained visual classification. Its Semantic-Aware Aggregation Module (SAAM) exploits multi-scale contextual cues to dynamically construct a weighted hypergraph among tokens. Its Hyperbolic Hierarchical Contrastive Loss (HHCL) enforces hierarchical semantic constraints in a non-Euclidean embedding space.
Score: 17.431440244641585
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Fine-Grained Visual Classification (FGVC) remains a challenging task due to subtle inter-class differences and large intra-class variations. Existing approaches typically rely on feature-selection mechanisms or region-proposal strategies to localize discriminative regions for semantic analysis. However, these methods often fail to capture discriminative cues comprehensively while introducing substantial category-agnostic redundancy. To address these limitations, we propose H3Former, a novel token-to-region framework that leverages high-order semantic relations to aggregate local fine-grained representations with structured region-level modeling. Specifically, we propose the Semantic-Aware Aggregation Module (SAAM), which exploits multi-scale contextual cues to dynamically construct a weighted hypergraph among tokens. By applying hypergraph convolution, SAAM captures high-order semantic dependencies and progressively aggregates token features into compact region-level representations. Furthermore, we introduce the Hyperbolic Hierarchical Contrastive Loss (HHCL), which enforces hierarchical semantic constraints in a non-Euclidean embedding space. The HHCL enhances inter-class separability and intra-class consistency while preserving the intrinsic hierarchical relationships among fine-grained categories. Comprehensive experiments conducted on four standard FGVC benchmarks validate the superiority of our H3Former framework.
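
The abstract names two building blocks, hypergraph convolution over tokens and a loss in hyperbolic space, without giving their formulas. The sketch below is only an illustration under stated assumptions: it uses the commonly used normalized hypergraph convolution (Dv^-1/2 H W De^-1 H^T Dv^-1/2 X Theta) and the Poincaré-ball distance, which may differ from what H3Former actually implements. The function names, the toy incidence matrix, and the hyperedge weights are all hypothetical.

```python
import numpy as np

def hypergraph_conv(X, H, w, Theta):
    """One normalized hypergraph convolution step (assumed form, not from the paper).

    X: (n, d) token features, H: (n, m) incidence matrix,
    w: (m,) hyperedge weights, Theta: (d, d_out) projection.
    """
    Dv = H @ w                       # weighted vertex degrees, shape (n,)
    De = H.sum(axis=0)               # hyperedge degrees, shape (m,)
    Dv_inv_sqrt = 1.0 / np.sqrt(Dv)
    left = Dv_inv_sqrt[:, None] * H * (w / De)[None, :]   # Dv^-1/2 H W De^-1
    right = H.T * Dv_inv_sqrt[None, :]                    # H^T Dv^-1/2
    return left @ right @ X @ Theta

def poincare_distance(u, v):
    """Geodesic distance between two points inside the unit Poincaré ball."""
    sq = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return np.arccosh(1.0 + 2.0 * sq / denom)

# Toy example: 4 tokens, 2 weighted hyperedges (values are illustrative only).
H = np.array([[1, 0], [1, 1], [0, 1], [1, 1]], dtype=float)
w = np.array([1.0, 2.0])
X = np.random.default_rng(0).normal(size=(4, 3))
Theta = np.eye(3)

out = hypergraph_conv(X, H, w, Theta)
print(out.shape)
print(poincare_distance(np.zeros(2), np.array([0.5, 0.0])))
```

In this form each token's output mixes the features of all tokens that share a hyperedge with it, which is one way to read the "high-order semantic dependencies" the abstract mentions. The Poincaré distance grows rapidly near the boundary of the ball, a property often used to embed hierarchies.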

Related papers

AG-VAS: Anchor-Guided Zero-Shot Visual Anomaly Segmentation with Large Multimodal Models [21.682989096955467]
AG-VAS (Anchor-Guided Visual Anomaly Segmentation) is a new framework that expands the large multimodal model (LMM) vocabulary with three learnable semantic anchor tokens. AG-VAS achieves consistent state-of-the-art performance in the zero-shot setting.
arXiv Detail & Related papers (2026-03-01T22:25:23Z)
SSVP: Synergistic Semantic-Visual Prompting for Industrial Zero-Shot Anomaly Detection [55.54007781679915]
We propose Synergistic Semantic-Visual Prompting (SSVP), which efficiently fuses diverse visual encodings to elevate the model's fine-grained perception. SSVP achieves state-of-the-art performance with 93.0% Image-AUROC and 92.2% Pixel-AUROC on MVTec-AD, significantly outperforming existing zero-shot approaches.
arXiv Detail & Related papers (2026-01-14T04:42:19Z)
Multi-label Classification with Panoptic Context Aggregation Networks [61.82285737410154]
This paper introduces the Deep Panoptic Context Aggregation Network (PanCAN), a novel approach that hierarchically integrates multi-order geometric contexts. PanCAN learns multi-order neighborhood relationships at each scale by combining random walks with an attention mechanism. Experiments on the NUS-WIDE, PASCAL VOC 2007, and MS-COCO benchmarks demonstrate that PanCAN consistently achieves competitive results.
arXiv Detail & Related papers (2025-12-29T14:16:21Z)
HyperbolicRAG: Enhancing Retrieval-Augmented Generation with Hyperbolic Representations [11.678218711095269]
Graph-based RAG enables large language models to access external knowledge. We propose HyperbolicRAG, a retrieval framework that integrates hyperbolic geometry into graph-based RAG.
arXiv Detail & Related papers (2025-11-24T06:27:58Z)
The Finer the Better: Towards Granular-aware Open-set Domain Generalization [31.197204515055756]
Open-Set Domain Generalization tackles the realistic scenario where deployed models encounter both domain shifts and novel object categories. Existing methods still face a dilemma between the structural risk of known classes and the open-space risk from unknown classes. We propose a Semantic-enhanced CLIP framework that explicitly addresses this dilemma through fine-grained semantic enhancement.
arXiv Detail & Related papers (2025-11-21T06:19:19Z)
EnGraf-Net: Multiple Granularity Branch Network with Fine-Coarse Graft Grained for Classification Task [0.8299692647308321]
Fine-grained classification models are designed to focus on the relevant details necessary to distinguish highly similar classes. Part-based approaches, including automatic cropping methods, suffer from an incomplete representation of local features. We leverage semantic associations structured as a hierarchy (taxonomy) as supervised signals within an end-to-end deep neural network model, termed EnGraf-Net.
arXiv Detail & Related papers (2025-09-25T12:11:42Z)
Contrastive Prompt Clustering for Weakly Supervised Semantic Segmentation [41.065931555596975]
We propose Contrastive Prompt Clustering (CPC), a novel weakly supervised semantic segmentation (WSSS) framework. CPC exploits Large Language Models (LLMs) to derive category clusters that encode intrinsic inter-class relationships. Experiments on PASCAL VOC 2012 and MS COCO 2014 demonstrate that CPC surpasses existing state-of-the-art methods in WSSS.
arXiv Detail & Related papers (2025-08-23T12:49:08Z)
HVL: Semi-Supervised Segmentation leveraging Hierarchical Vision-Language Synergy with Dynamic Text-Spatial Query Alignment [16.926158907882012]
We propose a unified Vision-Language framework that integrates domain-invariant text embeddings as object queries in a transformer-based segmentation network. Our results show that language-guided segmentation bridges the label efficiency gap and enables new levels of fine-grained generalization.
arXiv Detail & Related papers (2025-06-16T19:05:33Z)
Split Matching for Inductive Zero-shot Semantic Segmentation [56.47556212515178]
Zero-shot Semantic Segmentation (ZSS) aims to segment categories that are not annotated during training. We propose Split Matching (SM), a novel assignment strategy that decouples Hungarian matching into two components. SM is the first to introduce decoupled Hungarian matching under the inductive ZSS setting, and achieves state-of-the-art performance on two standard benchmarks.
arXiv Detail & Related papers (2025-05-08T07:56:30Z)
Category-Adaptive Cross-Modal Semantic Refinement and Transfer for Open-Vocabulary Multi-Label Recognition [59.203152078315235]
We propose a novel category-adaptive cross-modal semantic refinement and transfer (C$^2$SRT) framework to explore the semantic correlation. The proposed framework consists of two complementary modules: an intra-category semantic refinement (ISR) module and an inter-category semantic transfer (IST) module. Experiments on open-vocabulary multi-label recognition (OV-MLR) benchmarks clearly demonstrate that the proposed C$^2$SRT framework outperforms current state-of-the-art algorithms.
arXiv Detail & Related papers (2024-12-09T04:00:18Z)
Data-free Knowledge Distillation for Fine-grained Visual Categorization [9.969720644789781]
We propose an approach called DFKD-FGVC that extends data-free knowledge distillation (DFKD) to fine-grained visual categorization (FGVC) tasks. We evaluate our approach on three widely-used FGVC benchmarks (Aircraft, Cars196, and CUB200) and demonstrate its superior performance.
arXiv Detail & Related papers (2024-04-18T09:44:56Z)
Channel DropBlock: An Improved Regularization Method for Fine-Grained Visual Classification [58.07257910065007]
Existing approaches mainly tackle fine-grained visual classification by introducing attention mechanisms to locate the discriminative parts or feature encoding approaches to extract highly parameterized features in a weakly-supervised fashion. In this work, we propose a lightweight yet effective regularization method named Channel DropBlock (CDB), in combination with two alternative correlation metrics, to address this problem.
arXiv Detail & Related papers (2021-06-07T09:03:02Z)
Towards Uncovering the Intrinsic Data Structures for Unsupervised Domain Adaptation using Structurally Regularized Deep Clustering [119.88565565454378]
Unsupervised domain adaptation (UDA) aims to learn classification models that make predictions for unlabeled data on a target domain. We propose a hybrid model of Structurally Regularized Deep Clustering (H-SRDC), which integrates the regularized discriminative clustering of target data with a generative one. Our proposed H-SRDC outperforms all the existing methods under both the inductive and transductive settings.
arXiv Detail & Related papers (2020-12-08T08:52:00Z)

This list is automatically generated from the titles and abstracts of the papers on this site.