Geodesic Prototype Matching via Diffusion Maps for Interpretable Fine-Grained Recognition
- URL: http://arxiv.org/abs/2509.17050v1
- Date: Sun, 21 Sep 2025 12:15:04 GMT
- Title: Geodesic Prototype Matching via Diffusion Maps for Interpretable Fine-Grained Recognition
- Authors: Junhao Jia, Yunyou Liu, Yifei Sun, Huangwei Chen, Feiwei Qin, Changmiao Wang, Yong Peng
- Abstract summary: We propose a novel paradigm for prototype-based recognition that anchors similarity within the intrinsic geometry of deep features. Specifically, we distill the latent manifold structure of each class into a diffusion space and introduce a differentiable Nyström interpolation. To ensure efficiency, we employ compact per-class landmark sets with periodic updates.
- Score: 8.300049635963141
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Nonlinear manifolds are widespread in deep visual features, where Euclidean distances often fail to capture true similarity. This limitation becomes particularly severe in prototype-based interpretable fine-grained recognition, where subtle semantic distinctions are essential. To address this challenge, we propose a novel paradigm for prototype-based recognition that anchors similarity within the intrinsic geometry of deep features. Specifically, we distill the latent manifold structure of each class into a diffusion space and introduce a differentiable Nyström interpolation, making the geometry accessible to both unseen samples and learnable prototypes. To ensure efficiency, we employ compact per-class landmark sets with periodic updates. This design keeps the embedding aligned with the evolving backbone, enabling fast and scalable inference. Extensive experiments on the CUB-200-2011 and Stanford Cars datasets show that our GeoProto framework produces prototypes focusing on semantically aligned parts, significantly outperforming Euclidean prototype networks.
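The two core ingredients the abstract names, a per-class diffusion-map embedding over a compact landmark set and a Nyström extension that places unseen samples (or learnable prototypes) into that space, can be illustrated with a minimal numpy sketch. This is an assumption-laden toy version of the general technique, not the paper's implementation: the kernel bandwidth `eps`, the diffusion time `t`, and the plain Gaussian kernel are all illustrative choices.

```python
import numpy as np

def diffusion_map(landmarks, eps, n_components=2, t=1):
    """Embed a class's landmark features into a diffusion space."""
    # Gaussian affinities between all landmark pairs
    d2 = np.sum((landmarks[:, None] - landmarks[None, :]) ** 2, axis=-1)
    K = np.exp(-d2 / eps)
    # Row-normalise into a Markov transition matrix
    P = K / K.sum(axis=1, keepdims=True)
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)
    vals, vecs = vals.real[order], vecs.real[:, order]
    # Drop the trivial constant eigenvector (eigenvalue 1)
    vals, vecs = vals[1:n_components + 1], vecs[:, 1:n_components + 1]
    psi = vecs * vals ** t  # diffusion coordinates of the landmarks
    return psi, vals, vecs

def nystrom_extend(x, landmarks, eps, vals, vecs, t=1):
    """Nystrom extension: embed an unseen feature via its affinities to landmarks."""
    k = np.exp(-np.sum((landmarks - x) ** 2, axis=-1) / eps)
    p = k / k.sum()
    # Interpolate each eigenvector, then rescale by lambda^t
    return (p @ vecs) / vals * vals ** t
```

Because the extension interpolates the landmark eigenvectors, applying it to a landmark itself reproduces that landmark's own diffusion coordinates; in a differentiable framework both functions would be expressed in an autodiff library so gradients flow to the prototypes.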
Related papers
- Divide, Conquer and Unite: Hierarchical Style-Recalibrated Prototype Alignment for Federated Medical Image Segmentation [66.82598255715696]
Federated learning enables multiple medical institutions to train a global model without sharing data. Current approaches primarily focus on final-layer features, overlooking critical multi-level cues. We propose FedBCS to bridge feature representation gaps via domain-invariant contextual prototype alignment.
arXiv Detail & Related papers (2025-11-14T04:15:34Z)
- Few to Big: Prototype Expansion Network via Diffusion Learner for Point Cloud Few-shot Semantic Segmentation [12.971351926107289]
Prototype Expansion Network (PENet) is a framework that constructs big-capacity prototypes from two annotated feature sources. PENet significantly outperforms state-of-the-art methods across various few-shot settings.
arXiv Detail & Related papers (2025-09-16T09:29:46Z)
- Rethinking Features-Fused-Pyramid-Neck for Object Detection [0.0]
This paper presents an independent hierarchy pyramid (IHP) architecture to evaluate the effectiveness of the features-unfused pyramid neck for multi-head detectors. We also introduce soft nearest neighbor (SNI) with a weight downscaling factor to mitigate the impact of feature fusion at different hierarchies. These advancements culminate in our secondary features alignment solution (SA) for real-time detection, achieving state-of-the-art results on Pascal and MS.
arXiv Detail & Related papers (2025-05-19T08:01:11Z)
- GSSF: Generalized Structural Sparse Function for Deep Cross-modal Metric Learning [51.677086019209554]
We propose a Generalized Structural Sparse Function to capture powerful relationships across modalities for pair-wise similarity learning.
The distance metric combines two forms of terms, diagonal and block-diagonal.
Experiments on cross-modal and two extra uni-modal retrieval tasks have validated its superiority and flexibility.
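To make the "diagonal and block-diagonal terms" concrete, here is a generic, hypothetical sketch of a structured distance of that shape: a weighted diagonal (per-dimension) quadratic term plus Mahalanobis-style terms over feature blocks. This is a plain illustration of the general idea, not the GSSF formulation from the paper; `diag_w` and `blocks` are invented names for this sketch.

```python
import numpy as np

def structured_distance(u, v, diag_w, blocks):
    """Distance with a diagonal term plus block-diagonal terms.

    diag_w : per-dimension non-negative weights (diagonal part)
    blocks : list of (index_slice, PSD matrix) pairs (block-diagonal part)
    """
    delta = u - v
    dist2 = np.sum(diag_w * delta ** 2)  # diagonal term
    for sl, B in blocks:
        d = delta[sl]
        dist2 += d @ B @ d  # block-diagonal term over one feature group
    return np.sqrt(dist2)
```

With unit diagonal weights and no blocks this reduces to the ordinary Euclidean distance; adding positive semi-definite blocks lets correlated feature groups contribute jointly.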
arXiv Detail & Related papers (2024-10-20T03:45:50Z)
- Beyond Prototypes: Semantic Anchor Regularization for Better Representation Learning [82.29761875805369]
One of the ultimate goals of representation learning is to achieve compactness within a class and well-separability between classes.
We propose a novel perspective: pre-defined class anchors serve as feature centroids that unidirectionally guide feature learning.
The proposed Semantic Anchor Regularization (SAR) can be used in a plug-and-play manner in the existing models.
arXiv Detail & Related papers (2023-12-19T05:52:38Z)
- Engineering the Neural Collapse Geometry of Supervised-Contrastive Loss [28.529476019629097]
Supervised-contrastive loss (SCL) is an alternative to cross-entropy (CE) for classification tasks.
We propose methods to engineer the geometry of learnt feature embeddings by modifying the contrastive loss.
arXiv Detail & Related papers (2023-10-02T04:23:17Z)
- VTAE: Variational Transformer Autoencoder with Manifolds Learning [144.0546653941249]
Deep generative models have demonstrated successful applications in learning non-linear data distributions through a number of latent variables.
The nonlinearity of the generator implies that the latent space shows an unsatisfactory projection of the data space, which results in poor representation learning.
We show that geodesics and accurate computation can substantially improve the performance of deep generative models.
arXiv Detail & Related papers (2023-04-03T13:13:19Z)
- ContraFeat: Contrasting Deep Features for Semantic Discovery [102.4163768995288]
StyleGAN has shown strong potential for disentangled semantic control.
Existing semantic discovery methods on StyleGAN rely on manual selection of modified latent layers to obtain satisfactory manipulation results.
We propose a model that automates this process and achieves state-of-the-art semantic discovery performance.
arXiv Detail & Related papers (2022-12-14T15:22:13Z)
- Semi-Supervised Manifold Learning with Complexity Decoupled Chart Autoencoders [45.29194877564103]
This work introduces a chart autoencoder with an asymmetric encoding-decoding process that can incorporate additional semi-supervised information such as class labels.
We discuss the approximation power of such networks and derive a bound that essentially depends on the intrinsic dimension of the data manifold rather than the dimension of ambient space.
arXiv Detail & Related papers (2022-08-22T19:58:03Z)
- Quadric hypersurface intersection for manifold learning in feature space [52.83976795260532]
This is a manifold learning technique suitable for moderately high dimensions and large datasets.
The manifold is learned from the training data in the form of an intersection of quadric hypersurfaces.
At test time, this manifold can be used to introduce an outlier score for arbitrary new points.
arXiv Detail & Related papers (2021-02-11T18:52:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.