Related papers: Unified Knowledge Distillation Framework: Fine-Grained Alignment and Geometric Relationship Preservation for Deep Face Recognition

Unified Knowledge Distillation Framework: Fine-Grained Alignment and Geometric Relationship Preservation for Deep Face Recognition

URL: http://arxiv.org/abs/2508.11376v1
Date: Fri, 15 Aug 2025 10:20:29 GMT
Title: Unified Knowledge Distillation Framework: Fine-Grained Alignment and Geometric Relationship Preservation for Deep Face Recognition
Authors: Durgesh Mishra, Rishabh Uikey,
Abstract summary: We propose a unified approach that integrates two novel loss functions, Instance-Level Embedding Distillation and Relation-Based Pairwise Similarity Distillation.<n>Our framework outperforms state-of-the-art distillation methods across multiple benchmark face recognition datasets.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Knowledge Distillation is crucial for optimizing face recognition models for deployment in computationally limited settings, such as edge devices. Traditional KD methods, such as Raw L2 Feature Distillation or Feature Consistency loss, often fail to capture both fine-grained instance-level details and complex relational structures, leading to suboptimal performance. We propose a unified approach that integrates two novel loss functions, Instance-Level Embedding Distillation and Relation-Based Pairwise Similarity Distillation. Instance-Level Embedding Distillation focuses on aligning individual feature embeddings by leveraging a dynamic hard mining strategy, thereby enhancing learning from challenging examples. Relation-Based Pairwise Similarity Distillation captures relational information through pairwise similarity relationships, employing a memory bank mechanism and a sample mining strategy. This unified framework ensures both effective instance-level alignment and preservation of geometric relationships between samples, leading to a more comprehensive distillation process. Our unified framework outperforms state-of-the-art distillation methods across multiple benchmark face recognition datasets, as demonstrated by extensive experimental evaluations. Interestingly, when using strong teacher networks compared to the student, our unified KD enables the student to even surpass the teacher's accuracy.

Related papers

Understanding Catastrophic Interference: On the Identifibility of Latent Representations [67.05452287233122]
Catastrophic interference, also known as catastrophic forgetting, is a fundamental challenge in machine learning.<n>We propose a novel theoretical framework that formulates catastrophic interference as an identification problem.<n>Our approach provides both theoretical guarantees and practical performance improvements across both synthetic and benchmark datasets.
arXiv Detail & Related papers (2025-09-27T00:53:32Z)
Local Dense Logit Relations for Enhanced Knowledge Distillation [12.350115738581223]
Local Logit Distillation captures inter-class relationships and recombining logit information.<n>We introduce an Adaptive Decay Weight (ADW) strategy, which can dynamically adjust the weights for critical category pairs.<n>Our method improves the student's performance by transferring fine-grained knowledge and emphasizing the most critical relationships.
arXiv Detail & Related papers (2025-07-21T16:25:38Z)
Cross-View Consistency Regularisation for Knowledge Distillation [13.918476599394603]
This work is inspired by the success of cross-view learning in fields such as semi-supervised learning.<n>We introduce within-view and cross-view regularisations to standard logit-based distillation frameworks.<n>We also perform confidence-based soft label mining to improve the quality of distilling signals from the teacher.
arXiv Detail & Related papers (2024-12-21T05:41:47Z)
On Distilling the Displacement Knowledge for Few-Shot Class-Incremental Learning [17.819582979803286]
Few-shot Class-Incremental Learning (FSCIL) addresses the challenges of evolving data distributions and the difficulty of data acquisition in real-world scenarios.<n>To counteract the catastrophic forgetting typically encountered in FSCIL, knowledge distillation is employed as a way to maintain the knowledge from learned data distribution.
arXiv Detail & Related papers (2024-12-15T02:10:18Z)
Deep Boosting Learning: A Brand-new Cooperative Approach for Image-Text Matching [53.05954114863596]
We propose a brand-new Deep Boosting Learning (DBL) algorithm for image-text matching. An anchor branch is first trained to provide insights into the data properties. A target branch is concurrently tasked with more adaptive margin constraints to further enlarge the relative distance between matched and unmatched samples.
arXiv Detail & Related papers (2024-04-28T08:44:28Z)
Learning to Maximize Mutual Information for Chain-of-Thought Distillation [13.660167848386806]
Distilling Step-by-Step(DSS) has demonstrated promise by imbuing smaller models with the superior reasoning capabilities of their larger counterparts. However, DSS overlooks the intrinsic relationship between the two training tasks, leading to ineffective integration of CoT knowledge with the task of label prediction. We propose a variational approach to solve this problem using a learning-based method.
arXiv Detail & Related papers (2024-03-05T22:21:45Z)
CORSD: Class-Oriented Relational Self Distillation [16.11986532440837]
Knowledge distillation conducts an effective model compression method while holding some limitations. We propose a novel training framework named Class-Oriented Self Distillation (CORSD) to address the limitations.
arXiv Detail & Related papers (2023-04-28T16:00:31Z)
EmbedDistill: A Geometric Knowledge Distillation for Information Retrieval [83.79667141681418]
Large neural models (such as Transformers) achieve state-of-the-art performance for information retrieval (IR) We propose a novel distillation approach that leverages the relative geometry among queries and documents learned by the large teacher model. We show that our approach successfully distills from both dual-encoder (DE) and cross-encoder (CE) teacher models to 1/10th size asymmetric students that can retain 95-97% of the teacher performance.
arXiv Detail & Related papers (2023-01-27T22:04:37Z)
Weakly Supervised Semantic Segmentation via Alternative Self-Dual Teaching [82.71578668091914]
This paper establishes a compact learning framework that embeds the classification and mask-refinement components into a unified deep model. We propose a novel alternative self-dual teaching (ASDT) mechanism to encourage high-quality knowledge interaction.
arXiv Detail & Related papers (2021-12-17T11:56:56Z)
Adaptive Hierarchical Similarity Metric Learning with Noisy Labels [138.41576366096137]
We propose an Adaptive Hierarchical Similarity Metric Learning method. It considers two noise-insensitive information, textiti.e., class-wise divergence and sample-wise consistency. Our method achieves state-of-the-art performance compared with current deep metric learning approaches.
arXiv Detail & Related papers (2021-10-29T02:12:18Z)
Knowledge Distillation Meets Self-Supervision [109.6400639148393]
Knowledge distillation involves extracting "dark knowledge" from a teacher network to guide the learning of a student network. We show that the seemingly different self-supervision task can serve as a simple yet powerful solution. By exploiting the similarity between those self-supervision signals as an auxiliary task, one can effectively transfer the hidden information from the teacher to the student.
arXiv Detail & Related papers (2020-06-12T12:18:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.