Cross-Architecture Distillation Made Simple with Redundancy Suppression
- URL: http://arxiv.org/abs/2507.21844v1
- Date: Tue, 29 Jul 2025 14:21:40 GMT
- Title: Cross-Architecture Distillation Made Simple with Redundancy Suppression
- Authors: Weijia Zhang, Yuehao Liu, Wu Ran, Chao Ma
- Abstract summary: We describe a simple method for cross-architecture knowledge distillation, where the knowledge transfer is cast into a redundant information suppression formulation. We propose to extract the architecture-agnostic knowledge in heterogeneous representations by reducing the redundant architecture-exclusive information. Our method is devoid of the architecture-specific designs and complex operations in the pioneering method of OFA.
- Score: 8.844066299737845
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We describe a simple method for cross-architecture knowledge distillation, where the knowledge transfer is cast into a redundant information suppression formulation. Existing methods introduce sophisticated modules, architecture-tailored designs, and excessive parameters, which impair their efficiency and applicability. We propose to extract the architecture-agnostic knowledge in heterogeneous representations by reducing the redundant architecture-exclusive information. To this end, we present a simple redundancy suppression distillation (RSD) loss, which comprises cross-architecture invariance maximisation and feature decorrelation objectives. To prevent the student from entirely losing its architecture-specific capabilities, we further design a lightweight module that decouples the RSD objective from the student's internal representations. Our method is devoid of the architecture-specific designs and complex operations in the pioneering method of OFA. It outperforms OFA on the CIFAR-100 and ImageNet-1k benchmarks with only a fraction of its parameter overhead, which highlights its potential as a simple and strong baseline for the cross-architecture distillation community.
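The abstract describes the RSD loss as a combination of cross-architecture invariance maximisation and feature decorrelation, decoupled from the student's internal representations through a lightweight module. The sketch below is a hedged, assumption-laden illustration in PyTorch of how such a loss could be structured; it is not the authors' implementation. The projection heads, the batch normalisation of embeddings, the cross-correlation formulation, and the weighting term `lambda_decorr` are all hypothetical choices made only to make the two objectives concrete.

```python
# Minimal sketch of a redundancy-suppression style distillation loss.
# All names (RSDLossSketch, projector dims, lambda_decorr) are illustrative
# assumptions, not the paper's actual code.
import torch
import torch.nn as nn


class RSDLossSketch(nn.Module):
    def __init__(self, student_dim: int, teacher_dim: int,
                 embed_dim: int = 256, lambda_decorr: float = 5e-3):
        super().__init__()
        # Lightweight projection heads that decouple the objective from the
        # student's internal representation (our reading of the abstract's
        # "lightweight module"; the actual design may differ).
        self.student_proj = nn.Linear(student_dim, embed_dim)
        self.teacher_proj = nn.Linear(teacher_dim, embed_dim)
        self.lambda_decorr = lambda_decorr

    def forward(self, f_student: torch.Tensor, f_teacher: torch.Tensor) -> torch.Tensor:
        # f_student: (batch, student_dim), f_teacher: (batch, teacher_dim)
        zs = self.student_proj(f_student)
        zt = self.teacher_proj(f_teacher.detach())  # no gradient into the teacher

        # Standardise each feature dimension across the batch.
        zs = (zs - zs.mean(0)) / (zs.std(0) + 1e-6)
        zt = (zt - zt.mean(0)) / (zt.std(0) + 1e-6)

        # Cross-correlation matrix between student and teacher embeddings.
        n = zs.size(0)
        c = (zs.T @ zt) / n  # (embed_dim, embed_dim)

        # Invariance maximisation: diagonal entries pushed towards 1,
        # i.e. agreement on architecture-agnostic knowledge.
        invariance = (torch.diagonal(c) - 1).pow(2).sum()

        # Redundancy suppression: off-diagonal correlations pushed towards 0,
        # i.e. architecture-exclusive, redundant information is reduced.
        off_diag = c - torch.diag(torch.diagonal(c))
        decorrelation = off_diag.pow(2).sum()

        return invariance + self.lambda_decorr * decorrelation


# Hypothetical usage with mismatched feature dimensions (e.g. CNN student,
# Transformer teacher); values are placeholders.
loss_fn = RSDLossSketch(student_dim=512, teacher_dim=768)
loss = loss_fn(torch.randn(32, 512), torch.randn(32, 768))
```

Under this reading, the diagonal term rewards cross-architecture agreement between projected teacher and student features, while the off-diagonal term suppresses redundant correlations; how the paper actually balances and attaches these objectives may differ.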
Related papers
- Large-Scale Model Enabled Semantic Communication Based on Robust Knowledge Distillation [53.16213723669751]
Large-scale models (LSMs) can be an effective framework for semantic representation and understanding. However, their direct deployment is often hindered by high computational complexity and resource requirements. This paper proposes a novel knowledge distillation based semantic communication framework.
arXiv Detail & Related papers (2025-08-04T07:47:18Z) - Empirical Evaluation of Knowledge Distillation from Transformers to Subquadratic Language Models [3.287942619833188]
We systematically evaluate the transferability of knowledge distillation from a Transformer teacher model to eight subquadratic student architectures. Our study investigates which subquadratic model can most effectively approximate the teacher model's learned representations through knowledge distillation.
arXiv Detail & Related papers (2025-04-19T17:49:52Z) - Any Image Restoration via Efficient Spatial-Frequency Degradation Adaptation [158.37640586809187]
Restoring any degraded image efficiently via just one model has become increasingly significant. Our approach, termed AnyIR, takes a unified path that leverages inherent similarity across various degradations. To fuse the degradation awareness and the contextualized attention, a spatial-frequency parallel fusion strategy is proposed.
arXiv Detail & Related papers (2025-04-19T09:54:46Z) - Relation Extraction with Instance-Adapted Predicate Descriptions [9.021267901894912]
Relation extraction plays a major role in downstream applications such as knowledge discovery and question answering. In this paper, we revisit fine-tuning such smaller models using a novel dual-encoder architecture with a joint contrastive and cross-entropy loss. Our approach achieved F1 score improvements ranging from 1% to 2% over state-of-the-art methods with a simple but elegant formulation.
arXiv Detail & Related papers (2025-03-22T15:36:41Z) - A Lightweight Deep Exclusion Unfolding Network for Single Image Reflection Removal [68.0573194557999]
Single Image Reflection Removal (SIRR) is a canonical blind source separation problem. We propose a novel Deep Exclusion unfolding Network (DExNet) for SIRR. DExNet is constructed by unfolding and parameterizing a simple iterative Sparse and Auxiliary Feature Update (i-SAFU) algorithm.
arXiv Detail & Related papers (2025-03-03T07:54:27Z) - An Architectural Approach to Enhance Deep Long-Tailed Learning [5.214135587370722]
We introduce Long-Tailed Differential Architecture Search (LTDAS). We conduct extensive experiments to explore architectural components that demonstrate better performance on long-tailed data. This ensures that the architecture obtained through our search process incorporates superior components.
arXiv Detail & Related papers (2024-11-09T07:19:56Z) - One-for-All: Bridge the Gap Between Heterogeneous Architectures in Knowledge Distillation [69.65734716679925]
Knowledge distillation has proven to be a highly effective approach for enhancing model performance through a teacher-student training scheme.
Most existing distillation methods are designed under the assumption that the teacher and student models belong to the same model family.
We propose a simple yet effective one-for-all KD framework called OFA-KD, which significantly improves the distillation performance between heterogeneous architectures.
arXiv Detail & Related papers (2023-10-30T11:13:02Z) - Heterogeneous Continual Learning [88.53038822561197]
We propose a novel framework to tackle the continual learning (CL) problem with changing network architectures.
We build on top of the distillation family of techniques and modify it to a new setting where a weaker model takes the role of a teacher.
We also propose Quick Deep Inversion (QDI) to recover prior task visual features to support knowledge transfer.
arXiv Detail & Related papers (2023-06-14T15:54:42Z) - $\alpha$ DARTS Once More: Enhancing Differentiable Architecture Search by Masked Image Modeling [25.75814720792934]
Differentiable architecture search (DARTS) has been a mainstream direction in automatic machine learning.
We propose to additionally inject semantic information by formulating a patch recovery approach.
Our method surpasses all previous DARTS variants and achieves state-of-the-art results on CIFAR-10, CIFAR-100, and ImageNet.
arXiv Detail & Related papers (2022-11-18T09:07:19Z) - Pixel Distillation: A New Knowledge Distillation Scheme for Low-Resolution Image Recognition [124.80263629921498]
We propose Pixel Distillation that extends knowledge distillation into the input level while simultaneously breaking architecture constraints.
Such a scheme can achieve flexible cost control for deployment, as it allows the system to adjust both network architecture and image quality according to the overall requirement of resources.
arXiv Detail & Related papers (2021-12-17T14:31:40Z) - Multi-Stage Progressive Image Restoration [167.6852235432918]
We propose a novel synergistic design that can optimally balance these competing goals.
Our main proposal is a multi-stage architecture that progressively learns restoration functions for the degraded inputs.
The resulting tightly interlinked multi-stage architecture, named MPRNet, delivers strong performance gains on ten datasets.
arXiv Detail & Related papers (2021-02-04T18:57:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.