Beyond Student: An Asymmetric Network for Neural Network Inheritance
- URL: http://arxiv.org/abs/2602.09509v2
- Date: Wed, 11 Feb 2026 01:41:57 GMT
- Title: Beyond Student: An Asymmetric Network for Neural Network Inheritance
- Authors: Yiyun Zhou, Jingwei Shi, Mingjing Xu, Zhonghua Jiang, Jingyuan Chen,
- Abstract summary: InherNet is a neural network inheritance method that performs asymmetric low-rank decomposition on the teacher's weights. Experimental results across unimodal and multimodal tasks demonstrate that InherNet achieves higher performance than student networks of similar parameter sizes.
- Score: 18.289627626976753
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Knowledge Distillation (KD) has emerged as a powerful technique for model compression, enabling lightweight student networks to benefit from the performance of redundant teacher networks. However, the inherent capacity gap often limits the performance of student networks. Inspired by the expressiveness of pretrained teacher networks, a compelling research question arises: is there a type of network that can not only inherit the teacher's structure but also maximize the inheritance of its knowledge? Furthermore, how does the performance of such an inheriting network compare to that of student networks, all benefiting from the same teacher network? To further explore this question, we propose InherNet, a neural network inheritance method that performs asymmetric low-rank decomposition on the teacher's weights and reconstructs a lightweight yet expressive network without significant architectural disruption. By leveraging Singular Value Decomposition (SVD) for initialization to ensure the inheritance of principal knowledge, InherNet effectively balances depth, width, and compression efficiency. Experimental results across unimodal and multimodal tasks demonstrate that InherNet achieves higher performance than student networks of similar parameter sizes. Our findings reveal a promising direction for future research in efficient model compression beyond traditional distillation.
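The abstract's core mechanism lends itself to a short sketch. Below is a minimal, hypothetical illustration of SVD-initialized low-rank factorization of a single linear layer, with the singular values absorbed asymmetrically into one factor; this is not the authors' code, and InherNet's actual decomposition and network reconstruction may differ.

```python
import torch
import torch.nn as nn

def inherit_lowrank(linear: nn.Linear, rank: int) -> nn.Sequential:
    # Teacher weight W has shape (out_features, in_features).
    W = linear.weight.data
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    down = nn.Linear(linear.in_features, rank, bias=False)
    up = nn.Linear(rank, linear.out_features, bias=linear.bias is not None)
    # One possible asymmetric split: the down-projection keeps pure right
    # singular directions; the up-projection absorbs the singular values,
    # so W ~= (U_r S_r)(V_r^T) with the top-`rank` modes inherited.
    down.weight.data = Vh[:rank, :].clone()
    up.weight.data = U[:, :rank] * S[:rank]
    if linear.bias is not None:
        up.bias.data = linear.bias.data.clone()
    return nn.Sequential(down, up)

# Replace a 1024x1024 teacher layer (~1.05M params) with rank-64 factors
# (~131k params) while keeping the principal singular directions.
teacher = nn.Linear(1024, 1024)
inherited = inherit_lowrank(teacher, rank=64)
x = torch.randn(8, 1024)
print(torch.dist(teacher(x), inherited(x)))  # residual from truncated modes
```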
Related papers
- Network Sparsity Unlocks the Scaling Potential of Deep Reinforcement Learning [57.3885832382455]
We show that introducing static network sparsity alone can unlock further scaling potential beyond dense counterparts with state-of-the-art architectures. Our analysis reveals that, in contrast to naively scaling up dense DRL networks, such sparse networks achieve higher parameter efficiency for network expressivity.
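As a toy illustration of static sparsity, a binary mask can be drawn once at initialization and never updated, as in the hypothetical layer below; the cited paper's actual sparsification scheme is an assumption here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StaticSparseLinear(nn.Linear):
    """Linear layer whose weights are masked by a fixed random binary mask
    drawn once at initialization and never updated."""
    def __init__(self, in_features, out_features, sparsity=0.9):
        super().__init__(in_features, out_features)
        mask = (torch.rand(out_features, in_features) > sparsity).float()
        self.register_buffer("mask", mask)   # static: excluded from training

    def forward(self, x):
        return F.linear(x, self.weight * self.mask, self.bias)

layer = StaticSparseLinear(256, 256, sparsity=0.9)  # ~10% of weights active
print(layer(torch.randn(4, 256)).shape)             # torch.Size([4, 256])
```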
arXiv Detail & Related papers (2025-06-20T17:54:24Z) - ProARD: progressive adversarial robustness distillation: provide wide range of robust students [1.529342790344802]
Adversarial Robustness Distillation (ARD) has emerged as an effective method to enhance the robustness of lightweight deep neural networks against adversarial attacks. Current approaches require training a new student network from scratch to meet specific constraints, leading to substantial computational costs and increased CO2 emissions. This paper proposes Progressive Adversarial Robustness Distillation (ProARD), enabling the efficient one-time training of a dynamic network.
arXiv Detail & Related papers (2025-06-09T11:39:25Z) - The impact of allocation strategies in subset learning on the expressive power of neural networks [0.0]
We investigate how different allocations of a fixed number of learnable weights influence the capacity of neural networks. We establish conditions under which allocations have maximal or minimal expressive power in linear recurrent neural networks and linear multilayer feedforward networks. Our results emphasize the critical role of strategically distributing learnable weights across the network, showing that a more widespread allocation generally enhances the network's expressive power.
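The paper studies such allocations theoretically; one hypothetical way to emulate "allocating" a learnable-weight budget in practice is to freeze the unallocated entries of each layer via gradient hooks, as in this illustrative sketch (the parameter-name keys and fractions are assumptions).

```python
import torch
import torch.nn as nn

def allocate_learnable(model: nn.Module, fractions: dict):
    """Make only a random fraction of each named parameter learnable by
    zeroing the gradient on the remaining (frozen) entries."""
    for name, p in model.named_parameters():
        frac = fractions.get(name, 1.0)
        mask = (torch.rand_like(p) < frac).float()
        p.register_hook(lambda g, m=mask: g * m)  # freeze unmasked entries

net = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
# Spread the budget widely: half the weights in each layer are trainable.
allocate_learnable(net, {"0.weight": 0.5, "2.weight": 0.5})
```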
arXiv Detail & Related papers (2025-02-10T09:43:43Z) - Teacher Encoder-Student Decoder Denoising Guided Segmentation Network for Anomaly Detection [15.545036112870841]
We propose a novel model named PFADSeg, which integrates a pre-trained teacher network, a denoising student network with multi-scale feature fusion, and a guided anomaly segmentation network into a unified framework. Evaluated on the MVTec AD dataset, PFADSeg achieves state-of-the-art results with an image-level AUC of 98.9%, a pixel-level mean precision of 76.4%, and an instance-level mean precision of 78.7%.
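A generic teacher-student anomaly-scoring sketch conveys the underlying idea: where the student fails to reproduce the teacher's features, an anomaly is likely. This is not PFADSeg's exact multi-scale fusion or guided segmentation head.

```python
import torch
import torch.nn.functional as F

def anomaly_map(feat_teacher, feat_student, out_size=(256, 256)):
    """Per-pixel anomaly score as 1 - cosine similarity between teacher
    and student feature maps, upsampled to image resolution."""
    score = 1.0 - F.cosine_similarity(feat_teacher, feat_student, dim=1)
    return F.interpolate(score.unsqueeze(1), size=out_size,
                         mode="bilinear", align_corners=False).squeeze(1)

t = torch.randn(2, 128, 32, 32)    # teacher features (B, C, H, W)
s = t + 0.1 * torch.randn_like(t)  # student approximates the teacher
print(anomaly_map(t, s).shape)     # torch.Size([2, 256, 256])
```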
arXiv Detail & Related papers (2025-01-21T12:55:04Z) - Adaptive Teaching with Shared Classifier for Knowledge Distillation [6.03477652126575]
Knowledge distillation (KD) is a technique used to transfer knowledge from a teacher network to a student network.
We propose adaptive teaching with a shared classifier (ATSC).
Our approach achieves state-of-the-art results on the CIFAR-100 and ImageNet datasets in both single-teacher and multi-teacher scenarios.
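A minimal sketch of distilling through one shared classifier head follows, assuming the head is the teacher's and kept frozen; ATSC's adaptive-teaching components are omitted, so this illustrates the shared-classifier idea only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def shared_head_kd_loss(student_feats, teacher_feats, head, labels,
                        T=4.0, alpha=0.5):
    """Distillation where student and teacher logits both come from a
    single shared classifier head applied to their features."""
    with torch.no_grad():
        t_logits = head(teacher_feats)          # teacher side: no gradient
    s_logits = head(student_feats)
    kd = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                  F.softmax(t_logits / T, dim=1),
                  reduction="batchmean") * T * T
    return alpha * kd + (1 - alpha) * F.cross_entropy(s_logits, labels)

head = nn.Linear(512, 100)                      # shared classifier
for p in head.parameters():
    p.requires_grad_(False)                     # keep the head fixed
loss = shared_head_kd_loss(torch.randn(8, 512), torch.randn(8, 512),
                           head, torch.randint(0, 100, (8,)))
print(loss.item())
```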
arXiv Detail & Related papers (2024-06-12T08:51:08Z) - PeerAiD: Improving Adversarial Distillation from a Specialized Peer Tutor [6.089685202183291]
Adversarial robustness of neural networks is a significant concern when they are applied to security-critical domains.
Previous works pretrain the teacher network to make it robust against the adversarial examples aimed at itself.
We propose PeerAiD to make a peer network learn the adversarial examples of the student network instead of adversarial examples aimed at itself.
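The distinction is where the adversarial examples come from. A minimal FGSM sketch of attacking the student (whose adversarial examples the peer then learns) is shown below; PeerAiD's actual attack and training objectives are richer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fgsm_on_student(student, x, y, eps=8 / 255):
    """Craft adversarial examples against the student; the peer network is
    then trained on these, rather than on attacks aimed at itself."""
    x_adv = x.clone().requires_grad_(True)
    loss = F.cross_entropy(student(x_adv), y)
    grad, = torch.autograd.grad(loss, x_adv)
    return (x_adv + eps * grad.sign()).clamp(0, 1).detach()

student = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x, y = torch.rand(4, 3, 32, 32), torch.randint(0, 10, (4,))
x_adv = fgsm_on_student(student, x, y)
# Peer update (sketch): F.cross_entropy(peer(x_adv), y).backward()
```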
arXiv Detail & Related papers (2024-03-11T12:36:14Z) - Continual Learning: Forget-free Winning Subnetworks for Video Representations [75.40220771931132]
A Winning Subnetwork (WSN), selected in terms of task performance, is considered for various continual learning tasks. It leverages pre-existing weights from dense networks to achieve efficient learning in Task Incremental Learning (TIL) and Task-agnostic Incremental Learning (TaIL) scenarios. The use of a Fourier Subneural Operator (FSO) within WSN is considered for Video Incremental Learning (VIL).
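A hypothetical sketch of selecting a per-task subnetwork over frozen shared weights: keep only the entries with the highest importance scores. How WSN learns the scores, and its FSO component, are not shown.

```python
import torch

def winning_mask(weight, scores, keep=0.1):
    """Keep the top fraction of weights by importance score, yielding a
    task-specific subnetwork over otherwise frozen shared weights."""
    k = max(1, int(keep * scores.numel()))
    thresh = scores.flatten().topk(k).values.min()
    return torch.where(scores >= thresh, weight, torch.zeros_like(weight))

w = torch.randn(64, 64)            # frozen shared weights
s = torch.rand_like(w)             # stand-in for learned weight scores
sub = winning_mask(w, s, keep=0.1)
print((sub != 0).float().mean())   # ~0.1 of the weights remain active
```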
arXiv Detail & Related papers (2023-12-19T09:11:49Z) - Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural Networks [44.31729147722701]
We show that three-layer neural networks have provably richer feature learning capabilities than two-layer networks. This work makes progress towards understanding the provable benefit of three-layer neural networks over two-layer networks in the feature learning regime.
arXiv Detail & Related papers (2023-05-11T17:19:30Z) - Rank Diminishing in Deep Neural Networks [71.03777954670323]
The rank of a neural network measures the information flowing across its layers.
It is an instance of a key structural condition that applies across broad domains of machine learning.
For neural networks, however, the intrinsic mechanism that yields low-rank structures remains unclear.
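The rank-diminishing effect is easy to probe empirically: track the numerical rank of the feature matrix after each layer. This simple probe is an illustration, not the paper's more refined rank measures.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def feature_ranks(model, x, rtol=1e-3):
    """Numerical rank of the batch feature matrix after each layer of a
    sequential model."""
    ranks = []
    for layer in model:
        x = layer(x)
        ranks.append(int(torch.linalg.matrix_rank(x, rtol=rtol)))
    return ranks

net = nn.Sequential(nn.Linear(64, 64), nn.ReLU(),
                    nn.Linear(64, 8), nn.ReLU(), nn.Linear(8, 64))
print(feature_ranks(net, torch.randn(256, 64)))  # rank never recovers once lost
```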
arXiv Detail & Related papers (2022-06-13T12:03:32Z) - Excess Risk of Two-Layer ReLU Neural Networks in Teacher-Student Settings and its Superiority to Kernel Methods [58.44819696433327]
We investigate the risk of two-layer ReLU neural networks in a teacher regression model.
We find that the student network provably outperforms any solution obtained by kernel methods.
arXiv Detail & Related papers (2022-05-30T02:51:36Z) - Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective that represents a network as a complete graph for analysis.
By assigning learnable parameters to the edges which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and owns adaptability to larger search spaces and different tasks.
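A minimal sketch of differentiable connectivity learning: each node aggregates its predecessors' outputs through sigmoid-gated, learnable edge weights. The graph construction and node operations here are assumptions, not the cited paper's exact design.

```python
import torch
import torch.nn as nn

class LearnableDAGBlock(nn.Module):
    """Nodes of a complete DAG exchange features through edges whose
    magnitudes are learnable parameters, trained by backpropagation."""
    def __init__(self, n_nodes=4, dim=64):
        super().__init__()
        self.ops = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_nodes))
        # One learnable weight per directed edge i -> j (i < j).
        self.edges = nn.Parameter(torch.zeros(n_nodes, n_nodes))

    def forward(self, x):
        outs = [x]
        for j, op in enumerate(self.ops):
            gate = torch.sigmoid(self.edges[: j + 1, j])   # edges into node j
            agg = sum(g * h for g, h in zip(gate, outs))   # weighted fan-in
            outs.append(torch.relu(op(agg)))
        return outs[-1]

block = LearnableDAGBlock()
print(block(torch.randn(2, 64)).shape)  # torch.Size([2, 64])
```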
arXiv Detail & Related papers (2020-08-19T04:53:31Z) - Efficient Crowd Counting via Structured Knowledge Transfer [122.30417437707759]
Crowd counting is an application-oriented task and its inference efficiency is crucial for real-world applications.
We propose a novel Structured Knowledge Transfer framework to generate a lightweight but still highly effective student network.
Our models obtain at least 6.5$\times$ speed-up on an Nvidia 1080 GPU and even achieve state-of-the-art performance.
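At its simplest, such structured transfer can be sketched as matching the student's intermediate feature maps to the teacher's, layer by layer; the paper's intra- and inter-layer transfer terms are more specific than this generic loss.

```python
import torch
import torch.nn.functional as F

def transfer_loss(student_feats, teacher_feats):
    """Match the student's intermediate feature maps to the teacher's,
    layer by layer (teacher features are detached targets)."""
    return sum(F.mse_loss(s, t.detach())
               for s, t in zip(student_feats, teacher_feats))

t_feats = [torch.randn(2, 64, 32, 32), torch.randn(2, 128, 16, 16)]
s_feats = [f + 0.1 * torch.randn_like(f) for f in t_feats]
print(transfer_loss(s_feats, t_feats).item())
```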
arXiv Detail & Related papers (2020-03-23T08:05:41Z)