Boosting Residual Networks with Group Knowledge
- URL: http://arxiv.org/abs/2308.13772v2
- Date: Thu, 14 Dec 2023 07:04:39 GMT
- Title: Boosting Residual Networks with Group Knowledge
- Authors: Shengji Tang, Peng Ye, Baopu Li, Weihao Lin, Tao Chen, Tong He, Chong
Yu, Wanli Ouyang
- Abstract summary: Recent research interprets residual networks from a new perspective, as an implicit ensemble model.
Previous methods such as stochastic depth and stimulative training have further improved the performance of the residual network by sampling and training its subnets.
We propose a group knowledge based training framework for boosting the performance of residual networks.
- Score: 75.73793561417702
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent research interprets residual networks from a new perspective, as
an implicit ensemble model. From this view, previous methods such as
stochastic depth and stimulative training have further improved the performance
of the residual network by sampling and training its subnets. However, they
both use the same supervision for all subnets of different capacities and
neglect the valuable knowledge generated by subnets during training. In this
manuscript, we mitigate the significant knowledge distillation gap caused by
using the same kind of supervision and advocate leveraging the subnets to
provide diverse knowledge. Based on this motivation, we propose a group
knowledge based training framework for boosting the performance of residual
networks. Specifically, we implicitly divide all subnets into hierarchical
groups by subnet-in-subnet sampling, aggregate the knowledge of different
subnets in each group during training, and exploit upper-level group knowledge
to supervise lower-level subnet groups. Meanwhile, we also develop a subnet
sampling strategy that naturally samples larger subnets, which are found to be
more helpful than smaller subnets in boosting performance for hierarchical
groups. Compared with typical subnet training and other methods, our method
achieves the best efficiency and performance trade-offs on multiple datasets
and network structures. The code is at https://github.com/tsj-001/AAAI24-GKT.
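To make the described framework concrete, the following is a minimal sketch of the group knowledge idea as read from the abstract: nested (subnet-in-subnet) sampling yields hierarchical subnet levels, the logits of upper-level subnets are aggregated as group knowledge, and that aggregate supervises the subnet one level below through a distillation loss. The model, sampling routine, and loss settings (ToyResNet, sample_nested_masks, temperature, kd_weight) are illustrative assumptions, not the authors' released implementation; see the repository linked above for that.

```python
# Minimal illustrative sketch of hierarchical "group knowledge" supervision for a
# residual network, based only on the abstract above. All module/function names and
# hyperparameters are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyResNet(nn.Module):
    """A tiny residual MLP whose residual blocks can be skipped via a binary mask."""

    def __init__(self, dim=64, num_blocks=6, num_classes=10):
        super().__init__()
        self.stem = nn.Linear(dim, dim)
        self.blocks = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
            for _ in range(num_blocks)
        ])
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x, mask=None):
        h = self.stem(x)
        for i, block in enumerate(self.blocks):
            if mask is None or mask[i]:
                h = h + block(h)  # keep this residual branch
            # if the block is masked out, only the identity path is used
        return self.head(h)


def sample_nested_masks(num_blocks, levels=3):
    """Subnet-in-subnet sampling: each lower-level mask is a subset of the one above,
    so upper levels naturally correspond to larger subnets."""
    masks = []
    current = torch.ones(num_blocks, dtype=torch.bool)
    masks.append(current.clone())  # level 0: the full network
    for _ in range(levels - 1):
        keep = current.nonzero().flatten()
        drop = keep[torch.randint(len(keep), (1,))]  # drop one more residual block
        current = current.clone()
        current[drop] = False
        masks.append(current.clone())
    return masks  # ordered from largest (upper level) to smallest (lower level)


def train_step(model, x, y, optimizer, temperature=4.0, kd_weight=1.0):
    optimizer.zero_grad()
    masks = sample_nested_masks(len(model.blocks), levels=3)
    logits = [model(x, m) for m in masks]

    # Hard-label loss for every sampled subnet (including the full network).
    loss = sum(F.cross_entropy(z, y) for z in logits)

    # Group knowledge: aggregate the logits of all subnets above a level and use the
    # aggregate as a soft target for the subnet at that level.
    for level in range(1, len(logits)):
        group_knowledge = torch.stack(logits[:level]).mean(dim=0).detach()
        soft_target = F.softmax(group_knowledge / temperature, dim=1)
        log_student = F.log_softmax(logits[level] / temperature, dim=1)
        loss = loss + kd_weight * temperature ** 2 * F.kl_div(
            log_student, soft_target, reduction="batchmean")
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    torch.manual_seed(0)
    model = ToyResNet()
    opt = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9)
    x, y = torch.randn(32, 64), torch.randint(0, 10, (32,))
    for step in range(5):
        print(f"step {step}: loss = {train_step(model, x, y, opt):.3f}")
```

In the released code, group formation and the sampling strategy biased toward larger subnets are presumably more elaborate; this sketch only fixes the overall structure of the training loop.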
Related papers
- PSE-Net: Channel Pruning for Convolutional Neural Networks with Parallel-subnets Estimator [16.698190973547362]
We introduce PSE-Net, a novel parallel-subnets estimator for efficient channel pruning.
Our proposed algorithm facilitates the efficiency of supernet training.
We develop a prior-distribution-based sampling algorithm to boost the performance of classical evolutionary search.
arXiv Detail & Related papers (2024-08-29T03:20:43Z) - Neural Subnetwork Ensembles [2.44755919161855]
This dissertation introduces and formalizes a low-cost framework for constructing Subnetwork Ensembles.
Child networks are formed by sampling, perturbing, and optimizing subnetworks from a trained parent model (see the sketch after this list).
Our findings reveal that this approach can greatly improve training efficiency, parametric utilization, and generalization performance.
arXiv Detail & Related papers (2023-11-23T17:01:16Z) - Deep Image Clustering with Contrastive Learning and Multi-scale Graph
Convolutional Networks [58.868899595936476]
This paper presents a new deep clustering approach termed Image Clustering with Contrastive Learning and Multi-scale Graph Convolutional Networks (IcicleGCN).
Experiments on multiple image datasets demonstrate the superior clustering performance of IcicleGCN over the state-of-the-art.
arXiv Detail & Related papers (2022-07-14T19:16:56Z) - DeepCluE: Enhanced Image Clustering via Multi-layer Ensembles in Deep
Neural Networks [53.88811980967342]
This paper presents a Deep Clustering via Ensembles (DeepCluE) approach.
It bridges the gap between deep clustering and ensemble clustering by harnessing the power of multiple layers in deep neural networks.
Experimental results on six image datasets confirm the advantages of DeepCluE over the state-of-the-art deep clustering approaches.
arXiv Detail & Related papers (2022-06-01T09:51:38Z) - Deep Embedded Clustering with Distribution Consistency Preservation for
Attributed Networks [15.895606627146291]
In this study, we propose an end-to-end deep embedded clustering model for attributed networks.
It utilizes a graph autoencoder and a node attribute autoencoder to learn node representations and cluster assignments, respectively.
The proposed model achieves significantly better or competitive performance compared with the state-of-the-art methods.
arXiv Detail & Related papers (2022-05-28T02:35:34Z) - Prioritized Subnet Sampling for Resource-Adaptive Supernet Training [136.6591624918964]
We propose Prioritized Subnet Sampling to train a resource-adaptive supernet, termed PSS-Net.
Experiments on ImageNet using MobileNetV1/V2 show that our PSS-Net can well outperform state-of-the-art resource-adaptive supernets.
arXiv Detail & Related papers (2021-09-12T04:43:51Z) - Embedded Knowledge Distillation in Depth-level Dynamic Neural Network [8.207403859762044]
We propose an elegant Depth-level Dynamic Neural Network (DDNN) integrating different-depth sub-nets of similar architectures.
In this article, we design the Embedded-Knowledge-Distillation (EKD) training mechanism for the DDNN to implement semantic knowledge transfer from the teacher (full) net to multiple sub-nets.
Experiments on the CIFAR-10, CIFAR-100, and ImageNet datasets demonstrate that sub-nets in DDNN with EKD training achieve better performance than depth-level pruning or individual training.
arXiv Detail & Related papers (2021-03-01T06:35:31Z) - Fitting the Search Space of Weight-sharing NAS with Graph Convolutional
Networks [100.14670789581811]
We train a graph convolutional network to fit the performance of sampled sub-networks.
With this strategy, we achieve a higher rank correlation coefficient in the selected set of candidates.
arXiv Detail & Related papers (2020-04-17T19:12:39Z) - Data-Free Knowledge Amalgamation via Group-Stack Dual-GAN [80.17705319689139]
We propose a data-free knowledge amalgamation strategy to craft a well-behaved multi-task student network from multiple single/multi-task teachers.
Without any training data, the proposed method achieves surprisingly competitive results, even compared with some fully supervised methods.
arXiv Detail & Related papers (2020-03-20T03:20:52Z)