Be Your Own Best Competitor! Multi-Branched Adversarial Knowledge
Transfer
- URL: http://arxiv.org/abs/2010.04516v1
- Date: Fri, 9 Oct 2020 11:57:45 GMT
- Title: Be Your Own Best Competitor! Multi-Branched Adversarial Knowledge
Transfer
- Authors: Mahdi Ghorbani, Fahimeh Fooladgar, Shohreh Kasaei
- Abstract summary: The proposed method is applied to both lightweight image classification and encoder-decoder architectures to boost the performance of small, compact models without incurring extra computational overhead at inference.
The results show that the proposed model achieves significant improvements over earlier self-distillation methods.
- Score: 15.499267533387039
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural network architectures have attained remarkable improvements in
scene understanding tasks, yet deploying an efficient model is one of the most
important constraints for resource-limited devices. Recently, several
compression methods have been proposed to diminish the heavy computational
burden and memory consumption. Among them, pruning and quantization methods
exhibit a critical drop in performance when the model parameters are compressed,
whereas knowledge distillation methods improve the performance of compact
models by training lightweight networks under the supervision of cumbersome
networks. In the proposed method, knowledge distillation is performed within
the network itself by constructing multiple branches over the primary stream of
the model, an approach known as self-distillation. The resulting ensemble of
sub-networks transfers knowledge among its members through knowledge
distillation policies as well as an adversarial learning strategy: the ensemble
of sub-models is trained adversarially against a discriminator model, and
knowledge is additionally transferred within the ensemble by four different
loss functions. The proposed method is applied to both lightweight image
classification and encoder-decoder architectures to boost the performance of
small, compact models without incurring extra computational overhead at
inference. Extensive experimental results on challenging benchmark datasets
show that the proposed network outperforms the primary model in accuracy at the
same number of parameters and computational cost, and achieves significant
improvements over earlier self-distillation methods. The effectiveness of the
proposed approach is also demonstrated on the encoder-decoder model.
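A minimal PyTorch-style sketch of the branched self-distillation idea described in the abstract. The abstract does not spell out the four loss functions, the branch topology, or the discriminator design, so the choices below (per-head cross-entropy, soft-label KL distillation, feature matching, and an adversarial term against a logit-level discriminator) are illustrative assumptions; every class, function, and hyperparameter name here is hypothetical and not the authors' code.

```python
# Illustrative sketch: multi-branch self-distillation with an adversarial
# discriminator. Loss choices and architecture are assumptions, not the paper's.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BranchedClassifier(nn.Module):
    """A primary stream with an auxiliary branch attached to an intermediate stage."""
    def __init__(self, num_classes=10, width=64):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, width, 3, padding=1), nn.ReLU(),
                                    nn.AdaptiveAvgPool2d(8))
        self.stage2 = nn.Sequential(nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
                                    nn.AdaptiveAvgPool2d(4))
        # Auxiliary branch: early exit from stage1 features.
        self.branch_head = nn.Sequential(nn.Flatten(), nn.Linear(width * 8 * 8, num_classes))
        # Primary (deepest) head.
        self.main_head = nn.Sequential(nn.Flatten(), nn.Linear(width * 4 * 4, num_classes))

    def forward(self, x):
        f1 = self.stage1(x)
        f2 = self.stage2(f1)
        return self.branch_head(f1), self.main_head(f2), f1, f2

class Discriminator(nn.Module):
    """Tries to tell which head produced a given class-probability vector."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(num_classes, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, probs):
        return self.net(probs)

def train_step(model, disc, opt_m, opt_d, x, y, T=3.0):
    branch_logits, main_logits, f1, f2 = model(x)

    # 1) Supervised cross-entropy on every head.
    loss_ce = F.cross_entropy(main_logits, y) + F.cross_entropy(branch_logits, y)
    # 2) Soft-label distillation: the branch mimics the deepest head.
    loss_kd = F.kl_div(F.log_softmax(branch_logits / T, dim=1),
                       F.softmax(main_logits.detach() / T, dim=1),
                       reduction="batchmean") * T * T
    # 3) Feature matching between pooled intermediate and final features.
    loss_feat = F.mse_loss(F.adaptive_avg_pool2d(f1, 1),
                           F.adaptive_avg_pool2d(f2, 1).detach())
    # 4) Adversarial term: branch outputs should fool the discriminator.
    d_branch = disc(F.softmax(branch_logits, dim=1))
    loss_adv = F.binary_cross_entropy_with_logits(d_branch, torch.ones_like(d_branch))

    opt_m.zero_grad()
    (loss_ce + loss_kd + 0.1 * loss_feat + 0.1 * loss_adv).backward()
    opt_m.step()

    # Discriminator update: main head counts as "real", the branch as "fake".
    d_real = disc(F.softmax(main_logits.detach(), dim=1))
    d_fake = disc(F.softmax(branch_logits.detach(), dim=1))
    loss_d = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) +
              F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

# Usage (illustrative):
# model, disc = BranchedClassifier(), Discriminator()
# opt_m = torch.optim.SGD(model.parameters(), lr=0.1)
# opt_d = torch.optim.SGD(disc.parameters(), lr=0.1)
# train_step(model, disc, opt_m, opt_d, images, labels)
```

At inference, only the primary stream and its deepest head would be kept; the auxiliary branch and the discriminator exist only at training time, which is consistent with the abstract's claim of no extra computational overhead at inference.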
Related papers
- LoRA-Ensemble: Efficient Uncertainty Modelling for Self-attention Networks [52.46420522934253]
We introduce LoRA-Ensemble, a parameter-efficient deep ensemble method for self-attention networks.
By employing a single pre-trained self-attention network with weights shared across all members, we train member-specific low-rank matrices for the attention projections.
Our method exhibits superior calibration compared to explicit ensembles and achieves similar or better accuracy across various prediction tasks and datasets.
arXiv Detail & Related papers (2024-05-23T11:10:32Z) - Optimizing Dense Feed-Forward Neural Networks [0.0]
We propose a novel feed-forward neural network constructing method based on pruning and transfer learning.
Our approach can reduce the number of parameters by more than 70%.
We also evaluate the degree of transfer learning by comparing the refined model with the original network trained from scratch.
arXiv Detail & Related papers (2023-12-16T23:23:16Z) - Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective [64.04617968947697]
We introduce a novel data-model co-design perspective to promote superior weight sparsity.
Specifically, customized visual prompts are mounted to upgrade neural network sparsification in our proposed VPNs framework.
arXiv Detail & Related papers (2023-12-03T13:50:24Z) - Bayesian sparsification for deep neural networks with Bayesian model
reduction [0.6144680854063939]
We advocate for the use of Bayesian model reduction (BMR) as a more efficient alternative for pruning of model weights.
BMR allows a post-hoc elimination of redundant model weights based on the posterior estimates under a straightforward (non-hierarchical) generative model.
We illustrate the potential of BMR across various deep learning architectures, from classical networks like LeNet to modern frameworks such as Vision Transformers and MLP-Mixers.
arXiv Detail & Related papers (2023-09-21T14:10:47Z) - Towards Efficient Task-Driven Model Reprogramming with Foundation Models [52.411508216448716]
Vision foundation models exhibit impressive power, benefiting from the extremely large model capacity and broad training data.
However, in practice, downstream scenarios may only support a small model due to the limited computational resources or efficiency considerations.
This brings a critical challenge for the real-world application of foundation models: one has to transfer the knowledge of a foundation model to the downstream task.
arXiv Detail & Related papers (2023-04-05T07:28:33Z) - Reconciliation of Pre-trained Models and Prototypical Neural Networks in
Few-shot Named Entity Recognition [35.34238362639678]
We propose a one-line-code normalization method to reconcile such a mismatch with empirical and theoretical grounds.
Our work also provides an analytical viewpoint for addressing the general problems in few-shot named entity recognition.
arXiv Detail & Related papers (2022-11-07T02:33:45Z) - "Understanding Robustness Lottery": A Geometric Visual Comparative
Analysis of Neural Network Pruning Approaches [29.048660060344574]
This work aims to shed light on how different pruning methods alter the network's internal feature representation and the corresponding impact on model performance.
We introduce a visual geometric analysis of feature representations to compare and highlight the impact of pruning on model performance and feature representation.
The proposed tool provides an environment for in-depth comparison of pruning methods and a comprehensive understanding of how models respond to common data corruptions.
arXiv Detail & Related papers (2022-06-16T04:44:13Z) - Automatic Block-wise Pruning with Auxiliary Gating Structures for Deep
Convolutional Neural Networks [9.293334856614628]
This paper presents a novel structured network pruning method with auxiliary gating structures.
Our experiments demonstrate that our method can achieve state-of-the-art compression performance for classification tasks.
arXiv Detail & Related papers (2022-05-07T09:03:32Z) - Powerpropagation: A sparsity inducing weight reparameterisation [65.85142037667065]
We introduce Powerpropagation, a new weight reparameterisation for neural networks that leads to inherently sparse models.
Models trained in this manner exhibit similar performance, but have a distribution with markedly higher density at zero, allowing more parameters to be pruned safely.
Here, we combine Powerpropagation with a traditional weight-pruning technique as well as recent state-of-the-art sparse-to-sparse algorithms, showing superior performance on the ImageNet benchmark.
arXiv Detail & Related papers (2021-10-01T10:03:57Z) - Efficient training of lightweight neural networks using Online
Self-Acquired Knowledge Distillation [51.66271681532262]
Online Self-Acquired Knowledge Distillation (OSAKD) is proposed, aiming to improve the performance of any deep neural model in an online manner.
We utilize a k-NN non-parametric density estimation technique for estimating the unknown probability distributions of the data samples in the output feature space.
arXiv Detail & Related papers (2021-08-26T14:01:04Z) - Deep Variational Models for Collaborative Filtering-based Recommender
Systems [63.995130144110156]
Deep learning provides accurate collaborative filtering models to improve recommender system results.
Our proposed models apply the variational concept to inject stochasticity into the latent space of the deep architecture.
Results show the superiority of the proposed approach in scenarios where the variational enrichment exceeds the injected noise effect.
arXiv Detail & Related papers (2021-07-27T08:59:39Z)