Dynamic Collective Intelligence Learning: Finding Efficient Sparse Model
via Refined Gradients for Pruned Weights
- URL: http://arxiv.org/abs/2109.04660v2
- Date: Mon, 31 Jul 2023 23:10:45 GMT
- Title: Dynamic Collective Intelligence Learning: Finding Efficient Sparse Model
via Refined Gradients for Pruned Weights
- Authors: Jangho Kim, Jayeon Yoo, Yeji Song, KiYoon Yoo, Nojun Kwak
- Abstract summary: Dynamic pruning methods try to find diverse sparsity patterns during training by utilizing the Straight-Through Estimator (STE) to approximate gradients of pruned weights.
We introduce refined gradients to update the pruned weights by forming dual forwarding paths from two sets (pruned and unpruned) of weights.
We propose a novel method, Dynamic Collective Intelligence Learning (DCIL), which makes use of the learning synergy between the collective intelligence of both weight sets.
- Score: 31.68257673664519
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the growth of deep neural networks (DNN), the number of DNN parameters
has drastically increased. This makes DNN models hard to deploy on
resource-limited embedded systems. To alleviate this problem, dynamic pruning
methods have emerged, which try to find diverse sparsity patterns during
training by utilizing the Straight-Through Estimator (STE) to approximate gradients
of pruned weights. STE can help the pruned weights revive in the process of
finding dynamic sparsity patterns. However, using these coarse gradients causes
training instability and performance degradation owing to the unreliable
gradient signal of the STE approximation. In this work, to tackle this issue,
we introduce refined gradients to update the pruned weights by forming dual
forwarding paths from two sets (pruned and unpruned) of weights. We propose a
novel method, Dynamic Collective Intelligence Learning (DCIL), which makes use of the
learning synergy between the collective intelligence of both weight sets. We
verify the usefulness of the refined gradients by showing enhancements in the
training stability and the model performance on the CIFAR and ImageNet
datasets. DCIL outperforms various previously proposed pruning schemes,
including other dynamic pruning methods, while exhibiting enhanced stability
during training.
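As a rough illustration of the two ideas above, the following PyTorch-style sketch contrasts an STE-based forward pass, where pruned weights only receive coarse straight-through gradients, with a dual-forwarding setup in which both the unpruned (dense) and pruned (masked) weight sets are forwarded and their losses combined. The layer sizes, the magnitude-based mask, and the equal-weight loss combination are illustrative assumptions, not the authors' exact DCIL implementation.

```python
# Minimal sketch: STE-style coarse gradients vs. refined gradients from dual
# forwarding paths. Illustrative assumptions only, not the DCIL code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DualPathLinear(nn.Module):
    def __init__(self, in_features, out_features, sparsity=0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))
        self.sparsity = sparsity

    def current_mask(self):
        # Dynamic sparsity pattern: keep the largest-magnitude weights each step.
        k = int(self.weight.numel() * (1.0 - self.sparsity))
        threshold = self.weight.abs().flatten().kthvalue(self.weight.numel() - k).values
        return (self.weight.abs() > threshold).float()

    def forward_ste(self, x):
        # STE path: forward with masked weights, but let gradients flow to all
        # weights as if the mask were not there (coarse gradient signal).
        mask = self.current_mask()
        w_ste = (self.weight * mask).detach() + self.weight - self.weight.detach()
        return F.linear(x, w_ste, self.bias)

    def forward_dual(self, x):
        # Dual forwarding paths: one pass with the unpruned (dense) weights and
        # one with the pruned (masked) weights, sharing the same parameters.
        mask = self.current_mask()
        dense_out = F.linear(x, self.weight, self.bias)
        sparse_out = F.linear(x, self.weight * mask, self.bias)
        return dense_out, sparse_out


# Usage sketch: combining the two paths' losses means pruned weights receive
# gradients from an actual (dense) forward pass rather than an STE surrogate.
layer = DualPathLinear(16, 4)
x, y = torch.randn(8, 16), torch.randint(0, 4, (8,))
dense_out, sparse_out = layer.forward_dual(x)
loss = F.cross_entropy(dense_out, y) + F.cross_entropy(sparse_out, y)
loss.backward()
```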
Related papers
- FedDIP: Federated Learning with Extreme Dynamic Pruning and Incremental
Regularization [5.182014186927254]
Federated Learning (FL) has been successfully adopted for distributed training and inference of large-scale Deep Neural Networks (DNNs).
We contribute a novel FL framework (coined FedDIP) that combines dynamic model pruning with error feedback to eliminate redundant information exchange.
We provide convergence analysis of FedDIP and report on a comprehensive performance and comparative assessment against state-of-the-art methods.
arXiv Detail & Related papers (2023-09-13T08:51:19Z)
- Implicit Stochastic Gradient Descent for Training Physics-informed Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have effectively been demonstrated in solving forward and inverse differential equation problems.
However, PINNs can be trapped in training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ the implicit stochastic gradient descent (ISGD) method to train PINNs, improving the stability of the training process.
arXiv Detail & Related papers (2023-03-03T08:17:47Z)
- FOSTER: Feature Boosting and Compression for Class-Incremental Learning [52.603520403933985]
Deep neural networks suffer from catastrophic forgetting when learning new categories.
We propose a novel two-stage learning paradigm FOSTER, empowering the model to learn new categories adaptively.
arXiv Detail & Related papers (2022-04-10T11:38:33Z)
- Powerpropagation: A sparsity inducing weight reparameterisation [65.85142037667065]
We introduce Powerpropagation, a new weight reparameterisation for neural networks that leads to inherently sparse models.
Models trained in this manner exhibit performance similar to standard training, but their weight distribution has markedly higher density at zero, allowing more parameters to be pruned safely.
Here, we combine Powerpropagation with a traditional weight-pruning technique as well as recent state-of-the-art sparse-to-sparse algorithms, showing superior performance on the ImageNet benchmark (a sketch of such a reparameterisation is given after this list).
arXiv Detail & Related papers (2021-10-01T10:03:57Z)
- Inverse-Dirichlet Weighting Enables Reliable Training of Physics Informed Neural Networks [2.580765958706854]
We describe and remedy a failure mode that may arise from multi-scale dynamics with scale imbalances during training of deep neural networks.
PINNs are popular machine-learning templates that allow for seamless integration of physical equation models with data.
For inverse modeling using sequential training, we find that inverse-Dirichlet weighting protects a PINN against catastrophic forgetting.
arXiv Detail & Related papers (2021-07-02T10:01:37Z)
- Training Generative Adversarial Networks by Solving Ordinary Differential Equations [54.23691425062034]
We study the continuous-time dynamics induced by GAN training.
From this perspective, we hypothesise that instabilities in training GANs arise from the integration error.
We experimentally verify that well-known ODE solvers (such as Runge-Kutta) can stabilise training.
arXiv Detail & Related papers (2020-10-28T15:23:49Z)
- Neural networks with late-phase weights [66.72777753269658]
We show that the solutions found by SGD can be further improved by ensembling a subset of the weights in late stages of learning.
At the end of learning, we recover a single model by taking a spatial average in weight space.
arXiv Detail & Related papers (2020-07-25T13:23:37Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
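As referenced in the Powerpropagation entry above, below is a minimal sketch of a sparsity-inducing weight reparameterisation in that spirit. The specific form w = phi * |phi|^(alpha - 1), the choice alpha = 2.0, and the layer sizes are assumptions made for illustration, not a verified reproduction of the paper's method.

```python
# Minimal sketch of a sparsity-inducing weight reparameterisation in the
# spirit of Powerpropagation. The form w = phi * |phi|**(alpha - 1) and
# alpha = 2.0 are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PowerpropLinear(nn.Module):
    def __init__(self, in_features, out_features, alpha=2.0):
        super().__init__()
        self.phi = nn.Parameter(torch.randn(out_features, in_features) * 0.1)
        self.bias = nn.Parameter(torch.zeros(out_features))
        self.alpha = alpha

    def effective_weight(self):
        # Gradients w.r.t. phi are scaled by roughly |phi|**(alpha - 1), so
        # small weights receive small updates and stay near zero, while large
        # weights are reinforced.
        return self.phi * self.phi.abs().pow(self.alpha - 1)

    def forward(self, x):
        return F.linear(x, self.effective_weight(), self.bias)


# After training, magnitude pruning can be applied to the effective weights,
# which are expected to be more concentrated near zero than in a standard layer.
layer = PowerpropLinear(16, 4)
out = layer(torch.randn(8, 16))
```

Because small-magnitude parameters receive proportionally smaller updates under such a reparameterisation, the effective weight distribution tends to concentrate near zero, which is what makes subsequent magnitude pruning safer.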
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.