The rise of the lottery heroes: why zero-shot pruning is hard
- URL: http://arxiv.org/abs/2202.12400v1
- Date: Thu, 24 Feb 2022 22:49:36 GMT
- Title: The rise of the lottery heroes: why zero-shot pruning is hard
- Authors: Enzo Tartaglione
- Abstract summary: Recent advances in deep learning optimization showed that only a subset of the parameters is really necessary to successfully train a model.
Finding these trainable sub-networks is a typically costly process.
This inhibits practical applications: can the learned sub-graph structures in deep learning models be found at training time?
- Score: 3.1473798197405944
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in deep learning optimization showed that only a subset of
the parameters is really necessary to successfully train a model. Potentially,
such a discovery has broad impact from theory to applications; however, it
is known that finding these trainable sub-networks is a typically costly
process. This inhibits practical applications: can the learned sub-graph
structures in deep learning models be found at training time? In this work we
explore such a possibility, observing and motivating why common approaches
typically fail in the extreme scenarios of interest, and proposing an approach
which potentially enables training with reduced computational effort. Experiments
on challenging architectures and datasets suggest that such a computational gain
is algorithmically accessible, and in particular a trade-off emerges between the
accuracy achieved and the training complexity deployed.
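For intuition, one generic way to expose a sub-network during training is periodic magnitude-based masking of the weights. The sketch below is an illustrative assumption, not the method proposed in the paper; the model, sparsity level, data, and masking schedule are placeholders.

```python
# Minimal sketch (assumption, not the paper's algorithm): periodically zero the
# smallest-magnitude weights during training so only a sub-network keeps learning.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()


def apply_magnitude_mask(module: nn.Module, sparsity: float) -> None:
    """Zero the smallest-magnitude entries of each weight matrix, keeping (1 - sparsity)."""
    with torch.no_grad():
        for p in module.parameters():
            if p.dim() > 1:  # prune weight matrices, leave biases dense
                threshold = torch.quantile(p.abs().flatten(), sparsity)
                p.mul_((p.abs() >= threshold).float())


# Illustrative loop on random data; a real setup would iterate over a dataset.
x, y = torch.randn(64, 784), torch.randint(0, 10, (64,))
for step in range(100):
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    optimizer.step()
    if step % 10 == 0:
        apply_magnitude_mask(model, sparsity=0.8)
```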
Related papers
- Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond [61.18736646013446]
In pursuit of a deeper understanding of its surprising behaviors, we investigate the utility of a simple yet accurate model of a trained neural network.
Across three case studies, we illustrate how it can be applied to derive new empirical insights on a diverse range of prominent phenomena.
arXiv Detail & Related papers (2024-10-31T22:54:34Z)
- Training Neural Networks with Internal State, Unconstrained Connectivity, and Discrete Activations [66.53734987585244]
True intelligence may require the ability of a machine learning model to manage internal state.
We show that we have not yet discovered the most effective algorithms for training such models.
We present one attempt to design such a training algorithm, applied to an architecture with binary activations and only a single matrix of weights.
arXiv Detail & Related papers (2023-12-22T01:19:08Z)
- Efficient Sub-structured Knowledge Distillation [52.5931565465661]
We propose an approach that is much simpler in its formulation and far more efficient for training than existing approaches.
We transfer the knowledge from a teacher model to its student model by locally matching their predictions on all sub-structures, instead of the whole output space.
arXiv Detail & Related papers (2022-03-09T15:56:49Z)
- Review of Pedestrian Trajectory Prediction Methods: Comparing Deep Learning and Knowledge-based Approaches [0.0]
This paper compares deep learning algorithms with classical knowledge-based models that are widely used to simulate pedestrian dynamics.
The ability of deep-learning algorithms for large-scale simulation and the description of collective dynamics remains to be demonstrated.
arXiv Detail & Related papers (2021-11-11T08:35:14Z)
- An Operator Theoretic Perspective on Pruning Deep Neural Networks [2.624902795082451]
We make use of recent advances in dynamical systems theory to define a new class of theoretically motivated pruning algorithms.
We show that these algorithms can be equivalent to magnitude-based and gradient-based pruning, unifying these seemingly disparate methods (a toy comparison of the two criteria is sketched after this list).
arXiv Detail & Related papers (2021-10-28T02:33:50Z)
- Deep learning: a statistical viewpoint [120.94133818355645]
Deep learning has revealed some major surprises from a theoretical perspective.
In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems.
We conjecture that specific principles underlie these phenomena.
arXiv Detail & Related papers (2021-03-16T16:26:36Z)
- Uses and Abuses of the Cross-Entropy Loss: Case Studies in Modern Deep Learning [29.473503894240096]
We focus on the use of the categorical cross-entropy loss to model data that is not strictly categorical, but rather takes values on the simplex.
This practice is standard in neural network architectures with label smoothing and actor-mimic reinforcement learning, amongst others.
We propose probabilistically-inspired alternatives to these models, providing an approach that is more principled and theoretically appealing.
arXiv Detail & Related papers (2020-11-10T16:44:35Z)
- Plausible Counterfactuals: Auditing Deep Learning Classifiers with Realistic Adversarial Examples [84.8370546614042]
The black-box nature of Deep Learning models has posed unanswered questions about what they learn from data.
A Generative Adversarial Network (GAN) and multi-objective heuristics are used to furnish a plausible attack on the audited model.
Its utility is showcased within a human face classification task, unveiling the enormous potential of the proposed framework.
arXiv Detail & Related papers (2020-03-25T11:08:56Z)
- Towards Practical Lottery Ticket Hypothesis for Adversarial Training [78.30684998080346]
We show that there exists a subset of such lottery-ticket sub-networks that converges significantly faster during the training process.
As a practical application of our findings, we demonstrate that such sub-networks can help in cutting down the total time of adversarial training.
arXiv Detail & Related papers (2020-03-06T03:11:52Z)
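As a companion to the operator-theoretic pruning entry above, the toy comparison below contrasts the two classical saliency criteria it refers to: magnitude-based and gradient-based scoring. The layer, data, and sparsity level are illustrative assumptions, not taken from any of the listed papers.

```python
# Toy comparison (illustrative assumption): magnitude- vs. gradient-based pruning saliency.
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(20, 10)
x, y = torch.randn(32, 20), torch.randint(0, 10, (32,))
nn.CrossEntropyLoss()(layer(x), y).backward()

w, g = layer.weight, layer.weight.grad
magnitude_saliency = w.abs()        # keep weights with large magnitude
gradient_saliency = (w * g).abs()   # keep weights with large first-order loss sensitivity


def top_mask(saliency: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Binary mask keeping the top (1 - sparsity) fraction of entries by saliency."""
    threshold = torch.quantile(saliency.flatten(), sparsity)
    return (saliency >= threshold).float()


m_mag = top_mask(magnitude_saliency.detach(), 0.5)
m_grad = top_mask(gradient_saliency.detach(), 0.5)
overlap = (m_mag * m_grad).sum() / m_mag.sum()
print(f"fraction of weights kept by both criteria: {overlap:.2f}")
```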
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.