Related papers: Finding Lottery Tickets in Vision Models via Data-driven Spectral Foresight Pruning

Finding Lottery Tickets in Vision Models via Data-driven Spectral Foresight Pruning

URL: http://arxiv.org/abs/2406.01820v1
Date: Mon, 3 Jun 2024 22:19:42 GMT
Title: Finding Lottery Tickets in Vision Models via Data-driven Spectral Foresight Pruning
Authors: Leonardo Iurada, Marco Ciccone, Tatiana Tommasi,
Abstract summary: We propose an algorithm to align the training dynamics of the sparse network with that of the dense one. We show how the usually neglected data-dependent component in the NTK's spectrum can be taken into account. Path eXclusion (PX) is able to find lottery tickets even at high sparsity levels.
Score: 14.792099973449794
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advances in neural network pruning have shown how it is possible to reduce the computational costs and memory demands of deep learning models before training. We focus on this framework and propose a new pruning at initialization algorithm that leverages the Neural Tangent Kernel (NTK) theory to align the training dynamics of the sparse network with that of the dense one. Specifically, we show how the usually neglected data-dependent component in the NTK's spectrum can be taken into account by providing an analytical upper bound to the NTK's trace obtained by decomposing neural networks into individual paths. This leads to our Path eXclusion (PX), a foresight pruning method designed to preserve the parameters that mostly influence the NTK's trace. PX is able to find lottery tickets (i.e. good paths) even at high sparsity levels and largely reduces the need for additional training. When applied to pre-trained models it extracts subnetworks directly usable for several downstream tasks, resulting in performance comparable to those of the dense counterpart but with substantial cost and computational savings. Code available at: https://github.com/iurada/px-ntk-pruning

Related papers

Testing RadiX-Nets: Advances in Viable Sparse Topologies [0.9555447998395205]
Sparsification of hyper-parametrized deep neural networks (DNNs) creates simpler representations of complex data. RadiX-Nets, a subgroup of DNNs, maintain runtime which counteracts their lack of neural connections. This paper presents a testing suite for RadiX-Nets in scalable models.
arXiv Detail & Related papers (2023-11-06T23:27:28Z)
Speed Limits for Deep Learning [67.69149326107103]
Recent advancement in thermodynamics allows bounding the speed at which one can go from the initial weight distribution to the final distribution of the fully trained network. We provide analytical expressions for these speed limits for linear and linearizable neural networks. Remarkably, given some plausible scaling assumptions on the NTK spectra and spectral decomposition of the labels -- learning is optimal in a scaling sense.
arXiv Detail & Related papers (2023-07-27T06:59:46Z)
NTK-SAP: Improving neural network pruning by aligning training dynamics [13.887349224871045]
Recent advances in neural kernel (NTK) theory suggest that the training dynamics of large enough neural networks is closely related to the spectrum of the NTK. We propose to prune the connections that have the least influence on the spectrum of the NTK. We name our foresight pruning algorithm Neural Kernel Spectrum-Aware Pruning (NTK-SAP)
arXiv Detail & Related papers (2023-04-06T03:10:03Z)
Neural Capacitance: A New Perspective of Neural Network Selection via Edge Dynamics [85.31710759801705]
Current practice requires expensive computational costs in model training for performance prediction. We propose a novel framework for neural network selection by analyzing the governing dynamics over synaptic connections (edges) during training. Our framework is built on the fact that back-propagation during neural network training is equivalent to the dynamical evolution of synaptic connections.
arXiv Detail & Related papers (2022-01-11T20:53:15Z)
Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by analyzing the geometric structure of the objective function. We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z)
A quantum algorithm for training wide and deep classical neural networks [72.2614468437919]
We show that conditions amenable to classical trainability via gradient descent coincide with those necessary for efficiently solving quantum linear systems. We numerically demonstrate that the MNIST image dataset satisfies such conditions. We provide empirical evidence for $O(log n)$ training of a convolutional neural network with pooling.
arXiv Detail & Related papers (2021-07-19T23:41:03Z)
Scaling Neural Tangent Kernels via Sketching and Random Features [53.57615759435126]
Recent works report that NTK regression can outperform finitely-wide neural networks trained on small-scale datasets. We design a near input-sparsity time approximation algorithm for NTK, by sketching the expansions of arc-cosine kernels. We show that a linear regressor trained on our CNTK features matches the accuracy of exact CNTK on CIFAR-10 dataset while achieving 150x speedup.
arXiv Detail & Related papers (2021-06-15T04:44:52Z)
Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction of the Neural Tangent Kernel (NTK) of fully-connected ReLU network. We show that dimension of the resulting features is much smaller than other baseline feature map constructions to achieve comparable error bounds both in theory and practice.
arXiv Detail & Related papers (2021-04-03T09:08:12Z)
A Revision of Neural Tangent Kernel-based Approaches for Neural Networks [34.75076385561115]
We use the neural tangent kernel to show that networks can fit any finite training sample perfectly. A simple and analytic kernel function was derived as indeed equivalent to a fully-trained network. Our tighter analysis resolves the scaling problem and enables the validation of the original NTK-based results.
arXiv Detail & Related papers (2020-07-02T05:07:55Z)
Pruning neural networks without any data by iteratively conserving synaptic flow [27.849332212178847]
Pruning the parameters of deep neural networks has generated intense interest due to potential savings in time, memory and energy. Recent works have identified, through an expensive sequence of training and pruning cycles, the existence of winning lottery tickets or sparse trainableworks. We provide an affirmative answer to this question through theory driven algorithm design.
arXiv Detail & Related papers (2020-06-09T19:21:57Z)

This list is automatically generated from the titles and abstracts of the papers in this site.