Projection-Free CNN Pruning via Frank-Wolfe with Momentum: Sparser Models with Less Pretraining
- URL: http://arxiv.org/abs/2512.01147v1
- Date: Sun, 30 Nov 2025 23:48:53 GMT
- Title: Projection-Free CNN Pruning via Frank-Wolfe with Momentum: Sparser Models with Less Pretraining
- Authors: Hamza ElMokhtar Shili, Natasha Patnaik, Isabelle Ruble, Kathryn Jarjoura, Daniel Suarez Aguirre
- Abstract summary: The "Lottery Ticket Hypothesis" suggests the existence of smaller sub-networks within larger pre-trained networks that perform comparably well. We compare simple magnitude-based pruning, a Frank-Wolfe style pruning scheme, and an FW method with momentum on a CNN trained on MNIST. We find that FW with momentum yields pruned networks that are both sparser and more accurate than the original dense model.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate algorithmic variants of the Frank-Wolfe (FW) optimization method for pruning convolutional neural networks. This is motivated by the "Lottery Ticket Hypothesis", which suggests the existence of smaller sub-networks within larger pre-trained networks that perform comparably well (if not better). Whilst most literature in this area focuses on Deep Neural Networks more generally, we specifically consider Convolutional Neural Networks for image classification tasks. Building on the hypothesis, we compare simple magnitude-based pruning, a Frank-Wolfe style pruning scheme, and an FW method with momentum on a CNN trained on MNIST. Our experiments track test accuracy, loss, sparsity, and inference time as we vary the dense pre-training budget from 1 to 10 epochs. We find that FW with momentum yields pruned networks that are both sparser and more accurate than the original dense model and the simple pruning baselines, while incurring minimal inference-time overhead in our implementation. Moreover, FW with momentum reaches these accuracies after only a few epochs of pre-training, indicating that full pre-training of the dense model is not required in this setting.
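As a concrete illustration of the projection-free update described in the abstract, below is a minimal NumPy sketch of one Frank-Wolfe step with gradient momentum over an L1-ball constraint. The constraint set, the decaying schedules, and all function names here are standard choices from the FW literature and our own assumptions, not implementation details taken from the paper.

```python
# Minimal sketch of one Frank-Wolfe (FW) step with gradient momentum over an
# L1-ball constraint. The L1 ball, the decaying schedules, and the names are
# illustrative assumptions, not details from the paper above.
import numpy as np

def lmo_l1_ball(grad: np.ndarray, radius: float) -> np.ndarray:
    """Linear minimization oracle over the L1 ball of the given radius.

    Returns argmin_{||v||_1 <= radius} <grad, v>, which is always 1-sparse:
    all mass sits on the coordinate with the largest |grad| entry.
    """
    v = np.zeros_like(grad)
    i = int(np.argmax(np.abs(grad)))
    v[i] = -radius * np.sign(grad[i])
    return v

def fw_momentum_step(w, grad, m, t, radius=10.0):
    """One FW-with-momentum update on flattened weights.

    w    : current (flattened) weight vector
    grad : stochastic gradient of the loss at w
    m    : running momentum estimate of the gradient
    t    : iteration counter (1-indexed), used by the decaying schedules
    """
    rho = 1.0 / (t + 1) ** (2.0 / 3.0)   # momentum mixing weight
    gamma = 2.0 / (t + 2)                # classic FW step size
    m = (1.0 - rho) * m + rho * grad     # smooth the noisy gradient estimate
    v = lmo_l1_ball(m, radius)           # 1-sparse vertex of the L1 ball
    w = (1.0 - gamma) * w + gamma * v    # convex combination stays feasible
    return w, m
```

Because every vertex returned by the oracle is 1-sparse, each iterate is a convex combination of relatively few sparse vertices, which is what biases FW-style training toward prunable weights without ever computing a projection; a final magnitude threshold would then extract the sparse sub-network.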
Related papers
- Quantization vs Pruning: Insights from the Strong Lottery Ticket Hypothesis [5.494111035517599]
Quantization is an essential technique for making neural networks more efficient, yet our theoretical understanding of it remains limited. Previous works demonstrated that extremely low-precision networks, such as binary networks, can be constructed by pruning large, randomly initialized networks. We build on foundational results by Borgs et al. on the Number Partitioning Problem to derive new theoretical results for the Random Subset Sum Problem in a quantized setting.
arXiv Detail & Related papers (2025-08-14T18:51:34Z)
- Concurrent Training and Layer Pruning of Deep Neural Networks [0.0]
We propose an algorithm capable of identifying and eliminating irrelevant layers of a neural network during the early stages of training.
We employ a structure with residual connections around nonlinear network sections, which preserves the flow of information through the network once a nonlinear section is pruned (a minimal sketch follows).
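A minimal PyTorch sketch of such a residual wrapper; the class name and the `pruned` flag are illustrative assumptions, not code from the paper:

```python
# Sketch of a residual wrapper around a prunable nonlinear section, assuming
# PyTorch. When the section is pruned, the identity path keeps information
# flowing through the network; module names are illustrative.
import torch
import torch.nn as nn

class PrunableResidualSection(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim)
        )
        self.pruned = False  # set True once the section is deemed irrelevant

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.pruned:
            return x              # skip connection only: information still flows
        return x + self.body(x)  # standard residual computation
```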
arXiv Detail & Related papers (2024-06-06T23:19:57Z)
- Accurate Neural Network Pruning Requires Rethinking Sparse Optimization [87.90654868505518]
We show the impact of high sparsity on model training using the standard computer vision and natural language processing sparsity benchmarks.
We provide new approaches for mitigating this issue for both sparse pre-training of vision models and sparse fine-tuning of language models.
arXiv Detail & Related papers (2023-08-03T21:49:14Z)
- Speed Limits for Deep Learning [67.69149326107103]
Recent advancements in thermodynamics allow bounding the speed at which one can go from the initial weight distribution to the final distribution of the fully trained network.
We provide analytical expressions for these speed limits for linear and linearizable neural networks.
Remarkably, given some plausible scaling assumptions on the NTK spectra and the spectral decomposition of the labels, learning is optimal in a scaling sense.
arXiv Detail & Related papers (2023-07-27T06:59:46Z)
- On the Neural Tangent Kernel Analysis of Randomly Pruned Neural Networks [91.3755431537592]
We study how random pruning of the weights affects a neural network's neural tangent kernel (NTK).
In particular, this work establishes an equivalence of the NTKs between a fully-connected neural network and its randomly pruned version.
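For reference, the kernel in question is the standard neural tangent kernel; the definition below uses generic notation and is not taken from the paper:

```latex
% Standard NTK definition (generic notation): f(x; \theta) is the network
% output, and the kernel is the inner product of parameter gradients.
\Theta(x, x') = \nabla_\theta f(x; \theta)^{\top} \, \nabla_\theta f(x'; \theta)
```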
arXiv Detail & Related papers (2022-03-27T15:22:19Z)
- The Unreasonable Effectiveness of Random Pruning: Return of the Most Naive Baseline for Sparse Training [111.15069968583042]
Random pruning is arguably the most naive way to attain sparsity in neural networks, but it has been deemed uncompetitive compared with both post-training pruning and sparse training.
We empirically demonstrate that sparsely training a randomly pruned network from scratch can match the performance of its dense equivalent.
Our results strongly suggest there is larger-than-expected room for sparse training at scale, and that the benefits of sparsity may extend beyond carefully designed pruning (random masking is sketched below).
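A minimal NumPy sketch of the random-mask baseline this entry studies; the target density and the flat-weight handling are illustrative assumptions:

```python
# Sketch of random (unstructured) pruning: keep a uniformly random subset of
# weights at a target density and train only the survivors. Density value and
# layer handling are illustrative assumptions.
import numpy as np

def random_prune_mask(weights: np.ndarray, density: float, rng=None) -> np.ndarray:
    """Return a 0/1 mask that keeps a random `density` fraction of the weights."""
    rng = np.random.default_rng() if rng is None else rng
    n = weights.size
    k = int(round(density * n))                        # number of weights kept
    flat = np.zeros(n, dtype=weights.dtype)
    flat[rng.choice(n, size=k, replace=False)] = 1.0   # random support
    return flat.reshape(weights.shape)

# Usage: apply the mask at initialization and after every gradient step,
# so the pruned coordinates stay exactly zero ("sparse training").
w = np.random.randn(64, 128)
mask = random_prune_mask(w, density=0.2)
w *= mask
```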
arXiv Detail & Related papers (2022-02-05T21:19:41Z)
- Neural Capacitance: A New Perspective of Neural Network Selection via Edge Dynamics [85.31710759801705]
Current practice incurs expensive computational costs, since performance prediction requires training the model.
We propose a novel framework for neural network selection by analyzing the governing dynamics over synaptic connections (edges) during training.
Our framework is built on the fact that back-propagation during neural network training is equivalent to the dynamical evolution of synaptic connections.
arXiv Detail & Related papers (2022-01-11T20:53:15Z)
- Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by studying the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z)
- A Framework for Neural Network Pruning Using Gibbs Distributions [34.0576955010317]
Gibbs pruning is a novel framework for expressing and designing neural network pruning methods.
It can train and prune a network simultaneously, in such a way that the learned weights and the pruning mask are well adapted to each other.
We achieve a new state-of-the-art result for pruning ResNet-56 with the CIFAR-10 dataset.
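Schematically, a Gibbs distribution over binary pruning masks m takes the exponential-family form below; the energy H and temperature T are generic placeholders, an assumption about the framework's shape rather than the paper's exact parameterization:

```latex
% Generic Gibbs form over pruning masks m (an assumed shape, not the paper's
% exact parameterization): H(m) is a pruning "energy", e.g. penalizing masks
% that keep low-magnitude weights, and T is a temperature.
p(m) = \frac{1}{Z} \exp\!\left(-\frac{H(m)}{T}\right),
\qquad Z = \sum_{m'} \exp\!\left(-\frac{H(m')}{T}\right)
```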
arXiv Detail & Related papers (2020-06-08T23:04:53Z)
- MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of stochastic gradient descent combined with the non-convexity of the learning problem renders training sensitive to initialization.
We propose fusing neighboring layers of deeper networks that are trained with random variables.
arXiv Detail & Related papers (2020-01-28T18:25:15Z)