Towards Scalable Lottery Ticket Networks using Genetic Algorithms
- URL: http://arxiv.org/abs/2508.08877v1
- Date: Tue, 12 Aug 2025 12:03:21 GMT
- Title: Towards Scalable Lottery Ticket Networks using Genetic Algorithms
- Authors: Julian Schönberger, Maximilian Zorn, Jonas Nüßlein, Thomas Gabor, Philipp Altmann,
- Abstract summary: This work explores the use of genetic algorithms for identifying strong lottery ticket subnetworks. We find that for instances of binary and multi-class classification tasks, our approach achieves better accuracies and sparsity levels than the current state-of-the-art.
- Score: 2.9144530940712707
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Building modern deep learning systems that are not just effective but also efficient requires rethinking established paradigms for model training and neural architecture design. Instead of adapting highly overparameterized networks and subsequently applying model compression techniques to reduce resource consumption, a new class of high-performing networks skips the need for expensive parameter updates while requiring only a fraction of the parameters, making them highly scalable. The Strong Lottery Ticket Hypothesis posits that within randomly initialized, sufficiently overparameterized neural networks, there exist subnetworks that can match the accuracy of the trained original model, without any training. This work explores the use of genetic algorithms for identifying these strong lottery ticket subnetworks. We find that for instances of binary and multi-class classification tasks, our approach achieves better accuracies and sparsity levels than the current state-of-the-art without requiring any gradient information. In addition, we justify the need for appropriate evaluation metrics when scaling to more complex network architectures and learning tasks.
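To make the search concrete, here is a minimal numpy sketch of how a genetic algorithm can hunt for a strong lottery ticket: individuals are binary pruning masks over a fixed, randomly initialized network, and fitness is plain forward-pass accuracy, so no gradients are ever computed. The toy task, architecture, and GA operators are illustrative assumptions, not the authors' setup (whose fitness also accounts for sparsity).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary classification data (a stand-in for the paper's benchmark tasks).
X = rng.normal(size=(200, 8))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Fixed, randomly initialized weights; they are never trained.
W1 = rng.normal(size=(8, 16))
W2 = rng.normal(size=(16, 1))
n_genes = W1.size + W2.size

def fitness(mask):
    """Accuracy of the masked (pruned) random network; forward pass only."""
    m1 = mask[:W1.size].reshape(W1.shape)
    m2 = mask[W1.size:].reshape(W2.shape)
    h = np.maximum(X @ (W1 * m1), 0.0)             # ReLU hidden layer
    logits = (h @ (W2 * m2)).ravel()
    return np.mean((logits > 0).astype(int) == y)  # the paper also rewards sparsity

def evolve(pop_size=50, generations=100, p_mut=0.02):
    pop = rng.integers(0, 2, size=(pop_size, n_genes))
    for _ in range(generations):
        fit = np.array([fitness(ind) for ind in pop])
        # Binary tournament selection.
        idx = rng.integers(0, pop_size, size=(pop_size, 2))
        winners = np.where(fit[idx[:, 0]] > fit[idx[:, 1]], idx[:, 0], idx[:, 1])
        parents = pop[winners]
        # Uniform crossover between consecutive parents.
        cross = rng.integers(0, 2, size=(pop_size, n_genes)).astype(bool)
        children = np.where(cross, parents, np.roll(parents, 1, axis=0))
        # Bit-flip mutation.
        flip = rng.random((pop_size, n_genes)) < p_mut
        pop = np.where(flip, 1 - children, children)
    fit = np.array([fitness(ind) for ind in pop])
    return pop[fit.argmax()], fit.max()

best_mask, best_acc = evolve()
print(f"accuracy={best_acc:.3f}  sparsity={1 - best_mask.mean():.3f}")
```

Because fitness is just a forward pass, the loop needs no gradient information, which is the property the abstract emphasizes.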
Related papers
- Manifold meta-learning for reduced-complexity neural system identification [1.0276024900942875]
We propose a meta-learning framework that discovers a low-dimensional manifold. This manifold is learned from a meta-dataset of input-output sequences generated by a class of related dynamical systems. Unlike bilevel meta-learning approaches, our method employs an auxiliary neural network to map datasets directly onto the learned manifold.
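A heavily simplified PyTorch sketch of that idea, assuming a toy class of affine systems y = a*u + b (the encoder/decoder sizes and the tiny predictor are illustrative, not the paper's architecture): an auxiliary encoder maps each input-output sequence to manifold coordinates, and a decoder maps those coordinates to predictor parameters, trained end to end with no inner optimization loop.

```python
import torch
import torch.nn as nn

# Hypothetical setup: each "dataset" is one input-output sequence from a toy
# class of affine systems y = a*u + b; coordinates z live on a 2-D manifold.
SEQ_LEN, D_Z = 32, 2

# Auxiliary encoder: whole sequence -> manifold coordinates (replaces the
# inner optimization loop of bilevel meta-learning).
encoder = nn.Sequential(nn.Linear(2 * SEQ_LEN, 64), nn.ReLU(), nn.Linear(64, D_Z))
# Decoder: manifold coordinates -> parameters (a, b) of a tiny predictor.
decoder = nn.Sequential(nn.Linear(D_Z, 64), nn.ReLU(), nn.Linear(64, 2))

opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
for step in range(2000):
    a = torch.rand(16, 1) * 2 - 1              # sample 16 related systems
    b = torch.rand(16, 1) * 2 - 1
    u = torch.randn(16, SEQ_LEN)
    y = a * u + b                              # their input-output sequences
    z = encoder(torch.cat([u, y], dim=1))      # dataset -> manifold coordinates
    theta = decoder(z)                         # coordinates -> predictor params
    y_hat = theta[:, :1] * u + theta[:, 1:]    # predicted outputs
    loss = ((y_hat - y) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```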
arXiv Detail & Related papers (2025-04-16T06:49:56Z)
- Automatic Construction of Pattern Classifiers Capable of Continuous Incremental Learning and Unlearning Tasks Based on Compact-Sized Probabilistic Neural Network [0.0]
This paper proposes a novel approach to pattern classification using a probabilistic neural network model. The strategy is based on a compact-sized probabilistic neural network capable of continuous incremental learning and unlearning tasks.
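The mechanics are easy to picture with a classic Parzen-window probabilistic neural network, in which each stored pattern is one Gaussian kernel unit: incremental learning adds a unit, and unlearning deletes units, with no retraining. This is a minimal illustrative sketch, not the paper's model, whose compact-sized design additionally bounds the unit count.

```python
import numpy as np

class PNN:
    """Minimal Parzen-window probabilistic NN (illustrative sketch only)."""

    def __init__(self, sigma=0.5):
        self.sigma = sigma
        self.patterns, self.labels = [], []

    def learn(self, x, label):
        """Incremental learning: store one new kernel unit."""
        self.patterns.append(np.asarray(x, float))
        self.labels.append(label)

    def unlearn(self, label):
        """Unlearning: drop every unit belonging to a class."""
        keep = [i for i, l in enumerate(self.labels) if l != label]
        self.patterns = [self.patterns[i] for i in keep]
        self.labels = [self.labels[i] for i in keep]

    def predict(self, x):
        x = np.asarray(x, float)
        scores = {}
        for p, l in zip(self.patterns, self.labels):
            k = np.exp(-np.sum((x - p) ** 2) / (2 * self.sigma ** 2))
            scores[l] = scores.get(l, 0.0) + k   # sum kernels per class
        return max(scores, key=scores.get)

pnn = PNN()
for x, l in [([0, 0], "a"), ([1, 1], "b"), ([0.1, 0.2], "a")]:
    pnn.learn(x, l)
print(pnn.predict([0.9, 1.0]))   # -> "b"
pnn.unlearn("b")                 # class "b" is now forgotten entirely
```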
arXiv Detail & Related papers (2025-01-01T05:02:53Z)
- Principled Architecture-aware Scaling of Hyperparameters [69.98414153320894]
Training a high-quality deep neural network requires choosing suitable hyperparameters, which is a non-trivial and expensive process.
In this work, we precisely characterize the dependence of initializations and maximal learning rates on the network architecture.
We demonstrate that network rankings in benchmarks can easily change when the networks are trained with these architecture-aware learning rates and initializations.
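As a reference point for what architecture-dependent scaling looks like in code, here is the common 1/fan-in heuristic for initialization variance and per-layer learning rate (a muP-style rule of thumb; the paper derives sharper, architecture-specific formulas that this sketch does not reproduce).

```python
import numpy as np

def init_and_lr(fan_in, fan_out, base_lr=0.1):
    """Width-aware init and per-layer learning rate (1/fan-in rule of thumb)."""
    W = np.random.randn(fan_in, fan_out) / np.sqrt(fan_in)  # activation variance ~ O(1)
    lr = base_lr / fan_in                                   # shrink the LR with width
    return W, lr

# Wider layers get smaller maximal stable learning rates.
for width in (128, 512, 2048):
    _, lr = init_and_lr(width, width)
    print(f"width={width:5d}  lr={lr:.2e}")
```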
arXiv Detail & Related papers (2024-02-27T11:52:49Z)
- Accurate Neural Network Pruning Requires Rethinking Sparse Optimization [87.90654868505518]
We show the impact of high sparsity on model training using the standard computer vision and natural language processing sparsity benchmarks.
We provide new approaches for mitigating this issue for both sparse pre-training of vision models and sparse fine-tuning of language models.
arXiv Detail & Related papers (2023-08-03T21:49:14Z)
- Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters.
We find that our approach successfully generates parameters for a wide range of loss prompts.
We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
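A heavily simplified sketch of the data flow, with synthetic stand-ins for real checkpoints: a conditional generator maps (noise, loss prompt) to a flattened parameter vector. The paper trains a diffusion-style model; the plain regression objective below would mode-average in practice and is used only to keep the sketch short.

```python
import torch
import torch.nn as nn

# Synthetic stand-ins: flattened parameter vectors of many small trained nets,
# each tagged with the loss it achieved (real checkpoints would go here).
N_CKPT, D_PARAM = 1000, 64
thetas = torch.randn(N_CKPT, D_PARAM)
losses = torch.rand(N_CKPT, 1)

# Conditional generator: (noise, loss prompt) -> parameter vector.
gen = nn.Sequential(nn.Linear(D_PARAM + 1, 128), nn.ReLU(),
                    nn.Linear(128, D_PARAM))
opt = torch.optim.Adam(gen.parameters(), lr=1e-3)

for step in range(1000):
    idx = torch.randint(0, N_CKPT, (32,))
    noise = torch.randn(32, D_PARAM)
    out = gen(torch.cat([noise, losses[idx]], dim=1))
    # Toy surrogate objective: regress toward checkpoints with matching loss
    # prompts (the paper's diffusion model avoids this mode-averaging).
    loss = ((out - thetas[idx]) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# "Prompting" for parameters that should achieve a loss of 0.05.
new_theta = gen(torch.cat([torch.randn(1, D_PARAM), torch.tensor([[0.05]])], dim=1))
```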
arXiv Detail & Related papers (2022-09-26T17:59:58Z)
- Low-rank lottery tickets: finding efficient low-rank neural networks via matrix differential equations [2.3488056916440856]
We propose a novel algorithm to find efficient low-rank networks.
These networks are determined and adapted already during the training phase.
Our method automatically and dynamically adapts the ranks during training to achieve a desired approximation accuracy.
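One way to picture rank adaptation during training is periodic SVD truncation after each update, keeping only singular values above a tolerance. This is an illustrative sketch, not the paper's method, which integrates a matrix differential equation on the low-rank manifold rather than re-factorizing the full weight.

```python
import numpy as np

def truncate(W, tol):
    """Adapt the rank: keep only singular values above tol * s_max."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    r = max(1, int(np.sum(s > tol * s[0])))
    return U[:, :r] * s[:r], Vt[:r], r          # factors (U @ diag(s), V^T)

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))
for step in range(5):
    G = 0.1 * rng.normal(size=W.shape)          # stand-in for a gradient step
    A, B, r = truncate(W - G, tol=0.5)          # re-truncate after the update
    W = A @ B                                   # current low-rank weight
    print(f"step {step}: rank {r}")
```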
arXiv Detail & Related papers (2022-05-26T18:18:12Z)
- Towards Disentangling Information Paths with Coded ResNeXt [11.884259630414515]
We take a novel approach to making the function of the whole network more transparent.
We propose a neural network architecture for classification, in which the information that is relevant to each class flows through specific paths.
arXiv Detail & Related papers (2022-02-10T21:45:49Z)
- LCS: Learning Compressible Subspaces for Adaptive Network Compression at Inference Time [57.52251547365967]
We propose a method for training a "compressible subspace" of neural networks that contains a fine-grained spectrum of models.
We present results for achieving arbitrarily fine-grained accuracy-efficiency trade-offs at inference time for structured and unstructured sparsity.
Our algorithm extends to quantization at variable bit widths, achieving accuracy on par with individually trained networks.
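The inference-time half of such a scheme is straightforward to sketch: unstructured magnitude pruning to any sparsity level requested at run time. Training a subspace whose every point remains accurate after this operation is the paper's actual contribution and is not shown here; the sketch only demonstrates the fine-grained knob.

```python
import numpy as np

def prune_to_sparsity(W, sparsity):
    """Unstructured magnitude pruning to an arbitrary, run-time sparsity level."""
    k = int(round(sparsity * W.size))            # number of weights to zero out
    if k == 0:
        return W.copy()
    thresh = np.partition(np.abs(W).ravel(), k - 1)[k - 1]
    return np.where(np.abs(W) <= thresh, 0.0, W)

W = np.random.randn(256, 256)
for s in (0.5, 0.9, 0.99):                       # fine-grained trade-off knob
    Ws = prune_to_sparsity(W, s)
    print(f"requested {s:.2f}  actual {1 - np.count_nonzero(Ws) / Ws.size:.4f}")
```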
arXiv Detail & Related papers (2021-10-08T17:03:34Z)
- Provably Training Neural Network Classifiers under Fairness Constraints [70.64045590577318]
We show that overparametrized neural networks can meet the fairness constraints.
The key ingredient in building a fair neural network classifier is establishing a no-regret analysis for neural networks.
arXiv Detail & Related papers (2020-12-30T18:46:50Z)
- On transfer learning of neural networks using bi-fidelity data for uncertainty propagation [0.0]
We explore the application of transfer learning techniques using training data generated from both high- and low-fidelity models.
In the first approach, a neural network model mapping the inputs to the outputs of interest is trained on the low-fidelity data.
The high-fidelity data is then used either to adapt the parameters of the upper layer(s) of the low-fidelity network, or to train a simpler neural network that maps the output of the low-fidelity network to that of the high-fidelity model.
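A minimal PyTorch sketch of the upper-layer-adaptation variant, with a hypothetical 1-D low/high-fidelity function pair standing in for real simulation data: pre-train on plentiful low-fidelity samples, then freeze the lower layers and adapt only the top layer on a few high-fidelity samples.

```python
import torch
import torch.nn as nn

# Hypothetical 1-D surrogate task: high fidelity = low fidelity + a correction.
def lofi(x): return torch.sin(x)
def hifi(x): return torch.sin(x) + 0.3 * torch.cos(3 * x)

net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(),
                    nn.Linear(64, 64), nn.Tanh(),
                    nn.Linear(64, 1))

def fit(model, f, n, steps, params=None):
    opt = torch.optim.Adam(params if params is not None else model.parameters(), lr=1e-3)
    for _ in range(steps):
        x = torch.rand(n, 1) * 6 - 3
        loss = ((model(x) - f(x)) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()

# Step 1: train the whole network on plentiful low-fidelity data.
fit(net, lofi, n=256, steps=2000)

# Step 2: freeze the lower layers, adapt only the upper layer on scarce
# high-fidelity data.
for p in net[:-1].parameters():
    p.requires_grad_(False)
fit(net, hifi, n=16, steps=500, params=net[-1].parameters())
```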
arXiv Detail & Related papers (2020-02-11T15:56:11Z)
- Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
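For intuition about this family of methods, here is feedback alignment, a simpler relative that removes backprop's weight-transport requirement by routing errors through a fixed random matrix; the paper's recursive local representation alignment is a different, more elaborate local-update scheme, which this sketch does not reproduce.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(512, 20))
y = (X[:, :1] > 0).astype(float)               # toy binary target

W1 = 0.1 * rng.normal(size=(20, 32))
W2 = 0.1 * rng.normal(size=(32, 1))
B2 = rng.normal(size=(1, 32))                  # fixed random feedback matrix

for _ in range(500):
    H = np.tanh(X @ W1)                        # forward pass
    P = 1.0 / (1.0 + np.exp(-(H @ W2)))        # sigmoid output
    E = P - y                                  # output error
    # Route the error through fixed random B2 instead of W2.T: no weight
    # transport, and each layer's update needs only local information.
    dH = (E @ B2) * (1.0 - H ** 2)
    W2 -= 0.1 * H.T @ E / len(X)
    W1 -= 0.1 * X.T @ dH / len(X)

print("train accuracy:", np.mean((P > 0.5) == y))
```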
arXiv Detail & Related papers (2020-02-10T16:20:02Z)