Synaptic Pruning: A Biological Inspiration for Deep Learning Regularization
- URL: http://arxiv.org/abs/2508.09330v2
- Date: Sun, 05 Oct 2025 07:16:18 GMT
- Title: Synaptic Pruning: A Biological Inspiration for Deep Learning Regularization
- Authors: Gideon Vos, Liza van Eijk, Zoltan Sarnyai, Mostafa Rahimi Azghadi
- Abstract summary: We propose a magnitude-based synaptic pruning method that better reflects biology. It is integrated directly into the training loop as a dropout replacement. Experiments on multiple time series forecasting models, including RNN, LSTM, and Patch Time Series Transformer, show consistent gains.
- Score: 4.080348897270072
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Synaptic pruning in biological brains removes weak connections to improve efficiency. In contrast, dropout regularization in artificial neural networks randomly deactivates neurons without considering activity-dependent pruning. We propose a magnitude-based synaptic pruning method that better reflects biology by progressively removing low-importance connections during training. Integrated directly into the training loop as a dropout replacement, our approach computes weight importance from absolute magnitudes across layers and applies a cubic schedule to gradually increase global sparsity. At fixed intervals, pruning masks permanently remove low-importance weights while maintaining gradient flow for active ones, eliminating the need for separate pruning and fine-tuning phases. Experiments on multiple time series forecasting models including RNN, LSTM, and Patch Time Series Transformer across four datasets show consistent gains. Our method ranked best overall, with statistically significant improvements confirmed by Friedman tests (p < 0.01). In financial forecasting, it reduced Mean Absolute Error by up to 20% over models with no or standard dropout, and up to 52% in select transformer models. This dynamic pruning mechanism advances regularization by coupling weight elimination with progressive sparsification, offering easy integration into diverse architectures. Its strong performance, especially in financial time series forecasting, highlights its potential as a practical alternative to conventional dropout techniques.
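To make the training-loop integration concrete, here is a minimal PyTorch-style sketch of magnitude-based pruning with a cubic sparsity schedule, as described in the abstract. It is an illustration only, not the authors' implementation: the exact cubic form, the pruning interval, the target sparsity, and the toy LSTM with synthetic data are all assumptions made for the example.

```python
import torch
import torch.nn as nn

def cubic_sparsity(step, total_steps, final_sparsity=0.5):
    # Assumed cubic form: global sparsity ramps from 0 toward final_sparsity over training.
    progress = min(step / total_steps, 1.0)
    return final_sparsity * progress ** 3

@torch.no_grad()
def prune_by_global_magnitude(model, sparsity, masks):
    # Rank all prunable weights by absolute magnitude across layers and pick one global threshold.
    scores = torch.cat([p.abs().flatten() for n, p in model.named_parameters() if n in masks])
    k = int(sparsity * scores.numel())
    if k < 1:
        return
    threshold = torch.kthvalue(scores, k).values
    for n, p in model.named_parameters():
        if n in masks:
            masks[n] &= p.abs() > threshold       # masks only shrink: pruned weights stay pruned
            p.mul_(masks[n].to(p.dtype))

# Toy training loop: pruning is applied at fixed intervals in place of dropout.
model = nn.LSTM(input_size=8, hidden_size=64, batch_first=True)   # stand-in forecasting model
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
masks = {n: torch.ones_like(p, dtype=torch.bool)
         for n, p in model.named_parameters() if p.dim() > 1}      # prune weight matrices, not biases
prune_every, total_steps = 100, 10_000

for step in range(total_steps):
    x, y = torch.randn(32, 24, 8), torch.randn(32, 64)             # synthetic (batch, time, features) data
    out, _ = model(x)
    loss = nn.functional.mse_loss(out[:, -1, :], y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    with torch.no_grad():                                          # re-zero pruned weights after each update;
        for n, p in model.named_parameters():                      # active weights keep normal gradient flow
            if n in masks:
                p.mul_(masks[n].to(p.dtype))
    if step % prune_every == 0 and step > 0:
        prune_by_global_magnitude(model, cubic_sparsity(step, total_steps), masks)
```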
Related papers
- DropoutTS: Sample-Adaptive Dropout for Robust Time Series Forecasting [59.868414584142336]
DropoutTS is a model-agnostic plugin that shifts the paradigm from "what" to "how much" to learn. It maps noise to adaptive dropout rates, selectively suppressing spurious fluctuations while preserving fine-grained fidelity.
arXiv Detail & Related papers (2026-01-29T13:49:20Z) - Plug-and-Play Homeostatic Spark: Zero-Cost Acceleration for SNN Training Across Paradigms [40.57310813106791]
Spiking neural networks offer event-driven computation, sparse activation, and hardware efficiency, yet training often converges slowly and lacks stability. We present Adaptive Homeostatic Spiking Activity Regulation (AHSAR), a simple plug-in method that applies across training paradigms. AHSAR stabilizes optimization and accelerates convergence without changing the model architecture, loss, or gradients.
arXiv Detail & Related papers (2025-12-04T17:26:46Z) - NIRVANA: Structured pruning reimagined for large language models compression [50.651730342011014]
We introduce NIRVANA, a novel pruning method designed to balance immediate zero-shot accuracy preservation with robust fine-tuning. To further address the unique challenges posed by structured pruning, NIRVANA incorporates an adaptive sparsity allocation mechanism across layers and modules. Experiments on Llama3, Qwen, and T5 models demonstrate that NIRVANA outperforms existing structured pruning methods under equivalent sparsity constraints.
arXiv Detail & Related papers (2025-09-17T17:59:00Z) - Scaling Collapse Reveals Universal Dynamics in Compute-Optimally Trained Neural Networks [59.552873049024775]
We show that compute-optimally trained models exhibit a remarkably precise universality. With learning rate decay, the collapse becomes so tight that differences in the normalized curves across models fall below the noise floor. We explain these phenomena by connecting collapse to the power-law structure in typical neural scaling laws.
arXiv Detail & Related papers (2025-07-02T20:03:34Z) - Decentralized Nonconvex Composite Federated Learning with Gradient Tracking and Momentum [78.27945336558987]
Decentralized federated learning (DFL) eliminates reliance on a client-server architecture. Non-smooth regularization is often incorporated into machine learning tasks. We propose a novel DNCFL algorithm to solve these problems.
arXiv Detail & Related papers (2025-04-17T08:32:25Z) - Maxwell's Demon at Work: Efficient Pruning by Leveraging Saturation of Neurons [27.289945121113277]
We introduce DemP, a method that controls the proliferation of dead neurons, dynamically leading to sparsity.
Experiments on CIFAR10 and ImageNet datasets demonstrate superior accuracy-sparsity tradeoffs.
arXiv Detail & Related papers (2024-03-12T14:28:06Z) - Uncertainty-Aware Deep Attention Recurrent Neural Network for Heterogeneous Time Series Imputation [0.25112747242081457]
Missingness is ubiquitous in multivariate time series and poses an obstacle to reliable downstream analysis.
We propose DEep Attention Recurrent Imputation (DEARI), which jointly estimates missing values and their associated uncertainty.
Experiments show that DEARI surpasses the SOTA in diverse imputation tasks using real-world datasets.
arXiv Detail & Related papers (2024-01-04T13:21:11Z) - Learning a Consensus Sub-Network with Polarization Regularization and One Pass Training [2.895034191799291]
Pruning schemes create extra overhead either by iterative training and fine-tuning for static pruning or repeated computation of a dynamic pruning graph. We propose a new parameter pruning strategy for learning a lighter-weight sub-network that minimizes the energy cost while maintaining comparable performance to the fully parameterised network on given downstream tasks. Our results on CIFAR-10, CIFAR-100, and Tiny ImageNet suggest that our scheme can remove 50% of connections in deep networks with 1% reduction in classification accuracy.
arXiv Detail & Related papers (2023-02-17T09:37:17Z) - Desire Backpropagation: A Lightweight Training Algorithm for Multi-Layer
Spiking Neural Networks based on Spike-Timing-Dependent Plasticity [13.384228628766236]
Spiking neural networks (SNNs) are a viable alternative to conventional artificial neural networks.
We present desire backpropagation, a method to derive the desired spike activity of all neurons, including the hidden ones.
We trained three-layer networks to classify MNIST and Fashion-MNIST images and reached an accuracy of 98.41% and 87.56%, respectively.
arXiv Detail & Related papers (2022-11-10T08:32:13Z) - Distribution Mismatch Correction for Improved Robustness in Deep Neural
Networks [86.42889611784855]
Normalization methods increase vulnerability to noise and input corruptions.
We propose an unsupervised non-parametric distribution correction method that adapts the activation distribution of each layer.
In our experiments, we empirically show that the proposed method effectively reduces the impact of intense image corruptions.
arXiv Detail & Related papers (2021-10-05T11:36:25Z) - Powerpropagation: A sparsity inducing weight reparameterisation [65.85142037667065]
We introduce Powerpropagation, a new weight-parameterisation for neural networks that leads to inherently sparse models.
Models trained in this manner exhibit similar performance, but have a distribution with markedly higher density at zero, allowing more parameters to be pruned safely.
Here, we combine Powerpropagation with a traditional weight-pruning technique as well as recent state-of-the-art sparse-to-sparse algorithms, showing superior performance on the ImageNet benchmark.
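For reference, the Powerpropagation reparameterisation can be sketched as a drop-in linear layer. This is a rough illustration under assumptions: the class name, initialisation scale, and alpha value are chosen for the example, not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PowerpropLinear(nn.Module):
    """Linear layer whose effective weight is w = phi * |phi|**(alpha - 1)."""
    def __init__(self, in_features, out_features, alpha=2.0):
        super().__init__()
        self.alpha = alpha
        self.phi = nn.Parameter(torch.randn(out_features, in_features) * 0.1)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        # Gradients w.r.t. phi are scaled by alpha * |phi|**(alpha - 1), so
        # small-magnitude weights receive smaller updates and cluster near zero,
        # which makes subsequent magnitude pruning safer.
        weight = self.phi * self.phi.abs().pow(self.alpha - 1.0)
        return F.linear(x, weight, self.bias)

# Usage example: layer = PowerpropLinear(128, 64); y = layer(torch.randn(32, 128))
```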
arXiv Detail & Related papers (2021-10-01T10:03:57Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.