On How Iterative Magnitude Pruning Discovers Local Receptive Fields in Fully Connected Neural Networks
- URL: http://arxiv.org/abs/2412.06545v3
- Date: Mon, 10 Mar 2025 20:43:54 GMT
- Title: On How Iterative Magnitude Pruning Discovers Local Receptive Fields in Fully Connected Neural Networks
- Authors: William T. Redman, Zhangyang Wang, Alessandro Ingrosso, Sebastian Goldt
- Abstract summary: Iterative magnitude pruning (IMP) has become a popular method for extracting sparse subnetworks that can be trained to high performance. Recent work showed that applying IMP to fully connected neural networks (FCNs) leads to the emergence of local receptive fields (RFs). Inspired by results showing that training on synthetic images with highly non-Gaussian statistics (e.g., sharp edges) is sufficient to drive the emergence of local RFs in FCNs, we hypothesize that IMP iteratively increases the non-Gaussian statistics of FCN representations.
- Score: 92.66231524298554
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Since its use in the Lottery Ticket Hypothesis, iterative magnitude pruning (IMP) has become a popular method for extracting sparse subnetworks that can be trained to high performance. Despite its success, the mechanism that drives the success of IMP remains unclear. One possibility is that IMP is capable of extracting subnetworks with good inductive biases that facilitate performance. Supporting this idea, recent work showed that applying IMP to fully connected neural networks (FCNs) leads to the emergence of local receptive fields (RFs), a feature of mammalian visual cortex and convolutional neural networks that facilitates image processing. However, it remains unclear why IMP would uncover localized features in the first place. Inspired by results showing that training on synthetic images with highly non-Gaussian statistics (e.g., sharp edges) is sufficient to drive the emergence of local RFs in FCNs, we hypothesize that IMP iteratively increases the non-Gaussian statistics of FCN representations, creating a feedback loop that enhances localization. Here, we demonstrate first that non-Gaussian input statistics are indeed necessary for IMP to discover localized RFs. We then develop a new method for measuring the effect of individual weights on the statistics of the FCN representations ("cavity method"), which allows us to show that IMP systematically increases the non-Gaussianity of pre-activations, leading to the formation of localized RFs. Our work, which is the first to study the effect of IMP on the statistics of the representations of neural networks, sheds parsimonious light on one way in which IMP can drive the formation of strong inductive biases.
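The procedure the abstract builds on, iterative magnitude pruning with weight rewinding, is standard. Below is a minimal, illustrative sketch (not the authors' code) that applies IMP to a one-hidden-layer FCN on toy data and tracks the excess kurtosis of the hidden pre-activations as a simple proxy for their non-Gaussianity; the architecture, toy dataset, pruning schedule, and choice of kurtosis are all assumptions for illustration, and the paper's analytic "cavity method" is not reproduced here.

```python
# Illustrative sketch only: IMP with weight rewinding on a small FCN, tracking
# the excess kurtosis of hidden pre-activations across pruning rounds.
# The toy dataset and all hyperparameters are assumptions, not the paper's setup.
import torch
import torch.nn as nn

def train(model, X, y, mask, epochs=100, lr=1e-2):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(X), y).backward()
        opt.step()
        with torch.no_grad():
            model[0].weight.mul_(mask)      # keep pruned weights at zero

def excess_kurtosis(z):
    z = (z - z.mean()) / z.std()
    return ((z ** 4).mean() - 3.0).item()   # 0 for Gaussian data

torch.manual_seed(0)
X = torch.randn(512, 64)                    # toy "images": 8x8 pixels, flattened
y = (X[:, :32].sum(dim=1) > 0).long()       # toy binary labels

model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 2))
init_state = {k: v.clone() for k, v in model.state_dict().items()}
mask = torch.ones_like(model[0].weight)

for it in range(10):                        # IMP: train, prune 20%, rewind, repeat
    train(model, X, y, mask)
    with torch.no_grad():
        kurt = excess_kurtosis(model[0](X).flatten())
    print(f"round {it}: density {mask.mean():.2f}, pre-activation kurtosis {kurt:.3f}")
    # Prune the smallest 20% (by magnitude) of the surviving first-layer weights.
    w = (model[0].weight * mask).abs()
    k = max(1, int(0.2 * mask.sum().item()))
    thresh = w[mask.bool()].kthvalue(k).values
    mask = (w > thresh).float() * mask
    # Rewind to the original initialization and re-apply the mask.
    model.load_state_dict(init_state)
    with torch.no_grad():
        model[0].weight.mul_(mask)
```

Under the paper's hypothesis, one would expect the printed kurtosis to drift away from zero as the first layer gets sparser, though this toy setup is only meant to show the mechanics of the loop.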
Related papers
- Bayesian Reasoning Enabled by Spin-Orbit Torque Magnetic Tunnel Junctions [7.081096702778852]
We present proof-of-concept experiments demonstrating the use of spin-orbit torque magnetic tunnel junctions (SOT-MTJs) in Bayesian network reasoning.
The parameters of the network can also approach the optimum through a simple point-by-point training algorithm.
We developed a simple medical diagnostic system using the SOT-MTJ as a random number generator and sampler.
arXiv Detail & Related papers (2025-04-11T05:02:27Z)
- Out-of-Distribution Detection using Neural Activation Prior [15.673290330356194]
Out-of-distribution (OOD) detection is a crucial technique for deploying machine learning models in the real world.
We propose a simple yet effective Neural Activation Prior (NAP) for OOD detection.
Our method achieves state-of-the-art performance on the CIFAR benchmarks and the ImageNet dataset.
arXiv Detail & Related papers (2024-02-28T08:45:07Z)
- Beyond IID weights: sparse and low-rank deep Neural Networks are also Gaussian Processes [3.686808512438363]
We extend the proof of Matthews et al. to a larger class of initial weight distributions.
We show that fully-connected and convolutional networks with PSEUDO-IID distributions are all effectively equivalent up to their variance.
Using our results, one can identify the Edge-of-Chaos for a broader class of neural networks and tune them at criticality in order to enhance their training.
arXiv Detail & Related papers (2023-10-25T12:38:36Z)
- Approximate Thompson Sampling via Epistemic Neural Networks [26.872304174606278]
Epistemic neural networks (ENNs) are designed to produce accurate joint predictive distributions.
We show that ENNs serve this purpose well and illustrate how the quality of joint predictive distributions drives performance.
arXiv Detail & Related papers (2023-02-18T01:58:15Z)
- Why Neural Networks Work [0.32228025627337864]
We argue that many properties of fully-connected feedforward neural networks (FCNNs) are explainable from the analysis of a single pair of operations.
We show how expand-and-sparsify can explain the observed phenomena that have been discussed in the literature.
arXiv Detail & Related papers (2022-11-26T18:15:17Z)
- Increasing the Accuracy of a Neural Network Using Frequency Selective Mesh-to-Grid Resampling [4.211128681972148]
We propose the use of keypoint frequency selective mesh-to-grid resampling (FSMR) for the processing of input data for neural networks.
We show that, depending on the network architecture and classification task, applying FSMR during training aids the learning process.
The classification accuracy can be increased by up to 4.31 percentage points for ResNet50 and the Oxflower17 dataset.
arXiv Detail & Related papers (2022-09-28T21:34:47Z)
- On the Neural Tangent Kernel Analysis of Randomly Pruned Neural Networks [91.3755431537592]
We study how random pruning of the weights affects a neural network's neural tangent kernel (NTK).
In particular, this work establishes an equivalence of the NTKs between a fully-connected neural network and its randomly pruned version.
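As a rough illustration of what such an NTK comparison involves (not the cited paper's analysis), the sketch below computes the empirical NTK of a small FCN and of a randomly pruned copy of the same initialization; the width, pruning rate, survivor rescaling, and toy inputs are all assumptions.

```python
# Illustrative sketch: empirical NTK of a dense FCN vs. a randomly pruned copy.
# Width, pruning rate, rescaling, and inputs are assumptions for illustration.
import torch
import torch.nn as nn

def empirical_ntk(model, X, masks=None):
    """Gram matrix of per-example parameter gradients of the scalar output.
    If masks are given, gradients of pruned (masked-out) weights are excluded."""
    rows = []
    for x in X:
        out = model(x.unsqueeze(0)).squeeze()
        grads = torch.autograd.grad(out, list(model.parameters()))
        if masks is not None:
            grads = [g * m for g, m in zip(grads, masks)]
        rows.append(torch.cat([g.flatten() for g in grads]))
    G = torch.stack(rows)
    return G @ G.T

torch.manual_seed(0)
X = torch.randn(16, 32)
width = 1024
dense = nn.Sequential(nn.Linear(32, width), nn.ReLU(), nn.Linear(width, 1))

# Randomly pruned copy: drop each weight independently with probability 0.5 and
# rescale the survivors so the forward variance matches the dense network.
pruned = nn.Sequential(nn.Linear(32, width), nn.ReLU(), nn.Linear(width, 1))
pruned.load_state_dict(dense.state_dict())
keep = 0.5
with torch.no_grad():
    m1 = (torch.rand_like(pruned[0].weight) < keep).float()
    m2 = (torch.rand_like(pruned[2].weight) < keep).float()
    pruned[0].weight.mul_(m1 / keep ** 0.5)
    pruned[2].weight.mul_(m2 / keep ** 0.5)
masks = [m1, torch.ones_like(pruned[0].bias), m2, torch.ones_like(pruned[2].bias)]

K_dense = empirical_ntk(dense, X)
K_pruned = empirical_ntk(pruned, X, masks)
# The cited work establishes an equivalence between such kernels in the
# wide-network limit; this prints only a crude finite-width check.
rel = torch.linalg.norm(K_pruned - K_dense) / torch.linalg.norm(K_dense)
print(f"relative NTK difference: {rel:.3f}")
```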
arXiv Detail & Related papers (2022-03-27T15:22:19Z)
- Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by analyzing the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z)
- FF-NSL: Feed-Forward Neural-Symbolic Learner [70.978007919101]
This paper introduces a neural-symbolic learning framework called Feed-Forward Neural-Symbolic Learner (FF-NSL).
FF-NSL integrates state-of-the-art ILP systems based on Answer Set semantics with neural networks in order to learn interpretable hypotheses from labelled unstructured data.
arXiv Detail & Related papers (2021-06-24T15:38:34Z)
- A tensor network representation of path integrals: Implementation and analysis [0.0]
We introduce a novel tensor network-based decomposition of path integral simulations involving the Feynman-Vernon influence functional.
The finite, temporally non-local interactions introduced by the influence functional can be captured very efficiently using a matrix product state representation.
The flexibility of the AP-TNPI framework makes it a promising new addition to the family of path integral methods for non-equilibrium quantum dynamics.
arXiv Detail & Related papers (2021-06-23T16:41:54Z)
- Towards Evaluating and Training Verifiably Robust Neural Networks [81.39994285743555]
We study the relationship between IBP and CROWN, and prove that CROWN is always tighter than IBP when choosing appropriate bounding lines.
We propose a relaxed version of CROWN, linear bound propagation (LBP), that can be used to verify large networks to obtain lower verified errors.
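For context on the comparison above, here is a minimal sketch of plain interval bound propagation (IBP) for a tiny ReLU network under an l_inf input perturbation; the network weights and radius are illustrative assumptions, and CROWN/LBP themselves (which replace the constant intervals with linear bounding lines) are not implemented.

```python
# Minimal sketch of interval bound propagation (IBP) through a two-layer ReLU net.
# Weights and perturbation radius are illustrative assumptions.
import torch

def ibp_linear(lo, hi, W, b):
    """Propagate the box [lo, hi] through x -> W @ x + b."""
    W_pos, W_neg = W.clamp(min=0.0), W.clamp(max=0.0)
    return W_pos @ lo + W_neg @ hi + b, W_pos @ hi + W_neg @ lo + b

def ibp_relu(lo, hi):
    return lo.clamp(min=0.0), hi.clamp(min=0.0)

torch.manual_seed(0)
# Tiny two-layer ReLU network with random weights (a stand-in for a trained model).
W1, b1 = torch.randn(8, 4), torch.randn(8)
W2, b2 = torch.randn(3, 8), torch.randn(3)

x = torch.randn(4)
eps = 0.1                                   # l_inf perturbation radius
lo, hi = x - eps, x + eps

lo, hi = ibp_linear(lo, hi, W1, b1)
lo, hi = ibp_relu(lo, hi)
lo, hi = ibp_linear(lo, hi, W2, b2)

# Every input within the perturbation ball maps to an output inside [lo, hi].
# CROWN/LBP tighten these boxes by bounding each ReLU with linear functions
# instead of constant intervals, which is why they yield lower verified errors.
print("certified output lower bounds:", lo)
print("certified output upper bounds:", hi)
```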
arXiv Detail & Related papers (2021-04-01T13:03:48Z)
- Sampling-free Variational Inference for Neural Networks with Multiplicative Activation Noise [51.080620762639434]
We propose a more efficient parameterization of the posterior approximation for sampling-free variational inference.
Our approach yields competitive results for standard regression problems and scales well to large-scale image classification tasks.
arXiv Detail & Related papers (2021-03-15T16:16:18Z)
- How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks [80.55378250013496]
We study how neural networks trained by gradient descent extrapolate what they learn outside the support of the training distribution.
Graph Neural Networks (GNNs) have shown some success in more complex tasks.
arXiv Detail & Related papers (2020-09-24T17:48:59Z)