Linear Mode Connectivity in Sparse Neural Networks
- URL: http://arxiv.org/abs/2310.18769v1
- Date: Sat, 28 Oct 2023 17:51:39 GMT
- Title: Linear Mode Connectivity in Sparse Neural Networks
- Authors: Luke McDermott, Daniel Cummings
- Abstract summary: We study how neural network pruning with synthetic data leads to sparse networks with unique training properties.
We find that these properties lead to synthetic subnetworks matching the performance of traditional IMP with up to 150x fewer training points in settings where distilled data applies.
- Score: 1.30536490219656
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the rise in interest of sparse neural networks, we study how neural
network pruning with synthetic data leads to sparse networks with unique
training properties. We find that distilled data, a synthetic summarization of
the real data, paired with Iterative Magnitude Pruning (IMP) unveils a new
class of sparse networks that are more stable to SGD noise on the real data
than either the dense model or the subnetworks found with real data in IMP. That
is, synthetically chosen subnetworks often train to the same minima, or exhibit
linear mode connectivity. We study this through linear interpolation, loss
landscape visualizations, and measuring the diagonal of the Hessian. While
dataset distillation as a field is still young, we find that these properties
lead to synthetic subnetworks matching the performance of traditional IMP with
up to 150x fewer training points in settings where distilled data applies.
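
The central measurement in the abstract, linear mode connectivity, can be probed with a simple interpolation sweep between two trained solutions. The sketch below is illustrative only and is not the authors' code: it assumes PyTorch, a constructor `make_model()` for the shared sparse architecture, two trained state dicts `sd_a` and `sd_b` from independent SGD runs, and an evaluation loader `eval_loader`, all of which are hypothetical names.

```python
# Minimal sketch (not the paper's code) of a linear mode connectivity check:
# evaluate the loss along the straight line between two trained solutions.
import torch
import torch.nn.functional as F

def interpolate_state(sd_a, sd_b, alpha):
    # Elementwise linear interpolation of two state dicts: (1 - alpha) * a + alpha * b.
    return {k: (1 - alpha) * sd_a[k] + alpha * sd_b[k] for k in sd_a}

@torch.no_grad()
def loss_barrier(make_model, sd_a, sd_b, eval_loader, num_points=11, device="cpu"):
    # Evaluate the loss at evenly spaced points on the segment between the two solutions.
    losses = []
    for i in range(num_points):
        alpha = i / (num_points - 1)
        model = make_model().to(device)
        model.load_state_dict(interpolate_state(sd_a, sd_b, alpha))
        model.eval()
        total, count = 0.0, 0
        for x, y in eval_loader:
            x, y = x.to(device), y.to(device)
            total += F.cross_entropy(model(x), y, reduction="sum").item()
            count += y.numel()
        losses.append(total / count)
    # Barrier height: worst loss on the path above the higher of the two endpoints.
    barrier = max(losses) - max(losses[0], losses[-1])
    return losses, barrier
```

A near-zero barrier along the path is the signature of linear mode connectivity that the abstract reports for the synthetically chosen subnetworks, i.e. the two runs train to (approximately) the same linearly connected minimum.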
Related papers
- NIDS Neural Networks Using Sliding Time Window Data Processing with Trainable Activations and its Generalization Capability [0.0]
This paper presents neural networks for network intrusion detection systems (NIDS) that operate on flow data preprocessed with a time window.
It requires only eleven features which do not rely on deep packet inspection and can be found in most NIDS datasets and easily obtained from conventional flow collectors.
The reported training accuracy exceeds 99% for the proposed method with as few as twenty neural network input features.
arXiv Detail & Related papers (2024-10-24T11:36:19Z) - Steinmetz Neural Networks for Complex-Valued Data [23.80312814400945]
We introduce a new approach to processing complex-valued data using DNNs consisting of parallel real-valued subnetworks with coupled outputs.
Our proposed class of architectures, referred to as Steinmetz Neural Networks, leverage multi-view learning to construct more interpretable representations within the latent space.
Our numerical experiments depict the improved performance and robustness to additive noise afforded by these networks on benchmark datasets and synthetic examples.
arXiv Detail & Related papers (2024-09-16T08:26:06Z) - SynA-ResNet: Spike-driven ResNet Achieved through OR Residual Connection [10.702093960098104]
Spiking Neural Networks (SNNs) have garnered substantial attention in brain-like computing for their biological fidelity and the capacity to execute energy-efficient spike-driven operations.
We propose a novel training paradigm that first accumulates a large amount of redundant information through OR Residual Connection (ORRC),
then filters out the redundant information using the Synergistic Attention (SynA) module, which promotes feature extraction in the backbone while suppressing the influence of noise and useless features in the shortcuts.
arXiv Detail & Related papers (2023-11-11T13:36:27Z) - Deep Neural Networks Tend To Extrapolate Predictably [51.303814412294514]
Neural network predictions tend to be unpredictable and overconfident when faced with out-of-distribution (OOD) inputs.
We observe that neural network predictions often tend towards a constant value as input data becomes increasingly OOD.
We show how one can leverage our insights in practice to enable risk-sensitive decision-making in the presence of OOD inputs.
arXiv Detail & Related papers (2023-10-02T03:25:32Z) - Joint Edge-Model Sparse Learning is Provably Efficient for Graph Neural
Networks [89.28881869440433]
This paper provides the first theoretical characterization of joint edge-model sparse learning for graph neural networks (GNNs)
It proves analytically that both sampling important nodes and pruning the lowest-magnitude neurons can reduce the sample complexity and improve convergence without compromising the test accuracy.
arXiv Detail & Related papers (2023-02-06T16:54:20Z) - Neural networks trained with SGD learn distributions of increasing
complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics.
They exploit higher-order statistics only later during training.
We discuss the relation of this distributional simplicity bias (DSB) to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z) - Exploring explicit coarse-grained structure in artificial neural
networks [0.0]
We propose to explicitly employ hierarchical coarse-grained structure in artificial neural networks to improve interpretability without degrading performance.
One is a neural network called TaylorNet, which aims to approximate the general mapping from input data to output result in terms of Taylor series directly.
The other is a new setup for data distillation, which can perform multi-level abstraction of the input dataset and generate new data.
arXiv Detail & Related papers (2022-11-03T13:06:37Z) - Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z) - The Interplay Between Implicit Bias and Benign Overfitting in Two-Layer
Linear Networks [51.1848572349154]
Neural network models that perfectly fit noisy data can generalize well to unseen test data.
We consider interpolating two-layer linear neural networks trained with gradient flow on the squared loss and derive bounds on the excess risk.
arXiv Detail & Related papers (2021-08-25T22:01:01Z) - PredRNN: A Recurrent Neural Network for Spatiotemporal Predictive
Learning [109.84770951839289]
We present PredRNN, a new recurrent network for learning visual dynamics from historical context.
We show that our approach obtains highly competitive results on three standard datasets.
arXiv Detail & Related papers (2021-03-17T08:28:30Z) - Finding trainable sparse networks through Neural Tangent Transfer [16.092248433189816]
In deep learning, trainable sparse networks that perform well on a specific task are usually constructed using label-dependent pruning criteria.
In this article, we introduce Neural Tangent Transfer, a method that instead finds trainable sparse networks in a label-free manner.
arXiv Detail & Related papers (2020-06-15T08:58:01Z)