A Unified Paths Perspective for Pruning at Initialization
- URL: http://arxiv.org/abs/2101.10552v1
- Date: Tue, 26 Jan 2021 04:29:50 GMT
- Title: A Unified Paths Perspective for Pruning at Initialization
- Authors: Thomas Gebhart, Udit Saxena, Paul Schrater
- Abstract summary: We introduce the Path Kernel as the data-independent factor in a decomposition of the Neural Tangent Kernel.
We show the global structure of the Path Kernel can be computed efficiently.
We analyze the use of this structure in approximating training and generalization performance of networks in the absence of data.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A number of recent approaches have been proposed for pruning neural network
parameters at initialization with the goal of reducing the size and
computational burden of models while minimally affecting their training
dynamics and generalization performance. While each of these approaches has
some well-founded motivation, a rigorous analysis of the effect of
these pruning methods on network training dynamics and their formal
relationship to each other has thus far received little attention. Leveraging
recent theoretical approximations provided by the Neural Tangent Kernel, we
unify a number of popular approaches for pruning at initialization under a
single path-centric framework. We introduce the Path Kernel as the
data-independent factor in a decomposition of the Neural Tangent Kernel and
show the global structure of the Path Kernel can be computed efficiently. This
Path Kernel decomposition separates the architectural effects from the
data-dependent effects within the Neural Tangent Kernel, providing a means to
predict the convergence dynamics of a network from its architecture alone. We
analyze the use of this structure in approximating training and generalization
performance of networks in the absence of data across a number of
initialization pruning approaches. Observing the relationship between input
data and paths and the relationship between the Path Kernel and its natural
norm, we additionally propose two augmentations of the SynFlow algorithm for
pruning at initialization.
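The decomposition at the heart of the abstract can be made concrete. As a hedged sketch (our notation, not necessarily the paper's): write each network output as a sum over input-output paths, f(x) = sum_p pi_p(theta) a_p(x) x_{i(p)}, where pi_p(theta) is the product of weights along path p, a_p(x) is the data-dependent path activity (e.g., ReLU gating), and i(p) is the input unit path p starts from. Dropping gradients of the gating, which vanish almost everywhere for ReLU, the Neural Tangent Kernel then factors into a data-dependent part and the data-independent Path Kernel:

```latex
% Hedged sketch of the path decomposition; the notation (\pi_p, a_p, \Pi)
% is illustrative and may differ from the paper's.
\[
  \pi_p(\theta) = \prod_{w \in p} w, \qquad
  \Pi_{pq} = \big\langle \nabla_\theta \pi_p(\theta),\; \nabla_\theta \pi_q(\theta) \big\rangle,
\]
\[
  \Theta(x, x') \approx \sum_{p,q} a_p(x)\, a_q(x')\, x_{i(p)}\, x'_{i(q)}\, \Pi_{pq}.
\]
```

Since the proposed augmentations build on SynFlow, a minimal sketch of the baseline SynFlow saliency pass (Tanaka et al., 2020) may also help. This is a reconstruction from the standard description of the algorithm, not the authors' code:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def _linearize(model: nn.Module) -> dict:
    """Replace every parameter with its absolute value; remember the signs."""
    signs = {}
    for name, param in model.state_dict().items():
        signs[name] = torch.sign(param)
        param.abs_()
    return signs

@torch.no_grad()
def _restore(model: nn.Module, signs: dict) -> None:
    for name, param in model.state_dict().items():
        param.mul_(signs[name])

def synflow_scores(model: nn.Module, input_shape: tuple) -> dict:
    """SynFlow saliency: score(w) = |w * dR/dw|, where R is the scalar output
    of a forward pass on an all-ones input through the |w|-linearized network.
    No training data is required, matching the data-free setting above."""
    signs = _linearize(model)
    x = torch.ones(1, *input_shape)   # all-ones, data-free input
    R = model(x).sum()                # synaptic-flow objective
    R.backward()
    scores = {name: (param.grad * param).abs().detach()
              for name, param in model.named_parameters()
              if param.grad is not None}
    _restore(model, signs)
    model.zero_grad()
    return scores

# Usage sketch: score a toy MLP; the lowest-scoring weights would be pruned.
mlp = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
print({name: s.shape for name, s in synflow_scores(mlp, (8,)).items()})
```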
Related papers
- Path-conditioned training: a principled way to rescale ReLU neural networks [15.875889029027915]
We build on the recent path-lifting framework, which provides a compact factorization of ReLU networks. We introduce a geometrically motivated criterion to rescale neural network parameters. We derive an efficient algorithm to perform this alignment.
arXiv Detail & Related papers (2026-02-23T12:55:48Z)
- A new initialisation to Control Gradients in Sinusoidal Neural network [9.341735544356167]
We propose a new initialisation for networks with sinusoidal activation functions such as SIREN. Controlling both gradients and targeting vanishing pre-activation helps prevent the emergence of inappropriate frequencies during estimation. The new initialisation consistently outperforms state-of-the-art methods across a wide range of reconstruction tasks.
arXiv Detail & Related papers (2025-12-06T13:23:03Z)
- Initialization Schemes for Kolmogorov-Arnold Networks: An Empirical Study [9.450853542720909]
Kolmogorov-Arnold Networks (KANs) are a recently introduced neural architecture that replaces fixed nonlinearities with trainable activation functions. This work proposes two theory-driven approaches inspired by LeCun and Glorot initialization, as well as an empirical power-law family with tunable exponents.
arXiv Detail & Related papers (2025-09-03T15:45:28Z)
- Generalization Bound of Gradient Flow through Training Trajectory and Data-dependent Kernel [55.82768375605861]
We establish a generalization bound for gradient flow that aligns with the classical Rademacher complexity for kernel methods. Unlike static kernels such as NTK, the LPK captures the entire training trajectory, adapting to both data and optimization dynamics.
arXiv Detail & Related papers (2025-06-12T23:17:09Z)
- From Theory to Application: A Practical Introduction to Neural Operators in Scientific Computing [0.0]
The study covers foundational models such as Deep Operator Networks (DeepONet) and Principal Component Analysis-based Neural Networks (PCANet).
The review delves into applying neural operators as surrogates in Bayesian inference problems, showcasing their effectiveness in accelerating posterior inference while maintaining accuracy.
It also outlines emerging strategies, such as residual-based error correction and multi-level training.
arXiv Detail & Related papers (2025-03-07T17:25:25Z)
- Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z)
- Simple initialization and parametrization of sinusoidal networks via their kernel bandwidth [92.25666446274188]
Neural networks with sinusoidal activations have been proposed as an alternative to networks with traditional activation functions.
We first propose a simplified version of such sinusoidal neural networks, which allows both for easier practical implementation and simpler theoretical analysis.
We then analyze the behavior of these networks from the neural tangent kernel perspective and demonstrate that their kernel approximates a low-pass filter with an adjustable bandwidth.
arXiv Detail & Related papers (2022-11-26T07:41:48Z)
- A Recursively Recurrent Neural Network (R2N2) Architecture for Learning Iterative Algorithms [64.3064050603721]
We generalize the Runge-Kutta neural network to a recurrent neural network (R2N2) superstructure for the design of customized iterative algorithms.
We demonstrate that regular training of the weight parameters inside the proposed superstructure on input/output data of various computational problem classes yields similar iterations to Krylov solvers for linear equation systems, Newton-Krylov solvers for nonlinear equation systems, and Runge-Kutta solvers for ordinary differential equations.
arXiv Detail & Related papers (2022-11-22T16:30:33Z)
- Self-Consistent Dynamical Field Theory of Kernel Evolution in Wide Neural Networks [18.27510863075184]
We analyze feature learning in infinite width neural networks trained with gradient flow through a self-consistent dynamical field theory.
We construct a collection of deterministic dynamical order parameters which are inner-product kernels for hidden unit activations and gradients in each layer at pairs of time points.
arXiv Detail & Related papers (2022-05-19T16:10:10Z)
- Robust Learning of Parsimonious Deep Neural Networks [0.0]
We propose a simultaneous learning and pruning algorithm capable of identifying and eliminating irrelevant structures in a neural network.
We derive a novel hyper-prior distribution over the prior parameters that is crucial for their optimal selection.
We evaluate the proposed algorithm on the MNIST data set and commonly used fully connected and convolutional LeNet architectures.
arXiv Detail & Related papers (2022-05-10T03:38:55Z)
- Training Integrable Parameterizations of Deep Neural Networks in the Infinite-Width Limit [0.0]
Large-width dynamics has emerged as a fruitful viewpoint and led to practical insights on real-world deep networks.
For two-layer neural networks, it has been understood that the nature of the trained model radically changes depending on the scale of the initial random weights.
We propose various methods to avoid this trivial behavior and analyze in detail the resulting dynamics.
arXiv Detail & Related papers (2021-10-29T07:53:35Z)
- Generalization bound of globally optimal non-convex neural network training: Transportation map estimation by infinite dimensional Langevin dynamics [50.83356836818667]
We introduce a new theoretical framework to analyze deep learning optimization with connection to its generalization error.
Existing frameworks for neural network optimization analysis, such as mean-field theory and neural tangent kernel theory, typically require taking the infinite-width limit of the network to show global convergence.
arXiv Detail & Related papers (2020-07-11T18:19:50Z)
- An Ode to an ODE [78.97367880223254]
We present a new paradigm for Neural ODE algorithms, called ODEtoODE, where time-dependent parameters of the main flow evolve according to a matrix flow on the orthogonal group O(d).
This nested system of two flows provides stability and effectiveness of training and provably solves the gradient vanishing/explosion problem.
arXiv Detail & Related papers (2020-06-19T22:05:19Z)
- Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z)
- Fiedler Regularization: Learning Neural Networks with Graph Sparsity [6.09170287691728]
We introduce a novel regularization approach for deep learning that incorporates and respects the underlying graphical structure of the neural network.
We propose to use the Fiedler value of the neural network's underlying graph as a tool for regularization.
arXiv Detail & Related papers (2020-03-02T16:19:33Z)
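The Fiedler Regularization entry above turns on one well-defined quantity: the Fiedler value (algebraic connectivity), the second-smallest eigenvalue of the graph Laplacian L = D - A. A minimal numeric sketch of that quantity on a toy unweighted graph; the function name and example graph are illustrative, not taken from the paper:

```python
import numpy as np

def fiedler_value(adjacency: np.ndarray) -> float:
    """Second-smallest eigenvalue of the graph Laplacian L = D - A,
    i.e. the algebraic connectivity (Fiedler value) of the graph."""
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency
    eigenvalues = np.linalg.eigvalsh(laplacian)  # sorted ascending
    return float(eigenvalues[1])

# A path graph on 4 nodes: sparser connectivity gives a smaller Fiedler
# value, which is the structural signal such a regularizer can exploit.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(fiedler_value(A))  # ~0.586 = 2 - sqrt(2)
```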
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.