Fiedler Regularization: Learning Neural Networks with Graph Sparsity
- URL: http://arxiv.org/abs/2003.00992v3
- Date: Sat, 15 Aug 2020 08:39:03 GMT
- Title: Fiedler Regularization: Learning Neural Networks with Graph Sparsity
- Authors: Edric Tam and David Dunson
- Abstract summary: We introduce a novel regularization approach for deep learning that incorporates and respects the underlying graphical structure of the neural network.
We propose to use the Fiedler value of the neural network's underlying graph as a tool for regularization.
- Score: 6.09170287691728
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a novel regularization approach for deep learning that
incorporates and respects the underlying graphical structure of the neural
network. Existing regularization methods often focus on dropping/penalizing
weights in a global manner that ignores the connectivity structure of the
neural network. We propose to use the Fiedler value of the neural network's
underlying graph as a tool for regularization. We provide theoretical support
for this approach via spectral graph theory. We list several useful properties
of the Fiedler value that make it suitable for regularization. We provide an
approximate, variational approach for fast computation in practical training of
neural networks. We provide bounds on such approximations. We provide an
alternative but equivalent formulation of this framework in the form of a
structurally weighted L1 penalty, thus linking our approach to sparsity
induction. We performed experiments on several datasets comparing Fiedler
regularization with traditional regularization methods such as dropout and
weight decay. Results demonstrate the efficacy of Fiedler regularization.
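A concrete way to read the approach: view the network as a weighted graph whose nodes are neurons and whose edge weights are the absolute values of the connection weights, form the graph Laplacian, and add its second-smallest eigenvalue (the Fiedler value, i.e. the algebraic connectivity) to the training loss as a penalty. The PyTorch snippet below is a minimal sketch of that idea rather than the authors' implementation: the layer sizes, the helper name fiedler_value, and the constant FIEDLER_WEIGHT are illustrative, biases are ignored, and a dense eigendecomposition stands in for the paper's faster variational approximation.

```python
# Minimal sketch (not the authors' code): Fiedler-value penalty for a toy MLP.
# Assumptions: a two-layer MLP, edge weights |w|, biases ignored, and a dense
# eigendecomposition in place of the paper's variational approximation.
import torch
import torch.nn as nn

LAYER_SIZES = [4, 8, 3]      # illustrative architecture
FIEDLER_WEIGHT = 1e-3        # illustrative regularization strength

model = nn.Sequential(
    nn.Linear(LAYER_SIZES[0], LAYER_SIZES[1]), nn.ReLU(),
    nn.Linear(LAYER_SIZES[1], LAYER_SIZES[2]),
)

def fiedler_value(model, layer_sizes):
    """Second-smallest eigenvalue of the Laplacian of the network's weighted graph."""
    n = sum(layer_sizes)
    offsets = [sum(layer_sizes[:k]) for k in range(len(layer_sizes))]
    linears = [m for m in model if isinstance(m, nn.Linear)]
    adj = torch.zeros(n, n)
    for k, layer in enumerate(linears):
        w = layer.weight.abs()                     # shape (out_dim, in_dim)
        r, c = offsets[k + 1], offsets[k]          # rows: layer k+1 nodes, cols: layer k nodes
        adj[r:r + w.shape[0], c:c + w.shape[1]] = w
        adj[c:c + w.shape[1], r:r + w.shape[0]] = w.t()
    laplacian = torch.diag(adj.sum(dim=1)) - adj   # L = D - A
    eigvals = torch.linalg.eigvalsh(laplacian)     # ascending order, differentiable
    return eigvals[1]                              # Fiedler value / algebraic connectivity

# Usage in a training step: penalize the Fiedler value alongside the task loss.
x, y = torch.randn(16, LAYER_SIZES[0]), torch.randn(16, LAYER_SIZES[-1])
loss = nn.functional.mse_loss(model(x), y) + FIEDLER_WEIGHT * fiedler_value(model, LAYER_SIZES)
loss.backward()
```

One natural reading of the structurally weighted L1 formulation follows from the standard variational characterization of the Fiedler value: for a fixed unit vector v orthogonal to the all-ones vector, v^T L v = sum over edges of |w_ij| (v_i - v_j)^2, an L1 penalty on the weights whose coefficients (v_i - v_j)^2 encode the graph structure, and minimizing over such v recovers the Fiedler value itself.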
Related papers
- Convergence Analysis for Learning Orthonormal Deep Linear Neural Networks [27.29463801531576]
We provide convergence analysis for training orthonormal deep linear neural networks.
Our results shed light on how increasing the number of hidden layers can impact the convergence speed (see the orthonormal deep linear network sketch after this list).
arXiv Detail & Related papers (2023-11-24T18:46:54Z)
- Spectral Gap Regularization of Neural Networks [6.09170287691728]
Fiedler regularization is a novel approach for regularizing neural networks that utilizes spectral/graphical information.
We provide an approximate, variational approach for faster computation during training.
We performed experiments on datasets that compare Fiedler regularization with classical regularization methods such as dropout and weight decay.
arXiv Detail & Related papers (2023-04-06T14:23:40Z)
- Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z)
- Simple initialization and parametrization of sinusoidal networks via their kernel bandwidth [92.25666446274188]
Neural networks with sinusoidal activations have been proposed as an alternative to networks with traditional activation functions.
We first propose a simplified version of such sinusoidal neural networks, which allows both for easier practical implementation and simpler theoretical analysis.
We then analyze the behavior of these networks from the neural tangent kernel perspective and demonstrate that their kernel approximates a low-pass filter with an adjustable bandwidth (see the sine-activation sketch after this list).
arXiv Detail & Related papers (2022-11-26T07:41:48Z)
- Fast Adaptation with Linearized Neural Networks [35.43406281230279]
We study the inductive biases of linearizations of neural networks, which we show to be surprisingly good summaries of the full network functions.
Inspired by this finding, we propose a technique for embedding these inductive biases into Gaussian processes through a kernel designed from the Jacobian of the network (see the Jacobian-kernel sketch after this list).
In this setting, domain adaptation takes the form of interpretable posterior inference, with accompanying uncertainty estimation.
arXiv Detail & Related papers (2021-03-02T03:23:03Z)
- A Convergence Theory Towards Practical Over-parameterized Deep Neural Networks [56.084798078072396]
We take a step towards closing the gap between theory and practice by significantly improving the known theoretical bounds on both the network width and the convergence time.
We show that convergence to a global minimum is guaranteed for networks whose width is quadratic in the sample size and linear in the depth, with convergence time logarithmic in both.
Our analysis and convergence bounds are derived via the construction of a surrogate network with fixed activation patterns that can be transformed at any time to an equivalent ReLU network of a reasonable size.
arXiv Detail & Related papers (2021-01-12T00:40:45Z)
- Compressive Sensing and Neural Networks from a Statistical Learning Perspective [4.561032960211816]
We present a generalization error analysis for a class of neural networks suitable for sparse reconstruction from few linear measurements.
Under realistic conditions, the generalization error scales only logarithmically in the number of layers and at most linearly in the number of measurements.
arXiv Detail & Related papers (2020-10-29T15:05:43Z)
- Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool (see the Hessian-norm sketch after this list).
arXiv Detail & Related papers (2020-04-20T18:12:56Z)
- Volumization as a Natural Generalization of Weight Decay [25.076488081589403]
Inspired by physics, we define a physical volume for the weight parameters in neural networks.
We show that this method is an effective way of regularizing neural networks.
arXiv Detail & Related papers (2020-03-25T07:13:55Z)
- Distance-Based Regularisation of Deep Networks for Fine-Tuning [116.71288796019809]
We develop an algorithm that constrains a hypothesis class to a small sphere centred on the initial pre-trained weights (see the weight-projection sketch after this list).
Empirical evaluation shows that our algorithm works well, corroborating our theoretical results.
arXiv Detail & Related papers (2020-02-19T16:00:47Z)
- Understanding Generalization in Deep Learning via Tensor Methods [53.808840694241]
We advance the understanding of the relations between the network's architecture and its generalizability from the compression perspective.
We propose a series of intuitive, data-dependent and easily-measurable properties that tightly characterize the compressibility and generalizability of neural networks.
arXiv Detail & Related papers (2020-01-14T22:26:57Z)
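Illustrative sketches for several of the entries above follow. First, for the orthonormal deep linear network entry: a minimal example of the kind of model being analyzed, assuming square layers and PyTorch's built-in orthogonal parametrization; the constraint handling, sizes, and step sizes here are my choices, not the authors' algorithm.

```python
# Hedged sketch: a deep *linear* network whose layer weights are kept orthonormal
# via PyTorch's orthogonal parametrization. Architecture and training loop are
# illustrative only.
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import orthogonal

DIM, DEPTH = 16, 4
layers = [orthogonal(nn.Linear(DIM, DIM, bias=False)) for _ in range(DEPTH)]
net = nn.Sequential(*layers)                      # purely linear: no activations

teacher = torch.linalg.qr(torch.randn(DIM, DIM)).Q  # an orthogonal target map
x = torch.randn(128, DIM)
target = x @ teacher

opt = torch.optim.SGD(net.parameters(), lr=1e-2)
for step in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(x), target)
    loss.backward()
    opt.step()                                    # weights stay orthonormal by construction
```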
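For the sinusoidal networks entry, the sine-activation sketch below shows a generic MLP with sine activations and an explicit frequency-scale knob; the name bandwidth, the architecture, and the scaling are my illustrative choices, not the specific simplified parametrization proposed in that paper.

```python
# Hedged sketch: a generic sine-activation MLP with an explicit frequency scale.
import torch
import torch.nn as nn

class SineMLP(nn.Module):
    def __init__(self, in_dim, hidden, out_dim, bandwidth=10.0):
        super().__init__()
        self.bandwidth = bandwidth                 # larger -> higher-frequency features fit more easily
        self.first = nn.Linear(in_dim, hidden)
        self.second = nn.Linear(hidden, hidden)
        self.head = nn.Linear(hidden, out_dim)

    def forward(self, x):
        h = torch.sin(self.bandwidth * self.first(x))   # frequency scale applied at the first layer
        h = torch.sin(self.second(h))
        return self.head(h)

# Fitting a 1-D signal: a smaller `bandwidth` should bias the fit toward smoother
# functions, in the spirit of the "low-pass filter with adjustable bandwidth" claim.
model = SineMLP(1, 64, 1, bandwidth=5.0)
coords = torch.linspace(-1, 1, 256).unsqueeze(1)
signal = torch.sin(4 * torch.pi * coords)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    nn.functional.mse_loss(model(coords), signal).backward()
    opt.step()
```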
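For the linearized-networks entry, the Jacobian-kernel sketch below builds a kernel from the network's parameter Jacobians; the scalar-output restriction and the helper names are my simplifications of "a kernel designed from the Jacobian of the network", not the authors' construction.

```python
# Hedged sketch: a kernel from parameter Jacobians (the linearized network's
# empirical NTK), usable as a Gaussian-process covariance.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(3, 16), nn.Tanh(), nn.Linear(16, 1))
params = list(net.parameters())

def param_jacobian(x):
    """Flattened gradient of the scalar output f(x; theta) with respect to theta."""
    out = net(x.unsqueeze(0)).squeeze()
    grads = torch.autograd.grad(out, params)
    return torch.cat([g.reshape(-1) for g in grads])

def jacobian_kernel(x1, x2):
    """k(x1, x2) = <J(x1), J(x2)>."""
    return param_jacobian(x1) @ param_jacobian(x2)

# A Gram matrix over a few inputs, e.g. as the covariance of a GP prior.
xs = torch.randn(5, 3)
gram = torch.stack([torch.stack([jacobian_kernel(a, b) for b in xs]) for a in xs])
print(gram.shape)  # torch.Size([5, 5])
```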
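For the initialization entry, the Hessian-norm sketch below tracks an estimate of the loss Hessian's spectral norm via power iteration on Hessian-vector products; the iteration count, the choice of spectral norm, and the toy model are mine, and the paper's estimator and control mechanism may differ.

```python
# Hedged sketch: Hessian spectral-norm estimate by power iteration on HVPs.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
params = [p for p in net.parameters() if p.requires_grad]

def hessian_spectral_norm(loss, params, iters=20):
    grads = torch.autograd.grad(loss, params, create_graph=True)
    v = [torch.randn_like(p) for p in params]
    v_norm = torch.sqrt(sum((u * u).sum() for u in v))
    v = [u / v_norm for u in v]
    est = torch.zeros(())
    for _ in range(iters):
        # Hessian-vector product: differentiate <grad, v> with respect to the parameters.
        hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
        est = torch.sqrt(sum((h * h).sum() for h in hv))   # ||Hv|| approaches |lambda_max|
        v = [h / (est + 1e-12) for h in hv]
    return est.item()

x, y = torch.randn(64, 10), torch.randn(64, 1)
loss = nn.functional.mse_loss(net(x), y)
print(hessian_spectral_norm(loss, params))   # diagnostic: curvature scale of the loss
```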
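For the fine-tuning entry, the weight-projection sketch below keeps the weights inside an L2 ball around the pre-trained initialization, matching the "small sphere centred on the initial pre-trained weights" idea; the radius, the global (rather than per-layer) norm, and the project-after-every-step schedule are my choices, not necessarily the paper's.

```python
# Hedged sketch: fine-tuning with projection back onto an L2 ball around theta_0.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))  # stand-in for a pre-trained net
init_params = [p.detach().clone() for p in model.parameters()]
RADIUS = 1.0  # illustrative constraint radius

def project_to_ball(model, init_params, radius):
    """Rescale theta - theta_0 so that its global L2 norm is at most `radius`."""
    with torch.no_grad():
        deltas = [p - p0 for p, p0 in zip(model.parameters(), init_params)]
        total = torch.sqrt(sum((d * d).sum() for d in deltas))
        if total > radius:
            scale = radius / total
            for p, p0, d in zip(model.parameters(), init_params, deltas):
                p.copy_(p0 + scale * d)

opt = torch.optim.SGD(model.parameters(), lr=1e-2)
x, y = torch.randn(32, 8), torch.randint(0, 2, (32,))
for _ in range(10):
    opt.zero_grad()
    nn.functional.cross_entropy(model(x), y).backward()
    opt.step()
    project_to_ball(model, init_params, RADIUS)   # keep theta inside the ball after every step
```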
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.