Free Probability, Newton lilypads and Jacobians of neural networks
- URL: http://arxiv.org/abs/2111.00841v1
- Date: Mon, 1 Nov 2021 11:22:42 GMT
- Title: Free Probability, Newton lilypads and Jacobians of neural networks
- Authors: Reda Chhaibi, Tariq Daouda, Ezechiel Kahn
- Abstract summary: We present a reliable and very fast method for computing the associated spectral densities.
Our technique is based on an adaptive Newton-Raphson scheme that finds and chains basins of attraction.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gradient descent during the learning process of a neural network can be subject to many instabilities. The spectral density of the Jacobian is a key component for analyzing robustness. Following the works of Pennington et al., such Jacobians are modeled using free multiplicative convolutions from Free Probability Theory. We present a reliable and very fast method for computing the associated spectral densities. This method has controlled and proven convergence.
Our technique is based on an adaptive Newton-Raphson scheme that finds and chains basins of attraction: the Newton algorithm finds contiguous lilypad-like basins and steps from one to the next, heading towards the objective.
We demonstrate the applicability of our method by using it to assess how the learning process is affected by network depth, layer widths and initialization choices: empirically, final test losses are strongly correlated with our Free Probability metrics.
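As an illustration of the lilypad idea (a toy sketch of our own, not code from the paper), the snippet below evaluates the Stieltjes transform of the Marchenko-Pastur law, a simple stand-in for the free multiplicative convolutions above, with a Newton-Raphson solver whose starting points are chained: the iteration begins far from the real axis, where the naive guess 1/z sits safely inside a basin of attraction, and then lowers Im(z) geometrically, reusing each converged value as the seed in the next basin. The spectral density is recovered at the end by Stieltjes inversion.

```python
# Toy sketch (not the paper's code): Newton-Raphson with chained starting
# points, illustrated on the Marchenko-Pastur law with aspect ratio lam.
import numpy as np

def mp_equation(G, z, lam):
    # F(G) = 0: quadratic self-consistent equation satisfied by the
    # Stieltjes transform G(z) of the Marchenko-Pastur law.
    return lam * z * G**2 - (z + lam - 1.0) * G + 1.0

def mp_equation_prime(G, z, lam):
    return 2.0 * lam * z * G - (z + lam - 1.0)

def newton(G0, z, lam, tol=1e-12, max_iter=50):
    # Plain Newton-Raphson in the complex variable G, started at G0.
    G = G0
    for _ in range(max_iter):
        step = mp_equation(G, z, lam) / mp_equation_prime(G, z, lam)
        G = G - step
        if abs(step) < tol:
            break
    return G

def stieltjes_by_continuation(x, lam, eta_start=10.0, eta_target=1e-6, n_steps=60):
    # "Lilypad"-style chaining: start far from the real axis, where the naive
    # guess 1/z lies in the basin of attraction, then lower Im(z) geometrically,
    # reusing each converged solution as the seed for the next Newton run.
    etas = np.geomspace(eta_start, eta_target, n_steps)
    G = 1.0 / (x + 1j * etas[0])
    for eta in etas:
        G = newton(G, x + 1j * eta, lam)
    return G

if __name__ == "__main__":
    lam = 0.5                                  # aspect ratio of the random matrix
    xs = np.linspace(0.01, 3.5, 400)
    # Stieltjes inversion: density(x) = -Im G(x + i0) / pi
    rho = np.array([-stieltjes_by_continuation(x, lam).imag / np.pi for x in xs])
    print("total mass ~", np.trapz(rho, xs))   # should be close to 1
```

The same continuation pattern extends to the fixed-point equations arising from free multiplicative convolutions of Jacobian factors, which is the setting where the adaptive choice of steps studied in the paper becomes important.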
Related papers
- On the Hardness of Probabilistic Neurosymbolic Learning [10.180468225166441]
We study the complexity of differentiating probabilistic reasoning in neurosymbolic models.
We introduce WeightME, an unbiased gradient estimator based on model sampling.
Our experiments indicate that the existing biased approximations indeed struggle to optimize even when exact solving is still feasible.
arXiv Detail & Related papers (2024-06-06T19:56:33Z)
- Benign Overfitting for Two-layer ReLU Convolutional Neural Networks [60.19739010031304]
We establish algorithm-dependent risk bounds for learning two-layer ReLU convolutional neural networks with label-flipping noise.
We show that, under mild conditions, the neural network trained by gradient descent can achieve near-zero training loss and Bayes optimal test risk.
arXiv Detail & Related papers (2023-03-07T18:59:38Z)
- Semantic Strengthening of Neuro-Symbolic Learning [85.6195120593625]
Neuro-symbolic approaches typically resort to fuzzy approximations of a probabilistic objective.
We show how to compute this efficiently for tractable circuits.
We test our approach on three tasks: predicting a minimum-cost path in Warcraft, predicting a minimum-cost perfect matching, and solving Sudoku puzzles.
arXiv Detail & Related papers (2023-02-28T00:04:22Z)
- Robust Explanation Constraints for Neural Networks [33.14373978947437]
Post-hoc explanation methods, used with the intent of providing insight into neural networks, are sometimes said to help engender trust in their outputs.
Our training method is the only one able to learn neural networks with insights about robustness across all six networks tested.
arXiv Detail & Related papers (2022-12-16T14:40:25Z)
- On the Benefits of Large Learning Rates for Kernel Methods [110.03020563291788]
We show that the benefit of large learning rates can be precisely characterized in the context of kernel methods.
We consider the minimization of a quadratic objective in a separable Hilbert space, and show that with early stopping, the choice of learning rate influences the spectral decomposition of the obtained solution.
arXiv Detail & Related papers (2022-02-28T13:01:04Z)
- Newton methods based convolution neural networks using parallel processing [3.9220281834178463]
Training of convolutional neural networks is a high-dimensional, non-parametric optimization problem.
Newton methods for convolutional neural networks deal with this by using sub-sampled Hessian Newton methods.
We have used parallel processing instead of serial processing in mini-batch computations (a generic sketch of such a sub-sampled Newton step is given after this list).
arXiv Detail & Related papers (2021-12-02T16:42:27Z)
- DONE: Distributed Approximate Newton-type Method for Federated Edge Learning [41.20946186966816]
DONE is a distributed approximate Newton-type algorithm with fast convergence rate.
We show that DONE attains performance comparable to Newton's method.
arXiv Detail & Related papers (2020-12-10T12:25:34Z)
- Deep Archimedean Copulas [98.96141706464425]
ACNet is a novel differentiable neural network architecture that enforces structural properties.
We show that ACNet is able to both approximate common Archimedean Copulas and generate new copulas which may provide better fits to data.
arXiv Detail & Related papers (2020-12-05T22:58:37Z)
- Deep Magnification-Flexible Upsampling over 3D Point Clouds [103.09504572409449]
We propose a novel end-to-end learning-based framework to generate dense point clouds.
We first formulate the problem explicitly, which boils down to determining the weights and high-order approximation errors.
Then, we design a lightweight neural network to adaptively learn unified and sorted weights as well as the high-order refinements.
arXiv Detail & Related papers (2020-11-25T14:00:18Z)
- How Powerful are Shallow Neural Networks with Bandlimited Random Weights? [25.102870584507244]
We investigate the expressive power of depth-2 bandlimited random neural networks.
A random net is a neural network where the hidden-layer parameters are frozen with random bandlimited values.
arXiv Detail & Related papers (2020-08-19T13:26:12Z)
- Disentangling the Gauss-Newton Method and Approximate Inference for Neural Networks [96.87076679064499]
We disentangle the generalized Gauss-Newton and approximate inference for Bayesian deep learning.
We find that the Gauss-Newton method simplifies the underlying probabilistic model significantly.
The connection to Gaussian processes enables new function-space inference algorithms.
arXiv Detail & Related papers (2020-07-21T17:42:58Z)
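As referenced in the "Newton methods based convolution neural networks using parallel processing" entry above, sub-sampled Hessian Newton methods estimate curvature on a mini-batch only. The sketch below is a generic illustration on regularized logistic regression (our own example, not the paper's CNN algorithm): the gradient uses the full batch while the Hessian is formed from a random mini-batch, which corresponds to the mini-batch computation that the paper reports parallelizing.

```python
# Generic sub-sampled Hessian Newton step on regularized logistic regression.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def loss_and_grad(w, X, y, reg=1e-3):
    # Full-batch logistic loss and gradient.
    p = sigmoid(X @ w)
    loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12)) \
           + 0.5 * reg * (w @ w)
    grad = X.T @ (p - y) / len(y) + reg * w
    return loss, grad

def subsampled_hessian(w, X, reg=1e-3, batch=64):
    # Hessian estimated on a random mini-batch only (the "sub-sampled" part).
    idx = rng.choice(len(X), size=batch, replace=False)
    Xb = X[idx]
    p = sigmoid(Xb @ w)
    D = p * (1.0 - p)
    return Xb.T @ (Xb * D[:, None]) / batch + reg * np.eye(X.shape[1])

# Synthetic binary classification data.
n, d = 2000, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (sigmoid(X @ w_true) > rng.uniform(size=n)).astype(float)

w = np.zeros(d)
for it in range(10):
    loss, g = loss_and_grad(w, X, y)
    H = subsampled_hessian(w, X)
    w = w - np.linalg.solve(H, g)       # Newton step with the sub-sampled Hessian
    print(f"iter {it}: loss = {loss:.4f}")
```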