Unwrapping All ReLU Networks
- URL: http://arxiv.org/abs/2305.09424v1
- Date: Tue, 16 May 2023 13:30:15 GMT
- Title: Unwrapping All ReLU Networks
- Authors: Mattia Jacopo Villani, Peter McBurney
- Abstract summary: Deep ReLU Networks can be decomposed into a collection of linear models.
We extend this decomposition to Graph Neural Networks and tensor convolutional networks.
We show how this model leads to computing cheap and exact SHAP values.
- Score: 1.370633147306388
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep ReLU Networks can be decomposed into a collection of linear models, each
defined in a region of a partition of the input space. This paper provides
three results extending this theory. First, we extend the linear
decomposition to Graph Neural Networks and tensor convolutional networks, as
well as to networks with multiplicative interactions. Second, we prove
that neural networks can be understood as interpretable models such as
multivariate decision trees and logical theories. Finally, we show how this
model leads to computing cheap and exact SHAP values. We validate the theory
through experiments on Graph Neural Networks.
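To make the decomposition concrete, here is a minimal sketch (assuming PyTorch; the helpers unwrap_at and linear_shap are illustrative names, not from the paper). It extracts the local linear model of a ReLU MLP by freezing the activation pattern at a given input, then computes exact SHAP values for that linear model using the standard Linear SHAP formula phi_i = w_i (x_i - E[x_i]) under a feature-independence assumption; the paper's own SHAP construction may differ in its choice of value function.

```python
# Minimal sketch: unwrap a ReLU MLP at an input x. Freezing the activation
# pattern at x yields weights W and bias b of the linear model the network
# equals on the whole region containing x. Exact SHAP values for that local
# linear model then follow from the Linear SHAP formula (independent features).
import torch
import torch.nn as nn

def unwrap_at(layers, x):
    """Return (W, b) with net(x') == x' @ W.T + b on the linear region containing x."""
    W = torch.eye(x.numel())           # running Jacobian of the piecewise-linear map
    b = torch.zeros(x.numel())         # running offset
    h = x.clone()
    for layer in layers:
        if isinstance(layer, nn.Linear):
            W = layer.weight @ W
            b = layer.weight @ b + layer.bias
            h = layer(h)
        elif isinstance(layer, nn.ReLU):
            mask = (h > 0).float()     # activation pattern at x
            W = mask.unsqueeze(1) * W  # zero out rows of inactive units
            b = mask * b
            h = torch.relu(h)
    return W, b

def linear_shap(W, x, background):
    """Exact SHAP values of the local linear model w.r.t. a background sample mean."""
    return W * (x - background.mean(dim=0))   # shape: (n_outputs, n_features)

net = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 1))
x = torch.randn(4)
W, b = unwrap_at(list(net), x)
assert torch.allclose(net(x), x @ W.T + b, atol=1e-5)   # the unwrapped model matches the net at x
phi = linear_shap(W, x, background=torch.randn(256, 4)) # one attribution per input feature
```

Because W and b stay fixed across the whole activation region containing x, the attribution reduces to a closed-form expression over a linear model, which is why the SHAP values are cheap and exact rather than sampled.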
Related papers
- Convection-Diffusion Equation: A Theoretically Certified Framework for Neural Networks [14.01268607317875]
We study the partial differential equation models of neural networks.
We show that this map can be formulated as a convection-diffusion equation.
We design a novel network structure, which incorporates diffusion mechanism into network architecture.
arXiv Detail & Related papers (2024-03-23T05:26:36Z)
- A Library of Mirrors: Deep Neural Nets in Low Dimensions are Convex Lasso Models with Reflection Features [54.83898311047626]
We consider neural networks with piecewise linear activations and depths ranging from two to an arbitrary but finite number of layers.
We first show that two-layer networks with piecewise linear activations are Lasso models using a discrete dictionary of ramp depths.
arXiv Detail & Related papers (2024-03-02T00:33:45Z)
- Globally Gated Deep Linear Networks [3.04585143845864]
We introduce Globally Gated Deep Linear Networks (GGDLNs) where gating units are shared among all processing units in each layer.
We derive exact equations for the generalization properties in these networks in the finite-width thermodynamic limit.
Our work is the first exact theoretical solution of learning in a family of nonlinear networks with finite width.
arXiv Detail & Related papers (2022-10-31T16:21:56Z)
- Convolutional Neural Networks on Manifolds: From Graphs and Back [122.06927400759021]
We propose a manifold neural network (MNN) composed of a bank of manifold convolutional filters and point-wise nonlinearities.
In summary, we treat the manifold model as the limit of large graphs when constructing MNNs, while graph neural networks can still be recovered by discretizing the MNNs.
arXiv Detail & Related papers (2022-10-01T21:17:39Z)
- Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks [83.58049517083138]
We consider a two-layer ReLU network trained via gradient descent.
We show that SGD is biased towards a simple solution.
We also provide empirical evidence that knots may occur at locations distinct from the data points.
arXiv Detail & Related papers (2021-11-03T15:14:20Z)
- On some theoretical limitations of Generative Adversarial Networks [77.34726150561087]
It is a general assumption that GANs can generate any probability distribution.
We provide a new result based on Extreme Value Theory showing that GANs cannot generate heavy-tailed distributions.
arXiv Detail & Related papers (2021-10-21T06:10:38Z)
- The Principles of Deep Learning Theory [19.33681537640272]
This book develops an effective theory approach to understanding deep neural networks of practical relevance.
We explain how these effectively-deep networks learn nontrivial representations from training.
We show that the depth-to-width ratio governs the effective model complexity of the ensemble of trained networks.
arXiv Detail & Related papers (2021-06-18T15:00:00Z)
- A Convergence Theory Towards Practical Over-parameterized Deep Neural Networks [56.084798078072396]
We take a step towards closing the gap between theory and practice by significantly improving the known theoretical bounds on both the network width and the convergence time.
We show that convergence to a global minimum is guaranteed for networks whose width is quadratic in the sample size and linear in their depth, in time logarithmic in both.
Our analysis and convergence bounds are derived via the construction of a surrogate network with fixed activation patterns that can be transformed at any time to an equivalent ReLU network of a reasonable size.
arXiv Detail & Related papers (2021-01-12T00:40:45Z)
- Towards Understanding Learning in Neural Networks with Linear Teachers [31.849269592822296]
We prove that SGD globally optimizes this learning problem for a two-layer network with Leaky ReLU activations.
We provide theoretical support for this phenomenon by proving that if the network weights converge to two weight clusters, this implies an approximately linear decision boundary.
arXiv Detail & Related papers (2021-01-07T13:21:24Z)
- Perceptron Theory Can Predict the Accuracy of Neural Networks [6.136302173351179]
Multilayer neural networks set the current state of the art for many technical classification problems.
However, these networks remain essentially black boxes when it comes to analyzing them and predicting their performance.
Here, we develop a statistical theory for the one-layer perceptron and show that it can predict the performance of a surprisingly large variety of neural networks.
arXiv Detail & Related papers (2020-12-14T19:02:26Z)
- How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks [80.55378250013496]
We study how neural networks trained by gradient descent extrapolate what they learn outside the support of the training distribution.
Graph Neural Networks (GNNs) have shown some success in more complex tasks.
arXiv Detail & Related papers (2020-09-24T17:48:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.