Wide and Deep Neural Networks Achieve Optimality for Classification
- URL: http://arxiv.org/abs/2204.14126v1
- Date: Fri, 29 Apr 2022 14:27:42 GMT
- Title: Wide and Deep Neural Networks Achieve Optimality for Classification
- Authors: Adityanarayanan Radhakrishnan, Mikhail Belkin, Caroline Uhler
- Abstract summary: We identify and construct an explicit set of neural network classifiers that achieve optimality.
In particular, we provide explicit activation functions that can be used to construct networks that achieve optimality.
Our results highlight the benefit of using deep networks for classification tasks, in contrast to regression tasks, where excessive depth is harmful.
- Score: 23.738242876364865
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While neural networks are used for classification tasks across domains, a
long-standing open problem in machine learning is determining whether neural
networks trained using standard procedures are optimal for classification,
i.e., whether such models minimize the probability of misclassification for
arbitrary data distributions. In this work, we identify and construct an
explicit set of neural network classifiers that achieve optimality. Since
effective neural networks in practice are typically both wide and deep, we
analyze infinitely wide networks that are also infinitely deep. In particular,
using the recent connection between infinitely wide neural networks and Neural
Tangent Kernels, we provide explicit activation functions that can be used to
construct networks that achieve optimality. Interestingly, these activation
functions are simple and easy to implement, yet differ from commonly used
activations such as ReLU or sigmoid. More generally, we create a taxonomy of
infinitely wide and deep networks and show that these models implement one of
three well-known classifiers depending on the activation function used: (1)
1-nearest neighbor (model predictions are given by the label of the nearest
training example); (2) majority vote (model predictions are given by the label
of the class with greatest representation in the training set); or (3) singular
kernel classifiers (a set of classifiers containing those that achieve
optimality). Our results highlight the benefit of using deep networks for
classification tasks, in contrast to regression tasks, where excessive depth is
harmful.
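To make the taxonomy concrete, here is a minimal illustrative sketch (our own code, not the authors') of the three limiting classifiers for binary labels in {-1, +1}; the singular kernel shown is the Hilbert kernel K(x, z) = ||x - z||^{-d}, one representative of the family that contains optimal classifiers.
```python
# Illustrative sketch (not the paper's code): the three classifiers that
# infinitely wide and deep networks can implement, depending on activation.
# Binary labels are assumed to be in {-1, +1}; d is the input dimension.
import numpy as np

def one_nearest_neighbor(X_train, y_train, x):
    # (1) Predict the label of the nearest training example.
    i = np.argmin(np.linalg.norm(X_train - x, axis=1))
    return y_train[i]

def majority_vote(y_train):
    # (2) Predict the most represented class; the input is ignored.
    return 1 if np.sum(y_train) >= 0 else -1

def singular_kernel_classifier(X_train, y_train, x, d):
    # (3) Weighted vote with a kernel that diverges at zero distance,
    # e.g. the Hilbert kernel K(x, z) = ||x - z||^{-d}; classifiers in
    # this family can minimize the misclassification probability.
    dists = np.maximum(np.linalg.norm(X_train - x, axis=1), 1e-12)
    return 1 if (dists ** -d) @ y_train >= 0 else -1
```
Classifiers (1) and (3) interpolate the training data, while (2) is constant; the paper's contribution is characterizing which activation functions send the infinite-width-and-depth limit to each regime.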
Related papers
- On Excess Risk Convergence Rates of Neural Network Classifiers [8.329456268842227]
We study the performance of plug-in classifiers based on neural networks in a binary classification setting as measured by their excess risks.
We analyze the estimation and approximation properties of neural networks to obtain a dimension-free, uniform rate of convergence.
arXiv Detail & Related papers (2023-09-26T17:14:10Z)
- How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series.
We find that the relation between input periodicity and activation periodicity is key for the performance of LKCNN models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z)
- Robust Training and Verification of Implicit Neural Networks: A Non-Euclidean Contractive Approach [64.23331120621118]
This paper proposes a theoretical and computational framework for training and robustness verification of implicit neural networks.
We introduce a related embedded network and show that it can be used to provide an $\ell_\infty$-norm box over-approximation of the reachable sets of the original network.
We apply our algorithms to train implicit neural networks on the MNIST dataset and compare the robustness of our models with the models trained via existing approaches in the literature.
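As a rough illustration of the box over-approximation idea, here is a generic interval-propagation sketch for an implicit layer z = ReLU(Az + Bx + b) (our simplification, not the paper's embedded-network construction; convergence requires a well-posedness condition on A of the kind the paper establishes):
```python
import numpy as np

def linf_box_bounds(A, B, b, x_lo, x_hi, iters=200):
    # Interval propagation for the implicit layer z = relu(A z + B x + b):
    # split weights into positive/negative parts so bounds pair monotonically.
    Ap, An = np.maximum(A, 0), np.minimum(A, 0)
    Bp, Bn = np.maximum(B, 0), np.minimum(B, 0)
    z_lo = np.zeros(A.shape[0])
    z_hi = np.zeros(A.shape[0])
    for _ in range(iters):
        new_lo = np.maximum(Ap @ z_lo + An @ z_hi + Bp @ x_lo + Bn @ x_hi + b, 0)
        new_hi = np.maximum(Ap @ z_hi + An @ z_lo + Bp @ x_hi + Bn @ x_lo + b, 0)
        z_lo, z_hi = new_lo, new_hi
    # If the iteration converges, [z_lo, z_hi] over-approximates the layer's
    # output over the input box [x_lo, x_hi].
    return z_lo, z_hi
```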
arXiv Detail & Related papers (2022-08-08T03:13:24Z)
- Explainable Deep Belief Network based Auto encoder using novel Extended Garson Algorithm [6.228766191647919]
We develop an algorithm to explain a Deep Belief Network based Auto-encoder (DBNA).
It is used to determine the contribution of each input feature in the DBN.
Important features identified by this method are compared against those obtained by the Wald chi-square (chi2) test.
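For context, the classic single-hidden-layer Garson algorithm, which the paper extends to deep belief networks, can be sketched as follows (illustrative code, not the Extended Garson Algorithm itself):
```python
import numpy as np

def garson_importance(W_in, W_out):
    # Classic Garson algorithm for a single hidden layer.
    # W_in:  (n_inputs, n_hidden) input-to-hidden weights.
    # W_out: (n_hidden,) hidden-to-output weights.
    A = np.abs(W_in)
    # Each input's share of each hidden unit, scaled by that unit's
    # absolute influence on the output.
    contrib = (A / A.sum(axis=0, keepdims=True)) * np.abs(W_out)
    imp = contrib.sum(axis=1)
    return imp / imp.sum()  # relative importance per input, sums to 1
```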
arXiv Detail & Related papers (2022-07-18T10:44:02Z)
- Neural networks with linear threshold activations: structure and algorithms [1.795561427808824]
We show that two hidden layers are necessary and sufficient to represent any function representable in the class.
We also give precise bounds on the sizes of the neural networks required to represent any function in the class.
We propose a new class of neural networks that we call shortcut linear threshold networks.
arXiv Detail & Related papers (2021-11-15T22:33:52Z)
- Learning Structures for Deep Neural Networks [99.8331363309895]
We propose to adopt the efficient coding principle, rooted in information theory and developed in computational neuroscience.
We show that sparse coding can effectively maximize the entropy of the output signals.
Our experiments on a public image classification dataset demonstrate that using the structure learned from scratch by our proposed algorithm, one can achieve a classification accuracy comparable to the best expert-designed structure.
arXiv Detail & Related papers (2021-05-27T12:27:24Z)
- Fast Adaptation with Linearized Neural Networks [35.43406281230279]
We study the inductive biases of linearizations of neural networks, which we show to be surprisingly good summaries of the full network functions.
Inspired by this finding, we propose a technique for embedding these inductive biases into Gaussian processes through a kernel designed from the Jacobian of the network.
In this setting, domain adaptation takes the form of interpretable posterior inference, with accompanying uncertainty estimation.
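A minimal sketch of the Jacobian-kernel idea on a toy one-hidden-layer network (our illustrative construction; the paper builds the kernel from the Jacobian of the full trained network and uses it inside a Gaussian process):
```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid = 3, 16
W = rng.normal(size=(d_hid, d_in)) / np.sqrt(d_in)   # hidden-layer weights
v = rng.normal(size=d_hid) / np.sqrt(d_hid)          # output weights

def grad_theta(x):
    # Gradient of f(x) = v . tanh(W x) with respect to all parameters
    # (v and W), flattened: the feature map of the linearized model.
    a = np.tanh(W @ x)
    dv = a                                # df/dv
    dW = np.outer(v * (1.0 - a**2), x)    # df/dW by the chain rule
    return np.concatenate([dv, dW.ravel()])

def jacobian_kernel(x1, x2):
    # k(x, x') = <grad f(x), grad f(x')>; plugging this kernel into a
    # Gaussian process yields posterior inference under the linearization.
    return grad_theta(x1) @ grad_theta(x2)
```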
arXiv Detail & Related papers (2021-03-02T03:23:03Z)
- Firefly Neural Architecture Descent: a General Approach for Growing Neural Networks [50.684661759340145]
Firefly neural architecture descent is a general framework for progressively and dynamically growing neural networks.
We show that firefly descent can flexibly grow networks both wider and deeper, and can be applied to learn accurate but resource-efficient neural architectures.
In particular, it learns networks that are smaller in size but have higher average accuracy than those learned by the state-of-the-art methods.
arXiv Detail & Related papers (2021-02-17T04:47:18Z)
- Kernelized Classification in Deep Networks [49.47339560731506]
We propose a kernelized classification layer for deep networks.
We advocate a nonlinear classification layer by using the kernel trick on the softmax cross-entropy loss function during training.
We show the usefulness of the proposed nonlinear classification layer on several datasets and tasks.
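One generic way to realize such a layer, sketched under our own assumptions (the paper's exact formulation may differ): replace the linear logits with kernel evaluations against learnable class prototypes and train with softmax cross-entropy.
```python
import numpy as np

def rbf_logits(phi_x, prototypes, gamma=1.0):
    # Nonlinear classification layer: logit_c = K(phi(x), p_c) with an RBF
    # kernel, replacing the usual linear logits W phi(x) + b.
    # phi_x: (d,) feature embedding; prototypes: (n_classes, d), learnable.
    sq_dists = np.sum((prototypes - phi_x) ** 2, axis=1)
    return np.exp(-gamma * sq_dists)

def softmax_cross_entropy(logits, label):
    # Standard softmax cross-entropy applied to the kernelized logits.
    z = logits - logits.max()                  # for numerical stability
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]
```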
arXiv Detail & Related papers (2020-12-08T21:43:19Z)
- Finite Versus Infinite Neural Networks: an Empirical Study [69.07049353209463]
Kernel methods outperform fully-connected finite-width networks.
Centered and ensembled finite networks have reduced posterior variance.
Weight decay and the use of a large learning rate break the correspondence between finite and infinite networks.
arXiv Detail & Related papers (2020-07-31T01:57:47Z)
- Finding trainable sparse networks through Neural Tangent Transfer [16.092248433189816]
In deep learning, trainable sparse networks that perform well on a specific task are usually constructed using label-dependent pruning criteria.
In this article, we introduce Neural Tangent Transfer, a method that instead finds trainable sparse networks in a label-free manner.
arXiv Detail & Related papers (2020-06-15T08:58:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.