On the Power of Shallow Learning
- URL: http://arxiv.org/abs/2106.03186v1
- Date: Sun, 6 Jun 2021 17:25:33 GMT
- Title: On the Power of Shallow Learning
- Authors: James B. Simon, Sajant Anand, Michael R. DeWeese
- Abstract summary: Given a kernel, can one find a network that realizes it?
We affirmatively answer this question for fully-connected architectures, completely characterizing the space of achievable kernels.
We experimentally verify our construction and demonstrate that, by just choosing the activation function, we can design a wide shallow network that mimics the generalization performance of any wide, deep, fully-connected network.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A deluge of recent work has explored equivalences between wide neural
networks and kernel methods. A central theme is that one can analytically find
the kernel corresponding to a given wide network architecture, but despite
major implications for architecture design, no work to date has asked the
converse question: given a kernel, can one find a network that realizes it? We
affirmatively answer this question for fully-connected architectures,
completely characterizing the space of achievable kernels. Furthermore, we give
a surprising constructive proof that any kernel of any wide, deep,
fully-connected net can also be achieved with a network with just one hidden
layer and a specially-designed pointwise activation function. We experimentally
verify our construction and demonstrate that, by just choosing the activation
function, we can design a wide shallow network that mimics the generalization
performance of any wide, deep, fully-connected network.
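The forward direction the abstract refers to, analytically finding the kernel of a given wide architecture, can be illustrated with the standard ReLU case. The sketch below (plain NumPy, all names illustrative) compares a Monte Carlo estimate of the kernel of a wide one-hidden-layer ReLU network against the known closed-form layer map, and composes that map to obtain the kernel of a deeper network. It does not implement the paper's converse construction, which relies on the specially designed pointwise activation described in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu_kernel_step(rho):
    """Closed-form NNGP kernel map of one wide ReLU layer (He-scaled),
    written for unit-norm inputs so the kernel diagonal stays at 1."""
    rho = np.clip(rho, -1.0, 1.0)
    return (np.sqrt(1.0 - rho**2) + (np.pi - np.arccos(rho)) * rho) / np.pi

def deep_relu_kernel(rho, depth):
    """Compose the layer map `depth` times: the kernel of a deep, wide ReLU net."""
    for _ in range(depth):
        rho = relu_kernel_step(rho)
    return rho

def shallow_mc_kernel(x, y, width=200_000):
    """Monte Carlo kernel of a ONE-hidden-layer network of the given width:
    K(x, y) ~= (1/width) * sum_i phi(w_i . x) * phi(w_i . y),
    with phi = sqrt(2) * ReLU and w_i ~ N(0, I); inputs assumed unit-norm."""
    d = x.shape[0]
    W = rng.normal(size=(width, d))
    hx = np.sqrt(2.0) * np.maximum(W @ x, 0.0)
    hy = np.sqrt(2.0) * np.maximum(W @ y, 0.0)
    return float(hx @ hy) / width

# Two unit-norm inputs with some inner product rho.
d = 50
x = rng.normal(size=d); x /= np.linalg.norm(x)
y = rng.normal(size=d); y /= np.linalg.norm(y)
rho = float(x @ y)

print("input similarity      :", rho)
print("1 hidden layer, MC    :", shallow_mc_kernel(x, y))
print("1 hidden layer, exact :", deep_relu_kernel(rho, depth=1))
print("5 hidden layers, exact:", deep_relu_kernel(rho, depth=5))
```

As the width grows, the Monte Carlo estimate converges to the one-layer closed form; iterating the same map gives the kernel of the corresponding deeper wide network.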
Related papers
- Local Kernel Renormalization as a mechanism for feature learning in
overparametrized Convolutional Neural Networks [0.0]
Empirical evidence shows that fully-connected neural networks in the infinite-width limit eventually outperform their finite-width counterparts.
In contrast, state-of-the-art architectures with convolutional layers achieve optimal performance in the finite-width regime.
We show that the generalization performance of a finite-width FC network can be obtained by an infinite-width network, with a suitable choice of the Gaussian priors.
arXiv Detail & Related papers (2023-07-21T17:22:04Z) - Deep Maxout Network Gaussian Process [1.9292807030801753]
We derive the equivalence of the deep, infinite-width maxout network and the Gaussian process (GP).
We build up the connection between our deep maxout network kernel and deep neural network kernels.
arXiv Detail & Related papers (2022-08-08T23:52:26Z) - The Neural Race Reduction: Dynamics of Abstraction in Gated Networks [12.130628846129973]
We introduce the Gated Deep Linear Network framework that schematizes how pathways of information flow impact learning dynamics.
We derive an exact reduction and, for certain cases, exact solutions to the dynamics of learning.
Our work gives rise to general hypotheses relating neural architecture to learning and provides a mathematical approach towards understanding the design of more complex architectures.
arXiv Detail & Related papers (2022-07-21T12:01:03Z) - Firefly Neural Architecture Descent: a General Approach for Growing
Neural Networks [50.684661759340145]
Firefly neural architecture descent is a general framework for progressively and dynamically growing neural networks.
We show that firefly descent can flexibly grow networks both wider and deeper, and can be applied to learn accurate but resource-efficient neural architectures.
In particular, it learns networks that are smaller in size but have higher average accuracy than those learned by the state-of-the-art methods.
arXiv Detail & Related papers (2021-02-17T04:47:18Z) - Dynamic Graph: Learning Instance-aware Connectivity for Neural Networks [78.65792427542672]
Dynamic Graph Network (DG-Net) models the network as a complete directed acyclic graph, where the nodes represent convolutional blocks and the edges represent connection paths.
Instead of routing every input through the same fixed path, DG-Net aggregates features dynamically at each node, which gives the network greater representational ability.
arXiv Detail & Related papers (2020-10-02T16:50:26Z) - NAS-Navigator: Visual Steering for Explainable One-Shot Deep Neural
Network Synthesis [53.106414896248246]
We present a framework that allows analysts to effectively build the solution sub-graph space and guide the network search by injecting their domain knowledge.
Applying this technique in an iterative manner allows analysts to converge to the best-performing neural network architecture for a given application.
arXiv Detail & Related papers (2020-09-28T01:48:45Z) - Inductive Graph Embeddings through Locality Encodings [0.42970700836450487]
We look at the problem of finding inductive network embeddings in large networks without domain-dependent node/edge attributes.
We propose to use a set of basic predefined local encodings as the basis of a learning algorithm.
This method achieves state-of-the-art performance in tasks such as role detection, link prediction and node classification.
arXiv Detail & Related papers (2020-09-26T13:09:11Z) - Automated Search for Resource-Efficient Branched Multi-Task Networks [81.48051635183916]
We propose a principled approach, rooted in differentiable neural architecture search, to automatically define branching structures in a multi-task neural network.
We show that our approach consistently finds high-performing branching structures within limited resource budgets.
arXiv Detail & Related papers (2020-08-24T09:49:19Z) - Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective that represents a network as a complete graph for analysis.
By assigning learnable parameters to the edges, reflecting the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and adapts to larger search spaces and different tasks (a minimal sketch of such differentiable edge weighting appears after this list).
arXiv Detail & Related papers (2020-08-19T04:53:31Z) - Recursive Multi-model Complementary Deep Fusion forRobust Salient Object
Detection via Parallel Sub Networks [62.26677215668959]
Fully convolutional networks have shown outstanding performance in the salient object detection (SOD) field.
This paper proposes a "wider" network architecture that consists of parallel sub-networks with completely different architectures.
Experiments on several well-known benchmarks clearly demonstrate the superior performance, good generalization, and strong learning ability of the proposed wider framework.
arXiv Detail & Related papers (2020-08-07T10:39:11Z)
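Two of the entries above (Dynamic Graph Networks and the topological connectivity paper) share the same underlying mechanism: every candidate edge in a graph of computational blocks carries a continuous, learnable weight, so the connectivity itself can be trained by gradient descent. Below is a minimal PyTorch sketch of that idea; the names (LearnableDAGBlock, edge_logits) are hypothetical, and this is not a reproduction of either paper's actual architecture.

```python
import torch
import torch.nn as nn

class LearnableDAGBlock(nn.Module):
    """Toy differentiable-connectivity block: nodes form a complete DAG, and
    each edge carries a learnable scalar gate that scales how much of an
    earlier node's output flows into a later node. Illustrative only."""

    def __init__(self, num_nodes: int, dim: int):
        super().__init__()
        self.num_nodes = num_nodes
        # One small transform per node; edges only carry learnable scalars.
        self.ops = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_nodes)])
        # edge_logits[i, j] gates the edge from node j to node i (j <= i).
        self.edge_logits = nn.Parameter(torch.zeros(num_nodes, num_nodes))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        outputs = [x]  # node 0's input is the block input
        for i in range(self.num_nodes):
            gates = torch.sigmoid(self.edge_logits[i, : len(outputs)])
            agg = sum(g * h for g, h in zip(gates, outputs))
            outputs.append(torch.relu(self.ops[i](agg)))
        return outputs[-1]

block = LearnableDAGBlock(num_nodes=4, dim=16)
y = block(torch.randn(8, 16))   # ordinary forward pass
y.sum().backward()              # gradients also flow into the edge gates
print(block.edge_logits.grad.abs().sum() > 0)
```

Because the gates are ordinary parameters, they receive gradients in the same backward pass as the block weights, which is what makes the connectivity search differentiable.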
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.