Understanding Deep Architectures with Reasoning Layer
- URL: http://arxiv.org/abs/2006.13401v2
- Date: Thu, 29 Oct 2020 22:00:00 GMT
- Title: Understanding Deep Architectures with Reasoning Layer
- Authors: Xinshi Chen, Yufei Zhang, Christoph Reisinger, Le Song
- Abstract summary: We show that properties of the algorithm layers, such as convergence, stability, and sensitivity, are intimately related to the approximation and generalization abilities of the end-to-end model.
Our theory can provide useful guidelines for designing deep architectures with reasoning layers.
- Score: 60.90906477693774
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, there has been a surge of interest in combining deep learning
models with reasoning in order to handle more sophisticated learning tasks. In
many cases, a reasoning task can be solved by an iterative algorithm. This
algorithm is often unrolled, and used as a specialized layer in the deep
architecture, which can be trained end-to-end with other neural components.
Although such hybrid deep architectures have led to many empirical successes,
the theoretical foundation of such architectures, especially the interplay
between algorithm layers and other neural layers, remains largely unexplored.
In this paper, we take an initial step towards an understanding of such hybrid
deep architectures by showing that properties of the algorithm layers, such as
convergence, stability, and sensitivity, are intimately related to the
approximation and generalization abilities of the end-to-end model.
Furthermore, our analysis matches closely our experimental observations under
various conditions, suggesting that our theory can provide useful guidelines
for designing deep architectures with reasoning layers.
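To make the unrolling pattern concrete, here is a minimal sketch (not the authors' code) of such a hybrid architecture: k steps of gradient descent on a quadratic objective are unrolled into differentiable layers and composed with a neural encoder, so the whole model trains end-to-end. The class names, the quadratic problem, and all hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn

class UnrolledGD(nn.Module):
    """Reasoning layer: k unrolled gradient-descent steps on f(x) = 0.5*x'Qx - b'x."""
    def __init__(self, k: int, step_size: float = 0.1):
        super().__init__()
        self.k = k
        self.step_size = step_size

    def forward(self, Q: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        x = torch.zeros_like(b)
        for _ in range(self.k):            # each iteration acts as one "layer"
            grad = x @ Q.T - b             # batched gradient Qx - b
            x = x - self.step_size * grad
        return x                           # differentiable w.r.t. Q and b

class HybridModel(nn.Module):
    """Neural encoder that produces the problem data, followed by the algorithm layer."""
    def __init__(self, d: int, k: int):
        super().__init__()
        self.encoder = nn.Linear(d, d)     # neural component (assumed form)
        self.reasoning = UnrolledGD(k)
        # A fixed, well-conditioned Q keeps the unrolled iteration stable.
        self.register_buffer("Q", torch.eye(d) + 0.1 * torch.ones(d, d) / d)

    def forward(self, inp: torch.Tensor) -> torch.Tensor:
        b = self.encoder(inp)              # neural layers set up the reasoning problem
        return self.reasoning(self.Q, b)   # algorithm layer (approximately) solves it

# End-to-end training: gradients flow through the unrolled iterations.
model = HybridModel(d=8, k=10)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
inp, target = torch.randn(32, 8), torch.randn(32, 8)
loss = nn.functional.mse_loss(model(inp), target)
loss.backward()
opt.step()
```

Properties of the algorithm layer in this sketch, such as how fast the k iterations converge and how sensitive the iterate is to perturbations of Q and b, are exactly the quantities the paper relates to the end-to-end model's approximation and generalization abilities.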
Related papers
- Structure of Artificial Neural Networks -- Empirical Investigations [0.0]
Within one decade, deep learning overtook the dominant solution methods for countless problems in artificial intelligence.
With a formal definition for structures of neural networks, neural architecture search problems and solution methods can be formulated under a common framework.
Does structure make a difference or can it be chosen arbitrarily?
arXiv Detail & Related papers (2024-10-12T16:13:28Z)
- Towards Understanding Mixture of Experts in Deep Learning [95.27215939891511]
We study how the MoE layer improves the performance of neural network learning.
Our results suggest that the cluster structure of the underlying problem and the non-linearity of the expert are pivotal to the success of MoE.
arXiv Detail & Related papers (2022-08-04T17:59:10Z)
- The Neural Race Reduction: Dynamics of Abstraction in Gated Networks [12.130628846129973]
We introduce the Gated Deep Linear Network framework that schematizes how pathways of information flow impact learning dynamics.
We derive an exact reduction and, for certain cases, exact solutions to the dynamics of learning.
Our work gives rise to general hypotheses relating neural architecture to learning and provides a mathematical approach towards understanding the design of more complex architectures.
arXiv Detail & Related papers (2022-07-21T12:01:03Z)
- Fault-Tolerant Deep Learning: A Hierarchical Perspective [12.315753706063324]
We conduct a comprehensive survey of fault-tolerant deep learning design approaches.
We investigate these approaches at the model layer, architecture layer, and circuit layer, as well as across layers.
arXiv Detail & Related papers (2022-04-05T02:31:18Z)
- The Modern Mathematics of Deep Learning [8.939008609565368]
We describe the new field of mathematical analysis of deep learning.
This field emerged around a list of research questions that were not answered within the classical framework of learning theory.
For selected approaches, we describe the main ideas in more detail.
arXiv Detail & Related papers (2021-05-09T21:30:42Z)
- Recent advances in deep learning theory [104.01582662336256]
This paper reviews and organizes the recent advances in deep learning theory.
The literature is categorized into six groups, including: (1) complexity and capacity-based approaches for analysing the generalizability of deep learning; (2) differential equations and their dynamic systems for modelling gradient descent and its variants; (3) the geometrical structures of the loss landscape that drive the trajectories of the dynamic systems; and (5) theoretical foundations of several special structures in network architectures.
arXiv Detail & Related papers (2020-12-20T14:16:41Z)
- NAS-Navigator: Visual Steering for Explainable One-Shot Deep Neural Network Synthesis [53.106414896248246]
We present a framework that allows analysts to effectively build the solution sub-graph space and guide the network search by injecting their domain knowledge.
Applying this technique in an iterative manner allows analysts to converge to the best performing neural network architecture for a given application.
arXiv Detail & Related papers (2020-09-28T01:48:45Z)
- Automated Search for Resource-Efficient Branched Multi-Task Networks [81.48051635183916]
We propose a principled approach, rooted in differentiable neural architecture search, to automatically define branching structures in a multi-task neural network.
We show that our approach consistently finds high-performing branching structures within limited resource budgets.
arXiv Detail & Related papers (2020-08-24T09:49:19Z)
- Learning to Stop While Learning to Predict [85.7136203122784]
Many algorithm-inspired deep models are restricted to a "fixed depth" for all inputs.
Similar to algorithms, the optimal depth of a deep architecture may be different for different input instances.
In this paper, we tackle this varying-depth problem using a steerable architecture (a minimal sketch of this stopping pattern appears after this list).
We show that the learned deep model along with the stopping policy improves the performances on a diverse set of tasks.
arXiv Detail & Related papers (2020-06-09T07:22:01Z)
- Structure preserving deep learning [1.2263454117570958]
Deep learning has risen to the foreground as a topic of massive interest.
There are multiple challenging mathematical problems involved in applying deep learning.
There is a growing effort to mathematically understand the structure in existing deep learning methods.
arXiv Detail & Related papers (2020-06-05T10:59:09Z)
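The "Learning to Stop While Learning to Predict" entry above describes letting the depth of an unrolled model vary per input via a learned stopping policy. Below is a minimal sketch of that general pattern under assumed names and architecture (not the paper's actual variational method): a shared step is unrolled up to a maximum depth, a small policy network scores each intermediate state, and inference halts once the score crosses a threshold.

```python
import torch
import torch.nn as nn

class StoppableUnrolledModel(nn.Module):
    def __init__(self, d: int, max_steps: int = 20, threshold: float = 0.9):
        super().__init__()
        self.step = nn.Linear(d, d)         # one shared unrolled step (assumed form)
        self.stop_policy = nn.Linear(d, 1)  # scores "should we halt here?"
        self.max_steps = max_steps
        self.threshold = threshold

    def forward(self, x: torch.Tensor):
        stop_scores = []
        for _ in range(self.max_steps):
            x = torch.tanh(self.step(x))    # one iteration of the unrolled algorithm
            p = torch.sigmoid(self.stop_policy(x))
            stop_scores.append(p)
            # At inference, halt as soon as the policy is confident enough;
            # during training, run all steps so gradients reach every layer.
            if not self.training and p.mean() > self.threshold:
                break
        return x, torch.stack(stop_scores)  # prediction plus per-step stop scores

model = StoppableUnrolledModel(d=8)
model.eval()
with torch.no_grad():
    y, scores = model(torch.randn(4, 8))    # effective depth now varies with the input
```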