Depth Selection for Deep ReLU Nets in Feature Extraction and
Generalization
- URL: http://arxiv.org/abs/2004.00245v1
- Date: Wed, 1 Apr 2020 06:03:01 GMT
- Title: Depth Selection for Deep ReLU Nets in Feature Extraction and
Generalization
- Authors: Zhi Han, Siquan Yu, Shao-Bo Lin, Ding-Xuan Zhou
- Abstract summary: We show that implementing the classical empirical risk minimization on deep nets can achieve the optimal generalization performance for numerous learning tasks.
Our results are verified by a series of numerical experiments including toy simulations and a real application of earthquake seismic intensity prediction.
- Score: 22.696129751033983
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning is recognized to be capable of discovering deep features for
representation learning and pattern recognition without requiring elaborate
feature engineering techniques that draw on human ingenuity and prior
knowledge. It has therefore triggered enormous research activity in machine
learning and pattern recognition. One of the most important challenges of deep
learning is to figure out the relation between a feature and the depth of deep
neural networks (deep nets for short), so as to reflect the necessity of depth. Our
purpose is to quantify this feature-depth correspondence in feature extraction
and generalization. We present the adaptivity of features to depths, and
vice versa, by showing a depth-parameter trade-off in extracting both single
features and composite features. Based on these results, we prove that
implementing classical empirical risk minimization on deep nets can achieve
optimal generalization performance for numerous learning tasks. Our
theoretical results are verified by a series of numerical experiments, including
toy simulations and a real application to earthquake seismic intensity
prediction.
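To make the setting concrete, the following is a minimal sketch (not the paper's construction) of empirical risk minimization with a deep ReLU network on a toy 1-D regression task. The target function, layer widths, learning rate, and step count are all arbitrary choices for illustration; the net is trained by full-batch gradient descent on the mean squared error using plain NumPy and manual backpropagation.

```python
# Hedged sketch: ERM with a deep ReLU net on toy data, in plain NumPy.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a nonlinear target f(x) = |sin(3x)| on [-1, 1].
n = 256
X = rng.uniform(-1.0, 1.0, size=(n, 1))
y = np.abs(np.sin(3.0 * X))

# Deep ReLU net: widths are an arbitrary illustrative choice.
widths = [1, 32, 32, 32, 1]
Ws = [rng.normal(0.0, np.sqrt(2.0 / widths[i]),
                 size=(widths[i], widths[i + 1]))
      for i in range(len(widths) - 1)]
bs = [np.zeros(widths[i + 1]) for i in range(len(widths) - 1)]

def forward(x):
    """Return the activations of every layer (hidden layers use ReLU)."""
    acts = [x]
    for W, b in zip(Ws[:-1], bs[:-1]):
        acts.append(np.maximum(0.0, acts[-1] @ W + b))
    acts.append(acts[-1] @ Ws[-1] + bs[-1])  # linear output layer
    return acts

lr = 0.05
for step in range(2000):
    acts = forward(X)
    # Empirical risk: mean squared error over the sample.
    grad = 2.0 * (acts[-1] - y) / n          # d(risk)/d(prediction)
    for i in reversed(range(len(Ws))):
        grad_W = acts[i].T @ grad            # gradient w.r.t. weights
        grad_b = grad.sum(axis=0)            # gradient w.r.t. biases
        if i > 0:                            # backprop through ReLU
            grad = (grad @ Ws[i].T) * (acts[i] > 0)
        Ws[i] -= lr * grad_W
        bs[i] -= lr * grad_b

risk = float(np.mean((forward(X)[-1] - y) ** 2))
print(f"final empirical risk: {risk:.4f}")
```

Minimizing the empirical risk over the sample is exactly the "classical empirical risk minimization on deep nets" referred to in the abstract; the paper's theoretical contribution is to show how the depth of such a net should be selected so that this minimizer generalizes optimally, which this toy script does not attempt to capture.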
Related papers
- Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural
Networks [49.808194368781095]
We show that three-layer neural networks have provably richer feature learning capabilities than two-layer networks.
This work makes progress towards understanding the provable benefit of three-layer neural networks over two-layer networks in the feature learning regime.
arXiv Detail & Related papers (2023-05-11T17:19:30Z) - Generalized Uncertainty of Deep Neural Networks: Taxonomy and
Applications [1.9671123873378717]
We show that the uncertainty of deep neural networks is not only important in a sense of interpretability and transparency, but also crucial in further advancing their performance.
We generalize the definition of the uncertainty of deep neural networks to any number or vector that is associated with an input or an input-label pair, and catalog existing methods for "mining" such uncertainty from a deep model.
arXiv Detail & Related papers (2023-02-02T22:02:33Z) - Deep networks for system identification: a Survey [56.34005280792013]
System identification learns mathematical descriptions of dynamic systems from input-output data.
The main aim of the identified model is to predict new data from previous observations.
We discuss architectures commonly adopted in the literature, like feedforward, convolutional, and recurrent networks.
arXiv Detail & Related papers (2023-01-30T12:38:31Z) - Deep Active Learning by Leveraging Training Dynamics [57.95155565319465]
We propose a theory-driven deep active learning method (dynamicAL) which selects samples to maximize training dynamics.
We show that dynamicAL not only outperforms other baselines consistently but also scales well on large deep learning models.
arXiv Detail & Related papers (2021-10-16T16:51:05Z) - Expressive Power and Loss Surfaces of Deep Learning Models [0.0]
This paper serves as an expository tutorial on the working of deep learning models.
The second goal is to complement the current results on the expressive power of deep learning models with novel insights and results.
arXiv Detail & Related papers (2021-08-08T06:28:09Z) - Self-Guided Instance-Aware Network for Depth Completion and Enhancement [6.319531161477912]
Existing methods directly interpolate the missing depth measurements based on pixel-wise image content and the corresponding neighboring depth values.
We propose a novel self-guided instance-aware network (SG-IANet) that utilizes a self-guided mechanism to extract the instance-level features needed for depth restoration.
arXiv Detail & Related papers (2021-05-25T19:41:38Z) - A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to address the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z) - Variational Structured Attention Networks for Deep Visual Representation
Learning [49.80498066480928]
We propose a unified deep framework to jointly learn both spatial attention maps and channel attention in a principled manner.
Specifically, we integrate the estimation and the interaction of the attentions within a probabilistic representation learning framework.
We implement the inference rules within the neural network, thus allowing for end-to-end learning of the probabilistic and the CNN front-end parameters.
arXiv Detail & Related papers (2021-03-05T07:37:24Z) - Accurate RGB-D Salient Object Detection via Collaborative Learning [101.82654054191443]
RGB-D saliency detection shows impressive ability in some challenging scenarios.
We propose a novel collaborative learning framework where edge, depth and saliency are leveraged in a more efficient way.
arXiv Detail & Related papers (2020-07-23T04:33:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.