An Information-Theoretic Framework for Supervised Learning
- URL: http://arxiv.org/abs/2203.00246v6
- Date: Fri, 24 Mar 2023 19:48:25 GMT
- Title: An Information-Theoretic Framework for Supervised Learning
- Authors: Hong Jun Jeon and Yifan Zhu and Benjamin Van Roy
- Abstract summary: We propose a novel information-theoretic framework with its own notions of regret and sample complexity.
We study the sample complexity of learning from data generated by deep neural networks with ReLU activation units.
We conclude by corroborating our theoretical results with experimental analysis of random single-hidden-layer neural networks.
- Score: 22.280001450122175
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Each year, deep learning demonstrates new and improved empirical results with
deeper and wider neural networks. Meanwhile, with existing theoretical
frameworks, it is difficult to analyze networks deeper than two layers without
resorting to counting parameters or encountering sample complexity bounds that
are exponential in depth. Perhaps it would be fruitful to analyze modern
machine learning through a different lens. In this paper, we propose a novel
information-theoretic framework with its own notions of regret and sample
complexity for analyzing the data requirements of machine learning. With our
framework, we first work through some classical examples such as scalar
estimation and linear regression to build intuition and introduce general
techniques. Then, we use the framework to study the sample complexity of
learning from data generated by deep neural networks with ReLU activation
units. For a particular prior distribution on weights, we establish sample
complexity bounds that are simultaneously width independent and linear in
depth. This prior distribution gives rise to high-dimensional latent
representations that, with high probability, admit reasonably accurate
low-dimensional approximations. We conclude by corroborating our theoretical
results with experimental analysis of random single-hidden-layer neural
networks.
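
As a rough, hedged illustration of the abstract's closing experiment, the sketch below probes whether the latent representations of a random single-hidden-layer ReLU network admit accurate low-dimensional approximations. The Gaussian inputs, the isotropic Gaussian prior on weights, and all dimensions are illustrative assumptions, not the paper's exact setup.

```python
# Hedged sketch: low-dimensional structure in the latent representations of a
# random single-hidden-layer ReLU network. All choices below (Gaussian inputs,
# isotropic Gaussian prior on weights, dimensions) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, width, n = 64, 4096, 2000              # input dim, hidden width, num inputs

# Weights drawn from an assumed isotropic Gaussian prior, scaled by 1/sqrt(d).
W = rng.normal(scale=1.0 / np.sqrt(d), size=(width, d))
X = rng.normal(size=(n, d))               # Gaussian inputs
H = np.maximum(X @ W.T, 0.0)              # n x width ReLU latent representations

# Fraction of representation energy captured by the top-k singular directions.
s = np.linalg.svd(H, compute_uv=False)
energy = np.cumsum(s**2) / np.sum(s**2)
for k in (8, 16, 32, 64, 128):
    print(f"top-{k} directions capture {energy[k - 1]:.3f} of the energy")
```

Inspecting how quickly the captured energy saturates with k gives a direct, if informal, read on whether the width-4096 representations are effectively low-dimensional.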
Related papers
- Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond [61.18736646013446]
In pursuit of a deeper understanding of deep learning's surprising behaviors, we investigate the utility of a simple yet accurate model of a trained neural network.
Across three case studies, we illustrate how it can be applied to derive new empirical insights on a diverse range of prominent phenomena.
arXiv Detail & Related papers (2024-10-31T22:54:34Z)
- Fundamental limits of overparametrized shallow neural networks for supervised learning [11.136777922498355]
We study a two-layer neural network trained from input-output pairs generated by a teacher network with matching architecture.
Our results come in the form of bounds on either i) the mutual information between the training data and the network weights or ii) the Bayes-optimal generalization error.
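As a rough sketch of this teacher-student setup (the ReLU activation, dimensions, and noise level below are assumptions for illustration, not the paper's exact model):

```python
# Hedged sketch of the teacher-student data model; the ReLU activation,
# dimensions, and noise level are assumptions, not the paper's exact setup.
import numpy as np

rng = np.random.default_rng(1)
d, k, n = 20, 5, 1000                     # input dim, hidden units, samples

# Teacher: a fixed two-layer network with random weights.
W_t = rng.normal(size=(k, d)) / np.sqrt(d)
a_t = rng.normal(size=k) / np.sqrt(k)

X = rng.normal(size=(n, d))               # inputs
y = np.maximum(X @ W_t.T, 0.0) @ a_t      # teacher outputs
y += 0.1 * rng.normal(size=n)             # assumed observation noise

# (X, y) is the training set; a student with matching architecture
# (its own W, a of the same shapes) would be fit to these pairs.
```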
arXiv Detail & Related papers (2023-07-11T08:30:50Z)
- Joint Edge-Model Sparse Learning is Provably Efficient for Graph Neural Networks [89.28881869440433]
This paper provides the first theoretical characterization of joint edge-model sparse learning for graph neural networks (GNNs).
It proves analytically that both sampling important nodes and pruning the lowest-magnitude neurons can reduce the sample complexity and improve convergence without compromising the test accuracy.
arXiv Detail & Related papers (2023-02-06T16:54:20Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- With Greater Distance Comes Worse Performance: On the Perspective of Layer Utilization and Model Generalization [3.6321778403619285]
Generalization of deep neural networks remains one of the main open problems in machine learning.
Early layers generally learn representations relevant to performance on both training data and testing data.
Deeper layers, by contrast, only minimize training risk and fail to generalize well on test data or mislabeled data.
arXiv Detail & Related papers (2022-01-28T05:26:32Z)
- Generalization Error Bounds for Iterative Recovery Algorithms Unfolded as Neural Networks [6.173968909465726]
We introduce a general class of neural networks suitable for sparse reconstruction from few linear measurements.
By allowing a wide range of degrees of weight-sharing between the layers, we enable a unified analysis for very different neural network types.
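A minimal sketch of the unfolding idea, assuming the classical ISTA recovery algorithm and full weight-sharing across layers (the paper covers far more general sharing patterns):

```python
# Hedged sketch: ISTA for sparse recovery, unfolded into T "layers" with full
# weight-sharing (the same step size and threshold at every layer).
import numpy as np

def soft_threshold(z, tau):
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def unfolded_ista(A, y, T=200, lam=0.1):
    """Each unrolled iteration x -> shrink(x + step * A^T (y - A x)) is one layer."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2        # 1/L, L = sigma_max(A)^2
    x = np.zeros(A.shape[1])
    for _ in range(T):                            # T layers, shared weights
        x = soft_threshold(x + step * A.T @ (y - A @ x), lam * step)
    return x

rng = np.random.default_rng(2)
m, n, s = 40, 100, 5                              # measurements, dim, sparsity
A = rng.normal(size=(m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, size=s, replace=False)] = rng.normal(size=s)
x_hat = unfolded_ista(A, A @ x_true)
print("relative recovery error:",
      np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```

Untying the weights, so that each of the T iterations gets its own learnable step size and threshold, is what turns this fixed algorithm into the trainable network family the paper analyzes.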
arXiv Detail & Related papers (2021-12-08T16:17:33Z)
- A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to address the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z)
- Generalized Approach to Matched Filtering using Neural Networks [4.535489275919893]
We make a key observation on the relationship between emerging deep learning methods and traditional techniques: matched filtering is formally equivalent to a particular neural network.
We show that the proposed neural network architecture can outperform matched filtering.
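A minimal sketch of the stated equivalence, in a simplified real-valued setting of our own choosing: matched filtering is cross-correlation with a known template, which a single convolutional layer reproduces exactly when its kernel is the time-reversed template.

```python
# Hedged sketch of the stated equivalence in a simplified real-valued setting:
# matched filtering is cross-correlation with a known template, and a single
# convolutional layer with the time-reversed template as kernel computes the
# same quantity. The template and injection point are illustrative choices.
import numpy as np

rng = np.random.default_rng(3)
template = np.sin(np.linspace(0.0, 4.0 * np.pi, 64))  # assumed signal template
signal = rng.normal(size=1024)                        # noise-only background
signal[300:364] += template                           # inject the signal

mf = np.correlate(signal, template, mode="valid")         # matched-filter output
conv = np.convolve(signal, template[::-1], mode="valid")  # "conv layer" view
assert np.allclose(mf, conv)                              # the two agree exactly
print("detected signal near offset", int(np.argmax(mf)))  # ~300
```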
arXiv Detail & Related papers (2021-04-08T17:59:07Z)
- Anomaly Detection on Attributed Networks via Contrastive Self-Supervised Learning [50.24174211654775]
We present a novel contrastive self-supervised learning framework for anomaly detection on attributed networks.
Our framework fully exploits the local information from network data by sampling a novel type of contrastive instance pair.
A graph neural network-based contrastive learning model is proposed to learn informative embedding from high-dimensional attributes and local structure.
arXiv Detail & Related papers (2021-02-27T03:17:20Z)
- Correlator Convolutional Neural Networks: An Interpretable Architecture for Image-like Quantum Matter Data [15.283214387433082]
We develop a network architecture that discovers features in the data which are directly interpretable in terms of physical observables.
Our approach lends itself well to the construction of simple, end-to-end interpretable architectures.
arXiv Detail & Related papers (2020-11-06T17:04:10Z)
- Understanding Generalization in Deep Learning via Tensor Methods [53.808840694241]
We advance the understanding of the relations between the network's architecture and its generalizability from the compression perspective.
We propose a series of intuitive, data-dependent and easily-measurable properties that tightly characterize the compressibility and generalizability of neural networks.
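The paper's properties are its own; as a generic, hedged stand-in for an easily-measurable compressibility proxy, one can compute the stable rank of each weight matrix (the random networks below are purely illustrative):

```python
# Hedged sketch: the stable rank ||W||_F^2 / ||W||_2^2 as a generic,
# easily-measurable compressibility proxy. It is a stand-in for illustration,
# not one of the paper's proposed properties; the weight matrices are random.
import numpy as np

def stable_rank(W):
    return (np.linalg.norm(W, "fro") / np.linalg.norm(W, 2)) ** 2

rng = np.random.default_rng(4)
layers = [rng.normal(size=(256, 256)) for _ in range(3)]  # "incompressible"
u, v = rng.normal(size=(256, 4)), rng.normal(size=(4, 256))
layers.append(u @ v)                                      # rank-4: compressible

for i, W in enumerate(layers):
    print(f"layer {i}: stable rank = {stable_rank(W):.1f}")
```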
arXiv Detail & Related papers (2020-01-14T22:26:57Z)