Deep Learning is Not So Mysterious or Different
- URL: http://arxiv.org/abs/2503.02113v1
- Date: Mon, 03 Mar 2025 22:56:04 GMT
- Title: Deep Learning is Not So Mysterious or Different
- Authors: Andrew Gordon Wilson
- Abstract summary: We argue that anomalous generalization behaviour is not distinct to neural networks. We present soft inductive biases as a key unifying principle in explaining these phenomena. We also highlight how deep learning is relatively distinct in other ways, such as its ability for representation learning.
- Score: 54.5330466151362
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks are often seen as different from other model classes by defying conventional notions of generalization. Popular examples of anomalous generalization behaviour include benign overfitting, double descent, and the success of overparametrization. We argue that these phenomena are not distinct to neural networks, or particularly mysterious. Moreover, this generalization behaviour can be intuitively understood, and rigorously characterized using long-standing generalization frameworks such as PAC-Bayes and countable hypothesis bounds. We present soft inductive biases as a key unifying principle in explaining these phenomena: rather than restricting the hypothesis space to avoid overfitting, embrace a flexible hypothesis space, with a soft preference for simpler solutions that are consistent with the data. This principle can be encoded in many model classes, and thus deep learning is not as mysterious or different from other model classes as it might seem. However, we also highlight how deep learning is relatively distinct in other ways, such as its ability for representation learning, phenomena such as mode connectivity, and its relative universality.
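The soft-inductive-bias principle can be illustrated in the simplest possible model class. The sketch below is an illustrative ridge-regression example, not code from the paper: it fits a deliberately overparametrized polynomial two ways, once with unregularized least squares and once with a small L2 penalty that keeps the flexible hypothesis space but expresses a soft preference for simpler (smaller-norm) solutions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: noisy samples of a smooth function.
n = 30
x = np.sort(rng.uniform(-1.0, 1.0, n))
y = np.sin(3.0 * x) + 0.1 * rng.normal(size=n)

# Deliberately flexible hypothesis space: degree-12 polynomial features.
X = np.vander(x, 13)

# Hard approach: unregularized least squares over the full space.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Soft inductive bias: keep the same flexible space, but add a small
# L2 penalty expressing a preference for simpler (smaller-norm) fits.
lam = 1e-3
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

print("coef norm, unregularized:", np.linalg.norm(beta_ols))
print("coef norm, soft bias:    ", np.linalg.norm(beta_ridge))
```

Both fits use the same overparametrized feature space; the penalty does not restrict which functions are expressible, it only orders them by simplicity, which is the distinction the abstract draws.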
Related papers
- A Near Complete Nonasymptotic Generalization Theory For Multilayer Neural Networks: Beyond the Bias-Variance Tradeoff [57.25901375384457]
We propose a nonasymptotic generalization theory for multilayer neural networks with arbitrary Lipschitz activations and general Lipschitz loss functions. In particular, it doesn't require the boundedness of the loss function, as commonly assumed in the literature. We show the near minimax optimality of our theory for multilayer ReLU networks for regression problems.
arXiv Detail & Related papers (2025-03-03T23:34:12Z) - Universality of Benign Overfitting in Binary Linear Classification [1.6385815610837167]
Deep neural networks seem to generalize well in the over-parametrized regime.
It is now known that benign overfitting also occurs in various classical statistical models.
arXiv Detail & Related papers (2025-01-17T20:22:06Z) - Understanding Deep Learning via Notions of Rank [5.439020425819001]
This thesis puts forth notions of rank as key for developing a theory of deep learning. In particular, we establish that gradient-based training can induce an implicit regularization towards low rank for several neural network architectures. Practical implications of our theory for designing explicit regularization schemes and data preprocessing algorithms are presented.
arXiv Detail & Related papers (2024-08-04T18:47:55Z) - Neural Causal Abstractions [63.21695740637627]
We develop a new family of causal abstractions by clustering variables and their domains.
We show that such abstractions are learnable in practical settings through Neural Causal Models.
Our experiments support the theory and illustrate how to scale causal inferences to high-dimensional settings involving image data.
arXiv Detail & Related papers (2024-01-05T02:00:27Z) - PAC-Bayes Compression Bounds So Tight That They Can Explain Generalization [48.26492774959634]
We develop a compression approach based on quantizing neural network parameters in a linear subspace.
We find large models can be compressed to a much greater extent than previously known, encapsulating Occam's razor.
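The compression idea above can be sketched in miniature. Everything below is an illustrative assumption, not the paper's method: a stand-in parameter vector is generated to lie near a low-dimensional linear subspace (the property the compression argument exploits in trained networks), then described by a handful of quantized subspace coordinates instead of raw floats.

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in for trained parameters: assume (illustratively) that they lie
# close to a low-dimensional linear subspace.
d, k = 10_000, 50
B, _ = np.linalg.qr(rng.normal(size=(d, k)))   # orthonormal basis, d x k
w = B @ rng.normal(size=k) + 0.01 * rng.normal(size=d)

# Compress: express w in the subspace, quantize the k coordinates to 5 bits.
z = B.T @ w
levels = 32
grid = np.linspace(z.min(), z.max(), levels)
z_q = grid[np.argmin(np.abs(z[:, None] - grid[None, :]), axis=1)]

w_hat = B @ z_q                      # reconstructed parameters
bits = k * 5                         # coordinate payload (codebook ignored)
rel_err = np.linalg.norm(w - w_hat) / np.linalg.norm(w)
print(f"payload bits: {bits} (vs {32 * d} for raw float32), "
      f"relative error: {rel_err:.3f}")
```

The payload shrinks by orders of magnitude while reconstruction stays close, which is the Occam's-razor intuition: a model describable in few bits admits a tight compression-based generalization bound.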
arXiv Detail & Related papers (2022-11-24T13:50:16Z) - Generalization Through The Lens Of Leave-One-Out Error [22.188535244056016]
We show that the leave-one-out error provides a tractable way to estimate the generalization ability of deep neural networks in the kernel regime.
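In the kernel regime, the leave-one-out error is indeed tractable: for kernel ridge regression it has an exact closed form that avoids refitting the model n times. The sketch below uses an illustrative RBF kernel, toy data, and regularization strength, none of which come from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 1-D regression data.
n = 40
x = rng.uniform(-2.0, 2.0, n)
y = np.sin(x) + 0.1 * rng.normal(size=n)

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel matrix between 1-D point sets a and b."""
    return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)

lam = 0.1
K = rbf_kernel(x, x)
G_inv = np.linalg.inv(K + lam * np.eye(n))

# Closed-form leave-one-out residuals for kernel ridge regression:
# e_i = [G^{-1} y]_i / [G^{-1}]_{ii}, with G = K + lam * I.
loo_residuals = (G_inv @ y) / np.diag(G_inv)
print("leave-one-out MSE estimate:", np.mean(loo_residuals ** 2))
```

One matrix inverse yields all n held-out residuals at once; refitting on each n-1 point subset gives identical values, just n times slower.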
arXiv Detail & Related papers (2022-03-07T14:56:00Z) - Realizable Learning is All You Need [21.34668631009594]
The equivalence of realizable and agnostic learnability is a fundamental phenomenon in learning theory.
We give the first model-independent framework explaining the equivalence of realizable and agnostic learnability.
arXiv Detail & Related papers (2021-11-08T19:00:00Z) - An Empirical Investigation into Deep and Shallow Rule Learning [0.0]
In this paper, we empirically compare deep and shallow rule learning with a uniform general algorithm.
Our experiments on both artificial and real-world benchmark data indicate that deep rule networks outperform shallow networks.
arXiv Detail & Related papers (2021-06-18T17:43:17Z) - A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to address the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z) - Plausible Counterfactuals: Auditing Deep Learning Classifiers with Realistic Adversarial Examples [84.8370546614042]
The black-box nature of Deep Learning models has posed unanswered questions about what they learn from data.
A Generative Adversarial Network (GAN) and multi-objective optimization are used to furnish a plausible attack on the audited model.
Its utility is showcased within a human face classification task, unveiling the enormous potential of the proposed framework.
arXiv Detail & Related papers (2020-03-25T11:08:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.