Rethink the Connections among Generalization, Memorization and the
Spectral Bias of DNNs
- URL: http://arxiv.org/abs/2004.13954v2
- Date: Sat, 5 Jun 2021 11:18:34 GMT
- Title: Rethink the Connections among Generalization, Memorization and the
Spectral Bias of DNNs
- Authors: Xiao Zhang, Haoyi Xiong, Dongrui Wu
- Abstract summary: We show that the monotonicity of the learning bias does not always hold.
Under the experimental setup of deep double descent, the high-frequency components of DNNs diminish in the late stage of training.
- Score: 44.5823185453399
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Over-parameterized deep neural networks (DNNs) with sufficient capacity to
memorize random noise can achieve excellent generalization performance,
challenging the bias-variance trade-off in classical learning theory. Recent
studies claimed that DNNs first learn simple patterns and then memorize noise;
some other works showed a phenomenon that DNNs have a spectral bias to learn
target functions from low to high frequencies during training. However, we show
that the monotonicity of the learning bias does not always hold: under the
experimental setup of deep double descent, the high-frequency components of
DNNs diminish in the late stage of training, leading to the second descent of
the test error. Moreover, we find that the spectrum of DNNs can be used to
indicate the second descent of the test error, even though it is computed
from the training set alone.
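The abstract's claim that a spectrum computed from training data alone can flag the second descent can be illustrated with a toy estimator. The sketch below is our own illustration, not the paper's method: it projects a model's outputs at the (possibly nonuniform) training inputs onto plane waves and reports the fraction of spectral energy above a cutoff frequency; in a double-descent run one would track this ratio over training epochs.

```python
import numpy as np

def spectral_energy(X, f, ks):
    """Energy |c_k|^2 of the empirical Fourier projection
    c_k = (1/n) sum_i f(x_i) exp(-i k.x_i), one value per row of ks."""
    phases = X @ ks.T                                  # (n, m) values k.x_i
    coeffs = (f[:, None] * np.exp(-1j * phases)).mean(axis=0)
    return np.abs(coeffs) ** 2

def high_freq_ratio(X, f, ks, cutoff):
    """Fraction of estimated spectral energy at frequencies |k| > cutoff."""
    energy = spectral_energy(X, f, ks)
    is_high = np.linalg.norm(ks, axis=1) > cutoff
    return energy[is_high].sum() / energy.sum()

# Toy check: a smooth function vs. one with an added high-frequency term,
# both observed only at scattered "training" points.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 2 * np.pi, size=(512, 1))
ks = np.arange(1.0, 11.0)[:, None]                     # wavenumbers 1..10
f_low = np.sin(X[:, 0])
f_high = np.sin(X[:, 0]) + np.sin(8 * X[:, 0])
r_low = high_freq_ratio(X, f_low, ks, cutoff=4.0)
r_high = high_freq_ratio(X, f_high, ks, cutoff=4.0)
```

Replacing `f_low`/`f_high` with a network's training-set predictions at two checkpoints gives a rough, label-free proxy for how much high-frequency content the model currently carries.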
Related papers
- Addressing Spectral Bias of Deep Neural Networks by Multi-Grade Deep Learning [3.0468273116892752]
Deep neural networks (DNNs) exhibit a tendency to prioritize the learning of lower-frequency components of a function, struggling to capture its high-frequency features.
We propose to learn a function containing high-frequency components by composing several shallow neural networks (SNNs), each of which learns certain low-frequency information from the given data.
Our study reveals that the proposed multi-grade deep learning (MGDL) method excels at representing functions containing high-frequency information.
arXiv Detail & Related papers (2024-10-21T15:34:33Z) - Understanding the dynamics of the frequency bias in neural networks [0.0]
Recent works have shown that traditional Neural Network (NN) architectures display a marked frequency bias in the learning process.
We develop a partial differential equation (PDE) that unravels the frequency dynamics of the error for a 2-layer NN.
We empirically show that the same principle extends to multi-layer NNs.
arXiv Detail & Related papers (2024-05-23T18:09:16Z) - Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z) - A Scalable Walsh-Hadamard Regularizer to Overcome the Low-degree
Spectral Bias of Neural Networks [79.28094304325116]
Despite the capacity of neural nets to learn arbitrary functions, models trained through gradient descent often exhibit a bias towards "simpler" functions.
We show how this spectral bias towards low-degree frequencies can in fact hurt the neural network's generalization on real-world datasets.
We propose a new scalable functional regularization scheme that aids the neural network to learn higher degree frequencies.
arXiv Detail & Related papers (2023-05-16T20:06:01Z) - A Quadrature Perspective on Frequency Bias in Neural Network Training
with Nonuniform Data [1.7188280334580197]
Gradient-based algorithms minimize the low-frequency misfit before reducing the high-frequency residuals.
We use the Neural Tangent Kernel (NTK) to provide a theoretically rigorous analysis for training where data are drawn from constant or piecewise-constant probability densities.
arXiv Detail & Related papers (2022-05-28T02:31:19Z) - The Spectral Bias of Polynomial Neural Networks [63.27903166253743]
Polynomial neural networks (PNNs) have been shown to be particularly effective at image generation and face recognition, where high-frequency information is critical.
Previous studies have revealed that neural networks demonstrate a "spectral bias" towards low-frequency functions, which yields faster learning of low-frequency components during training.
Inspired by such studies, we conduct a spectral analysis of the Neural Tangent Kernel (NTK) of PNNs.
We find that the Π-Net family, i.e., a recently proposed parametrization of PNNs, speeds up the
arXiv Detail & Related papers (2022-02-27T23:12:43Z) - Rethinking Nearest Neighbors for Visual Classification [56.00783095670361]
k-NN is a lazy learning method that aggregates the distance between the test image and top-k neighbors in a training set.
We adopt k-NN with pre-trained visual representations produced by either supervised or self-supervised methods in two steps.
Via extensive experiments on a wide range of classification tasks, our study reveals the generality and flexibility of k-NN integration.
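The two-step recipe above can be made concrete with a minimal sketch (synthetic embeddings stand in for pre-trained visual features; this is our illustration, not the paper's code): a query is classified by majority vote over its k nearest training embeddings under cosine similarity.

```python
import numpy as np

def knn_predict(train_feats, train_labels, query_feats, k=5):
    """Classify each query by majority vote of its k nearest training
    embeddings under cosine similarity."""
    tr = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    sims = q @ tr.T                          # (n_query, n_train) cosines
    topk = np.argsort(-sims, axis=1)[:, :k]  # indices of the k best matches
    votes = train_labels[topk]               # labels of those neighbors
    return np.array([np.bincount(v).argmax() for v in votes])

# Synthetic "pre-trained" embeddings: two well-separated clusters.
rng = np.random.default_rng(0)
train_feats = np.vstack([
    rng.normal(0, 0.1, (20, 8)) + np.eye(8)[0],   # class 0 near e_0
    rng.normal(0, 0.1, (20, 8)) + np.eye(8)[1],   # class 1 near e_1
])
train_labels = np.array([0] * 20 + [1] * 20)
queries = np.vstack([np.eye(8)[0], np.eye(8)[1]]) + rng.normal(0, 0.1, (2, 8))
preds = knn_predict(train_feats, train_labels, queries, k=5)
```

Because the encoder is frozen, the only "training" cost is storing the embeddings; this is what makes the method a lazy learner.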
arXiv Detail & Related papers (2021-12-15T20:15:01Z) - Learning from Failure: Training Debiased Classifier from Biased
Classifier [76.52804102765931]
We show that neural networks learn to rely on spurious correlation only when it is "easier" to learn than the desired knowledge.
We propose a failure-based debiasing scheme by training a pair of neural networks simultaneously.
Our method significantly improves the training of the network against various types of biases in both synthetic and real-world datasets.
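The failure-based idea can be sketched as a per-sample reweighting (a hedged illustration; the function name and exact formula are our assumptions, not the paper's code): samples that the intentionally biased network fits poorly are treated as bias-conflicting and up-weighted when training its debiased partner.

```python
import numpy as np

def relative_difficulty(loss_biased, loss_debiased, eps=1e-8):
    """Per-sample weight in [0, 1]: close to 1 when the biased network's
    loss dominates, i.e. the sample likely conflicts with the bias."""
    loss_biased = np.asarray(loss_biased, dtype=float)
    loss_debiased = np.asarray(loss_debiased, dtype=float)
    return loss_biased / (loss_biased + loss_debiased + eps)

# A bias-conflicting sample (high biased-net loss) gets a large weight;
# a bias-aligned sample (low biased-net loss) gets a small one.
w = relative_difficulty([2.0, 0.1], [0.5, 0.5])
```

In a full training loop these weights would multiply the debiased network's per-sample losses at every step, while the biased network keeps being trained to amplify the shortcut.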
arXiv Detail & Related papers (2020-07-06T07:20:29Z) - Frequency Principle: Fourier Analysis Sheds Light on Deep Neural Networks [9.23835409289015]
We study the training process of Deep Neural Networks (DNNs) from the Fourier analysis perspective.
We demonstrate a very universal Frequency Principle (F-Principle) -- DNNs often fit target functions from low to high frequencies.
arXiv Detail & Related papers (2019-01-19T13:37:39Z)
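The F-Principle is easy to reproduce in a toy setting. The sketch below is our own illustration, not the paper's code: it trains a small tanh network by full-batch gradient descent on a two-frequency target and records the relative DFT-coefficient error at the low (k=1) and high (k=5) frequencies; the low-frequency error typically falls first.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 128
x = np.linspace(0.0, 2 * np.pi, n, endpoint=False)
y = np.sin(x) + 0.5 * np.sin(5 * x)          # low + high frequency target

# Two-layer network: f(x) = W2 @ tanh(W1 x + b1) + b2
h = 64
W1 = rng.normal(0, 1, (h, 1)); b1 = rng.normal(0, 1, (h, 1))
W2 = rng.normal(0, 0.1, (1, h)); b2 = np.zeros((1, 1))

def forward():
    z = np.tanh(W1 @ x[None, :] + b1)        # hidden activations, (h, n)
    return (W2 @ z + b2).ravel(), z

def freq_err(pred):
    """Relative DFT-coefficient error at wavenumbers 1 and 5."""
    F_pred, F_true = np.fft.rfft(pred), np.fft.rfft(y)
    rel = np.abs(F_pred - F_true) / (np.abs(F_true) + 1e-12)
    return rel[1], rel[5]

lr, errs = 5e-3, []
for step in range(10000):
    pred, z = forward()
    if step % 1000 == 0:
        errs.append(freq_err(pred))          # (low-freq err, high-freq err)
    r = (pred - y)[None, :] / n              # dLoss/dpred for MSE/2
    gW2 = r @ z.T;  gb2 = r.sum(axis=1, keepdims=True)
    gz = (W2.T @ r) * (1.0 - z ** 2)         # backprop through tanh
    gW1 = gz @ x[:, None];  gb1 = gz.sum(axis=1, keepdims=True)
    W2 -= lr * gW2; b2 -= lr * gb2; W1 -= lr * gW1; b1 -= lr * gb1
```

Plotting the two columns of `errs` against the training step is the standard way to visualize the low-to-high frequency ordering that the F-Principle describes.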
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.