Frequency Principle in Deep Learning Beyond Gradient-descent-based
Training
- URL: http://arxiv.org/abs/2101.00747v1
- Date: Mon, 4 Jan 2021 03:11:03 GMT
- Title: Frequency Principle in Deep Learning Beyond Gradient-descent-based
Training
- Authors: Yuheng Ma, Zhi-Qin John Xu, Jiwei Zhang
- Abstract summary: The frequency perspective has recently made progress in understanding deep learning.
It has been widely verified that deep neural networks (DNNs) often fit the target function from low to high frequency, a behavior known as the Frequency Principle (F-Principle).
Previous works examine the F-Principle in gradient-descent-based training.
It remains unclear whether gradient-descent-based training is a necessary condition for the F-Principle.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The frequency perspective has recently made progress in understanding deep learning.
It has been widely verified in both empirical and theoretical studies that deep
neural networks (DNNs) often fit the target function from low to high
frequency, a behavior known as the Frequency Principle (F-Principle). The F-Principle sheds light on
the strengths and weaknesses of DNNs and has inspired a series of subsequent
works, including theoretical studies, empirical studies, and the design of
efficient DNN structures. Previous works examine the F-Principle in
gradient-descent-based training. It remains unclear whether
gradient-descent-based training is a necessary condition for the F-Principle.
In this paper, we show that the F-Principle exists stably in the training
process of DNNs with non-gradient-descent-based training, including
optimization algorithms with gradient information, such as conjugate gradient
and BFGS, and algorithms without gradient information, such as Powell's method
and Particle Swarm Optimization. These empirical studies show the universality
of the F-Principle and provide hints for further study of the F-Principle.
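The sketch below is a minimal illustration (not the authors' code) of the kind of experiment the abstract describes: a tiny one-hidden-layer network fits a 1-D target containing one low and one high frequency, its parameters are tuned by SciPy's gradient-free Powell method as a stand-in for the paper's non-gradient-descent optimizers, and the relative Fourier-domain error at the two target frequencies is recorded after each iteration. Under the F-Principle, the low-frequency error should drop first. The network width, target function, and monitored frequencies are illustrative assumptions.

```python
"""Minimal F-Principle check with a gradient-free optimizer (illustrative sketch)."""
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# 1-D training grid and a target containing one low and one high frequency.
x = np.linspace(-1.0, 1.0, 128, endpoint=False)
target = np.sin(np.pi * x) + 0.5 * np.sin(5 * np.pi * x)

# A tiny one-hidden-layer tanh network parameterized by a flat vector theta.
H = 20                                   # hidden width (illustrative choice)
n_params = 3 * H + 1                     # W1 (Hx1), b1 (H), W2 (1xH), b2 (1)

def net(theta, x):
    W1 = theta[:H].reshape(H, 1)
    b1 = theta[H:2 * H]
    W2 = theta[2 * H:3 * H].reshape(1, H)
    b2 = theta[-1]
    hidden = np.tanh(W1 @ x[None, :] + b1[:, None])   # shape (H, N)
    return (W2 @ hidden).ravel() + b2                 # shape (N,)

def loss(theta):
    return np.mean((net(theta, x) - target) ** 2)

# Relative Fourier-domain error |FFT(pred) - FFT(target)| / |FFT(target)|,
# monitored only at the two frequencies actually present in the target.
freqs = np.fft.rfftfreq(x.size, d=x[1] - x[0])
target_hat = np.fft.rfft(target)
key_bins = [np.argmin(np.abs(freqs - f)) for f in (0.5, 2.5)]  # sin(pi x), sin(5 pi x)

history = []
def record(theta):
    pred_hat = np.fft.rfft(net(theta, x))
    rel_err = np.abs(pred_hat[key_bins] - target_hat[key_bins]) / np.abs(target_hat[key_bins])
    history.append(rel_err)

theta0 = 0.1 * rng.standard_normal(n_params)
record(theta0)
# Powell's method is derivative-free, so no gradient-descent dynamics are involved.
minimize(loss, theta0, method="Powell", callback=record,
         options={"maxiter": 30, "xtol": 1e-6, "ftol": 1e-6})

for i, (low_err, high_err) in enumerate(history):
    print(f"iter {i:3d}  low-freq rel. err = {low_err:.3f}  high-freq rel. err = {high_err:.3f}")
```

The same loop can be rerun with method="CG" or "BFGS" (gradient-based but not plain gradient descent), or with an external Particle Swarm Optimization routine, to mirror the other optimizers named in the abstract.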
Related papers
- Kernel Approximation of Fisher-Rao Gradient Flows [52.154685604660465]
We present a rigorous investigation of Fisher-Rao and Wasserstein type gradient flows concerning their gradient structures, flow equations, and their kernel approximations.
Specifically, we focus on the Fisher-Rao geometry and its various kernel-based approximations, developing a principled theoretical framework.
arXiv Detail & Related papers (2024-10-27T22:52:08Z) - Dynamic Decoupling of Placid Terminal Attractor-based Gradient Descent Algorithm [56.06235614890066]
Gradient descent (GD) and stochastic gradient descent (SGD) have been widely used in a number of application domains.
This paper carefully analyzes the dynamics of GD based on the terminal attractor at different stages of its gradient flow.
arXiv Detail & Related papers (2024-09-10T14:15:56Z) - Learning by the F-adjoint [0.0]
In this work, we develop and investigate this theoretical framework to improve supervised learning algorithms for feed-forward neural networks.
Our main result is that, by introducing a neural dynamical model combined with the gradient descent algorithm, we derive an equilibrium F-adjoint process.
Experimental results on the MNIST and Fashion-MNIST datasets demonstrate that the proposed approach provides significant improvements over the standard back-propagation training procedure.
arXiv Detail & Related papers (2024-07-08T13:49:25Z) - On the Generalization Capability of Temporal Graph Learning Algorithms:
Theoretical Insights and a Simpler Method [59.52204415829695]
Temporal Graph Learning (TGL) has become a prevalent technique across diverse real-world applications.
This paper investigates the generalization ability of different TGL algorithms.
We propose a simplified TGL network, which enjoys a small generalization error, improved overall performance, and lower model complexity.
arXiv Detail & Related papers (2024-02-26T08:22:22Z) - Layer-wise Feedback Propagation [53.00944147633484]
We present Layer-wise Feedback Propagation (LFP), a novel training approach for neural-network-like predictors.
LFP assigns rewards to individual connections based on their respective contributions to solving a given task.
We demonstrate its effectiveness in achieving comparable performance to gradient descent on various models and datasets.
arXiv Detail & Related papers (2023-08-23T10:48:28Z) - A Kernel-Based View of Language Model Fine-Tuning [94.75146965041131]
We investigate whether the Neural Tangent Kernel (NTK) describes fine-tuning of pre-trained LMs.
We show that formulating the downstream task as a masked word prediction problem through prompting often induces kernel-based dynamics during fine-tuning.
arXiv Detail & Related papers (2022-10-11T17:34:32Z) - Overview frequency principle/spectral bias in deep learning [3.957124094805574]
We show a Frequency Principle (F-Principle) of the training behavior of deep neural networks (DNNs).
The F-Principle is first demonstrated on one-dimensional synthetic data and then verified on high-dimensional real datasets.
This low-frequency implicit bias reveals the strength of neural networks in learning low-frequency functions as well as their deficiency in learning high-frequency functions; a hedged sketch of the frequency-wise error measure behind these observations is given after this list.
arXiv Detail & Related papers (2022-01-19T03:08:33Z) - FL-NTK: A Neural Tangent Kernel-based Framework for Federated Learning
Convergence Analysis [27.022551495550676]
This paper presents a new class of convergence analysis for FL, the Federated Learning Neural Tangent Kernel (FL-NTK), which corresponds to over-parameterized ReLU neural networks trained by gradient descent in FL.
Theoretically, FL-NTK converges to a global-optimal solution at a linear rate with properly tuned learning parameters.
arXiv Detail & Related papers (2021-05-11T13:05:53Z) - Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z) - Frequency Principle: Fourier Analysis Sheds Light on Deep Neural Networks [9.23835409289015]
We study the training process of Deep Neural Networks (DNNs) from the Fourier analysis perspective.
We demonstrate a very universal Frequency Principle (F-Principle) -- DNNs often fit target functions from low to high frequencies.
arXiv Detail & Related papers (2019-01-19T13:37:39Z)
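For reference, the frequency-wise convergence measure typically used in this line of work (and assumed in the sketch after the abstract above) is a relative error between the discrete Fourier transforms of the DNN output h and the target f. The exact normalization and the set of monitored frequencies vary across papers, so the following is an assumed, representative form rather than a definition from any single paper:

```latex
% Assumed, representative form of the frequency-wise error used in F-Principle studies:
% \hat{f}(k) is the discrete Fourier transform of the target f over the training
% samples x_j, and \hat{h}(k) is the transform of the DNN output h.
\Delta_F(k) \;=\; \frac{\bigl|\hat{h}(k) - \hat{f}(k)\bigr|}{\bigl|\hat{f}(k)\bigr|},
\qquad
\hat{f}(k) \;=\; \sum_{j} f(x_j)\, e^{-2\pi i\, k \cdot x_j}.
```

The F-Principle is the observation that, during training, this error falls below a chosen threshold earlier at low frequencies than at high frequencies, regardless of whether the parameters are updated by gradient descent or, as in the paper above, by gradient-free methods.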