Frequency Principle in Deep Learning Beyond Gradient-descent-based
Training
- URL: http://arxiv.org/abs/2101.00747v1
- Date: Mon, 4 Jan 2021 03:11:03 GMT
- Title: Frequency Principle in Deep Learning Beyond Gradient-descent-based
Training
- Authors: Yuheng Ma, Zhi-Qin John Xu, Jiwei Zhang
- Abstract summary: The frequency perspective has recently made progress in understanding deep learning.
It has been widely verified that deep neural networks (DNNs) often fit the target function from low to high frequency, a behavior known as the Frequency Principle (F-Principle).
Previous works examine the F-Principle in gradient-descent-based training.
It remains unclear whether gradient-descent-based training is a necessary condition for the F-Principle.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The frequency perspective has recently made progress in understanding deep learning.
It has been widely verified in both empirical and theoretical studies that deep
neural networks (DNNs) often fit the target function from low to high
frequency, a behavior known as the Frequency Principle (F-Principle). The F-Principle sheds light on
the strengths and weaknesses of DNNs and has inspired a series of subsequent
works, including theoretical studies, empirical studies, and the design of
efficient DNN structures. Previous works examine the F-Principle in
gradient-descent-based training. It remains unclear whether
gradient-descent-based training is a necessary condition for the F-Principle.
In this paper, we show that the F-Principle exists stably in the training
process of DNNs with non-gradient-descent-based training, including
optimization algorithms with gradient information, such as conjugate gradient
and BFGS, and algorithms without gradient information, such as Powell's method
and Particle Swarm Optimization. These empirical studies show the universality
of the F-Principle and provide hints for further study of the F-Principle.
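The sketch below is a minimal illustration (not the authors' code) of the kind of experiment the abstract describes: a tiny one-hidden-layer network fits a 1-D target containing one low and one high frequency, its parameters are tuned by SciPy's gradient-free Powell method as a stand-in for the paper's non-gradient-descent optimizers, and the relative Fourier-domain error at the two target frequencies is recorded after each iteration. Under the F-Principle, the low-frequency error should drop first. The network width, target function, and monitored frequencies are illustrative assumptions.

```python
"""Minimal F-Principle check with a gradient-free optimizer (illustrative sketch)."""
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# 1-D training grid and a target containing one low and one high frequency.
x = np.linspace(-1.0, 1.0, 128, endpoint=False)
target = np.sin(np.pi * x) + 0.5 * np.sin(5 * np.pi * x)

# A tiny one-hidden-layer tanh network parameterized by a flat vector theta.
H = 20                                   # hidden width (illustrative choice)
n_params = 3 * H + 1                     # W1 (Hx1), b1 (H), W2 (1xH), b2 (1)

def net(theta, x):
    W1 = theta[:H].reshape(H, 1)
    b1 = theta[H:2 * H]
    W2 = theta[2 * H:3 * H].reshape(1, H)
    b2 = theta[-1]
    hidden = np.tanh(W1 @ x[None, :] + b1[:, None])   # shape (H, N)
    return (W2 @ hidden).ravel() + b2                 # shape (N,)

def loss(theta):
    return np.mean((net(theta, x) - target) ** 2)

# Relative Fourier-domain error |FFT(pred) - FFT(target)| / |FFT(target)|,
# monitored only at the two frequencies actually present in the target.
freqs = np.fft.rfftfreq(x.size, d=x[1] - x[0])
target_hat = np.fft.rfft(target)
key_bins = [np.argmin(np.abs(freqs - f)) for f in (0.5, 2.5)]  # sin(pi x), sin(5 pi x)

history = []
def record(theta):
    pred_hat = np.fft.rfft(net(theta, x))
    rel_err = np.abs(pred_hat[key_bins] - target_hat[key_bins]) / np.abs(target_hat[key_bins])
    history.append(rel_err)

theta0 = 0.1 * rng.standard_normal(n_params)
record(theta0)
# Powell's method is derivative-free, so no gradient-descent dynamics are involved.
minimize(loss, theta0, method="Powell", callback=record,
         options={"maxiter": 30, "xtol": 1e-6, "ftol": 1e-6})

for i, (low_err, high_err) in enumerate(history):
    print(f"iter {i:3d}  low-freq rel. err = {low_err:.3f}  high-freq rel. err = {high_err:.3f}")
```

The same loop can be rerun with method="CG" or "BFGS" (gradient-based but not plain gradient descent), or with an external Particle Swarm Optimization routine, to mirror the other optimizers named in the abstract.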
Related papers
- Kernel Approximation of Fisher-Rao Gradient Flows [52.154685604660465]
We present a rigorous investigation of Fisher-Rao and Wasserstein type gradient flows concerning their gradient structures, flow equations, and their kernel approximations.
Specifically, we focus on the Fisher-Rao geometry and its various kernel-based approximations, developing a principled theoretical framework.
arXiv Detail & Related papers (2024-10-27T22:52:08Z) - Dynamic Decoupling of Placid Terminal Attractor-based Gradient Descent Algorithm [56.06235614890066]
Gradient descent (GD) and stochastic gradient descent (SGD) have been widely used in a number of application domains.
This paper carefully analyzes the dynamics of GD based on the terminal attractor at different stages of its gradient flow.
arXiv Detail & Related papers (2024-09-10T14:15:56Z) - Learning by the F-adjoint [0.0]
In this work, we develop and investigate this theoretical framework to improve supervised learning algorithms for feed-forward neural networks.
Our main result is that, by introducing a neural dynamical model combined with the gradient descent algorithm, we derive an equilibrium F-adjoint process.
Experimental results on the MNIST and Fashion-MNIST datasets demonstrate that the proposed approach provides significant improvements over the standard back-propagation training procedure.
arXiv Detail & Related papers (2024-07-08T13:49:25Z) - On the Generalization Capability of Temporal Graph Learning Algorithms:
Theoretical Insights and a Simpler Method [59.52204415829695]
Temporal Graph Learning (TGL) has become a prevalent technique across diverse real-world applications.
This paper investigates the generalization ability of different TGL algorithms.
We propose a simplified TGL network, which enjoys a small generalization error, improved overall performance, and lower model complexity.
arXiv Detail & Related papers (2024-02-26T08:22:22Z) - Layer-wise Feedback Propagation [53.00944147633484]
We present Layer-wise Feedback Propagation (LFP), a novel training approach for neural-network-like predictors.
LFP assigns rewards to individual connections based on their respective contributions to solving a given task.
We demonstrate its effectiveness in achieving comparable performance to gradient descent on various models and datasets.
arXiv Detail & Related papers (2023-08-23T10:48:28Z) - A Kernel-Based View of Language Model Fine-Tuning [94.75146965041131]
We investigate whether the Neural Tangent Kernel (NTK) describes fine-tuning of pre-trained LMs.
We show that formulating the downstream task as a masked word prediction problem through prompting often induces kernel-based dynamics during fine-tuning.
arXiv Detail & Related papers (2022-10-11T17:34:32Z) - Overview frequency principle/spectral bias in deep learning [3.957124094805574]
We show a Frequency Principle (F-Principle) of the training behavior of deep neural networks (DNNs).
The F-Principle is first demonstrated on one-dimensional synthetic data and then verified on high-dimensional real datasets.
This low-frequency implicit bias reveals the strength of neural networks in learning low-frequency functions as well as their deficiency in learning high-frequency functions; a hedged sketch of the frequency-wise error measure behind these observations is given after this list.
arXiv Detail & Related papers (2022-01-19T03:08:33Z) - FL-NTK: A Neural Tangent Kernel-based Framework for Federated Learning
Convergence Analysis [27.022551495550676]
This paper presents a new class of convergence analysis for FL, the Federated Learning Neural Tangent Kernel (FL-NTK), which corresponds to over-parameterized ReLU neural networks trained by gradient descent in FL.
Theoretically, FL-NTK converges to a global-optimal solution at a linear rate with properly tuned learning parameters.
arXiv Detail & Related papers (2021-05-11T13:05:53Z) - Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z) - Frequency Principle: Fourier Analysis Sheds Light on Deep Neural Networks [9.23835409289015]
We study the training process of Deep Neural Networks (DNNs) from the Fourier analysis perspective.
We demonstrate a very universal Frequency Principle (F-Principle) -- DNNs often fit target functions from low to high frequencies.
arXiv Detail & Related papers (2019-01-19T13:37:39Z)
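For reference, the frequency-wise convergence measure typically used in this line of work (and assumed in the sketch after the abstract above) is a relative error between the discrete Fourier transforms of the DNN output h and the target f. The exact normalization and the set of monitored frequencies vary across papers, so the following is an assumed, representative form rather than a definition from any single paper:

```latex
% Assumed, representative form of the frequency-wise error used in F-Principle studies:
% \hat{f}(k) is the discrete Fourier transform of the target f over the training
% samples x_j, and \hat{h}(k) is the transform of the DNN output h.
\Delta_F(k) \;=\; \frac{\bigl|\hat{h}(k) - \hat{f}(k)\bigr|}{\bigl|\hat{f}(k)\bigr|},
\qquad
\hat{f}(k) \;=\; \sum_{j} f(x_j)\, e^{-2\pi i\, k \cdot x_j}.
```

The F-Principle is the observation that, during training, this error falls below a chosen threshold earlier at low frequencies than at high frequencies, regardless of whether the parameters are updated by gradient descent or, as in the paper above, by gradient-free methods.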