FL-NTK: A Neural Tangent Kernel-based Framework for Federated Learning Convergence Analysis
- URL: http://arxiv.org/abs/2105.05001v1
- Date: Tue, 11 May 2021 13:05:53 GMT
- Title: FL-NTK: A Neural Tangent Kernel-based Framework for Federated Learning Convergence Analysis
- Authors: Baihe Huang, Xiaoxiao Li, Zhao Song, Xin Yang
- Abstract summary: This paper presents a new class of convergence analysis for FL, Federated Learning Neural Tangent Kernel (FL-NTK), which corresponds to overparameterized ReLU neural networks trained by gradient descent in FL.
Theoretically, FL-NTK converges to a global-optimal solution at a linear rate with properly tuned learning parameters.
- Score: 27.022551495550676
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Federated Learning (FL) is an emerging learning scheme that allows different
distributed clients to train deep neural networks together without data
sharing. Neural networks have become popular due to their unprecedented
success. To the best of our knowledge, the theoretical guarantees of FL
concerning neural networks with explicit forms and multi-step updates are
unexplored. Nevertheless, training analysis of neural networks in FL is
non-trivial for two reasons: first, the objective loss function we are
optimizing is non-smooth and non-convex, and second, we are not even updating
in the gradient direction. Existing convergence results for gradient
descent-based methods heavily rely on the fact that the gradient direction is
used for updating. This paper presents a new class of convergence analysis for
FL, Federated Learning Neural Tangent Kernel (FL-NTK), which corresponds to
overparameterized ReLU neural networks trained by gradient descent in FL and is
inspired by the analysis in Neural Tangent Kernel (NTK). Theoretically, FL-NTK
converges to a global-optimal solution at a linear rate with properly tuned
learning parameters. Furthermore, with proper distributional assumptions,
FL-NTK can also achieve good generalization.
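The multi-step local-update setting the abstract describes can be made concrete with a small simulation. The sketch below is an illustration under our own assumptions, not the paper's code: several clients each run a few local gradient-descent steps on a wide two-layer ReLU network with a fixed output layer, a server averages the hidden-layer weights, and the global training loss is printed every few rounds. All sizes, learning rates, and data are arbitrary toy choices.

```python
# Minimal sketch (assumption, not the paper's code) of the setting FL-NTK analyzes:
# K clients run multiple local gradient-descent steps on a wide two-layer ReLU
# network f(x) = (1/sqrt(m)) * sum_r a_r * relu(w_r . x), with the output weights
# a_r fixed and only the hidden weights W updated; the server then averages W.
import numpy as np

rng = np.random.default_rng(0)
d, m, K, n_per_client = 5, 1024, 4, 8   # input dim, width, clients, samples/client
eta, local_steps, rounds = 0.1, 5, 50   # learning rate, local steps, comm. rounds

# Synthetic client data (unit-norm inputs, as in NTK-style analyses).
clients = []
for _ in range(K):
    X = rng.normal(size=(n_per_client, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    y = np.sin(X @ rng.normal(size=d))          # arbitrary smooth targets
    clients.append((X, y))

W = rng.normal(size=(m, d))                      # NTK-style initialization
a = rng.choice([-1.0, 1.0], size=m)              # fixed second layer

def predict(W, X):
    return (np.maximum(X @ W.T, 0.0) @ a) / np.sqrt(m)

def local_gd(W, X, y, steps):
    """Run `steps` of full-batch gradient descent on 0.5*||f(X)-y||^2."""
    for _ in range(steps):
        pre = X @ W.T                            # (n, m) pre-activations
        err = predict(W, X) - y                  # (n,) residuals
        # grad wrt w_r: sum_i err_i * (a_r/sqrt(m)) * 1[w_r.x_i > 0] * x_i
        grad = ((err[:, None] * (pre > 0) * a / np.sqrt(m)).T @ X)
        W = W - eta * grad
    return W

for t in range(rounds):
    W = np.mean([local_gd(W.copy(), X, y, local_steps) for X, y in clients], axis=0)
    if t % 10 == 0:
        loss = sum(0.5 * np.sum((predict(W, X) - y) ** 2) for X, y in clients)
        print(f"round {t:3d}  global training loss {loss:.4e}")
```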
Related papers
- Approximation and Gradient Descent Training with Neural Networks [0.0]
Recent work extends a neural tangent kernel (NTK) optimization argument to an under-parametrized regime.
This paper establishes analogous results for networks trained by gradient descent.
arXiv Detail & Related papers (2024-05-19T23:04:09Z)
- Fixing the NTK: From Neural Network Linearizations to Exact Convex Programs [63.768739279562105]
We show that for a particular choice of mask weights that do not depend on the learning targets, this kernel is equivalent to the NTK of the gated ReLU network on the training data.
A consequence of this lack of dependence on the targets is that the NTK cannot perform better than the optimal MKL kernel on the training set.
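To make the object under discussion concrete, the sketch below (a minimal illustration under our own assumptions, not code from this paper) computes the empirical NTK Gram matrix of a two-layer ReLU network on a toy training set, i.e. the matrix of inner products of per-example gradients that such linearization and convex-program results reason about.

```python
# Minimal sketch (assumption, not from the paper): the empirical NTK of a
# two-layer ReLU network on training data, K[i, j] = <grad_W f(x_i), grad_W f(x_j)>.
import numpy as np

rng = np.random.default_rng(1)
n, d, m = 20, 5, 2000                        # samples, input dim, width
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)

W = rng.normal(size=(m, d))                  # hidden weights
a = rng.choice([-1.0, 1.0], size=m)          # fixed output weights

# Per-example gradient of f(x) = (1/sqrt(m)) * sum_r a_r relu(w_r . x) wrt W,
# flattened to a vector of length m*d.
act = (X @ W.T > 0).astype(float)            # (n, m) ReLU activation pattern
J = (act * a / np.sqrt(m))[:, :, None] * X[:, None, :]   # (n, m, d)
J = J.reshape(n, -1)

K = J @ J.T                                  # empirical NTK Gram matrix (n, n)
print("min eigenvalue of NTK Gram matrix:", np.linalg.eigvalsh(K).min())
```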
arXiv Detail & Related papers (2023-09-26T17:42:52Z)
- Speed Limits for Deep Learning [67.69149326107103]
Recent advances in thermodynamics allow bounding the speed at which one can go from the initial weight distribution to the final distribution of the fully trained network.
We provide analytical expressions for these speed limits for linear and linearizable neural networks.
Remarkably, given plausible scaling assumptions on the NTK spectrum and the spectral decomposition of the labels, learning is optimal in a scaling sense.
arXiv Detail & Related papers (2023-07-27T06:59:46Z)
- Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z)
- On Feature Learning in Neural Networks with Global Convergence Guarantees [49.870593940818715]
We study the optimization of wide neural networks (NNs) via gradient flow (GF).
We show that when the input dimension is no less than the size of the training set, the training loss converges to zero at a linear rate under GF.
We also show empirically that, unlike in the Neural Tangent Kernel (NTK) regime, our multi-layer model exhibits feature learning and can achieve better generalization performance than its NTK counterpart.
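As a rough illustration of the distinction drawn here (our own toy experiment, not the authors'), the sketch below trains a modest-width two-layer ReLU network by forward-Euler gradient flow and tracks how far the empirical NTK Gram matrix drifts from its value at initialization; in the NTK regime this drift stays close to zero, while visible drift is a crude signature of feature learning.

```python
# Minimal sketch (assumption, not from the paper): Euler-discretized gradient flow
# on a small two-layer ReLU network, tracking the drift of the empirical NTK
# Gram matrix away from its value at initialization.
import numpy as np

rng = np.random.default_rng(2)
n, d, m = 10, 12, 64                       # input dim >= training-set size, modest width
X = rng.normal(size=(n, d)); X /= np.linalg.norm(X, axis=1, keepdims=True)
y = rng.normal(size=n)

W = rng.normal(size=(m, d))
a = rng.choice([-1.0, 1.0], size=m)        # fixed output layer

def ntk_gram(W):
    act = (X @ W.T > 0).astype(float)
    J = ((act * a / np.sqrt(m))[:, :, None] * X[:, None, :]).reshape(n, -1)
    return J @ J.T

def loss_and_grad(W):
    pre = X @ W.T
    err = (np.maximum(pre, 0.0) @ a) / np.sqrt(m) - y
    grad = ((err[:, None] * (pre > 0) * a / np.sqrt(m)).T @ X)
    return 0.5 * np.sum(err ** 2), grad

K0 = ntk_gram(W)
dt = 0.05                                  # Euler step size for the flow
for step in range(2001):
    loss, grad = loss_and_grad(W)
    W -= dt * grad                         # forward-Euler step of gradient flow
    if step % 500 == 0:
        drift = np.linalg.norm(ntk_gram(W) - K0) / np.linalg.norm(K0)
        print(f"step {step:4d}  loss {loss:.3e}  relative NTK drift {drift:.3f}")
```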
arXiv Detail & Related papers (2022-04-22T15:56:43Z)
- A global convergence theory for deep ReLU implicit networks via over-parameterization [26.19122384935622]
Implicit deep learning has received increasing attention recently.
This paper analyzes the gradient flow of Rectified Linear Unit (ReLU) activated implicit neural networks.
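For readers unfamiliar with implicit networks, the sketch below (a generic illustration, not the paper's model or code) computes the forward pass of a single ReLU implicit layer z = relu(A z + B x + b) by fixed-point iteration, with A rescaled so that the map is a contraction and the iteration converges.

```python
# Minimal sketch (assumption, not from the paper): the forward pass of a ReLU
# implicit layer, z = relu(A z + B x + b), computed by fixed-point iteration.
import numpy as np

rng = np.random.default_rng(3)
d_in, d_hidden = 8, 32
A = rng.normal(size=(d_hidden, d_hidden))
A *= 0.5 / np.linalg.norm(A, 2)              # spectral norm 0.5 -> contraction (ReLU is 1-Lipschitz)
B = rng.normal(size=(d_hidden, d_in)) / np.sqrt(d_in)
b = np.zeros(d_hidden)

def implicit_forward(x, tol=1e-8, max_iter=500):
    """Iterate z <- relu(A z + B x + b) until the update is below tol."""
    z = np.zeros(d_hidden)
    for _ in range(max_iter):
        z_next = np.maximum(A @ z + B @ x + b, 0.0)
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    return z

x = rng.normal(size=d_in)
z_star = implicit_forward(x)
print("fixed-point residual:",
      np.linalg.norm(z_star - np.maximum(A @ z_star + B @ x + b, 0.0)))
```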
arXiv Detail & Related papers (2021-10-11T23:22:50Z)
- Convergence rates for gradient descent in the training of overparameterized artificial neural networks with biases [3.198144010381572]
In recent years, artificial neural networks have developed into a powerful tool for dealing with a multitude of problems for which classical solution approaches reach their limits.
However, it is still unclear why randomly initialized gradient descent algorithms succeed in training such networks.
arXiv Detail & Related papers (2021-02-23T18:17:47Z)
- A Novel Neural Network Training Framework with Data Assimilation [2.948167339160823]
A gradient-free training framework based on data assimilation is proposed to avoid the calculation of gradients.
The results show that the proposed training framework performs better than the gradient descent method.
arXiv Detail & Related papers (2020-10-06T11:12:23Z)
- Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs).
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z) - Optimal Rates for Averaged Stochastic Gradient Descent under Neural
Tangent Kernel Regime [50.510421854168065]
We show that averaged stochastic gradient descent can achieve the minimax optimal convergence rate.
We show that the target function specified by the NTK of a ReLU network can be learned at the optimal convergence rate.
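The averaging scheme in question is Polyak-Ruppert averaging of the SGD iterates. The sketch below (a toy least-squares example under our own assumptions, not the paper's kernel-regression setting with a ReLU NTK) shows how the running average of the iterates is typically less noisy than the last iterate under a constant step size.

```python
# Minimal sketch (assumption, not from the paper): Polyak-Ruppert averaging of
# SGD iterates on a least-squares problem, comparing the averaged iterate with
# the last iterate.
import numpy as np

rng = np.random.default_rng(4)
n, d = 500, 20
X = rng.normal(size=(n, d)) / np.sqrt(d)
theta_star = rng.normal(size=d)
y = X @ theta_star + 0.1 * rng.normal(size=n)    # noisy linear targets

theta = np.zeros(d)
theta_avg = np.zeros(d)
eta, T = 0.5, 20000                              # constant step size, iterations
for t in range(1, T + 1):
    i = rng.integers(n)                          # one sample per step
    grad = (X[i] @ theta - y[i]) * X[i]          # stochastic gradient of 0.5*(x.theta - y)^2
    theta -= eta * grad
    theta_avg += (theta - theta_avg) / t         # running average of iterates

def mse(th):
    return np.mean((X @ th - y) ** 2)

print(f"last iterate MSE {mse(theta):.4f}   averaged iterate MSE {mse(theta_avg):.4f}")
```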
arXiv Detail & Related papers (2020-06-22T14:31:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.