Federated Functional Gradient Boosting
- URL: http://arxiv.org/abs/2103.06972v1
- Date: Thu, 11 Mar 2021 21:49:19 GMT
- Title: Federated Functional Gradient Boosting
- Authors: Zebang Shen, Hamed Hassani, Satyen Kale, Amin Karbasi
- Abstract summary: We study functional minimization in Federated Learning.
For both FFGB.C and FFGB.L, the radii of convergence shrink to zero as the feature distributions become more homogeneous.
- Score: 75.06942944563572
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we initiate a study of functional minimization in Federated
Learning. First, in the semi-heterogeneous setting, when the marginal
distributions of the feature vectors on client machines are identical, we
develop the federated functional gradient boosting (FFGB) method that provably
converges to the global minimum. Subsequently, we extend our results to the
fully-heterogeneous setting (where marginal distributions of feature vectors
may differ) by designing an efficient variant of FFGB called FFGB.C, with
provable convergence to a neighborhood of the global minimum within a radius
that depends on the total variation distances between the client feature
distributions. For the special case of square loss, but still in the fully
heterogeneous setting, we design the FFGB.L method that also enjoys provable
convergence to a neighborhood of the global minimum but within a radius
depending on the much tighter Wasserstein-1 distances. For both FFGB.C and
FFGB.L, the radii of convergence shrink to zero as the feature distributions
become more homogeneous. Finally, we conduct proof-of-concept experiments to
demonstrate the benefits of our approach against natural baselines.
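As a rough illustration of the functional-boosting idea above, here is a minimal sketch under square loss: each client fits a weak learner to its local functional gradient (the residual), and the server averages the client updates into a global ensemble. The decision-stump weak learner, learning rate, and uniform server averaging are illustrative assumptions, not the paper's FFGB/FFGB.C/FFGB.L pseudocode.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def predict(ensemble, X):
    """Evaluate an additive ensemble of (weight, weak-learner-list) rounds."""
    out = np.zeros(len(X))
    for weight, learners in ensemble:
        for h in learners:
            out += weight * h.predict(X)
    return out

def federated_boost_round(ensemble, clients, lr=0.1):
    """One communication round: each client fits a weak learner to its local
    functional gradient (the residual, under square loss); the server
    averages the client updates into the global ensemble."""
    local_learners = []
    for X, y in clients:
        residual = y - predict(ensemble, X)   # negative functional gradient
        local_learners.append(DecisionTreeRegressor(max_depth=2).fit(X, residual))
    ensemble.append((lr / len(clients), local_learners))  # uniform server averaging
    return ensemble

# usage: two clients with shifted (heterogeneous) feature distributions
rng = np.random.default_rng(0)
clients = []
for shift in (-1.0, 1.0):
    X = rng.normal(shift, 1.0, size=(200, 1))
    clients.append((X, np.sin(2.0 * X[:, 0])))
ensemble = []
for _ in range(50):
    ensemble = federated_boost_round(ensemble, clients)
```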
Related papers
- Broadening Target Distributions for Accelerated Diffusion Models via a Novel Analysis Approach [49.97755400231656]
We show that a novel accelerated DDPM sampler achieves accelerated performance for three broad distribution classes not considered before.
Our results show an improved dependency on the data dimension $d$ among accelerated DDPM type samplers.
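This entry concerns sampler analysis rather than a new mechanism, but for context, here is a minimal sketch of one standard DDPM ancestral-sampling step, the object whose accelerated variants the paper analyzes. The variance choice and the noise-predictor interface eps_model are illustrative assumptions.

```python
import numpy as np

def ddpm_reverse_step(x_t, t, eps_model, alphas, alpha_bars, rng):
    """One standard DDPM ancestral-sampling step. eps_model(x, t) is an
    assumed learned noise predictor; alphas/alpha_bars are the usual
    per-step and cumulative noise schedules."""
    a_t, ab_t = alphas[t], alpha_bars[t]
    mean = (x_t - (1.0 - a_t) / np.sqrt(1.0 - ab_t) * eps_model(x_t, t)) / np.sqrt(a_t)
    if t == 0:
        return mean                                   # no noise on the final step
    return mean + np.sqrt(1.0 - a_t) * rng.standard_normal(x_t.shape)
```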
arXiv Detail & Related papers (2024-02-21T16:11:47Z)
- Accelerating Convergence in Global Non-Convex Optimization with Reversible Diffusion [0.0]
Langevin Dynamics has been extensively used in global non-convex optimization experiments.
Our proposed method is used to investigate the trade-off between speed and discretization error.
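As a reference point for the speed-versus-discretization-error trade-off mentioned above, here is a minimal sketch of one standard overdamped Langevin step (not the paper's reversible-diffusion variant); the double-well target in the usage lines is an illustrative choice.

```python
import numpy as np

def langevin_step(x, grad_f, rng, step=1e-2, beta=1.0):
    """One Euler-Maruyama step of overdamped Langevin dynamics targeting
    exp(-beta * f): a gradient step plus calibrated Gaussian noise.
    A larger `step` moves faster but incurs more discretization error."""
    return x - step * grad_f(x) + np.sqrt(2.0 * step / beta) * rng.standard_normal(x.shape)

# usage: escape a local minimum of the double well f(x) = (x^2 - 1)^2
rng = np.random.default_rng(0)
x = np.array([2.0])
for _ in range(10_000):
    x = langevin_step(x, lambda v: 4.0 * v * (v**2 - 1.0), rng)
```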
arXiv Detail & Related papers (2023-05-19T07:49:40Z)
- Optimal Scaling for Locally Balanced Proposals in Discrete Spaces [65.14092237705476]
We show that the efficiency of Metropolis-Hastings (M-H) algorithms in discrete spaces can be characterized by an acceptance rate that is independent of the target distribution.
Knowledge of the optimal acceptance rate allows one to automatically tune the neighborhood size of a proposal distribution in a discrete space, directly analogous to step-size control in continuous spaces.
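A hedged sketch of that tuning idea: adapt a discrete proposal's neighborhood radius toward a fixed target acceptance rate, Robbins-Monro style. The target value and the update rule here are illustrative placeholders, not the paper's derived optimum.

```python
import numpy as np

def mh_with_tuned_radius(log_p, x0, steps=5000, target_accept=0.5, seed=0):
    """Random-walk Metropolis-Hastings on the integers whose proposal
    radius is adapted so the empirical acceptance rate approaches
    `target_accept` (a placeholder value, not the paper's optimum)."""
    rng = np.random.default_rng(seed)
    x, radius = x0, 1.0
    for t in range(1, steps + 1):
        r = int(max(radius, 1.0))
        proposal = x + int(rng.integers(-r, r + 1))    # symmetric proposal
        accepted = np.log(rng.random()) < log_p(proposal) - log_p(x)
        if accepted:
            x = proposal
        # stochastic-approximation update of the neighborhood size
        radius = max(1.0, radius + (float(accepted) - target_accept) / np.sqrt(t))
    return x, radius
```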
arXiv Detail & Related papers (2022-09-16T22:09:53Z)
- Combating Mode Collapse in GANs via Manifold Entropy Estimation [70.06639443446545]
Generative Adversarial Networks (GANs) have shown compelling results in various tasks and applications.
We propose a novel training pipeline to address the mode collapse issue of GANs.
arXiv Detail & Related papers (2022-08-25T12:33:31Z)
- Disentangled Federated Learning for Tackling Attributes Skew via Invariant Aggregation and Diversity Transferring [104.19414150171472]
Attribute skew prevents current federated learning (FL) frameworks from pursuing consistent optimization directions among the clients.
We propose disentangled federated learning (DFL) to disentangle the domain-specific and cross-invariant attributes into two complementary branches.
Experiments verify that DFL achieves higher performance, better interpretability, and a faster convergence rate than SOTA FL methods.
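A rough sketch of the aggregation pattern the summary describes, under the assumption that each client model splits into a shared cross-invariant branch and a local domain-specific branch; the dict layout and uniform averaging are illustrative, not the paper's DFL implementation.

```python
import numpy as np

def aggregate_invariant_branch(client_models):
    """Server step in a DFL-style scheme: average only the parameters of
    the cross-invariant branch across clients; each client's
    domain-specific branch stays local and untouched."""
    shared = np.mean([m["invariant"] for m in client_models], axis=0)
    for m in client_models:
        m["invariant"] = shared.copy()   # broadcast the shared branch
    return client_models

# usage: three clients, each with a 4-parameter branch of each kind
clients = [{"invariant": np.random.randn(4), "specific": np.random.randn(4)}
           for _ in range(3)]
clients = aggregate_invariant_branch(clients)
```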
arXiv Detail & Related papers (2022-06-14T13:12:12Z)
- Optimal 1-Wasserstein Distance for WGANs [2.1174215880331775]
We provide a thorough analysis of Wasserstein GANs (WGANs) in both the finite-sample and asymptotic regimes.
We derive in passing new results on optimal transport theory in the semi-discrete setting.
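For concreteness, the 1-Wasserstein distance in the title has a closed form for equal-size empirical distributions on the line: the optimal coupling matches sorted samples. A minimal sketch (scipy.stats.wasserstein_distance handles the general weighted case):

```python
import numpy as np

def wasserstein1_1d(a, b):
    """Exact W1 between two equal-size empirical distributions on R:
    sort both samples and average the gaps."""
    return np.mean(np.abs(np.sort(a) - np.sort(b)))
```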
arXiv Detail & Related papers (2022-01-08T13:04:03Z)
- Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond [63.59034509960994]
We study shuffling-based variants: minibatch and local Random Reshuffling, which draw gradients without replacement.
For smooth functions satisfying the Polyak-Lojasiewicz condition, we obtain convergence bounds which show that these shuffling-based variants converge faster than their with-replacement counterparts.
We propose an algorithmic modification called synchronized shuffling that leads to convergence rates faster than our lower bounds in near-homogeneous settings.
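A minimal sketch contrasting the two sampling schemes in this summary: Random Reshuffling visits each sample exactly once per epoch, whereas with-replacement SGD draws i.i.d. indices. The interface grad_i (gradient of the i-th component function) is an illustrative assumption.

```python
import numpy as np

def sgd_random_reshuffling(grad_i, x, n, epochs=10, lr=0.01, seed=0):
    """SGD with Random Reshuffling: a fresh permutation per epoch, so
    gradients are drawn without replacement (with-replacement SGD would
    instead draw rng.integers(n) at every step)."""
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        for i in rng.permutation(n):
            x = x - lr * grad_i(x, i)
    return x
```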
arXiv Detail & Related papers (2021-10-20T02:25:25Z)
- KALE Flow: A Relaxed KL Gradient Flow for Probabilities with Disjoint Support [27.165565512841656]
We study the gradient flow for a relaxed approximation to the Kullback-Leibler divergence between a moving source and a fixed target distribution.
This approximation, termed the KALE (KL approximate lower-bound estimator), solves a regularized version of the Fenchel dual problem defining the KL over a restricted class of functions.
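A rough sketch of the construction this summary describes: maximize the Fenchel dual of KL, E_P[f] - E_Q[e^f - 1], over a restricted, regularized function class. Here f is restricted to random Fourier features with an L2 weight penalty standing in for the paper's RKHS regularization; the feature map, optimizer, and hyperparameters are all illustrative.

```python
import numpy as np

def kale_estimate(P, Q, lam=0.1, steps=500, lr=0.05, seed=0):
    """Estimate a KALE-style divergence between samples P and Q by
    gradient ascent on the penalized KL Fenchel dual,
        E_P[f] - E_Q[exp(f) - 1] - lam * ||w||^2,
    with f(x) = w . phi(x) for a random Fourier feature map phi."""
    rng = np.random.default_rng(seed)
    d, k = P.shape[1], 64
    W, b = rng.normal(size=(d, k)), rng.uniform(0.0, 2.0 * np.pi, k)
    phi = lambda X: np.sqrt(2.0 / k) * np.cos(X @ W + b)
    fP, fQ = phi(P), phi(Q)
    w = np.zeros(k)
    for _ in range(steps):
        grad = fP.mean(0) - (np.exp(fQ @ w)[:, None] * fQ).mean(0) - 2.0 * lam * w
        w += lr * grad   # ascent on the penalized dual objective
    return (fP @ w).mean() - (np.exp(fQ @ w) - 1.0).mean()
```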
arXiv Detail & Related papers (2021-06-16T16:37:43Z)
- Quantitative Rates and Fundamental Obstructions to Non-Euclidean Universal Approximation with Deep Narrow Feed-Forward Networks [3.8073142980733]
We quantify the number of narrow layers required for "deep geometric feed-forward neural networks".
We find that both the global and local universal approximation guarantees can only coincide when approximating null-homotopic functions.
arXiv Detail & Related papers (2021-01-13T23:29:40Z)
- Reciprocal Adversarial Learning via Characteristic Functions [12.961770002117142]
Generative adversarial nets (GANs) have become a preferred tool for tasks involving complicated distributions.
We show how to use the characteristic function (CF) to compare the distributions rather than their moments.
We then prove an equivalence between the embedded and data domains when a reciprocal exists, where we naturally develop the GAN in an auto-encoder structure.
This efficient structure uses only two modules, together with a simple training strategy, to achieve bi-directionally generating clear images.
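A minimal sketch of the core comparison in this entry: estimate each sample's empirical characteristic function phi(t) = E[exp(i t.x)] and average the squared modulus of the difference over random frequencies. The Gaussian frequency distribution is an illustrative choice, not the paper's learned embedding.

```python
import numpy as np

def cf_distance(X, Y, n_freq=128, scale=1.0, seed=0):
    """Compare two samples via empirical characteristic functions
    phi(t) = mean(exp(i * t . x)), averaging |phi_X - phi_Y|^2 over
    random Gaussian frequencies t (an illustrative choice)."""
    rng = np.random.default_rng(seed)
    T = rng.normal(scale=scale, size=(n_freq, X.shape[1]))
    cf = lambda Z: np.exp(1j * (Z @ T.T)).mean(axis=0)   # shape (n_freq,)
    return float(np.mean(np.abs(cf(X) - cf(Y)) ** 2))
```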
arXiv Detail & Related papers (2020-06-15T14:04:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.