FeDXL: Provable Federated Learning for Deep X-Risk Optimization
- URL: http://arxiv.org/abs/2210.14396v4
- Date: Fri, 18 Aug 2023 03:18:51 GMT
- Title: FeDXL: Provable Federated Learning for Deep X-Risk Optimization
- Authors: Zhishuai Guo, Rong Jin, Jiebo Luo, Tianbao Yang
- Abstract summary: We tackle a novel federated learning (FL) problem for optimizing a family of X-risks, to which no existing algorithms are applicable.
The challenges for designing an FL algorithm for X-risks lie in the non-decomposability of the objective over multiple machines and the interdependency between different machines.
- Score: 105.17383135458897
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this paper, we tackle a novel federated learning (FL) problem for
optimizing a family of X-risks, to which no existing FL algorithms are
applicable. In particular, the objective has the form of $\mathbb E_{z\sim S_1}
f(\mathbb E_{z'\sim S_2} \ell(w; z, z'))$, where two sets of data $S_1, S_2$
are distributed over multiple machines, $\ell(\cdot)$ is a pairwise loss that
only depends on the prediction outputs of the input data pairs $(z, z')$, and
$f(\cdot)$ is possibly a non-linear non-convex function. This problem has
important applications in machine learning, e.g., AUROC maximization with a
pairwise loss, and partial AUROC maximization with a compositional loss. The
challenges for designing an FL algorithm for X-risks lie in the
non-decomposability of the objective over multiple machines and the
interdependency between different machines. To this end, we propose an
active-passive decomposition framework that decouples the components of the
gradient into two types, namely active parts and passive parts, where the
active parts depend on local data and are computed with the local model, and
the passive parts depend on data on other machines and are
communicated/computed based on historical models and samples. Under this
framework, we develop two provable FL
algorithms (FeDXL) for handling linear and nonlinear $f$, respectively, based
on federated averaging and merging. We develop a novel theoretical analysis to
combat the latency of the passive parts and the interdependency between the
local model parameters and the involved data for computing local gradient
estimators. We establish both iteration and communication complexities and show
that using the historical samples and models for computing the passive parts
does not degrade the complexities. We conduct empirical studies of FeDXL for deep
AUROC and partial AUROC maximization, and demonstrate their performance
compared with several baselines.
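To make the objective concrete, the following is a minimal NumPy sketch of the
X-risk $\mathbb E_{z\sim S_1} f(\mathbb E_{z'\sim S_2} \ell(w; z, z'))$
instantiated for AUROC maximization: $S_1$ holds the positives, $S_2$ the
negatives, $\ell$ is a squared hinge on the score margin, and $f$ is the
identity (a nonlinear $f$ yields compositional objectives such as partial
AUROC). The linear scorer, margin, and synthetic data are illustrative
assumptions, not the paper's implementation.

```python
# Sketch of the X-risk E_{z~S1} f( E_{z'~S2} l(w; z, z') ),
# instantiated for pairwise AUROC maximization. Illustrative only.
import numpy as np

def pairwise_loss(pos_score, neg_scores, margin=1.0):
    """l(w; z, z') = max(0, margin - (h(z) - h(z')))^2, averaged over z'."""
    return np.mean(np.maximum(0.0, margin - (pos_score - neg_scores)) ** 2)

def x_risk(w, X_pos, X_neg, f=lambda u: u):
    """E_{z~S1} f(E_{z'~S2} l(w; z, z')) for a linear scorer h(z) = <w, z>."""
    pos_scores = X_pos @ w   # scores of S1 (positives)
    neg_scores = X_neg @ w   # scores of S2 (negatives)
    inner = np.array([pairwise_loss(s, neg_scores) for s in pos_scores])
    return np.mean(f(inner))  # f = identity -> AUROC surrogate;
                              # a nonlinear f gives a compositional X-risk

rng = np.random.default_rng(0)
w = rng.normal(size=5)
X_pos, X_neg = rng.normal(1, 1, (8, 5)), rng.normal(0, 1, (12, 5))
print(x_risk(w, X_pos, X_neg))                      # linear f
print(x_risk(w, X_pos, X_neg, f=lambda u: u ** 2))  # nonlinear f
```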
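The active-passive split can likewise be sketched in simplified form. Below is
a hedged view of one machine's local gradient computation for linear $f$: the
active factor comes from local data scored with the current local model, while
the passive factor is a vector of prediction scores received from other
machines, computed there with historical (stale) models. All names here are
illustrative assumptions; this is not the paper's FeDXL code.

```python
# Hedged sketch of one machine's gradient estimator under an
# active-passive split, for linear f and l(m) = max(0, 1 - m)^2 on the
# margin m = h_w(z) - h_w(z'). Illustrative, not the paper's algorithm.
import numpy as np

def local_gradient(w, X_pos_local, stale_neg_scores):
    """Active part: local positives scored with the CURRENT model w.
    Passive part: negatives' scores received from other machines,
    computed there with HISTORICAL models."""
    grad = np.zeros_like(w)
    for z in X_pos_local:                     # active: local data, fresh model
        m = z @ w - stale_neg_scores          # margins vs. passive scores
        dl = -2.0 * np.maximum(0.0, 1.0 - m)  # l'(m) for the squared hinge
        # only the active factor grad h_w(z) = z is formed locally; the
        # symmetric -grad h(z') term is accumulated by the owner of z'
        grad += np.mean(dl) * z
    return grad / len(X_pos_local)

rng = np.random.default_rng(1)
w = rng.normal(size=5)
X_pos_local = rng.normal(1, 1, (4, 5))
stale_neg_scores = rng.normal(0, 1, 10)  # received from peers, stale
print(local_gradient(w, X_pos_local, stale_neg_scores))
```

Between communication rounds the passive scores lag behind the current models;
the paper's analysis shows that this staleness does not degrade the iteration
or communication complexities.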
Related papers
- Inverse Entropic Optimal Transport Solves Semi-supervised Learning via Data Likelihood Maximization [65.8915778873691]
Learning conditional distributions is a central problem in machine learning.
We propose a new learning paradigm that integrates both paired and unpaired data.
Our approach also connects intriguingly with inverse entropic optimal transport (OT).
arXiv Detail & Related papers (2024-10-03T16:12:59Z) - Semi-Federated Learning: Convergence Analysis and Optimization of A
Hybrid Learning Framework [70.83511997272457]
We propose a semi-federated learning (SemiFL) paradigm to leverage both the base station (BS) and devices for a hybrid implementation of centralized learning (CL) and FL.
We propose a two-stage algorithm to solve this intractable problem, in which we provide the closed-form solutions to the beamformers.
arXiv Detail & Related papers (2023-10-04T03:32:39Z) - DFedADMM: Dual Constraints Controlled Model Inconsistency for
Decentralized Federated Learning [52.83811558753284]
Decentralized federated learning (DFL) discards the central server and establishes a decentralized communication network.
Existing DFL methods still suffer from two major challenges: local inconsistency and local overfitting.
arXiv Detail & Related papers (2023-08-16T11:22:36Z) - Federated Empirical Risk Minimization via Second-Order Method [18.548661105227488]
We present an interior point method (IPM) to solve a general empirical risk minimization problem under the federated learning setting.
We show that the communication complexity of each iteration of our IPM is $\tilde{O}(d^{3/2})$, where $d$ is the dimension (i.e., number of features) of the dataset.
arXiv Detail & Related papers (2023-05-27T14:23:14Z) - $\textit{FastSVD-ML-ROM}$: A Reduced-Order Modeling Framework based on
Machine Learning for Real-Time Applications [0.0]
High-fidelity numerical simulations constitute the backbone of engineering design.
Reduced order models (ROMs) are employed to approximate the high-fidelity solutions.
The present work proposes a new machine learning (ML) platform for the development of ROMs.
arXiv Detail & Related papers (2022-07-24T23:11:07Z) - An Improved Analysis of Gradient Tracking for Decentralized Machine
Learning [34.144764431505486]
We consider decentralized machine learning over a network where the training data is distributed across $n$ agents.
The agent's common goal is to find a model that minimizes the average of all local loss functions.
We improve the dependency of the convergence rate on the network parameter $p$ in the noiseless case.
arXiv Detail & Related papers (2022-02-08T12:58:14Z) - Parallel Successive Learning for Dynamic Distributed Model Training over
Heterogeneous Wireless Networks [50.68446003616802]
Federated learning (FedL) has emerged as a popular technique for distributing model training over a set of wireless devices.
We develop parallel successive learning (PSL), which expands the FedL architecture along three dimensions.
Our analysis sheds light on the notion of cold vs. warmed up models, and model inertia in distributed machine learning.
arXiv Detail & Related papers (2022-02-07T05:11:01Z) - FedLGA: Towards System-Heterogeneity of Federated Learning via Local
Gradient Approximation [21.63719641718363]
We formalize the system-heterogeneous FL problem and propose a new algorithm, called FedLGA, which addresses this problem by bridging the divergence local model updates via epoch approximation.
The results of comprehensive experiments on multiple datasets show that FedLGA outperforms current FL benchmarks against the system-heterogeneity.
arXiv Detail & Related papers (2021-12-22T16:05:09Z) - Joint Majorization-Minimization for Nonnegative Matrix Factorization
with the $\beta$-divergence [4.468952886990851]
This article proposes new multiplicative updates for nonnegative matrix factorization (NMF) with the $\beta$-divergence objective function.
We report experimental results using diverse datasets: face images, an audio spectrogram, hyperspectral data and song play counts.
arXiv Detail & Related papers (2021-06-29T09:58:21Z) - Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal
Sample Complexity [67.02490430380415]
We show that model-based MARL achieves a sample complexity of $\tilde{O}(|S||A||B|(1-\gamma)^{-3}\epsilon^{-2})$ for finding the Nash equilibrium (NE) value up to some $\epsilon$ error.
We also show that such a sample bound is minimax-optimal (up to logarithmic factors) if the algorithm is reward-agnostic, where the algorithm queries state transition samples without reward knowledge.
arXiv Detail & Related papers (2020-07-15T03:25:24Z)