Distributed sequential federated learning
- URL: http://arxiv.org/abs/2302.00107v1
- Date: Tue, 31 Jan 2023 21:20:45 GMT
- Title: Distributed sequential federated learning
- Authors: Z. F. Wang, X. Y. Zhang, Y-c I. Chang
- Abstract summary: We develop a data-driven method for efficiently and effectively aggregating valued information by analyzing local data.
We use numerical studies of simulated data and an application to COVID-19 data collected from 32 hospitals in Mexico.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The analysis of data stored in multiple sites has become more popular,
raising new concerns about the security of data storage and communication.
Federated learning, which does not require centralizing data, is a common
approach to preventing heavy data transportation, securing valued data, and
protecting personal information. Therefore, determining how to
aggregate the information obtained from the analysis of data in separate local
sites has become an important statistical issue. The commonly used averaging
methods may not be suitable due to data nonhomogeneity and incomparable results
among individual sites, and applying them may result in the loss of information
obtained from the individual analyses. Using a sequential method in federated
learning with distributed computing can facilitate the integration and
accelerate the analysis process. We develop a data-driven method for
efficiently and effectively aggregating valued information by analyzing local
data without encountering potential issues such as information security and
heavy transportation due to data communication. In addition, the proposed
method can preserve the properties of classical sequential adaptive design,
such as data-driven sample size and estimation precision when applied to
generalized linear models. We use numerical studies of simulated data and an
application to COVID-19 data collected from 32 hospitals in Mexico to
illustrate the proposed method.
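To make the idea concrete, the sketch below illustrates the general flavor of sequential, summary-based aggregation in a distributed setting: each site reports only summary statistics, the coordinator combines them with inverse-variance weights, and sampling stops once the combined estimate reaches a prescribed precision (a data-driven sample size). This is a simplified toy for mean estimation, not the paper's actual GLM estimator; all function names and parameters here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_estimate(data):
    """Each site returns only summary statistics, never raw data.
    (Hypothetical interface, for illustration only.)"""
    n = len(data)
    return n, data.mean(), data.var(ddof=1)

def aggregate(summaries):
    """Inverse-variance weighted combination of local estimates."""
    ns = np.array([s[0] for s in summaries], dtype=float)
    means = np.array([s[1] for s in summaries])
    variances = np.array([s[2] for s in summaries])
    w = ns / variances                      # precision weights
    est = np.sum(w * means) / np.sum(w)     # combined estimate
    se = np.sqrt(1.0 / np.sum(w))           # standard error of combination
    return est, se

# Sequential stopping rule: each site keeps collecting batches until the
# half-width of the 95% confidence interval for the combined estimate
# falls below d -- the total sample size is thus data-driven.
d = 0.05   # target CI half-width
z = 1.96   # normal quantile for a 95% interval
sites = [[] for _ in range(4)]  # four heterogeneous sites
while True:
    for i, site in enumerate(sites):
        # sites differ in noise level (nonhomogeneous data)
        site.extend(rng.normal(loc=1.0, scale=1.0 + 0.2 * i, size=50))
    summaries = [local_estimate(np.array(s)) for s in sites]
    est, se = aggregate(summaries)
    if z * se <= d:
        break

print(f"combined estimate = {est:.3f}, half-width = {z * se:.4f}")
```

Note that only `(n, mean, variance)` triples cross site boundaries, so the coordinator never sees patient-level records; the stopping rule replaces a fixed, pre-specified sample size with one determined by the target estimation precision.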
Related papers
- A Two-Stage Federated Learning Approach for Industrial Prognostics Using Large-Scale High-Dimensional Signals [1.2277343096128712]
Industrial prognostics aims to develop data-driven methods that leverage high-dimensional degradation signals from assets to predict their failure times.
In practice, individual organizations often lack sufficient data to independently train reliable prognostic models.
This article proposes a statistical learning-based federated model that enables multiple organizations to jointly train a prognostic model.
arXiv Detail & Related papers (2024-10-14T21:26:22Z)
- Geometry-Aware Instrumental Variable Regression [56.16884466478886]
We propose a transport-based IV estimator that takes into account the geometry of the data manifold through data-derivative information.
We provide a simple plug-and-play implementation of our method that performs on par with related estimators in standard settings.
arXiv Detail & Related papers (2024-05-19T17:49:33Z)
- Utilizing dataset affinity prediction in object detection to assess training data [4.508868068781057]
We show the benefits of the so-called dataset affinity score by automatically selecting samples from a heterogeneous pool of vehicle datasets.
The results show that object detectors can be trained on a significantly sparser set of training samples without losing detection accuracy.
arXiv Detail & Related papers (2023-11-16T10:45:32Z)
- CEDAR: Communication Efficient Distributed Analysis for Regressions [9.50726756006467]
There is growing interest in distributed learning over multiple EHR databases without sharing patient-level data.
We propose a novel communication efficient method that aggregates the local optimal estimates, by turning the problem into a missing data problem.
We provide theoretical investigation for the properties of the proposed method for statistical inference as well as differential privacy, and evaluate its performance in simulations and real data analyses.
arXiv Detail & Related papers (2022-07-01T09:53:44Z)
- Data-SUITE: Data-centric identification of in-distribution incongruous examples [81.21462458089142]
Data-SUITE is a data-centric framework to identify incongruous regions of in-distribution (ID) data.
We empirically validate Data-SUITE's performance and coverage guarantees.
arXiv Detail & Related papers (2022-02-17T18:58:31Z)
- Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning [61.488646649045215]
Federated learning (FL) is a promising strategy for performing privacy-preserving, distributed learning with a network of clients (i.e., edge devices).
arXiv Detail & Related papers (2021-11-28T19:03:39Z)
- Learning Bias-Invariant Representation by Cross-Sample Mutual Information Minimization [77.8735802150511]
We propose a cross-sample adversarial debiasing (CSAD) method to remove the bias information misused by the target task.
The correlation measurement plays a critical role in adversarial debiasing and is conducted by a cross-sample neural mutual information estimator.
We conduct thorough experiments on publicly available datasets to validate the advantages of the proposed method over state-of-the-art approaches.
arXiv Detail & Related papers (2021-08-11T21:17:02Z)
- Using Synthetic Data to Enhance the Accuracy of Fingerprint-Based Localization: A Deep Learning Approach [1.6379393441314491]
We introduce a novel approach to reduce training data collection costs in fingerprint-based localization by using synthetic data.
Generative adversarial networks (GANs) are used to learn the distribution of a limited sample of collected data.
We can obtain essentially similar positioning accuracy to that which would be obtained by using the full set of collected data.
arXiv Detail & Related papers (2021-05-05T07:36:01Z)
- Graph Embedding with Data Uncertainty [113.39838145450007]
Spectral-based subspace learning is a common data preprocessing step in many machine learning pipelines.
Most subspace learning methods do not take into consideration possible measurement inaccuracies or artifacts that can lead to data with high uncertainty.
arXiv Detail & Related papers (2020-09-01T15:08:23Z)
- Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
Proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z)
- Sharing Models or Coresets: A Study based on Membership Inference Attack [17.562474629669513]
Distributed machine learning aims at training a global model based on distributed data without collecting all the data to a centralized location.
Two approaches have been proposed: collecting and aggregating local models (federated learning) and collecting and training over representative data summaries (coresets).
Our experiments quantify the accuracy-privacy-cost tradeoff of each approach, and reveal a nontrivial comparison that can be used to guide the design of model training processes.
arXiv Detail & Related papers (2020-07-06T18:06:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.