Distributed sequential federated learning
- URL: http://arxiv.org/abs/2302.00107v1
- Date: Tue, 31 Jan 2023 21:20:45 GMT
- Title: Distributed sequential federated learning
- Authors: Z. F. Wang, X. Y. Zhang, Y-c I. Chang
- Abstract summary: We develop a data-driven method for efficiently and effectively aggregating valued information by analyzing local data.
We use numerical studies of simulated data and an application to COVID-19 data collected from 32 hospitals in Mexico.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The analysis of data stored in multiple sites has become more popular,
raising new concerns about the security of data storage and communication.
Federated learning, which does not require centralizing data, is a common
approach to avoiding heavy data transportation, securing valued data, and
protecting personal information. Therefore, determining how to
aggregate the information obtained from the analysis of data in separate local
sites has become an important statistical issue. The commonly used averaging
methods may not be suitable due to data nonhomogeneity and incomparable results
among individual sites, and applying them may result in the loss of information
obtained from the individual analyses. Using a sequential method in federated
learning with distributed computing can facilitate the integration and
accelerate the analysis process. We develop a data-driven method for
efficiently and effectively aggregating valued information by analyzing local
data without encountering potential issues such as information security and
heavy transportation due to data communication. In addition, the proposed
method can preserve the properties of classical sequential adaptive design,
such as data-driven sample size and estimation precision when applied to
generalized linear models. We use numerical studies of simulated data and an
application to COVID-19 data collected from 32 hospitals in Mexico to
illustrate the proposed method.
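The abstract's point that plain averaging can waste information when local sites differ can be illustrated with a minimal sketch. This is not the paper's actual sequential method; it is a hypothetical simulation in which sites of very different sizes share only their local estimate and its precision, and a precision-weighted combination is compared with a naive average:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 2.0                      # true parameter shared by all sites
sizes = [50, 200, 1000]          # heterogeneous site sample sizes (assumed)

# Each site computes a local estimate and its precision (1/variance),
# sharing only these two summaries -- the raw data never leaves the site.
estimates, precisions = [], []
for n in sizes:
    x = rng.normal(theta, 1.0, size=n)   # local data stays at the site
    estimates.append(x.mean())
    precisions.append(float(n))          # Var(mean) = 1/n, so precision = n

estimates = np.array(estimates)
precisions = np.array(precisions)

naive = estimates.mean()                                      # plain averaging
weighted = (precisions * estimates).sum() / precisions.sum()  # inverse-variance weighting

print(f"naive average:      {naive:.4f}")
print(f"precision-weighted: {weighted:.4f}")
```

The weighted estimate down-weights the small, noisy sites, so its variance is that of a pooled analysis of all 1250 observations, while the naive average is dominated by the noise of the smallest site.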
Related papers
- Geometry-Aware Instrumental Variable Regression [56.16884466478886]
We propose a transport-based IV estimator that takes into account the geometry of the data manifold through data-derivative information.
We provide a simple plug-and-play implementation of our method that performs on par with related estimators in standard settings.
arXiv Detail & Related papers (2024-05-19T17:49:33Z)
- Utilizing dataset affinity prediction in object detection to assess training data [4.508868068781057]
We show the benefits of the so-called dataset affinity score by automatically selecting samples from a heterogeneous pool of vehicle datasets.
The results show that object detectors can be trained on a significantly sparser set of training samples without losing detection accuracy.
arXiv Detail & Related papers (2023-11-16T10:45:32Z)
- Another Use of SMOTE for Interpretable Data Collaboration Analysis [8.143750358586072]
Data collaboration (DC) analysis has been developed for privacy-preserving integrated analysis across multiple institutions.
This study proposes an anchor data construction technique to improve the recognition performance without increasing the risk of data leakage.
arXiv Detail & Related papers (2022-08-26T06:39:13Z)
- CEDAR: Communication Efficient Distributed Analysis for Regressions [9.50726756006467]
There is growing interest in distributed learning over multiple EHR databases without sharing patient-level data.
We propose a novel communication efficient method that aggregates the local optimal estimates, by turning the problem into a missing data problem.
We provide theoretical investigation for the properties of the proposed method for statistical inference as well as differential privacy, and evaluate its performance in simulations and real data analyses.
arXiv Detail & Related papers (2022-07-01T09:53:44Z)
- Data-SUITE: Data-centric identification of in-distribution incongruous examples [81.21462458089142]
Data-SUITE is a data-centric framework to identify incongruous regions of in-distribution (ID) data.
We empirically validate Data-SUITE's performance and coverage guarantees.
arXiv Detail & Related papers (2022-02-17T18:58:31Z)
- Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning [61.488646649045215]
Federated learning (FL) is a promising strategy for performing privacy-preserving, distributed learning with a network of clients (i.e., edge devices).
arXiv Detail & Related papers (2021-11-28T19:03:39Z)
- Learning Bias-Invariant Representation by Cross-Sample Mutual Information Minimization [77.8735802150511]
We propose a cross-sample adversarial debiasing (CSAD) method to remove the bias information misused by the target task.
The correlation measurement plays a critical role in adversarial debiasing and is conducted by a cross-sample neural mutual information estimator.
We conduct thorough experiments on publicly available datasets to validate the advantages of the proposed method over state-of-the-art approaches.
arXiv Detail & Related papers (2021-08-11T21:17:02Z)
- Using Synthetic Data to Enhance the Accuracy of Fingerprint-Based Localization: A Deep Learning Approach [1.6379393441314491]
We introduce a novel approach to reduce training data collection costs in fingerprint-based localization by using synthetic data.
Generative adversarial networks (GANs) are used to learn the distribution of a limited sample of collected data.
We can obtain essentially similar positioning accuracy to that which would be obtained by using the full set of collected data.
arXiv Detail & Related papers (2021-05-05T07:36:01Z)
- Graph Embedding with Data Uncertainty [113.39838145450007]
Spectral-based subspace learning is a common data preprocessing step in many machine learning pipelines.
Most subspace learning methods do not take into consideration possible measurement inaccuracies or artifacts that can lead to data with high uncertainty.
arXiv Detail & Related papers (2020-09-01T15:08:23Z)
- Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
Proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z)
- Sharing Models or Coresets: A Study based on Membership Inference Attack [17.562474629669513]
Distributed machine learning aims at training a global model based on distributed data without collecting all the data to a centralized location.
Two approaches have been proposed: collecting and aggregating local models (federated learning) and collecting and training over representative data summaries (coresets).
Our experiments quantify the accuracy-privacy-cost tradeoff of each approach, and reveal a nontrivial comparison that can be used to guide the design of model training processes.
arXiv Detail & Related papers (2020-07-06T18:06:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.