Distributed sequential federated learning
- URL: http://arxiv.org/abs/2302.00107v1
- Date: Tue, 31 Jan 2023 21:20:45 GMT
- Title: Distributed sequential federated learning
- Authors: Z. F. Wang, X. Y. Zhang, Y-c I. Chang
- Abstract summary: We develop a data-driven method for efficiently and effectively aggregating valued information by analyzing local data.
We use numerical studies of simulated data and an application to COVID-19 data collected from 32 hospitals in Mexico.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The analysis of data stored in multiple sites has become more popular,
raising new concerns about the security of data storage and communication.
Federated learning, which does not require centralizing data, is a common
approach to avoiding heavy data transportation, securing valued data, and
protecting personal information. Therefore, determining how to
aggregate the information obtained from the analysis of data in separate local
sites has become an important statistical issue. The commonly used averaging
methods may not be suitable due to data nonhomogeneity and incomparable results
among individual sites, and applying them may result in the loss of information
obtained from the individual analyses. Using a sequential method in federated
learning with distributed computing can facilitate the integration and
accelerate the analysis process. We develop a data-driven method for
efficiently and effectively aggregating valued information by analyzing local
data without encountering potential issues such as information security and
heavy transportation due to data communication. In addition, the proposed
method can preserve the properties of classical sequential adaptive design,
such as data-driven sample size and estimation precision when applied to
generalized linear models. We use numerical studies of simulated data and an
application to COVID-19 data collected from 32 hospitals in Mexico to
illustrate the proposed method.
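The abstract's point that plain averaging can waste information when local sites differ can be illustrated with a minimal sketch. This is not the paper's actual sequential method; it is a hypothetical simulation in which sites of very different sizes share only their local estimate and its precision, and a precision-weighted combination is compared with a naive average:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 2.0                      # true parameter shared by all sites
sizes = [50, 200, 1000]          # heterogeneous site sample sizes (assumed)

# Each site computes a local estimate and its precision (1/variance),
# sharing only these two summaries -- the raw data never leaves the site.
estimates, precisions = [], []
for n in sizes:
    x = rng.normal(theta, 1.0, size=n)   # local data stays at the site
    estimates.append(x.mean())
    precisions.append(float(n))          # Var(mean) = 1/n, so precision = n

estimates = np.array(estimates)
precisions = np.array(precisions)

naive = estimates.mean()                                      # plain averaging
weighted = (precisions * estimates).sum() / precisions.sum()  # inverse-variance weighting

print(f"naive average:      {naive:.4f}")
print(f"precision-weighted: {weighted:.4f}")
```

The weighted estimate down-weights the small, noisy sites, so its variance is that of a pooled analysis of all 1250 observations, while the naive average is dominated by the noise of the smallest site.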
Related papers
- Geometry-Aware Instrumental Variable Regression [56.16884466478886]
We propose a transport-based IV estimator that takes into account the geometry of the data manifold through data-derivative information.
We provide a simple plug-and-play implementation of our method that performs on par with related estimators in standard settings.
arXiv Detail & Related papers (2024-05-19T17:49:33Z)
- Utilizing dataset affinity prediction in object detection to assess training data [4.508868068781057]
We show the benefits of the so-called dataset affinity score by automatically selecting samples from a heterogeneous pool of vehicle datasets.
The results show that object detectors can be trained on a significantly sparser set of training samples without losing detection accuracy.
arXiv Detail & Related papers (2023-11-16T10:45:32Z)
- Another Use of SMOTE for Interpretable Data Collaboration Analysis [8.143750358586072]
Data collaboration (DC) analysis has been developed for privacy-preserving integrated analysis across multiple institutions.
This study proposes an anchor data construction technique to improve the recognition performance without increasing the risk of data leakage.
arXiv Detail & Related papers (2022-08-26T06:39:13Z)
- CEDAR: Communication Efficient Distributed Analysis for Regressions [9.50726756006467]
There is growing interest in distributed learning over multiple EHR databases without sharing patient-level data.
We propose a novel communication efficient method that aggregates the local optimal estimates, by turning the problem into a missing data problem.
We provide theoretical investigation for the properties of the proposed method for statistical inference as well as differential privacy, and evaluate its performance in simulations and real data analyses.
arXiv Detail & Related papers (2022-07-01T09:53:44Z)
- Data-SUITE: Data-centric identification of in-distribution incongruous examples [81.21462458089142]
Data-SUITE is a data-centric framework to identify incongruous regions of in-distribution (ID) data.
We empirically validate Data-SUITE's performance and coverage guarantees.
arXiv Detail & Related papers (2022-02-17T18:58:31Z)
- Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning [61.488646649045215]
Federated learning (FL) is a promising strategy for performing privacy-preserving, distributed learning with a network of clients (i.e., edge devices).
arXiv Detail & Related papers (2021-11-28T19:03:39Z)
- Learning Bias-Invariant Representation by Cross-Sample Mutual Information Minimization [77.8735802150511]
We propose a cross-sample adversarial debiasing (CSAD) method to remove the bias information misused by the target task.
The correlation measurement plays a critical role in adversarial debiasing and is conducted by a cross-sample neural mutual information estimator.
We conduct thorough experiments on publicly available datasets to validate the advantages of the proposed method over state-of-the-art approaches.
arXiv Detail & Related papers (2021-08-11T21:17:02Z)
- Using Synthetic Data to Enhance the Accuracy of Fingerprint-Based Localization: A Deep Learning Approach [1.6379393441314491]
We introduce a novel approach to reduce training data collection costs in fingerprint-based localization by using synthetic data.
Generative adversarial networks (GANs) are used to learn the distribution of a limited sample of collected data.
We can obtain essentially similar positioning accuracy to that which would be obtained by using the full set of collected data.
arXiv Detail & Related papers (2021-05-05T07:36:01Z)
- Graph Embedding with Data Uncertainty [113.39838145450007]
Spectral-based subspace learning is a common data preprocessing step in many machine learning pipelines.
Most subspace learning methods do not take into consideration possible measurement inaccuracies or artifacts that can lead to data with high uncertainty.
arXiv Detail & Related papers (2020-09-01T15:08:23Z)
- Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
Proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z)
- Sharing Models or Coresets: A Study based on Membership Inference Attack [17.562474629669513]
Distributed machine learning aims at training a global model based on distributed data without collecting all the data to a centralized location.
Two approaches have been proposed: collecting and aggregating local models (federated learning) and collecting and training over representative data summaries (coresets).
Our experiments quantify the accuracy-privacy-cost tradeoff of each approach, and reveal a nontrivial comparison that can be used to guide the design of model training processes.
arXiv Detail & Related papers (2020-07-06T18:06:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.