Federated Learning over Harmonized Data Silos
- URL: http://arxiv.org/abs/2305.08985v1
- Date: Mon, 15 May 2023 19:55:51 GMT
- Title: Federated Learning over Harmonized Data Silos
- Authors: Dimitris Stripelis and Jose Luis Ambite
- Abstract summary: Federated Learning is a distributed machine learning approach that enables geographically distributed data silos to collaboratively learn a joint machine learning model without sharing data.
We propose an architectural vision for an end-to-end Federated Learning and Integration system, incorporating the critical steps of data harmonization and data imputation.
- Score: 0.7106986689736825
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Federated Learning is a distributed machine learning approach that enables
geographically distributed data silos to collaboratively learn a joint machine
learning model without sharing data. Most of the existing work operates on
unstructured data, such as images or text, or on structured data assumed to be
consistent across the different sites. However, sites often have different
schemata, data formats, data values, and access patterns. The field of data
integration has developed many methods to address these challenges, including
techniques for data exchange and query rewriting using declarative schema
mappings, and for entity linkage. Therefore, we propose an architectural vision
for an end-to-end Federated Learning and Integration system, incorporating the
critical steps of data harmonization and data imputation, to spur further
research on the intersection of data management, information systems, and
machine learning.
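As a hedged illustration of the vision above, the sketch below separates the three stages the abstract names: each silo harmonizes its local records to a shared target schema via a declarative column mapping, imputes missing values locally, and then contributes only gradients to federated averaging, never raw records. All column names, mappings, and the linear model are hypothetical assumptions, not the authors' system.

```python
# Illustrative sketch only (not the paper's implementation).
import numpy as np

TARGET_COLUMNS = ["age", "bmi", "label"]  # shared schema the silos agree on (hypothetical)

def harmonize(rows, column_map):
    """Map a silo's local column names/units onto the target schema."""
    return [{tgt: extract(row) for tgt, extract in column_map.items()} for row in rows]

def impute(rows):
    """Fill missing values with the silo-local column mean."""
    means = {}
    for c in TARGET_COLUMNS:
        observed = [r[c] for r in rows if r[c] is not None]
        means[c] = sum(observed) / len(observed) if observed else 0.0
    return [{c: (r[c] if r[c] is not None else means[c]) for c in TARGET_COLUMNS} for r in rows]

def local_gradient(rows, w):
    """Least-squares gradient on one silo's harmonized, imputed data."""
    X = np.array([[1.0, r["age"], r["bmi"]] for r in rows])
    y = np.array([r["label"] for r in rows])
    return 2.0 * X.T @ (X @ w - y) / len(rows)

def federated_averaging(silos, rounds=2000, lr=1e-4):
    """Server loop: average locally computed gradients; raw data never leaves a silo."""
    w = np.zeros(3)
    for _ in range(rounds):
        w -= lr * np.mean([local_gradient(rows, w) for rows in silos], axis=0)
    return w

# Two silos with different local schemata, mapped onto the shared one.
silo_a = harmonize([{"Age": 34, "BodyMassIndex": 22.1, "Outcome": 1.0},
                    {"Age": 51, "BodyMassIndex": None, "Outcome": 0.0}],
                   {"age": lambda r: r["Age"], "bmi": lambda r: r["BodyMassIndex"],
                    "label": lambda r: r["Outcome"]})
silo_b = harmonize([{"age_years": 47, "bmi_kg_m2": 27.9, "y": 1.0}],
                   {"age": lambda r: r["age_years"], "bmi": lambda r: r["bmi_kg_m2"],
                    "label": lambda r: r["y"]})
print(federated_averaging([impute(silo_a), impute(silo_b)]))
```

In a real deployment the gradient exchange would additionally go through a secure aggregation protocol and the schema mappings would come from a data-integration layer; the point of the sketch is only the separation of harmonization, imputation, and federated training.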
Related papers
- Multi-Modal Dataset Creation for Federated Learning with DICOM Structured Reports [26.2463670182172]
Federated training is often hindered by heterogeneous datasets due to divergent data storage options, inconsistent naming schemes, varied annotation procedures, and disparities in label quality.
This is particularly evident in emerging multi-modal learning paradigms, where dataset harmonization, including a uniform data representation and filtering options, is of paramount importance.
We developed an open platform for data integration and interactive filtering capabilities that simplifies the process of assembling multi-modal datasets.
arXiv Detail & Related papers (2024-07-12T07:34:10Z) - Personalized Federated Learning with Contextual Modulation and
Meta-Learning [2.7716102039510564]
Federated learning has emerged as a promising approach for training machine learning models on decentralized data sources.
We propose a novel framework that combines federated learning with meta-learning techniques to enhance both efficiency and generalization capabilities.
arXiv Detail & Related papers (2023-12-23T08:18:22Z) - Privacy-Preserving Machine Learning for Collaborative Data Sharing via
Auto-encoder Latent Space Embeddings [57.45332961252628]
Privacy-preserving machine learning is an increasingly critical task in data-sharing processes.
This paper presents an innovative framework that uses Representation Learning via autoencoders to generate privacy-preserving embedded data.
arXiv Detail & Related papers (2022-11-10T17:36:58Z) - FedILC: Weighted Geometric Mean and Invariant Gradient Covariance for
Federated Learning on Non-IID Data [69.0785021613868]
Federated learning is a distributed machine learning approach in which a shared server model learns by aggregating parameter updates computed locally on the training data of spatially distributed client silos.
We propose the Federated Invariant Learning Consistency (FedILC) approach, which leverages the gradient covariance and the geometric mean of Hessians to capture both inter-silo and intra-silo consistencies (a toy sketch of weighted geometric-mean gradient aggregation appears after this list).
This is relevant to fields such as healthcare, computer vision, and the Internet of Things (IoT).
arXiv Detail & Related papers (2022-05-19T03:32:03Z) - Non-IID data and Continual Learning processes in Federated Learning: A
long road ahead [58.720142291102135]
Federated Learning is a novel framework that allows multiple devices or institutions to train a machine learning model collaboratively while keeping their data private.
In this work, we formally classify statistical data heterogeneity and review the most notable learning strategies able to address it.
At the same time, we introduce approaches from other machine learning frameworks, such as Continual Learning, that also deal with data heterogeneity and could be easily adapted to Federated Learning settings.
arXiv Detail & Related papers (2021-11-26T09:57:11Z) - CateCom: a practical data-centric approach to categorization of
computational models [77.34726150561087]
We present an effort aimed at organizing the landscape of physics-based and data-driven computational models.
We apply object-oriented design concepts and outline the foundations of an open-source collaborative framework.
arXiv Detail & Related papers (2021-09-28T02:59:40Z) - Unsupervised Domain Adaptive Learning via Synthetic Data for Person
Re-identification [101.1886788396803]
Person re-identification (re-ID) has attracted increasing attention due to its widespread applications in video surveillance.
Unfortunately, mainstream deep learning methods still require large quantities of labeled data to train models.
In this paper, we develop a data collector to automatically generate synthetic re-ID samples in a computer game, and construct a data labeler to simultaneously annotate them.
arXiv Detail & Related papers (2021-09-12T15:51:41Z) - From Distributed Machine Learning to Federated Learning: A Survey [49.7569746460225]
Federated learning has emerged as an efficient approach to exploiting distributed data and computing resources.
We propose a functional architecture of federated learning systems and a taxonomy of related techniques.
We present the distributed training, data communication, and security aspects of FL systems.
arXiv Detail & Related papers (2021-04-29T14:15:11Z) - IBM Federated Learning: an Enterprise Framework White Paper V0.1 [28.21579297214125]
Federated Learning (FL) is an approach to conduct machine learning without centralizing training data in a single place.
The framework applies to both Deep Neural Networks and "traditional" approaches for the most common machine learning libraries.
arXiv Detail & Related papers (2020-07-22T05:32:00Z) - Siamese Graph Neural Networks for Data Integration [11.41207739004894]
We propose a general approach to modeling and integrating entities from structured data, such as relational databases, as well as unstructured sources, such as free text from news articles.
Our approach is designed to explicitly model and leverage relations between entities, thereby using all available information and preserving as much context as possible.
We evaluate our method on the task of integrating data about business entities, and we demonstrate that it outperforms standard rule-based systems, as well as other deep learning approaches that do not use graph-based representations.
arXiv Detail & Related papers (2020-01-17T21:51:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.