Federated t-SNE and UMAP for Distributed Data Visualization
- URL: http://arxiv.org/abs/2412.13495v1
- Date: Wed, 18 Dec 2024 04:33:11 GMT
- Title: Federated t-SNE and UMAP for Distributed Data Visualization
- Authors: Dong Qiao, Xinxian Ma, Jicong Fan
- Abstract summary: Big data is often distributed across multiple data centers and subject to security and privacy concerns.
This work proposes Fed-tSNE and Fed-UMAP, which provide high-dimensional data visualization without exchanging data across clients or sending data to the central server.
- Score: 20.58663155344881
- Abstract: High-dimensional data visualization is crucial in the big data era, and techniques such as t-SNE and UMAP have been widely used in science and engineering. Big data, however, is often distributed across multiple data centers and subject to security and privacy concerns, which makes the standard t-SNE and UMAP algorithms difficult to apply. To tackle this challenge, this work proposes Fed-tSNE and Fed-UMAP, which provide high-dimensional data visualization under the framework of federated learning, without exchanging data across clients or sending data to the central server. The main idea of Fed-tSNE and Fed-UMAP is to implicitly learn the distribution information of the data in a federated manner and then estimate the global distance matrix for t-SNE and UMAP. To further enhance the protection of data privacy, we propose Fed-tSNE+ and Fed-UMAP+. We also extend our idea to federated spectral clustering, yielding algorithms for clustering distributed data. In addition to these new algorithms, we offer theoretical guarantees on optimization convergence, distance and similarity estimation, and differential privacy. Experiments on multiple datasets demonstrate that, compared to the original algorithms, our federated algorithms suffer only tiny drops in accuracy.
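The core recipe in the abstract — learn distribution information federatedly, then estimate a global distance matrix to feed into t-SNE — can be illustrated with a toy scheme. The sketch below is a hypothetical stand-in, not the paper's actual protocol: it approximates cross-client distances through distances to shared random anchor points, and `anchors`, `client_profile`, and the scaling step are all illustrative assumptions.

```python
# Hypothetical sketch of the Fed-tSNE idea: clients never share raw data;
# each shares only distances to public random anchor points, and the server
# approximates the global distance matrix from those profiles.
# Anchor-based approximation is an illustration, NOT the paper's protocol.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
dim, n_anchors = 50, 64
anchors = rng.normal(size=(n_anchors, dim))          # public reference points

def client_profile(X, anchors):
    """Run locally on each client: distances from local points to anchors."""
    return np.linalg.norm(X[:, None, :] - anchors[None, :, :], axis=2)

# Two clients with private local datasets (never transmitted).
X1 = rng.normal(size=(100, dim))
X2 = rng.normal(loc=2.0, size=(120, dim))
profiles = np.vstack([client_profile(X1, anchors), client_profile(X2, anchors)])

# Server side: approximate pairwise distances via the anchor profiles
# (Euclidean distance between anchor-distance vectors, rescaled).
D_hat = np.linalg.norm(profiles[:, None, :] - profiles[None, :, :], axis=2)
D_hat /= np.sqrt(n_anchors)                          # heuristic scale correction

emb = TSNE(metric="precomputed", init="random", perplexity=30).fit_transform(D_hat)
print(emb.shape)  # (220, 2): joint visualization without pooling raw data
```

Only anchor-distance profiles leave each client in this toy scheme; the quality of `D_hat` depends on the number of anchors, loosely analogous to the distance-estimation guarantees the abstract mentions.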
Related papers
- An Empirical Study of Efficiency and Privacy of Federated Learning Algorithms [2.994794762377111]
In today's world, the rapid expansion of IoT networks and the proliferation of smart devices have resulted in the generation of substantial amounts of heterogeneous data.
Handling this data effectively requires advanced processing techniques that preserve privacy without sacrificing efficiency.
Federated learning emerged as a distributed learning method that trains models locally and aggregates them on a server to preserve data privacy (see the aggregation sketch after this entry).
arXiv Detail & Related papers (2023-12-24T00:13:41Z)
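The local-training-plus-server-aggregation loop described in the entry above is, in its simplest form, FedAvg. A minimal single-process sketch using a toy linear model, not any real FL framework:

```python
# Minimal sketch of FedAvg-style aggregation: clients train locally and send
# only model weights; the server averages them, weighted by local sample count.
import numpy as np

def local_update(w, X, y, lr=0.1, epochs=5):
    """Client-side: a few steps of least-squares gradient descent."""
    for _ in range(epochs):
        w = w - lr * X.T @ (X @ w - y) / len(y)
    return w

def fedavg(updates, sizes):
    """Server-side: sample-size-weighted average of client weights."""
    total = sum(sizes)
    return sum(w * (n / total) for w, n in zip(updates, sizes))

rng = np.random.default_rng(1)
w_global = np.zeros(3)
clients = [(rng.normal(size=(40, 3)), rng.normal(size=40)) for _ in range(4)]

for round_ in range(10):                       # communication rounds
    updates = [local_update(w_global.copy(), X, y) for X, y in clients]
    w_global = fedavg(updates, [len(y) for _, y in clients])
print(w_global)
```

Weighting by local sample count keeps larger clients from being underrepresented in the global model.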
- Federated Learning Empowered by Generative Content [55.576885852501775]
Federated learning (FL) enables leveraging distributed private data for model training in a privacy-preserving way.
We propose a novel FL framework termed FedGC, designed to mitigate data heterogeneity issues by diversifying private data with generative content.
We conduct a systematic empirical study on FedGC, covering diverse baselines, datasets, scenarios, and modalities.
arXiv Detail & Related papers (2023-12-10T07:38:56Z)
- Benchmarking FedAvg and FedCurv for Image Classification Tasks [1.376408511310322]
This paper focuses on the problem of statistical heterogeneity of the data in the same federated network.
Several Federated Learning algorithms, such as FedAvg, FedProx, and Federated Curvature (FedCurv), have already been proposed.
As a side product of this work, we release the non-IID versions of the datasets we used, so as to facilitate further comparisons within the FL community.
arXiv Detail & Related papers (2023-03-31T10:13:01Z)
- Fed-TDA: Federated Tabular Data Augmentation on Non-IID Data [7.5178093283247165]
Non-independent and identically distributed (non-IID) data is a key challenge in federated learning (FL).
Existing data augmentation methods based on federated generative models or raw data sharing strategies for solving the non-IID problem still suffer from low performance, privacy protection concerns, and high communication overhead.
We propose Fed-TDA, which synthesizes data for augmentation from simple distribution statistics (a toy sketch of the idea follows this entry).
arXiv Detail & Related papers (2022-11-22T02:17:15Z)
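To make the "simple statistics" idea in the Fed-TDA entry concrete, here is a deliberately simplified sketch in which clients share only per-feature sums, squared sums, and counts, and synthetic rows are drawn from the pooled Gaussian statistics. This is an assumption-laden toy, not Fed-TDA's actual synthesizer, which models tabular distributions far more carefully:

```python
# Toy sketch of statistics-based tabular synthesis in the spirit of Fed-TDA:
# clients share only simple statistics, never rows, and synthetic data is
# sampled from the pooled statistics. A deliberate simplification.
import numpy as np

def client_stats(X):
    """Client-side: share per-feature sums, squared sums, and the row count."""
    return X.sum(axis=0), (X ** 2).sum(axis=0), len(X)

def synthesize(stats, n_samples, rng):
    """Server-side: pool statistics and sample Gaussian synthetic rows."""
    s, s2, n = map(sum, zip(*stats))
    mean = s / n
    std = np.sqrt(np.maximum(s2 / n - mean ** 2, 1e-12))
    return rng.normal(mean, std, size=(n_samples, len(mean)))

rng = np.random.default_rng(2)
parts = [rng.normal(loc=i, size=(50, 4)) for i in range(3)]  # non-IID clients
synthetic = synthesize([client_stats(X) for X in parts], 100, rng)
print(synthetic.mean(axis=0))   # close to the pooled mean of ~1.0
```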
- Mitigating Data Heterogeneity in Federated Learning with Data Augmentation [26.226057709504733]
Federated Learning (FL) is a framework that enables training a centralized model while securing user privacy by fusing local, decentralized models.
One major obstacle is data heterogeneity, i.e., clients hold data that are not independently and identically distributed (non-IID).
Recent evidence suggests that data augmentation can induce performance equal to or greater than that of existing heterogeneity-mitigation approaches.
arXiv Detail & Related papers (2022-06-20T19:47:43Z)
- Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning [61.488646649045215]
Federated learning (FL) is a promising strategy for performing privacy-preserving, distributed learning with a network of clients (i.e., edge devices).
arXiv Detail & Related papers (2021-11-28T19:03:39Z)
- FedMix: Approximation of Mixup under Mean Augmented Federated Learning [60.503258658382]
Federated learning (FL) allows edge devices to collectively learn a model without directly sharing data within each device.
Current state-of-the-art algorithms suffer from performance degradation as the heterogeneity of local data across clients increases.
We propose a new augmentation algorithm, named FedMix, inspired by the simple yet remarkably effective data augmentation method Mixup (see the Mixup sketch after this entry).
arXiv Detail & Related papers (2021-07-01T06:14:51Z)
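FedMix, mentioned in the entry above, builds on Mixup. The sketch below shows vanilla single-machine Mixup only; FedMix's contribution — approximating it when clients can exchange only averaged batches — is not captured here:

```python
# Sketch of vanilla Mixup: convex combinations of shuffled example/label
# pairs. FedMix approximates this across clients using exchanged *averaged*
# batches, which this single-machine toy does not model.
import numpy as np

def mixup(X, y, alpha=0.2, rng=None):
    """Return convex combinations of shuffled example/label pairs."""
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)             # mixing coefficient
    perm = rng.permutation(len(X))
    return lam * X + (1 - lam) * X[perm], lam * y + (1 - lam) * y[perm]

rng = np.random.default_rng(3)
X = rng.normal(size=(8, 5))
y = np.eye(2)[rng.integers(0, 2, size=8)]    # one-hot labels, so they mix too
X_mix, y_mix = mixup(X, y, rng=rng)
print(X_mix.shape, y_mix.shape)              # (8, 5) (8, 2)
```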
- Improving Federated Relational Data Modeling via Basis Alignment and Weight Penalty [18.096788806121754]
Federated learning (FL) has attracted increasing attention in recent years.
We present a modified graph neural network algorithm that performs federated modeling over knowledge graphs (KGs).
We propose a novel optimization algorithm, named FedAlign, with 1) optimal transport (OT) for on-client personalization and 2) a weight constraint to speed up convergence.
Empirical results show that our proposed method outperforms state-of-the-art FL methods, such as FedAvg and FedProx, and converges faster.
arXiv Detail & Related papers (2020-11-23T12:52:18Z)
- Privacy-Preserving Asynchronous Federated Learning Algorithms for Multi-Party Vertically Collaborative Learning [151.47900584193025]
We propose an asynchronous federated SGD (AFSGD-VP) algorithm and its SVRG and SAGA variants on the vertically partitioned data.
To the best of our knowledge, AFSGD-VP and its SVRG and SAGA variants are the first asynchronous federated learning algorithms for vertically partitioned data (a toy sketch of the vertical setting follows this entry).
arXiv Detail & Related papers (2020-08-14T08:08:15Z)
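The vertically partitioned setting of AFSGD-VP can be illustrated with a toy synchronous SGD loop in which each party holds a disjoint feature block and only scalar partial scores are exchanged. The asynchrony and privacy machinery of the actual algorithm are omitted; everything below is an illustrative assumption:

```python
# Toy sketch of SGD on *vertically partitioned* data: parties hold different
# feature blocks of the same samples and keep their own weight blocks; only
# partial scores/residuals are shared. Synchronous, single-process, and
# without the privacy machinery of the real AFSGD-VP.
import numpy as np

rng = np.random.default_rng(6)
n, blocks = 200, [3, 2, 4]                   # 3 parties, disjoint feature blocks
X_parts = [rng.normal(size=(n, d)) for d in blocks]
w_true = [rng.normal(size=d) for d in blocks]
y = sum(X @ w for X, w in zip(X_parts, w_true))

w_parts = [np.zeros(d) for d in blocks]
for step in range(500):
    i = rng.integers(n)                      # pick one sample (plain SGD)
    score = sum(X[i] @ w for X, w in zip(X_parts, w_parts))  # shared scalar
    resid = score - y[i]
    for X, w in zip(X_parts, w_parts):       # each party updates its own block
        w -= 0.05 * resid * X[i]
print([np.round(w, 2) for w in w_parts])     # approaches w_true per block
```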
- Federated Doubly Stochastic Kernel Learning for Vertically Partitioned Data [93.76907759950608]
We propose a federated doubly stochastic kernel learning algorithm (FDSKL) for vertically partitioned data.
We show that FDSKL is significantly faster than state-of-the-art federated learning methods when dealing with kernels.
arXiv Detail & Related papers (2020-08-14T05:46:56Z)
- Privacy-preserving Traffic Flow Prediction: A Federated Learning Approach [61.64006416975458]
We propose a privacy-preserving machine learning technique named Federated Learning-based Gated Recurrent Unit neural network algorithm (FedGRU) for traffic flow prediction.
FedGRU differs from current centralized learning methods in that it updates a universal learning model through a secure parameter aggregation mechanism (see the masking sketch after this entry).
It is shown that FedGRU achieves a prediction accuracy of 90.96%, which is competitive with advanced deep learning models.
arXiv Detail & Related papers (2020-03-19T13:07:49Z)
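The "secure parameter aggregation mechanism" FedGRU relies on is commonly realized with pairwise masking, as in secure aggregation protocols. A toy sketch under that assumption; real protocols additionally handle key agreement and client dropouts:

```python
# Sketch of pairwise-mask secure aggregation: clients add cancelling random
# masks so the server sees only the sum of updates, never any individual one.
# Illustrative only -- not FedGRU's exact mechanism.
import numpy as np

rng = np.random.default_rng(7)
updates = [rng.normal(size=4) for _ in range(3)]      # private client updates

# Each pair (i, j), i < j, agrees on a mask; i adds it, j subtracts it.
masked = [u.copy() for u in updates]
for i in range(3):
    for j in range(i + 1, 3):
        mask = rng.normal(size=4)                     # shared secret in practice
        masked[i] += mask
        masked[j] -= mask

# The server sums the masked updates; the masks cancel exactly.
print(np.allclose(sum(masked), sum(updates)))         # True
```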
This list is automatically generated from the titles and abstracts of the papers on this site.