A Survey on Federated Analytics: Taxonomy, Enabling Techniques, Applications and Open Issues
- URL: http://arxiv.org/abs/2404.12666v3
- Date: Mon, 07 Apr 2025 13:11:28 GMT
- Authors: Zibo Wang, Haichao Ji, Yifei Zhu, Dan Wang, Zhu Han
- Abstract summary: Federated analytics (FA) is an emerging technique to support collaborative data analytics among diverse data owners without centralizing the raw data. This survey aims to bridge the gap by first providing an overview of FA, elucidating key concepts, and discussing its relationship with similar concepts.
- Score: 28.096861605150075
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The escalating influx of data generated by networked edge devices, coupled with the growing awareness of data privacy, has restricted the traditional data analytics workflow, where the edge data are gathered by a centralized server to be further utilized by data analysts. To continue leveraging vast edge data to support various data-intensive applications, computing paradigms have promoted a transformative shift from centralized data processing to privacy-preserving distributed data processing. The need to perform data analytics on private edge data motivates federated analytics (FA), an emerging technique to support collaborative data analytics among diverse data owners without centralizing the raw data. Despite the wide application of FA in industry and academia, a comprehensive examination of existing research efforts in FA has been notably absent. This survey aims to bridge this gap by first providing an overview of FA, elucidating key concepts, and discussing its relationship with similar concepts. We then thoroughly examine FA, including its key challenges, taxonomy, and enabling techniques. Diverse FA applications, including statistical metrics, frequency-related applications, database query operations, FL-assisting FA tasks, and other wireless network applications, are then carefully reviewed. We complete the survey with several open research issues, future directions, and a comprehensive lessons-learned section. This survey intends to provide a holistic understanding of the emerging FA techniques and foster the continued evolution of privacy-preserving distributed data processing in the emerging networked society.
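To make the idea of analytics "without centralizing the raw data" concrete, here is a minimal sketch of one statistical-metric task, a federated mean. It is an illustration only, not a method taken from the survey: the `Client` class, `local_summary`, and `federated_mean` are hypothetical names, and the sketch assumes each data owner is willing to share an exact (sum, count) pair. Sharing exact aggregates is not a formal privacy guarantee by itself; deployed systems would typically layer protections such as secure aggregation or differential privacy on top.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Client:
    """A hypothetical data owner whose raw values stay local."""
    values: List[float]

    def local_summary(self) -> Tuple[float, int]:
        # Only the aggregate (sum, count) leaves the client, never the raw values.
        return sum(self.values), len(self.values)


def federated_mean(summaries: Iterable[Tuple[float, int]]) -> float:
    """Server-side step: merge per-client (sum, count) pairs into a global mean."""
    total, count = 0.0, 0
    for partial_sum, n in summaries:
        total += partial_sum
        count += n
    if count == 0:
        raise ValueError("no data reported by any client")
    return total / count


# Three illustrative clients with made-up readings.
clients = [Client([1.0, 2.0, 3.0]), Client([10.0]), Client([4.0, 4.0])]
global_mean = federated_mean(c.local_summary() for c in clients)
print(global_mean)
```

The server sees three small tuples rather than six raw readings, yet the result equals the mean that a centralized computation over all values would produce.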
Related papers
- A Comprehensive Survey on Imbalanced Data Learning [45.3186824501823]
Imbalanced data is prevalent in various types of raw data and hinders the performance of machine learning.
This survey systematically analyzes various real-world data formats.
It groups existing research on different data formats into four categories: data re-balancing, feature representation, training strategy, and ensemble learning.
arXiv Detail & Related papers (2025-02-13T04:53:17Z)
- Data-Centric AI in the Age of Large Language Models [51.20451986068925]
This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs).
We make the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential stages (e.g., in-context learning) of LLMs.
We identify four specific scenarios centered around data, covering data-centric benchmarks and data curation, data attribution, knowledge transfer, and inference contextualization.
arXiv Detail & Related papers (2024-06-20T16:34:07Z)
- A Comprehensive Survey on Data Augmentation [55.355273602421384]
Data augmentation is a technique that generates high-quality artificial data by manipulating existing data samples.
Existing literature surveys focus only on data of one specific modality.
We propose a more enlightening taxonomy that encompasses data augmentation techniques for different common data modalities.
arXiv Detail & Related papers (2024-05-15T11:58:08Z)
- Empowering Data Mesh with Federated Learning [5.087058648342379]
A new paradigm, Data Mesh, treats domains as a first-class concern by distributing data ownership from the central team to each data domain.
Many multi-million-dollar organizations like PayPal, Netflix, and Zalando have already transformed their data analysis pipelines based on this new architecture.
We introduce a pioneering approach that incorporates Federated Learning into Data Mesh.
arXiv Detail & Related papers (2024-03-26T17:10:15Z)
- Collaborative business intelligence virtual assistant [1.9953434933575993]
This study focuses on the applications of data mining within distributed virtual teams through the interaction of users and a CBI Virtual Assistant.
The proposed virtual assistant for CBI endeavors to enhance data exploration accessibility for a wider range of users and streamline the time and effort required for data analysis.
arXiv Detail & Related papers (2023-12-20T05:34:12Z)
- A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns comes with stringent regulations that frequently prohibit data access and data sharing.
Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy-sensitive data.
Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
arXiv Detail & Related papers (2023-09-27T14:38:16Z)
- Enabling Inter-organizational Analytics in Business Networks Through Meta Machine Learning [0.0]
Fear of disclosing sensitive information as well as the sheer volume of the data that would need to be exchanged are key inhibitors for the creation of effective system-wide solutions.
We propose a meta machine learning method that deals with these obstacles to enable comprehensive analyses within a business network.
arXiv Detail & Related papers (2023-03-28T09:06:28Z)
- A Comprehensive Survey on Source-free Domain Adaptation [69.17622123344327]
The research of Source-Free Domain Adaptation (SFDA) has drawn growing attention in recent years.
We provide a comprehensive survey of recent advances in SFDA and organize them into a unified categorization scheme.
We compare the results of more than 30 representative SFDA methods on three popular classification benchmarks.
arXiv Detail & Related papers (2023-02-23T06:32:09Z)
- DataPerf: Benchmarks for Data-Centric AI Development [81.03754002516862]
DataPerf is a community-led benchmark suite for evaluating ML datasets and data-centric algorithms.
We provide an open, online platform with multiple rounds of challenges to support this iterative development.
The benchmarks, online evaluation platform, and baseline implementations are open source.
arXiv Detail & Related papers (2022-07-20T17:47:54Z)
- Research Trends and Applications of Data Augmentation Algorithms [77.34726150561087]
We identify the main areas of application of data augmentation algorithms, the types of algorithms used, significant research trends, their progression over time and research gaps in data augmentation literature.
We expect readers to understand the potential of data augmentation, as well as identify future research directions and open questions within data augmentation research.
arXiv Detail & Related papers (2022-07-18T11:38:32Z)
- Predictive analytics using Social Big Data and machine learning [6.142272540492935]
This chapter sheds light on core aspects that lay the foundations for social big data analytics.
Various predictive analytical algorithms are introduced, along with their use in several important applications and top-tier tools and APIs.
arXiv Detail & Related papers (2021-04-21T19:30:45Z)
- Wide-Area Data Analytics [4.080171822768553]
We increasingly live in a data-driven world, with diverse kinds of data distributed across many locations.
The Computing Community Consortium (CCC) convened a 1.5-day workshop focused on wide-area data analytics in October 2019.
This report summarizes the challenges discussed and the conclusions generated at the workshop.
arXiv Detail & Related papers (2020-06-17T22:44:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.