PAPAYA Federated Analytics Stack: Engineering Privacy, Scalability and Practicality
- URL: http://arxiv.org/abs/2412.02340v2
- Date: Thu, 27 Mar 2025 13:25:14 GMT
- Title: PAPAYA Federated Analytics Stack: Engineering Privacy, Scalability and Practicality
- Authors: Harish Srinivas, Graham Cormode, Mehrdad Honarkhah, Samuel Lurye, Jonathan Hehir, Lunwen He, George Hong, Ahmed Magdy, Dzmitry Huba, Kaikai Wang, Shen Guo, Shoubhik Bhattacharya,
- Abstract summary: Cross-device Federated Analytics (FA) is a distributed computation paradigm designed to answer analytics queries about and derive insights from data held locally on users' devices. Despite FA's broad relevance, the applicability of existing FA systems is limited by compromised accuracy; lack of flexibility for data analytics; and an inability to scale effectively. We describe our approach to combine privacy, scalability, and practicality to build and deploy a system that overcomes these limitations.
- Score: 5.276674920508729
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cross-device Federated Analytics (FA) is a distributed computation paradigm designed to answer analytics queries about and derive insights from data held locally on users' devices. On-device computations combined with other privacy and security measures ensure that only minimal data is transmitted off-device, achieving a high standard of data protection. Despite FA's broad relevance, the applicability of existing FA systems is limited by compromised accuracy; lack of flexibility for data analytics; and an inability to scale effectively. In this paper, we describe our approach to combine privacy, scalability, and practicality to build and deploy a system that overcomes these limitations. Our FA system leverages trusted execution environments (TEEs) and optimizes the use of on-device computing resources to facilitate federated data processing across large fleets of devices, while ensuring robust, defensible, and verifiable privacy safeguards. We focus on federated analytics (statistics and monitoring), in contrast to systems for federated learning (ML workloads), and we flag the key differences.
Related papers
- Federated Attention: A Distributed Paradigm for Collaborative LLM Inference over Edge Networks [63.541114376141735]
Large language models (LLMs) are proliferating rapidly at the edge, delivering intelligent capabilities across diverse application scenarios. However, their practical deployment in collaborative scenarios confronts fundamental challenges: privacy vulnerabilities, communication overhead, and computational bottlenecks. We propose Federated Attention (FedAttn), which integrates the federated paradigm into the self-attention mechanism.
arXiv Detail & Related papers (2025-11-04T15:14:58Z) - Subgraph Federated Learning via Spectral Methods [52.40322201034717]
FedLap is a novel framework that captures inter-node dependencies while ensuring privacy and scalability. We provide a formal analysis of the privacy of FedLap, demonstrating that it preserves privacy.
arXiv Detail & Related papers (2025-10-29T16:22:32Z) - Toward provably private analytics and insights into GenAI use [12.545209220189113]
We present a next-generation federated analytics system based on technologies like AMD SEV-SNP and Intel TDX. In our system, devices encrypt and upload data, tagging it with a limited set of allowable server-side processing steps. An open source, TEE-hosted key management service guarantees that the data is only accessible to those steps.
arXiv Detail & Related papers (2025-10-24T17:40:12Z) - FedMicro-IDA: A Federated Learning and Microservices-based Framework for IoT Data Analytics [0.5686018066666573]
Internet of Things (IoT) data aids in providing efficient data analytics for a variety of prevalent and crucial applications. Data analytics techniques were proposed to collect and analyze data in edge or fog devices. Federated learning has been recommended as an ideal distributed machine/deep learning-based technique for edge/fog computing environments.
arXiv Detail & Related papers (2025-10-22T04:57:47Z) - Emerging Paradigms for Securing Federated Learning Systems [0.0]
Methods such as MPC, Homomorphic Encryption (HE), and Differential Privacy (DP) often incur high computational costs and suffer from limited scalability. This survey examines emerging approaches that hold promise for enhancing both privacy and efficiency in Federated Learning. For each paradigm, we assess its relevance to the FL pipeline, outlining its strengths, limitations, and practical considerations.
arXiv Detail & Related papers (2025-09-25T13:34:44Z) - Closer to Reality: Practical Semi-Supervised Federated Learning for Foundation Model Adaptation [56.36237936346563]
Foundation models (FMs) exhibit remarkable generalization but require adaptation to downstream tasks. Due to data privacy regulations, cloud-based FMs cannot directly access private edge data. We introduce Practical Semi-Supervised Federated Learning (PSSFL), where edge devices hold only unlabeled, low-resolution data. Our work paves the way for scalable and privacy-preserving FM adaptation in federated scenarios.
arXiv Detail & Related papers (2025-08-22T17:47:02Z) - A Federated Random Forest Solution for Secure Distributed Machine Learning [44.99833362998488]
This paper introduces a federated learning framework for Random Forest classifiers that preserves data privacy and provides robust performance in distributed settings. By leveraging PySyft for secure, privacy-aware computation, our method enables multiple institutions to collaboratively train Random Forest models on locally stored data. Experiments on two real-world healthcare benchmarks demonstrate that the federated approach maintains competitive accuracy - within a maximum 9% margin of centralized methods.
arXiv Detail & Related papers (2025-05-12T21:40:35Z) - Trustworthy AI: Securing Sensitive Data in Large Language Models [0.0]
Large Language Models (LLMs) have transformed natural language processing (NLP) by enabling robust text generation and understanding.
This paper proposes a comprehensive framework for embedding trust mechanisms into LLMs to dynamically control the disclosure of sensitive information.
arXiv Detail & Related papers (2024-09-26T19:02:33Z) - Confidential Federated Computations [16.415880530250092]
Federated Learning and Analytics (FLA) have seen widespread adoption by technology platforms for processing sensitive on-device data.
FLA systems do not necessarily require anonymization mechanisms like differential privacy (DP).
This paper introduces a novel system architecture that leverages trusted execution environments (TEEs) and open-sourcing to ensure confidentiality of server-side computations.
arXiv Detail & Related papers (2024-04-16T17:47:27Z) - Effective Intrusion Detection in Heterogeneous Internet-of-Things Networks via Ensemble Knowledge Distillation-based Federated Learning [52.6706505729803]
We introduce Federated Learning (FL) to collaboratively train a decentralized shared model of Intrusion Detection Systems (IDS).
FLEKD enables a more flexible aggregation method than conventional model fusion techniques.
Experiment results show that the proposed approach outperforms local training and traditional FL in terms of both speed and performance.
arXiv Detail & Related papers (2024-01-22T14:16:37Z) - Filling the Missing: Exploring Generative AI for Enhanced Federated
Learning over Heterogeneous Mobile Edge Devices [72.61177465035031]
We propose a generative AI-empowered federated learning to address these challenges by leveraging the idea of FIlling the MIssing (FIMI) portion of local data.
Experiment results demonstrate that FIMI can save up to 50% of the device-side energy to achieve the target global test accuracy.
arXiv Detail & Related papers (2023-10-21T12:07:04Z) - Libertas: Privacy-Preserving Computation for Decentralised Personal Data Stores [19.54818218429241]
We propose a modular design for integrating Secure Multi-Party Computation with Solid.
Our architecture, Libertas, requires no protocol level changes in the underlying design of Solid.
We show how this can be combined with existing differential privacy techniques to also ensure output privacy.
arXiv Detail & Related papers (2023-09-28T12:07:40Z) - UFed-GAN: A Secure Federated Learning Framework with Constrained Computation and Unlabeled Data [50.13595312140533]
We propose a novel framework of UFed-GAN: Unsupervised Federated Generative Adversarial Network, which can capture user-side data distribution without local classification training.
Our experimental results demonstrate the strong potential of UFed-GAN in addressing limited computational resources and unlabeled data while preserving privacy.
arXiv Detail & Related papers (2023-08-10T22:52:13Z) - Tool-Supported Architecture-Based Data Flow Analysis for Confidentiality [1.6544671438664054]
We reimplemented a data flow analysis as a Java-based tool to identify access violations based on the data flow.
The evaluation for our tool indicates that we can analyze similar scenarios and scale for certain scenarios better than the existing analysis.
arXiv Detail & Related papers (2023-08-03T09:21:20Z) - Federated Learning for Computationally-Constrained Heterogeneous Devices: A Survey [3.219812767529503]
Federated learning (FL) offers a privacy-preserving trade-off between communication overhead and model accuracy.
We outline the challenges FL has to overcome to be widely applicable in real-world applications.
arXiv Detail & Related papers (2023-07-18T12:05:36Z) - PS-FedGAN: An Efficient Federated Learning Framework Based on Partially Shared Generative Adversarial Networks For Data Privacy [56.347786940414935]
Federated Learning (FL) has emerged as an effective learning paradigm for distributed computation.
This work proposes a novel FL framework that requires only partial GAN model sharing.
Named as PS-FedGAN, this new framework enhances the GAN releasing and training mechanism to address heterogeneous data distributions.
arXiv Detail & Related papers (2023-05-19T05:39:40Z) - Is Vertical Logistic Regression Privacy-Preserving? A Comprehensive Privacy Analysis and Beyond [57.10914865054868]
We consider vertical logistic regression (VLR) trained with mini-batch gradient descent.
We provide a comprehensive and rigorous privacy analysis of VLR in a class of open-source Federated Learning frameworks.
arXiv Detail & Related papers (2022-07-19T05:47:30Z) - Federated Stochastic Gradient Descent Begets Self-Induced Momentum [151.4322255230084]
Federated learning (FL) is an emerging machine learning method that can be applied in mobile edge systems.
We show that running stochastic gradient descent (SGD) in such a setting can be viewed as adding a momentum-like term to the global aggregation process.
arXiv Detail & Related papers (2022-02-17T02:01:37Z) - PrivacyFL: A simulator for privacy-preserving and secure federated learning [2.578242050187029]
Federated learning is a technique that enables distributed clients to collaboratively learn a shared machine learning model.
PrivacyFL is a privacy-preserving and secure federated learning simulator.
arXiv Detail & Related papers (2020-02-19T20:16:13Z)