SynthGuard: Redefining Synthetic Data Generation with a Scalable and Privacy-Preserving Workflow Framework
- URL: http://arxiv.org/abs/2507.10489v1
- Date: Mon, 14 Jul 2025 17:11:20 GMT
- Title: SynthGuard: Redefining Synthetic Data Generation with a Scalable and Privacy-Preserving Workflow Framework
- Authors: Eduardo Brito, Mahmoud Shoush, Kristian Tamm, Paula Etti, Liina Kamm,
- Abstract summary: Growing reliance on data-driven applications in sectors such as healthcare, finance, and law enforcement underscores the need for secure, privacy-preserving, and scalable mechanisms for data generation and sharing.<n>We introduce SynthGuard, a framework designed to ensure computational governance by enabling data owners to maintain control over synthetic data.
- Score: 0.873811641236639
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The growing reliance on data-driven applications in sectors such as healthcare, finance, and law enforcement underscores the need for secure, privacy-preserving, and scalable mechanisms for data generation and sharing. Synthetic data generation (SDG) has emerged as a promising approach but often relies on centralized or external processing, raising concerns about data sovereignty, domain ownership, and compliance with evolving regulatory standards. To overcome these issues, we introduce SynthGuard, a framework designed to ensure computational governance by enabling data owners to maintain control over SDG workflows. SynthGuard supports modular and privacy-preserving workflows, ensuring secure, auditable, and reproducible execution across diverse environments. In this paper, we demonstrate how SynthGuard addresses the complexities at the intersection of domain-specific needs and scalable SDG by aligning with requirements for data sovereignty and regulatory compliance. Developed iteratively with domain expert input, SynthGuard has been validated through real-world use cases, demonstrating its ability to balance security, privacy, and scalability while ensuring compliance. The evaluation confirms its effectiveness in implementing and executing SDG workflows and integrating privacy and utility assessments across various computational environments.
Related papers
- Blockchain Powered Edge Intelligence for U-Healthcare in Privacy Critical and Time Sensitive Environment [0.559239450391449]
We propose an autonomous computing model for privacy-critical and time-sensitive health applications.<n>The system supports continuous monitoring, real-time alert notifications, disease detection, and robust data processing and aggregation.<n>A secure access scheme is defined to manage both off-chain and on-chain data sharing and storage.
arXiv Detail & Related papers (2025-05-31T06:58:52Z) - Next Generation Authentication for Data Spaces: An Authentication Flow Based On Grant Negotiation And Authorization Protocol For Verifiable Presentations (GNAP4VP) [0.0]
This paper presents an identity verification protocol tailored for shared data environments within Data Spaces.<n>The proposed solution adheres to the principles of Self-Sovereign Identity (SSI) to facilitate decentralized, user-centric identity management.
arXiv Detail & Related papers (2025-05-30T15:20:39Z) - DP-RTFL: Differentially Private Resilient Temporal Federated Learning for Trustworthy AI in Regulated Industries [0.0]
This paper introduces Differentially Private Resilient Temporal Federated Learning (DP-RTFL)<n>It is designed to ensure training continuity, precise state recovery, and strong data privacy.<n>The framework is particularly suited for critical applications like credit risk assessment using sensitive financial data.
arXiv Detail & Related papers (2025-05-27T16:30:25Z) - Privacy-Preserving Federated Embedding Learning for Localized Retrieval-Augmented Generation [60.81109086640437]
We propose a novel framework called Federated Retrieval-Augmented Generation (FedE4RAG)<n>FedE4RAG facilitates collaborative training of client-side RAG retrieval models.<n>We apply homomorphic encryption within federated learning to safeguard model parameters.
arXiv Detail & Related papers (2025-04-27T04:26:02Z) - MES-RAG: Bringing Multi-modal, Entity-Storage, and Secure Enhancements to RAG [65.0423152595537]
We propose MES-RAG, which enhances entity-specific query handling and provides accurate, secure, and consistent responses.<n>MES-RAG introduces proactive security measures that ensure system integrity by applying protections prior to data access.<n> Experimental results demonstrate that MES-RAG significantly improves both accuracy and recall, highlighting its effectiveness in advancing the security and utility of question-answering.
arXiv Detail & Related papers (2025-03-17T08:09:42Z) - CaPS: Collaborative and Private Synthetic Data Generation from Distributed Sources [5.898893619901382]
We propose a framework for the collaborative and private generation of synthetic data from distributed data holders.
We replace the trusted aggregator with secure multi-party computation protocols and output privacy via differential privacy (DP)
We demonstrate the applicability and scalability of our approach for the state-of-the-art select-measure-generate algorithms MWEM+PGM and AIM.
arXiv Detail & Related papers (2024-02-13T17:26:32Z) - Blockchain-empowered Federated Learning for Healthcare Metaverses:
User-centric Incentive Mechanism with Optimal Data Freshness [66.3982155172418]
We first design a user-centric privacy-preserving framework based on decentralized Federated Learning (FL) for healthcare metaverses.
We then utilize Age of Information (AoI) as an effective data-freshness metric and propose an AoI-based contract theory model under Prospect Theory (PT) to motivate sensing data sharing.
arXiv Detail & Related papers (2023-07-29T12:54:03Z) - Auditing and Generating Synthetic Data with Controllable Trust Trade-offs [54.262044436203965]
We introduce a holistic auditing framework that comprehensively evaluates synthetic datasets and AI models.
It focuses on preventing bias and discrimination, ensures fidelity to the source data, assesses utility, robustness, and privacy preservation.
We demonstrate the framework's effectiveness by auditing various generative models across diverse use cases.
arXiv Detail & Related papers (2023-04-21T09:03:18Z) - Decentralized Stochastic Optimization with Inherent Privacy Protection [103.62463469366557]
Decentralized optimization is the basic building block of modern collaborative machine learning, distributed estimation and control, and large-scale sensing.
Since involved data, privacy protection has become an increasingly pressing need in the implementation of decentralized optimization algorithms.
arXiv Detail & Related papers (2022-05-08T14:38:23Z) - Distributed Machine Learning and the Semblance of Trust [66.1227776348216]
Federated Learning (FL) allows the data owner to maintain data governance and perform model training locally without having to share their data.
FL and related techniques are often described as privacy-preserving.
We explain why this term is not appropriate and outline the risks associated with over-reliance on protocols that were not designed with formal definitions of privacy in mind.
arXiv Detail & Related papers (2021-12-21T08:44:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.