FHAIM: Fully Homomorphic AIM For Private Synthetic Data Generation
- URL: http://arxiv.org/abs/2602.05838v2
- Date: Mon, 09 Feb 2026 19:11:04 GMT
- Title: FHAIM: Fully Homomorphic AIM For Private Synthetic Data Generation
- Authors: Mayank Kumar, Qian Lou, Paulo Barreto, Martine De Cock, Sikha Pentyala
- Abstract summary: FHAIM is the first fully homomorphic encryption framework for training a marginal-based synthetic data generator on encrypted data. Our empirical analysis shows that FHAIM preserves the performance of AIM while maintaining feasible runtimes.
- Score: 14.382861942609189
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data is the lifeblood of AI, yet much of the most valuable data remains locked in silos due to privacy concerns and regulations. As a result, AI remains heavily underutilized in many of the most important domains, including healthcare, education, and finance. Synthetic data generation (SDG), i.e., the generation of artificial data with a synthesizer trained on real data, offers an appealing solution to make data available while mitigating privacy concerns; however, existing SDG-as-a-service workflows require data holders to trust providers with access to private data. We propose FHAIM, the first fully homomorphic encryption (FHE) framework for training a marginal-based synthetic data generator on encrypted tabular data. FHAIM adapts the widely used AIM algorithm to the FHE setting using novel FHE protocols, ensuring that the private data remains encrypted throughout and is released only with differential privacy guarantees. Our empirical analysis shows that FHAIM preserves the performance of AIM while maintaining feasible runtimes.
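To make the "marginal-based" measurement that AIM (and hence FHAIM) builds on concrete, here is a minimal Python sketch of releasing a noisy one-way marginal with the Gaussian mechanism. The column values, the data, and the choice of sigma are illustrative assumptions for this sketch, not details taken from the paper; FHAIM performs such measurements under FHE, which this plaintext toy does not attempt.

```python
import numpy as np

def noisy_marginal(values, domain, sigma):
    """Measure a one-way marginal (a histogram over `domain`) and add
    Gaussian noise of scale `sigma` to each count.

    In AIM-style synthesizers, each selected marginal is measured this
    way and the model is then fit to the noisy counts; sigma is derived
    from the per-measurement privacy budget.
    """
    counts = np.array([np.sum(values == v) for v in domain], dtype=float)
    noise = np.random.normal(0.0, sigma, size=len(domain))
    return counts + noise

# Toy categorical column (illustrative data, not from the paper).
rng = np.random.default_rng(0)
domain = ["<30", "30-60", ">60"]
ages = rng.choice(domain, size=1000, p=[0.3, 0.5, 0.2])
noisy = noisy_marginal(ages, domain, sigma=5.0)
```

The noisy counts still sum to roughly the dataset size, which is what lets a synthesizer trained on them reproduce the source distribution.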
Related papers
- How to DP-fy Your Data: A Practical Guide to Generating Synthetic Data With Differential Privacy [52.00934156883483]
Differential Privacy (DP) is a framework for reasoning about and limiting information leakage.
Differentially private synthetic data refers to synthetic data that preserves the overall trends of source data.
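The DP guarantee described above can be illustrated with the classic Laplace mechanism for a counting query; a count has sensitivity 1, so Laplace noise of scale 1/epsilon yields epsilon-DP. The records, predicate, and epsilon below are illustrative assumptions for this sketch, not from any of the listed papers.

```python
import math
import random

def laplace_count(data, predicate, epsilon):
    """Release a count under epsilon-DP via the Laplace mechanism.

    A counting query has sensitivity 1 (adding or removing one record
    changes the count by at most 1), so noise drawn from
    Laplace(0, 1/epsilon) suffices for epsilon-DP.
    """
    true_count = sum(1 for x in data if predicate(x))
    u = random.random() - 0.5  # uniform on (-0.5, 0.5)
    # Inverse-CDF sampling: X = -(1/eps) * sgn(u) * ln(1 - 2|u|).
    noise = -(1.0 / epsilon) * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)
    return true_count + noise

random.seed(0)
records = [("alice", 34), ("bob", 29), ("carol", 61), ("dan", 45)]
released = laplace_count(records, lambda r: r[1] >= 40, epsilon=1.0)
```

The released value is close to the true count (2 here) but noisy enough that no single record's presence can be confidently inferred.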
arXiv Detail & Related papers (2025-12-02T21:14:39Z)
- Synthetic Data Generation and Differential Privacy using Tensor Networks' Matrix Product States (MPS) [33.032422801043495]
We propose a method for generating privacy-preserving high-quality synthetic data using Matrix Product States (MPS).
We benchmark the MPS-based generative model against state-of-the-art models such as CTGAN, VAE, and PrivBayes.
Our results show that MPS outperforms classical models, particularly under strict privacy constraints.
arXiv Detail & Related papers (2025-08-08T12:14:57Z)
- Privacy-Preserving Federated Embedding Learning for Localized Retrieval-Augmented Generation [60.81109086640437]
We propose a novel framework called Federated Retrieval-Augmented Generation (FedE4RAG).
FedE4RAG facilitates collaborative training of client-side RAG retrieval models.
We apply homomorphic encryption within federated learning to safeguard model parameters.
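The idea of using homomorphic encryption to protect model parameters can be illustrated with textbook additively homomorphic Paillier encryption: a server can multiply ciphertexts to sum encrypted client updates without ever decrypting them. This sketch uses tiny hard-coded primes purely for readability; it is insecure and is an assumption of this illustration, not the scheme used in the paper above.

```python
import math
import random

# Textbook Paillier with tiny, INSECURE demo primes (illustration only).
P, Q = 17, 19
N = P * Q                      # public modulus
N2 = N * N
LAM = math.lcm(P - 1, Q - 1)   # private key: lambda = lcm(p-1, q-1)
MU = pow(LAM, -1, N)           # private key: mu (g = N + 1 variant)

def encrypt(m):
    """Enc(m) = (1+N)^m * r^N mod N^2 for a random r coprime to N."""
    r = random.randrange(2, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(2, N)
    return (pow(1 + N, m, N2) * pow(r, N, N2)) % N2

def decrypt(c):
    """Dec(c) = L(c^lambda mod N^2) * mu mod N, with L(x) = (x-1)//N."""
    return ((pow(c, LAM, N2) - 1) // N) * MU % N

# Additive homomorphism: multiplying ciphertexts adds the plaintexts,
# so a server can aggregate encrypted updates without decrypting them.
c_sum = (encrypt(5) * encrypt(7)) % N2
```

Here `decrypt(c_sum)` recovers 5 + 7 = 12 even though the server only ever saw ciphertexts; real deployments use primes of over a thousand bits each.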
arXiv Detail & Related papers (2025-04-27T04:26:02Z)
- Mitigating the Privacy Issues in Retrieval-Augmented Generation (RAG) via Pure Synthetic Data [51.41288763521186]
Retrieval-augmented generation (RAG) enhances the outputs of language models by integrating relevant information retrieved from external knowledge sources.
RAG systems may face severe privacy risks when retrieving private data.
We propose using synthetic data as a privacy-preserving alternative for the retrieval data.
arXiv Detail & Related papers (2024-06-20T22:53:09Z)
- Generative AI for Secure and Privacy-Preserving Mobile Crowdsensing [74.58071278710896]
Generative AI has attracted much attention from both academic and industrial fields.
Secure and privacy-preserving mobile crowdsensing (SPPMCS) has been widely applied in data collection/ acquirement.
arXiv Detail & Related papers (2024-05-17T04:00:58Z)
- FewFedPIT: Towards Privacy-preserving and Few-shot Federated Instruction Tuning [54.26614091429253]
Federated instruction tuning (FedIT) is a promising solution that consolidates collaborative training across multiple data owners.
FedIT encounters limitations such as the scarcity of instructional data and the risk of exposure to training data extraction attacks.
We propose FewFedPIT, designed to simultaneously enhance privacy protection and model performance of federated few-shot learning.
arXiv Detail & Related papers (2024-03-10T08:41:22Z)
- Federated Learning Empowered by Generative Content [55.576885852501775]
Federated learning (FL) enables leveraging distributed private data for model training in a privacy-preserving way.
We propose a novel FL framework termed FedGC, designed to mitigate data heterogeneity issues by diversifying private data with generative content.
We conduct a systematic empirical study on FedGC, covering diverse baselines, datasets, scenarios, and modalities.
arXiv Detail & Related papers (2023-12-10T07:38:56Z)
- Decentralised, Scalable and Privacy-Preserving Synthetic Data Generation [8.982917734231165]
We build a novel system that allows the contributors of real data to autonomously participate in differentially private synthetic data generation.
Our solution is based on three building blocks, namely Solid (Social Linked Data), MPC (Secure Multi-Party Computation), and Trusted Execution Environments (TEEs).
We show how these three technologies can be effectively used to address various challenges in responsible and trustworthy synthetic data generation.
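Of the three building blocks above, MPC is the easiest to illustrate: with additive secret sharing, each data holder splits its private value into random shares so that no single party learns anything, yet the parties can jointly compute a sum and reveal only the result. The three-party setup, inputs, and modulus below are illustrative assumptions, not details from the paper.

```python
import random

MOD = 2**32  # work in a finite ring so each individual share looks random

def share(secret, n_parties):
    """Split `secret` into n additive shares that sum to it mod MOD."""
    shares = [random.randrange(MOD) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % MOD)
    return shares

def reconstruct(shares):
    return sum(shares) % MOD

# Three data holders jointly compute the sum of their private inputs:
# each one shares its input across the parties, every party locally adds
# the shares it holds, and only the final sum is ever reconstructed.
inputs = [42, 7, 100]
all_shares = [share(x, 3) for x in inputs]
partial_sums = [sum(col) % MOD for col in zip(*all_shares)]
total = reconstruct(partial_sums)
```

The reconstructed `total` equals 42 + 7 + 100 = 149, while any single party's view is a uniformly random ring element.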
arXiv Detail & Related papers (2023-10-30T22:27:32Z)
- FLAIM: AIM-based Synthetic Data Generation in the Federated Setting [18.38046354606749]
DistAIM and FLAIM are proposed to produce synthetic data that mirrors the statistical properties of private data.
We show that naively federating AIM can lead to substantial degradation in utility in the presence of heterogeneity.
arXiv Detail & Related papers (2023-10-05T10:34:47Z)
- Secure Multiparty Computation for Synthetic Data Generation from Distributed Data [7.370727048591523]
Legal and ethical restrictions on accessing relevant data inhibit data science research in critical domains such as health, finance, and education.
Existing approaches assume that the data holders supply their raw data to a trusted curator, who uses it as fuel for synthetic data generation.
We propose the first solution in which data holders only share encrypted data for differentially private synthetic data generation.
arXiv Detail & Related papers (2022-10-13T20:09:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.