SAFES: Sequential Privacy and Fairness Enhancing Data Synthesis for Responsible AI
- URL: http://arxiv.org/abs/2411.09178v2
- Date: Sat, 16 Nov 2024 03:13:23 GMT
- Title: SAFES: Sequential Privacy and Fairness Enhancing Data Synthesis for Responsible AI
- Authors: Spencer Giddens, Fang Liu
- Abstract summary: We introduce SAFES, a Sequential PrivAcy and Fairness Enhancing data Synthesis procedure.
For reasonable privacy loss, SAFES-generated synthetic data achieve significantly improved fairness metrics with relatively low utility loss.
- Abstract: As data-driven and AI-based decision making gains widespread adoption in most disciplines, it is crucial that both data privacy and decision fairness are appropriately addressed. While differential privacy (DP) provides a robust framework for guaranteeing privacy and several widely accepted methods have been proposed for improving fairness, the vast majority of existing literature treats the two concerns independently. Methods that do consider privacy and fairness simultaneously often apply only to a specific machine learning task, limiting their generalizability. In response, we introduce SAFES, a Sequential PrivAcy and Fairness Enhancing data Synthesis procedure that sequentially combines DP data synthesis with a fairness-aware data transformation. SAFES allows full control over the privacy-fairness-utility trade-off via tunable privacy and fairness parameters. We illustrate SAFES by combining AIM, a graphical model-based DP data synthesizer, with a popular fairness-aware data pre-processing transformation. Empirical evaluations on the Adult and COMPAS datasets demonstrate that for reasonable privacy loss, SAFES-generated synthetic data achieve significantly improved fairness metrics with relatively low utility loss.
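The abstract describes SAFES as a strictly sequential pipeline: spend the privacy budget on DP data synthesis first, then apply a fairness-aware transformation to the synthetic output. The runnable toy sketch below assumes that structure but swaps in deliberately simple stand-ins: independent Laplace-noised one-way marginals in place of the AIM graphical-model synthesizer, and a Kamiran-Calders-style "massaging" relabeling in place of the paper's pre-processing transform. All interfaces and parameter names here are illustrative, not from the paper.

```python
import numpy as np
import pandas as pd

def dp_marginal_synthesizer(df, epsilon, n_rows, rng):
    """Stand-in DP synthesizer: sample each categorical column independently
    from a Laplace-noised histogram. Each one-way histogram has sensitivity 1,
    so splitting epsilon evenly across columns gives epsilon-DP overall."""
    eps_col = epsilon / df.shape[1]
    synth = {}
    for col in df.columns:
        counts = df[col].value_counts()
        noisy = counts.to_numpy(dtype=float) + rng.laplace(0.0, 1.0 / eps_col, len(counts))
        probs = np.clip(noisy, 1e-9, None)
        synth[col] = rng.choice(counts.index.to_numpy(), size=n_rows, p=probs / probs.sum())
    return pd.DataFrame(synth)

def massage_labels(df, protected, label, positive, negative, rng):
    """Stand-in fairness transform: relabel random rows until every protected
    group matches the overall positive-label rate. Because it only touches the
    synthetic data, it is pure post-processing and costs no extra privacy."""
    out = df.copy()
    target = (out[label] == positive).mean()
    for _, idx in out.groupby(protected).groups.items():
        pos = (out.loc[idx, label] == positive).to_numpy()
        n_flip = int(round((target - pos.mean()) * len(idx)))
        if n_flip > 0:    # group below target rate: promote some negatives
            cand = np.asarray(idx)[~pos]
            flips = rng.choice(cand, size=min(n_flip, len(cand)), replace=False)
            out.loc[flips, label] = positive
        elif n_flip < 0:  # group above target rate: demote some positives
            cand = np.asarray(idx)[pos]
            flips = rng.choice(cand, size=min(-n_flip, len(cand)), replace=False)
            out.loc[flips, label] = negative
    return out

def safes_sketch(df, protected, label, positive, negative, epsilon, seed=0):
    """SAFES-style sequence: DP synthesis first, fairness transform second."""
    rng = np.random.default_rng(seed)
    synth = dp_marginal_synthesizer(df, epsilon, n_rows=len(df), rng=rng)
    return massage_labels(synth, protected, label, positive, negative, rng)
```

The ordering is what keeps the budget accounting simple: differential privacy is closed under post-processing, so whatever epsilon the synthesizer spends is the total privacy cost, no matter how aggressively the fairness parameter is tuned afterward.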
Related papers
- Can Synthetic Data be Fair and Private? A Comparative Study of Synthetic Data Generation and Fairness Algorithms (2025-01-03)
We find that the DEbiasing CAusal Fairness (DECAF) algorithm achieves the best balance between privacy and fairness.
Applying pre-processing fairness algorithms to synthetic data improves fairness even more than applying them to real data.
- Differentially Private Federated Learning of Diffusion Models for Synthetic Tabular Data Generation (2024-12-20)
We introduce the DP-Fed-FinDiff framework, a novel integration of differential privacy, federated learning, and denoising diffusion probabilistic models.
We demonstrate the effectiveness of DP-Fed-FinDiff on multiple real-world financial datasets.
The results affirm the potential of DP-Fed-FinDiff to enable secure data sharing and robust analytics in highly regulated domains. (A toy sketch of the underlying DP federated-averaging step appears after this list.)
- Second FRCSyn-onGoing: Winning Solutions and Post-Challenge Analysis to Improve Face Recognition with Synthetic Data (2024-12-02)
The 2nd FRCSyn-onGoing challenge is based on the 2nd Face Recognition Challenge in the Era of Synthetic Data (FRCSyn), originally launched at CVPR 2024.
We focus on exploring the use of synthetic data, both on its own and in combination with real data, to solve current challenges in face recognition.
- DP-CDA: An Algorithm for Enhanced Privacy Preservation in Dataset Synthesis Through Randomized Mixing (2024-11-25)
We introduce an effective data publishing algorithm, DP-CDA.
Our proposed algorithm generates synthetic datasets by randomly mixing data in a class-specific manner and injecting carefully tuned randomness to ensure privacy guarantees. (See the mixing sketch after this list.)
Our results indicate that synthetic datasets produced using DP-CDA can achieve superior utility compared to those generated by traditional data publishing algorithms, even under the same privacy requirements.
- Mitigating the Privacy Issues in Retrieval-Augmented Generation (RAG) via Pure Synthetic Data (2024-06-20)
Retrieval-augmented generation (RAG) enhances the outputs of language models by integrating relevant information retrieved from external knowledge sources.
RAG systems may face severe privacy risks when retrieving private data.
We propose using synthetic data as a privacy-preserving alternative for the retrieval data.
- FewFedPIT: Towards Privacy-preserving and Few-shot Federated Instruction Tuning (2024-03-10)
Federated instruction tuning (FedIT) is a promising solution that consolidates collaborative training across multiple data owners.
FedIT faces limitations such as the scarcity of instruction data and the risk of training data extraction attacks.
We propose FewFedPIT, designed to simultaneously enhance privacy protection and model performance in federated few-shot learning.
- TernaryVote: Differentially Private, Communication Efficient, and Byzantine Resilient Distributed Optimization on Heterogeneous Data (2024-02-16)
We propose TernaryVote, which combines a ternary compressor with a majority-vote mechanism to achieve differential privacy, gradient compression, and Byzantine resilience simultaneously. (See the compression-and-vote sketch after this list.)
We theoretically quantify the privacy guarantee through the lens of the emerging f-differential privacy (f-DP) framework and analyze the Byzantine resilience of the proposed algorithm.
- Private Set Generation with Discriminative Information (2022-11-07)
Differentially private data generation is a promising solution to the data privacy challenge.
Existing private generative models struggle with the utility of synthetic samples.
We introduce a simple yet effective method that greatly improves the sample utility of state-of-the-art approaches.
- Causally Constrained Data Synthesis for Private Data Release (2021-05-27)
Using synthetic data that reflects certain statistical properties of the original data preserves the privacy of the original data.
Prior works utilize differentially private data release mechanisms to provide formal privacy guarantees.
We propose incorporating causal information into the training process to favorably modify the privacy-utility trade-off.
- Differentially Private Federated Learning with Laplacian Smoothing (2020-05-01)
Federated learning aims to protect data privacy by collaboratively learning a model without sharing private data among users.
An adversary may still be able to infer the private training data by attacking the released model.
Differential privacy provides statistical protection against such attacks, at the price of significantly degrading the accuracy or utility of the trained models.
- Really Useful Synthetic Data -- A Framework to Evaluate the Quality of Differentially Private Synthetic Data (2020-04-16)
Recent advances in generating synthetic data in ways that add principled privacy protections are a crucial step toward sharing statistical information in a privacy-preserving way.
To further optimise the inherent trade-off between data privacy and data quality, it is necessary to think closely about the latter.
We develop a framework to evaluate the quality of differentially private synthetic data from an applied researcher's perspective.
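The DP-Fed-FinDiff entry above integrates differential privacy, federated learning, and denoising diffusion models. The sketch below shows only the generic DP federated-averaging step that such frameworks typically build on (clip each client update, average, add calibrated Gaussian noise); the diffusion model itself is omitted, and all names and parameters here are assumptions rather than the paper's design.

```python
import numpy as np

def dp_fedavg_round(client_updates, clip_norm, noise_multiplier, rng):
    """One round of DP federated averaging: bound each client's influence by
    L2-clipping its update, average, then add Gaussian noise calibrated to
    the per-round sensitivity clip_norm / n_clients."""
    clipped = []
    for u in client_updates:
        norm = np.linalg.norm(u)
        clipped.append(u * min(1.0, clip_norm / max(norm, 1e-12)))
    mean = np.mean(clipped, axis=0)
    sigma = noise_multiplier * clip_norm / len(client_updates)
    return mean + rng.normal(0.0, sigma, size=mean.shape)

rng = np.random.default_rng(0)
fake_updates = [rng.normal(size=10) for _ in range(8)]  # stand-in client deltas
noisy_global_delta = dp_fedavg_round(fake_updates, clip_norm=1.0,
                                     noise_multiplier=1.1, rng=rng)
```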
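The DP-CDA summary above describes synthesis by class-specific random mixing plus injected randomness. A minimal sketch of that recipe, assuming numeric features and leaving the exact noise calibration (which the paper ties to the privacy guarantee) as a free parameter:

```python
import numpy as np

def dpcda_style_synthesize(X, y, k, noise_std, n_per_class, rng):
    """For each class, emit synthetic points that are the mean of k randomly
    chosen same-class examples plus Gaussian noise. Larger k and noise_std
    trade utility for privacy; the paper's calibration is not reproduced."""
    Xs, ys = [], []
    for c in np.unique(y):
        Xc = X[y == c]
        for _ in range(n_per_class):
            idx = rng.choice(len(Xc), size=k, replace=False)  # class-specific mix
            mixed = Xc[idx].mean(axis=0)
            Xs.append(mixed + rng.normal(0.0, noise_std, size=mixed.shape))
            ys.append(c)
    return np.asarray(Xs), np.asarray(ys)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = rng.integers(0, 2, size=100)
X_syn, y_syn = dpcda_style_synthesize(X, y, k=8, noise_std=0.5,
                                      n_per_class=50, rng=rng)
```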
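Finally, the TernaryVote summary combines a ternary compressor with coordinate-wise majority voting. A toy version of that aggregation path, with the stochastic quantizer supplying both the compression and the randomness (the paper's f-DP accounting and exact scaling are not reproduced):

```python
import numpy as np

def ternary_compress(g, bound, rng):
    """Stochastically round each coordinate of g to {-1, 0, +1}: send the sign
    with probability |g_i| / bound, else send 0. Unbiased up to 1/bound scaling
    for coordinates within [-bound, bound]."""
    p = np.clip(np.abs(g) / bound, 0.0, 1.0)
    send = rng.random(g.shape) < p
    return np.sign(g) * send

def majority_vote(ternary_msgs):
    """Server aggregation: coordinate-wise majority vote over the workers'
    ternary messages, yielding a sign-valued update direction."""
    return np.sign(np.sum(ternary_msgs, axis=0))

rng = np.random.default_rng(0)
grads = [rng.normal(size=6) for _ in range(5)]       # one gradient per worker
msgs = [ternary_compress(g, bound=3.0, rng=rng) for g in grads]
step_dir = majority_vote(msgs)                        # robust update direction
```

Because each worker transmits only a ternary symbol per coordinate, the scheme compresses communication, and the vote limits how much any single (possibly Byzantine) worker can move the aggregate.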