Threshold KNN-Shapley: A Linear-Time and Privacy-Friendly Approach to
Data Valuation
- URL: http://arxiv.org/abs/2308.15709v2
- Date: Sun, 26 Nov 2023 04:32:25 GMT
- Title: Threshold KNN-Shapley: A Linear-Time and Privacy-Friendly Approach to
Data Valuation
- Authors: Jiachen T. Wang, Yuqing Zhu, Yu-Xiang Wang, Ruoxi Jia, Prateek Mittal
- Abstract summary: Data valuation aims to quantify the usefulness of individual data sources in training machine learning (ML) models.
However, data valuation faces significant yet frequently overlooked privacy challenges despite its importance.
This paper studies these challenges with a focus on KNN-Shapley, one of the most practical data valuation methods nowadays.
- Score: 57.36638157108914
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data valuation aims to quantify the usefulness of individual data sources in
training machine learning (ML) models, and is a critical aspect of data-centric
ML research. However, data valuation faces significant yet frequently
overlooked privacy challenges despite its importance. This paper studies these
challenges with a focus on KNN-Shapley, one of the most practical data
valuation methods nowadays. We first emphasize the inherent privacy risks of
KNN-Shapley, and demonstrate the significant technical difficulties in adapting
KNN-Shapley to accommodate differential privacy (DP). To overcome these
challenges, we introduce TKNN-Shapley, a refined variant of KNN-Shapley that is
privacy-friendly, allowing for straightforward modifications to incorporate DP
guarantee (DP-TKNN-Shapley). We show that DP-TKNN-Shapley has several
advantages and offers a superior privacy-utility tradeoff compared to naively
privatized KNN-Shapley in discerning data quality. Moreover, even non-private
TKNN-Shapley achieves comparable performance as KNN-Shapley. Overall, our
findings suggest that TKNN-Shapley is a promising alternative to KNN-Shapley,
particularly for real-world applications involving sensitive data.
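The abstract does not spell out TKNN-Shapley's closed form, so the following Python sketch is only a toy stand-in for the ideas it describes, not the paper's method: each training point is scored by threshold-neighbor label agreement against a validation set, which costs O(n) per validation point in line with the linear-time claim, and a generic Gaussian-mechanism placeholder stands in for DP-TKNN-Shapley. All names and parameters here (tau, sigma, and the scoring rule itself) are illustrative assumptions.

```python
import numpy as np

def toy_threshold_scores(X_train, y_train, X_val, y_val, tau=1.0):
    """Toy stand-in for TKNN-Shapley (NOT the paper's closed form):
    a training point earns +1 for each validation point within distance
    tau whose label it matches, and -1 for each one it mismatches.
    Cost is O(n) distances per validation point."""
    scores = np.zeros(len(X_train))
    for x_v, y_v in zip(X_val, y_val):
        dists = np.linalg.norm(X_train - x_v, axis=1)  # O(n) per query
        within = dists <= tau                          # threshold neighborhood
        scores += within * np.where(y_train == y_v, 1.0, -1.0)
    return scores / len(X_val)

def privatize_scores(scores, sigma, rng=None):
    """Generic Gaussian-mechanism placeholder for DP-TKNN-Shapley.
    Calibrating sigma to an (epsilon, delta) guarantee requires the
    sensitivity analysis from the paper, which this sketch omits."""
    rng = np.random.default_rng() if rng is None else rng
    return scores + rng.normal(0.0, sigma, size=scores.shape)

# Usage on synthetic data (all numbers hypothetical):
rng = np.random.default_rng(0)
X_tr, y_tr = rng.normal(size=(200, 5)), rng.integers(0, 2, size=200)
X_va, y_va = rng.normal(size=(50, 5)), rng.integers(0, 2, size=50)
values = privatize_scores(
    toy_threshold_scores(X_tr, y_tr, X_va, y_va, tau=2.0), sigma=0.1, rng=rng)
```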
Related papers
- On the Privacy-Preserving Properties of Spiking Neural Networks with Unique Surrogate Gradients and Quantization Levels [0.0]
Membership inference attacks (MIAs) exploit model responses to infer whether specific data points were used during training.
Prior research suggests that spiking neural networks (SNNs) exhibit greater resilience to MIAs than artificial neural networks (ANNs).
This resilience stems from their non-differentiable activations and inherent inaccuracy, which obscure the correlation between model responses and individual training samples.
arXiv Detail & Related papers (2025-02-25T20:14:14Z)
- Assessing the Impact of Image Dataset Features on Privacy-Preserving Machine Learning [1.3604778572442302]
This study identifies image dataset characteristics that affect the utility and vulnerability of private and non-private Convolutional Neural Network (CNN) models.
We find that imbalanced datasets increase vulnerability in minority classes, but DP mitigates this issue.
arXiv Detail & Related papers (2024-09-02T15:30:27Z)
- Harnessing Neuron Stability to Improve DNN Verification [42.65507402735545]
We present VeriStable, a novel extension of the recently proposed DPLL-based constraint-solving approach to DNN verification.
We evaluate the effectiveness of VeriStable across a range of challenging benchmarks, including fully-connected feedforward networks (FNNs), convolutional neural networks (CNNs), and residual networks (ResNets).
Preliminary results show that VeriStable is competitive and outperforms state-of-the-art verification tools, including $\alpha,\beta$-CROWN and MN-BaB, the first- and second-place performers in the VNN-COMP, respectively.
arXiv Detail & Related papers (2024-01-19T23:48:04Z)
- A Survey on Privacy in Graph Neural Networks: Attacks, Preservation, and Applications [76.88662943995641]
Graph Neural Networks (GNNs) have gained significant attention owing to their ability to handle graph-structured data.
However, GNNs trained on sensitive graph data can leak private information. To address this issue, researchers have started to develop privacy-preserving GNNs.
Despite this progress, there is a lack of a comprehensive overview of the attacks and the techniques for preserving privacy in the graph domain.
arXiv Detail & Related papers (2023-08-31T00:31:08Z)
- Unraveling Privacy Risks of Individual Fairness in Graph Neural Networks [66.0143583366533]
Graph neural networks (GNNs) have gained significant attention due to their expansive real-world applications.
To build trustworthy GNNs, two aspects - fairness and privacy - have emerged as critical considerations.
Previous studies have separately examined the fairness and privacy aspects of GNNs, revealing their trade-off with GNN performance.
Yet, the interplay between these two aspects remains unexplored.
arXiv Detail & Related papers (2023-01-30T14:52:23Z)
- A Comprehensive Survey on Trustworthy Graph Neural Networks: Privacy, Robustness, Fairness, and Explainability [59.80140875337769]
Graph Neural Networks (GNNs) have made rapid developments in recent years.
However, GNNs can leak private information, are vulnerable to adversarial attacks, and can inherit and magnify societal bias from training data.
This paper gives a comprehensive survey of GNNs covering the computational aspects of privacy, robustness, fairness, and explainability.
arXiv Detail & Related papers (2022-04-18T21:41:07Z)
- Investigating Trade-offs in Utility, Fairness and Differential Privacy in Neural Networks [7.6146285961466]
Machine learning algorithms must be fair and protect the privacy of those whose data are being used.
However, implementing privacy and fairness constraints might come at the cost of utility.
This paper investigates the privacy-utility-fairness trade-off in neural networks.
arXiv Detail & Related papers (2021-02-11T12:33:19Z)
- Towards Scalable and Privacy-Preserving Deep Neural Network via Algorithmic-Cryptographic Co-design [28.789702559193675]
We propose SPNN - a Scalable and Privacy-preserving deep Neural Network learning framework.
From a cryptographic perspective, we propose using two types of cryptographic techniques, i.e., secret sharing and homomorphic encryption (a toy secret-sharing sketch appears after this list).
Experimental results conducted on real-world datasets demonstrate the superiority of SPNN.
arXiv Detail & Related papers (2020-12-17T02:26:16Z)
- Robustness Threats of Differential Privacy [70.818129585404]
We experimentally demonstrate that networks trained with differential privacy can, in some settings, be even more vulnerable than their non-private counterparts.
We study how the main ingredients of differentially private neural network training, such as gradient clipping and noise addition, affect the robustness of the model (a minimal sketch of these two steps appears after this list).
arXiv Detail & Related papers (2020-12-14T18:59:24Z)
- Industrial Scale Privacy Preserving Deep Neural Network [23.690146141150407]
We propose an industrial-scale privacy-preserving neural network learning paradigm, which is secure against semi-honest adversaries.
We conduct experiments on a real-world fraud detection dataset and a financial distress prediction dataset.
arXiv Detail & Related papers (2020-03-11T10:15:37Z)
- CryptoSPN: Privacy-preserving Sum-Product Network Inference [84.88362774693914]
We present a framework for privacy-preserving inference of sum-product networks (SPNs).
CryptoSPN achieves highly efficient and accurate inference in the order of seconds for medium-sized SPNs.
arXiv Detail & Related papers (2020-02-03T14:49:18Z)
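The SPNN entry above names secret sharing as one of its two cryptographic building blocks. Below is a textbook two-party additive secret-sharing sketch, not SPNN's actual protocol (its homomorphic-encryption component is omitted entirely); the modulus P is an assumed public parameter.

```python
import secrets

P = 2**61 - 1  # public prime modulus (assumed parameter)

def share(x):
    """Split x into two additive shares, each individually uniform
    and hence revealing nothing about x on its own."""
    r = secrets.randbelow(P)
    return r, (x - r) % P

def reconstruct(s0, s1):
    """Recombine the two shares to recover the secret."""
    return (s0 + s1) % P

# Each party adds its shares locally; reconstructing the results yields
# x + y without either party ever seeing the other's input.
x0, x1 = share(42)
y0, y1 = share(100)
assert reconstruct((x0 + y0) % P, (x1 + y1) % P) == 142
```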
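Likewise, the "Robustness Threats of Differential Privacy" entry names the two standard ingredients of differentially private training: per-example gradient clipping and noise addition. The sketch below shows just those two steps in the style of DP-SGD; it is a generic illustration rather than code from that paper, and clip_norm and noise_multiplier are assumed hyperparameters.

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """One privatized gradient aggregation (DP-SGD style):
    1) clip each example's gradient to L2 norm <= clip_norm, bounding any
       single example's influence on the update (the sensitivity);
    2) add Gaussian noise with std clip_norm * noise_multiplier;
    3) average over the batch."""
    rng = np.random.default_rng() if rng is None else rng
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale                       # step 1
    noise = rng.normal(0.0, clip_norm * noise_multiplier,
                       size=per_example_grads.shape[1])       # step 2
    return (clipped.sum(axis=0) + noise) / len(per_example_grads)  # step 3
```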