Correlated Differential Privacy: Feature Selection in Machine Learning
- URL: http://arxiv.org/abs/2010.03094v1
- Date: Wed, 7 Oct 2020 00:33:24 GMT
- Title: Correlated Differential Privacy: Feature Selection in Machine Learning
- Authors: Tao Zhang, Tianqing Zhu, Ping Xiong, Huan Huo, Zahir Tari, Wanlei Zhou
- Abstract summary: The proposed scheme involves five steps with the goals of managing the extent of data correlation, preserving privacy, and supporting accuracy in the prediction results.
Experiments show that the proposed scheme produces better prediction results in machine learning tasks and lower mean squared errors for data queries than existing schemes.
- Score: 13.477069421691562
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Privacy preservation in machine learning is a crucial issue in industrial
informatics, since the data used for training in industry usually contain
sensitive information. Existing differentially private machine learning
algorithms have not considered the impact of data correlation, which may lead
to more privacy leakage than expected in industrial applications. For example,
data collected for traffic monitoring may contain correlated records due to
temporal correlation or user correlation. To fill this gap, we propose a
correlation reduction scheme with differentially private feature selection that
addresses the privacy loss incurred when data in machine learning tasks are
correlated. The key to the proposed scheme is to describe the data correlation
and select features that lead to less data correlation across the whole
dataset. The proposed scheme involves five steps with the goals of managing the
extent of data correlation, preserving privacy, and supporting accuracy in the
prediction results. In this way, the impact of data correlation is mitigated by
the proposed feature selection scheme, and privacy is guaranteed even when the
data are correlated. The proposed method can be widely applied in machine
learning algorithms that provide services in industrial settings. Experiments
show that the proposed scheme produces better prediction results in machine
learning tasks and lower mean squared errors for data queries than existing
schemes.
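The abstract does not spell out the five steps, so the following is only a minimal sketch of the general idea behind differentially private, correlation-aware feature selection: score each feature by its average absolute correlation with the other features, then sample low-correlation features with the exponential mechanism. The function name, the uniform budget split, and the unit sensitivity are illustrative assumptions, not the authors' scheme.

```python
import numpy as np

def dp_select_features(X, epsilon, k, sensitivity=1.0, rng=None):
    """Toy correlation-aware DP feature selection (illustrative only).

    Utility of a feature = negative mean |Pearson correlation| with the
    other features, so weakly correlated features are preferred. k
    features are drawn with the exponential mechanism, splitting the
    privacy budget uniformly across rounds.
    """
    rng = np.random.default_rng() if rng is None else rng
    corr = np.abs(np.corrcoef(X, rowvar=False))  # feature-feature |corr|
    np.fill_diagonal(corr, 0.0)
    utility = -corr.mean(axis=1)
    remaining = list(range(X.shape[1]))
    eps_round = epsilon / k  # naive sequential-composition split
    chosen = []
    for _ in range(k):
        u = utility[remaining]
        # Exponential mechanism: Pr[i] proportional to
        # exp(eps * u(i) / (2 * sensitivity)); shift by max for stability.
        w = np.exp(eps_round * (u - u.max()) / (2.0 * sensitivity))
        p = w / w.sum()
        i = rng.choice(len(remaining), p=p)
        chosen.append(remaining.pop(i))
    return chosen

# Example: privately pick 3 of 10 synthetic features with epsilon = 1.0.
X = np.random.default_rng(0).normal(size=(200, 10))
print(dp_select_features(X, epsilon=1.0, k=3))
```

For a real guarantee, the utility's sensitivity would have to be bounded against the addition or removal of a single record rather than fixed at 1.0.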
Related papers
- Causal Discovery Under Local Privacy [5.7835417207193585]
Local differential privacy is a variant that allows data providers to apply the privatization mechanism themselves on their data individually.
We consider various well-known locally differentially private mechanisms and compare the trade-off between the privacy they provide and the accuracy of the causal structure produced by causal learning algorithms (a minimal local-DP sketch follows this entry).
arXiv Detail & Related papers (2023-11-07T14:44:27Z)
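As a companion to this entry, here is a minimal sketch of the local model it describes, using the classic Laplace mechanism: each provider clips and perturbs its own value before release, so no raw record ever leaves the provider. The function name and bounds are hypothetical, not taken from the paper.

```python
import numpy as np

def local_laplace(value, epsilon, lower=0.0, upper=1.0, rng=None):
    """Locally differentially private release of one bounded value.
    The provider itself clips to [lower, upper] and adds Laplace noise
    calibrated to the range, so the collector never sees raw data."""
    rng = np.random.default_rng() if rng is None else rng
    clipped = min(max(value, lower), upper)
    return clipped + rng.laplace(scale=(upper - lower) / epsilon)

# Each user privatizes locally; the untrusted collector just averages.
rng = np.random.default_rng(1)
true_values = rng.uniform(size=1000)
reports = [local_laplace(v, epsilon=0.5, rng=rng) for v in true_values]
print(np.mean(reports))  # noisy but unbiased estimate of the true mean
```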
- Differentially Private Linear Regression with Linked Data [3.9325957466009203]
Differential privacy, a mathematical notion from computer science, is a rising tool offering robust privacy guarantees.
Recent work focuses on developing differentially private versions of individual statistical and machine learning tasks.
We present two differentially private algorithms for linear regression with linked data.
arXiv Detail & Related papers (2023-08-01T21:00:19Z)
- Privacy-Preserving Graph Machine Learning from Data to Computation: A Survey [67.7834898542701]
We focus on reviewing privacy-preserving techniques of graph machine learning.
We first review methods for generating privacy-preserving graph data.
Then we describe methods for transmitting privacy-preserved information.
arXiv Detail & Related papers (2023-07-10T04:30:23Z)
- Approximate, Adapt, Anonymize (3A): a Framework for Privacy Preserving Training Data Release for Machine Learning [3.29354893777827]
We introduce a data release framework, 3A (Approximate, Adapt, Anonymize), to maximize data utility for machine learning.
We present experimental evidence showing minimal discrepancy between performance metrics of models trained on real versus privatized datasets.
arXiv Detail & Related papers (2023-07-04T18:37:11Z)
- Striving for data-model efficiency: Identifying data externalities on group performance [75.17591306911015]
Building trustworthy, effective, and responsible machine learning systems hinges on understanding how differences in training data and modeling decisions interact to impact predictive performance.
We focus on a particular type of data-model inefficiency, in which adding training data from some sources can actually lower performance evaluated on key sub-groups of the population.
Our results indicate that data-efficiency is a key component of both accurate and trustworthy machine learning.
arXiv Detail & Related papers (2022-11-11T16:48:27Z)
- Privacy-Preserving Machine Learning for Collaborative Data Sharing via Auto-encoder Latent Space Embeddings [57.45332961252628]
Privacy-preserving machine learning in data-sharing processes is an increasingly critical task.
This paper presents an innovative framework that uses Representation Learning via autoencoders to generate privacy-preserving embedded data.
arXiv Detail & Related papers (2022-11-10T17:36:58Z)
- Mixed Differential Privacy in Computer Vision [133.68363478737058]
AdaMix is an adaptive differentially private algorithm for training deep neural network classifiers using both private and public image data.
A few-shot or even zero-shot learning baseline that ignores private data can outperform fine-tuning on a large private dataset.
arXiv Detail & Related papers (2022-03-22T06:15:43Z)
- On Deep Learning with Label Differential Privacy [54.45348348861426]
We study the multi-class classification setting where the labels are considered sensitive and ought to be protected.
We propose a new algorithm for training deep neural networks with label differential privacy, and run evaluations on several datasets (a baseline sketch follows this entry).
arXiv Detail & Related papers (2021-02-11T15:09:06Z)
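The abstract above does not detail the proposed algorithm, but a standard baseline for label differential privacy, k-ary randomized response applied to the labels only, is easy to sketch. The code below is that baseline, not the paper's method.

```python
import numpy as np

def rr_label(y, num_classes, epsilon, rng=None):
    """k-ary randomized response on a label: keep the true class with
    probability e^eps / (e^eps + k - 1), otherwise output a uniformly
    random other class. Only labels are protected; features are not."""
    rng = np.random.default_rng() if rng is None else rng
    keep = np.exp(epsilon) / (np.exp(epsilon) + num_classes - 1)
    if rng.random() < keep:
        return y
    others = [c for c in range(num_classes) if c != y]
    return int(rng.choice(others))

# Privatize labels once, then train any model on the noisy labels.
rng = np.random.default_rng(2)
labels = rng.integers(0, 10, size=5)
print([rr_label(int(y), num_classes=10, epsilon=2.0, rng=rng) for y in labels])
```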
- Graph-Homomorphic Perturbations for Private Decentralized Learning [64.26238893241322]
The local exchange of estimates allows private data to be inferred.
Perturbations chosen independently at every agent result in a significant performance loss.
We propose an alternative scheme, which constructs perturbations according to a particular nullspace condition, allowing them to be invisible to the network centroid (a toy sketch of this cancellation idea follows this entry).
arXiv Detail & Related papers (2020-10-23T10:35:35Z)
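The cancellation idea in the entry above can be illustrated with a toy construction (not the paper's graph-homomorphic scheme): draw noise per agent, then subtract the mean so the perturbations sum to zero and leave the network-average estimate untouched.

```python
import numpy as np

def zero_sum_perturbations(num_agents, dim, scale=1.0, rng=None):
    """Toy 'invisible to the centroid' noise: per-agent perturbations
    projected onto the nullspace of the averaging operator, i.e. they
    sum to zero across agents, so the network average is unchanged."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.laplace(scale=scale, size=(num_agents, dim))
    return noise - noise.mean(axis=0, keepdims=True)

# Each agent adds its row to the estimate it shares with its neighbors.
perturbs = zero_sum_perturbations(num_agents=5, dim=3)
print(np.allclose(perturbs.sum(axis=0), 0.0))  # True: centroid unaffected
```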
- A Critical Overview of Privacy-Preserving Approaches for Collaborative Forecasting [0.0]
Cooperation between different data owners may lead to an improvement in forecast quality.
Due to competitive business factors and personal data protection concerns, these data owners might be unwilling to share their data.
This paper analyses the state-of-the-art and unveils several shortcomings of existing methods in guaranteeing data privacy.
arXiv Detail & Related papers (2020-04-20T20:21:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.