Privacy-Preserving Public Release of Datasets for Support Vector Machine
Classification
- URL: http://arxiv.org/abs/1912.12576v1
- Date: Sun, 29 Dec 2019 03:32:38 GMT
- Title: Privacy-Preserving Public Release of Datasets for Support Vector Machine
Classification
- Authors: Farhad Farokhi
- Abstract summary: We consider the problem of publicly releasing a dataset for support vector machine classification while not infringing on the privacy of data subjects.
The dataset is systematically obfuscated using additive noise for privacy protection.
Conditions are established for ensuring that the classifier extracted from the original dataset and the obfuscated one are close to each other.
- Score: 14.095523601311374
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of publicly releasing a dataset for support vector
machine classification while not infringing on the privacy of data subjects
(i.e., individuals whose private information is stored in the dataset). The
dataset is systematically obfuscated using additive noise for privacy
protection. Motivated by the Cramér-Rao bound, the inverse of the trace of the
Fisher information matrix is used as a measure of privacy. Conditions are
established for ensuring that the classifier extracted from the original
dataset and the obfuscated one are close to each other (capturing the utility).
The optimal noise distribution is determined by maximizing a weighted sum of
the measures of privacy and utility. The optimal privacy-preserving noise is
proved to achieve local differential privacy. The results are generalized to a
broader class of optimization-based supervised machine learning algorithms.
Applicability of the methodology is demonstrated on multiple datasets.
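The pipeline lends itself to a compact sketch. The example below assumes isotropic Gaussian noise, a synthetic dataset, and scikit-learn's LinearSVC, none of which are the paper's actual experimental setup; it obfuscates the features, trains an SVM on both versions of the data, and reports the Fisher-information-based privacy measure alongside the drift between the two classifiers as a utility proxy.

```python
# Sketch: obfuscate a dataset with additive Gaussian noise, then check how
# far the SVM trained on the released data drifts from the SVM trained on
# the original data. Dataset, noise, and solver are illustrative choices.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

sigma = 0.3                                  # noise scale (the privacy knob)
X_released = X + rng.normal(0.0, sigma, size=X.shape)

# For noise N(0, sigma^2 I), the Fisher information about a record is
# (1/sigma^2) I, so the paper's privacy measure (the inverse of the trace
# of the Fisher information matrix) is sigma^2 / d: more noise, more privacy.
privacy = sigma**2 / X.shape[1]

svm_orig = LinearSVC(dual=False).fit(X, y)
svm_rel = LinearSVC(dual=False).fit(X_released, y)

# Utility proxy: distance between the two classifiers' parameters.
drift = np.linalg.norm(svm_orig.coef_ - svm_rel.coef_)
print(f"privacy measure: {privacy:.4f}, classifier drift: {drift:.4f}")
```

The paper goes further by optimizing the noise distribution itself against a weighted sum of these two quantities; the Gaussian here is just one admissible choice.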
Related papers
- Differentially Private Random Feature Model [52.468511541184895]
We produce a differentially private random feature model for privacy-preserving kernel machines.
We show that our method preserves privacy and derive a generalization error bound for the method.
arXiv Detail & Related papers (2024-12-06T05:31:08Z)
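As a rough illustration of the random-feature approach, the sketch below maps inputs through random Fourier features (approximating an RBF kernel), fits ridge regression in the feature space, and adds privacy by perturbing the learned weights. The output-perturbation step and its noise scale are assumptions for illustration, not the cited paper's calibrated mechanism.

```python
# Random Fourier features approximating an RBF kernel, followed by ridge
# regression and (illustrative) output perturbation for privacy.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = np.sin(X.sum(axis=1))                    # toy regression target

D, gamma = 100, 0.5                          # feature count, kernel width
W = rng.normal(scale=np.sqrt(2 * gamma), size=(X.shape[1], D))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)
Z = np.sqrt(2.0 / D) * np.cos(X @ W + b)     # random feature map

lam = 1.0                                    # ridge regularizer
w = np.linalg.solve(Z.T @ Z + lam * np.eye(D), Z.T @ y)

# Placeholder output perturbation; a real DP guarantee would calibrate
# this scale to the solution's sensitivity and the privacy budget.
w_private = w + rng.normal(0.0, 0.05, size=D)
```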
- RASE: Efficient Privacy-preserving Data Aggregation against Disclosure Attacks for IoTs [2.1765174838950494]
We study a new paradigm for collecting and protecting the data produced by an ever-growing number of sensor devices.
Most previous studies on co-design of data aggregation and privacy preservation assume that a trusted fusion center adheres to privacy regimes.
We propose a novel paradigm (called RASE), which can be generalized into a 3-step sequential procedure: noise addition, followed by random permutation, and then parameter estimation (sketched in the toy example below).
arXiv Detail & Related papers (2024-05-31T15:21:38Z)
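A toy rendition of RASE's 3-step sequence, with hypothetical sensor readings, Laplace noise, and a simple mean estimate standing in for the paper's actual choices:

```python
# Toy 3-step pipeline: local noise addition, random permutation to unlink
# values from their senders, then parameter estimation at the aggregator.
import numpy as np

rng = np.random.default_rng(2)
readings = rng.normal(loc=20.0, scale=2.0, size=1000)  # hypothetical sensors

noisy = readings + rng.laplace(scale=1.0, size=readings.shape)  # step 1
shuffled = rng.permutation(noisy)                               # step 2
estimate = shuffled.mean()                                      # step 3
print(f"true mean 20.0, estimate {estimate:.2f}")
```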
- Privacy Amplification for the Gaussian Mechanism via Bounded Support [64.86780616066575]
Data-dependent privacy accounting frameworks such as per-instance differential privacy (pDP) and Fisher information loss (FIL) confer fine-grained privacy guarantees for individuals in a fixed training dataset.
We propose simple modifications of the Gaussian mechanism with bounded support, showing that they amplify privacy guarantees under data-dependent accounting.
arXiv Detail & Related papers (2024-03-07T21:22:07Z)
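The flavor of a bounded-support Gaussian mechanism can be sketched as follows; the truncation interval, noise scale, and rejection-sampling construction are illustrative stand-ins, not the paper's exact modification or its data-dependent accounting.

```python
# Release a bounded query answer with Gaussian noise truncated to a fixed
# interval by rejection sampling, so the output support is bounded.
import numpy as np

def bounded_gaussian(value, sigma, lo, hi, rng):
    """Resample Gaussian noise until the release lands inside [lo, hi]."""
    while True:
        out = value + rng.normal(0.0, sigma)
        if lo <= out <= hi:
            return out

rng = np.random.default_rng(3)
data = rng.uniform(0.0, 1.0, size=100)
release = bounded_gaussian(data.mean(), sigma=0.1, lo=0.0, hi=1.0, rng=rng)
print(f"released mean: {release:.3f}")
```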
- Privacy-Preserving Matrix Factorization for Recommendation Systems using Gaussian Mechanism [2.84279467589473]
We propose a privacy-preserving recommendation system based on the differential privacy framework and matrix factorization.
Because differential privacy is a powerful and robust mathematical framework for designing privacy-preserving machine learning algorithms, adversaries can be prevented from extracting sensitive user information.
arXiv Detail & Related papers (2023-04-11T13:50:39Z)
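A minimal sketch of the ingredient this builds on, matrix factorization trained with clipped, Gaussian-perturbed gradients; the clipping threshold and noise scale are placeholders rather than a calibrated privacy budget.

```python
# Matrix factorization by gradient descent with clipped, Gaussian-noised
# gradients; clip and sigma are placeholders, not a calibrated budget.
import numpy as np

rng = np.random.default_rng(4)
R = rng.integers(1, 6, size=(50, 40)).astype(float)   # toy rating matrix
k, lr, clip, sigma = 8, 0.01, 1.0, 0.1
U = rng.normal(scale=0.1, size=(R.shape[0], k))
V = rng.normal(scale=0.1, size=(R.shape[1], k))

for _ in range(200):
    err = R - U @ V.T
    gU, gV = -err @ V, -err.T @ U                     # squared-error gradients
    for g in (gU, gV):
        g *= min(1.0, clip / (np.linalg.norm(g) + 1e-12))  # clip in place
        g += rng.normal(0.0, sigma * clip, size=g.shape)   # Gaussian noise
    U -= lr * gU
    V -= lr * gV
print(f"final RMSE: {np.sqrt(((R - U @ V.T) ** 2).mean()):.3f}")
```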
- Privacy-preserving Non-negative Matrix Factorization with Outliers [2.84279467589473]
We focus on developing a non-negative matrix factorization algorithm within a privacy-preserving framework.
We propose a novel privacy-preserving algorithm for non-negative matrix factorization capable of operating on private data.
We demonstrate the proposed framework's performance on six real datasets.
arXiv Detail & Related papers (2022-11-02T19:42:18Z)
- Smooth Anonymity for Sparse Graphs [69.1048938123063]
Differential privacy has emerged as the gold standard of privacy; however, it faces challenges when it comes to sharing sparse datasets.
In this work, we consider a variation of $k$-anonymity, which we call smooth-$k$-anonymity, and design simple large-scale algorithms that efficiently provide smooth-$k$-anonymity.
arXiv Detail & Related papers (2022-07-13T17:09:25Z)
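For contrast with the smooth variant, here is plain k-anonymity by suppression on tabular quasi-identifiers; the paper's smooth-k-anonymity for sparse graphs is a more involved relaxation that this sketch does not reproduce.

```python
# Plain k-anonymity by suppression: quasi-identifier combinations that
# occur fewer than k times are masked before release.
from collections import Counter

k = 3
rows = [("30-39", "NY"), ("30-39", "NY"), ("30-39", "NY"),
        ("40-49", "CA"), ("20-29", "TX")]   # (age range, state) per record

counts = Counter(rows)
released = [r if counts[r] >= k else ("*", "*") for r in rows]
print(released)   # rare combinations are suppressed
```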
- Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent [69.14164921515949]
We characterize privacy guarantees for individual examples when releasing models trained by DP-SGD.
We find that most examples enjoy stronger privacy guarantees than the worst-case bound.
This implies groups that are underserved in terms of model utility simultaneously experience weaker privacy guarantees.
arXiv Detail & Related papers (2022-06-06T13:49:37Z)
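The intuition behind per-example accounting can be shown numerically: in DP-SGD the noise standard deviation is fixed at sigma times the clipping norm C, so an example whose gradients stay well below C is hidden behind proportionally more noise. The gradient norms below are made up for illustration.

```python
# Per-example view of DP-SGD: the added noise has stddev sigma * C, so an
# example contributing a small clipped gradient enjoys a larger effective
# noise multiplier, i.e., a stronger individual guarantee.
import numpy as np

C, sigma = 1.0, 2.0                        # clipping norm, noise multiplier
grad_norms = np.array([1.4, 0.9, 0.2])     # made-up per-example norms
clipped = np.minimum(grad_norms, C)

effective_sigma = sigma * C / clipped      # per-example noise-to-signal
print(effective_sigma)                     # smallest gradient -> ~10x sigma
```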
- Mixed Differential Privacy in Computer Vision [133.68363478737058]
AdaMix is an adaptive differentially private algorithm for training deep neural network classifiers using both private and public image data.
A few-shot or even zero-shot learning baseline that ignores private data can outperform fine-tuning on a large private dataset.
arXiv Detail & Related papers (2022-03-22T06:15:43Z)
- Graph-Homomorphic Perturbations for Private Decentralized Learning [64.26238893241322]
Local exchange of estimates allows inference of the private data on which those estimates are based.
Perturbations chosen independently at every agent result in a significant performance loss.
We propose an alternative scheme, which constructs perturbations according to a particular nullspace condition, allowing them to remain invisible to the aggregate learning process.
arXiv Detail & Related papers (2020-10-23T10:35:35Z)
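The nullspace condition can be illustrated in a simplified, fully connected setting: draw perturbations constrained to sum to zero across agents, so each local estimate is masked while the network average is untouched. This sketches the general idea, not the paper's graph-dependent construction.

```python
# Perturbations drawn in the nullspace of the averaging operation: they
# sum to zero across agents, masking individual estimates while leaving
# the network-wide average exactly unchanged.
import numpy as np

rng = np.random.default_rng(5)
n_agents, dim = 5, 3
estimates = rng.normal(size=(n_agents, dim))   # agents' local estimates

noise = rng.normal(size=(n_agents, dim))
noise -= noise.mean(axis=0, keepdims=True)     # project onto the sum-zero space

masked = estimates + noise
assert np.allclose(masked.mean(axis=0), estimates.mean(axis=0))
```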
- Bounding, Concentrating, and Truncating: Unifying Privacy Loss Composition for Data Analytics [2.614355818010333]
We provide strong privacy loss bounds when an analyst may select pure DP, bounded-range (e.g., exponential mechanisms), or concentrated DP mechanisms in any order.
We also provide optimal privacy loss bounds that apply when an analyst can select pure DP and bounded-range mechanisms in a batch.
arXiv Detail & Related papers (2020-04-15T17:33:10Z)
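For context, the two classical baselines that such bounds improve on can be computed directly: basic composition adds the epsilons, while advanced composition trades a small delta for a square-root dependence on the number of mechanisms. The budget values below are arbitrary.

```python
# Basic vs. advanced composition for T adaptively chosen eps-DP mechanisms.
import math

T, eps, delta = 50, 0.1, 1e-6
basic = T * eps
advanced = math.sqrt(2 * T * math.log(1 / delta)) * eps \
    + T * eps * (math.exp(eps) - 1)
print(f"basic: {basic:.2f}, advanced: {advanced:.2f}")  # 5.00 vs ~4.24
```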
- Privacy-Preserving Boosting in the Local Setting [17.375582978294105]
In machine learning, boosting is one of the most popular methods, designed to combine multiple base learners into a superior one.
In the big data era, the data held by individuals and entities, such as personal images, browsing history, and census information, are more likely to contain sensitive information.
Local Differential Privacy has been proposed as an effective privacy-protection approach, offering a strong guarantee to data owners.
arXiv Detail & Related papers (2020-02-06T04:48:51Z)
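A minimal example of the local model that such schemes build on is classic randomized response: each data owner perturbs their own bit before it leaves their device, and the collector debiases the aggregate. This illustrates local differential privacy in general, not the boosting protocol itself.

```python
# Randomized response: keep the true bit with probability p = e^eps/(e^eps+1),
# flip it otherwise; the collector inverts the bias on the aggregate.
import numpy as np

rng = np.random.default_rng(6)
eps = 1.0
p = np.exp(eps) / (np.exp(eps) + 1)

bits = rng.integers(0, 2, size=10_000)       # hypothetical private bits
keep = rng.random(bits.shape) < p
reported = np.where(keep, bits, 1 - bits)

# E[reported] = (2p - 1) * mean + (1 - p), so invert for an unbiased estimate.
estimate = (reported.mean() - (1 - p)) / (2 * p - 1)
print(f"true {bits.mean():.3f}, estimated {estimate:.3f}")
```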