Distributionally-Robust Machine Learning Using Locally
Differentially-Private Data
- URL: http://arxiv.org/abs/2006.13488v1
- Date: Wed, 24 Jun 2020 05:12:10 GMT
- Title: Distributionally-Robust Machine Learning Using Locally
Differentially-Private Data
- Authors: Farhad Farokhi
- Abstract summary: We consider machine learning, particularly regression, using locally-differentially private datasets.
We show that machine learning with locally-differentially private datasets can be rewritten as a distributionally-robust optimization.
- Score: 14.095523601311374
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider machine learning, particularly regression, using
locally-differentially private datasets. The Wasserstein distance is used to
define an ambiguity set centered at the empirical distribution of the dataset
corrupted by local differential privacy noise. The ambiguity set is shown to
contain the probability distribution of unperturbed, clean data. The radius of
the ambiguity set is a function of the privacy budget, spread of the data, and
the size of the problem. Hence, machine learning with locally-differentially
private datasets can be rewritten as a distributionally-robust optimization.
For general distributions, the distributionally-robust optimization problem can be
relaxed to a regularized machine learning problem with the Lipschitz constant
of the machine learning model as a regularizer. For linear and logistic
regression, this regularizer is the dual norm of the model parameters. For
Gaussian data, the distributionally-robust optimization problem can be solved
exactly to find an optimal regularizer. This approach results in an entirely
new regularizer for training linear regression models. Training with this novel
regularizer can be posed as a semi-definite program. Finally, the performance
of the proposed distributionally-robust machine learning training is
demonstrated on practical datasets.
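As a rough illustration of the pipeline the abstract describes, the sketch below first perturbs each feature record with Laplace noise (one standard way to obtain local differential privacy) and then trains a linear regression with a dual-norm penalty on the model parameters, mirroring the relaxation mentioned above. This is a minimal sketch under stated assumptions, not the paper's method: the function names, the choices of epsilon, sensitivity, and the penalty weight `radius` are illustrative, and the paper's exact ambiguity-set radius and the semi-definite program for Gaussian data are not reproduced here.

```python
import numpy as np


def laplace_ldp(x, epsilon, sensitivity):
    """Perturb each entry with Laplace noise calibrated to the given
    sensitivity and privacy budget (a standard local-DP mechanism)."""
    scale = sensitivity / epsilon
    return x + np.random.laplace(loc=0.0, scale=scale, size=x.shape)


def fit_dual_norm_regularized(X, y, radius, p=2.0, lr=0.05, steps=2000):
    """Linear regression penalized by the dual norm of the weights.

    `radius` plays the role of the Wasserstein ambiguity-set radius; its
    exact dependence on the privacy budget, data spread, and problem size
    is derived in the paper and is NOT reproduced here (hypothetical value).
    """
    n, d = X.shape
    q = p / (p - 1.0)          # dual exponent of the chosen p-norm
    w = np.zeros(d)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n                  # squared-loss gradient
        norm_q = np.linalg.norm(w, ord=q)
        if norm_q > 0:
            sub = np.sign(w) * (np.abs(w) / norm_q) ** (q - 1.0)
        else:
            sub = np.zeros(d)                         # subgradient at the origin
        w = w - lr * (grad + radius * sub)
    return w


# Toy usage: privatize the features locally, then train the regularized model.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=500)
X_priv = laplace_ldp(X, epsilon=1.0, sensitivity=1.0)      # hypothetical budget/sensitivity
w_hat = fit_dual_norm_regularized(X_priv, y, radius=0.05)  # hypothetical radius
```

For p = 2 the dual norm is again the Euclidean norm, so the penalty resembles an (unsquared) ridge term; other choices of p recover the dual-norm regularizers mentioned for linear and logistic regression.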
Related papers
- Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions.
We propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance.
Our framework is tested empirically over clean and noisy datasets.
arXiv Detail & Related papers (2024-05-29T01:32:17Z)
- Differentially Private Post-Processing for Fair Regression [13.855474876965557]
Our algorithm can be applied to post-process any given regressor to improve fairness by remapping its outputs.
We analyze the sample complexity of our algorithm and provide a fairness guarantee, revealing a trade-off between the statistical bias and variance induced by the choice of the number of bins in the histogram.
arXiv Detail & Related papers (2024-05-07T06:09:37Z)
- Private Gradient Descent for Linear Regression: Tighter Error Bounds and Instance-Specific Uncertainty Estimation [17.84129947587373]
We provide an improved analysis of standard differentially private gradient descent for linear regression under the squared error loss.
Our analysis leads to new results on the algorithm's accuracy.
arXiv Detail & Related papers (2024-02-21T04:58:41Z)
- Federated Learning with Projected Trajectory Regularization [65.6266768678291]
Federated learning enables joint training of machine learning models from distributed clients without sharing their local data.
One key challenge in federated learning is to handle non-identically distributed data across the clients.
We propose a novel federated learning framework with projected trajectory regularization (FedPTR) for tackling the data issue.
arXiv Detail & Related papers (2023-12-22T02:12:08Z)
- Optimal Regularization for a Data Source [8.38093977965175]
It is common to augment criteria that enforce data fidelity with a regularizer that promotes desired structure in the solution.
In this paper we seek a systematic understanding of the power and the limitations of convex regularization.
arXiv Detail & Related papers (2022-12-27T20:11:59Z)
- Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to a logistic regression, may be learned from aggregated data only by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that the model learned this way can achieve performances comparable to those of a logistic model trained with the full unaggregated data.
arXiv Detail & Related papers (2022-10-05T09:17:27Z)
- Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning [78.83598532168256]
Marginal-likelihood based model-selection is rarely used in deep learning due to estimation difficulties.
Our work shows that marginal likelihoods can improve generalization and be useful when validation data is unavailable.
arXiv Detail & Related papers (2021-04-11T09:50:24Z)
- Strongly universally consistent nonparametric regression and classification with privatised data [2.879036956042183]
We revisit the classical problem of nonparametric regression, but impose local differential privacy constraints.
We design a novel estimator of the regression function, which can be viewed as a privatised version of the well-studied partitioning regression estimator.
arXiv Detail & Related papers (2020-10-31T09:00:43Z)
- Graph Embedding with Data Uncertainty [113.39838145450007]
Spectral-based subspace learning is a common data preprocessing step in many machine learning pipelines.
Most subspace learning methods do not take into consideration possible measurement inaccuracies or artifacts that can lead to data with high uncertainty.
arXiv Detail & Related papers (2020-09-01T15:08:23Z)
- Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
Proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z)