Distributionally-Robust Machine Learning Using Locally
Differentially-Private Data
- URL: http://arxiv.org/abs/2006.13488v1
- Date: Wed, 24 Jun 2020 05:12:10 GMT
- Title: Distributionally-Robust Machine Learning Using Locally
Differentially-Private Data
- Authors: Farhad Farokhi
- Abstract summary: We consider machine learning, particularly regression, using locally-differentially private datasets.
We show that machine learning with locally-differentially private datasets can be rewritten as a distributionally-robust optimization.
- Score: 14.095523601311374
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider machine learning, particularly regression, using
locally-differentially private datasets. The Wasserstein distance is used to
define an ambiguity set centered at the empirical distribution of the dataset
corrupted by local differential privacy noise. The ambiguity set is shown to
contain the probability distribution of unperturbed, clean data. The radius of
the ambiguity set is a function of the privacy budget, spread of the data, and
the size of the problem. Hence, machine learning with locally-differentially
private datasets can be rewritten as a distributionally-robust optimization.
For general distributions, the distributionally-robust optimization problem can be
relaxed to a regularized machine learning problem with the Lipschitz constant
of the machine learning model as a regularizer. For linear and logistic
regression, this regularizer is the dual norm of the model parameters. For
Gaussian data, the distributionally-robust optimization problem can be solved
exactly to find an optimal regularizer. This approach results in an entirely
new regularizer for training linear regression models. Training with this novel
regularizer can be posed as a semi-definite program. Finally, the performance
of the proposed distributionally-robust machine learning training is
demonstrated on practical datasets.
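As a rough illustration of the pipeline the abstract describes, the sketch below first perturbs each feature record with Laplace noise (one standard way to obtain local differential privacy) and then trains a linear regression with a dual-norm penalty on the model parameters, mirroring the relaxation mentioned above. This is a minimal sketch under stated assumptions, not the paper's method: the function names, the choices of epsilon, sensitivity, and the penalty weight `radius` are illustrative, and the paper's exact ambiguity-set radius and the semi-definite program for Gaussian data are not reproduced here.

```python
import numpy as np


def laplace_ldp(x, epsilon, sensitivity):
    """Perturb each entry with Laplace noise calibrated to the given
    sensitivity and privacy budget (a standard local-DP mechanism)."""
    scale = sensitivity / epsilon
    return x + np.random.laplace(loc=0.0, scale=scale, size=x.shape)


def fit_dual_norm_regularized(X, y, radius, p=2.0, lr=0.05, steps=2000):
    """Linear regression penalized by the dual norm of the weights.

    `radius` plays the role of the Wasserstein ambiguity-set radius; its
    exact dependence on the privacy budget, data spread, and problem size
    is derived in the paper and is NOT reproduced here (hypothetical value).
    """
    n, d = X.shape
    q = p / (p - 1.0)          # dual exponent of the chosen p-norm
    w = np.zeros(d)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n                  # squared-loss gradient
        norm_q = np.linalg.norm(w, ord=q)
        if norm_q > 0:
            sub = np.sign(w) * (np.abs(w) / norm_q) ** (q - 1.0)
        else:
            sub = np.zeros(d)                         # subgradient at the origin
        w = w - lr * (grad + radius * sub)
    return w


# Toy usage: privatize the features locally, then train the regularized model.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=500)
X_priv = laplace_ldp(X, epsilon=1.0, sensitivity=1.0)      # hypothetical budget/sensitivity
w_hat = fit_dual_norm_regularized(X_priv, y, radius=0.05)  # hypothetical radius
```

For p = 2 the dual norm is again the Euclidean norm, so the penalty resembles an (unsquared) ridge term; other choices of p recover the dual-norm regularizers mentioned for linear and logistic regression.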
Related papers
- Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions.
We propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance.
Our framework is tested empirically over clean and noisy datasets.
arXiv Detail & Related papers (2024-05-29T01:32:17Z)
- Differentially Private Post-Processing for Fair Regression [13.855474876965557]
Our algorithm can be applied to post-process any given regressor to improve fairness by remapping its outputs.
We analyze the sample complexity of our algorithm and provide a fairness guarantee, revealing a trade-off between the statistical bias and variance induced by the choice of the number of bins in the histogram.
arXiv Detail & Related papers (2024-05-07T06:09:37Z)
- Private Gradient Descent for Linear Regression: Tighter Error Bounds and Instance-Specific Uncertainty Estimation [17.84129947587373]
We provide an improved analysis of standard differentially private gradient descent for linear regression under the squared error loss.
Our analysis leads to new results on the algorithm's accuracy.
arXiv Detail & Related papers (2024-02-21T04:58:41Z)
- Federated Learning with Projected Trajectory Regularization [65.6266768678291]
Federated learning enables joint training of machine learning models from distributed clients without sharing their local data.
One key challenge in federated learning is to handle non-identically distributed data across the clients.
We propose a novel federated learning framework with projected trajectory regularization (FedPTR) for tackling the data issue.
arXiv Detail & Related papers (2023-12-22T02:12:08Z)
- Optimal Regularization for a Data Source [8.38093977965175]
It is common to augment criteria that enforce data fidelity with a regularizer that promotes desired structure in the solution.
In this paper we seek a systematic understanding of the power and the limitations of convex regularization.
arXiv Detail & Related papers (2022-12-27T20:11:59Z)
- Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to a logistic regression, may be learned from aggregated data only by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that the model learned this way can achieve performances comparable to those of a logistic model trained with the full unaggregated data.
arXiv Detail & Related papers (2022-10-05T09:17:27Z)
- Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning [78.83598532168256]
Marginal-likelihood based model-selection is rarely used in deep learning due to estimation difficulties.
Our work shows that marginal likelihoods can improve generalization and be useful when validation data is unavailable.
arXiv Detail & Related papers (2021-04-11T09:50:24Z)
- Strongly universally consistent nonparametric regression and classification with privatised data [2.879036956042183]
We revisit the classical problem of nonparametric regression, but impose local differential privacy constraints.
We design a novel estimator of the regression function, which can be viewed as a privatised version of the well-studied partitioning regression estimator.
arXiv Detail & Related papers (2020-10-31T09:00:43Z)
- Graph Embedding with Data Uncertainty [113.39838145450007]
Spectral-based subspace learning is a common data preprocessing step in many machine learning pipelines.
Most subspace learning methods do not take into consideration possible measurement inaccuracies or artifacts that can lead to data with high uncertainty.
arXiv Detail & Related papers (2020-09-01T15:08:23Z)
- Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
Proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z)