Easy Differentially Private Linear Regression
- URL: http://arxiv.org/abs/2208.07353v1
- Date: Mon, 15 Aug 2022 17:42:27 GMT
- Title: Easy Differentially Private Linear Regression
- Authors: Kareem Amin, Matthew Joseph, Mónica Ribero, Sergei Vassilvitskii
- Abstract summary: We study an algorithm which uses the exponential mechanism to select a model with high Tukey depth from a collection of non-private regression models.
We find that this algorithm obtains strong empirical performance in the data-rich setting.
- Score: 16.325734286930764
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Linear regression is a fundamental tool for statistical analysis. This has
motivated the development of linear regression methods that also satisfy
differential privacy and thus guarantee that the learned model reveals little
about any one data point used to construct it. However, existing differentially
private solutions assume that the end user can easily specify good data bounds
and hyperparameters. Both present significant practical obstacles. In this
paper, we study an algorithm which uses the exponential mechanism to select a
model with high Tukey depth from a collection of non-private regression models.
Given $n$ samples of $d$-dimensional data used to train $m$ models, we
construct an efficient analogue using an approximate Tukey depth that runs in
time $O(d^2n + dm\log(m))$. We find that this algorithm obtains strong
empirical performance in the data-rich setting with no data bounds or
hyperparameter selection required.
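The selection step is easy to sketch in code. Below is a minimal Python illustration, not the paper's exact algorithm: it uses a simplified coordinate-wise approximate Tukey depth and applies the exponential mechanism over the $m$ candidate models themselves, whereas the paper samples from depth regions and handles low-depth failure cases more carefully.

```python
import numpy as np

def approx_tukey_depth(candidate, models):
    """Coordinate-wise approximation of Tukey depth: for each coordinate,
    count candidate models on either side and keep the smaller count,
    then take the minimum over coordinates."""
    below = (models <= candidate).sum(axis=0)
    above = (models >= candidate).sum(axis=0)
    return int(np.minimum(below, above).min())

def dp_tukey_regression(X, y, m, epsilon, seed=0):
    """Sketch of the selection step: fit m non-private OLS models on
    disjoint splits, then pick one via the exponential mechanism with
    utility = approximate Tukey depth (sensitivity 1, since adding or
    removing one example changes at most one candidate model)."""
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(len(y)), m)
    models = np.stack([np.linalg.lstsq(X[p], y[p], rcond=None)[0] for p in parts])
    depths = np.array([approx_tukey_depth(b, models) for b in models])
    logits = epsilon * depths / 2.0            # Pr[i] proportional to exp(eps * depth_i / 2)
    probs = np.exp(logits - logits.max())
    return models[rng.choice(m, p=probs / probs.sum())]
```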
Related papers
- Highly Adaptive Ridge [84.38107748875144]
We propose a regression method that achieves a dimension-free $n^{-2/3}$ $L^2$ convergence rate in the class of right-continuous functions with square-integrable sectional derivatives.
HAR is exactly kernel ridge regression with a specific data-adaptive kernel based on a saturated zero-order tensor-product spline basis expansion.
We demonstrate empirical performance better than state-of-the-art algorithms for small datasets in particular.
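To make the kernel view concrete, here is a hedged, main-effects-only sketch; the indicator kernel below is an assumption standing in for the full saturated tensor-product spline basis described in the abstract.

```python
import numpy as np

def har_kernel(A, B, knots):
    """Main-effects simplification of a data-adaptive spline kernel:
    counts (coordinate, knot) pairs where the zero-order spline indicator
    1{x_j >= t} fires for both points. The full HAR kernel also includes
    tensor products over subsets of coordinates."""
    K = np.zeros((A.shape[0], B.shape[0]))
    for j in range(A.shape[1]):
        Ia = (A[:, [j]] >= knots[j][None, :]).astype(float)  # (nA, n_knots_j)
        Ib = (B[:, [j]] >= knots[j][None, :]).astype(float)
        K += Ia @ Ib.T
    return K

def fit_har(X, y, lam=1.0):
    knots = [np.unique(X[:, j]) for j in range(X.shape[1])]  # data-adaptive knots
    K = har_kernel(X, X, knots)
    alpha = np.linalg.solve(K + lam * np.eye(len(y)), y)     # kernel ridge solve
    return lambda Z: har_kernel(Z, X, knots) @ alpha
```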
arXiv Detail & Related papers (2024-10-03T17:06:06Z)
- An Efficient Data Analysis Method for Big Data using Multiple-Model Linear Regression [4.085654010023149]
This paper introduces a new data analysis method for big data using a newly defined regression model named multiple-model linear regression (MMLR).
The proposed data analysis method is shown to be more efficient and flexible than other regression based methods.
arXiv Detail & Related papers (2023-08-24T10:20:15Z)
- Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to a logistic regression, may be learned from aggregated data only by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that the model learned this way can achieve performances comparable to those of a logistic model trained with the full unaggregated data.
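One hedged reading in code: if only per-group feature means (with a shared covariance) and per-group positive rates are observed, the maximum-entropy distribution matching those moments is a Gaussian, so pseudo-examples can be sampled from it and an ordinary logistic model fit on them. The paper's precise aggregates and model may differ.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_from_aggregates(group_means, shared_cov, group_rates, n_per_group=1000, seed=0):
    """Sketch: approximate each group's unobserved feature distribution by
    the max-entropy distribution with the observed moments (a Gaussian),
    sample pseudo-examples, and train a logistic model on labels drawn
    at each group's aggregated positive rate."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    for mu, rate in zip(group_means, group_rates):
        X.append(rng.multivariate_normal(mu, shared_cov, size=n_per_group))
        y.append(rng.binomial(1, rate, size=n_per_group))
    return LogisticRegression(max_iter=1000).fit(np.vstack(X), np.concatenate(y))
```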
arXiv Detail & Related papers (2022-10-05T09:17:27Z)
- Variational Inference for Bayesian Bridge Regression [0.0]
We study the implementation of Automatic Differentiation Variational Inference (ADVI) for Bayesian inference on regression models with bridge penalization.
The bridge approach uses the $\ell_\alpha$ norm, with $\alpha \in (0, +\infty)$, to define a penalization on large values of the regression coefficients.
We illustrate the approach on non-parametric regression models with B-splines, although the method works seamlessly for other choices of basis functions.
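For intuition about the penalty itself (the paper performs full Bayesian inference with ADVI rather than point estimation), a minimal penalized-least-squares sketch:

```python
import numpy as np
from scipy.optimize import minimize

def bridge_point_estimate(X, y, lam=1.0, alpha=1.5):
    """MAP-style point estimate under the bridge penalty lam * sum_j |b_j|^alpha,
    alpha in (0, +inf). A derivative-free method is used because the penalty
    is non-smooth at zero (and non-convex for alpha < 1)."""
    def objective(b):
        r = y - X @ b
        return 0.5 * r @ r + lam * np.sum(np.abs(b) ** alpha)
    b0 = np.linalg.lstsq(X, y, rcond=None)[0]   # warm start at the OLS solution
    return minimize(objective, b0, method="Powell").x
```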
arXiv Detail & Related papers (2022-05-19T12:29:09Z)
- Efficient and robust high-dimensional sparse logistic regression via nonlinear primal-dual hybrid gradient algorithms [0.0]
We propose an iterative algorithm that provably computes a solution to a logistic regression problem regularized by an elastic net penalty.
This result improves on the known complexity bound of $O(\min(m^2 n, m n^2)\log(1/\epsilon))$ for first-order optimization methods.
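The solver is the paper's contribution; the objective itself is standard and can be reproduced with an off-the-shelf elastic-net logistic solver for comparison (scikit-learn's SAGA here, not the paper's primal-dual method):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic high-dimensional sparse classification problem for illustration.
X, y = make_classification(n_samples=2000, n_features=500,
                           n_informative=20, random_state=0)

# Elastic-net-penalized logistic regression; l1_ratio blends L1 and L2 terms.
clf = LogisticRegression(penalty="elasticnet", solver="saga",
                         l1_ratio=0.5, C=1.0, max_iter=5000).fit(X, y)
print("nonzero coefficients:", (clf.coef_ != 0).sum())
```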
arXiv Detail & Related papers (2021-11-30T14:16:48Z)
- Evaluation of Tree Based Regression over Multiple Linear Regression for Non-normally Distributed Data in Battery Performance [0.5735035463793008]
This study explores the impact of data normality on building machine learning models.
Tree-based regression models and multiple linear regressions models are each built from a highly skewed non-normal dataset.
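The comparison is straightforward to reproduce on synthetic data; the skewed target below is an illustrative stand-in for the battery dataset, which is not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
# Exponentiating a linear signal yields a heavily right-skewed (non-normal) target.
y = np.exp(X @ np.array([1.0, 0.5, 0.0, 0.0, 0.0]) + 0.3 * rng.normal(size=1000))

for model in (LinearRegression(),
              RandomForestRegressor(n_estimators=100, random_state=0)):
    r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(type(model).__name__, round(r2, 3))
```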
arXiv Detail & Related papers (2021-11-03T20:28:24Z)
- A Hypergradient Approach to Robust Regression without Correspondence [85.49775273716503]
We consider a variant of the regression problem, where the correspondence between input and output data is not available.
Most existing methods are only applicable when the sample size is small.
We propose a new computational framework -- ROBOT -- for the shuffled regression problem.
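To make the setting concrete, here is a minimal alternating heuristic for shuffled regression, explicitly not the paper's ROBOT hypergradient framework: with scalar responses, the error-minimizing correspondence matches sorted responses to sorted predictions.

```python
import numpy as np

def shuffled_least_squares(X, y, iters=50):
    """Alternating heuristic for regression without correspondence:
    given beta, the permutation minimizing squared error matches
    sorted y to sorted predictions X @ beta (rearrangement inequality);
    then refit beta on the realigned pairs."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(iters):
        order = np.argsort(X @ beta)
        y_aligned = np.empty_like(y)
        y_aligned[order] = np.sort(y)   # optimal 1-D assignment via sorting
        beta = np.linalg.lstsq(X, y_aligned, rcond=None)[0]
    return beta
```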
arXiv Detail & Related papers (2020-11-30T21:47:38Z)
- AutoSimulate: (Quickly) Learning Synthetic Data Generation [70.82315853981838]
We propose an efficient alternative for optimal synthetic data generation based on a novel differentiable approximation of the objective.
We demonstrate that the proposed method finds the optimal data distribution faster (up to $50\times$), with significantly reduced training data generation (up to $30\times$) and better accuracy ($+8.7\%$) on real-world test datasets than previous methods.
arXiv Detail & Related papers (2020-08-16T11:36:11Z)
- Least Squares Regression with Markovian Data: Fundamental Limits and Algorithms [69.45237691598774]
We study the problem of least squares linear regression where the data-points are dependent and are sampled from a Markov chain.
We establish sharp information-theoretic minimax lower bounds for this problem in terms of the mixing time $\tau_{\mathsf{mix}}$.
We propose an algorithm based on experience replay--a popular reinforcement learning technique--that achieves a significantly better error rate.
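A sketch of the replay idea (the paper's algorithm and guarantees are more specific): buffer the correlated stream and update on uniformly resampled pairs, which de-correlates consecutive gradient steps.

```python
import numpy as np
from collections import deque

def sgd_with_replay(stream, d, steps=10_000, lr=0.01, buffer_size=1_000, seed=0):
    """Least-squares SGD on a Markovian stream with experience replay.
    `stream` is assumed to be an iterator yielding (x, y) pairs sampled
    along a Markov chain; each step stores the newest pair and updates
    on a pair drawn uniformly from the buffer."""
    rng = np.random.default_rng(seed)
    buf = deque(maxlen=buffer_size)
    w = np.zeros(d)
    for _ in range(steps):
        buf.append(next(stream))
        x, y = buf[rng.integers(len(buf))]
        w -= lr * (x @ w - y) * x       # gradient of 0.5 * (x . w - y)^2
    return w
```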
arXiv Detail & Related papers (2020-06-16T04:26:50Z)
- The data-driven physical-based equations discovery using evolutionary approach [77.34726150561087]
We describe an algorithm for discovering mathematical equations from given observational data.
The algorithm combines genetic programming with sparse regression.
It can be used to discover governing analytical equations as well as partial differential equations (PDEs).
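The sparse-regression half of such a pipeline is compact; below is a SINDy-style sequential thresholded least-squares sketch over a candidate term library `Theta`, with the genetic-programming half that proposes the library omitted.

```python
import numpy as np

def sparse_regression(Theta, dudt, lam=0.1, iters=10):
    """Sequential thresholded least squares: repeatedly fit, zero out
    coefficients below the threshold lam, and refit on the surviving
    candidate terms. Theta is an (n_samples, n_terms) library matrix
    and dudt the observed derivative to be explained."""
    xi = np.linalg.lstsq(Theta, dudt, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(xi) < lam
        xi[small] = 0.0
        big = ~small
        if big.any():
            xi[big] = np.linalg.lstsq(Theta[:, big], dudt, rcond=None)[0]
    return xi
```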
arXiv Detail & Related papers (2020-04-03T17:21:57Z)
- Learning the Stein Discrepancy for Training and Evaluating Energy-Based Models without Sampling [30.406623987492726]
We present a new method for evaluating and training unnormalized density models.
We estimate the Stein discrepancy between the data density $p(x)$ and the model density $q(x)$ using a critic defined by a vector-valued function of the data.
This yields a novel goodness-of-fit test which outperforms existing methods on high dimensional data.
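Restricting the critic to a linear function makes the maximization closed-form and gives a bare-bones sketch of the estimator; the paper instead learns a neural critic.

```python
import numpy as np

def stein_discrepancy_linear(X, score_q):
    """Stein discrepancy estimate restricted to linear critics
    f(x) = W x + b with ||W||_F <= 1, ||b|| <= 1. The Stein objective
    E_p[score_q(x)^T f(x) + tr(W)] is linear in (W, b), so its maximum
    over the norm balls equals the norm of each coefficient."""
    n, d = X.shape
    S = score_q(X)                       # (n, d): model score at each data point
    M = S.T @ X / n + np.eye(d)          # coefficient of W in the objective
    g = S.mean(axis=0)                   # coefficient of b
    return np.linalg.norm(M, "fro") + np.linalg.norm(g)

# Example: for a standard Gaussian model density, score_q = lambda X: -X.
```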
arXiv Detail & Related papers (2020-02-13T16:39:07Z)