Easy Differentially Private Linear Regression
- URL: http://arxiv.org/abs/2208.07353v1
- Date: Mon, 15 Aug 2022 17:42:27 GMT
- Title: Easy Differentially Private Linear Regression
- Authors: Kareem Amin, Matthew Joseph, Mónica Ribero, Sergei Vassilvitskii
- Abstract summary: We study an algorithm which uses the exponential mechanism to select a model with high Tukey depth from a collection of non-private regression models.
We find that this algorithm obtains strong empirical performance in the data-rich setting.
- Score: 16.325734286930764
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Linear regression is a fundamental tool for statistical analysis. This has
motivated the development of linear regression methods that also satisfy
differential privacy and thus guarantee that the learned model reveals little
about any one data point used to construct it. However, existing differentially
private solutions assume that the end user can easily specify good data bounds
and hyperparameters. Both present significant practical obstacles. In this
paper, we study an algorithm which uses the exponential mechanism to select a
model with high Tukey depth from a collection of non-private regression models.
Given $n$ samples of $d$-dimensional data used to train $m$ models, we
construct an efficient analogue using an approximate Tukey depth that runs in
time $O(d^2n + dm\log(m))$. We find that this algorithm obtains strong
empirical performance in the data-rich setting with no data bounds or
hyperparameter selection required.
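The selection step is easy to sketch in code. Below is a minimal Python illustration, not the paper's exact algorithm: it uses a simplified coordinate-wise approximate Tukey depth and applies the exponential mechanism over the $m$ candidate models themselves, whereas the paper samples from depth regions and handles low-depth failure cases more carefully.

```python
import numpy as np

def approx_tukey_depth(candidate, models):
    """Coordinate-wise approximation of Tukey depth: for each coordinate,
    count candidate models on either side and keep the smaller count,
    then take the minimum over coordinates."""
    below = (models <= candidate).sum(axis=0)
    above = (models >= candidate).sum(axis=0)
    return int(np.minimum(below, above).min())

def dp_tukey_regression(X, y, m, epsilon, seed=0):
    """Sketch of the selection step: fit m non-private OLS models on
    disjoint splits, then pick one via the exponential mechanism with
    utility = approximate Tukey depth (sensitivity 1, since adding or
    removing one example changes at most one candidate model)."""
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(len(y)), m)
    models = np.stack([np.linalg.lstsq(X[p], y[p], rcond=None)[0] for p in parts])
    depths = np.array([approx_tukey_depth(b, models) for b in models])
    logits = epsilon * depths / 2.0            # Pr[i] proportional to exp(eps * depth_i / 2)
    probs = np.exp(logits - logits.max())
    return models[rng.choice(m, p=probs / probs.sum())]
```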
Related papers
- Highly Adaptive Ridge [84.38107748875144]
We propose a regression method that achieves a dimension-free $n^{-2/3}$ $L^2$ convergence rate in the class of right-continuous functions with square-integrable sectional derivatives.
HAR is exactly kernel ridge regression with a specific data-adaptive kernel based on a saturated zero-order tensor-product spline basis expansion.
We demonstrate empirical performance better than state-of-the-art algorithms for small datasets in particular.
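To make the kernel view concrete, here is a hedged, main-effects-only sketch; the indicator kernel below is an assumption standing in for the full saturated tensor-product spline basis described in the abstract.

```python
import numpy as np

def har_kernel(A, B, knots):
    """Main-effects simplification of a data-adaptive spline kernel:
    counts (coordinate, knot) pairs where the zero-order spline indicator
    1{x_j >= t} fires for both points. The full HAR kernel also includes
    tensor products over subsets of coordinates."""
    K = np.zeros((A.shape[0], B.shape[0]))
    for j in range(A.shape[1]):
        Ia = (A[:, [j]] >= knots[j][None, :]).astype(float)  # (nA, n_knots_j)
        Ib = (B[:, [j]] >= knots[j][None, :]).astype(float)
        K += Ia @ Ib.T
    return K

def fit_har(X, y, lam=1.0):
    knots = [np.unique(X[:, j]) for j in range(X.shape[1])]  # data-adaptive knots
    K = har_kernel(X, X, knots)
    alpha = np.linalg.solve(K + lam * np.eye(len(y)), y)     # kernel ridge solve
    return lambda Z: har_kernel(Z, X, knots) @ alpha
```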
arXiv Detail & Related papers (2024-10-03T17:06:06Z)
- An Efficient Data Analysis Method for Big Data using Multiple-Model Linear Regression [4.085654010023149]
This paper introduces a new data analysis method for big data using a newly defined regression model named multiple-model linear regression (MMLR).
The proposed data analysis method is shown to be more efficient and flexible than other regression based methods.
arXiv Detail & Related papers (2023-08-24T10:20:15Z)
- Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to a logistic regression, may be learned from aggregated data only by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that the model learned this way can achieve performances comparable to those of a logistic model trained with the full unaggregated data.
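One hedged reading in code: if only per-group feature means (with a shared covariance) and per-group positive rates are observed, the maximum-entropy distribution matching those moments is a Gaussian, so pseudo-examples can be sampled from it and an ordinary logistic model fit on them. The paper's precise aggregates and model may differ.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_from_aggregates(group_means, shared_cov, group_rates, n_per_group=1000, seed=0):
    """Sketch: approximate each group's unobserved feature distribution by
    the max-entropy distribution with the observed moments (a Gaussian),
    sample pseudo-examples, and train a logistic model on labels drawn
    at each group's aggregated positive rate."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    for mu, rate in zip(group_means, group_rates):
        X.append(rng.multivariate_normal(mu, shared_cov, size=n_per_group))
        y.append(rng.binomial(1, rate, size=n_per_group))
    return LogisticRegression(max_iter=1000).fit(np.vstack(X), np.concatenate(y))
```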
arXiv Detail & Related papers (2022-10-05T09:17:27Z)
- Variational Inference for Bayesian Bridge Regression [0.0]
We study the implementation of Automatic Differentiation Variational Inference (ADVI) for Bayesian inference on regression models with bridge penalization.
The bridge approach uses the $\ell_\alpha$ norm, with $\alpha \in (0, +\infty)$, to define a penalization on large values of the regression coefficients.
We illustrate the approach on non-parametric regression models with B-splines, although the method works seamlessly for other choices of basis functions.
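For intuition about the penalty itself (the paper performs full Bayesian inference with ADVI rather than point estimation), a minimal penalized-least-squares sketch:

```python
import numpy as np
from scipy.optimize import minimize

def bridge_point_estimate(X, y, lam=1.0, alpha=1.5):
    """MAP-style point estimate under the bridge penalty lam * sum_j |b_j|^alpha,
    alpha in (0, +inf). A derivative-free method is used because the penalty
    is non-smooth at zero (and non-convex for alpha < 1)."""
    def objective(b):
        r = y - X @ b
        return 0.5 * r @ r + lam * np.sum(np.abs(b) ** alpha)
    b0 = np.linalg.lstsq(X, y, rcond=None)[0]   # warm start at the OLS solution
    return minimize(objective, b0, method="Powell").x
```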
arXiv Detail & Related papers (2022-05-19T12:29:09Z)
- Efficient and robust high-dimensional sparse logistic regression via nonlinear primal-dual hybrid gradient algorithms [0.0]
We propose an iterative algorithm that provably computes a solution to a logistic regression problem regularized by an elastic net penalty.
This result improves on the known complexity bound of $O(\min(m^2 n, m n^2)\log(1/\epsilon))$ for first-order optimization methods.
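The solver is the paper's contribution; the objective itself is standard and can be reproduced with an off-the-shelf elastic-net logistic solver for comparison (scikit-learn's SAGA here, not the paper's primal-dual method):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic high-dimensional sparse classification problem for illustration.
X, y = make_classification(n_samples=2000, n_features=500,
                           n_informative=20, random_state=0)

# Elastic-net-penalized logistic regression; l1_ratio blends L1 and L2 terms.
clf = LogisticRegression(penalty="elasticnet", solver="saga",
                         l1_ratio=0.5, C=1.0, max_iter=5000).fit(X, y)
print("nonzero coefficients:", (clf.coef_ != 0).sum())
```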
arXiv Detail & Related papers (2021-11-30T14:16:48Z)
- Evaluation of Tree Based Regression over Multiple Linear Regression for Non-normally Distributed Data in Battery Performance [0.5735035463793008]
This study explores the impact of data normality on building machine learning models.
Tree-based regression models and multiple linear regressions models are each built from a highly skewed non-normal dataset.
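The comparison is straightforward to reproduce on synthetic data; the skewed target below is an illustrative stand-in for the battery dataset, which is not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
# Exponentiating a linear signal yields a heavily right-skewed (non-normal) target.
y = np.exp(X @ np.array([1.0, 0.5, 0.0, 0.0, 0.0]) + 0.3 * rng.normal(size=1000))

for model in (LinearRegression(),
              RandomForestRegressor(n_estimators=100, random_state=0)):
    r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(type(model).__name__, round(r2, 3))
```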
arXiv Detail & Related papers (2021-11-03T20:28:24Z)
- A Hypergradient Approach to Robust Regression without Correspondence [85.49775273716503]
We consider a variant of the regression problem, where the correspondence between input and output data is not available.
Most existing methods are only applicable when the sample size is small.
We propose a new computational framework -- ROBOT -- for the shuffled regression problem.
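To make the setting concrete, here is a minimal alternating heuristic for shuffled regression, explicitly not the paper's ROBOT hypergradient framework: with scalar responses, the error-minimizing correspondence matches sorted responses to sorted predictions.

```python
import numpy as np

def shuffled_least_squares(X, y, iters=50):
    """Alternating heuristic for regression without correspondence:
    given beta, the permutation minimizing squared error matches
    sorted y to sorted predictions X @ beta (rearrangement inequality);
    then refit beta on the realigned pairs."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(iters):
        order = np.argsort(X @ beta)
        y_aligned = np.empty_like(y)
        y_aligned[order] = np.sort(y)   # optimal 1-D assignment via sorting
        beta = np.linalg.lstsq(X, y_aligned, rcond=None)[0]
    return beta
```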
arXiv Detail & Related papers (2020-11-30T21:47:38Z)
- AutoSimulate: (Quickly) Learning Synthetic Data Generation [70.82315853981838]
We propose an efficient alternative for optimal synthetic data generation based on a novel differentiable approximation of the objective.
We demonstrate that the proposed method finds the optimal data distribution faster (up to $50\times$), with significantly reduced training data generation (up to $30\times$) and better accuracy ($+8.7\%$) on real-world test datasets than previous methods.
arXiv Detail & Related papers (2020-08-16T11:36:11Z)
- Least Squares Regression with Markovian Data: Fundamental Limits and Algorithms [69.45237691598774]
We study the problem of least squares linear regression where the data-points are dependent and are sampled from a Markov chain.
We establish sharp information-theoretic minimax lower bounds for this problem in terms of the mixing time $\tau_{\mathsf{mix}}$.
We propose an algorithm based on experience replay--a popular reinforcement learning technique--that achieves a significantly better error rate.
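A sketch of the replay idea (the paper's algorithm and guarantees are more specific): buffer the correlated stream and update on uniformly resampled pairs, which de-correlates consecutive gradient steps.

```python
import numpy as np
from collections import deque

def sgd_with_replay(stream, d, steps=10_000, lr=0.01, buffer_size=1_000, seed=0):
    """Least-squares SGD on a Markovian stream with experience replay.
    `stream` is assumed to be an iterator yielding (x, y) pairs sampled
    along a Markov chain; each step stores the newest pair and updates
    on a pair drawn uniformly from the buffer."""
    rng = np.random.default_rng(seed)
    buf = deque(maxlen=buffer_size)
    w = np.zeros(d)
    for _ in range(steps):
        buf.append(next(stream))
        x, y = buf[rng.integers(len(buf))]
        w -= lr * (x @ w - y) * x       # gradient of 0.5 * (x . w - y)^2
    return w
```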
arXiv Detail & Related papers (2020-06-16T04:26:50Z)
- The data-driven physical-based equations discovery using evolutionary approach [77.34726150561087]
We describe an algorithm for discovering mathematical equations from given observational data.
The algorithm combines genetic programming with sparse regression.
It can be used to discover governing analytical equations as well as partial differential equations (PDEs).
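The sparse-regression half of such a pipeline is compact; below is a SINDy-style sequential thresholded least-squares sketch over a candidate term library `Theta`, with the genetic-programming half that proposes the library omitted.

```python
import numpy as np

def sparse_regression(Theta, dudt, lam=0.1, iters=10):
    """Sequential thresholded least squares: repeatedly fit, zero out
    coefficients below the threshold lam, and refit on the surviving
    candidate terms. Theta is an (n_samples, n_terms) library matrix
    and dudt the observed derivative to be explained."""
    xi = np.linalg.lstsq(Theta, dudt, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(xi) < lam
        xi[small] = 0.0
        big = ~small
        if big.any():
            xi[big] = np.linalg.lstsq(Theta[:, big], dudt, rcond=None)[0]
    return xi
```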
arXiv Detail & Related papers (2020-04-03T17:21:57Z)
- Learning the Stein Discrepancy for Training and Evaluating Energy-Based Models without Sampling [30.406623987492726]
We present a new method for evaluating and training unnormalized density models.
We estimate the Stein discrepancy between the data density $p(x)$ and the model density $q(x)$ using a critic defined by a vector-valued function of the data.
This yields a novel goodness-of-fit test which outperforms existing methods on high dimensional data.
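Restricting the critic to a linear function makes the maximization closed-form and gives a bare-bones sketch of the estimator; the paper instead learns a neural critic.

```python
import numpy as np

def stein_discrepancy_linear(X, score_q):
    """Stein discrepancy estimate restricted to linear critics
    f(x) = W x + b with ||W||_F <= 1, ||b|| <= 1. The Stein objective
    E_p[score_q(x)^T f(x) + tr(W)] is linear in (W, b), so its maximum
    over the norm balls equals the norm of each coefficient."""
    n, d = X.shape
    S = score_q(X)                       # (n, d): model score at each data point
    M = S.T @ X / n + np.eye(d)          # coefficient of W in the objective
    g = S.mean(axis=0)                   # coefficient of b
    return np.linalg.norm(M, "fro") + np.linalg.norm(g)

# Example: for a standard Gaussian model density, score_q = lambda X: -X.
```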
arXiv Detail & Related papers (2020-02-13T16:39:07Z)