Differential privacy and robust statistics in high dimensions
- URL: http://arxiv.org/abs/2111.06578v1
- Date: Fri, 12 Nov 2021 06:36:40 GMT
- Title: Differential privacy and robust statistics in high dimensions
- Authors: Xiyang Liu, Weihao Kong, Sewoong Oh
- Abstract summary: High-dimensional Propose-Test-Release (HPTR) builds upon three crucial components: the exponential mechanism, robust statistics, and the Propose-Test-Release mechanism.
We show that HPTR nearly achieves the optimal sample complexity under several scenarios studied in the literature.
- Score: 49.50869296871643
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a universal framework for characterizing the statistical
efficiency of a statistical estimation problem with differential privacy
guarantees. Our framework, which we call High-dimensional Propose-Test-Release
(HPTR), builds upon three crucial components: the exponential mechanism, robust
statistics, and the Propose-Test-Release mechanism. Gluing all these together
is the concept of resilience, which is central to robust statistical
estimation. Resilience guides the design of the algorithm, the sensitivity
analysis, and the success probability analysis of the test step in
Propose-Test-Release. The key insight is that if we design an exponential
mechanism that accesses the data only via one-dimensional robust statistics,
then the resulting local sensitivity can be dramatically reduced. Using
resilience, we can provide tight local sensitivity bounds. These tight bounds
readily translate into near-optimal utility guarantees in several cases. We
give a general recipe for applying HPTR to a given instance of a statistical
estimation problem and demonstrate it on canonical problems of mean estimation,
linear regression, covariance estimation, and principal component analysis. We
introduce a general utility analysis technique that proves that HPTR nearly
achieves the optimal sample complexity under several scenarios studied in the
literature.
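The abstract does not spell out HPTR's concrete mechanisms, so the following is only a minimal, hypothetical sketch of the key insight it states: when the exponential mechanism touches the data only through a one-dimensional robust statistic, the relevant sensitivity stays small. The sketch uses the standard differentially private median with a rank-based utility, not the paper's HPTR algorithm; the function name, candidate grid, and Cauchy example are illustrative assumptions.
```python
import numpy as np

def dp_median_exponential(x, candidates, epsilon, rng=None):
    """Exponential mechanism over a 1-D robust statistic (rank distance to the median).

    The utility of a candidate t is minus (roughly) the number of points that would
    have to move for t to become the median; changing one data point shifts this
    utility by at most 1, so sampling proportionally to exp(epsilon * utility / 2)
    gives an epsilon-DP selection.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.sort(np.asarray(x, dtype=float))
    candidates = np.asarray(candidates, dtype=float)
    n = len(x)
    below = np.searchsorted(x, candidates, side="left")       # points strictly below t
    above = n - np.searchsorted(x, candidates, side="right")  # points strictly above t
    utility = -np.abs(below - above) / 2.0                    # sensitivity <= 1
    logits = epsilon * utility / 2.0
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return rng.choice(candidates, p=probs)

# Usage on a heavy-tailed sample: the robust 1-D statistic keeps the mechanism stable.
sample = np.random.standard_cauchy(500)
grid = np.linspace(-10.0, 10.0, 401)
print(dp_median_exponential(sample, grid, epsilon=1.0))
```
The Propose-Test-Release step that the abstract describes, in which resilience is privately checked before an answer is released, is not reproduced in this sketch.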
Related papers
- Statistical Inference for Temporal Difference Learning with Linear Function Approximation [62.69448336714418]
Temporal Difference (TD) learning, arguably the most widely used algorithm for policy evaluation, serves as a natural framework for this purpose.
In this paper, we study the consistency properties of TD learning with Polyak-Ruppert averaging and linear function approximation, and obtain three significant improvements over existing results.
arXiv Detail & Related papers (2024-10-21T15:34:44Z)
- Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation [62.2436697657307]
Prediction-powered inference (PPI) is a method that improves statistical estimates based on limited human-labeled data.
We propose a method called Stratified Prediction-Powered Inference (StratPPI).
We show that the basic PPI estimates can be considerably improved by employing simple data stratification strategies.
arXiv Detail & Related papers (2024-06-06T17:37:39Z)
- Uncertainty quantification in metric spaces [3.637162892228131]
This paper introduces a novel uncertainty quantification framework for regression models where the response takes values in a separable metric space.
The proposed algorithms can efficiently handle large datasets and are agnostic to the predictive base model used.
arXiv Detail & Related papers (2024-05-08T15:06:02Z)
- Gaining Outlier Resistance with Progressive Quantiles: Fast Algorithms and Theoretical Studies [1.6457778420360534]
A framework of outlier-resistant estimation is introduced to robustify arbitrarily given loss functions.
A new technique is proposed to alleviate the requirement on the starting point, so that the number of data reestimations can be substantially reduced on regular datasets.
The obtained estimators, though not necessarily globally or even locally optimal, enjoy minimax optimality in both low and high dimensions.
arXiv Detail & Related papers (2021-12-15T20:35:21Z)
- Robust High-Dimensional Regression with Coefficient Thresholding and its Application to Imaging Data Analysis [7.640041402805495]
It is of importance to develop statistical techniques to analyze high-dimensional data in the presence of both complex dependence and possible outliers in real-world imaging data.
arXiv Detail & Related papers (2021-09-30T05:29:54Z)
- Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$-dimensional samples.
We design a clipped gradient descent algorithm and provide an improved analysis under a more nuanced condition on the noise of the gradients.
arXiv Detail & Related papers (2021-08-25T21:30:27Z)
- Distributionally Robust Learning [11.916893752969429]
This book develops a comprehensive statistical learning framework that is robust to (distributional) perturbations in the data.
A tractable DRO relaxation is derived for each problem, establishing a connection between robustness and regularization.
Beyond theory, we include numerical experiments and case studies using synthetic and real data.
arXiv Detail & Related papers (2021-08-20T04:14:18Z)
- SAMBA: Safe Model-Based & Active Reinforcement Learning [59.01424351231993]
SAMBA is a framework for safe reinforcement learning that combines aspects from probabilistic modelling, information theory, and statistics.
We evaluate our algorithm on a variety of safe dynamical system benchmarks involving both low and high-dimensional state representations.
We provide intuition as to the effectiveness of the framework through a detailed analysis of our active metrics and safety constraints.
arXiv Detail & Related papers (2020-06-12T10:40:46Z)
- Instability, Computational Efficiency and Statistical Accuracy [101.32305022521024]
We develop a framework that yields statistical accuracy based on the interplay between the deterministic convergence rate of the algorithm at the population level and its degree of (in)stability when applied to an empirical object based on $n$ samples.
We provide applications of our general results to several concrete classes of models, including Gaussian mixture estimation, non-linear regression models, and informative non-response models.
arXiv Detail & Related papers (2020-05-22T22:30:52Z)
- Convolutional Neural Networks as Summary Statistics for Approximate Bayesian Computation [0.0]
This paper proposes a convolutional neural network architecture for automatically learning informative summary statistics of temporal responses.
We show that the proposed network can effectively circumvent the statistics selection problem of the preprocessing step for ABC inference.
arXiv Detail & Related papers (2020-01-31T10:46:30Z)
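The entry above only names the idea (a CNN learns the summary statistics that ABC would otherwise require one to hand-pick), so here is a minimal, hypothetical PyTorch sketch of that pattern rather than the paper's architecture; `SummaryCNN`, `abc_accept`, the layer sizes, and the tolerance are illustrative assumptions.
```python
import torch
import torch.nn as nn

class SummaryCNN(nn.Module):
    """Tiny 1-D CNN mapping a temporal response to a few learned summary statistics."""

    def __init__(self, n_stats: int = 4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(16, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),           # collapse the time axis
        )
        self.head = nn.Linear(16, n_stats)     # outputs the learned summaries

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, 1, T)
        return self.head(self.features(x).squeeze(-1))

def abc_accept(net: SummaryCNN, simulated: torch.Tensor,
               observed: torch.Tensor, tol: float) -> torch.Tensor:
    """Rejection-ABC step: accept simulations whose learned summaries are close to the data's."""
    with torch.no_grad():
        dist = (net(simulated) - net(observed)).norm(dim=1)
    return dist <= tol

# Usage sketch: `simulated` has shape (batch, 1, T) and `observed` (1, 1, T);
# the parameters behind the accepted simulations form the approximate posterior sample.
```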