Adaptive Data Analysis in a Balanced Adversarial Model
- URL: http://arxiv.org/abs/2305.15452v2
- Date: Fri, 3 Nov 2023 18:30:40 GMT
- Title: Adaptive Data Analysis in a Balanced Adversarial Model
- Authors: Kobbi Nissim, Uri Stemmer, Eliad Tsfadia
- Abstract summary: In adaptive data analysis, a mechanism gets $n$ i.i.d. samples from an unknown distribution $D$, and is required to provide accurate estimations.
We consider more restricted adversaries, called \emph{balanced}, where each such adversary consists of two separated algorithms.
We show that these stronger hardness assumptions are unavoidable in the sense that any computationally bounded \emph{balanced} adversary implies the existence of public-key cryptography.
- Score: 26.58630744414181
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In adaptive data analysis, a mechanism gets $n$ i.i.d. samples from an
unknown distribution $D$, and is required to provide accurate estimations to a
sequence of adaptively chosen statistical queries with respect to $D$. Hardt
and Ullman (FOCS 2014) and Steinke and Ullman (COLT 2015) showed that in
general, it is computationally hard to answer more than $\Theta(n^2)$ adaptive
queries, assuming the existence of one-way functions.
However, these negative results strongly rely on an adversarial model that
significantly advantages the adversarial analyst over the mechanism, as the
analyst, who chooses the adaptive queries, also chooses the underlying
distribution $D$. This imbalance raises questions with respect to the
applicability of the obtained hardness results -- an analyst who has complete
knowledge of the underlying distribution $D$ would have little need, if at all,
to issue statistical queries to a mechanism which only holds a finite number of
samples from $D$.
We consider more restricted adversaries, called \emph{balanced}, where each
such adversary consists of two separated algorithms: The \emph{sampler} who is
the entity that chooses the distribution and provides the samples to the
mechanism, and the \emph{analyst} who chooses the adaptive queries, but has no
prior knowledge of the underlying distribution (and hence has no a priori
advantage with respect to the mechanism). We improve the quality of previous
lower bounds by revisiting them using an efficient \emph{balanced} adversary,
under standard public-key cryptography assumptions. We show that these stronger
hardness assumptions are unavoidable in the sense that any computationally
bounded \emph{balanced} adversary that has the structure of all known attacks,
implies the existence of public-key cryptography.
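To make the setting concrete, here is a minimal sketch (not the paper's attack or its mechanism) of a mechanism that holds $n$ i.i.d. samples and answers adaptively chosen statistical queries with empirical means; the distribution, seed, and query choices are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# The sampler draws n i.i.d. samples from a distribution D unknown to the analyst.
n = 1000
samples = rng.normal(loc=0.3, scale=1.0, size=n)  # D = N(0.3, 1), known only to the sampler

def mechanism(query, data=samples):
    """Answer a statistical query q: X -> [0, 1] with its empirical mean."""
    return float(np.mean(query(data)))

# The analyst issues queries with no prior knowledge of D.
q1 = lambda x: (x > 0).astype(float)   # estimate Pr[X > 0]
a1 = mechanism(q1)

# The next query may depend on the previous answer -- this is adaptivity,
# and it is what makes answering many queries accurately hard.
q2 = lambda x: (x > a1).astype(float)  # threshold chosen from the first answer
a2 = mechanism(q2)
```

In the balanced model above, the sampler and analyst are separate: the analyst sees only the mechanism's answers, never `samples` or the parameters of `D`.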
Related papers
- How to Verify Any (Reasonable) Distribution Property: Computationally Sound Argument Systems for Distributions [0.0]
We study proof-systems that allow a probabilistic verifier to ascertain that the results of an analysis are approximately correct.
We focus on distribution testing problems: verifying that an unknown distribution has a claimed property.
Our main contribution is an interactive protocol between a verifier and an untrusted prover which can be used to verify any property.
arXiv Detail & Related papers (2024-09-10T15:37:23Z) - Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions.
We propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance.
Our framework is tested empirically over clean and noisy datasets.
arXiv Detail & Related papers (2024-05-29T01:32:17Z) - Score-based generative models are provably robust: an uncertainty quantification perspective [4.396860522241307]
We show that score-based generative models (SGMs) are provably robust to the multiple sources of error in practical implementation.
Our primary tool is the Wasserstein uncertainty propagation (WUP) theorem.
We show how errors due to (a) finite sample approximation, (b) early stopping, (c) score-matching objective choice, (d) score function parametrization, and (e) reference distribution choice, impact the quality of the generative model.
arXiv Detail & Related papers (2024-05-24T17:50:17Z) - Mind the Gap: A Causal Perspective on Bias Amplification in Prediction & Decision-Making [58.06306331390586]
We introduce the notion of a margin complement, which measures how much a prediction score $S$ changes due to a thresholding operation.
We show that under suitable causal assumptions, the influences of $X$ on the prediction score $S$ are equal to the influences of $X$ on the true outcome $Y$.
arXiv Detail & Related papers (2024-05-24T11:22:19Z) - Revisiting the Dataset Bias Problem from a Statistical Perspective [72.94990819287551]
We study the "dataset bias" problem from a statistical standpoint.
We identify the main cause of the problem as the strong correlation between a class attribute $u$ and a non-class attribute $b$.
We propose to mitigate dataset bias by either weighting the objective of each sample $n$ by $\frac{1}{p(u_n|b_n)}$ or sampling that sample with a weight proportional to $\frac{1}{p(u_n|b_n)}$.
arXiv Detail & Related papers (2024-02-05T22:58:06Z) - Statistical Learning under Heterogeneous Distribution Shift [71.8393170225794]
The ground-truth predictor is additive: $\mathbb{E}[\mathbf{z} \mid \mathbf{x}, \mathbf{y}] = f_\star(\mathbf{x}) + g_\star(\mathbf{y})$.
arXiv Detail & Related papers (2023-02-27T16:34:21Z) - CARD: Classification and Regression Diffusion Models [51.0421331214229]
We introduce classification and regression diffusion (CARD) models, which combine a conditional generative model and a pre-trained conditional mean estimator.
We demonstrate the outstanding ability of CARD in conditional distribution prediction with both toy examples and real-world datasets.
arXiv Detail & Related papers (2022-06-15T03:30:38Z) - Robust Linear Regression for General Feature Distribution [21.0709900887309]
We investigate robust linear regression where data may be contaminated by an oblivious adversary.
We do not necessarily assume that the features are centered.
If the features are centered we can obtain a standard convergence rate.
arXiv Detail & Related papers (2022-02-04T11:22:13Z) - Robust Linear Predictions: Analyses of Uniform Concentration, Fast Rates
and Model Misspecification [16.0817847880416]
We offer a unified framework that includes a broad variety of linear prediction problems on a Hilbert space.
We show that for misspecification level $\epsilon$, these estimators achieve an error rate of $O(\max\left\{|\mathcal{O}|^{1/2} n^{-1/2}, |\mathcal{I}|^{1/2} n^{-1}\right\} + \epsilon)$, matching the best-known rates in the literature.
arXiv Detail & Related papers (2022-01-06T08:51:08Z) - Towards Assessment of Randomized Smoothing Mechanisms for Certifying
Adversarial Robustness [50.96431444396752]
We argue that the main difficulty is how to assess the appropriateness of each randomized mechanism.
We first conclude that the Gaussian mechanism is indeed an appropriate option to certify the $\ell_2$-norm.
Surprisingly, we show that the Gaussian mechanism is also an appropriate option for certifying the $\ell_\infty$-norm, instead of the Exponential mechanism.
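For context, the standard certificate for the Gaussian mechanism in randomized smoothing (due to Cohen, Rosenfeld, and Kolter, 2019, not this paper's assessment framework) gives a certified $\ell_2$ radius of $R = \sigma \, \Phi^{-1}(p_A)$, where $p_A$ is a lower bound on the smoothed classifier's top-class probability:

```python
from statistics import NormalDist

def gaussian_certified_radius(p_a: float, sigma: float) -> float:
    """Certified l2 radius for Gaussian randomized smoothing:
    R = sigma * Phi^{-1}(p_a), where p_a lower-bounds the probability
    of the top class under N(0, sigma^2) input noise."""
    if p_a <= 0.5:
        return 0.0  # no certificate when the top class is not a clear majority
    return sigma * NormalDist().inv_cdf(p_a)

# A more confident top class, or larger noise sigma, yields a larger radius.
print(gaussian_certified_radius(0.9, 0.5))
```

The radius grows with both the noise level $\sigma$ and the margin of the top class, which is why the choice of randomization mechanism determines which norm can be certified and how tightly.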
arXiv Detail & Related papers (2020-05-15T03:54:53Z) - A Precise High-Dimensional Asymptotic Theory for Boosting and
Minimum-$\ell_1$-Norm Interpolated Classifiers [3.167685495996986]
This paper establishes a precise high-dimensional theory for boosting on separable data.
Under a class of statistical models, we provide an exact analysis of the generalization error of boosting.
We also explicitly pin down the relation between the boosting test error and the optimal Bayes error.
arXiv Detail & Related papers (2020-02-05T00:24:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences.