Characterizing the Distinguishability of Product Distributions through Multicalibration
- URL: http://arxiv.org/abs/2412.03562v3
- Date: Sun, 02 Mar 2025 01:35:01 GMT
- Title: Characterizing the Distinguishability of Product Distributions through Multicalibration
- Authors: Cassandra Marcussen, Aaron Putterman, Salil Vadhan,
- Abstract summary: We prove a new, tight characterization of the number of samples $k$ needed to efficiently distinguish $X_0otimes k$ and $X_1otimes k$.<n>Our framework can be used to re-derive a result of Halevi and Rabin (TCC 2008) and Geier (TCC 2022)
- Score: 9.695176684285832
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Given a sequence of samples $x_1, \dots , x_k$ promised to be drawn from one of two distributions $X_0, X_1$, a well-studied problem in statistics is to decide $\textit{which}$ distribution the samples are from. Information theoretically, the maximum advantage in distinguishing the two distributions given $k$ samples is captured by the total variation distance between $X_0^{\otimes k}$ and $X_1^{\otimes k}$. However, when we restrict our attention to $\textit{efficient distinguishers}$ (i.e., small circuits) of these two distributions, exactly characterizing the ability to distinguish $X_0^{\otimes k}$ and $X_1^{\otimes k}$ is more involved and less understood. In this work, we give a general way to reduce bounds on the computational indistinguishability of $X_0$ and $X_1$ to bounds on the $\textit{information-theoretic}$ indistinguishability of some specific, related variables $\widetilde{X}_0$ and $\widetilde{X}_1$. As a consequence, we prove a new, tight characterization of the number of samples $k$ needed to efficiently distinguish $X_0^{\otimes k}$ and $X_1^{\otimes k}$ with constant advantage as \[ k = \Theta\left(d_H^{-2}\left(\widetilde{X}_0, \widetilde{X}_1\right)\right), \] which is the inverse of the squared Hellinger distance $d_H$ between two distributions $\widetilde{X}_0$ and $\widetilde{X}_1$ that are computationally indistinguishable from $X_0$ and $X_1$. Likewise, our framework can be used to re-derive a result of Halevi and Rabin (TCC 2008) and Geier (TCC 2022), proving nearly-tight bounds on how computational indistinguishability scales with the number of samples for arbitrary product distributions.
Related papers
- Dimension-free Private Mean Estimation for Anisotropic Distributions [55.86374912608193]
Previous private estimators on distributions over $mathRd suffer from a curse of dimensionality.
We present an algorithm whose sample complexity has improved dependence on dimension.
arXiv Detail & Related papers (2024-11-01T17:59:53Z) - Outlier Robust Multivariate Polynomial Regression [27.03423421704806]
We are given a set of random samples $(mathbfx_i,y_i) in [-1,1]n times mathbbR$ that are noisy versions of $(mathbfx_i,p(mathbfx_i)$.
The goal is to output a $hatp$, within an $ell_in$-distance of at most $O(sigma)$ from $p$.
arXiv Detail & Related papers (2024-03-14T15:04:45Z) - Identification of Mixtures of Discrete Product Distributions in
Near-Optimal Sample and Time Complexity [6.812247730094931]
We show, for any $ngeq 2k-1$, how to achieve sample complexity and run-time complexity $(1/zeta)O(k)$.
We also extend the known lower bound of $eOmega(k)$ to match our upper bound across a broad range of $zeta$.
arXiv Detail & Related papers (2023-09-25T09:50:15Z) - Stochastic Approximation Approaches to Group Distributionally Robust Optimization and Beyond [89.72693227960274]
This paper investigates group distributionally robust optimization (GDRO) with the goal of learning a model that performs well over $m$ different distributions.
To reduce the number of samples in each round from $m$ to 1, we cast GDRO as a two-player game, where one player conducts and the other executes an online algorithm for non-oblivious multi-armed bandits.
In the second scenario, we propose to optimize the average top-$k$ risk instead of the maximum risk, thereby mitigating the impact of distributions.
arXiv Detail & Related papers (2023-02-18T09:24:15Z) - Phase Transitions in the Detection of Correlated Databases [12.010807505655238]
We study the problem of detecting the correlation between two Gaussian databases $mathsfXinmathbbRntimes d$ and $mathsfYntimes d$, each composed of $n$ users with $d$ features.
This problem is relevant in the analysis of social media, computational biology, etc.
arXiv Detail & Related papers (2023-02-07T10:39:44Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $mathbfxmapstosigma(mathbfwcdotmathbfx)$ for monotone activations.
The goal of the learner is to output a hypothesis vector $mathbfw$ that $F(mathbbw)=C, epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - TURF: A Two-factor, Universal, Robust, Fast Distribution Learning
Algorithm [64.13217062232874]
One of its most powerful and successful modalities approximates every distribution to an $ell$ distance essentially at most a constant times larger than its closest $t$-piece degree-$d_$.
We provide a method that estimates this number near-optimally, hence helps approach the best possible approximation.
arXiv Detail & Related papers (2022-02-15T03:49:28Z) - Data fission: splitting a single data point [27.500860533521713]
In this paper, we offer a more general methodology for achieving such a split in finite samples.
We call our method data fission, as an alternative to data splitting, data carving and p-value masking.
We exemplify the method on a few applications, such as post-selection inference for trend filtering and other regression problems.
arXiv Detail & Related papers (2021-12-21T10:27:04Z) - Random matrices in service of ML footprint: ternary random features with
no performance loss [55.30329197651178]
We show that the eigenspectrum of $bf K$ is independent of the distribution of the i.i.d. entries of $bf w$.
We propose a novel random technique, called Ternary Random Feature (TRF)
The computation of the proposed random features requires no multiplication and a factor of $b$ less bits for storage compared to classical random features.
arXiv Detail & Related papers (2021-10-05T09:33:49Z) - Spectral properties of sample covariance matrices arising from random
matrices with independent non identically distributed columns [50.053491972003656]
It was previously shown that the functionals $texttr(AR(z))$, for $R(z) = (frac1nXXT- zI_p)-1$ and $Ain mathcal M_p$ deterministic, have a standard deviation of order $O(|A|_* / sqrt n)$.
Here, we show that $|mathbb E[R(z)] - tilde R(z)|_F
arXiv Detail & Related papers (2021-09-06T14:21:43Z) - The Sparse Hausdorff Moment Problem, with Application to Topic Models [5.151973524974052]
We give an algorithm for identifying a $k$-mixture using samples of $m=2k$ iid binary random variables.
It suffices to know the moments to additive accuracy $w_mincdotzetaO(k)$.
arXiv Detail & Related papers (2020-07-16T04:23:57Z) - Curse of Dimensionality on Randomized Smoothing for Certifiable
Robustness [151.67113334248464]
We show that extending the smoothing technique to defend against other attack models can be challenging.
We present experimental results on CIFAR to validate our theory.
arXiv Detail & Related papers (2020-02-08T22:02:14Z) - Sample Amplification: Increasing Dataset Size even when Learning is Impossible [15.864702679819544]
Given data drawn from an unknown distribution, $D$, to what extent is it possible to amplify'' this dataset and output an even larger set of samples that appear to have been drawn from $D$?
We formalize this question as follows: an $left(n, n + Theta(fracnsqrtk)right)$ amplifier exists, even though learning the distribution to small constant total variation distance requires $Theta(d)$ samples.
arXiv Detail & Related papers (2019-04-26T21:42:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.