Near-Optimal Model Discrimination with Non-Disclosure
- URL: http://arxiv.org/abs/2012.02901v2
- Date: Sun, 13 Dec 2020 04:56:43 GMT
- Title: Near-Optimal Model Discrimination with Non-Disclosure
- Authors: Dmitrii M. Ostrovskii, Mohamed Ndaoud, Adel Javanmard, Meisam
Razaviyayn
- Abstract summary: We first consider the case of a well-specified linear model with squared loss.
We derive sample complexity bounds of a similar form, even under misspecification.
- Score: 19.88145627448243
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Let $\theta_0,\theta_1 \in \mathbb{R}^d$ be the population risk minimizers
associated to some loss $\ell: \mathbb{R}^d \times \mathcal{Z} \to \mathbb{R}$
and two distributions $\mathbb{P}_0,\mathbb{P}_1$ on $\mathcal{Z}$. We pose the
following question: Given i.i.d. samples from $\mathbb{P}_0$ and
$\mathbb{P}_1$, what sample sizes are sufficient and necessary to distinguish
between the two hypotheses $\theta^* = \theta_0$ and $\theta^* = \theta_1$ for
given $\theta^* \in \{\theta_0, \theta_1\}$? Taking the first steps towards
answering this question in full generality, we first consider the case of a
well-specified linear model with squared loss. Here we provide matching upper
and lower bounds on the sample complexity, showing it to be $\min\{1/\Delta^2,
\sqrt{r}/\Delta\}$ up to a constant factor, where $\Delta$ is a measure of
separation between $\mathbb{P}_0$ and $\mathbb{P}_1$, and $r$ is the rank of
the design covariance matrix. This bound is dimension-independent, and
rank-independent for large enough separation. We then extend this result in two
directions: (i) for the general parametric setup in the asymptotic regime; (ii) for
generalized linear models in the small-sample regime $n \le r$ and under weak
moment assumptions. In both cases, we derive sample complexity bounds of a
similar form, even under misspecification. Our testing procedures only access
$\theta^*$ through a certain functional of empirical risk. In addition, the
number of observations that allows us to reach statistical confidence in our
tests does not allow one to "resolve" the two models -- that is, to recover
$\theta_0,\theta_1$ up to $O(\Delta)$ prediction accuracy. These two properties
allow us to apply the framework in tasks where one would like to
\textit{identify} a prediction model, which may be proprietary, while
guaranteeing that the model cannot actually be \textit{inferred} by the
identifying agent.
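The abstract describes the testing procedure only at a high level, so the following is a minimal, hypothetical sketch (Python/NumPy) of what a discrimination test could look like in the well-specified linear model with squared loss. It assumes a naive plug-in rule that compares the empirical risks of the two candidate parameters; the paper's actual procedures access $\theta^*$ only through a certain functional of the empirical risk, which is not reproduced here, and all names below are illustrative.

```python
import numpy as np

def discriminate(X, y, theta0, theta1):
    """Hypothetical plug-in test: given n samples (X, y) drawn from the
    unknown model P_b with b in {0, 1}, guess b by comparing the empirical
    squared-loss risks of the two candidate parameters. This is an
    illustrative sketch, not the exact functional from the paper."""
    risk0 = np.mean((X @ theta0 - y) ** 2)  # empirical risk of theta_0
    risk1 = np.mean((X @ theta1 - y) ** 2)  # empirical risk of theta_1
    return 0 if risk0 <= risk1 else 1       # pick the better-fitting model

# Toy experiment: isotropic design, so the covariance rank r equals d.
rng = np.random.default_rng(0)
d, n = 50, 200
theta0 = rng.normal(size=d)
theta1 = theta0 + rng.normal(size=d) / np.sqrt(d)  # nearby alternative
X = rng.normal(size=(n, d))
y = X @ theta0 + rng.normal(size=n)   # sample from P_0 with unit noise
print("guessed model:", discriminate(X, y, theta0, theta1))  # expect 0
```

Note that this naive rule uses both parameter vectors in the clear; the point of the paper's construction is precisely that confident discrimination is achievable at sample sizes too small to recover $\theta_0,\theta_1$ themselves.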
Related papers
- Dimension-free Private Mean Estimation for Anisotropic Distributions [55.86374912608193]
Previous private estimators on distributions over $\mathbb{R}^d$ suffer from a curse of dimensionality.
We present an algorithm whose sample complexity has improved dependence on dimension.
arXiv Detail & Related papers (2024-11-01T17:59:53Z) - Provably learning a multi-head attention layer [55.2904547651831]
The multi-head attention layer is one of the key components of the transformer architecture, setting it apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that, in the worst case, exponential dependence on the number of heads $m$ is unavoidable.
arXiv Detail & Related papers (2024-02-06T15:39:09Z) - A Unified Framework for Uniform Signal Recovery in Nonlinear Generative
Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z) - A random matrix model for random approximate $t$-designs [1.534667887016089]
A random matrix model is proposed to describe the probability distribution of $\delta(\nu_{\mathcal{S}},t)$ for any $t$.
We show that our model satisfies the so-called spectral gap conjecture, i.e., we prove that with probability $1$ there is $t \in \mathbb{Z}_+$ such that $\sup_{k \in \mathbb{Z}_+} \delta(k) = \delta(t)$.
arXiv Detail & Related papers (2022-10-14T14:50:06Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$ for monotone activations $\sigma$.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C \cdot \mathrm{OPT} + \epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Mean Estimation in High-Dimensional Binary Markov Gaussian Mixture
Models [12.746888269949407]
We consider a high-dimensional mean estimation problem over a binary hidden Markov model.
We establish a nearly minimax optimal (up to logarithmic factors) estimation error rate, as a function of $\|\theta_*\|, \delta, d, n$.
arXiv Detail & Related papers (2022-06-06T09:34:04Z) - Universality of empirical risk minimization [12.764655736673749]
Consider supervised learning from i.i.d. samples where $\boldsymbol{x}_i \in \mathbb{R}^p$ are feature vectors and $y_i \in \mathbb{R}$ are labels.
We study universality of the empirical risk over a class of functions parameterized by $\mathsf{k}$.
arXiv Detail & Related papers (2022-02-17T18:53:45Z) - Spectral properties of sample covariance matrices arising from random
matrices with independent non identically distributed columns [50.053491972003656]
It was previously shown that the functionals $\operatorname{tr}(AR(z))$, for $R(z) = (\frac{1}{n}XX^T - zI_p)^{-1}$ and $A \in \mathcal{M}_p$ deterministic, have a standard deviation of order $O(\|A\|_*/\sqrt{n})$.
Here, we bound the Frobenius norm $\|\mathbb{E}[R(z)] - \tilde{R}(z)\|_F$.
arXiv Detail & Related papers (2021-09-06T14:21:43Z) - Optimal Mean Estimation without a Variance [103.26777953032537]
We study the problem of heavy-tailed mean estimation in settings where the variance of the data-generating distribution does not exist.
We design an estimator which attains the smallest possible confidence interval as a function of $n, d, \delta$.
arXiv Detail & Related papers (2020-11-24T22:39:21Z) - Efficient Statistics for Sparse Graphical Models from Truncated Samples [19.205541380535397]
We focus on two fundamental and classical problems: (i) inference of sparse Gaussian graphical models and (ii) support recovery of sparse linear models.
For sparse linear regression, suppose samples $(\mathbf{x},y)$ are generated where $y = \mathbf{x}^\top \Omega^* + \mathcal{N}(0,1)$ and $(\mathbf{x}, y)$ is seen only if $y$ belongs to a truncation set $S \subseteq \mathbb{R}$; a minimal simulation sketch of this observation model follows this entry.
arXiv Detail & Related papers (2020-06-17T09:21:00Z)
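The truncated-sample model in the last entry is concrete enough to simulate. Below is a minimal sketch of generating such samples by rejection; the one-sided truncation set $S = [0, \infty)$ and all names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def truncated_linear_samples(omega_star, n, rng, threshold=0.0):
    """Draw n samples from y = x^T omega* + N(0, 1), keeping (x, y) only
    when y lands in the truncation set S = [threshold, inf).
    The choice of S here is purely illustrative."""
    d = omega_star.shape[0]
    xs, ys = [], []
    while len(ys) < n:
        x = rng.normal(size=d)
        y = x @ omega_star + rng.normal()
        if y >= threshold:  # the observation survives truncation
            xs.append(x)
            ys.append(y)
    return np.array(xs), np.array(ys)

rng = np.random.default_rng(1)
omega_star = np.zeros(20)
omega_star[:3] = 1.0  # sparse ground-truth coefficients
X, y = truncated_linear_samples(omega_star, 100, rng)
print(X.shape, y.shape)  # (100, 20) (100,)
```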
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.