Metric Entropy Duality and the Sample Complexity of Outcome
Indistinguishability
- URL: http://arxiv.org/abs/2203.04536v1
- Date: Wed, 9 Mar 2022 06:02:31 GMT
- Title: Metric Entropy Duality and the Sample Complexity of Outcome
Indistinguishability
- Authors: Lunjia Hu, Charlotte Peale, Omer Reingold
- Abstract summary: In outcome indistinguishability, the goal is to output a predictor that cannot be distinguished from the target predictor.
We show that the sample complexity of outcome indistinguishability is characterized by the metric entropy of $P$ w.r.t. the dual Minkowski norm defined by $D$.
This equivalence makes an intriguing connection to the long-standing metric entropy duality conjecture in convex geometry.
- Score: 7.727052811126007
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We give the first sample complexity characterizations for outcome
indistinguishability, a theoretical framework of machine learning recently
introduced by Dwork, Kim, Reingold, Rothblum, and Yona (STOC 2021). In outcome
indistinguishability, the goal of the learner is to output a predictor that
cannot be distinguished from the target predictor by a class $D$ of
distinguishers examining the outcomes generated according to the predictors'
predictions.
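For intuition, here is a hedged sketch of the indistinguishability requirement, written in illustrative notation following the general form in Dwork et al. rather than this paper's exact definitions: a learned predictor $\tilde{p}$ is $\varepsilon$-outcome indistinguishable from the target $p^*$ over $D$ if
$$
\Big| \Pr_{x,\, o \sim \mathrm{Ber}(\tilde{p}(x))}\big[d(x, o) = 1\big] \;-\; \Pr_{x,\, o^* \sim \mathrm{Ber}(p^*(x))}\big[d(x, o^*) = 1\big] \Big| \;\le\; \varepsilon \quad \text{for every } d \in D,
$$
where each outcome is drawn from the Bernoulli distribution given by the corresponding prediction on $x$.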
In the distribution-specific and realizable setting where the learner is
given the data distribution together with a predictor class $P$ containing the
target predictor, we show that the sample complexity of outcome
indistinguishability is characterized by the metric entropy of $P$ w.r.t. the
dual Minkowski norm defined by $D$, and equivalently by the metric entropy of
$D$ w.r.t. the dual Minkowski norm defined by $P$. This equivalence makes an
intriguing connection to the long-standing metric entropy duality conjecture in
convex geometry. Our sample complexity characterization implies a variant of
metric entropy duality, which we show is nearly tight. In the distribution-free
setting, we focus on the case considered by Dwork et al. where $P$ contains all
possible predictors, hence the sample complexity only depends on $D$. In this
setting, we show that the sample complexity of outcome indistinguishability is
characterized by the fat-shattering dimension of $D$.
We also show a strong sample complexity separation between realizable and
agnostic outcome indistinguishability in both the distribution-free and the
distribution-specific settings. This is in contrast to distribution-free (resp.
distribution-specific) PAC learning where the sample complexity in both the
realizable and the agnostic settings can be characterized by the VC dimension
(resp. metric entropy).
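For reference, a standard statement of the metric entropy duality conjecture from convex geometry (included here for context; not quoted from this paper): there exist universal constants $\alpha, \beta > 0$ such that for all symmetric convex bodies $K, L \subseteq \mathbb{R}^n$,
$$
\log N(L^\circ, \alpha K^\circ) \;\le\; \beta \, \log N(K, L),
$$
where $N(K, L)$ is the minimal number of translates of $L$ needed to cover $K$, and $K^\circ$, $L^\circ$ denote the polar bodies.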
Related papers
- Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space for modeling functions computed by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z) - Computational-Statistical Gaps in Gaussian Single-Index Models [77.1473134227844]
Single-Index Models are high-dimensional regression problems with planted structure.
We show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) frameworks, necessarily require $\Omega(d^{k^\star/2})$ samples.
arXiv Detail & Related papers (2024-03-08T18:50:19Z) - Sample Complexity of Variance-reduced Distributionally Robust Q-learning [17.96094201655567]
This paper presents two novel model-free algorithms, namely the distributionally robust Q-learning and its variance-reduced counterpart, that can effectively learn a robust policy despite distributional shifts.
A series of numerical experiments confirm the theoretical findings and the efficiency of the algorithms in handling distributional shifts.
arXiv Detail & Related papers (2023-05-28T19:40:46Z) - Differentially-Private Bayes Consistency [70.92545332158217]
We construct a Bayes consistent learning rule that satisfies differential privacy (DP)
We prove that any VC class can be privately learned in a semi-supervised setting with a near-optimal sample complexity.
arXiv Detail & Related papers (2022-12-08T11:57:30Z) - On counterfactual inference with unobserved confounding [36.18241676876348]
Given an observational study with $n$ independent but heterogeneous units, our goal is to learn the counterfactual distribution for each unit.
We introduce a convex objective that pools all $n$ samples to jointly learn all $n$ parameter vectors.
We derive sufficient conditions for compactly supported distributions to satisfy the logarithmic Sobolev inequality.
arXiv Detail & Related papers (2022-11-14T04:14:37Z) - Learning versus Refutation in Noninteractive Local Differential Privacy [133.80204506727526]
We study two basic statistical tasks in non-interactive local differential privacy (LDP): learning and refutation.
Our main result is a complete characterization of the sample complexity of PAC learning for non-interactive LDP protocols.
arXiv Detail & Related papers (2022-10-26T03:19:24Z) - On-Demand Sampling: Learning Optimally from Multiple Distributions [63.20009081099896]
Social and real-world considerations have given rise to multi-distribution learning paradigms.
We establish the optimal sample complexity of these learning paradigms and give algorithms that meet this sample complexity.
Our algorithm design and analysis are enabled by our extensions of online learning techniques for solving zero-sum games.
arXiv Detail & Related papers (2022-10-22T19:07:26Z) - Sample Complexity of Forecast Aggregation [9.122524488932573]
We consider a Bayesian forecast aggregation model where $n$ experts, after observing private signals about an unknown binary event, report their posterior beliefs about the event to a principal.
The principal aggregates the reports into a single prediction for the event.
We show that the sample complexity of this problem is at least $\tilde{\Omega}(m^{n-2} / \varepsilon)$ for arbitrary discrete distributions.
arXiv Detail & Related papers (2022-07-26T18:12:53Z) - $k$-Variance: A Clustered Notion of Variance [23.57925128327]
We introduce $k$-variance, a generalization of variance built on the machinery of random bipartite matchings.
We provide in-depth analysis of this quantity in several key cases, including one-dimensional measures, clustered measures, and measures concentrated on low-dimensional subsets.
arXiv Detail & Related papers (2020-12-13T04:25:32Z) - $(f,\Gamma)$-Divergences: Interpolating between $f$-Divergences and
Integral Probability Metrics [6.221019624345409]
We develop a framework for constructing information-theoretic divergences that subsume both $f$-divergences and integral probability metrics (IPMs)
We show that they can be expressed as a two-stage mass-redistribution/mass-transport process.
Using statistical learning as an example, we demonstrate their advantage in training generative adversarial networks (GANs) for heavy-tailed, not-absolutely continuous sample distributions.
arXiv Detail & Related papers (2020-11-11T18:17:09Z) - Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and
Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP)
We show that the number of samples needed to yield an entrywise $\varepsilon$-accurate estimate of the Q-function is at most on the order of $\frac{1}{\mu_{\min}(1-\gamma)^5\varepsilon^2} + \frac{t_{\mathrm{mix}}}{\mu_{\min}(1-\gamma)}$ up to some logarithmic factor.
arXiv Detail & Related papers (2020-06-04T17:51:00Z)