Optimal Sample Complexity of Contrastive Learning
- URL: http://arxiv.org/abs/2312.00379v1
- Date: Fri, 1 Dec 2023 06:57:11 GMT
- Title: Optimal Sample Complexity of Contrastive Learning
- Authors: Noga Alon, Dmitrii Avdiukhin, Dor Elboim, Orr Fischer, Grigory
Yaroslavtsev
- Abstract summary: We study the sample complexity of contrastive learning, i.e. the minimum number of labeled tuples sufficient for getting high generalization accuracy.
We give tight bounds on the sample complexity in a variety of settings, focusing on arbitrary distance functions.
We show that the theoretical bounds on sample complexity obtained via VC/Natarajan can have strong predictive power for experimental results.
- Score: 12.108926584457736
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Contrastive learning is a highly successful technique for learning
representations of data from labeled tuples, specifying the distance relations
within the tuple. We study the sample complexity of contrastive learning, i.e.
the minimum number of labeled tuples sufficient for getting high generalization
accuracy. We give tight bounds on the sample complexity in a variety of
settings, focusing on arbitrary distance functions: both general
$\ell_p$-distances and tree metrics. Our main result is an (almost) optimal
bound on the sample complexity of learning $\ell_p$-distances for integer $p$.
For any $p \ge 1$ we show that $\tilde \Theta(\min(nd,n^2))$ labeled tuples are
necessary and sufficient for learning $d$-dimensional representations of
$n$-point datasets. Our results hold for an arbitrary distribution of the input
samples and are based on giving the corresponding bounds on the
Vapnik-Chervonenkis/Natarajan dimension of the associated problems. We further
show that the theoretical bounds on sample complexity obtained via VC/Natarajan
dimension can have strong predictive power for experimental results, in
contrast with the folklore belief about a substantial gap between the
statistical learning theory and the practice of deep learning.
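To make the setting concrete, here is a minimal illustrative sketch of the setup described in the abstract: a $d$-dimensional representation of an $n$-point dataset is learned from labeled triplets of the form "$x$ is closer to $y$ than to $z$" under the $\ell_2$ distance. This is not the paper's construction; the linear embedding, margin loss, optimizer, and all sizes below are assumptions chosen for brevity.
```python
# Illustrative sketch only: learn a d-dimensional embedding of n points from
# labeled triplets (i, j, k) meaning "point i is closer to point j than to
# point k" under the l2 distance. Sizes, loss, and optimizer are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, d_in, d = 200, 32, 8              # n points, ambient dimension, embedding dimension
X = rng.normal(size=(n, d_in))       # raw data points

# Synthetic labeled tuples: triplets judged by a hidden ground-truth embedding.
W_true = rng.normal(size=(d_in, d))
Z_true = X @ W_true

def sample_triplets(m):
    i, j, k = rng.integers(0, n, size=(3, m))
    d_ij = np.linalg.norm(Z_true[i] - Z_true[j], axis=1)
    d_ik = np.linalg.norm(Z_true[i] - Z_true[k], axis=1)
    swap = d_ij > d_ik               # relabel so that j is always the closer point
    j[swap], k[swap] = k[swap], j[swap]
    return i, j, k

# Learn a linear embedding W by minimizing a margin (triplet) loss with SGD.
W = 0.1 * rng.normal(size=(d_in, d))
lr, margin = 0.01, 1.0
for step in range(2000):
    i, j, k = sample_triplets(64)
    Z = X @ W
    diff_pos, diff_neg = Z[i] - Z[j], Z[i] - Z[k]
    viol = margin + (diff_pos ** 2).sum(1) - (diff_neg ** 2).sum(1) > 0
    # Gradient of the hinge triplet loss w.r.t. W, over violated triplets only.
    G = (2 * (X[i[viol]] - X[j[viol]]).T @ diff_pos[viol]
         - 2 * (X[i[viol]] - X[k[viol]]).T @ diff_neg[viol])
    W -= lr * G / max(viol.sum(), 1)

# Fraction of fresh labeled triplets on which the learned embedding agrees.
i, j, k = sample_triplets(5000)
Z = X @ W
acc = np.mean(np.linalg.norm(Z[i] - Z[j], axis=1) < np.linalg.norm(Z[i] - Z[k], axis=1))
print(f"triplet agreement on held-out tuples: {acc:.2f}")
```
In the abstract's terms, the number of labeled tuples needed for this kind of generalization scales, up to polylogarithmic factors, as $\min(nd, n^2)$.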
Related papers
- Information-Computation Tradeoffs for Learning Margin Halfspaces with
Random Classification Noise [50.64137465792738]
We study the problem of PAC learning $\gamma$-margin halfspaces with Random Classification Noise.
We establish an information-computation tradeoff suggesting an inherent gap between the sample complexity of the problem and the sample complexity of computationally efficient algorithms.
arXiv Detail & Related papers (2023-06-28T16:33:39Z) - Sharper Bounds for $\ell_p$ Sensitivity Sampling [56.45770306797584]
We show the first bounds for sensitivity sampling for $\ell_p$ subspace embeddings for $p \neq 2$.
We also show that the root leverage score sampling algorithm achieves a bound of roughly $d$ for $1 \leq p < 2$, and that a combination of leverage score and sensitivity sampling achieves an improved bound of roughly $d^{2/p}\mathfrak{S}^{2-4/p}$ for $2 < p < \infty$.
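For background on the sampling schemes named above, the sketch below shows generic $\ell_2$ leverage score sampling, a standard construction and not the cited paper's $\ell_p$ sensitivity sampling algorithm; the matrix sizes and sample count are arbitrary assumptions.
```python
# Generic l2 leverage score sampling (background sketch, not the paper's
# l_p algorithm): sample and rescale rows proportionally to leverage scores.
import numpy as np

def leverage_score_sample(A, m, rng=np.random.default_rng(0)):
    Q, _ = np.linalg.qr(A, mode="reduced")   # thin QR factorization of A
    scores = (Q ** 2).sum(axis=1)            # l2 leverage score of each row
    probs = scores / scores.sum()
    idx = rng.choice(A.shape[0], size=m, p=probs)
    scale = 1.0 / np.sqrt(m * probs[idx])    # rescaling keeps ||SAx||^2 unbiased
    return A[idx] * scale[:, None]

A = np.random.default_rng(1).normal(size=(10_000, 20))
SA = leverage_score_sample(A, m=400)         # 400 sampled, rescaled rows
x = np.ones(20)
print(np.linalg.norm(SA @ x) / np.linalg.norm(A @ x))  # should be close to 1
```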
arXiv Detail & Related papers (2023-06-01T14:27:28Z) - The Sample Complexity of Approximate Rejection Sampling with
Applications to Smoothed Online Learning [29.44582058149344]
We show that the optimal total variation distance as a function of $n$ is given by $\tilde{\Theta}\big(\frac{D}{f'(n)}\big)$ over the class of all pairs $\nu, \mu$ with a bounded $f$-divergence.
We then consider an application in the seemingly very different field of smoothed online learning.
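For context, a textbook exact rejection sampler is sketched below; the cited paper instead analyzes how many samples from $\mu$ suffice to approximately simulate samples from $\nu$. The Gaussian target/proposal pair and the constant $M$ are assumptions for illustration only.
```python
# Textbook rejection sampling (background only, not the cited paper's scheme):
# draw from a target nu = N(1, 1) using proposals from mu = N(0, 2).
import numpy as np

rng = np.random.default_rng(0)

def normal_pdf(x, loc, scale):
    return np.exp(-0.5 * ((x - loc) / scale) ** 2) / (scale * np.sqrt(2 * np.pi))

def rejection_sample(size, M=4.0):
    # M must upper-bound nu(x)/mu(x) everywhere; M = 4 is a rough valid bound here.
    out = []
    while len(out) < size:
        x = rng.normal(0.0, 2.0)                           # proposal from mu
        ratio = normal_pdf(x, 1.0, 1.0) / normal_pdf(x, 0.0, 2.0)
        if rng.uniform() <= ratio / M:                     # accept with prob. ratio / M
            out.append(x)
    return np.array(out)

samples = rejection_sample(1000)
print(samples.mean(), samples.std())                       # roughly 1 and 1
```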
arXiv Detail & Related papers (2023-02-09T14:20:14Z) - Tree Learning: Optimal Algorithms and Sample Complexity [10.638365461509]
We study the problem of learning a hierarchical tree representation of data from labeled samples taken from an arbitrary distribution.
We present optimal sample bounds for this problem in several learning settings, including (agnostic) learning and online learning.
arXiv Detail & Related papers (2023-02-09T08:35:17Z) - On-Demand Sampling: Learning Optimally from Multiple Distributions [63.20009081099896]
Social and real-world considerations have given rise to multi-distribution learning paradigms.
We establish the optimal sample complexity of these learning paradigms and give algorithms that meet this sample complexity.
Our algorithm design and analysis are enabled by our extensions of online learning techniques for solving zero-sum games.
arXiv Detail & Related papers (2022-10-22T19:07:26Z) - Approximate Function Evaluation via Multi-Armed Bandits [51.146684847667125]
We study the problem of estimating the value of a known smooth function $f$ at an unknown point $\boldsymbol{\mu} \in \mathbb{R}^n$, where each component $\mu_i$ can be sampled via a noisy oracle.
We design an instance-adaptive algorithm that learns to sample according to the importance of each coordinate, and with probability at least $1-\delta$ returns an $\epsilon$-accurate estimate of $f(\boldsymbol{\mu})$.
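A rough sketch of the "sample according to coordinate importance" idea follows; the pilot phase, allocation heuristic, and choice of $f$ are assumptions and do not reproduce the paper's instance-adaptive guarantees.
```python
# Heuristic illustration (not the cited paper's algorithm): estimate f(mu)
# from noisy coordinate oracles, allocating samples by coordinate importance.
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([2.0, -1.0, 0.5, 3.0])               # hidden point, unknown to the learner
f = lambda x: np.sum(x ** 2)                        # known smooth function
oracle = lambda i: mu[i] + rng.normal(0.0, 1.0)     # noisy oracle for coordinate i

def estimate_f(budget, pilot=50):
    n = len(mu)
    # Pilot phase: a few samples per coordinate to get rough means.
    est = np.array([np.mean([oracle(i) for _ in range(pilot)]) for i in range(n)])
    # Importance of coordinate i ~ |df/dx_i| at the pilot estimate (= |2 est_i| here).
    imp = np.abs(2.0 * est) + 1e-9
    alloc = np.maximum((budget * imp / imp.sum()).astype(int), 1)
    est = np.array([np.mean([oracle(i) for _ in range(alloc[i])]) for i in range(n)])
    return f(est)

print(estimate_f(budget=4000), f(mu))               # adaptive estimate vs. true value
```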
arXiv Detail & Related papers (2022-03-18T18:50:52Z) - Under-bagging Nearest Neighbors for Imbalanced Classification [63.026765294759876]
We propose an ensemble learning algorithm called \textit{under-bagging} $k$-NN for imbalanced classification problems.
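A minimal sketch of the under-bagging $k$-NN idea: each bag undersamples the majority class to the minority class size, fits a $k$-NN classifier, and the ensemble predicts by majority vote. The bag count, $k$, and resampling details below are assumptions and may differ from the paper's algorithm.
```python
# Illustrative under-bagging k-NN: undersample the majority class per bag,
# fit k-NN on each bag, and aggregate by majority vote (details are assumed).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def under_bagging_knn_fit(X, y, n_bags=10, k=5, rng=np.random.default_rng(0)):
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    models = []
    for _ in range(n_bags):
        idx = np.concatenate([
            rng.choice(np.flatnonzero(y == c), size=n_min, replace=True)
            for c in classes                       # n_min samples from every class
        ])
        models.append(KNeighborsClassifier(n_neighbors=k).fit(X[idx], y[idx]))
    return models

def under_bagging_knn_predict(models, X):
    votes = np.stack([m.predict(X) for m in models])   # shape (n_bags, n_samples)
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)

# Toy imbalanced data: 95% class 0, 5% class 1.
rng = np.random.default_rng(1)
X = np.r_[rng.normal(0, 1, (950, 2)), rng.normal(2, 1, (50, 2))]
y = np.r_[np.zeros(950, dtype=int), np.ones(50, dtype=int)]
models = under_bagging_knn_fit(X, y)
print((under_bagging_knn_predict(models, X) == y).mean())   # training accuracy
```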
arXiv Detail & Related papers (2021-09-01T14:10:38Z) - Sample Complexity of Adversarially Robust Linear Classification on
Separated Data [41.30425613353595]
We consider the sample complexity of learning with adversarial robustness.
We show that for very well-separated data, convergence rates of $O(\frac{1}{n})$ are achievable.
arXiv Detail & Related papers (2020-12-19T22:04:59Z) - Average-case Complexity of Teaching Convex Polytopes via Halfspace
Queries [55.28642461328172]
We show that the average-case teaching complexity is $\Theta(d)$, which is in sharp contrast to the worst-case teaching complexity of $\Theta(n)$.
Our insights allow us to establish a tight bound on the average-case complexity for $\phi$-separable dichotomies.
arXiv Detail & Related papers (2020-06-25T19:59:24Z) - Learning the mapping $\mathbf{x}\mapsto \sum_{i=1}^d x_i^2$: the cost of
finding the needle in a haystack [6.436049723896002]
We show that the cost of finding the needle in a haystack is related to the Barron norm of the function.
In order to control the size of the generalization gap, we find that the use of explicit regularization becomes increasingly more important as $d$ increases.
arXiv Detail & Related papers (2020-02-24T21:58:22Z)