Contrastive Learning Can Find An Optimal Basis For Approximately View-Invariant Functions
- URL: http://arxiv.org/abs/2210.01883v1
- Date: Tue, 4 Oct 2022 20:02:52 GMT
- Title: Contrastive Learning Can Find An Optimal Basis For Approximately View-Invariant Functions
- Authors: Daniel D. Johnson, Ayoub El Hanchi, Chris J. Maddison
- Abstract summary: We show that multiple contrastive learning methods can be reinterpreted as learning kernel functions that approximate a fixed positive-pair kernel.
We prove that a simple representation obtained by combining this kernel with PCA provably minimizes the worst-case approximation error of linear predictors.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Contrastive learning is a powerful framework for learning self-supervised
representations that generalize well to downstream supervised tasks. We show
that multiple existing contrastive learning methods can be reinterpreted as
learning kernel functions that approximate a fixed positive-pair kernel. We
then prove that a simple representation obtained by combining this kernel with
PCA provably minimizes the worst-case approximation error of linear predictors,
under a straightforward assumption that positive pairs have similar labels. Our
analysis is based on a decomposition of the target function in terms of the
eigenfunctions of a positive-pair Markov chain, and a surprising equivalence
between these eigenfunctions and the output of Kernel PCA. We give
generalization bounds for downstream linear prediction using our Kernel PCA
representation, and show empirically on a set of synthetic tasks that applying
Kernel PCA to contrastive learning models can indeed approximately recover the
Markov chain eigenfunctions, although the accuracy depends on the kernel
parameterization as well as on the augmentation strength.
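To make the recipe concrete, here is a hedged sketch of the kernel-plus-PCA pipeline the abstract describes, with a toy encoder standing in for a trained contrastive model; `embed` and the ridge head are illustrative assumptions, not the paper's code.

```python
# Hedged sketch of the abstract's pipeline: take the kernel induced by a
# contrastive encoder, k(x, x') = <f(x), f(x')>, run Kernel PCA on its
# Gram matrix, then fit a linear predictor on the resulting representation.
# `embed` is a toy stand-in for a trained contrastive model (assumption).
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 5))
embed = lambda X: np.tanh(X @ W.T)      # placeholder encoder f(x)

X = rng.normal(size=(100, 5))
K = embed(X) @ embed(X).T               # Gram matrix of the learned kernel

kpca = KernelPCA(n_components=4, kernel="precomputed")
Z = kpca.fit_transform(K)               # approximates the top eigenfunctions

y = X[:, 0] + 0.1 * rng.normal(size=100)  # synthetic downstream target
head = Ridge().fit(Z, y)                  # linear prediction on the representation
```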
Related papers
- Joint Embedding Self-Supervised Learning in the Kernel Regime [21.80241600638596]
Self-supervised learning (SSL) produces useful representations of data without access to any labels for classifying the data.
We extend this framework to incorporate algorithms based on kernel methods where embeddings are constructed by linear maps acting on the feature space of a kernel.
We analyze our kernel model on small datasets to identify common features of self-supervised learning algorithms and gain theoretical insights into their performance on downstream tasks.
arXiv Detail & Related papers (2022-09-29T15:53:19Z)
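As a rough illustration of "embeddings constructed by linear maps acting on the feature space of a kernel", the sketch below uses a representer-style parameterization over anchor points; the anchors, the RBF kernel, and the random `W` are our assumptions, and `W` would in practice be fit by a self-supervised objective.

```python
# Minimal sketch, assuming a representer-style parameterization: each
# embedding coordinate lies in the span of kernel functions at fixed
# anchor points, z(x) = W k(anchors, x). W is random here; an SSL loss
# would normally fit it (not shown).
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
anchors = rng.normal(size=(32, 5))   # representer points (assumption)
W = rng.normal(size=(4, 32))         # linear map on kernel features

def kernel_embedding(X):
    """Linear map on the kernel feature space: z(x) = W k(anchors, x)."""
    return rbf_kernel(X, anchors) @ W.T

Z = kernel_embedding(rng.normal(size=(10, 5)))  # (10, 4) embeddings
```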
- Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
arXiv Detail & Related papers (2022-06-01T23:26:51Z)
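For reference, the basic protocol being extended is Q-learning with a linear parameterization Q(s, a) = θᵀφ(s, a); a plain TD update looks like the sketch below, which omits the paper's exploration variant (the function name is ours).

```python
# Plain Q-learning step under linear function approximation. The paper's
# algorithm adds an exploration scheme on top of updates of this form.
import numpy as np

def q_learning_step(theta, phi_sa, reward, phis_next, alpha=0.1, gamma=0.99):
    """TD update for Q(s, a) = theta @ phi(s, a).

    phis_next: features phi(s', a') for each action a' in the next state.
    """
    q_next = max(phi @ theta for phi in phis_next)      # greedy backup
    td_error = reward + gamma * q_next - phi_sa @ theta
    return theta + alpha * td_error * phi_sa
```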
- Reinforcement Learning from Partial Observation: Linear Function Approximation with Provable Sample Efficiency [111.83670279016599]
We study reinforcement learning for partially observable Markov decision processes (POMDPs) with infinite observation and state spaces.
We make the first attempt at partial observability and function approximation for a class of POMDPs with a linear structure.
arXiv Detail & Related papers (2022-04-20T21:15:38Z)
- Revisiting Memory Efficient Kernel Approximation: An Indefinite Learning Perspective [0.8594140167290097]
Matrix approximations are a key element in large-scale machine learning approaches.
We extend MEKA to be applicable not only to shift-invariant kernels but also to non-stationary kernels.
We present a Lanczos-based estimation of a spectrum shift to develop a stable positive semi-definite MEKA approximation.
arXiv Detail & Related papers (2021-12-18T10:01:34Z)
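The "Lanczos-based estimation of a spectrum shift" can be pictured with the generic sketch below: estimate the most negative eigenvalue with a Lanczos eigensolver and shift the diagonal until the matrix is positive semi-definite. This shows the shift idea in isolation, not the MEKA-specific block construction; `psd_shift` is our name.

```python
# Generic spectrum-shift sketch: find the smallest eigenvalue of a
# symmetric (possibly indefinite) kernel matrix with a Lanczos solver,
# then shift the diagonal so the result is positive semi-definite.
import numpy as np
from scipy.sparse.linalg import eigsh

def psd_shift(K, eps=1e-8):
    """Return K + (|lambda_min| + eps) I when K has negative eigenvalues."""
    lam_min = eigsh(K, k=1, which="SA", return_eigenvectors=False)[0]
    return K if lam_min >= 0 else K + (abs(lam_min) + eps) * np.eye(K.shape[0])
```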
- Scalable Variational Gaussian Processes via Harmonic Kernel Decomposition [54.07797071198249]
We introduce a new scalable variational Gaussian process approximation which provides a high fidelity approximation while retaining general applicability.
We demonstrate that, on a range of regression and classification problems, our approach can exploit input space symmetries such as translations and reflections.
Notably, our approach achieves state-of-the-art results on CIFAR-10 among pure GP models.
arXiv Detail & Related papers (2021-06-10T18:17:57Z)
- Provable Guarantees for Self-Supervised Deep Learning with Spectral Contrastive Loss [72.62029620566925]
Recent works in self-supervised learning have advanced the state-of-the-art by relying on the contrastive learning paradigm.
Our work analyzes contrastive learning without assuming conditional independence of positive pairs.
We propose a loss that performs spectral decomposition on the population augmentation graph and can be succinctly written as a contrastive learning objective.
arXiv Detail & Related papers (2021-06-08T07:41:02Z)
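The objective in question admits a compact batch estimate: the spectral contrastive loss has the form L(f) = -2 E[f(x)ᵀf(x⁺)] + E[(f(x)ᵀf(x'))²]. The sketch below is a straightforward batch estimate, keeping diagonal terms in the second sum for brevity (a small simplification).

```python
# Batch estimate of the spectral contrastive loss
#     L(f) = -2 E[f(x)^T f(x+)] + E[(f(x)^T f(x'))^2].
# Diagonal terms are kept in the second sum for brevity (simplification).
import numpy as np

def spectral_contrastive_loss(z1, z2):
    """z1, z2: (batch, dim) embeddings of two augmented views of a batch."""
    pos = -2.0 * np.mean(np.sum(z1 * z2, axis=1))  # pull positive pairs together
    neg = np.mean((z1 @ z2.T) ** 2)                # push cross pairs apart
    return pos + neg
```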
- From Majorization to Interpolation: Distributionally Robust Learning using Kernel Smoothing [1.2891210250935146]
We study the function approximation aspect of distributionally robust optimization (DRO) based on probability metrics.
Rather than relying on majorization, this paper proposes robust learning algorithms based on smooth function approximation and convolution.
arXiv Detail & Related papers (2021-02-16T22:25:18Z)
- Generalized vec trick for fast learning of pairwise kernel models [3.867363075280544]
We present a comprehensive review of pairwise kernels that have been proposed for incorporating prior knowledge about the relationship between the objects.
We show how all the reviewed kernels can be expressed as sums of Kronecker products, allowing the use of generalized vec trick for speeding up their computation.
arXiv Detail & Related papers (2020-09-02T13:27:51Z)
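The classical identity being generalized is the vec trick (B ⊗ A) vec(X) = vec(A X Bᵀ), which evaluates a Kronecker-structured matrix-vector product without ever forming the Kronecker product; a quick numerical check:

```python
# The vec trick: (B ⊗ A) vec(X) = vec(A X B^T), so a Kronecker-structured
# matrix-vector product never needs the explicit Kronecker product.
import numpy as np

rng = np.random.default_rng(0)
A, B = rng.normal(size=(3, 4)), rng.normal(size=(5, 6))
X = rng.normal(size=(4, 6))

naive = np.kron(B, A) @ X.flatten(order="F")  # materializes a (15, 24) matrix
fast = (A @ X @ B.T).flatten(order="F")       # same result, no kron
assert np.allclose(naive, fast)
```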
- Evaluating probabilistic classifiers: Reliability diagrams and score decompositions revisited [68.8204255655161]
We introduce the CORP approach, which generates provably statistically Consistent, Optimally binned, and Reproducible reliability diagrams in an automated way.
CORP is based on non-parametric isotonic regression and implemented via the pool-adjacent-violators (PAV) algorithm.
arXiv Detail & Related papers (2020-08-07T08:22:26Z)
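A minimal sketch of the PAV-based step behind CORP, using scikit-learn's isotonic regression on synthetic forecasts; the data and variable names are ours, and the actual CORP procedure adds binning and consistency guarantees beyond this fit.

```python
# PAV-based recalibration sketch: isotonic regression of outcomes on
# forecast probabilities; the fitted step function is the reliability curve.
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
p_pred = rng.uniform(size=200)                           # forecast probabilities
y = rng.binomial(1, np.clip(0.8 * p_pred + 0.1, 0, 1))   # miscalibrated outcomes

iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
p_cal = iso.fit_transform(p_pred, y)  # PAV fit: conditional event frequencies
# Plotting p_cal against p_pred (sorted) yields the reliability diagram.
```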
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.