Joint Embedding Self-Supervised Learning in the Kernel Regime
- URL: http://arxiv.org/abs/2209.14884v1
- Date: Thu, 29 Sep 2022 15:53:19 GMT
- Title: Joint Embedding Self-Supervised Learning in the Kernel Regime
- Authors: Bobak T. Kiani, Randall Balestriero, Yubei Chen, Seth Lloyd, Yann
LeCun
- Abstract summary: Self-supervised learning (SSL) produces useful representations of data without access to any labels for classifying the data.
We extend this framework to incorporate algorithms based on kernel methods where embeddings are constructed by linear maps acting on the feature space of a kernel.
We analyze our kernel model on small datasets to identify common features of self-supervised learning algorithms and gain theoretical insights into their performance on downstream tasks.
- Score: 21.80241600638596
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The fundamental goal of self-supervised learning (SSL) is to produce useful
representations of data without access to any labels for classifying the data.
Modern methods in SSL, which form representations based on known or constructed
relationships between samples, have been particularly effective at this task.
Here, we aim to extend this framework to incorporate algorithms based on kernel
methods where embeddings are constructed by linear maps acting on the feature
space of a kernel. In this kernel regime, we derive methods to find the optimal
form of the output representations for contrastive and non-contrastive loss
functions. This procedure produces a new representation space with an inner
product denoted as the induced kernel which generally correlates points which
are related by an augmentation in kernel space and de-correlates points
otherwise. We analyze our kernel model on small datasets to identify common
features of self-supervised learning algorithms and gain theoretical insights
into their performance on downstream tasks.
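As a rough, self-contained illustration of this setup (not the authors' exact algorithm), the sketch below builds embeddings as a linear map acting on kernel features, f(x) = A k(Z, x), fits A in closed form so that two augmented views of the same sample are maximally correlated (a regularized kernel-CCA surrogate for the contrastive and non-contrastive objectives studied in the paper), and exposes the inner product of the learned representations as the induced kernel. The RBF kernel, the class name `KernelSSL`, and all hyperparameters are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import eigh

def rbf_kernel(A, B, gamma=1.0):
    """RBF kernel matrix between the rows of A and the rows of B."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * sq)

class KernelSSL:
    """Embeddings f(x) = A k(Z, x): a linear map on the kernel feature space,
    fitted so that two augmented views of each sample are maximally correlated
    (a kernel-CCA stand-in for contrastive/non-contrastive SSL losses)."""

    def __init__(self, dim=8, gamma=1.0, reg=1e-3):
        self.dim, self.gamma, self.reg = dim, gamma, reg

    def fit(self, X1, X2):
        # Row i of X1 and row i of X2 are two augmentations of the same sample.
        n = X1.shape[0]
        self.Z = np.vstack([X1, X2])                    # anchor points for the linear map
        K1 = rbf_kernel(self.Z, X1, self.gamma)         # (2n, n) kernel features, view 1
        K2 = rbf_kernel(self.Z, X2, self.gamma)         # (2n, n) kernel features, view 2
        C11 = K1 @ K1.T / n + self.reg * np.eye(2 * n)  # view-1 covariance (regularized)
        C22 = K2 @ K2.T / n + self.reg * np.eye(2 * n)  # view-2 covariance (regularized)
        C12 = K1 @ K2.T / n                             # cross-covariance between views
        # Generalized eigenproblem  C12 C22^{-1} C21 a = rho^2 C11 a  (kernel CCA).
        _, vecs = eigh(C12 @ np.linalg.solve(C22, C12.T), C11)
        self.A = vecs[:, ::-1][:, :self.dim].T          # top `dim` directions, shape (dim, 2n)
        return self

    def embed(self, X):
        return self.A @ rbf_kernel(self.Z, X, self.gamma)  # (dim, m) representations

    def induced_kernel(self, X, Y):
        # Inner product of the learned representations: the induced kernel.
        return self.embed(X).T @ self.embed(Y)
```

On toy data, `induced_kernel` should assign relatively larger values to pairs linked by the augmentation used during fitting and smaller values to unrelated pairs, which is the qualitative behaviour the abstract attributes to the induced kernel.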
Related papers
- Optimal Kernel Choice for Score Function-based Causal Discovery [92.65034439889872]
We propose a kernel selection method within the generalized score function that automatically selects the optimal kernel that best fits the data.
We conduct experiments on both synthetic data and real-world benchmarks, and the results demonstrate that our proposed method outperforms existing kernel selection methods.
arXiv Detail & Related papers (2024-07-14T09:32:20Z)
- Learning "best" kernels from data in Gaussian process regression. With application to aerodynamics [0.4588028371034406]
We introduce algorithms to select/design kernels in Gaussian process regression/kriging surrogate modeling techniques.
A first class of algorithms is kernel flow, which was introduced in the context of classification in machine learning.
A second class of algorithms is called spectral kernel ridge regression, and aims at selecting a "best" kernel such that the norm of the function to be approximated is minimal.
arXiv Detail & Related papers (2022-06-03T07:50:54Z)
- Nonparametric learning of kernels in nonlocal operators [6.314604944530131]
We provide a rigorous identifiability analysis and convergence study for the learning of kernels in nonlocal operators.
We propose a nonparametric regression algorithm with a novel data adaptive RKHS Tikhonov regularization method based on the function space of identifiability.
arXiv Detail & Related papers (2022-05-23T02:47:55Z)
- Inducing Gaussian Process Networks [80.40892394020797]
We propose inducing Gaussian process networks (IGN), a simple framework for simultaneously learning the feature space as well as the inducing points.
The inducing points, in particular, are learned directly in the feature space, enabling a seamless representation of complex structured domains.
We report on experimental results for real-world data sets showing that IGNs provide significant advances over state-of-the-art methods.
arXiv Detail & Related papers (2022-04-21T05:27:09Z)
- On the Benefits of Large Learning Rates for Kernel Methods [110.03020563291788]
We show that the benefit of large learning rates can be precisely characterized in the context of kernel methods.
We consider the minimization of a quadratic objective in a separable Hilbert space, and show that with early stopping, the choice of learning rate influences the spectral decomposition of the obtained solution.
arXiv Detail & Related papers (2022-02-28T13:01:04Z)
- Multiple Kernel Representation Learning on Networks [12.106994960669924]
We propose a weighted matrix factorization model that encodes random walk-based information about nodes of the network.
We extend the approach with a multiple kernel learning formulation that provides the flexibility of learning the kernel as the linear combination of a dictionary of kernels.
arXiv Detail & Related papers (2021-06-09T13:22:26Z)
- Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction for the Neural Tangent Kernel (NTK) of a fully-connected ReLU network.
We show that the dimension of the resulting features is much smaller than that of other baseline feature map constructions, while achieving comparable error bounds in both theory and practice.
arXiv Detail & Related papers (2021-04-03T09:08:12Z)
- Kernel k-Means, By All Means: Algorithms and Strong Consistency [21.013169939337583]
Kernel $k$-means clustering is a powerful tool for unsupervised learning of non-linear data.
In this paper, we generalize results leveraging a general family of means to combat sub-optimal local solutions.
Our algorithm makes use of majorization-minimization (MM) to better solve this non-linear separation problem.
arXiv Detail & Related papers (2020-11-12T16:07:18Z)
- Kernel learning approaches for summarising and combining posterior similarity matrices [68.8204255655161]
We build upon the notion of the posterior similarity matrix (PSM) in order to suggest new approaches for summarising the output of MCMC algorithms for Bayesian clustering models.
A key contribution of our work is the observation that PSMs are positive semi-definite, and hence can be used to define probabilistically-motivated kernel matrices.
arXiv Detail & Related papers (2020-09-27T14:16:14Z)
- A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention [96.77554122595578]
We introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference.
Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.
arXiv Detail & Related papers (2020-06-22T08:35:58Z)
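As a minimal sketch of the transport-based aggregation idea in the entry above (with simplifying assumptions the paper does not make: a squared-Euclidean cost, uniform weights, and a fixed rather than trainable reference), the snippet below computes an entropy-regularized transport plan with Sinkhorn iterations and uses it to pool a variable-size set of features into a fixed-size vector; the function names are hypothetical.

```python
import numpy as np

def sinkhorn_plan(cost, reg=0.1, n_iters=100):
    """Entropy-regularized optimal transport plan between two uniform
    histograms, via standard Sinkhorn iterations on exp(-cost / reg)."""
    m, p = cost.shape
    a, b = np.full(m, 1.0 / m), np.full(p, 1.0 / p)
    K = np.exp(-cost / reg)
    u = np.ones(m)
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]          # transport plan, shape (m, p)

def ot_pool(elements, reference, reg=0.1):
    """Aggregate a variable-size set of features (m, d) into a fixed-size
    vector by transporting the set onto a reference set (p, d) and
    pooling the elements along the plan."""
    cost = ((elements[:, None, :] - reference[None, :, :]) ** 2).sum(-1)
    P = sinkhorn_plan(cost, reg)                # (m, p)
    return (P.T @ elements).reshape(-1)         # fixed length p * d, regardless of m
```

Sets of different sizes then map to vectors of the same length; per the entry above, the full method additionally embeds the elements and learns the reference end-to-end rather than keeping it fixed.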
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.