Contrastive Learning as Kernel Approximation
- URL: http://arxiv.org/abs/2309.02651v1
- Date: Wed, 6 Sep 2023 01:25:30 GMT
- Title: Contrastive Learning as Kernel Approximation
- Authors: Konstantinos Christopher Tsiolis
- Abstract summary: This thesis provides an overview of the current theoretical understanding of contrastive learning.
We highlight popular contrastive loss functions whose minimizers implicitly approximate a positive semidefinite (PSD) kernel.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In standard supervised machine learning, it is necessary to provide a label
for every input in the data. While raw data in many application domains is
easily obtainable on the Internet, manual labelling of this data is
prohibitively expensive. To circumvent this issue, contrastive learning methods
produce low-dimensional vector representations (also called features) of
high-dimensional inputs on large unlabelled datasets. This is done by training
with a contrastive loss function, which enforces that similar inputs have high
inner product and dissimilar inputs have low inner product in the feature
space. Rather than annotating each input individually, it suffices to define a
means of sampling pairs of similar and dissimilar inputs. Contrastive features
can then be fed as inputs to supervised learning systems on much smaller
labelled datasets to obtain high accuracy on end tasks of interest.
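To make this concrete, below is a minimal sketch of one popular contrastive loss of this kind: an InfoNCE-style objective over feature vectors. The temperature value and the one-positive-per-anchor batch layout are illustrative assumptions, not details fixed by the thesis.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE-style contrastive loss (one common instance of the losses
    described above; the hyperparameters here are illustrative).

    anchors, positives: (n, d) arrays of feature vectors, where
    (anchors[i], positives[i]) is a similar pair and every other row of
    `positives` serves as a dissimilar (negative) example for anchor i.
    """
    # Normalize so that inner products are cosine similarities.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)

    # logits[i, j] = scaled inner product between anchor i and candidate j.
    logits = a @ p.T / temperature

    # Cross-entropy that pushes logits[i, i] (the similar pair) up and
    # logits[i, j], j != i (dissimilar pairs) down.
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

In the analyses the thesis surveys, minimizers f of losses of this type implicitly approximate a PSD kernel through the inner products f(x)^T f(x').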
The goal of this thesis is to provide an overview of the current theoretical
understanding of contrastive learning, specifically as it pertains to the
minimizers of contrastive loss functions and their relationship to prior
methods for learning features from unlabelled data. We highlight popular
contrastive loss functions whose minimizers implicitly approximate a positive
semidefinite (PSD) kernel. The latter is a well-studied object in functional
analysis and learning theory that formalizes a notion of similarity between
elements of a space. PSD kernels provide an implicit definition of features
through the theory of reproducing kernel Hilbert spaces.
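For reference, the kernel terminology can be unpacked with two textbook facts (standard definitions, not specific to this thesis):

```latex
% A symmetric kernel k : X \times X \to \mathbb{R} is positive
% semidefinite (PSD) if every Gram matrix it builds is PSD:
\forall n,\; \forall x_1, \dots, x_n \in X,\; \forall c \in \mathbb{R}^n:
\quad \sum_{i=1}^{n} \sum_{j=1}^{n} c_i\, c_j\, k(x_i, x_j) \;\ge\; 0.
% The reproducing kernel Hilbert space (RKHS) \mathcal{H}_k supplies the
% implicit feature map \phi(x) = k(x, \cdot), which satisfies
\langle \phi(x), \phi(x') \rangle_{\mathcal{H}_k} \;=\; k(x, x'),
% so a learned feature map f with f(x)^\top f(x') \approx k(x, x') can be
% read as a finite-dimensional approximation of these implicit features.
```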
Related papers
- Probabilistic Contrastive Learning for Long-Tailed Visual Recognition [78.70453964041718]
Long-tailed distributions frequently emerge in real-world data, where a large number of minority categories contain only a few samples each.
Recent investigations have revealed that supervised contrastive learning shows promise in alleviating this data imbalance.
We propose a novel probabilistic contrastive (ProCo) learning algorithm that estimates the data distribution of the samples from each class in the feature space.
arXiv Detail & Related papers (2024-03-11T13:44:49Z)
- Deep Metric Learning for Computer Vision: A Brief Overview [4.980117530293724]
Objective functions that optimize deep neural networks play a vital role in creating an enhanced feature representation of the input data.
Deep Metric Learning seeks to develop methods that measure the similarity between data samples (a generic triplet-loss sketch appears after this list).
We will provide an overview of recent progress in this area and discuss state-of-the-art Deep Metric Learning approaches.
arXiv Detail & Related papers (2023-12-01T21:53:36Z)
- Joint Embedding Self-Supervised Learning in the Kernel Regime [21.80241600638596]
Self-supervised learning (SSL) produces useful representations of data without access to any labels for classifying the data.
We extend this framework to incorporate algorithms based on kernel methods where embeddings are constructed by linear maps acting on the feature space of a kernel.
We analyze our kernel model on small datasets to identify common features of self-supervised learning algorithms and gain theoretical insights into their performance on downstream tasks.
arXiv Detail & Related papers (2022-09-29T15:53:19Z)
- Non-contrastive representation learning for intervals from well logs [58.70164460091879]
The representation learning problem in the oil & gas industry aims to construct a model that produces a representation of a well interval from its logging data.
One possible approach is self-supervised learning (SSL).
We are the first to introduce non-contrastive SSL for well-logging data.
arXiv Detail & Related papers (2022-09-28T13:27:10Z)
- A Survey of Learning on Small Data: Generalization, Optimization, and Challenge [101.27154181792567]
Learning from small data while approximating the generalization ability of big data is one of the ultimate purposes of AI.
This survey follows the active sampling theory under a PAC framework to analyze the generalization error and label complexity of learning on small data.
Multiple data applications that may benefit from efficient small data representation are surveyed.
arXiv Detail & Related papers (2022-07-29T02:34:19Z)
- Positive-Negative Equal Contrastive Loss for Semantic Segmentation [8.664491798389662]
Previous works commonly design plug-and-play modules and structural losses to effectively extract and aggregate the global context.
We propose Positive-Negative Equal contrastive loss (PNE loss), which increases the latent impact of positive embedding on the anchor and treats the positive as well as negative sample pairs equally.
We conduct comprehensive experiments and achieve state-of-the-art performance on two benchmark datasets.
arXiv Detail & Related papers (2022-07-04T13:51:29Z)
- Learning sparse features can lead to overfitting in neural networks [9.2104922520782]
We show that feature learning can perform worse than lazy training.
Although sparsity is known to be essential for learning anisotropic data, it is detrimental when the target function is constant or smooth.
arXiv Detail & Related papers (2022-06-24T14:26:33Z)
- Combining Feature and Instance Attribution to Detect Artifacts [62.63504976810927]
We propose methods to facilitate identification of training data artifacts.
We show that this proposed training-feature attribution approach can be used to uncover artifacts in training data.
We conduct a small user study to evaluate whether these methods are useful to NLP researchers in practice.
arXiv Detail & Related papers (2021-07-01T09:26:13Z)
- Contrastive learning of strong-mixing continuous-time stochastic processes [53.82893653745542]
Contrastive learning is a family of self-supervised methods where a model is trained to solve a classification task constructed from unlabeled data.
We show that a properly constructed contrastive learning task can be used to estimate the transition kernel for small-to-mid-range intervals in the diffusion case.
arXiv Detail & Related papers (2021-03-03T23:06:47Z)
- Complex-valued embeddings of generic proximity data [0.6117371161379209]
Proximities are at the heart of almost all machine learning methods.
We propose a complex-valued vector embedding of proximity data (see the eigendecomposition sketch after this list).
The complex-valued data can serve as an input to complex-valued machine learning algorithms.
arXiv Detail & Related papers (2020-08-31T09:40:30Z)
- Multi-scale Interactive Network for Salient Object Detection [91.43066633305662]
We propose the aggregate interaction modules to integrate the features from adjacent levels.
To obtain more efficient multi-scale features, the self-interaction modules are embedded in each decoder unit.
Experimental results on five benchmark datasets demonstrate that the proposed method without any post-processing performs favorably against 23 state-of-the-art approaches.
arXiv Detail & Related papers (2020-07-17T15:41:37Z)
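As a companion to the Deep Metric Learning entry above, here is a minimal triplet-loss sketch. The triplet loss is one classic metric-learning objective; the margin value and the use of Euclidean distance are illustrative choices, not necessarily what the surveyed methods use.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Classic triplet loss: pull each positive within `margin` of its
    anchor relative to the corresponding negative, in Euclidean distance.
    All inputs are (n, d) arrays; `margin` is illustrative."""
    d_pos = np.linalg.norm(anchor - positive, axis=1)
    d_neg = np.linalg.norm(anchor - negative, axis=1)
    return np.mean(np.maximum(0.0, d_pos - d_neg + margin))
```

Unlike the inner-product contrastive losses above, this objective shapes a distance rather than a similarity, which is the characteristic framing of metric learning.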
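For the complex-valued embeddings entry, here is a generic eigendecomposition construction for symmetric but indefinite (non-PSD) similarity matrices. This is a sketch of the standard recipe, not necessarily the cited paper's exact procedure; it contrasts with the PSD kernels in the main abstract, for which no imaginary coordinates are needed.

```python
import numpy as np

def complex_embedding(S):
    """Embed a symmetric similarity matrix S (possibly indefinite,
    i.e. not PSD) into complex vectors z_i such that the bilinear
    form z_i^T z_j reproduces S[i, j] exactly.

    Generic construction (our sketch, not necessarily the cited
    paper's procedure): negative eigenvalues receive imaginary
    coordinates via the complex square root.
    """
    eigvals, eigvecs = np.linalg.eigh(S)            # S = V diag(w) V^T
    z = eigvecs * np.sqrt(eigvals.astype(complex))  # row i is embedding z_i
    return z

# Quick check on a small indefinite matrix (eigenvalues 3 and -1):
S = np.array([[1.0, 2.0], [2.0, 1.0]])
Z = complex_embedding(S)
assert np.allclose(Z @ Z.T, S)  # plain transpose (bilinear), not Hermitian
```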