Contrastive Learning as Kernel Approximation
- URL: http://arxiv.org/abs/2309.02651v1
- Date: Wed, 6 Sep 2023 01:25:30 GMT
- Title: Contrastive Learning as Kernel Approximation
- Authors: Konstantinos Christopher Tsiolis
- Abstract summary: This thesis provides an overview of the current theoretical understanding of contrastive learning.
We highlight popular contrastive loss functions whose minimizers implicitly approximate a positive semidefinite (PSD) kernel.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In standard supervised machine learning, it is necessary to provide a label
for every input in the data. While raw data in many application domains is
easily obtainable on the Internet, manual labelling of this data is
prohibitively expensive. To circumvent this issue, contrastive learning methods
produce low-dimensional vector representations (also called features) of
high-dimensional inputs on large unlabelled datasets. This is done by training
with a contrastive loss function, which enforces that similar inputs have high
inner product and dissimilar inputs have low inner product in the feature
space. Rather than annotating each input individually, it suffices to define a
means of sampling pairs of similar and dissimilar inputs. Contrastive features
can then be fed as inputs to supervised learning systems on much smaller
labelled datasets to obtain high accuracy on end tasks of interest.
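To make this concrete, below is a minimal sketch of one popular contrastive loss of this kind: an InfoNCE-style objective over feature vectors. The temperature value and the one-positive-per-anchor batch layout are illustrative assumptions, not details fixed by the thesis.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE-style contrastive loss (one common instance of the losses
    described above; the hyperparameters here are illustrative).

    anchors, positives: (n, d) arrays of feature vectors, where
    (anchors[i], positives[i]) is a similar pair and every other row of
    `positives` serves as a dissimilar (negative) example for anchor i.
    """
    # Normalize so that inner products are cosine similarities.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)

    # logits[i, j] = scaled inner product between anchor i and candidate j.
    logits = a @ p.T / temperature

    # Cross-entropy that pushes logits[i, i] (the similar pair) up and
    # logits[i, j], j != i (dissimilar pairs) down.
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

In the analyses the thesis surveys, minimizers f of losses of this type implicitly approximate a PSD kernel through the inner products f(x)^T f(x').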
The goal of this thesis is to provide an overview of the current theoretical
understanding of contrastive learning, specifically as it pertains to the
minimizers of contrastive loss functions and their relationship to prior
methods for learning features from unlabelled data. We highlight popular
contrastive loss functions whose minimizers implicitly approximate a positive
semidefinite (PSD) kernel. The latter is a well-studied object in functional
analysis and learning theory that formalizes a notion of similarity between
elements of a space. PSD kernels provide an implicit definition of features
through the theory of reproducing kernel Hilbert spaces.
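For reference, the kernel terminology can be unpacked with two textbook facts (standard definitions, not specific to this thesis):

```latex
% A symmetric kernel k : X \times X \to \mathbb{R} is positive
% semidefinite (PSD) if every Gram matrix it builds is PSD:
\forall n,\; \forall x_1, \dots, x_n \in X,\; \forall c \in \mathbb{R}^n:
\quad \sum_{i=1}^{n} \sum_{j=1}^{n} c_i\, c_j\, k(x_i, x_j) \;\ge\; 0.
% The reproducing kernel Hilbert space (RKHS) \mathcal{H}_k supplies the
% implicit feature map \phi(x) = k(x, \cdot), which satisfies
\langle \phi(x), \phi(x') \rangle_{\mathcal{H}_k} \;=\; k(x, x'),
% so a learned feature map f with f(x)^\top f(x') \approx k(x, x') can be
% read as a finite-dimensional approximation of these implicit features.
```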
Related papers
- Probabilistic Contrastive Learning for Long-Tailed Visual Recognition [78.70453964041718]
Long-tailed distributions frequently emerge in real-world data, where a large number of minority categories contain only a few samples each.
Recent investigations have revealed that supervised contrastive learning shows promise in alleviating this data imbalance.
We propose a novel probabilistic contrastive (ProCo) learning algorithm that estimates the data distribution of the samples from each class in the feature space.
arXiv Detail & Related papers (2024-03-11T13:44:49Z)
- Deep Metric Learning for Computer Vision: A Brief Overview [4.980117530293724]
Objective functions that optimize deep neural networks play a vital role in creating an enhanced feature representation of the input data.
Deep Metric Learning seeks to develop methods that measure the similarity between data samples (a generic triplet-loss sketch appears after this list).
We will provide an overview of recent progress in this area and discuss state-of-the-art Deep Metric Learning approaches.
arXiv Detail & Related papers (2023-12-01T21:53:36Z)
- Joint Embedding Self-Supervised Learning in the Kernel Regime [21.80241600638596]
Self-supervised learning (SSL) produces useful representations of data without access to any labels for classifying the data.
We extend this framework to incorporate algorithms based on kernel methods where embeddings are constructed by linear maps acting on the feature space of a kernel.
We analyze our kernel model on small datasets to identify common features of self-supervised learning algorithms and gain theoretical insights into their performance on downstream tasks.
arXiv Detail & Related papers (2022-09-29T15:53:19Z)
- Non-contrastive representation learning for intervals from well logs [58.70164460091879]
The representation learning problem in the oil & gas industry aims to construct a model that produces a representation of a well interval from its logging data.
One possible approach is self-supervised learning (SSL).
We are the first to introduce non-contrastive SSL for well-logging data.
arXiv Detail & Related papers (2022-09-28T13:27:10Z)
- A Survey of Learning on Small Data: Generalization, Optimization, and Challenge [101.27154181792567]
Learning from small data while approximating the generalization ability of big data is one of the ultimate purposes of AI.
This survey follows the active sampling theory under a PAC framework to analyze the generalization error and label complexity of learning on small data.
Multiple data applications that may benefit from efficient small data representation are surveyed.
arXiv Detail & Related papers (2022-07-29T02:34:19Z)
- Positive-Negative Equal Contrastive Loss for Semantic Segmentation [8.664491798389662]
Previous works commonly design plug-and-play modules and structural losses to effectively extract and aggregate the global context.
We propose Positive-Negative Equal contrastive loss (PNE loss), which increases the latent impact of positive embedding on the anchor and treats the positive as well as negative sample pairs equally.
We conduct comprehensive experiments and achieve state-of-the-art performance on two benchmark datasets.
arXiv Detail & Related papers (2022-07-04T13:51:29Z)
- Learning sparse features can lead to overfitting in neural networks [9.2104922520782]
We show that feature learning can perform worse than lazy training.
Although sparsity is known to be essential for learning anisotropic data, it is detrimental when the target function is constant or smooth.
arXiv Detail & Related papers (2022-06-24T14:26:33Z)
- Combining Feature and Instance Attribution to Detect Artifacts [62.63504976810927]
We propose methods to facilitate identification of training data artifacts.
We show that this proposed training-feature attribution approach can be used to uncover artifacts in training data.
We conduct a small user study to evaluate whether these methods are useful to NLP researchers in practice.
arXiv Detail & Related papers (2021-07-01T09:26:13Z)
- Contrastive learning of strong-mixing continuous-time stochastic processes [53.82893653745542]
Contrastive learning is a family of self-supervised methods where a model is trained to solve a classification task constructed from unlabeled data.
We show that a properly constructed contrastive learning task can be used to estimate the transition kernel for small-to-mid-range intervals in the diffusion case.
arXiv Detail & Related papers (2021-03-03T23:06:47Z)
- Complex-valued embeddings of generic proximity data [0.6117371161379209]
Proximities are at the heart of almost all machine learning methods.
We propose a complex-valued vector embedding of proximity data (see the eigendecomposition sketch after this list).
The complex-valued data can serve as an input to complex-valued machine learning algorithms.
arXiv Detail & Related papers (2020-08-31T09:40:30Z)
- Multi-scale Interactive Network for Salient Object Detection [91.43066633305662]
We propose the aggregate interaction modules to integrate the features from adjacent levels.
To obtain more efficient multi-scale features, the self-interaction modules are embedded in each decoder unit.
Experimental results on five benchmark datasets demonstrate that the proposed method without any post-processing performs favorably against 23 state-of-the-art approaches.
arXiv Detail & Related papers (2020-07-17T15:41:37Z)
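As a companion to the Deep Metric Learning entry above, here is a minimal triplet-loss sketch. The triplet loss is one classic metric-learning objective; the margin value and the use of Euclidean distance are illustrative choices, not necessarily what the surveyed methods use.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Classic triplet loss: pull each positive within `margin` of its
    anchor relative to the corresponding negative, in Euclidean distance.
    All inputs are (n, d) arrays; `margin` is illustrative."""
    d_pos = np.linalg.norm(anchor - positive, axis=1)
    d_neg = np.linalg.norm(anchor - negative, axis=1)
    return np.mean(np.maximum(0.0, d_pos - d_neg + margin))
```

Unlike the inner-product contrastive losses above, this objective shapes a distance rather than a similarity, which is the characteristic framing of metric learning.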
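For the complex-valued embeddings entry, here is a generic eigendecomposition construction for symmetric but indefinite (non-PSD) similarity matrices. This is a sketch of the standard recipe, not necessarily the cited paper's exact procedure; it contrasts with the PSD kernels in the main abstract, for which no imaginary coordinates are needed.

```python
import numpy as np

def complex_embedding(S):
    """Embed a symmetric similarity matrix S (possibly indefinite,
    i.e. not PSD) into complex vectors z_i such that the bilinear
    form z_i^T z_j reproduces S[i, j] exactly.

    Generic construction (our sketch, not necessarily the cited
    paper's procedure): negative eigenvalues receive imaginary
    coordinates via the complex square root.
    """
    eigvals, eigvecs = np.linalg.eigh(S)            # S = V diag(w) V^T
    z = eigvecs * np.sqrt(eigvals.astype(complex))  # row i is embedding z_i
    return z

# Quick check on a small indefinite matrix (eigenvalues 3 and -1):
S = np.array([[1.0, 2.0], [2.0, 1.0]])
Z = complex_embedding(S)
assert np.allclose(Z @ Z.T, S)  # plain transpose (bilinear), not Hermitian
```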