Related papers: Tight PAC-Bayesian Risk Certificates for Contrastive Learning

URL: http://arxiv.org/abs/2412.03486v2
Date: Thu, 05 Dec 2024 09:26:26 GMT
Title: Tight PAC-Bayesian Risk Certificates for Contrastive Learning
Authors: Anna Van Elst, Debarghya Ghoshdastidar
Abstract summary: We develop non-vacuous PAC-Bayesian risk certificates for contrastive representation learning. We incorporate SimCLR-specific factors, including data augmentation and temperature scaling, and derive risk certificates for the contrastive zero-one risk.
Score: 6.944372188747803
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Contrastive representation learning is a modern paradigm for learning representations of unlabeled data via augmentations -- precisely, contrastive models learn to embed semantically similar pairs of samples (positive pairs) closer than independently drawn samples (negative samples). In spite of its empirical success and widespread use in foundation models, statistical theory for contrastive learning remains less explored. Recent works have developed generalization error bounds for contrastive losses, but the resulting risk certificates are either vacuous (certificates based on Rademacher complexity or $f$-divergence) or require strong assumptions about samples that are unreasonable in practice. The present paper develops non-vacuous PAC-Bayesian risk certificates for contrastive representation learning, accounting for the practical aspects of the popular SimCLR framework. Notably, we take into account that SimCLR reuses positive pairs of augmented data as negative samples for other data, thereby inducing strong dependence and making classical PAC or PAC-Bayesian bounds inapplicable. We further refine existing bounds on the downstream classification loss by incorporating SimCLR-specific factors, including data augmentation and temperature scaling, and derive risk certificates for the contrastive zero-one risk. The resulting bounds for contrastive loss and downstream prediction are much tighter than those of previous risk certificates, as demonstrated by experiments on CIFAR-10.
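To make concrete how SimCLR reuses augmented views as negatives, here is a minimal sketch of a SimCLR-style NT-Xent (normalized temperature-scaled cross-entropy) loss in plain Python. It assumes cosine similarity and the standard in-batch negative scheme; the names `nt_xent`, `cosine`, and the parameter `tau` are illustrative choices, not taken from the paper, whose exact loss and risk definitions may differ.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def nt_xent(z1: List[Vector], z2: List[Vector], tau: float = 0.5) -> float:
    """SimCLR-style NT-Xent loss over n positive pairs (z1[i], z2[i]).

    The 2n embeddings are pooled. For each anchor, its partner view is the
    positive, and the remaining 2n - 2 embeddings (augmented views of the
    other samples) serve as negatives. Every view therefore appears both as
    a positive and as a negative, which creates dependence across terms.
    """
    if len(z1) != len(z2) or not z1:
        raise ValueError("z1 and z2 must be non-empty and of equal length")
    n = len(z1)
    z = list(z1) + list(z2)  # indices 0..n-1: first view, n..2n-1: second view
    total = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)  # index of the positive partner of anchor i
        # Temperature-scaled similarities to every embedding except the anchor itself.
        logits = [cosine(z[i], z[k]) / tau for k in range(2 * n) if k != i]
        pos = cosine(z[i], z[j]) / tau
        # -log softmax of the positive, computed with a stable log-sum-exp.
        m = max(logits)
        lse = m + math.log(sum(math.exp(x - m) for x in logits))
        total += lse - pos
    return total / (2 * n)


if __name__ == "__main__":
    z1 = [[1.0, 0.0], [0.0, 1.0]]
    z2 = [[0.9, 0.1], [0.1, 0.9]]
    print("aligned pairs:", nt_xent(z1, z2))
    print("mismatched pairs:", nt_xent(z1, list(reversed(z2))))
```

Lowering `tau` sharpens the softmax, so hard negatives (views of other samples that lie close to the anchor) dominate the loss; this is one way temperature scaling enters the analysis.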

Related papers

A Statistical Theory of Contrastive Learning via Approximate Sufficient Statistics [19.24473530318175]
We develop a new theoretical framework for analyzing data augmentation-based contrastive learning. We show that minimizing SimCLR and other contrastive losses yields encoders that are approximately sufficient.
arXiv Detail & Related papers (2025-03-21T21:07:18Z)
Risk and cross validation in ridge regression with correlated samples [72.59731158970894]
We provide sharp asymptotics for the in- and out-of-sample risks of ridge regression when the data points have arbitrary correlations. We further extend our analysis to the case where the test point has non-trivial correlations with the training set, a setting often encountered in time series forecasting. We validate our theory across a variety of high-dimensional data.
arXiv Detail & Related papers (2024-08-08T17:27:29Z)
Combining T-learning and DR-learning: a framework for oracle-efficient estimation of causal contrasts [1.0896141997814233]
We introduce efficient plug-in (EP) learning, a novel framework for the estimation of heterogeneous causal contrasts. EP-learners of the conditional average treatment effect and the conditional relative risk outperform state-of-the-art competitors.
arXiv Detail & Related papers (2024-02-03T00:47:50Z)
Learning with Complementary Labels Revisited: The Selected-Completely-at-Random Setting Is More Practical [66.57396042747706]
Complementary-label learning is a weakly supervised learning problem. We propose a consistent approach that does not rely on the uniform distribution assumption. We find that complementary-label learning can be expressed as a set of negative-unlabeled binary classification problems.
arXiv Detail & Related papers (2023-11-27T02:59:17Z)
A Generalized Unbiased Risk Estimator for Learning with Augmented Classes [70.20752731393938]
Given unlabeled data, an unbiased risk estimator (URE) can be derived, which can be minimized for learning with augmented classes (LAC) with theoretical guarantees. We propose a generalized URE that can be equipped with arbitrary loss functions while maintaining the theoretical guarantees.
arXiv Detail & Related papers (2023-06-12T06:52:04Z)
Safe Deployment for Counterfactual Learning to Rank with Exposure-Based Risk Minimization [63.93275508300137]
We introduce a novel risk-aware counterfactual learning to rank method with theoretical guarantees for safe deployment. Our experimental results demonstrate the efficacy of the proposed method, which avoids initial periods of poor performance when little data is available.
arXiv Detail & Related papers (2023-04-26T15:54:23Z)
Self-Certifying Classification by Linearized Deep Assignment [65.0100925582087]
We propose a novel class of deep predictors for classifying metric data on graphs within the PAC-Bayes risk certification paradigm. Building on the recent PAC-Bayes literature and data-dependent priors, this approach enables learning posterior distributions on the hypothesis space.
arXiv Detail & Related papers (2022-01-26T19:59:14Z)
Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning [57.88785630755165]
Empirical risk minimization (ERM) is the workhorse of machine learning, but its model-agnostic guarantees can fail when we use adaptively collected data. We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimize the average of a loss function over a hypothesis class. For policy learning, we provide rate-optimal regret guarantees that close an open gap in the existing literature whenever exploration decays to zero.
arXiv Detail & Related papers (2021-06-03T09:50:13Z)
Contrastive Attraction and Contrastive Repulsion for Representation Learning [131.72147978462348]
Contrastive learning (CL) methods learn data representations in a self-supervised manner, where the encoder contrasts each positive sample against multiple negative samples. Recent CL methods have achieved promising results when pretrained on large-scale datasets, such as ImageNet. We propose a doubly CL strategy that separately compares positive and negative samples within their own groups, and then proceeds with a contrast between positive and negative groups.
arXiv Detail & Related papers (2021-05-08T17:25:08Z)
SENTINEL: Taming Uncertainty with Ensemble-based Distributional Reinforcement Learning [6.587644069410234]
We consider risk-sensitive sequential decision-making in model-based reinforcement learning (RL). We introduce a novel quantification of risk, namely composite risk. We experimentally verify that SENTINEL-K estimates the return distribution better and, when used with the composite risk estimate, demonstrates better risk-sensitive performance than competing RL algorithms.
arXiv Detail & Related papers (2021-02-22T14:45:39Z)
Learning from Similarity-Confidence Data [94.94650350944377]
We investigate a novel weakly supervised learning problem of learning from similarity-confidence (Sconf) data. We propose an unbiased estimator of the classification risk that can be calculated from only Sconf data and show that the estimation error bound achieves the optimal convergence rate.
arXiv Detail & Related papers (2021-02-13T07:31:16Z)

This list is automatically generated from the titles and abstracts of the papers on this site.