Probabilistic Contrastive Learning for Domain Adaptation
- URL: http://arxiv.org/abs/2111.06021v6
- Date: Sat, 8 Jun 2024 09:03:59 GMT
- Title: Probabilistic Contrastive Learning for Domain Adaptation
- Authors: Junjie Li, Yixin Zhang, Zilei Wang, Saihui Hou, Keyu Tu, Man Zhang
- Abstract summary: Contrastive learning has shown impressive success in enhancing feature discriminability for various visual tasks in a self-supervised manner.
The standard contrastive paradigm (features + $\ell_2$ normalization) has limited benefits when applied to domain adaptation.
We propose Probabilistic Contrastive Learning (PCL), which moves beyond the standard paradigm by removing $\ell_2$ normalization and replacing the features with probabilities.
PCL can guide the probability distribution towards a one-hot configuration, thus minimizing the discrepancy between features and class weights.
- Score: 42.33633916857581
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Contrastive learning has shown impressive success in enhancing feature discriminability for various visual tasks in a self-supervised manner, but the standard contrastive paradigm (features + $\ell_{2}$ normalization) has limited benefits when applied in domain adaptation. We find that this is mainly because the class weights (weights of the final fully connected layer) are ignored in the domain adaptation optimization process, which makes it difficult for features to cluster around the corresponding class weights. To solve this problem, we propose the \emph{simple but powerful} Probabilistic Contrastive Learning (PCL), which moves beyond the standard paradigm by removing $\ell_{2}$ normalization and replacing the features with probabilities. PCL can guide the probability distribution towards a one-hot configuration, thus minimizing the discrepancy between features and class weights. We conduct extensive experiments to validate the effectiveness of PCL and observe consistent performance gains on five tasks, i.e., Unsupervised/Semi-Supervised Domain Adaptation (UDA/SSDA), Semi-Supervised Learning (SSL), UDA Detection and Semantic Segmentation. Notably, for UDA Semantic Segmentation on SYNTHIA, PCL surpasses the sophisticated CPSL-D by $>\!2\%$ in terms of mean IoU with a much lower training cost (PCL: 1*3090, 5 days vs. CPSL-D: 4*V100, 11 days). Code is available at https://github.com/ljjcoder/Probabilistic-Contrastive-Learning.
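To make the core idea concrete, here is a minimal PyTorch sketch contrasting the standard paradigm with a PCL-style variant. It is an illustration of the abstract's description only, not the authors' implementation; the batch construction, temperature, and positive/negative selection are assumptions.

```python
import torch
import torch.nn.functional as F

def standard_contrastive(feat_q, feat_k, temperature=0.1):
    """Standard paradigm: l2-normalized features + InfoNCE."""
    q = F.normalize(feat_q, dim=1)
    k = F.normalize(feat_k, dim=1)
    logits = q @ k.t() / temperature                   # (B, B) similarities
    labels = torch.arange(q.size(0), device=q.device)  # positives on the diagonal
    return F.cross_entropy(logits, labels)

def probabilistic_contrastive(logits_q, logits_k, temperature=0.1):
    """PCL-style variant (sketch): drop l2 normalization and contrast
    softmax probabilities instead of features."""
    p_q = F.softmax(logits_q, dim=1)                   # (B, C) class probabilities
    p_k = F.softmax(logits_k, dim=1)
    sim = p_q @ p_k.t() / temperature                  # (B, B)
    labels = torch.arange(p_q.size(0), device=p_q.device)
    return F.cross_entropy(sim, labels)

# Toy usage: classifier outputs for two augmented views of a batch.
B, C = 8, 10
loss = probabilistic_contrastive(torch.randn(B, C), torch.randn(B, C))
```

Because the inner product of two probability vectors is maximized when both are the same one-hot vector, pulling paired views together in probability space also pushes each prediction toward a one-hot configuration, which is the behavior the abstract describes.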
Related papers
- Proto-EVFL: Enhanced Vertical Federated Learning via Dual Prototype with Extremely Unaligned Data [28.626677790020082]
In vertical federated learning (VFL), unaligned samples across different parties can be extremely class-imbalanced. We propose Proto-EVFL, an enhanced VFL framework via dual prototypes. We prove that Proto-EVFL, as the first bi-level optimization framework in VFL, has a convergence rate of $1/\sqrt{T}$.
arXiv Detail & Related papers (2025-07-30T08:48:33Z) - MaskPro: Linear-Space Probabilistic Learning for Strict (N:M)-Sparsity on Large Language Models [53.36415620647177]
Semi-structured sparsity offers a promising solution by strategically retaining $N$ elements out of every $M$ weights. Existing (N:M)-compatible approaches typically fall into two categories: rule-based layerwise greedy search, which suffers from considerable errors, and gradient-driven learning, which incurs prohibitive training costs. We propose a novel linear-space probabilistic framework named MaskPro, which aims to learn a prior categorical distribution for every $M$ consecutive weights and subsequently leverages this distribution to generate the (N:M)-sparsity through an $N$-way sampling.
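As a rough illustration of the sampling step this abstract describes, the hedged sketch below draws $N$ of every $M$ weights from a learnable categorical distribution; the parameterization and the estimator needed to train the logits are assumptions, not the paper's.

```python
import torch

def sample_nm_mask(logits, n):
    """For each group of M weights, draw n positions without replacement
    from a categorical distribution and keep only those weights.
    (Gradients do not flow through multinomial; learning the logits
    would need e.g. a policy-gradient estimator in practice.)"""
    probs = torch.softmax(logits, dim=-1)              # (num_groups, M)
    idx = torch.multinomial(probs, num_samples=n)      # n draws per group
    mask = torch.zeros_like(probs)
    mask.scatter_(1, idx, 1.0)                         # 1 marks retained weights
    return mask

# Toy usage: 2:4 sparsity on a weight matrix reshaped into groups of M=4.
M, N = 4, 2
w = torch.randn(16, 16)
groups = w.view(-1, M)                                 # (num_groups, 4)
logits = torch.zeros_like(groups)                      # the learnable prior
sparse_w = (groups * sample_nm_mask(logits, N)).view_as(w)
```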
arXiv Detail & Related papers (2025-06-15T15:02:59Z) - The Last Mile to Supervised Performance: Semi-Supervised Domain Adaptation for Semantic Segmentation [51.77968964691317]
We study the promising setting of Semi-Supervised Domain Adaptation (SSDA).
We propose a simple SSDA framework that combines consistency regularization, pixel contrastive learning, and self-training to effectively utilize a few target-domain labels.
Our method outperforms prior art in the popular GTA-to-Cityscapes benchmark and shows that as little as 50 target labels can suffice to achieve near-supervised performance.
arXiv Detail & Related papers (2024-11-27T20:07:42Z) - Dynamic Label Injection for Imbalanced Industrial Defect Segmentation [42.841736467487785]
We propose a Dynamic Label Injection (DLI) algorithm to impose a uniform distribution in the input batch.
Our algorithm computes the current batch defect distribution and re-balances it by transferring defects using a combination of Poisson-based seamless image cloning and cut-paste techniques.
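A hedged sketch of this re-balancing step follows, assuming segmentation masks with integer defect classes; it implements only the plain cut-paste path and merely notes where Poisson-based seamless cloning (e.g., OpenCV's cv2.seamlessClone) would slot in. Function names and shapes are illustrative.

```python
import numpy as np

def batch_defect_histogram(masks, num_classes):
    """Per-class pixel counts over a batch of segmentation masks,
    used to decide which defect classes need re-balancing."""
    return np.array([(masks == c).sum() for c in range(num_classes)])

def cut_paste_defect(src_img, src_mask, dst_img, dst_mask, cls):
    """Copy the pixels of one defect class from a donor sample into a
    target sample (plain paste; Poisson-based seamless cloning, e.g.
    cv2.seamlessClone, could replace the raw copy for smoother edges)."""
    region = src_mask == cls                    # boolean defect region
    out_img, out_mask = dst_img.copy(), dst_mask.copy()
    out_img[region] = src_img[region]
    out_mask[region] = cls
    return out_img, out_mask

# Toy usage: move class-2 defect pixels from sample 0 into sample 1.
imgs = np.random.rand(2, 64, 64, 3)
masks = np.zeros((2, 64, 64), dtype=np.int64)
masks[0, 10:20, 10:20] = 2
print(batch_defect_histogram(masks, num_classes=3))    # heavily imbalanced
imgs[1], masks[1] = cut_paste_defect(imgs[0], masks[0], imgs[1], masks[1], cls=2)
```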
arXiv Detail & Related papers (2024-08-19T14:24:46Z) - Online Learning and Information Exponents: On The Importance of Batch size, and Time/Complexity Tradeoffs [24.305423716384272]
We study the impact of the batch size on the iteration time $T$ of training two-layer neural networks with one-pass stochastic gradient descent (SGD).
We show that performing gradient updates with large batches minimizes the training time without changing the total sample complexity.
We show that one can track the training progress by a system of low-dimensional ordinary differential equations (ODEs).
arXiv Detail & Related papers (2024-06-04T09:44:49Z) - COBias and Debias: Balancing Class Accuracies for Language Models in Inference Time via Nonlinear Integer Programming [12.287692969438169]
This paper investigates a fundamental inference-time problem in language models: imbalanced class accuracies.
We find that the root of the issue is a tendency to over-predict some classes while under-predicting others.
We show it can be effectively mitigated via inference-time optimization.
arXiv Detail & Related papers (2024-05-13T10:30:33Z) - Probabilistic Contrastive Learning for Long-Tailed Visual Recognition [78.70453964041718]
Long-tailed distributions frequently emerge in real-world data, where a large number of minority categories contain a limited number of samples.
Recent investigations have revealed that supervised contrastive learning exhibits promising potential in alleviating the data imbalance.
We propose a novel probabilistic contrastive (ProCo) learning algorithm that estimates the data distribution of the samples from each class in the feature space.
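The following toy sketch illustrates the general idea of estimating a per-class feature distribution; it uses a running diagonal Gaussian purely for illustration, whereas the paper's parametric choice and the loss derived from it differ.

```python
import torch

class PerClassFeatureStats:
    """Running per-class feature statistics (diagonal Gaussian here,
    purely for illustration) from which virtual features of rare
    classes can be sampled."""

    def __init__(self, num_classes, dim, momentum=0.9):
        self.mean = torch.zeros(num_classes, dim)
        self.var = torch.ones(num_classes, dim)
        self.m = momentum

    def update(self, feats, labels):
        for c in labels.unique():
            f = feats[labels == c]
            self.mean[c] = self.m * self.mean[c] + (1 - self.m) * f.mean(0)
            self.var[c] = self.m * self.var[c] + (1 - self.m) * f.var(0, unbiased=False)

    def sample(self, cls, n):
        # Draw n virtual features for class `cls` from the running estimate.
        return self.mean[cls] + self.var[cls].sqrt() * torch.randn(n, self.mean.size(1))

# Toy usage.
stats = PerClassFeatureStats(num_classes=10, dim=128)
stats.update(torch.randn(32, 128), torch.randint(0, 10, (32,)))
virtual = stats.sample(cls=3, n=16)                    # (16, 128)
```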
arXiv Detail & Related papers (2024-03-11T13:44:49Z) - BaCon: Boosting Imbalanced Semi-supervised Learning via Balanced Feature-Level Contrastive Learning [0.9160375060389779]
In Class Imbalanced Semi-supervised Learning (CISSL), the bias introduced by unreliable pseudo-labels can be exacerbated by imbalanced data distributions.
Our method directly regularizes the distribution of instances' representations in a well-designed contrastive manner.
Our method demonstrates its effectiveness through comprehensive experiments on the CIFAR10-LT, CIFAR100-LT, STL10-LT, and SVHN-LT datasets.
arXiv Detail & Related papers (2024-03-04T06:43:16Z) - Three Heads Are Better Than One: Complementary Experts for Long-Tailed Semi-supervised Learning [74.44500692632778]
We propose a novel method named ComPlementary Experts (CPE) to model various class distributions.
CPE achieves state-of-the-art performances on CIFAR-10-LT, CIFAR-100-LT, and STL-10-LT dataset benchmarks.
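As a loose illustration of the multi-expert idea in the title, the sketch below attaches three classifier heads to a shared feature space, each biased toward a different assumed label distribution via additive logit adjustment; this construction is an assumption for illustration, not the paper's training recipe.

```python
import torch
import torch.nn as nn

class ThreeExpertHeads(nn.Module):
    """Shared features feeding three classifier heads, each biased
    toward a different assumed class distribution (empirical prior,
    uniform, inverse prior) via additive logit adjustment."""

    def __init__(self, dim, num_classes, class_counts):
        super().__init__()
        self.heads = nn.ModuleList(nn.Linear(dim, num_classes) for _ in range(3))
        prior = class_counts / class_counts.sum()
        uniform = torch.full_like(prior, 1.0 / num_classes)
        inverse = 1.0 / class_counts
        inverse = inverse / inverse.sum()
        self.register_buffer("adjust", torch.log(torch.stack([prior, uniform, inverse])))

    def forward(self, feats):
        return [head(feats) + adj for head, adj in zip(self.heads, self.adjust)]

# Toy usage on a long-tailed class-count vector.
counts = torch.tensor([500.0, 100.0, 20.0, 5.0])
model = ThreeExpertHeads(dim=64, num_classes=4, class_counts=counts)
logits_per_expert = model(torch.randn(8, 64))          # three (8, 4) tensors
```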
arXiv Detail & Related papers (2023-12-25T11:54:07Z) - Boosting Semi-Supervised Learning by Exploiting All Unlabeled Data [21.6350640726058]
Semi-supervised learning (SSL) has attracted enormous attention due to its vast potential of mitigating the dependence on large labeled datasets.
We propose two novel techniques: Entropy Meaning Loss (EML) and Adaptive Negative Learning (ANL).
We integrate these techniques with FixMatch, and develop a simple yet powerful framework called FullMatch.
arXiv Detail & Related papers (2023-03-20T12:44:11Z) - Balanced Contrastive Learning for Long-Tailed Visual Recognition [32.789465918318925]
Real-world data typically follow a long-tailed distribution, where a few majority categories occupy most of the data.
In this paper, we focus on representation learning for imbalanced data.
We propose a novel loss for balanced contrastive learning (BCL).
arXiv Detail & Related papers (2022-07-19T03:48:59Z) - Positive-Negative Equal Contrastive Loss for Semantic Segmentation [8.664491798389662]
Previous works commonly design plug-and-play modules and structural losses to effectively extract and aggregate the global context.
We propose Positive-Negative Equal contrastive loss (PNE loss), which increases the latent impact of positive embedding on the anchor and treats the positive as well as negative sample pairs equally.
We conduct comprehensive experiments and achieve state-of-the-art performance on two benchmark datasets.
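One plausible reading of "treats the positive as well as negative sample pairs equally" is that every pair contributes its own equally weighted term rather than positives competing against all negatives inside a single softmax; the hedged sketch below encodes that reading with a binary cross-entropy per pair. The paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def pairwise_equal_contrastive(anchor, positives, negatives, temperature=0.1):
    """Every pair gets its own equally weighted binary term: positives
    are pushed toward high similarity, negatives toward low, with no
    softmax competition between them."""
    a = F.normalize(anchor, dim=-1)                    # (D,)
    pos = F.normalize(positives, dim=-1)               # (P, D)
    neg = F.normalize(negatives, dim=-1)               # (N, D)
    pos_sim = pos @ a / temperature                    # (P,)
    neg_sim = neg @ a / temperature                    # (N,)
    pos_loss = F.binary_cross_entropy_with_logits(pos_sim, torch.ones_like(pos_sim))
    neg_loss = F.binary_cross_entropy_with_logits(neg_sim, torch.zeros_like(neg_sim))
    return 0.5 * (pos_loss + neg_loss)

# Toy usage: one anchor embedding, 4 positives, 16 negatives.
loss = pairwise_equal_contrastive(torch.randn(128), torch.randn(4, 128), torch.randn(16, 128))
```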
arXiv Detail & Related papers (2022-07-04T13:51:29Z) - Unsupervised Feature Learning by Cross-Level Instance-Group Discrimination [68.83098015578874]
We integrate between-instance similarity into contrastive learning, not directly by instance grouping, but by cross-level discrimination.
CLD effectively brings unsupervised learning closer to natural data and real-world applications.
CLD sets a new state-of-the-art on self-supervised, semi-supervised, and transfer learning benchmarks, and beats MoCo v2 and SimCLR on every reported metric.
arXiv Detail & Related papers (2020-08-09T21:13:13Z) - Learning Halfspaces with Tsybakov Noise [50.659479930171585]
We study the learnability of halfspaces in the presence of Tsybakov noise.
We give an algorithm that achieves misclassification error $\epsilon$ with respect to the true halfspace.
arXiv Detail & Related papers (2020-06-11T14:25:02Z) - Generalized Zero-Shot Learning Via Over-Complete Distribution [79.5140590952889]
We propose to generate an Over-Complete Distribution (OCD) using a Conditional Variational Autoencoder (CVAE) for both seen and unseen classes.
The effectiveness of the framework is evaluated using both Zero-Shot Learning and Generalized Zero-Shot Learning protocols.
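A minimal sketch of the generation step suggested by this abstract: a CVAE decoder conditioned on class-attribute vectors, sampled repeatedly per class to form an over-complete set of synthetic features for seen and unseen classes. All sizes and names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CVAEDecoder(nn.Module):
    """Decoder of a CVAE conditioned on class-attribute vectors; sampling
    it many times per class yields an over-complete set of synthetic
    features for seen and unseen classes."""

    def __init__(self, latent_dim, attr_dim, feat_dim):
        super().__init__()
        self.latent_dim = latent_dim
        self.net = nn.Sequential(
            nn.Linear(latent_dim + attr_dim, 256), nn.ReLU(),
            nn.Linear(256, feat_dim),
        )

    def sample(self, attrs, n_per_class):
        # Repeat each attribute vector and pair it with fresh latent noise.
        a = attrs.repeat_interleave(n_per_class, dim=0)
        z = torch.randn(a.size(0), self.latent_dim)
        return self.net(torch.cat([z, a], dim=1))

# Toy usage: 50 synthetic features for each of 5 classes.
decoder = CVAEDecoder(latent_dim=32, attr_dim=85, feat_dim=2048)
synthetic = decoder.sample(torch.randn(5, 85), n_per_class=50)   # (250, 2048)
```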
arXiv Detail & Related papers (2020-04-01T19:05:28Z)