CLCE: An Approach to Refining Cross-Entropy and Contrastive Learning for
Optimized Learning Fusion
- URL: http://arxiv.org/abs/2402.14551v1
- Date: Thu, 22 Feb 2024 13:45:01 GMT
- Title: CLCE: An Approach to Refining Cross-Entropy and Contrastive Learning for
Optimized Learning Fusion
- Authors: Zijun Long and George Killick and Lipeng Zhuang and Gerardo
Aragon-Camarasa and Zaiqiao Meng and Richard Mccreadie
- Abstract summary: Cross-Entropy loss (CE) can compromise model generalization and stability.
We introduce a novel approach named CLCE, which integrates Contrastive Learning with CE.
We show that CLCE significantly outperforms CE in Top-1 accuracy across twelve benchmarks.
- Score: 16.00706418526691
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: State-of-the-art pre-trained image models predominantly adopt a two-stage
approach: initial unsupervised pre-training on large-scale datasets followed by
task-specific fine-tuning using Cross-Entropy loss (CE). However, it has been
demonstrated that CE can compromise model generalization and stability. While
recent works employing contrastive learning address some of these limitations
by enhancing the quality of embeddings and producing better decision
boundaries, they often overlook the importance of hard negative mining and rely
on resource-intensive and slow training using large sample batches. To counter
these issues, we introduce a novel approach named CLCE, which integrates
Label-Aware Contrastive Learning with CE. Our approach not only maintains the
strengths of both loss functions but also leverages hard negative mining in a
synergistic way to enhance performance. Experimental results demonstrate that
CLCE significantly outperforms CE in Top-1 accuracy across twelve benchmarks,
achieving gains of up to 3.52% in few-shot learning scenarios and 3.41% in
transfer learning settings with the BEiT-3 model. Importantly, our proposed
CLCE approach effectively mitigates the dependency of contrastive learning on
large batch sizes such as 4096 samples per batch, a limitation that has
previously constrained the application of contrastive learning in
budget-limited hardware environments.
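
The objective described in the abstract can be pictured as a weighted combination of cross-entropy and a label-aware (supervised) contrastive term in which harder negatives contribute more strongly to the denominator. The PyTorch sketch below illustrates that idea only; the mixing coefficient lam, the temperature, the beta re-weighting of hard negatives, and the function name clce_loss are assumptions for illustration, not the authors' published formulation.

```python
# Minimal sketch of a CLCE-style objective: cross-entropy combined with a
# label-aware (supervised) contrastive term that up-weights hard negatives.
# Hyperparameters and the re-weighting scheme are illustrative assumptions.
import torch
import torch.nn.functional as F


def clce_loss(embeddings, logits, labels, temperature=0.1, beta=1.0, lam=0.5):
    """embeddings: (N, D) features; logits: (N, C) classifier outputs; labels: (N,)."""
    # --- cross-entropy term on the classifier head ---
    ce = F.cross_entropy(logits, labels)

    # --- label-aware contrastive term with hard-negative up-weighting ---
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.t() / temperature                       # (N, N) scaled similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)) & ~self_mask
    neg_mask = ~pos_mask & ~self_mask

    exp_sim = torch.exp(sim)
    # Up-weight negatives in proportion to their similarity to the anchor,
    # so harder (more similar) negatives dominate the denominator.
    with torch.no_grad():
        neg_weight = 1.0 + beta * sim.softmax(dim=1)    # illustrative re-weighting
    denom = (exp_sim * pos_mask).sum(1) + (exp_sim * neg_weight * neg_mask).sum(1)

    log_prob = sim - torch.log(denom.clamp_min(1e-12)).unsqueeze(1)
    # Average log-probability over the positives of each anchor that has any.
    pos_count = pos_mask.sum(1)
    has_pos = pos_count > 0
    contrastive = -(log_prob * pos_mask).sum(1)[has_pos] / pos_count[has_pos]
    contrastive = contrastive.mean() if has_pos.any() else sim.new_zeros(())

    return lam * contrastive + (1.0 - lam) * ce
```

In a standard fine-tuning loop, a sketch like this would replace the plain F.cross_entropy call, given a backbone that exposes both pooled features and classification logits for each batch.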
Related papers
- ICL-TSVD: Bridging Theory and Practice in Continual Learning with Pre-trained Models [103.45785408116146]
Continual learning (CL) aims to train a model that can solve multiple tasks presented sequentially.
Recent CL approaches have achieved strong performance by leveraging large pre-trained models that generalize well to downstream tasks.
However, such methods lack theoretical guarantees, making them prone to unexpected failures.
We bridge this gap by integrating an empirically strong approach into a principled framework, designed to prevent forgetting.
arXiv Detail & Related papers (2024-10-01T12:58:37Z) - When hard negative sampling meets supervised contrastive learning [17.173114048398947]
We introduce a new supervised contrastive learning objective, SCHaNe, which incorporates hard negative sampling during the fine-tuning phase.
SCHaNe outperforms the strong baseline BEiT-3 in Top-1 accuracy across various benchmarks.
Our proposed objective sets a new state-of-the-art for base models on ImageNet-1k, achieving an 86.14% accuracy.
arXiv Detail & Related papers (2023-08-28T20:30:10Z) - FakeCLR: Exploring Contrastive Learning for Solving Latent Discontinuity
in Data-Efficient GANs [24.18718734850797]
Data-Efficient GANs (DE-GANs) aim to learn generative models with a limited amount of training data.
Contrastive learning has shown the great potential of increasing the synthesis quality of DE-GANs.
We propose FakeCLR, which only applies contrastive learning on fake samples.
arXiv Detail & Related papers (2022-07-18T14:23:38Z) - Learning with Multiclass AUC: Theory and Algorithms [141.63211412386283]
Area under the ROC curve (AUC) is a well-known ranking metric for problems such as imbalanced learning and recommender systems.
In this paper, we start an early trial to consider the problem of learning multiclass scoring functions via optimizing multiclass AUC metrics.
arXiv Detail & Related papers (2021-07-28T05:18:10Z) - Semi-supervised Contrastive Learning with Similarity Co-calibration [72.38187308270135]
We propose a novel training strategy, termed Semi-supervised Contrastive Learning (SsCL).
SsCL combines the well-known contrastive loss in self-supervised learning with the cross entropy loss in semi-supervised learning.
We show that SsCL produces more discriminative representations and is beneficial to few-shot learning.
arXiv Detail & Related papers (2021-05-16T09:13:56Z) - Solving Inefficiency of Self-supervised Representation Learning [87.30876679780532]
Existing contrastive learning methods suffer from very low learning efficiency.
Under-clustering and over-clustering problems are major obstacles to learning efficiency.
We propose a novel self-supervised learning framework using a median triplet loss.
arXiv Detail & Related papers (2021-04-18T07:47:10Z) - Highly Efficient Knowledge Graph Embedding Learning with Orthogonal
Procrustes Analysis [10.154836127889487]
Knowledge Graph Embeddings (KGEs) have been intensively explored in recent years due to their promise for a wide range of applications.
This paper proposes a simple yet effective KGE framework which can reduce the training time and carbon footprint by orders of magnitude.
arXiv Detail & Related papers (2021-04-10T03:55:45Z) - Few-shot Action Recognition with Prototype-centered Attentive Learning [88.10852114988829]
The Prototype-centered Attentive Learning (PAL) model is composed of two novel components.
First, a prototype-centered contrastive learning loss is introduced to complement the conventional query-centered learning objective.
Second, PAL integrates an attentive hybrid learning mechanism that can minimize the negative impact of outliers.
arXiv Detail & Related papers (2021-01-20T11:48:12Z) - Robust Pre-Training by Adversarial Contrastive Learning [120.33706897927391]
Recent work has shown that, when integrated with adversarial training, self-supervised pre-training can lead to state-of-the-art robustness.
We improve robustness-aware self-supervised pre-training by learning representations consistent under both data augmentations and adversarial perturbations.
arXiv Detail & Related papers (2020-10-26T04:44:43Z)