ACL: Aligned Contrastive Learning Improves BERT and Multi-exit BERT Fine-tuning
- URL: http://arxiv.org/abs/2602.03563v2
- Date: Thu, 12 Feb 2026 02:34:46 GMT
- Title: ACL: Aligned Contrastive Learning Improves BERT and Multi-exit BERT Fine-tuning
- Authors: Liz Li, Wei Zhu
- Abstract summary: We introduce a novel Aligned Contrastive Learning (ACL) framework. ACL-Embed regards label embeddings as extra augmented samples with different labels and employs contrastive learning to align each label embedding with the representations of its class's samples. To facilitate optimizing the ACL-Embed objective together with the CE loss, we propose ACL-Grad, which discards the ACL-Embed term when the two objectives conflict.
- Score: 3.060720241524644
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite its success in self-supervised learning, contrastive learning is less studied in the supervised setting. In this work, we first use a set of pilot experiments to show that in the supervised setting, the cross-entropy (CE) loss objective and the contrastive learning objective often conflict with each other, hindering the application of CL in supervised settings. To resolve this problem, we introduce a novel \underline{A}ligned \underline{C}ontrastive \underline{L}earning (ACL) framework. First, ACL-Embed regards label embeddings as extra augmented samples with different labels and employs contrastive learning to align each label embedding with the representations of its class's samples. Second, to facilitate optimizing the ACL-Embed objective together with the CE loss, we propose ACL-Grad, which discards the ACL-Embed term when the two objectives conflict. To further enhance the performance of the intermediate exits of multi-exit BERT, we propose cross-layer ACL (ACL-CL), in which the teacher exit guides the optimization of the shallower student exits. Extensive experiments on the GLUE benchmark yield the following takeaways: (a) ACL-BERT outperforms or performs comparably with CE and CE+SCL on the GLUE tasks; (b) ACL, especially ACL-CL, significantly surpasses the baseline methods when fine-tuning multi-exit BERT, thus providing better quality-speed tradeoffs for low-latency applications.
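The abstract describes ACL-Embed and ACL-Grad only at a high level. Below is a minimal PyTorch sketch of one plausible reading: `acl_embed_loss` treats one learnable embedding per class as an extra sample of that class inside a supervised contrastive loss, and `acl_grad_step` drops the ACL term when its gradient conflicts (negative dot product) with the CE gradient. The function names, the conflict test, and the exact loss form are our assumptions, not the authors' released code.

```python
# Hypothetical sketch of ACL-Embed and ACL-Grad; names and exact forms are
# assumptions inferred from the abstract, not the paper's released code.
import torch
import torch.nn.functional as F

def acl_embed_loss(features, labels, label_embeddings, temperature=0.1):
    """Supervised contrastive loss over [sample features ; label embeddings].

    features:         (B, d) sample representations
    labels:           (B,)   integer class ids
    label_embeddings: (C, d) one learnable embedding per class, treated as
                      an extra "augmented sample" carrying its own label
    """
    num_classes = label_embeddings.size(0)
    z = F.normalize(torch.cat([features, label_embeddings], dim=0), dim=1)
    y = torch.cat([labels, torch.arange(num_classes, device=labels.device)])

    sim = z @ z.t() / temperature                              # (B+C, B+C)
    self_mask = torch.eye(len(y), dtype=torch.bool, device=y.device)
    sim = sim.masked_fill(self_mask, float("-inf"))            # exclude self-pairs

    pos_mask = (y.unsqueeze(0) == y.unsqueeze(1)) & ~self_mask
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_log_prob = log_prob.masked_fill(~pos_mask, 0.0)
    # Anchors with no positives (label embeddings of classes absent from
    # the batch) contribute zero via the clamp below.
    return -(pos_log_prob.sum(1) / pos_mask.sum(1).clamp(min=1)).mean()

def acl_grad_step(params, ce_loss, acl_loss):
    """One reading of ACL-Grad: keep the ACL-Embed term only when its
    gradient does not conflict with the CE gradient (non-negative dot
    product over all shared parameters)."""
    g_ce = torch.autograd.grad(ce_loss, params, retain_graph=True, allow_unused=True)
    g_acl = torch.autograd.grad(acl_loss, params, retain_graph=True, allow_unused=True)
    dot = sum((a * b).sum() for a, b in zip(g_ce, g_acl)
              if a is not None and b is not None)
    total = ce_loss if dot < 0 else ce_loss + acl_loss
    total.backward()  # an optimizer.step() would follow in a training loop
```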
Related papers
- A Practical Guide to Streaming Continual Learning [53.995807801604506]
Continual Learning (CL) and Streaming Machine Learning (SML) study the ability of agents to learn from a stream of non-stationary data. Despite sharing some similarities, they address different and complementary challenges. We discuss Streaming Continual Learning (SCL), an emerging paradigm providing a unifying solution to real-world problems.
arXiv Detail & Related papers (2026-03-02T10:06:34Z)
- Aligned Contrastive Loss for Long-Tailed Recognition [43.33186901322387]
We propose an Aligned Contrastive Learning (ACL) algorithm to address the long-tailed recognition problem. Our findings indicate that while multi-view training boosts performance, contrastive learning does not consistently enhance model generalization as the number of views increases. Our ACL algorithm is designed to eliminate these problems and demonstrates strong performance across multiple benchmarks.
arXiv Detail & Related papers (2025-06-01T16:19:30Z)
- In-context Continual Learning Assisted by an External Continual Learner [19.382196203113836]
Existing continual learning (CL) methods rely on fine-tuning or adapting large language models (LLMs). We introduce InCA, a novel approach that integrates an external continual learner (ECL) with in-context learning (ICL) to enable scalable CL without catastrophic forgetting (CF).
arXiv Detail & Related papers (2024-12-20T04:44:41Z)
- Fusion Self-supervised Learning for Recommendation [16.02820746003461]
We propose a Fusion Self-supervised Learning framework for recommendation. Specifically, we use high-order information from the GCN process to create contrastive views. To integrate self-supervised signals from various CL objectives, we propose an advanced CL objective (a sketch of the cross-layer-view idea follows this entry).
arXiv Detail & Related papers (2024-07-29T04:30:38Z)
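Reading "high-order information from the GCN process" as contrasting a node's embeddings taken from different propagation layers is our own assumption; `cross_layer_infonce` below is only one illustrative form of such an objective, not the paper's.

```python
# Hypothetical sketch: contrastive views built from different GCN layers.
import torch
import torch.nn.functional as F

def cross_layer_infonce(layer0, layerK, temperature=0.2):
    """layer0, layerK: (N, d) embeddings of the same N nodes taken from a
    shallow and a deeper GCN propagation layer, used as two views."""
    z0 = F.normalize(layer0, dim=1)
    zk = F.normalize(layerK, dim=1)
    logits = z0 @ zk.t() / temperature       # (N, N) view-to-view similarities
    targets = torch.arange(z0.size(0), device=z0.device)
    # Node i's shallow-layer view should match its own deep-layer view.
    return F.cross_entropy(logits, targets)
```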
- RecDCL: Dual Contrastive Learning for Recommendation [65.6236784430981]
We propose a dual contrastive learning recommendation framework, RecDCL. In RecDCL, the feature-wise contrastive learning (FCL) objective is designed to eliminate redundant solutions on user-item positive pairs, while the batch-wise contrastive learning (BCL) objective generates contrastive embeddings on output vectors to enhance the robustness of the representations (an FCL-style sketch follows this entry).
arXiv Detail & Related papers (2024-01-28T11:51:09Z)
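Interpreting "eliminate redundant solutions on user-item positive pairs" as a redundancy-reduction objective over paired embeddings is our assumption; `feature_wise_loss` below sketches that style and may differ from RecDCL's actual FCL term.

```python
# Hypothetical sketch of a feature-wise (FCL-style) objective on user-item
# positive pairs, in the spirit of redundancy reduction; our own reading.
import torch

def feature_wise_loss(user_emb, item_emb, lambd=5e-3):
    """user_emb, item_emb: (B, d) embeddings of B interacting (user, item) pairs."""
    # Standardize each feature dimension over the batch.
    u = (user_emb - user_emb.mean(0)) / (user_emb.std(0) + 1e-6)
    v = (item_emb - item_emb.mean(0)) / (item_emb.std(0) + 1e-6)
    c = (u.t() @ v) / user_emb.size(0)                  # (d, d) cross-correlation
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()      # align paired features
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()  # decorrelate the rest
    return on_diag + lambd * off_diag
```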
- DimCL: Dimensional Contrastive Learning For Improving Self-Supervised Learning [40.25324481491231]
This paper proposes performing contrastive learning along the dimensional direction instead of along the batch direction. DimCL aims to enhance feature diversity and can serve as a regularizer for prior SSL frameworks (see the sketch after this entry).
arXiv Detail & Related papers (2023-09-21T05:12:55Z)
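A minimal sketch of the dimension-direction idea: transpose the usual (batch × dim) feature matrix so that feature dimensions, rather than samples, become the contrasted units. The exact DimCL objective may differ; `dim_contrastive_loss` is an illustrative reading.

```python
# Hypothetical sketch of dimension-direction contrastive learning: contrast
# the d columns of two views' (B, d) feature matrices instead of the B rows.
import torch
import torch.nn.functional as F

def dim_contrastive_loss(z1, z2, temperature=0.1):
    """z1, z2: (B, d) features of two augmented views of the same batch."""
    d1 = F.normalize(z1.t(), dim=1)   # (d, B): one row per feature dimension
    d2 = F.normalize(z2.t(), dim=1)
    logits = d1 @ d2.t() / temperature               # (d, d)
    targets = torch.arange(d1.size(0), device=z1.device)
    # Dimension i in view 1 should match dimension i in view 2.
    return F.cross_entropy(logits, targets)
```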
- Adversarial Training with Complementary Labels: On the Benefit of Gradually Informative Attacks [119.38992029332883]
Adversarial training with imperfect supervision is significant but receives limited attention.
We propose a new learning strategy using gradually informative attacks.
Experiments are conducted to demonstrate the effectiveness of our method on a range of benchmark datasets.
arXiv Detail & Related papers (2022-11-01T04:26:45Z)
- Decoupled Adversarial Contrastive Learning for Self-supervised Adversarial Robustness [69.39073806630583]
Adversarial training (AT) for robust representation learning and self-supervised learning (SSL) for unsupervised representation learning are two active research fields.
We propose a two-stage framework termed Decoupled Adversarial Contrastive Learning (DeACL).
arXiv Detail & Related papers (2022-07-22T06:30:44Z)
- Contrastive Learning with Adversarial Examples [79.39156814887133]
Contrastive learning (CL) is a popular technique for self-supervised learning (SSL) of visual representations.
This paper introduces a new family of adversarial examples for contrastive learning and uses them to define a new adversarial training algorithm for SSL, denoted CLAE (a sketch of the general recipe follows this entry).
arXiv Detail & Related papers (2020-10-22T20:45:10Z)
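A sketch of the general recipe the summary describes: craft a perturbation that increases a contrastive loss, then train on the attacked view as a harder positive. The single FGSM-style step, the SimCLR-style InfoNCE loss, and the [0, 1] image range are our simplifying assumptions, not necessarily CLAE's exact algorithm.

```python
# Hypothetical sketch: adversarial examples for contrastive learning.
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.5):
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)

def adversarial_view(encoder, x1, x2, epsilon=8 / 255):
    """Perturb view x2 (one FGSM step) to maximize the contrastive loss
    against view x1; the result serves as a harder positive in training."""
    x_adv = x2.clone().detach().requires_grad_(True)
    loss = info_nce(encoder(x1), encoder(x_adv))
    grad, = torch.autograd.grad(loss, x_adv)
    # Assumes inputs are images scaled to [0, 1].
    return (x_adv + epsilon * grad.sign()).clamp(0, 1).detach()
```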
- Learning with Multiple Complementary Labels [94.8064553345801]
A complementary label (CL) simply indicates an incorrect class of an example, yet learning with CLs still yields multi-class classifiers. We propose a novel problem setting that allows multiple complementary labels (MCLs) for each example, along with two ways of learning with MCLs (an illustrative loss sketch follows this entry).
arXiv Detail & Related papers (2019-12-30T13:50:51Z)
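To make the setting concrete, here is one simple assumed objective for learning from (multiple) complementary labels: push down the probability mass the model assigns to classes known to be wrong. This is an illustrative baseline, not the paper's proposed estimators.

```python
# Hypothetical sketch of learning with (multiple) complementary labels.
import torch
import torch.nn.functional as F

def complementary_loss(logits, comp_mask):
    """logits:    (B, C) classifier outputs
    comp_mask: (B, C) boolean, True where a class is a complementary
               (known-incorrect) label for that example"""
    probs = F.softmax(logits, dim=1)
    wrong_mass = (probs * comp_mask).sum(1)     # probability on wrong classes
    # -log(1 - p_wrong): drives the mass on complementary labels toward 0.
    return -torch.log1p(-wrong_mass.clamp(max=1 - 1e-6)).mean()
```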