SupCL-Seq: Supervised Contrastive Learning for Downstream Optimized
Sequence Representations
- URL: http://arxiv.org/abs/2109.07424v1
- Date: Wed, 15 Sep 2021 16:51:18 GMT
- Title: SupCL-Seq: Supervised Contrastive Learning for Downstream Optimized
Sequence Representations
- Authors: Hooman Sedghamiz, Shivam Raval, Enrico Santus, Tuka Alhanai, Mohammad
Ghassemi
- Abstract summary: This paper introduces SupCL-Seq, which extends supervised contrastive learning from computer vision to the optimization of sequence representations in NLP.
We show that SupCL-Seq leads to large gains in many sequence classification tasks on the GLUE benchmark compared to a standard BERT-base.
- Score: 4.392337343771302
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While contrastive learning has proven to be an effective training
strategy in computer vision, Natural Language Processing (NLP) has only
recently adopted it as a self-supervised alternative to Masked Language
Modeling (MLM) for improving sequence representations. This paper introduces
SupCL-Seq, which extends supervised contrastive learning from computer vision
to the optimization of sequence representations in NLP. By altering the
dropout mask probability in standard Transformer architectures, we generate
augmented, altered views for every representation (anchor). A supervised
contrastive loss is then used to maximize the system's ability to pull
together similar samples (e.g., anchors and their altered views) and to push
apart samples belonging to other classes. Despite its simplicity, SupCL-Seq
leads to large gains in many sequence classification tasks on the GLUE
benchmark compared to a standard BERT-base, including a 6% absolute
improvement on CoLA, 5.4% on MRPC, 4.7% on RTE and 2.6% on STS-B. We also show
consistent gains over self-supervised contrastively learned representations,
especially on non-semantic tasks. Finally, we show that these gains are not
solely due to augmentation, but rather to a downstream-optimized sequence
representation. Code: https://github.com/hooman650/SupCL-Seq
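The linked repository contains the authors' implementation. As a rough, self-contained sketch of the two ingredients described in the abstract (dropout-altered views of each anchor plus a supervised contrastive loss over them), the PyTorch snippet below stands a toy feed-forward encoder in for BERT; the names supcon_loss and TinyEncoder, the toy dimensions, and the temperature of 0.1 are illustrative assumptions, not taken from the paper or its code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def supcon_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss in the style of Khosla et al. (2020).

    features: (N, D) L2-normalized embeddings, N = batch_size * n_views;
              dropout-altered views of an anchor share that anchor's label.
    labels:   (N,) class labels; same-class samples act as positives.
    """
    n = features.size(0)
    logits = features @ features.T / temperature                 # pairwise similarities
    self_mask = torch.eye(n, dtype=torch.bool, device=features.device)
    logits = logits.masked_fill(self_mask, float("-inf"))        # exclude self-pairs
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    # Row-wise log-softmax, then average the log-probabilities of each anchor's positives.
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    mean_log_prob_pos = log_prob.masked_fill(~pos_mask, 0.0).sum(1) / pos_mask.sum(1).clamp(min=1)
    return -mean_log_prob_pos.mean()


class TinyEncoder(nn.Module):
    """Stand-in for a Transformer encoder: with dropout active, two forward
    passes over the same input yield two slightly different ("altered") views."""

    def __init__(self, dim_in=32, dim_out=16, p_drop=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_in, 64), nn.ReLU(), nn.Dropout(p_drop), nn.Linear(64, dim_out)
        )

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)                  # L2-normalize embeddings


if __name__ == "__main__":
    torch.manual_seed(0)
    x = torch.randn(8, 32)                                       # toy batch of inputs
    y = torch.randint(0, 2, (8,))                                # toy class labels
    encoder = TinyEncoder().train()                              # .train() keeps dropout on
    z1, z2 = encoder(x), encoder(x)                              # two dropout-altered views
    loss = supcon_loss(torch.cat([z1, z2]), torch.cat([y, y]))
    print(f"supervised contrastive loss: {loss.item():.4f}")
```

With a real Transformer encoder the same recipe would presumably carry over: keep the model in training mode so dropout stays active, run each batch through it twice to obtain two altered views per anchor, and feed the concatenated pooled embeddings together with the duplicated labels to the supervised contrastive loss.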
Related papers
- MAGMA: Manifold Regularization for MAEs [1.7478203318226315]
Masked Autoencoders (MAEs) are an important divide in self-supervised learning (SSL).
We introduce MAGMA, a novel batch-wide layer-wise regularization loss applied to representations of different Transformer layers.
We demonstrate that by plugging in the proposed regularization loss, one can significantly improve the performance of MAE-based models.
arXiv Detail & Related papers (2024-12-03T22:14:10Z)
- Seq-VCR: Preventing Collapse in Intermediate Transformer Representations for Enhanced Reasoning [29.39584492735953]
We identify representation collapse in the model's intermediate layers as a key factor limiting their reasoning capabilities.
We propose Sequential Variance-Covariance Regularization (Seq-VCR), which enhances the entropy of intermediate representations and prevents collapse.
arXiv Detail & Related papers (2024-11-04T18:14:07Z)
- Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-Supervised Learning [116.75939193785143]
Contrastive learning (CL) for Vision Transformers (ViTs) in image domains has achieved performance comparable to CL for traditional convolutional backbones.
In 3D point cloud pretraining with ViTs, masked autoencoder (MAE) modeling remains dominant.
arXiv Detail & Related papers (2024-07-08T12:28:56Z)
- Decoupled Contrastive Learning for Long-Tailed Recognition [58.255966442426484]
Supervised Contrastive Loss (SCL) is popular in visual representation learning.
In the scenario of long-tailed recognition, where the number of samples in each class is imbalanced, treating two types of positive samples equally leads to biased optimization of intra-category distance.
We propose a patch-based self distillation to transfer knowledge from head to tail classes to relieve the under-representation of tail classes.
arXiv Detail & Related papers (2024-03-10T09:46:28Z)
- Learning to Mask and Permute Visual Tokens for Vision Transformer Pre-Training [55.12082817901671]
We propose a new self-supervised pre-training approach, named Masked and Permuted Vision Transformer (MaPeT).
MaPeT employs autoregressive and permuted predictions to capture intra-patch dependencies.
Our results demonstrate that MaPeT achieves competitive performance on ImageNet, compared to baselines and competitors under the same model setting.
arXiv Detail & Related papers (2023-06-12T18:12:19Z)
- Alleviating Over-smoothing for Unsupervised Sentence Representation [96.19497378628594]
We present a Simple method named Self-Contrastive Learning (SSCL) to alleviate this issue.
Our proposed method is quite simple and can be easily extended to various state-of-the-art models for performance boosting.
arXiv Detail & Related papers (2023-05-09T11:00:02Z)
- A Simplified Framework for Contrastive Learning for Node Representations [2.277447144331876]
We investigate the potential of deploying contrastive learning in combination with Graph Neural Networks for embedding nodes in a graph.
We show that the quality of the resulting embeddings and training time can be significantly improved by a simple column-wise postprocessing of the embedding matrix.
This modification yields improvements in downstream classification tasks of up to 1.5% and even beats existing state-of-the-art approaches on 6 out of 8 different benchmarks.
arXiv Detail & Related papers (2023-05-01T02:04:36Z)
- Self-Supervised Learning Disentangled Group Representation as Feature [82.07737719232972]
We show that existing Self-Supervised Learning (SSL) only disentangles simple augmentation features such as rotation and colorization.
We propose an iterative SSL algorithm: Iterative Partition-based Invariant Risk Minimization (IP-IRM).
We prove that IP-IRM converges to a fully disentangled representation and show its effectiveness on various benchmarks.
arXiv Detail & Related papers (2021-10-28T16:12:33Z)
- Weakly Supervised Contrastive Learning [68.47096022526927]
We introduce a weakly supervised contrastive learning framework (WCL) to tackle this issue.
WCL achieves 65% and 72% ImageNet Top-1 Accuracy using ResNet50, which is even higher than SimCLRv2 with ResNet101.
arXiv Detail & Related papers (2021-10-10T12:03:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.