SupCL-Seq: Supervised Contrastive Learning for Downstream Optimized
Sequence Representations
- URL: http://arxiv.org/abs/2109.07424v1
- Date: Wed, 15 Sep 2021 16:51:18 GMT
- Title: SupCL-Seq: Supervised Contrastive Learning for Downstream Optimized
Sequence Representations
- Authors: Hooman Sedghamiz, Shivam Raval, Enrico Santus, Tuka Alhanai, Mohammad
Ghassemi
- Abstract summary: This paper introduces SupCL-Seq, which extends supervised contrastive learning from computer vision to the optimization of sequence representations in NLP.
We show that SupCL-Seq leads to large gains on many sequence classification tasks on the GLUE benchmark compared to a standard BERT-base.
- Score: 4.392337343771302
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While contrastive learning has proven to be an effective training
strategy in computer vision, Natural Language Processing (NLP) has only
recently adopted it as a self-supervised alternative to Masked Language
Modeling (MLM) for improving sequence representations. This paper introduces
SupCL-Seq, which extends supervised contrastive learning from computer vision
to the optimization of sequence representations in NLP. By altering the
dropout mask probability in standard Transformer architectures, we generate
augmented views for every representation (anchor). A supervised contrastive
loss is then used to maximize the model's ability to pull together similar
samples (e.g., anchors and their augmented views) and to push apart samples
belonging to other classes. Despite its simplicity, SupCL-Seq leads to large
gains on many sequence classification tasks on the GLUE benchmark compared to
a standard BERT-base, including a 6% absolute improvement on CoLA, 5.4% on
MRPC, 4.7% on RTE and 2.6% on STS-B. We also show consistent gains over
self-supervised contrastively learned representations, especially on
non-semantic tasks. Finally, we show that these gains are not solely due to
augmentation, but rather to a downstream-optimized sequence representation.
Code: https://github.com/hooman650/SupCL-Seq
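To make the recipe above concrete, here is a minimal sketch (not the authors' released code) of the SupCL-Seq idea in PyTorch with Hugging Face transformers: each batch is encoded twice with dropout left active, so the two stochastic dropout masks yield two "views" of every sentence, and a supervised contrastive (SupCon) loss pulls same-label views together while pushing other classes apart. The helper names (supcon_loss, encode_views), the [CLS] pooling choice and the temperature are illustrative assumptions; the paper additionally varies the dropout probabilities to obtain several views, which is omitted here for brevity.

```python
# Minimal SupCL-Seq-style sketch (assumption: PyTorch + Hugging Face transformers;
# names and hyperparameters below are illustrative, not taken from the paper).
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer


def supcon_loss(z, labels, temperature=0.1):
    """Supervised contrastive loss (Khosla et al., 2020) over a batch of embeddings."""
    z = F.normalize(z, dim=-1)
    sim = z @ z.t() / temperature                    # pairwise cosine similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, -1e9)           # exclude self-pairs from the softmax
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    # average log-probability of positives (same label, other view or sample) per anchor
    loss = -(log_prob * pos_mask).sum(dim=1) / pos_mask.sum(dim=1).clamp(min=1)
    return loss.mean()


tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
encoder.train()                                      # keep dropout active: each forward
                                                     # pass applies a different dropout mask

def encode_views(sentences):
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    # two stochastic forward passes -> two dropout-augmented views per sentence
    view1 = encoder(**batch).last_hidden_state[:, 0]  # [CLS] token as sequence representation
    view2 = encoder(**batch).last_hidden_state[:, 0]
    return torch.cat([view1, view2], dim=0)


sentences = ["the movie was great", "terrible acting", "i loved it", "what a waste of time"]
labels = torch.tensor([1, 0, 1, 0])
z = encode_views(sentences)
loss = supcon_loss(z, torch.cat([labels, labels], dim=0))
loss.backward()
```

As in the paper's GLUE experiments, the contrastively trained encoder would then be fine-tuned or probed with a task-specific classification head on top of the downstream-optimized representations.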
Related papers
- Seq-VCR: Preventing Collapse in Intermediate Transformer Representations for Enhanced Reasoning [29.39584492735953]
We identify representation collapse in a model's intermediate layers as a key factor limiting its reasoning capabilities.
We propose Sequential Variance-Covariance Regularization (Seq-VCR), which enhances the entropy of intermediate representations and prevents collapse.
arXiv Detail & Related papers (2024-11-04T18:14:07Z)
- L^2CL: Embarrassingly Simple Layer-to-Layer Contrastive Learning for Graph Collaborative Filtering [33.165094795515785]
Graph neural networks (GNNs) have recently emerged as an effective approach to model neighborhood signals in collaborative filtering.
We propose L2CL, a principled Layer-to-Layer Contrastive Learning framework that contrasts representations from different layers.
We find that L2CL, using only a one-hop contrastive learning paradigm, captures intrinsic semantic structures and improves the quality of node representations.
arXiv Detail & Related papers (2024-07-19T12:45:21Z)
- Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-Supervised Learning [116.75939193785143]
Contrastive learning (CL) for Vision Transformers (ViTs) in image domains has achieved performance comparable to CL for traditional convolutional backbones.
In 3D point cloud pretraining with ViTs, masked autoencoder (MAE) modeling remains dominant.
arXiv Detail & Related papers (2024-07-08T12:28:56Z)
- Decoupled Contrastive Learning for Long-Tailed Recognition [58.255966442426484]
Supervised Contrastive Loss (SCL) is popular in visual representation learning.
In long-tailed recognition, where the number of samples per class is imbalanced, treating the two types of positive samples equally leads to biased optimization of the intra-category distance.
We propose patch-based self-distillation to transfer knowledge from head to tail classes, relieving the under-representation of the latter.
arXiv Detail & Related papers (2024-03-10T09:46:28Z)
- VadCLIP: Adapting Vision-Language Models for Weakly Supervised Video Anomaly Detection [58.47940430618352]
We propose VadCLIP, a new paradigm for weakly supervised video anomaly detection (WSVAD).
VadCLIP makes full use of fine-grained associations between vision and language by building on the strength of CLIP.
We conduct extensive experiments on two commonly-used benchmarks, demonstrating that VadCLIP achieves the best performance on both coarse-grained and fine-grained WSVAD.
arXiv Detail & Related papers (2023-08-22T14:58:36Z)
- Alleviating Over-smoothing for Unsupervised Sentence Representation [96.19497378628594]
We present a simple method named Self-Contrastive Learning (SSCL) to alleviate the over-smoothing issue.
Our proposed method is quite simple and can be easily extended to various state-of-the-art models for performance boosting.
arXiv Detail & Related papers (2023-05-09T11:00:02Z)
- A Simplified Framework for Contrastive Learning for Node Representations [2.277447144331876]
We investigate the potential of deploying contrastive learning in combination with Graph Neural Networks for embedding nodes in a graph.
We show that the quality of the resulting embeddings and training time can be significantly improved by a simple column-wise postprocessing of the embedding matrix.
This modification yields improvements in downstream classification tasks of up to 1.5% and even beats existing state-of-the-art approaches on 6 out of 8 different benchmarks.
arXiv Detail & Related papers (2023-05-01T02:04:36Z)
- Self-Supervised Learning Disentangled Group Representation as Feature [82.07737719232972]
We show that existing Self-Supervised Learning (SSL) only disentangles simple augmentation features such as rotation and colorization.
We propose an iterative SSL algorithm: Iterative Partition-based Invariant Risk Minimization (IP-IRM).
We prove that IP-IRM converges to a fully disentangled representation and show its effectiveness on various benchmarks.
arXiv Detail & Related papers (2021-10-28T16:12:33Z)
- Weakly Supervised Contrastive Learning [68.47096022526927]
We introduce a weakly supervised contrastive learning framework (WCL) to tackle this issue.
WCL achieves 65% and 72% ImageNet Top-1 Accuracy using ResNet50, which is even higher than SimCLRv2 with ResNet101.
arXiv Detail & Related papers (2021-10-10T12:03:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.