SupCL-Seq: Supervised Contrastive Learning for Downstream Optimized
Sequence Representations
- URL: http://arxiv.org/abs/2109.07424v1
- Date: Wed, 15 Sep 2021 16:51:18 GMT
- Title: SupCL-Seq: Supervised Contrastive Learning for Downstream Optimized
Sequence Representations
- Authors: Hooman Sedghamiz, Shivam Raval, Enrico Santus, Tuka Alhanai, Mohammad
Ghassemi
- Abstract summary: This paper introduces SupCL-Seq, which extends supervised contrastive learning from computer vision to the optimization of sequence representations in NLP.
We show that SupCL-Seq leads to large gains in many sequence classification tasks on the GLUE benchmark compared to a standard BERT-base.
- Score: 4.392337343771302
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While contrastive learning has proven to be an effective training
strategy in computer vision, Natural Language Processing (NLP) has only
recently adopted it as a self-supervised alternative to Masked Language
Modeling (MLM) for improving sequence representations. This paper introduces
SupCL-Seq, which extends supervised contrastive learning from computer vision
to the optimization of sequence representations in NLP. By altering the
dropout mask probability in standard Transformer architectures, we generate
augmented, altered views for every representation (anchor). A supervised
contrastive loss is then used to maximize the system's ability to pull
together similar samples (e.g., anchors and their altered views) and to push
apart samples belonging to other classes. Despite its simplicity, SupCL-Seq
leads to large gains in many sequence classification tasks on the GLUE
benchmark compared to a standard BERT-base, including a 6% absolute
improvement on CoLA, 5.4% on MRPC, 4.7% on RTE and 2.6% on STS-B. We also show
consistent gains over self-supervised contrastively learned representations,
especially on non-semantic tasks. Finally, we show that these gains are not
solely due to augmentation, but rather to a downstream-optimized sequence
representation. Code: https://github.com/hooman650/SupCL-Seq
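The linked repository contains the authors' implementation. As a rough, self-contained sketch of the two ingredients described in the abstract (dropout-altered views of each anchor plus a supervised contrastive loss over them), the PyTorch snippet below stands a toy feed-forward encoder in for BERT; the names supcon_loss and TinyEncoder, the toy dimensions, and the temperature of 0.1 are illustrative assumptions, not taken from the paper or its code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def supcon_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss in the style of Khosla et al. (2020).

    features: (N, D) L2-normalized embeddings, N = batch_size * n_views;
              dropout-altered views of an anchor share that anchor's label.
    labels:   (N,) class labels; same-class samples act as positives.
    """
    n = features.size(0)
    logits = features @ features.T / temperature                 # pairwise similarities
    self_mask = torch.eye(n, dtype=torch.bool, device=features.device)
    logits = logits.masked_fill(self_mask, float("-inf"))        # exclude self-pairs
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    # Row-wise log-softmax, then average the log-probabilities of each anchor's positives.
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    mean_log_prob_pos = log_prob.masked_fill(~pos_mask, 0.0).sum(1) / pos_mask.sum(1).clamp(min=1)
    return -mean_log_prob_pos.mean()


class TinyEncoder(nn.Module):
    """Stand-in for a Transformer encoder: with dropout active, two forward
    passes over the same input yield two slightly different ("altered") views."""

    def __init__(self, dim_in=32, dim_out=16, p_drop=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_in, 64), nn.ReLU(), nn.Dropout(p_drop), nn.Linear(64, dim_out)
        )

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)                  # L2-normalize embeddings


if __name__ == "__main__":
    torch.manual_seed(0)
    x = torch.randn(8, 32)                                       # toy batch of inputs
    y = torch.randint(0, 2, (8,))                                # toy class labels
    encoder = TinyEncoder().train()                              # .train() keeps dropout on
    z1, z2 = encoder(x), encoder(x)                              # two dropout-altered views
    loss = supcon_loss(torch.cat([z1, z2]), torch.cat([y, y]))
    print(f"supervised contrastive loss: {loss.item():.4f}")
```

With a real Transformer encoder the same recipe would presumably carry over: keep the model in training mode so dropout stays active, run each batch through it twice to obtain two altered views per anchor, and feed the concatenated pooled embeddings together with the duplicated labels to the supervised contrastive loss.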
Related papers
- MAGMA: Manifold Regularization for MAEs [1.7478203318226315]
Masked Autoencoders (MAEs) are an important divide in self-supervised learning (SSL).
We introduce MAGMA, a novel batch-wide layer-wise regularization loss applied to representations of different Transformer layers.
We demonstrate that by plugging in the proposed regularization loss, one can significantly improve the performance of MAE-based models.
arXiv Detail & Related papers (2024-12-03T22:14:10Z)
- Seq-VCR: Preventing Collapse in Intermediate Transformer Representations for Enhanced Reasoning [29.39584492735953]
We identify representation collapse in the model's intermediate layers as a key factor limiting their reasoning capabilities.
We propose Sequential Variance-Covariance Regularization (Seq-VCR), which enhances the entropy of intermediate representations and prevents collapse.
arXiv Detail & Related papers (2024-11-04T18:14:07Z)
- Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-Supervised Learning [116.75939193785143]
Contrastive learning (CL) for Vision Transformers (ViTs) in image domains has achieved performance comparable to CL for traditional convolutional backbones.
In 3D point cloud pretraining with ViTs, masked autoencoder (MAE) modeling remains dominant.
arXiv Detail & Related papers (2024-07-08T12:28:56Z)
- Decoupled Contrastive Learning for Long-Tailed Recognition [58.255966442426484]
Supervised Contrastive Loss (SCL) is popular in visual representation learning.
In the scenario of long-tailed recognition, where the number of samples in each class is imbalanced, treating two types of positive samples equally leads to biased optimization of intra-category distance.
We propose a patch-based self distillation to transfer knowledge from head to tail classes to relieve the under-representation of tail classes.
arXiv Detail & Related papers (2024-03-10T09:46:28Z)
- Learning to Mask and Permute Visual Tokens for Vision Transformer Pre-Training [55.12082817901671]
We propose a new self-supervised pre-training approach, named Masked and Permuted Vision Transformer (MaPeT).
MaPeT employs autoregressive and permuted predictions to capture intra-patch dependencies.
Our results demonstrate that MaPeT achieves competitive performance on ImageNet, compared to baselines and competitors under the same model setting.
arXiv Detail & Related papers (2023-06-12T18:12:19Z)
- Alleviating Over-smoothing for Unsupervised Sentence Representation [96.19497378628594]
We present a Simple method named Self-Contrastive Learning (SSCL) to alleviate this issue.
Our proposed method is quite simple and can be easily extended to various state-of-the-art models for performance boosting.
arXiv Detail & Related papers (2023-05-09T11:00:02Z)
- A Simplified Framework for Contrastive Learning for Node Representations [2.277447144331876]
We investigate the potential of deploying contrastive learning in combination with Graph Neural Networks for embedding nodes in a graph.
We show that the quality of the resulting embeddings and training time can be significantly improved by a simple column-wise postprocessing of the embedding matrix.
This modification yields improvements in downstream classification tasks of up to 1.5% and even beats existing state-of-the-art approaches on 6 out of 8 different benchmarks.
arXiv Detail & Related papers (2023-05-01T02:04:36Z)
- Self-Supervised Learning Disentangled Group Representation as Feature [82.07737719232972]
We show that existing Self-Supervised Learning (SSL) only disentangles simple augmentation features such as rotation and colorization.
We propose an iterative SSL algorithm: Iterative Partition-based Invariant Risk Minimization (IP-IRM).
We prove that IP-IRM converges to a fully disentangled representation and show its effectiveness on various benchmarks.
arXiv Detail & Related papers (2021-10-28T16:12:33Z)
- Weakly Supervised Contrastive Learning [68.47096022526927]
We introduce a weakly supervised contrastive learning framework (WCL) to tackle this issue.
WCL achieves 65% and 72% ImageNet Top-1 Accuracy using ResNet50, which is even higher than SimCLRv2 with ResNet101.
arXiv Detail & Related papers (2021-10-10T12:03:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.