Don't Judge a Language Model by Its Last Layer: Contrastive Learning
with Layer-Wise Attention Pooling
- URL: http://arxiv.org/abs/2209.05972v1
- Date: Tue, 13 Sep 2022 13:09:49 GMT
- Title: Don't Judge a Language Model by Its Last Layer: Contrastive Learning
with Layer-Wise Attention Pooling
- Authors: Dongsuk Oh, Yejin Kim, Hodong Lee, H. Howie Huang and Heuiseok Lim
- Abstract summary: Recent pre-trained language models (PLMs) have achieved great success on many natural language processing tasks by learning linguistic features and contextualized sentence representations.
This paper introduces an attention-based pooling strategy that enables the model to preserve the layer-wise signals captured in each layer and learn digested linguistic features for downstream tasks.
- Score: 6.501126898523172
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent pre-trained language models (PLMs) have achieved great
success on many natural language processing tasks by learning linguistic
features and contextualized sentence representations. Since the attributes
captured in the stacked layers of PLMs are not clearly identified,
straightforward approaches such as embedding the last layer are commonly
used to derive sentence representations from PLMs. This paper introduces an
attention-based pooling strategy that enables the model to preserve the
layer-wise signals captured in each layer and to learn digested linguistic
features for downstream tasks. A contrastive learning objective adapts the
layer-wise attention pooling to both unsupervised and supervised settings,
which regularizes the anisotropic space of pre-trained embeddings and makes
it more uniform. We evaluate our model on standard semantic textual
similarity (STS) and semantic search tasks. Our method improves the
performance of the contrastively learned BERT_base and its variants.
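As a rough illustration of the idea, the sketch below pools each PLM layer's hidden states into a per-layer sentence vector, combines the layers with learned softmax attention weights, and trains the resulting embedding with an in-batch contrastive (InfoNCE-style) loss. This is not the authors' released code: the names (LayerwiseAttentionPooling, info_nce), the mean-over-tokens pooling inside each layer, and the PyTorch shapes are assumptions made for illustration only.

```python
# Minimal sketch (assumed PyTorch implementation, not the paper's code):
# attention pooling over the hidden states of every PLM layer, trained with
# an in-batch contrastive objective.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LayerwiseAttentionPooling(nn.Module):
    """Combine per-layer sentence vectors with learned attention weights."""
    def __init__(self, hidden_size: int):
        super().__init__()
        # one scalar score per layer, computed from that layer's pooled vector
        self.score = nn.Linear(hidden_size, 1)

    def forward(self, all_hidden_states, attention_mask):
        # all_hidden_states: (num_layers, batch, seq_len, hidden)
        # attention_mask:    (batch, seq_len), 1 for real tokens, 0 for padding
        mask = attention_mask.unsqueeze(0).unsqueeze(-1).float()            # (1, B, T, 1)
        token_counts = mask.sum(dim=2).clamp(min=1e-9)                      # (1, B, 1)
        layer_means = (all_hidden_states * mask).sum(dim=2) / token_counts  # (L, B, H)
        weights = torch.softmax(self.score(layer_means).squeeze(-1), dim=0) # (L, B)
        return (weights.unsqueeze(-1) * layer_means).sum(dim=0)             # (B, H)

def info_nce(z1, z2, temperature: float = 0.05):
    """In-batch contrastive loss: matching rows are positives, the rest negatives."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature        # (B, B) cosine similarity matrix
    labels = torch.arange(z1.size(0))
    return F.cross_entropy(logits, labels)

# Toy usage with random tensors standing in for BERT_base hidden states
# (13 = 12 transformer layers + the embedding layer).
L, B, T, H = 13, 8, 16, 768
pooler = LayerwiseAttentionPooling(H)
view1 = torch.randn(L, B, T, H)               # e.g. two dropout-noised encodings
view2 = torch.randn(L, B, T, H)               # of the same batch of sentences
mask = torch.ones(B, T, dtype=torch.long)
loss = info_nce(pooler(view1, mask), pooler(view2, mask))
loss.backward()
```

In practice the two views would come from a PLM run with all hidden states returned (e.g. output_hidden_states=True in Hugging Face Transformers), and a supervised variant would draw its positives and negatives from labeled sentence pairs instead of in-batch dropout views.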
Related papers
- Fantastic Semantics and Where to Find Them: Investigating Which Layers of Generative LLMs Reflect Lexical Semantics [50.982315553104975]
We investigate the bottom-up evolution of lexical semantics for a popular large language model, namely Llama2.
Our experiments show that the representations in lower layers encode lexical semantics, while the higher layers, with weaker semantic induction, are responsible for prediction.
This is in contrast to models with discriminative objectives, such as masked language modeling, where the higher layers obtain better lexical semantics.
arXiv Detail & Related papers (2024-03-03T13:14:47Z) - Breaking Down Word Semantics from Pre-trained Language Models through
Layer-wise Dimension Selection [0.0]
This paper aims to disentangle semantic sense from BERT by applying a binary mask to middle outputs across the layers.
The disentangled embeddings are evaluated through binary classification to determine if the target word in two different sentences has the same meaning.
arXiv Detail & Related papers (2023-10-08T11:07:19Z) - Prompting classes: Exploring the Power of Prompt Class Learning in
Weakly Supervised Semantic Segmentation [15.467510304266883]
We study the impact of prompt tuning on weakly supervised semantic segmentation.
We introduce a novel approach based on a PrOmpt cLass lEarning (POLE) strategy.
We demonstrate that our simple yet efficient approach achieves SOTA performance on a well-known WSSS benchmark.
arXiv Detail & Related papers (2023-06-30T19:25:18Z) - Alleviating Over-smoothing for Unsupervised Sentence Representation [96.19497378628594]
We present a Simple method named Self-Contrastive Learning (SSCL) to alleviate the over-smoothing issue.
Our proposed method is quite simple and can be easily extended to various state-of-the-art models for performance boosting.
arXiv Detail & Related papers (2023-05-09T11:00:02Z) - Probing for Understanding of English Verb Classes and Alternations in
Large Pre-trained Language Models [4.243426191555036]
We investigate the extent to which verb alternation classes are encoded in the embeddings of Large Pre-trained Language Models.
We find that contextual embeddings from PLMs achieve astonishingly high accuracies on tasks across most classes.
arXiv Detail & Related papers (2022-09-11T08:04:40Z) - Dense Contrastive Visual-Linguistic Pretraining [53.61233531733243]
Several multimodal representation learning approaches have been proposed that jointly represent image and text.
These approaches achieve superior performance by capturing high-level semantic information from large-scale multimodal pretraining.
We propose unbiased Dense Contrastive Visual-Linguistic Pretraining to replace the region regression and classification with cross-modality region contrastive learning.
arXiv Detail & Related papers (2021-09-24T07:20:13Z) - CSS-LM: A Contrastive Framework for Semi-supervised Fine-tuning of
Pre-trained Language Models [59.49705076369856]
We introduce a novel framework to improve the fine-tuning phase of pre-trained language models (PLMs).
We retrieve positive and negative instances from large-scale unlabeled corpora according to their domain-level and class-level semantic relatedness to a task.
We then perform contrastive semi-supervised learning on both the retrieved unlabeled and original labeled instances to help PLMs capture crucial task-related semantic features.
arXiv Detail & Related papers (2021-02-07T09:27:26Z) - SLM: Learning a Discourse Language Representation with Sentence
Unshuffling [53.42814722621715]
We introduce Sentence-level Language Modeling, a new pre-training objective for learning a discourse language representation.
We show that this pre-training objective improves the performance of the original BERT by large margins.
arXiv Detail & Related papers (2020-10-30T13:33:41Z) - Pre-training Text Representations as Meta Learning [113.3361289756749]
We introduce a learning algorithm that directly optimizes the model's ability to learn text representations for effective learning of downstream tasks.
We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps.
arXiv Detail & Related papers (2020-04-12T09:05:47Z)