Contrastive Learning for Prompt-Based Few-Shot Language Learners
- URL: http://arxiv.org/abs/2205.01308v1
- Date: Tue, 3 May 2022 04:56:45 GMT
- Title: Contrastive Learning for Prompt-Based Few-Shot Language Learners
- Authors: Yiren Jian and Chongyang Gao and Soroush Vosoughi
- Abstract summary: We present a contrastive learning framework that clusters inputs from the same class under different augmented "views".
We create different "views" of an example by appending it with different language prompts and contextual demonstrations.
Our method improves over state-of-the-art methods on a diverse set of 15 language tasks.
- Score: 14.244787327283335
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The impressive performance of GPT-3 using natural language prompts and
in-context learning has inspired work on better fine-tuning of moderately-sized
models under this paradigm. Following this line of work, we present a
contrastive learning framework that clusters inputs from the same class for
better generalization of models trained with only limited examples.
Specifically, we propose a supervised contrastive framework that clusters
inputs from the same class under different augmented "views" and repels those
from different classes. We create different "views" of an example by appending
it with different language prompts and contextual demonstrations. Combining a
contrastive loss with the standard masked language modeling (MLM) loss in
prompt-based few-shot learners, our method improves over state-of-the-art
methods on a diverse set of 15 language tasks. Our framework makes minimal
assumptions about the task or the base model, and can be applied to many recent
methods with little modification. The code will be made available at:
https://github.com/yiren-jian/LM-SupCon.
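To make the objective concrete, here is a minimal PyTorch sketch of the combined loss described above: a supervised contrastive (SupCon) term over [MASK]-token features from two prompt-augmented "views" of each input, added to the standard MLM loss. This is an illustration under assumptions, not the authors' released code; in particular, `make_views`, the `mask_hidden` attribute (the [MASK]-token hidden state), the batch layout, and `lambda_con` are hypothetical names and choices.

```python
import torch
import torch.nn.functional as F

def make_views(sentence, template1, template2, demos):
    """Hypothetical view construction: the same sentence wrapped in two
    different prompt templates, the second one preceded by in-context
    demonstrations. Tokenization into model inputs is omitted here."""
    view1 = template1.format(sentence)                 # e.g. "{} It was [MASK]."
    view2 = " ".join(demos) + " " + template2.format(sentence)
    return view1, view2

def supcon_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss: pull together features that share a
    label, push apart those that do not.
    features: (2N, D), two views per example; labels: (2N,)."""
    features = F.normalize(features, dim=1)
    sim = features @ features.T / temperature          # (2N, 2N) similarities
    diag = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    sim = sim.masked_fill(diag, float("-inf"))         # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~diag
    # Average log-probability over each anchor's positives; every anchor
    # has at least one positive here (its own second view).
    per_anchor = log_prob.masked_fill(~pos, 0.0).sum(1) / pos.sum(1).clamp(min=1)
    return -per_anchor.mean()

def training_step(model, batch, lambda_con=1.0):
    """Combined objective: average MLM loss over both views plus the
    SupCon loss on their [MASK]-token features."""
    feats, mlm_losses = [], []
    for view in (batch["view1"], batch["view2"]):      # two prompt templates
        out = model(input_ids=view["input_ids"],
                    attention_mask=view["attention_mask"],
                    labels=view["mlm_labels"])         # MLM loss at [MASK]
        mlm_losses.append(out.loss)
        feats.append(out.mask_hidden)                  # hypothetical attribute
    mlm_loss = torch.stack(mlm_losses).mean()
    con_loss = supcon_loss(torch.cat(feats), batch["labels"].repeat(2))
    return mlm_loss + lambda_con * con_loss
```

Because the two views of an example share its class label, each anchor always has at least one positive in the batch (its other view), which is what lets the contrastive term cluster same-class inputs across different prompts and demonstrations.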
Related papers
- Improving Visual Commonsense in Language Models via Multiple Image Generation [41.565399860320966]
Existing large language models (LLMs) are primarily trained using textual data only.
Visual Language Models, which excel at visually-oriented tasks, often fail at non-visual tasks such as basic commonsense reasoning.
This divergence highlights a critical challenge - the integration of robust visual understanding with foundational text-based language reasoning.
arXiv Detail & Related papers (2024-06-19T15:17:10Z)
- Language Models for Text Classification: Is In-Context Learning Enough? [54.869097980761595]
Recent foundational language models have shown state-of-the-art performance in many NLP tasks in zero- and few-shot settings.
An advantage of these models over more standard approaches is their ability to understand instructions written in natural language (prompts).
This makes them suitable for addressing text classification problems for domains with limited amounts of annotated instances.
arXiv Detail & Related papers (2024-03-26T12:47:39Z)
- FILM: How can Few-Shot Image Classification Benefit from Pre-Trained Language Models? [14.582209994281374]
Few-shot learning aims to train models that generalize to novel classes from only a few samples.
We propose a novel few-shot learning framework based on contrastive learning that uses pre-trained language models.
arXiv Detail & Related papers (2023-07-09T08:07:43Z)
- Alleviating Over-smoothing for Unsupervised Sentence Representation [96.19497378628594]
We present a simple method named Self-Contrastive Learning (SSCL) to alleviate the over-smoothing issue.
Our proposed method is quite simple and can be easily extended to various state-of-the-art models for performance boosting.
arXiv Detail & Related papers (2023-05-09T11:00:02Z)
- On the Compositional Generalization Gap of In-Context Learning [73.09193595292233]
We look at the gap between the in-distribution (ID) and out-of-distribution (OOD) performance of such models in semantic parsing tasks with in-context learning.
We evaluate four model families (OPT, BLOOM, CodeGen, and Codex) on three semantic parsing datasets.
arXiv Detail & Related papers (2022-11-15T19:56:37Z)
- CPL: Counterfactual Prompt Learning for Vision and Language Models [76.18024920393245]
This paper presents a novel Counterfactual Prompt Learning (CPL) method for vision and language models.
CPL simultaneously employs counterfactual generation and contrastive learning in a joint optimization framework.
Experiments demonstrate that CPL can obtain superior few-shot performance on different vision and language tasks.
arXiv Detail & Related papers (2022-10-19T08:06:39Z)
- MaPLe: Multi-modal Prompt Learning [54.96069171726668]
We propose Multi-modal Prompt Learning (MaPLe) for both vision and language branches to improve alignment between the vision and language representations.
Compared with the state-of-the-art method Co-CoOp, MaPLe exhibits favorable performance and achieves an absolute gain of 3.45% on novel classes.
arXiv Detail & Related papers (2022-10-06T17:59:56Z)
- Multi-Modal Few-Shot Object Detection with Meta-Learning-Based Cross-Modal Prompting [77.69172089359606]
We study multi-modal few-shot object detection (FSOD) in this paper, using both few-shot visual examples and class semantic information for detection.
Our approach is motivated by the high-level conceptual similarity of (metric-based) meta-learning and prompt-based learning.
We comprehensively evaluate the proposed multi-modal FSOD models on multiple few-shot object detection benchmarks, achieving promising results.
arXiv Detail & Related papers (2022-04-16T16:45:06Z)
- CoLLIE: Continual Learning of Language Grounding from Language-Image Embeddings [2.8478710949588284]
CoLLIE is a model for continual learning of how language is grounded in vision.
It learns a transformation function that adjusts the language embeddings when needed to accommodate new language use.
We show that CoLLIE can efficiently learn and generalize from only a few examples.
arXiv Detail & Related papers (2021-11-15T18:54:58Z)
- Multimodal Few-Shot Learning with Frozen Language Models [36.75551859968596]
We train a vision encoder to represent each image as a sequence of continuous embeddings, such that a pre-trained, frozen language model prompted with this prefix generates the appropriate caption.
The resulting system is a multimodal few-shot learner, with the surprising ability to learn a variety of new tasks when conditioned on examples.
arXiv Detail & Related papers (2021-06-25T21:07:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.