BERTs are Generative In-Context Learners
- URL: http://arxiv.org/abs/2406.04823v1
- Date: Fri, 7 Jun 2024 10:48:45 GMT
- Title: BERTs are Generative In-Context Learners
- Authors: David Samuel
- Abstract summary: We present an embarrassingly simple inference technique that enables DeBERTa to operate as a generative model without any additional training.
Our findings demonstrate that DeBERTa can match and even surpass GPT-3, its contemporary that famously introduced the paradigm of in-context learning.
- Score: 5.121744234312891
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper explores the in-context learning capabilities of masked language models, challenging the common view that this ability does not 'emerge' in them. We present an embarrassingly simple inference technique that enables DeBERTa to operate as a generative model without any additional training. Our findings demonstrate that DeBERTa can match and even surpass GPT-3, its contemporary that famously introduced the paradigm of in-context learning. The comparative analysis reveals that the masked and causal language models behave very differently, as they clearly outperform each other on different categories of tasks. This suggests that there is great potential for a hybrid training approach that takes advantage of the strengths of both training objectives.
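The abstract does not spell out the inference trick, but the general idea it points to (using a masked language model pseudo-autoregressively, without any additional training) can be illustrated with a minimal sketch. The checkpoint name, the trailing-[MASK] placement, and the greedy decoding loop below are illustrative assumptions and may differ from the paper's actual technique:

```python
# Minimal sketch (not the paper's exact method): drive a masked LM
# pseudo-autoregressively by appending a [MASK] token to the prompt
# and greedily decoding its prediction, one token at a time.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Assumption: a small DeBERTa checkpoint as a stand-in for the paper's larger model.
MODEL_NAME = "microsoft/deberta-base"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForMaskedLM.from_pretrained(MODEL_NAME)
model.eval()


def generate(prompt: str, max_new_tokens: int = 20) -> str:
    """Greedily extend `prompt` by repeatedly filling a trailing [MASK]."""
    # Keep [CLS] and the prompt tokens, drop the trailing [SEP]; it is re-appended each step.
    ids = tokenizer(prompt, return_tensors="pt")["input_ids"][0, :-1]
    mask_id, sep_id = tokenizer.mask_token_id, tokenizer.sep_token_id

    for _ in range(max_new_tokens):
        # Sequence at each step: [CLS] prompt ... generated ... [MASK] [SEP]
        inp = torch.cat([ids, torch.tensor([mask_id, sep_id])]).unsqueeze(0)
        with torch.no_grad():
            logits = model(input_ids=inp).logits
        next_id = logits[0, -2].argmax(-1)  # prediction at the [MASK] position
        ids = torch.cat([ids, next_id.unsqueeze(0)])

    return tokenizer.decode(ids, skip_special_tokens=True)


print(generate("Q: What is the capital of France?\nA:"))
```

The paper's actual decoding scheme may be more refined; this loop only illustrates why no additional training is needed to obtain generative, in-context behavior from a masked language model.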
Related papers
- Leveraging Large Language Models to Generate Course-specific Semantically Annotated Learning Objects [2.1845291030915974]
Recent progress in generative natural language models has opened up new potential in the generation of educational content.
This paper explores the potential of large language models for generating computer science questions that are sufficiently annotated for automatic learner model updates.
arXiv Detail & Related papers (2024-12-05T14:24:07Z) - Unified Generative and Discriminative Training for Multi-modal Large Language Models [88.84491005030316]
Generative training has enabled Vision-Language Models (VLMs) to tackle various complex tasks.
Discriminative training, exemplified by models like CLIP, excels in zero-shot image-text classification and retrieval.
This paper proposes a unified approach that integrates the strengths of both paradigms.
arXiv Detail & Related papers (2024-11-01T01:51:31Z) - Erasing Conceptual Knowledge from Language Models [24.63143961814566]
We introduce Erasure of Language Memory (ELM), a principled approach to concept-level unlearning. ELM operates by matching distributions defined by the model's own introspective classification capabilities. We demonstrate ELM's efficacy on biosecurity, cybersecurity, and literary domain erasure tasks.
arXiv Detail & Related papers (2024-10-03T17:59:30Z) - Auto-ICL: In-Context Learning without Human Supervision [93.05202223767463]
We propose Automatic In-Context Learning framework that enables the model to autonomously generate examples and instructions for problem-solving.
With experiments across various models and datasets, results show that model-generated contexts outperform human-annotated contexts.
arXiv Detail & Related papers (2023-11-15T07:37:28Z) - Explanation-aware Soft Ensemble Empowers Large Language Model In-context Learning [50.00090601424348]
Large language models (LLMs) have shown remarkable capabilities in various natural language understanding tasks.
We propose EASE, an Explanation-Aware Soft Ensemble framework to empower in-context learning with LLMs.
arXiv Detail & Related papers (2023-11-13T06:13:38Z) - RAVEN: In-Context Learning with Retrieval-Augmented Encoder-Decoder Language Models [57.12888828853409]
RAVEN is a model that combines retrieval-augmented masked language modeling and prefix language modeling.
Fusion-in-Context Learning enables the model to leverage more in-context examples without requiring additional training.
Our work underscores the potential of retrieval-augmented encoder-decoder language models for in-context learning.
arXiv Detail & Related papers (2023-08-15T17:59:18Z) - SINC: Self-Supervised In-Context Learning for Vision-Language Tasks [64.44336003123102]
We propose a framework to enable in-context learning in large language models.
A meta-model can learn on self-supervised prompts consisting of tailored demonstrations.
Experiments show that SINC outperforms gradient-based methods in various vision-language tasks.
arXiv Detail & Related papers (2023-07-15T08:33:08Z) - Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and Input [27.102030262319197]
We present Switch-BERT for joint vision and language representation learning to address the problem of modality mismatch.
Switch-BERT extends BERT architecture by introducing learnable layer-wise and cross-layer interactions.
Results confirm that, whereas alternative architectures including ViLBERT and UNITER may excel in particular tasks, Switch-BERT consistently achieves better or comparable performance.
arXiv Detail & Related papers (2023-06-25T09:28:40Z) - Feature Interactions Reveal Linguistic Structure in Language Models [2.0178765779788495]
We study feature interactions in the context of feature attribution methods for post-hoc interpretability.
We work out a grey box methodology, in which we train models to perfection on a formal language classification task.
We show that under specific configurations, some methods are indeed able to uncover the grammatical rules acquired by a model.
arXiv Detail & Related papers (2023-06-21T11:24:41Z) - Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z) - Representing Knowledge by Spans: A Knowledge-Enhanced Model for Information Extraction [7.077412533545456]
We propose a new pre-trained model that learns representations of both entities and relationships simultaneously.
By encoding spans efficiently with span modules, our model can represent both entities and their relationships while requiring fewer parameters than existing models.
arXiv Detail & Related papers (2022-08-20T07:32:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.