ORCA: Interpreting Prompted Language Models via Locating Supporting Data
  Evidence in the Ocean of Pretraining Data
- URL: http://arxiv.org/abs/2205.12600v1
- Date: Wed, 25 May 2022 09:25:06 GMT
- Title: ORCA: Interpreting Prompted Language Models via Locating Supporting Data
  Evidence in the Ocean of Pretraining Data
- Authors: Xiaochuang Han and Yulia Tsvetkov
- Abstract summary: Large pretrained language models have been performing increasingly well in a variety of downstream tasks via prompting.
It remains unclear from where the model learns the task-specific knowledge, especially in a zero-shot setup.
In this work, we want to find evidence of the model's task-specific competence from pretraining and are specifically interested in locating a very small subset of pretraining data.
- Score: 38.20984369410193
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract:   Large pretrained language models have been performing increasingly well in a
variety of downstream tasks via prompting. However, it remains unclear from
where the model learns the task-specific knowledge, especially in a zero-shot
setup. In this work, we want to find evidence of the model's task-specific
competence from pretraining and are specifically interested in locating a very
small subset of pretraining data that directly supports the model in the task.
We call such a subset supporting data evidence and propose a novel method ORCA
to effectively identify it, by iteratively using gradient information related
to the downstream task. This supporting data evidence offers interesting
insights about the prompted language models: in the tasks of sentiment analysis
and textual entailment, BERT shows a substantial reliance on BookCorpus, the
smaller corpus of BERT's two pretraining corpora, as well as on pretraining
examples that mask out synonyms to the task verbalizers.
 
      
Related papers
- Boosting Short Text Classification with Multi-Source Information Exploration and Dual-Level Contrastive Learning [12.377363857246602]
 We propose a novel model named MI-DELIGHT for short text classification.
It first performs multi-source information exploration to alleviate the sparsity issues.
Then, the graph learning approach is adopted to learn the representation of short texts.
 arXiv  Detail & Related papers  (2025-01-16T00:26:15Z)
- Understanding In-Context Learning via Supportive Pretraining Data [55.648777340129364]
 In-context learning (ICL) improves language models' performance on a variety of NLP tasks by simply demonstrating a handful of examples at inference time.
It is not well understood why ICL ability emerges, as the model has never been specifically trained on such demonstrations.
Our work takes a first step towards understanding ICL via analyzing instance-level pretraining data.
 arXiv  Detail & Related papers  (2023-06-26T22:14:04Z)
- What does BERT learn about prosody? [1.1548853370822343]
 We study whether prosody is part of the structural information of the language that models learn.
Our results show that information about prosodic prominence spans many layers but is mostly concentrated in the middle layers, suggesting that BERT relies mostly on syntactic and semantic information.
 arXiv  Detail & Related papers  (2023-04-25T10:34:56Z)
- AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators [98.11286353828525]
 GPT-3.5 series models have demonstrated remarkable few-shot and zero-shot ability across various NLP tasks.
We propose AnnoLLM, which adopts a two-step approach, explain-then-annotate.
We build the first conversation-based information retrieval dataset employing AnnoLLM.
 arXiv  Detail & Related papers  (2023-03-29T17:03:21Z)
- Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
 We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
 Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
 arXiv  Detail & Related papers  (2022-10-23T00:37:08Z)
- SLUE: New Benchmark Tasks for Spoken Language Understanding Evaluation on Natural Speech [44.68649535280397]
 We propose a suite of benchmark tasks for Spoken Language Understanding Evaluation (SLUE).
SLUE consists of limited-size labeled training sets and corresponding evaluation sets.
We present the first phase of the SLUE benchmark suite, consisting of named entity recognition, sentiment analysis, and ASR on the corresponding datasets.
We provide new transcriptions and annotations on subsets of the VoxCeleb and VoxPopuli datasets, evaluation metrics and results for baseline models, and an open-source toolkit to reproduce the baselines and evaluate new models.
 arXiv  Detail & Related papers  (2021-11-19T18:59:23Z)
- An Explanation of In-context Learning as Implicit Bayesian Inference [117.19809377740188]
 We study the role of the pretraining distribution on the emergence of in-context learning.
We prove that in-context learning occurs implicitly via Bayesian inference of the latent concept.
We empirically find that scaling model size improves in-context accuracy even when the pretraining loss is the same.
 arXiv  Detail & Related papers  (2021-11-03T09:12:33Z)
- mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models [15.873069955407406]
 We train a multilingual language model with 24 languages with entity representations.
We show the model consistently outperforms word-based pretrained models in various cross-lingual transfer tasks.
We also evaluate the model with a multilingual cloze prompt task with the mLAMA dataset.
 arXiv  Detail & Related papers  (2021-10-15T15:28:38Z)
- Evaluating Document Coherence Modelling [37.287725949616934]
 We examine the performance of a broad range of pretrained LMs on a sentence intrusion detection task for English.
Our experiments show that pretrained LMs perform impressively in in-domain evaluation, but experience a substantial drop in the cross-domain setting.
 arXiv  Detail & Related papers  (2021-03-18T10:05:06Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
 We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
 arXiv  Detail & Related papers  (2020-10-24T11:55:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     