Comparing Template-based and Template-free Language Model Probing
- URL: http://arxiv.org/abs/2402.00123v2
- Date: Wed, 30 Oct 2024 02:16:34 GMT
- Title: Comparing Template-based and Template-free Language Model Probing
- Authors: Sagi Shaier, Kevin Bennett, Lawrence E Hunter, Katharina von der Wense
- Abstract summary: We evaluate 16 different language models (LMs) via cloze-task probing on 10 English probing datasets.
We find that template-free and template-based approaches often rank models differently, except for the top domain-specific models.
- Score: 0.0
- License:
- Abstract: The differences between cloze-task language model (LM) probing with 1) expert-made templates and 2) naturally-occurring text have often been overlooked. Here, we evaluate 16 different LMs on 10 probing English datasets -- 4 template-based and 6 template-free -- in general and biomedical domains to answer the following research questions: (RQ1) Do model rankings differ between the two approaches? (RQ2) Do models' absolute scores differ between the two approaches? (RQ3) Do the answers to RQ1 and RQ2 differ between general and domain-specific models? Our findings are: 1) Template-free and template-based approaches often rank models differently, except for the top domain-specific models. 2) Scores decrease by up to 42% Acc@1 when comparing parallel template-free and template-based prompts. 3) Perplexity is negatively correlated with accuracy in the template-free approach, but, counter-intuitively, they are positively correlated for template-based probing. 4) Models tend to predict the same answers frequently across prompts for template-based probing, which is less common when employing template-free techniques.
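To make the contrast concrete, here is a minimal sketch of the two probing styles, assuming the HuggingFace `transformers` fill-mask pipeline with `bert-base-uncased`; the prompts, the gold answer, and the Acc@1 helper are illustrative stand-ins, not the paper's actual datasets or evaluation code.

```python
# Minimal sketch of template-based vs. template-free cloze probing, assuming
# the HuggingFace `transformers` fill-mask pipeline with bert-base-uncased.
# The prompts, gold answer, and Acc@1 helper are illustrative stand-ins only.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

probes = [
    # (prompt containing the model's mask token, gold answer)
    ("Aspirin is a treatment for [MASK].", "pain"),       # template-based: expert-made pattern
    ("She took an aspirin to ease the [MASK].", "pain"),  # template-free: naturally-occurring style
]

def acc_at_1(prompt_gold_pairs):
    """Fraction of prompts whose top-1 predicted token matches the gold answer."""
    hits = 0
    for text, gold in prompt_gold_pairs:
        top = fill_mask(text, top_k=1)[0]          # best single filler for [MASK]
        hits += int(top["token_str"].strip().lower() == gold)
    return hits / len(prompt_gold_pairs)

print("Acc@1:", acc_at_1(probes))
```

Ranking several LMs by Acc@1 on the template-based and template-free prompt sets separately is, in spirit, the comparison the paper performs at scale.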
Related papers
- Detection and Measurement of Syntactic Templates in Generated Text [58.111650675717414]
We offer an analysis of syntactic features to characterize general repetition in models.
We find that models tend to produce templated text in downstream tasks at a higher rate than what is found in human-reference texts.
arXiv Detail & Related papers (2024-06-28T19:34:23Z)
- BvSP: Broad-view Soft Prompting for Few-Shot Aspect Sentiment Quad Prediction [10.313467662221319]
Aspect sentiment quad prediction (ASQP) aims to predict four aspect-based elements, including aspect term, opinion term, aspect category, and sentiment polarity.
This work formulates ASQP into the few-shot scenario, which aims for fast adaptation in real applications.
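For readers unfamiliar with the task, the sketch below shows the four quad elements as a simple record; the field values and category labels are made up for illustration and are not BvSP's data format.

```python
# Illustrative only: a sentiment quad as described above (aspect term,
# opinion term, aspect category, sentiment polarity). Values and category
# labels are invented, not taken from the BvSP paper or its datasets.
from dataclasses import dataclass

@dataclass
class SentimentQuad:
    aspect_term: str       # span from the review text
    opinion_term: str      # phrase expressing the opinion
    aspect_category: str   # label from a fixed category inventory
    polarity: str          # "positive", "negative", or "neutral"

# "The pasta was delicious, but the service was slow."
quads = [
    SentimentQuad("pasta", "delicious", "food#quality", "positive"),
    SentimentQuad("service", "slow", "service#general", "negative"),
]
```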
arXiv Detail & Related papers (2024-06-11T15:32:32Z)
- Mind Your Format: Towards Consistent Evaluation of In-Context Learning Improvements [10.687101698324897]
Large language models demonstrate a remarkable capability for learning to solve new tasks from a few examples.
The prompt template, or the way the input examples are formatted to obtain the prompt, is an important yet often overlooked aspect of in-context learning.
We show that a poor choice of the template can reduce the performance of the strongest models and inference methods to a random guess level.
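As an illustration of what "template" means in this context, the sketch below formats the same few-shot examples with two different, entirely hypothetical templates; the paper's finding is that surface choices like these alone can swing accuracy dramatically.

```python
# Illustrative only: two ways to format the same in-context examples into a
# prompt. The separators and field labels are hypothetical; the point is that
# such surface choices alone can change downstream accuracy.
examples = [("The movie was wonderful.", "positive"),
            ("I want my money back.", "negative")]
query = "A bland, forgettable film."

def format_prompt(examples, query, item_fmt, sep):
    # Render each labeled example with the template, then append the query
    # with an empty label slot for the model to complete.
    shots = sep.join(item_fmt.format(text=t, label=l) for t, l in examples)
    return shots + sep + item_fmt.format(text=query, label="").rstrip()

template_a = format_prompt(examples, query, "Review: {text}\nSentiment: {label}", "\n\n")
template_b = format_prompt(examples, query, "input={text} output={label}", " ; ")

print(template_a)
print("---")
print(template_b)
```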
arXiv Detail & Related papers (2024-01-12T18:58:26Z)
- CALM: A Multi-task Benchmark for Comprehensive Assessment of Language Model Bias [7.28980829208179]
We introduce the Comprehensive Assessment of Language Models (CALM) benchmark for robust measurement of two types of universally relevant sociodemographic bias: gender and race.
Our empirical evaluation shows that CALM bias scores are more robust and far less sensitive than previous bias measurements to perturbations in the templates.
arXiv Detail & Related papers (2023-08-24T03:53:55Z)
- Event Extraction as Question Generation and Answering [72.04433206754489]
Recent work on Event Extraction has reframed the task as Question Answering (QA).
We propose QGA-EE, which enables a Question Generation (QG) model to generate questions that incorporate rich contextual information instead of using fixed templates.
Experiments show that QGA-EE outperforms all prior single-task-based models on the ACE05 English dataset.
arXiv Detail & Related papers (2023-07-10T01:46:15Z)
- Single-Stage Visual Relationship Learning using Conditional Queries [60.90880759475021]
TraCQ is a new formulation for scene graph generation that avoids the multi-task learning problem and the entity pair distribution.
We employ a DETR-based encoder-decoder design and leverage conditional queries to significantly reduce the entity label space as well.
Experimental results show that TraCQ not only outperforms existing single-stage scene graph generation methods, it also beats many state-of-the-art two-stage methods on the Visual Genome dataset.
arXiv Detail & Related papers (2023-06-09T06:02:01Z)
- Explanation-based Finetuning Makes Models More Robust to Spurious Cues [21.327036110196637]
Large Language Models (LLMs) are so powerful that they sometimes learn correlations between labels and features that are irrelevant to the task.
We propose explanation-based finetuning as a general approach to mitigate LLMs' reliance on spurious correlations.
We finetune the model to additionally generate a free-text explanation supporting its answer.
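A minimal sketch of what such a finetuning pair could look like, with the target extended to include a free-text explanation; the field names and wording are hypothetical rather than the paper's exact format.

```python
# Illustrative only: building a finetuning pair whose target contains both
# the answer and a free-text explanation, as the summary above describes.
# Field names and wording are hypothetical, not the paper's exact format.
def make_example(passage, question, answer, explanation):
    prompt = f"Passage: {passage}\nQuestion: {question}\nAnswer:"
    # A standard finetuning target would be just the answer; explanation-based
    # finetuning appends a supporting rationale to the target.
    target = f" {answer}\nExplanation: {explanation}"
    return {"prompt": prompt, "target": target}

pair = make_example(
    passage="The library closes at 9pm on weekdays.",
    question="Is the library open at 10pm on Tuesday?",
    answer="No",
    explanation="Tuesday is a weekday and the library closes at 9pm.",
)
```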
arXiv Detail & Related papers (2023-05-08T18:53:45Z)
- Weakly-Supervised Questions for Zero-Shot Relation Extraction [3.030622181266347]
Zero-Shot Relation Extraction (ZRE) is the task of Relation Extraction where the training and test sets have no shared relation types.
Previous approaches to ZRE reframed relation extraction as Question Answering (QA).
Here, we do away with these gold templates and instead learn a model that can generate questions for unseen relations.
arXiv Detail & Related papers (2023-01-21T22:18:24Z)
- An Information-theoretic Approach to Prompt Engineering Without Ground Truth Labels [55.06990011183662]
We introduce a new method for selecting prompt templates *without* labeled examples and *without* direct access to the model.
Across 8 datasets representing 7 distinct NLP tasks, we show that when a template has high mutual information, it also has high accuracy on the task.
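A hedged sketch of label-free template scoring in this spirit: estimate the mutual information between inputs and the model's answer distribution for each template and keep the highest-scoring one. The estimator below (marginal entropy minus mean conditional entropy) and the `model_answer_dist` helper are assumptions for illustration, not the paper's exact formulation or API.

```python
# Hedged sketch: score a template by an estimate of the mutual information
# between inputs X and the model's answer distribution Y,
#   I(X; Y) = H( mean_x p(y|x) ) - mean_x H( p(y|x) ),
# computed over a small unlabeled set. `model_answer_dist` is a hypothetical
# stand-in for whatever returns the model's probabilities over candidate
# answers for one filled-in prompt.
import numpy as np

def entropy(p):
    """Shannon entropy of a (possibly unnormalized) probability vector."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    return -np.sum(p * np.log(p + 1e-12))

def mutual_information_score(template, inputs, model_answer_dist):
    dists = np.stack([model_answer_dist(template.format(x=x)) for x in inputs])
    marginal = dists.mean(axis=0)                          # p(y), averaged over inputs
    h_marginal = entropy(marginal)                         # H(Y)
    h_conditional = np.mean([entropy(d) for d in dists])   # ~H(Y|X)
    return h_marginal - h_conditional                      # higher => more informative template

# Usage sketch: pick the template with the highest MI estimate, no labels needed.
# best = max(templates,
#            key=lambda t: mutual_information_score(t, unlabeled_inputs, model_answer_dist))
```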
arXiv Detail & Related papers (2022-03-21T21:51:43Z)
- tFold-TR: Combining Deep Learning Enhanced Hybrid Potential Energy for Template-Based Modelling Structure Refinement [53.98034511648985]
The current template-based modeling approach suffers from two important problems.
The accuracy of the distance pairs from different regions of the template varies, and this information is not well introduced into the modeling.
We construct two neural network models to predict the distance information of the missing regions and the accuracy of the distance pairs in different regions of the template modeling structure.
arXiv Detail & Related papers (2021-05-10T13:32:12Z)
- Robust Question Answering Through Sub-part Alignment [53.94003466761305]
We model question answering as an alignment problem.
We train our model on SQuAD v1.1 and test it on several adversarial and out-of-domain datasets.
arXiv Detail & Related papers (2020-04-30T09:10:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.