Continuous Entailment Patterns for Lexical Inference in Context
- URL: http://arxiv.org/abs/2109.03695v1
- Date: Wed, 8 Sep 2021 14:57:00 GMT
- Title: Continuous Entailment Patterns for Lexical Inference in Context
- Authors: Martin Schmitt and Hinrich Schütze
- Abstract summary: A pretrained language model (PLM) with textual patterns has been shown to help in both zero- and few-shot settings.
For zero-shot performance, it makes sense to design patterns that closely resemble the text seen during self-supervised pretraining because the model has never seen anything else.
Supervised training allows for more flexibility. If we allow for tokens outside the PLM's vocabulary, patterns can be adapted more flexibly to a PLM's idiosyncrasies.
- Score: 4.581468205348204
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Combining a pretrained language model (PLM) with textual patterns has been
shown to help in both zero- and few-shot settings. For zero-shot performance,
it makes sense to design patterns that closely resemble the text seen during
self-supervised pretraining because the model has never seen anything else.
Supervised training allows for more flexibility. If we allow for tokens outside
the PLM's vocabulary, patterns can be adapted more flexibly to a PLM's
idiosyncrasies. We contrast patterns in which a "token" can be any continuous
vector with those in which a discrete choice between vocabulary elements has to
be made, and we call our method CONtinuous pAtterNs (CONAN). We evaluate CONAN on two
established benchmarks for lexical inference in context (LIiC) a.k.a. predicate
entailment, a challenging natural language understanding task with relatively
small training sets. In a direct comparison with discrete patterns, CONAN
consistently leads to improved performance, setting a new state of the art. Our
experiments give valuable insights into the kind of pattern that enhances a
PLM's performance on LIiC and raise important questions regarding our
understanding of PLMs using text patterns.
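To make the contrast between discrete and continuous patterns concrete, the following is a minimal sketch of what a continuous pattern could look like for a premise/hypothesis pair: the pattern slots are free embedding vectors that are spliced into the PLM's input alongside ordinary token embeddings. This is an illustration only, not the paper's implementation; the model name, pattern length, initialization, and yes/no verbalizer are assumptions.

```python
import torch
import torch.nn as nn
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_name = "roberta-base"          # assumption: any RoBERTa-style masked LM
tok = AutoTokenizer.from_pretrained(model_name)
mlm = AutoModelForMaskedLM.from_pretrained(model_name)
emb = mlm.get_input_embeddings()     # the PLM's token-embedding matrix


class ContinuousPattern(nn.Module):
    """Scores 'premise <p_1..p_k> hypothesis <mask>', where each <p_i> is a free vector."""

    def __init__(self, k: int = 3):
        super().__init__()
        # initialize the k pattern slots from random vocabulary embeddings
        init_ids = torch.randint(0, emb.num_embeddings, (k,))
        self.pattern = nn.Parameter(emb.weight[init_ids].detach().clone())

    def forward(self, premise: str, hypothesis: str) -> torch.Tensor:
        prem = tok(premise, add_special_tokens=False, return_tensors="pt").input_ids
        hypo = tok(hypothesis, add_special_tokens=False, return_tensors="pt").input_ids

        def special(token_id: int) -> torch.Tensor:
            return emb(torch.tensor([[token_id]]))

        # embed the discrete pieces and splice in the continuous pattern vectors
        inputs_embeds = torch.cat(
            [special(tok.cls_token_id), emb(prem), self.pattern.unsqueeze(0),
             emb(hypo), special(tok.mask_token_id), special(tok.sep_token_id)],
            dim=1,
        )
        logits = mlm(inputs_embeds=inputs_embeds).logits
        mask_pos = inputs_embeds.size(1) - 2           # position of <mask>
        # verbalizer (assumption): " yes" = entailment, " no" = no entailment
        yes_id = tok(" yes", add_special_tokens=False).input_ids[0]
        no_id = tok(" no", add_special_tokens=False).input_ids[0]
        return logits[0, mask_pos, [yes_id, no_id]]    # two entailment scores


scorer = ContinuousPattern(k=3)
scores = scorer("Google bought YouTube in 2006.", "Google owns YouTube.")
```

In such a setup, the pattern vectors (and optionally the PLM) would be trained with a cross-entropy loss over the two verbalizer logits on the LIiC training set; a discrete pattern corresponds to the special case in which every slot is constrained to an existing row of the embedding matrix.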
Related papers
- Contrastive Instruction Tuning [61.97704869248903]
We propose Contrastive Instruction Tuning to maximize the similarity between semantically equivalent instruction-instance pairs.
Experiments on the PromptBench benchmark show that CoIN consistently improves LLMs' robustness to unseen instructions with variations across character, word, sentence, and semantic levels by an average of +2.5% in accuracy.
arXiv Detail & Related papers (2024-02-17T00:09:32Z)
- Instruction Position Matters in Sequence Generation with Large Language Models [67.87516654892343]
Large language models (LLMs) are capable of performing conditional sequence generation tasks, such as translation or summarization.
We propose enhancing the instruction-following capability of LLMs by shifting the position of task instructions after the input sentences.
arXiv Detail & Related papers (2023-08-23T12:36:57Z)
- In-Context Probing: Toward Building Robust Classifiers via Probing Large Language Models [5.5089506884366735]
In this paper, we propose an alternative approach, which we term In-Context Probing (ICP).
Similar to in-context learning, we contextualize the representation of the input with an instruction, but instead of decoding the output prediction, we probe the contextualized representation to predict the label.
We show that ICP performs competitively with or better than finetuning and can be particularly helpful for building classifiers on top of smaller models (an illustrative sketch follows the related-papers list below).
arXiv Detail & Related papers (2023-05-23T15:43:04Z)
- Pre-Training to Learn in Context [138.0745138788142]
The in-context learning ability of language models is not fully exploited because they are not explicitly trained to learn in context.
We propose PICL (Pre-training for In-Context Learning), a framework to enhance the language models' in-context learning ability.
Our experiments show that PICL is more effective and task-generalizable than a range of baselines, outperforming larger language models with nearly 4x parameters.
arXiv Detail & Related papers (2023-05-16T03:38:06Z)
- Contrastive Learning for Prompt-Based Few-Shot Language Learners [14.244787327283335]
We present a contrastive learning framework that clusters inputs from the same class under different augmented "views".
We create different "views" of an example by appending it with different language prompts and contextual demonstrations (an illustrative sketch follows the related-papers list below).
Our method can improve over the state-of-the-art methods in a diverse set of 15 language tasks.
arXiv Detail & Related papers (2022-05-03T04:56:45Z)
- Language modeling via stochastic processes [30.796382023812022]
Modern language models can generate high-quality short texts, but often meander or are incoherent when generating longer texts.
Recent work in self-supervised learning suggests that models can learn good latent representations via contrastive learning.
We propose one approach for leveraging contrastive representations, which we call Time Control.
arXiv Detail & Related papers (2022-03-21T22:13:53Z)
- Learning to Ask Conversational Questions by Optimizing Levenshtein Distance [83.53855889592734]
We introduce a Reinforcement Iterative Sequence Editing (RISE) framework that optimizes the minimum Levenshtein distance (MLD) through explicit editing actions (the Levenshtein distance itself is sketched after the related-papers list below).
RISE is able to pay attention to tokens that are related to conversational characteristics.
Experimental results on two benchmark datasets show that RISE significantly outperforms state-of-the-art methods.
arXiv Detail & Related papers (2021-06-30T08:44:19Z)
- COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining [59.169836983883656]
COCO-LM is a new self-supervised learning framework that pretrains Language Models by COrrecting challenging errors and COntrasting text sequences.
COCO-LM employs an auxiliary language model to mask-and-predict tokens in original text sequences.
Our analyses reveal that COCO-LM's advantages come from its challenging training signals, more contextualized token representations, and regularized sequence representations.
arXiv Detail & Related papers (2021-02-16T22:24:29Z)
- Syntax-Enhanced Pre-trained Model [49.1659635460369]
We study the problem of leveraging the syntactic structure of text to enhance pre-trained models such as BERT and RoBERTa.
Existing methods utilize the syntax of text either in the pre-training stage or in the fine-tuning stage and thus suffer from a discrepancy between the two stages.
We present a model that utilizes the syntax of text in both pre-training and fine-tuning stages.
arXiv Detail & Related papers (2020-12-28T06:48:04Z)
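To complement the entries above, here is a minimal sketch of the In-Context Probing setup as summarized in its entry: the input is contextualized with an instruction, and a small classifier probes the resulting hidden states instead of decoding text. The model name, pooling choice, and probe shape are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

model_name = "gpt2"                      # assumption: any causal or encoder LM
tok = AutoTokenizer.from_pretrained(model_name)
lm = AutoModel.from_pretrained(model_name)


class InContextProbe(nn.Module):
    def __init__(self, num_labels: int):
        super().__init__()
        self.probe = nn.Linear(lm.config.hidden_size, num_labels)

    def forward(self, instruction: str, text: str) -> torch.Tensor:
        # contextualize the input with an instruction, as in in-context learning
        enc = tok(instruction + " " + text, return_tensors="pt")
        with torch.no_grad():            # in this sketch the LM stays frozen
            hidden = lm(**enc).last_hidden_state
        # probe the representation of the final token instead of decoding a prediction
        return self.probe(hidden[:, -1, :])


probe = InContextProbe(num_labels=2)
logits = probe("Decide whether the review is positive or negative.",
               "The plot was thin but the acting was superb.")
```

Only the linear probe is trained in this sketch (e.g. with cross-entropy on a handful of labeled examples); the language model itself stays frozen.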
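Similarly, for the prompt-based contrastive learning entry, the sketch below illustrates the "views" idea as summarized there: the same input is wrapped in different prompts, and a supervised contrastive loss pulls same-class views together. The encoder, the example prompts, and the loss details are assumptions rather than the paper's exact method.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("roberta-base")   # assumption: any encoder PLM
enc = AutoModel.from_pretrained("roberta-base")


def make_views(example: str, prompts: list) -> list:
    # each "view" = the same input wrapped in a different language prompt
    return [p.format(example) for p in prompts]


def embed(texts: list) -> torch.Tensor:
    batch = tok(texts, padding=True, return_tensors="pt")
    cls = enc(**batch).last_hidden_state[:, 0]         # [CLS]-style pooling
    return F.normalize(cls, dim=-1)


def supcon_loss(z: torch.Tensor, labels: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    # supervised contrastive loss: views with the same label are positives
    sim = z @ z.T / temperature
    not_self = ~torch.eye(len(z), dtype=torch.bool)
    pos = (labels[:, None] == labels[None, :]) & not_self
    log_prob = sim - torch.logsumexp(sim.masked_fill(~not_self, -1e9), dim=1, keepdim=True)
    return -log_prob[pos].mean()


prompts = ["Review: {} Sentiment: <mask>.", "{} All in all, it was <mask>."]
texts = make_views("A gripping, well-acted thriller.", prompts) \
      + make_views("Dull and far too long.", prompts)
labels = torch.tensor([0, 0, 1, 1])                    # one class label per view
loss = supcon_loss(embed(texts), labels)
```

The contextual demonstrations mentioned in the entry could be prepended to each prompt in the same way; they are omitted here to keep the sketch short.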
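Finally, for the RISE entry: the framework is built around the minimum Levenshtein distance, and the short sketch below shows the standard dynamic program for that distance over token sequences. It only illustrates the quantity being optimized; the reinforcement-learning editing policy itself is not reproduced here.

```python
def levenshtein(a: list, b: list) -> int:
    """Minimum number of token insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))      # distances from a[:0] to every prefix of b
    for i, x in enumerate(a, start=1):
        curr = [i]                      # distance from a[:i] to the empty prefix
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + (x != y)))   # substitute x -> y
        prev = curr
    return prev[-1]


# e.g. a reward signal could be the negative distance to the target question
print(levenshtein("what about paris ?".split(),
                  "what about the weather in paris ?".split()))   # -> 3
```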
This list is automatically generated from the titles and abstracts of the papers on this site.