The language of prompting: What linguistic properties make a prompt
successful?
- URL: http://arxiv.org/abs/2311.01967v1
- Date: Fri, 3 Nov 2023 15:03:36 GMT
- Title: The language of prompting: What linguistic properties make a prompt
successful?
- Authors: Alina Leidinger, Robert van Rooij, Ekaterina Shutova
- Abstract summary: LLMs can be prompted to achieve impressive zero-shot or few-shot performance in many NLP tasks.
Yet, we still lack a systematic understanding of how linguistic properties of prompts correlate with task performance.
We investigate both grammatical properties such as mood, tense, aspect and modality, as well as lexico-semantic variation through the use of synonyms.
- Score: 13.034603322224548
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The latest generation of LLMs can be prompted to achieve impressive zero-shot
or few-shot performance in many NLP tasks. However, since performance is highly
sensitive to the choice of prompts, considerable effort has been devoted to
crowd-sourcing prompts or designing methods for prompt optimisation. Yet, we
still lack a systematic understanding of how linguistic properties of prompts
correlate with task performance. In this work, we investigate how LLMs of
different sizes, pre-trained and instruction-tuned, perform on prompts that are
semantically equivalent, but vary in linguistic structure. We investigate both
grammatical properties such as mood, tense, aspect and modality, as well as
lexico-semantic variation through the use of synonyms. Our findings contradict
the common assumption that LLMs achieve optimal performance on lower perplexity
prompts that reflect language use in pretraining or instruction-tuning data.
Prompts transfer poorly between datasets or models, and performance cannot
generally be explained by perplexity, word frequency, ambiguity or prompt
length. Based on our results, we put forward a proposal for a more robust and
comprehensive evaluation standard for prompting research.
Related papers
- Rolling the DICE on Idiomaticity: How LLMs Fail to Grasp Context [12.781022584125925]
We construct a novel, controlled contrastive dataset designed to test whether LLMs can effectively use context to disambiguate idiomatic meaning.
Our findings reveal that LLMs often fail to resolve idiomaticity when it is required to attend to the surrounding context.
We make our code and dataset publicly available.
arXiv Detail & Related papers (2024-10-21T14:47:37Z)
- Unveiling the Lexical Sensitivity of LLMs: Combinatorial Optimization for Prompt Enhancement [11.363521189714504]
We show that large language models (LLMs) are over-sensitive to lexical variations in task instructions.
We propose a black-box Combinatorial Optimization framework for Prompt Lexical Enhancement (COPLE).
arXiv Detail & Related papers (2024-05-31T08:53:59Z)
- Helping Language Models Learn More: Multi-dimensional Task Prompt for Few-shot Tuning [36.14688633670085]
We propose MTPrompt, a multi-dimensional task prompt learning method based on task-related object, summary, and task description information.
By automatically building and searching for appropriate prompts, our proposed MTPrompt achieves the best results in the few-shot setting on five different datasets.
arXiv Detail & Related papers (2023-12-13T10:00:44Z)
- AlignedCoT: Prompting Large Language Models via Native-Speaking Demonstrations [52.43593893122206]
AlignedCoT is an in-context learning technique for invoking Large Language Models.
It achieves consistent and correct step-wise prompts in zero-shot scenarios.
We conduct experiments on mathematical reasoning and commonsense reasoning.
arXiv Detail & Related papers (2023-11-22T17:24:21Z)
- Is Prompt-Based Finetuning Always Better than Vanilla Finetuning? Insights from Cross-Lingual Language Understanding [0.30586855806896046]
We propose the ProFiT pipeline to investigate the cross-lingual capabilities of Prompt-based Finetuning.
Our results reveal the effectiveness and versatility of prompt-based finetuning in cross-lingual language understanding.
arXiv Detail & Related papers (2023-07-15T20:33:33Z)
- Syntax and Semantics Meet in the "Middle": Probing the Syntax-Semantics Interface of LMs Through Agentivity [68.8204255655161]
We present the semantic notion of agentivity as a case study for probing such interactions.
This suggests LMs may potentially serve as more useful tools for linguistic annotation, theory testing, and discovery.
arXiv Detail & Related papers (2023-05-29T16:24:01Z)
- OverPrompt: Enhancing ChatGPT through Efficient In-Context Learning [49.38867353135258]
We propose OverPrompt, leveraging the in-context learning capability of LLMs to handle multiple task inputs.
Our experiments show that OverPrompt can achieve cost-efficient zero-shot classification without causing significant detriment to task performance.
arXiv Detail & Related papers (2023-05-24T10:08:04Z)
- Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds [59.71218039095155]
We evaluate language understanding capacities on simple inference tasks that most humans find trivial.
We target (i) grammatically-specified entailments, (ii) premises with evidential adverbs of uncertainty, and (iii) monotonicity entailments.
The models exhibit moderate to low performance on these evaluation sets.
arXiv Detail & Related papers (2023-05-24T06:41:09Z)
- Explaining Patterns in Data with Language Models via Interpretable Autoprompting [143.4162028260874]
We introduce interpretable autoprompting (iPrompt), an algorithm that generates a natural-language string explaining the data.
iPrompt can yield meaningful insights by accurately finding groundtruth dataset descriptions.
Experiments with an fMRI dataset show the potential for iPrompt to aid in scientific discovery.
arXiv Detail & Related papers (2022-10-04T18:32:14Z)
- RLPrompt: Optimizing Discrete Text Prompts With Reinforcement Learning [84.75064077323098]
This paper proposes RLPrompt, an efficient discrete prompt optimization approach with reinforcement learning (RL).
RLPrompt is flexibly applicable to different types of LMs, such as masked models (e.g., BERT) and left-to-right models (e.g., GPTs).
Experiments on few-shot classification and unsupervised text style transfer show superior performance over a wide range of existing finetuning or prompting methods.
arXiv Detail & Related papers (2022-05-25T07:50:31Z)