Related papers: The language of prompting: What linguistic properties make a prompt successful?

The language of prompting: What linguistic properties make a prompt successful?

URL: http://arxiv.org/abs/2311.01967v1
Date: Fri, 3 Nov 2023 15:03:36 GMT
Title: The language of prompting: What linguistic properties make a prompt successful?
Authors: Alina Leidinger, Robert van Rooij, Ekaterina Shutova
Abstract summary: LLMs can be prompted to achieve impressive zero-shot or few-shot performance in many NLP tasks. Yet, we still lack a systematic understanding of how linguistic properties of prompts correlate with task performance. We investigate both grammatical properties such as mood, tense, aspect and modality, as well as lexico-semantic variation through the use of synonyms.
Score: 13.034603322224548
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The latest generation of LLMs can be prompted to achieve impressive zero-shot or few-shot performance in many NLP tasks. However, since performance is highly sensitive to the choice of prompts, considerable effort has been devoted to crowd-sourcing prompts or designing methods for prompt optimisation. Yet, we still lack a systematic understanding of how linguistic properties of prompts correlate with task performance. In this work, we investigate how LLMs of different sizes, pre-trained and instruction-tuned, perform on prompts that are semantically equivalent, but vary in linguistic structure. We investigate both grammatical properties such as mood, tense, aspect and modality, as well as lexico-semantic variation through the use of synonyms. Our findings contradict the common assumption that LLMs achieve optimal performance on lower perplexity prompts that reflect language use in pretraining or instruction-tuning data. Prompts transfer poorly between datasets or models, and performance cannot generally be explained by perplexity, word frequency, ambiguity or prompt length. Based on our results, we put forward a proposal for a more robust and comprehensive evaluation standard for prompting research.

Related papers

Do LLMs Need to Think in One Language? Correlation between Latent Language and Task Performance [38.785363522684385]
We study how the discrepancy between the latent language and the input and output language affects downstream task performance.<n>Our work varies the input prompt languages across multiple downstream tasks and analyzes the correlation between consistency in latent language and task performance.<n> Experimental results indicate that maintaining consistency in latent language is not always necessary for optimal downstream task performance.
arXiv Detail & Related papers (2025-05-27T17:30:57Z)
Linguistic Blind Spots of Large Language Models [14.755831733659699]
We study the performance of recent large language models (LLMs) on linguistic annotation tasks. We find that recent LLMs show limited efficacy in addressing linguistic queries and often struggle with linguistically complex inputs. Our results provide insights to inform future advancements in LLM design and development.
arXiv Detail & Related papers (2025-03-25T01:47:13Z)
Rolling the DICE on Idiomaticity: How LLMs Fail to Grasp Context [12.781022584125925]
We construct a novel, controlled contrastive dataset designed to test whether LLMs can effectively use context to disambiguate idiomatic meaning. Our findings reveal that LLMs often fail to resolve idiomaticity when it is required to attend to the surrounding context. We make our code and dataset publicly available.
arXiv Detail & Related papers (2024-10-21T14:47:37Z)
Unveiling the Lexical Sensitivity of LLMs: Combinatorial Optimization for Prompt Enhancement [11.363521189714504]
We show that large language models (LLMs) are over-sensitive to lexical variations in task instructions. We propose a black-box Combinatorial Optimization framework for Prompt Lexical Enhancement (COPLE)
arXiv Detail & Related papers (2024-05-31T08:53:59Z)
Decomposed Prompting: Probing Multilingual Linguistic Structure Knowledge in Large Language Models [54.58989938395976]
We introduce a decomposed prompting approach for sequence labeling tasks.<n>We test our method on the Universal Dependencies part-of-speech tagging dataset for 38 languages.
arXiv Detail & Related papers (2024-02-28T15:15:39Z)
Helping Language Models Learn More: Multi-dimensional Task Prompt for Few-shot Tuning [36.14688633670085]
We propose MTPrompt, a multi-dimensional task prompt learning method based on task-related object, summary, and task description information. By automatically building and searching for appropriate prompts, our proposed MTPrompt achieves the best results on few-shot samples setting and five different datasets.
arXiv Detail & Related papers (2023-12-13T10:00:44Z)
AlignedCoT: Prompting Large Language Models via Native-Speaking Demonstrations [52.43593893122206]
Alignedcot is an in-context learning technique for invoking Large Language Models. It achieves consistent and correct step-wise prompts in zero-shot scenarios. We conduct experiments on mathematical reasoning and commonsense reasoning.
arXiv Detail & Related papers (2023-11-22T17:24:21Z)
Is Prompt-Based Finetuning Always Better than Vanilla Finetuning? Insights from Cross-Lingual Language Understanding [0.30586855806896046]
We propose the ProFiT pipeline to investigate the cross-lingual capabilities of Prompt-based Finetuning. Our results reveal the effectiveness and versatility of prompt-based finetuning in cross-lingual language understanding.
arXiv Detail & Related papers (2023-07-15T20:33:33Z)
Syntax and Semantics Meet in the "Middle": Probing the Syntax-Semantics Interface of LMs Through Agentivity [68.8204255655161]
We present the semantic notion of agentivity as a case study for probing such interactions. This suggests LMs may potentially serve as more useful tools for linguistic annotation, theory testing, and discovery.
arXiv Detail & Related papers (2023-05-29T16:24:01Z)
OverPrompt: Enhancing ChatGPT through Efficient In-Context Learning [49.38867353135258]
We propose OverPrompt, leveraging the in-context learning capability of LLMs to handle multiple task inputs. Our experiments show that OverPrompt can achieve cost-efficient zero-shot classification without causing significant detriment to task performance.
arXiv Detail & Related papers (2023-05-24T10:08:04Z)
Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds [59.71218039095155]
We evaluate language understanding capacities on simple inference tasks that most humans find trivial. We target (i) grammatically-specified entailments, (ii) premises with evidential adverbs of uncertainty, and (iii) monotonicity entailments. The models exhibit moderate to low performance on these evaluation sets.
arXiv Detail & Related papers (2023-05-24T06:41:09Z)
Explaining Patterns in Data with Language Models via Interpretable Autoprompting [143.4162028260874]
We introduce interpretable autoprompting (iPrompt), an algorithm that generates a natural-language string explaining the data. iPrompt can yield meaningful insights by accurately finding groundtruth dataset descriptions. Experiments with an fMRI dataset show the potential for iPrompt to aid in scientific discovery.
arXiv Detail & Related papers (2022-10-04T18:32:14Z)
RLPrompt: Optimizing Discrete Text Prompts With Reinforcement Learning [84.75064077323098]
This paper proposes RLPrompt, an efficient discrete prompt optimization approach with reinforcement learning (RL) RLPrompt is flexibly applicable to different types of LMs, such as masked gibberish (e.g., grammaBERT) and left-to-right models (e.g., GPTs) Experiments on few-shot classification and unsupervised text style transfer show superior performance over a wide range of existing finetuning or prompting methods.
arXiv Detail & Related papers (2022-05-25T07:50:31Z)

This list is automatically generated from the titles and abstracts of the papers in this site.