Instruction-following Evaluation through Verbalizer Manipulation
- URL: http://arxiv.org/abs/2307.10558v2
- Date: Tue, 2 Apr 2024 04:38:21 GMT
- Title: Instruction-following Evaluation through Verbalizer Manipulation
- Authors: Shiyang Li, Jun Yan, Hai Wang, Zheng Tang, Xiang Ren, Vijay Srinivasan, Hongxia Jin,
- Abstract summary: We propose a novel instruction-following evaluation protocol called verbalizer manipulation.
It instructs the model to verbalize the task label with words aligning with model priors to different extents.
We observe that the instruction-following abilities of models, across different families and scales, are significantly distinguished by their performance on less natural verbalizers.
- Score: 64.73188776428799
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While instruction-tuned models have shown remarkable success in various natural language processing tasks, accurately evaluating their ability to follow instructions remains challenging. Existing benchmarks primarily focus on common instructions that align well with what the model learned during training. However, proficiency in responding to these instructions does not necessarily imply strong ability in instruction following. In this paper, we propose a novel instruction-following evaluation protocol called verbalizer manipulation. It instructs the model to verbalize the task label with words aligning with model priors to different extents, adopting verbalizers from highly aligned (e.g., outputting ``positive'' for positive sentiment), to minimally aligned (e.g., outputting ``negative'' for positive sentiment). Verbalizer manipulation can be seamlessly integrated with any classification benchmark to examine the model's reliance on priors and its ability to override them to accurately follow the instructions. We conduct a comprehensive evaluation of four major model families across nine datasets, employing twelve sets of verbalizers for each of them. We observe that the instruction-following abilities of models, across different families and scales, are significantly distinguished by their performance on less natural verbalizers. Even the strongest GPT-4 model struggles to perform better than random guessing on the most challenging verbalizer, emphasizing the need for continued advancements to improve their instruction-following abilities.
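As a rough illustration of the protocol (not the paper's exact prompts), the sketch below shows how verbalizer-manipulated prompts and scoring could be set up for a binary sentiment task; the template, label words, and the neutral "foo"/"bar" verbalizer are illustrative assumptions.

```python
# Minimal sketch of verbalizer manipulation for binary sentiment classification.
# The prompt template and label words are illustrative, not the paper's exact setup.

# Verbalizer sets, ordered from highly aligned with model priors to minimally aligned.
VERBALIZERS = {
    "natural": {"positive": "positive", "negative": "negative"},  # aligned with priors
    "neutral": {"positive": "foo",      "negative": "bar"},       # no prior either way
    "flipped": {"positive": "negative", "negative": "positive"},  # contradicts priors
}

def build_prompt(text: str, verbalizer: dict) -> str:
    """Instruct the model to answer with the given label words only."""
    return (
        "Classify the sentiment of the review. "
        f"Answer '{verbalizer['positive']}' if the sentiment is positive and "
        f"'{verbalizer['negative']}' if it is negative.\n"
        f"Review: {text}\nAnswer:"
    )

def is_correct(prediction: str, gold_label: str, verbalizer: dict) -> bool:
    """A prediction counts only if it verbalizes the gold label as instructed."""
    return prediction.strip().lower() == verbalizer[gold_label].lower()

# Under the "flipped" verbalizer, a positive review must be answered "negative".
prompt = build_prompt("A delightful, heartfelt film.", VERBALIZERS["flipped"])
print(prompt)
print(is_correct("negative", "positive", VERBALIZERS["flipped"]))  # True
```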
Related papers
- Improving Instruction-Following in Language Models through Activation Steering [58.876600545898675]
We derive instruction-specific vector representations from language models and use them to steer models accordingly.
We demonstrate how this method can enhance model adherence to constraints such as output format, length, and word inclusion.
Our findings demonstrate that activation steering offers a practical and scalable approach for fine-grained control in language generation.
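As a hedged sketch of the general activation-steering idea (not the paper's exact recipe), the snippet below adds an instruction-derived vector to a hidden state via a forward hook; the layer choice, scaling factor, and model layout are assumptions.

```python
import torch

def make_steering_hook(steering_vector: torch.Tensor, alpha: float = 1.0):
    """Return a forward hook that shifts a layer's hidden states by an
    instruction-derived steering vector (e.g., a difference of activations
    computed with and without the instruction)."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + alpha * steering_vector  # steer toward the instruction
        return (hidden,) + output[1:] if isinstance(output, tuple) else hidden
    return hook

# Usage sketch (assuming a HuggingFace-style decoder exposing model.model.layers[i]):
# layer = model.model.layers[15]
# handle = layer.register_forward_hook(make_steering_hook(format_vector, alpha=2.0))
# ... run generation ...
# handle.remove()
```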
arXiv Detail & Related papers (2024-10-15T08:38:20Z) - Instruction Following without Instruction Tuning [87.72635104686275]
We find two forms of adaptation (tuning) that are deficient compared to instruction tuning, yet still yield instruction following.
We support this by hand-writing a rule-based language model which yields instruction following in a product-of-experts with a pretrained model.
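A minimal sketch of the product-of-experts combination over next-token distributions; the rule-based expert's preferences here are purely illustrative.

```python
import numpy as np

def product_of_experts(p_pretrained: np.ndarray, p_rule: np.ndarray) -> np.ndarray:
    """Combine two next-token distributions by elementwise product and renormalization.

    p_pretrained: probabilities from the pretrained LM over the vocabulary.
    p_rule:       probabilities from a hand-written rule-based LM (its rules are
                  illustrative here, e.g., preferring a particular token).
    """
    combined = p_pretrained * p_rule
    return combined / combined.sum()

# Toy example over a 4-token vocabulary.
p_lm = np.array([0.5, 0.3, 0.1, 0.1])
p_rule = np.array([0.1, 0.1, 0.1, 0.7])  # rule-based expert prefers token 3
print(product_of_experts(p_lm, p_rule))
```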
arXiv Detail & Related papers (2024-09-21T22:36:22Z) - Self-Judge: Selective Instruction Following with Alignment Self-Evaluation [27.69410513313001]
We study selective instruction following, whereby the system declines to execute instructions if the anticipated response quality is low.
We introduce Self-J, a novel self-training framework for developing judge models without needing human-annotated quality scores.
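A minimal sketch of selective execution with a judge model, assuming a judge that returns a quality score in [0, 1] and an illustrative threshold:

```python
def selective_respond(instruction: str, generate, judge, threshold: float = 0.5):
    """Return the model's response only if a judge model scores it above a threshold.

    generate: callable mapping an instruction to a candidate response.
    judge:    callable mapping (instruction, response) to a quality score in [0, 1].
    """
    response = generate(instruction)
    quality = judge(instruction, response)
    if quality < threshold:
        return "I'm not confident I can answer this reliably."  # decline to execute
    return response
```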
arXiv Detail & Related papers (2024-09-02T04:14:13Z) - The SIFo Benchmark: Investigating the Sequential Instruction Following Ability of Large Language Models [48.455388608863785]
We introduce a benchmark designed to evaluate models' abilities to follow multiple instructions through sequential instruction following tasks.
Our benchmark evaluates instruction following using four tasks (text modification, question answering, mathematics, and security rules)
More recent and larger models significantly outperform their older and smaller counterparts on the SIFo tasks, validating the benchmark's effectiveness.
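A hedged sketch of scoring one sequential item, where each instruction operates on the model's previous output; the step format and checkers are illustrative assumptions, not the benchmark's actual harness.

```python
def evaluate_sequence(model, steps):
    """Run a chain of instructions; each step sees the previous output.

    steps: list of (instruction, checker) pairs, where checker(output) -> bool.
    Returns the fraction of steps whose outputs pass their checks.
    """
    context, passed = "", 0
    for instruction, checker in steps:
        output = model(f"{context}\n{instruction}".strip())
        passed += int(checker(output))
        context = output  # the next instruction operates on this output
    return passed / len(steps)
```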
arXiv Detail & Related papers (2024-06-28T15:34:26Z) - Improving the Robustness of Large Language Models via Consistency Alignment [36.24876571343749]
Large language models (LLMs) have shown tremendous success in following user instructions and generating helpful responses.
LLMs may generate significantly inconsistent responses due to minor changes in the verbalized instructions.
We propose a two-stage training framework consisting of instruction-augmented supervised fine-tuning and consistency alignment training.
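A minimal sketch of the instruction-augmentation idea in the first stage, pairing one response with several rewordings of its instruction; the paraphrasing source is an assumption.

```python
def augment_with_paraphrases(example: dict, paraphrase, n: int = 3) -> list:
    """Expand one (instruction, response) pair into several training pairs
    that share the response but vary the instruction wording.

    paraphrase: callable returning a list of n reworded instructions.
    """
    variants = [example["instruction"]] + paraphrase(example["instruction"], n)
    return [{"instruction": v, "response": example["response"]} for v in variants]
```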
arXiv Detail & Related papers (2024-03-21T08:21:12Z) - Instructive Decoding: Instruction-Tuned Large Language Models are
Self-Refiner from Noisy Instructions [26.192531184689763]
This paper presents Instructive Decoding (ID), a simple yet effective approach that augments the efficacy of instruction-tuned models.
ID adjusts the logits for next-token prediction in a contrastive manner, utilizing predictions generated from a manipulated version of the original instruction.
We conduct experiments across a spectrum of such noisy instructions, ranging from those that insert semantic noise via random words to others like 'opposite' that elicit deviated responses.
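A minimal sketch of the contrastive adjustment at each decoding step; the coefficient epsilon is an illustrative choice rather than the paper's tuned value.

```python
import torch

def instructive_decoding_step(logits_original: torch.Tensor,
                              logits_noisy: torch.Tensor,
                              epsilon: float = 0.3) -> torch.Tensor:
    """Contrastively adjust next-token logits.

    logits_original: logits conditioned on the original instruction.
    logits_noisy:    logits conditioned on a manipulated instruction
                     (e.g., random-word noise or an 'opposite' instruction).
    """
    return logits_original - epsilon * logits_noisy

# The adjusted logits then feed the usual argmax or sampling step.
```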
arXiv Detail & Related papers (2023-11-01T02:31:35Z) - Evaluating the Zero-shot Robustness of Instruction-tuned Language Models [23.488398944358643]
We find that using novel (unobserved) but appropriate instruction phrasings consistently degrades model performance.
We propose a simple method to mitigate this issue by introducing ``soft prompt'' embedding parameters.
We show that this method consistently improves the robustness of instruction-tuned models.
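A minimal sketch of trainable soft-prompt embeddings prepended to the input embeddings; the number of virtual tokens and hidden size are placeholder assumptions.

```python
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    """Trainable embeddings prepended to the token embeddings of each input."""

    def __init__(self, n_virtual_tokens: int = 20, hidden_size: int = 4096):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(n_virtual_tokens, hidden_size) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # input_embeds: (batch, seq_len, hidden_size)
        batch = input_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)
```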
arXiv Detail & Related papers (2023-06-20T03:48:51Z) - Self-Instruct: Aligning Language Models with Self-Generated Instructions [76.42871502364697]
Self-Instruct is a framework for improving the instruction-following capabilities of pretrained language models.
Our pipeline generates instructions, input, and output samples from a language model, then filters invalid or similar ones before using them to finetune the original model.
For further evaluation, we curate a set of expert-written instructions for novel tasks, and show through human evaluation that tuning GPT3 with Self-Instruct outperforms using existing public instruction datasets by a large margin.
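A minimal sketch of the similarity-based filtering step; SequenceMatcher is a stdlib stand-in for the overlap metric used in the released pipeline (ROUGE-L), and the 0.7 threshold should be treated as an assumption.

```python
from difflib import SequenceMatcher

def is_novel(candidate: str, pool: list, max_similarity: float = 0.7) -> bool:
    """Keep a generated instruction only if it is not too similar to any existing one."""
    return all(
        SequenceMatcher(None, candidate, seen).ratio() < max_similarity
        for seen in pool
    )

pool = ["Write a short poem about autumn."]
print(is_novel("Write a short poem about the fall season.", pool))  # likely False
print(is_novel("Explain how binary search works.", pool))           # True
```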
arXiv Detail & Related papers (2022-12-20T18:59:19Z) - Learning Action Conditions from Instructional Manuals for Instruction Understanding [48.52663250368341]
We propose a task dubbed action condition inference and collect a high-quality, human-annotated dataset of preconditions and postconditions of actions in instructional manuals.
We propose a weakly supervised approach to automatically construct large-scale training instances from online instructional manuals, and curate a densely human-annotated and validated dataset to study how well the current NLP models can infer action-condition dependencies in instruction texts.
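A minimal sketch of how an annotated example could be represented, based only on the task description; field names are illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class ActionConditionExample:
    """One annotated action from an instructional manual (illustrative schema)."""
    action: str                                           # e.g., "Remove the cake from the oven."
    preconditions: list = field(default_factory=list)     # what must hold before the action
    postconditions: list = field(default_factory=list)    # what holds after the action

example = ActionConditionExample(
    action="Remove the cake from the oven.",
    preconditions=["The cake has finished baking."],
    postconditions=["The cake is out of the oven and can cool."],
)
```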
arXiv Detail & Related papers (2022-05-25T00:19:59Z)