Functionality learning through specification instructions
- URL: http://arxiv.org/abs/2311.08481v1
- Date: Tue, 14 Nov 2023 19:15:55 GMT
- Title: Functionality learning through specification instructions
- Authors: Pedro Henrique Luz de Araujo and Benjamin Roth
- Abstract summary: Test suites assess natural language processing models' performance on specific functionalities.
Previous work has explored functionality learning by fine-tuning models on suite data.
This paper analyses a fine-tuning-free approach to functionality learning.
- Score: 2.846550189998273
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Test suites assess natural language processing models' performance on
specific functionalities: cases of interest involving model robustness,
fairness, or particular linguistic capabilities. They enable fine-grained
evaluations of model aspects that would otherwise go unnoticed in standard
evaluation datasets, but they do not address the problem of how to fix the
failure cases. Previous work has explored functionality learning by fine-tuning
models on suite data. While this improves performance on seen functionalities,
it often does not generalize to unseen ones and can harm general performance.
This paper analyses a fine-tuning-free approach to functionality learning.
For each functionality in a suite, we generate a specification instruction that
encodes it. We combine the obtained specification instructions to create
specification-augmented prompts, which we feed to language models pre-trained
on natural instruction data to generate suite predictions. A core aspect of our
analysis is to measure the effect that including a set of specifications has on
a held-out set of unseen, qualitatively different specifications. Our
experiments across four tasks and models ranging from 80M to 175B parameters
show that smaller models struggle to follow specification instructions.
However, larger models (> 3B params.) can benefit from specifications and even
generalize desirable behaviors across functionalities.
Related papers
- Third-Party Language Model Performance Prediction from Instruction [59.574169249307054]
Language model-based instruction-following systems have lately shown increasing performance on many benchmark tasks.
A user may easily prompt a model with an instruction without any idea of whether the responses should be expected to be accurate.
We propose a third party performance prediction framework, where a separate model is trained to predict the metric resulting from evaluating an instruction-following system on a task.
arXiv Detail & Related papers (2024-03-19T03:53:47Z) - Specialist or Generalist? Instruction Tuning for Specific NLP Tasks [58.422495509760154]
We investigate whether incorporating broad-coverage generalist instruction tuning can contribute to building a specialist model.
Our experiments assess four target tasks with distinct coverage levels.
The effect is particularly pronounced when the amount of task-specific training data is limited.
arXiv Detail & Related papers (2023-10-23T19:46:48Z) - How Far Can Camels Go? Exploring the State of Instruction Tuning on Open
Resources [117.6496550359768]
This work explores recent advances in instruction-tuning language models on a range of open instruction-following datasets.
We provide a large set of instruction-tuned models from 6.7B to 65B parameters in size, trained on 12 instruction datasets.
We evaluate them on their factual knowledge, reasoning, multilinguality, coding, and open-ended instruction following abilities.
arXiv Detail & Related papers (2023-06-07T19:59:23Z) - Assessing the Generalizability of a Performance Predictive Model [0.6070952062639761]
We propose a workflow to estimate the generalizability of a predictive model for algorithm performance.
The results show that generalizability patterns in the landscape feature space are reflected in the performance space.
arXiv Detail & Related papers (2023-05-31T12:50:44Z) - Do Models Really Learn to Follow Instructions? An Empirical Study of
Instruction Tuning [37.01833561948585]
Recent works on instruction tuning (IT) have achieved great performance with zero-shot generalizability to unseen tasks.
We analyze how models utilize instructions during IT by comparing model training with altered vs. original instructions.
arXiv Detail & Related papers (2023-05-19T02:00:47Z) - Assessing Out-of-Domain Language Model Performance from Few Examples [38.245449474937914]
We address the task of predicting out-of-domain (OOD) performance in a few-shot fashion.
We benchmark the performance on this task when looking at model accuracy on the few-shot examples.
We show that attribution-based factors can help rank relative model OOD performance.
arXiv Detail & Related papers (2022-10-13T04:45:26Z) - Plex: Towards Reliability using Pretrained Large Model Extensions [69.13326436826227]
We develop ViT-Plex and T5-Plex, pretrained large model extensions for vision and language modalities, respectively.
Plex greatly improves the state-of-the-art across reliability tasks, and simplifies the traditional protocol.
We demonstrate scaling effects over model sizes up to 1B parameters and pretraining dataset sizes up to 4B examples.
arXiv Detail & Related papers (2022-07-15T11:39:37Z) - Predicting is not Understanding: Recognizing and Addressing
Underspecification in Machine Learning [47.651130958272155]
Underspecification refers to the existence of multiple models that are indistinguishable in their in-domain accuracy.
We formalize the concept of underspecification and propose a method to identify and partially address it.
arXiv Detail & Related papers (2022-07-06T11:20:40Z) - Exploring Strategies for Generalizable Commonsense Reasoning with
Pre-trained Models [62.28551903638434]
We measure the impact of three different adaptation methods on the generalization and accuracy of models.
Experiments with two models show that fine-tuning performs best, by learning both the content and the structure of the task, but suffers from overfitting and limited generalization to novel answers.
We observe that alternative adaptation methods like prefix-tuning have comparable accuracy, but generalize better to unseen answers and are more robust to adversarial splits.
arXiv Detail & Related papers (2021-09-07T03:13:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.