Functionality learning through specification instructions
- URL: http://arxiv.org/abs/2311.08481v1
- Date: Tue, 14 Nov 2023 19:15:55 GMT
- Title: Functionality learning through specification instructions
- Authors: Pedro Henrique Luz de Araujo and Benjamin Roth
- Abstract summary: Test suites assess natural language processing models' performance on specific functionalities.
Previous work has explored functionality learning by fine-tuning models on suite data.
This paper analyses a fine-tuning-free approach to functionality learning.
- Score: 2.846550189998273
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Test suites assess natural language processing models' performance on
specific functionalities: cases of interest involving model robustness,
fairness, or particular linguistic capabilities. They enable fine-grained
evaluations of model aspects that would otherwise go unnoticed in standard
evaluation datasets, but they do not address the problem of how to fix the
failure cases. Previous work has explored functionality learning by fine-tuning
models on suite data. While this improves performance on seen functionalities,
it often does not generalize to unseen ones and can harm general performance.
This paper analyses a fine-tuning-free approach to functionality learning.
For each functionality in a suite, we generate a specification instruction that
encodes it. We combine the obtained specification instructions to create
specification-augmented prompts, which we feed to language models pre-trained
on natural instruction data to generate suite predictions. A core aspect of our
analysis is to measure the effect that including a set of specifications has on
a held-out set of unseen, qualitatively different specifications. Our
experiments across four tasks and models ranging from 80M to 175B parameters
show that smaller models struggle to follow specification instructions.
However, larger models (> 3B params.) can benefit from specifications and even
generalize desirable behaviors across functionalities.
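To make the prompting setup concrete, the following is a minimal sketch of how specification-augmented prompts could be assembled and sent to an instruction-tuned model. The specification wording, the sentiment task, and the flan-t5-small checkpoint are illustrative assumptions, not the paper's exact prompts or models.

# Minimal sketch of specification-augmented prompting. The specification
# texts, task framing, and model checkpoint are illustrative assumptions,
# not the paper's exact prompts or models.
from transformers import pipeline

# One natural-language specification instruction per suite functionality.
specifications = [
    "Negating a positive statement expresses negative sentiment.",
    "Changing a person's name must not change the sentiment of the text.",
]

def build_prompt(specs, text):
    # Combine the specification instructions, then append the task input.
    spec_block = "\n".join(f"- {s}" for s in specs)
    return (
        "Follow these specifications when classifying sentiment:\n"
        f"{spec_block}\n\n"
        f"Review: {text}\n"
        "Sentiment (positive or negative):"
    )

# Any instruction-tuned model can stand in here; flan-t5-small is a placeholder.
classifier = pipeline("text2text-generation", model="google/flan-t5-small")
prompt = build_prompt(specifications, "The movie was not good at all.")
print(classifier(prompt, max_new_tokens=5)[0]["generated_text"])

In the paper's analysis, generalization is then probed by holding out some specifications from the prompt and checking whether the model still exhibits the corresponding behaviors on those functionalities.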
Related papers
- Evaluating the Instruction-following Abilities of Language Models using Knowledge Tasks [4.945902994386117]
We focus on developing a benchmark for instruction-following where it is easy to verify both task performance and instruction-following capabilities.
We adapt existing knowledge benchmarks and augment them with instructions that a) are conditional on correctly answering the knowledge task or b) use the space of candidate options in multiple-choice knowledge-answering tasks.
We find that even large-scale instruction-tuned LLMs fail to follow simple instructions in zero-shot settings.
arXiv Detail & Related papers (2024-10-16T19:07:37Z)
- Eliciting Instruction-tuned Code Language Models' Capabilities to Utilize Auxiliary Function for Code Generation [25.434546255499242]
We study the code generation behavior of instruction-tuned models built on top of code pre-trained language models.
We design several ways to provide auxiliary functions to the models by adding them to the query or providing a response prefix.
arXiv Detail & Related papers (2024-09-20T22:28:20Z)
- Third-Party Language Model Performance Prediction from Instruction [59.574169249307054]
Language model-based instruction-following systems have lately shown increasing performance on many benchmark tasks.
A user can easily prompt a model with an instruction without knowing whether the responses can be expected to be accurate.
We propose a third party performance prediction framework, where a separate model is trained to predict the metric resulting from evaluating an instruction-following system on a task.
arXiv Detail & Related papers (2024-03-19T03:53:47Z)
- Specialist or Generalist? Instruction Tuning for Specific NLP Tasks [58.422495509760154]
We investigate whether incorporating broad-coverage generalist instruction tuning can contribute to building a specialist model.
Our experiments assess four target tasks with distinct coverage levels.
We find that generalist instruction tuning can benefit the specialist model, and the effect is particularly pronounced when the amount of task-specific training data is limited.
arXiv Detail & Related papers (2023-10-23T19:46:48Z)
- UniverSLU: Universal Spoken Language Understanding for Diverse Tasks with Natural Language Instructions [64.50935101415776]
We build a single model that jointly performs various spoken language understanding (SLU) tasks.
We demonstrate the efficacy of our single multi-task learning model "UniverSLU" for 12 speech classification and sequence generation task types spanning 17 datasets and 9 languages.
arXiv Detail & Related papers (2023-10-04T17:10:23Z)
- Did You Read the Instructions? Rethinking the Effectiveness of Task Definitions in Instruction Learning [74.70157466822612]
We systematically study the role of task definitions in instruction learning.
We find that model performance drops substantially when removing contents describing the task output.
We propose two strategies to help models better leverage task instructions.
arXiv Detail & Related papers (2023-06-01T21:11:24Z)
- Instruction Induction: From Few Examples to Natural Language Task Descriptions [55.139554327372934]
We show that language models can explicitly infer an underlying task from a few demonstrations by prompting them to generate a natural language instruction that fits the examples.
InstructGPT achieves 65.7% of human performance in our execution-based metric, while the original GPT-3 model reaches only 9.8% of human performance.
arXiv Detail & Related papers (2022-05-22T09:22:37Z)
- Multi Task Learning For Zero Shot Performance Prediction of Multilingual Models [12.759281077118567]
Massively Multilingual Transformer based Language Models have been observed to be surprisingly effective on zero-shot transfer across languages.
We build upon some of the existing techniques for predicting the zero-shot performance on a task, by modeling it as a multi-task learning problem.
arXiv Detail & Related papers (2022-05-12T14:47:03Z)
- Quantifying Adaptability in Pre-trained Language Models with 500 Tasks [60.0364822929442]
We present a large-scale empirical study of the features and limits of LM adaptability using a new benchmark, TaskBench500.
We evaluate three facets of adaptability, finding that adaptation procedures differ dramatically in their ability to memorize small datasets.
Our experiments show that adaptability to new tasks, like generalization to new examples, can be systematically described and understood.
arXiv Detail & Related papers (2021-12-06T18:00:25Z)