Requirements Satisfiability with In-Context Learning
- URL: http://arxiv.org/abs/2404.12576v1
- Date: Fri, 19 Apr 2024 01:58:24 GMT
- Title: Requirements Satisfiability with In-Context Learning
- Authors: Sarah Santos, Travis Breaux, Thomas Norton, Sara Haghighi, Sepideh Ghanavati
- Abstract summary: Language models that can learn a task at inference time, called in-context learning (ICL), show increasing promise in natural language inference tasks.
In this paper, we apply ICL to the design and evaluation of satisfaction arguments, which describe how a requirement is satisfied by a system specification and associated domain knowledge.
The approach builds on three prompt design patterns: augmented generation, prompt tuning, and chain-of-thought prompting.
- Score: 1.747623282473278
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Language models that can learn a task at inference time, called in-context learning (ICL), show increasing promise in natural language inference tasks. In ICL, a model user constructs a prompt to describe a task with a natural language instruction and zero or more examples, called demonstrations. The prompt is then input to the language model to generate a completion. In this paper, we apply ICL to the design and evaluation of satisfaction arguments, which describe how a requirement is satisfied by a system specification and associated domain knowledge. The approach builds on three prompt design patterns, including augmented generation, prompt tuning, and chain-of-thought prompting, and is evaluated on a privacy problem to check whether a mobile app scenario and associated design description satisfy eight consent requirements from the EU General Data Protection Regulation (GDPR). The overall results show that GPT-4 can be used to verify requirements satisfaction with 96.7% accuracy and dissatisfaction with 93.2% accuracy. Inverting the requirement improves verification of dissatisfaction to 97.2%. Chain-of-thought prompting improves overall GPT-3.5 performance by 9.0% accuracy. We discuss the trade-offs among templates, models, and prompt strategies and provide a detailed analysis of the generated specifications to inform how the approach can be applied in practice.
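As a concrete illustration of the prompt construction the abstract describes, the following is a minimal sketch of one satisfaction check, assuming the OpenAI Python client; the requirement text, scenario, and chain-of-thought wording are illustrative placeholders, not the paper's actual templates or dataset.

```python
# Sketch of an in-context-learning check of one GDPR consent requirement
# against an app design scenario. Assumes the OpenAI Python client
# (pip install openai) with OPENAI_API_KEY set in the environment; the
# requirement, scenario, and prompt wording below are illustrative only.
from openai import OpenAI

client = OpenAI()

requirement = (
    "Consent must be freely given, specific, informed, and unambiguous "
    "(GDPR Art. 4(11))."
)
scenario = (
    "The mobile app shows a pre-checked box that opts users into sharing "
    "location data with third-party advertisers."
)

# Chain-of-thought style instruction: elicit reasoning before a verdict.
prompt = (
    f"Requirement: {requirement}\n"
    f"Specification: {scenario}\n"
    "Think step by step about whether the specification satisfies the "
    "requirement, then answer 'satisfied' or 'not satisfied' with a "
    "one-sentence justification."
)

response = client.chat.completions.create(
    model="gpt-4",  # the paper evaluates GPT-4 and GPT-3.5
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```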
Related papers
- Conformal Linguistic Calibration: Trading-off between Factuality and Specificity [41.45862052156885]
We propose a unified framework that connects abstention and linguistic calibration through the lens of linguistic pragmatics.
We describe an implementation that allows for controlling the level of imprecision in model responses.
Our approach enables fine-tuning models to perform uncertainty-aware adaptive claim rewriting, offering a controllable balance between factuality and specificity.
arXiv Detail & Related papers (2025-02-26T13:01:49Z)
- Establishing Knowledge Preference in Language Models [80.70632813935644]
Language models are known to encode a great amount of factual knowledge through pretraining.
Such knowledge might be insufficient to cater to user requests.
When answering questions about ongoing events, the model should use recent news articles to update its response.
When some facts are edited in the model, the updated facts should override all prior knowledge learned by the model.
arXiv Detail & Related papers (2024-07-17T23:16:11Z)
- Model Generation with LLMs: From Requirements to UML Sequence Diagrams [9.114284818139069]
This paper investigates the capability of ChatGPT to generate a specific type of model, i.e., sequence diagrams, from NL requirements.
We examine the sequence diagrams generated by ChatGPT for 28 requirements documents of various types and from different domains.
Our results indicate that, although the models generally conform to the standard and exhibit a reasonable level of understandability, their completeness and correctness with respect to the specified requirements often present challenges.
arXiv Detail & Related papers (2024-04-09T15:07:25Z)
- Large Language Models in the Workplace: A Case Study on Prompt Engineering for Job Type Classification [58.720142291102135]
This case study investigates the task of job classification in a real-world setting.
The goal is to determine whether an English-language job posting is appropriate for a graduate or entry-level position.
arXiv Detail & Related papers (2023-03-13T14:09:53Z)
- Making Large Language Models Better Reasoners with Step-Aware Verifier [49.16750018427259]
DIVERSE (Diverse Verifier on Reasoning Step) is a novel approach that further enhances the reasoning capability of language models.
We evaluate DIVERSE on the latest language model code-davinci and show that it achieves new state-of-the-art results on six of eight reasoning benchmarks.
arXiv Detail & Related papers (2022-06-06T03:38:36Z)
- Instruction Induction: From Few Examples to Natural Language Task Descriptions [55.139554327372934]
We show that language models can explicitly infer an underlying task from a few demonstrations by prompting them to generate a natural language instruction that fits the examples (a toy prompt of this form is sketched after this list).
InstructGPT achieves 65.7% of human performance in our execution-based metric, while the original GPT-3 model reaches only 9.8% of human performance.
arXiv Detail & Related papers (2022-05-22T09:22:37Z)
- Finetuned Language Models Are Zero-Shot Learners [67.70352207685558]
We show that instruction tuning boosts zero-shot performance on unseen tasks.
We take a 137B parameter pretrained language model and instruction-tune it on over 60 NLP tasks verbalized via natural language instruction templates.
We evaluate this instruction-tuned model, which we call FLAN, on unseen task types.
arXiv Detail & Related papers (2021-09-03T17:55:52Z)
- Meta-tuning Language Models to Answer Prompts Better [35.71265221884353]
Large pretrained language models like GPT-3 have acquired a surprising ability to perform zero-shot classification (ZSC).
We propose meta-tuning, which trains the model to specialize in answering prompts but still generalize to unseen tasks.
After meta-tuning, our model outperforms a same-sized QA model for most labels on unseen tasks.
arXiv Detail & Related papers (2021-04-10T02:57:22Z)
- A Simple Language Model for Task-Oriented Dialogue [61.84084939472287]
SimpleTOD is a simple approach to task-oriented dialogue that uses a single, causal language model trained on all sub-tasks recast as a single sequence prediction problem.
This allows SimpleTOD to fully leverage transfer learning from pre-trained, open domain, causal language models such as GPT-2.
arXiv Detail & Related papers (2020-05-02T11:09:27Z)
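To make the SimpleTOD recasting in the entry above concrete, here is a minimal sketch of how one dialogue turn might be flattened into a single sequence for a causal LM; the delimiter tokens and slot schema are assumptions for illustration, not the paper's exact format.

```python
# SimpleTOD-style serialization sketch (assumed delimiters and schema):
# every sub-task of a dialogue turn becomes one flat sequence that a
# causal LM such as GPT-2 can be trained on left-to-right.

def serialize_turn(context, belief_state, actions, response):
    """Recast one task-oriented dialogue turn as a single prediction target."""
    return (
        "<|context|> " + " ".join(context)
        + " <|belief|> " + ", ".join(f"{d} {s} {v}" for d, s, v in belief_state)
        + " <|action|> " + ", ".join(actions)
        + " <|response|> " + response
    )

# Hypothetical restaurant-booking turn.
print(serialize_turn(
    context=["user: book a table for two tonight"],
    belief_state=[("restaurant", "book people", "2"),
                  ("restaurant", "book day", "today")],
    actions=["restaurant-inform name"],
    response="I found a table for two at [restaurant_name].",
))
```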
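For the Instruction Induction entry above, the following toy prompt shows the general shape of asking a model to recover an instruction from demonstrations; the demonstrations and meta-prompt wording are assumptions for illustration and may differ from the paper's, and the model call can reuse the client pattern from the first sketch.

```python
# Toy instruction-induction prompt: present input-output pairs and ask the
# model to produce the instruction that explains them (wording illustrative).
demonstrations = [("walk", "walked"), ("sing", "sang"), ("go", "went")]

lines = ["I gave a friend an input and they produced an output:", ""]
for x, y in demonstrations:
    lines.append(f"Input: {x}")
    lines.append(f"Output: {y}")
lines.append("")
lines.append("The instruction they were following was:")
induction_prompt = "\n".join(lines)
print(induction_prompt)  # send this to a language model to induce the task
```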
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.