Natural Language Processing for Requirements Traceability
- URL: http://arxiv.org/abs/2405.10845v1
- Date: Fri, 17 May 2024 15:17:00 GMT
- Title: Natural Language Processing for Requirements Traceability
- Authors: Jin L. C. Guo, Jan-Philipp Steghöfer, Andreas Vogelsang, Jane Cleland-Huang
- Abstract summary: Traceability plays a crucial role in requirements and software engineering, particularly for safety-critical systems.
Natural language processing (NLP) and related techniques have made considerable progress in the past decade.
- Score: 47.93107382627423
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traceability, the ability to trace relevant software artifacts to support reasoning about the quality of the software and its development process, plays a crucial role in requirements and software engineering, particularly for safety-critical systems. In this chapter, we provide a comprehensive overview of the representative tasks in requirement traceability for which natural language processing (NLP) and related techniques have made considerable progress in the past decade. We first present the definition of traceability in the context of requirements and the overall engineering process, as well as other important concepts related to traceability tasks. Then, we discuss two tasks in detail, including trace link recovery and trace link maintenance. We also introduce two other related tasks concerning when trace links are used in practical contexts. For each task, we explain the characteristics of the task, how it can be approached through NLP techniques, and how to design and conduct the experiment to demonstrate the performance of the NLP techniques. We further discuss practical considerations on how to effectively apply NLP techniques and assess their effectiveness regarding the data set collection, the metrics selection, and the role of humans when evaluating the NLP approaches. Overall, this chapter prepares the readers with the fundamental knowledge of designing automated traceability solutions enabled by NLP in practice.
Related papers
- Learning Task Representations from In-Context Learning [73.72066284711462]
Large language models (LLMs) have demonstrated remarkable proficiency in in-context learning.
We introduce an automated formulation for encoding task information in ICL prompts as a function of attention heads.
We show that our method's effectiveness stems from aligning the distribution of the last hidden state with that of an optimally performing in-context-learned model.
arXiv Detail & Related papers (2025-02-08T00:16:44Z)
- Empowering Large Language Models in Wireless Communication: A Novel Dataset and Fine-Tuning Framework [81.29965270493238]
We develop a specialized dataset aimed at enhancing the evaluation and fine-tuning of large language models (LLMs) for wireless communication applications.
The dataset includes a diverse set of multi-hop questions, including true/false and multiple-choice types, spanning varying difficulty levels from easy to hard.
We introduce a Pointwise V-Information (PVI) based fine-tuning method, providing a detailed theoretical analysis and justification for its use in quantifying the information content of training data.
arXiv Detail & Related papers (2025-01-16T16:19:53Z)
- ProcBench: Benchmark for Multi-Step Reasoning and Following Procedure [0.0]
We propose a benchmark that focuses on a specific aspect of reasoning ability: the direct evaluation of multi-step inference.
Our dataset comprises pairs of explicit instructions and corresponding questions, where the procedures necessary for solving the questions are entirely detailed within the instructions.
By constructing problems that require varying numbers of steps to solve and evaluating responses at each step, we enable a thorough assessment of state-of-the-art LLMs' ability to follow instructions.
arXiv Detail & Related papers (2024-10-04T03:21:24Z)
- Combatting Human Trafficking in the Cyberspace: A Natural Language Processing-Based Methodology to Analyze the Language in Online Advertisements [55.2480439325792]
This project tackles the pressing issue of human trafficking in online C2C marketplaces through advanced Natural Language Processing (NLP) techniques.
We introduce a novel methodology for generating pseudo-labeled datasets with minimal supervision, serving as a rich resource for training state-of-the-art NLP models.
A key contribution is the implementation of an interpretability framework using Integrated Gradients, providing explainable insights crucial for law enforcement.
arXiv Detail & Related papers (2023-11-22T02:45:01Z)
- Understanding the Challenges of Deploying Live-Traceability Solutions [45.235173351109374]
SAFA.ai is a startup focusing on fine-tuning project-specific models that deliver automated traceability in a near real-time environment.
This paper describes the challenges that characterize commercializing software traceability and highlights possible future directions.
arXiv Detail & Related papers (2023-06-19T14:34:16Z)
- Nemo: Guiding and Contextualizing Weak Supervision for Interactive Data Programming [77.38174112525168]
We present Nemo, an end-to-end interactive supervision system that improves the overall productivity of the weak supervision (WS) learning pipeline by an average of 20% (and up to 47% in one task) compared to the prevailing WS approach.
arXiv Detail & Related papers (2022-03-02T19:57:32Z)
- Natural Language in Requirements Engineering for Structure Inference -- An Integrative Review [0.0]
The paper provides an integrative review regarding Natural Language Processing tools for Requirements Engineering.
The results show that no open-source approach currently exists that allows for the direct/primary extraction of information structure.
An approach that allows for individual management of the algorithm, knowledge base, and text corpus is a possibility being pursued.
arXiv Detail & Related papers (2022-02-10T14:46:09Z)
- Weighted Training for Cross-Task Learning [71.94908559469475]
We introduce Target-Aware Weighted Training (TAWT), a weighted training algorithm for cross-task learning.
We show that TAWT is easy to implement, is computationally efficient, requires little hyperparameter tuning, and enjoys non-asymptotic learning-theoretic guarantees.
As a byproduct, the proposed representation-based task distance allows one to reason in a theoretically principled way about several critical aspects of cross-task learning.
arXiv Detail & Related papers (2021-05-28T20:27:02Z)
- Designing Multimodal Datasets for NLP Challenges [5.874143210792986]
We identify challenges and tasks that are reflective of linguistic and cognitive competencies that humans have when speaking and reasoning.
We describe a diagnostic dataset, Recipe-to-Video Questions (R2VQ), designed for testing competence-based comprehension over a multimodal recipe collection.
arXiv Detail & Related papers (2021-05-12T23:02:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.