Related papers: Natural Language Processing for Requirements Traceability

Natural Language Processing for Requirements Traceability

URL: http://arxiv.org/abs/2405.10845v1
Date: Fri, 17 May 2024 15:17:00 GMT
Title: Natural Language Processing for Requirements Traceability
Authors: Jin L. C. Guo, Jan-Philipp Steghöfer, Andreas Vogelsang, Jane Cleland-Huang,
Abstract summary: Traceability plays a crucial role in requirements and software engineering, particularly for safety-critical systems. Natural language processing (NLP) and related techniques have made considerable progress in the past decade.
Score: 47.93107382627423
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Traceability, the ability to trace relevant software artifacts to support reasoning about the quality of the software and its development process, plays a crucial role in requirements and software engineering, particularly for safety-critical systems. In this chapter, we provide a comprehensive overview of the representative tasks in requirement traceability for which natural language processing (NLP) and related techniques have made considerable progress in the past decade. We first present the definition of traceability in the context of requirements and the overall engineering process, as well as other important concepts related to traceability tasks. Then, we discuss two tasks in detail, including trace link recovery and trace link maintenance. We also introduce two other related tasks concerning when trace links are used in practical contexts. For each task, we explain the characteristics of the task, how it can be approached through NLP techniques, and how to design and conduct the experiment to demonstrate the performance of the NLP techniques. We further discuss practical considerations on how to effectively apply NLP techniques and assess their effectiveness regarding the data set collection, the metrics selection, and the role of humans when evaluating the NLP approaches. Overall, this chapter prepares the readers with the fundamental knowledge of designing automated traceability solutions enabled by NLP in practice.

Related papers

TraceLLM: Leveraging Large Language Models with Prompt Engineering for Enhanced Requirements Traceability [4.517933493143603]
This paper introduces TraceLLM, a framework for enhancing requirements traceability through prompt engineering and demonstration selection.<n>We assess prompt generalization and robustness using eight state-of-the-art LLMs on four benchmark datasets.
arXiv Detail & Related papers (2026-02-01T14:29:13Z)
Task Prototype-Based Knowledge Retrieval for Multi-Task Learning from Partially Annotated Data [38.55691652000724]
Multi-task learning (MTL) is critical in real-world applications such as autonomous driving and robotics.<n>Existing methods for partially labeled MTL typically rely on predictions from unlabeled tasks.<n>We propose a prototype-based knowledge retrieval framework that achieves robust MTL instead of relying on predictions from unlabeled tasks.
arXiv Detail & Related papers (2026-01-12T12:27:02Z)
Learning Task Representations from In-Context Learning [73.72066284711462]
Large language models (LLMs) have demonstrated remarkable proficiency in in-context learning. We introduce an automated formulation for encoding task information in ICL prompts as a function of attention heads. We show that our method's effectiveness stems from aligning the distribution of the last hidden state with that of an optimally performing in-context-learned model.
arXiv Detail & Related papers (2025-02-08T00:16:44Z)
Empowering Large Language Models in Wireless Communication: A Novel Dataset and Fine-Tuning Framework [81.29965270493238]
We develop a specialized dataset aimed at enhancing the evaluation and fine-tuning of large language models (LLMs) for wireless communication applications. The dataset includes a diverse set of multi-hop questions, including true/false and multiple-choice types, spanning varying difficulty levels from easy to hard. We introduce a Pointwise V-Information (PVI) based fine-tuning method, providing a detailed theoretical analysis and justification for its use in quantifying the information content of training data.
arXiv Detail & Related papers (2025-01-16T16:19:53Z)
ProcBench: Benchmark for Multi-Step Reasoning and Following Procedure [0.0]
We propose a benchmark that focuses on a specific aspect of reasoning ability: the direct evaluation of multi-step inference. Our dataset comprises pairs of explicit instructions and corresponding questions, where the procedures necessary for solving the questions are entirely detailed within the instructions. By constructing problems that require varying numbers of steps to solve and evaluating responses at each step, we enable a thorough assessment of state-of-the-art LLMs' ability to follow instructions.
arXiv Detail & Related papers (2024-10-04T03:21:24Z)
REVEAL-IT: REinforcement learning with Visibility of Evolving Agent poLicy for InTerpretability [23.81322529587759]
REVEAL-IT is a novel framework for explaining the learning process of an agent in complex environments. We visualize the policy structure and the agent's learning process for various training tasks. A GNN-based explainer learns to highlight the most important section of the policy, providing a more clear and robust explanation of the agent's learning process.
arXiv Detail & Related papers (2024-06-20T11:29:26Z)
Combatting Human Trafficking in the Cyberspace: A Natural Language Processing-Based Methodology to Analyze the Language in Online Advertisements [55.2480439325792]
This project tackles the pressing issue of human trafficking in online C2C marketplaces through advanced Natural Language Processing (NLP) techniques. We introduce a novel methodology for generating pseudo-labeled datasets with minimal supervision, serving as a rich resource for training state-of-the-art NLP models. A key contribution is the implementation of an interpretability framework using Integrated Gradients, providing explainable insights crucial for law enforcement.
arXiv Detail & Related papers (2023-11-22T02:45:01Z)
Understanding the Challenges of Deploying Live-Traceability Solutions [45.235173351109374]
SAFA.ai is a startup focusing on fine-tuning project-specific models that deliver automated traceability in a near real-time environment. This paper describes the challenges that characterize commercializing software traceability and highlights possible future directions.
arXiv Detail & Related papers (2023-06-19T14:34:16Z)
Nemo: Guiding and Contextualizing Weak Supervision for Interactive Data Programming [77.38174112525168]
We present Nemo, an end-to-end interactive Supervision system that improves overall productivity of WS learning pipeline by an average 20% (and up to 47% in one task) compared to the prevailing WS supervision approach.
arXiv Detail & Related papers (2022-03-02T19:57:32Z)
Natural Language in Requirements Engineering for Structure Inference -- An Integrative Review [0.0]
The paper provides an integrative review regarding Natural Language Processing tools for Requirements Engineering. Results are that currently no open source approach exists that allows for the direct/primary extraction of information structure. An approach that allows for individual management of the algorithm, knowledge base, and text corpus is a possibility being pursued.
arXiv Detail & Related papers (2022-02-10T14:46:09Z)
Weighted Training for Cross-Task Learning [71.94908559469475]
We introduce Target-Aware Weighted Training (TAWT), a weighted training algorithm for cross-task learning. We show that TAWT is easy to implement, is computationally efficient, requires little hyper parameter tuning, and enjoys non-asymptotic learning-theoretic guarantees. As a byproduct, the proposed representation-based task distance allows one to reason in a theoretically principled way about several critical aspects of cross-task learning.
arXiv Detail & Related papers (2021-05-28T20:27:02Z)
Designing Multimodal Datasets for NLP Challenges [5.874143210792986]
We identify challenges and tasks that are reflective of linguistic and cognitive competencies that humans have when speaking and reasoning. We describe a diagnostic dataset, Recipe-to-Video Questions (R2VQ), designed for testing competence-based comprehension over a multimodal recipe collection.
arXiv Detail & Related papers (2021-05-12T23:02:46Z)
Exploring Relational Context for Multi-Task Dense Prediction [76.86090370115]
We consider a multi-task environment for dense prediction tasks, represented by a common backbone and independent task-specific heads. We explore various attention-based contexts, such as global and local, in the multi-task setting. We propose an Adaptive Task-Relational Context module, which samples the pool of all available contexts for each task pair.
arXiv Detail & Related papers (2021-04-28T16:45:56Z)
Knowledge-Aware Procedural Text Understanding with Multi-Stage Training [110.93934567725826]
We focus on the task of procedural text understanding, which aims to comprehend such documents and track entities' states and locations during a process. Two challenges, the difficulty of commonsense reasoning and data insufficiency, still remain unsolved. We propose a novel KnOwledge-Aware proceduraL text understAnding (KOALA) model, which effectively leverages multiple forms of external knowledge.
arXiv Detail & Related papers (2020-09-28T10:28:40Z)

This list is automatically generated from the titles and abstracts of the papers in this site.