Automated Concern Extraction from Textual Requirements of Cyber-Physical Systems: A Multi-solution Study
- URL: http://arxiv.org/abs/2510.19237v1
- Date: Wed, 22 Oct 2025 04:44:01 GMT
- Title: Automated Concern Extraction from Textual Requirements of Cyber-Physical Systems: A Multi-solution Study
- Authors: Dongming Jin, Zhi Jin, Xiaohong Chen, Zheng Fang, Linyu Li, Shengxin Zhao, Chuihui Wang, Hongbin Xiao,
- Abstract summary: Cyber-physical systems (CPSs) are characterized by a deep integration of the information space and the physical world. Some automated solutions for requirements concern extraction have been proposed to alleviate the burden on requirements engineers. We propose ReqEBench, a new CPSs requirements concern extraction benchmark, which contains 2,721 requirements from 12 real-world CPSs.
- Score: 32.72367544006539
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cyber-physical systems (CPSs) are characterized by a deep integration of the information space and the physical world, which makes the extraction of requirements concerns more challenging. Some automated solutions for requirements concern extraction have been proposed to alleviate the burden on requirements engineers. However, evaluating the effectiveness of these solutions, which relies on fair and comprehensive benchmarks, remains an open question. To address this gap, we propose ReqEBench, a new CPSs requirements concern extraction benchmark, which contains 2,721 requirements from 12 real-world CPSs. ReqEBench offers four advantages. It aligns with real-world CPSs requirements in multiple dimensions, e.g., scale and complexity. It covers comprehensive concerns related to CPSs requirements. It undergoes a rigorous annotation process. It covers multiple application domains of CPSs, e.g., aerospace and healthcare. We conducted a comparative study on three types of automated requirements concern extraction solutions and revealed their performance in real-world CPSs using our ReqEBench. We found that the highest F1 score of GPT-4 is only 0.24 in entity concern extraction. We further analyze failure cases of popular LLM-based solutions, summarize their shortcomings, and provide ideas for improving their capabilities. We believe ReqEBench will facilitate the evaluation and development of automated requirements concern extraction.
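For context on the entity concern extraction scores reported above (e.g., GPT-4's highest F1 of 0.24), the following is a minimal sketch of how an entity-level F1 score can be computed against gold annotations. The set-based exact-match criterion, the helper name entity_f1, and the toy requirements are illustrative assumptions, not ReqEBench's actual evaluation protocol.

```python
# Minimal sketch of entity-level F1 scoring for requirements concern extraction.
# Assumption: each requirement has a set of gold entity concerns and a set of
# model-extracted entity concerns, matched by exact string comparison.

def entity_f1(predicted: list[set[str]], gold: list[set[str]]) -> float:
    """Micro-averaged F1 over per-requirement sets of extracted entity concerns."""
    tp = fp = fn = 0
    for pred, ref in zip(predicted, gold):
        tp += len(pred & ref)   # entities both extracted and annotated
        fp += len(pred - ref)   # extracted but not in the gold annotation
        fn += len(ref - pred)   # annotated but missed by the extractor
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Toy example: two requirements with gold vs. model-extracted entity concerns.
gold = [{"altitude sensor", "flight controller"}, {"patient monitor"}]
pred = [{"altitude sensor"}, {"patient monitor", "alarm"}]
print(round(entity_f1(pred, gold), 2))  # 0.67 on this toy pair
```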
Related papers
- Prompting the Priorities: A First Look at Evaluating LLMs for Vulnerability Triage and Prioritization [0.8388262599725365]
Security analysts face increasing pressure to triage large and complex vulnerability backlogs. We evaluate four models across twelve prompting techniques to interpret semi-structured and unstructured vulnerability information. We issue more than 165,000 queries to assess performance under prompting styles including one-shot, few-shot, and chain-of-thought.
arXiv Detail & Related papers (2025-10-21T10:48:14Z) - How Good Are Synthetic Requirements? Evaluating LLM-Generated Datasets for AI4RE [0.5156484100374059]
This paper presents an enhanced Product Line approach for generating synthetic requirements data. We investigate four research questions assessing how prompting strategies, automated prompt optimization, and post-generation curation affect data quality. Our results show that synthetic requirements can match or outperform human-authored ones for specific tasks.
arXiv Detail & Related papers (2025-06-26T10:52:07Z) - Federated Learning for Cyber Physical Systems: A Comprehensive Survey [49.54239703000928]
Federated learning (FL) has become increasingly popular in recent years. The article scrutinizes how FL is utilized in critical CPS applications, e.g., intelligent transportation systems, cybersecurity services, smart cities, and smart healthcare solutions.
arXiv Detail & Related papers (2025-05-08T01:17:15Z) - AutoPT: How Far Are We from the End2End Automated Web Penetration Testing? [54.65079443902714]
We introduce AutoPT, an automated penetration testing agent based on the principle of PSM driven by LLMs.
Our results show that AutoPT outperforms the baseline framework ReAct on the GPT-4o mini model.
arXiv Detail & Related papers (2024-11-02T13:24:30Z) - Do RAG Systems Cover What Matters? Evaluating and Optimizing Responses with Sub-Question Coverage [74.70255719194819]
We introduce a novel framework based on sub-question coverage, which measures how well a RAG system addresses different facets of a question.
We use this framework to evaluate three commercial generative answer engines: You.com, Perplexity AI, and Bing Chat.
We find that while all answer engines cover core sub-questions more often than background or follow-up ones, they still miss around 50% of core sub-questions.
arXiv Detail & Related papers (2024-10-20T22:59:34Z) - Creating Automated Quantum-Assisted Solutions for Optimization Problems [0.0]
We propose the QuaST decision tree, a framework for exploring, automating, and evaluating solution paths. Our setup is modular, highly structured, and flexible enough to include any kind of preparation, pre-processing, and post-processing steps.
arXiv Detail & Related papers (2024-09-30T16:59:14Z) - An Evaluation of Requirements Modeling for Cyber-Physical Systems via LLMs [18.657412233247328]
The problem frame approach aims to shape real-world problems by capturing the characteristics and interconnections of components.
Large language models (LLMs) have shown excellent performance in natural language understanding.
arXiv Detail & Related papers (2024-08-05T13:20:14Z) - MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains [54.117238759317004]
The Massive Multitask Agent Understanding (MMAU) benchmark features comprehensive offline tasks that eliminate the need for complex environment setups.
It evaluates models across five domains, including Tool-use, Directed Acyclic Graph (DAG) QA, Data Science and Machine Learning coding, Contest-level programming and Mathematics.
With a total of 20 meticulously designed tasks encompassing over 3K distinct prompts, MMAU provides a comprehensive framework for evaluating the strengths and limitations of LLM agents.
arXiv Detail & Related papers (2024-07-18T00:58:41Z) - EWEK-QA: Enhanced Web and Efficient Knowledge Graph Retrieval for Citation-based Question Answering Systems [103.91826112815384]
Citation-based QA systems suffer from two shortcomings.
They usually rely only on the web as a source of extracted knowledge, and adding other external knowledge sources can hamper the efficiency of the system.
We propose our enhanced web and efficient knowledge graph (KG) retrieval solution (EWEK-QA) to enrich the content of the extracted knowledge fed to the system.
arXiv Detail & Related papers (2024-06-14T19:40:38Z) - Classification, Challenges, and Automated Approaches to Handle Non-Functional Requirements in ML-Enabled Systems: A Systematic Literature Review [10.09767622002672]
We propose a systematic literature review targeting two key aspects: the classification of the non-functional requirements investigated so far, and the challenges to be faced when developing models in ML-enabled systems.
We report that current research identified 30 different non-functional requirements, which can be grouped into six main classes.
We also compiled a catalog of more than 23 software engineering challenges, based on which further research should consider the non-functional requirements of machine learning-enabled systems.
arXiv Detail & Related papers (2023-11-29T09:45:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.