Generative AI for Requirements Engineering: A Systematic Literature Review
- URL: http://arxiv.org/abs/2409.06741v3
- Date: Tue, 14 Oct 2025 01:49:56 GMT
- Title: Generative AI for Requirements Engineering: A Systematic Literature Review
- Authors: Haowei Cheng, Jati H. Husen, Yijun Lu, Teeradaj Racharak, Nobukazu Yoshioka, Naoyasu Ubayashi, Hironori Washizaki
- Abstract summary: Generative pretrained transformer models dominate current applications. Industrial adoption remains nascent, with over 90% of studies corresponding to early-stage development. Despite the transformative potential of GenAI-based RE, several barriers hinder practical adoption.
- Score: 1.6986294649170766
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Introduction: Requirements engineering faces challenges due to the handling of increasingly complex software systems. These challenges can be addressed using generative AI. Given that GenAI-based RE has not been systematically analyzed in detail, this review examines related research, focusing on trends, methodologies, challenges, and future directions. Methods: A systematic methodology for paper selection, data extraction, and feature analysis is used to comprehensively review 238 articles published from 2019 to 2025 and available from major academic databases. Results: Generative pretrained transformer models dominate current applications (67.3%), but research remains unevenly distributed across RE phases, with analysis (30.0%) and elicitation (22.1%) receiving the most attention, and management (6.8%) underexplored. Three core challenges, reproducibility (66.8%), hallucinations (63.4%), and interpretability (57.1%), form a tightly interlinked triad affecting trust and consistency. Strong correlations (35% co-occurrence) indicate these challenges must be addressed holistically. Industrial adoption remains nascent, with over 90% of studies corresponding to early-stage development and only 1.3% reaching production-level integration. Conclusions: Evaluation practices show maturity gaps, limited tool and dataset availability, and fragmented benchmarking approaches. Despite the transformative potential of GenAI-based RE, several barriers hinder practical adoption. The strong correlations among core challenges demand specialized architectures targeting interdependencies rather than isolated solutions. The limited deployment reflects systemic bottlenecks in generalizability, data quality, and scalable evaluation methods. Successful adoption requires coordinated development across technical robustness, methodological maturity, and governance integration.
Related papers
- Human Identification at a Distance: Challenges, Methods and Results on the Competition HID 2025 [70.29305328364755]
The International Competition on Human Identification at a Distance (HID) has been organized annually since 2020. The best-performing method reached 94.2% accuracy, setting a new benchmark on this dataset. We analyze key technical trends and outline potential directions for future research in gait recognition.
arXiv Detail & Related papers (2026-02-07T14:22:17Z) - The Story is Not the Science: Execution-Grounded Evaluation of Mechanistic Interpretability Research [56.80927148740585]
We address the challenges of scalability and rigor by flipping the dynamic and developing AI agents as research evaluators. We use mechanistic interpretability research as a testbed, build standardized research output, and develop MechEvalAgent. Our work demonstrates the potential of AI agents to transform research evaluation and pave the way for rigorous scientific practices.
arXiv Detail & Related papers (2026-02-05T19:00:02Z) - AI, Metacognition, and the Verification Bottleneck: A Three-Wave Longitudinal Study of Human Problem-Solving [0.0]
This pilot study tracked how generative AI reshapes problem-solving over six months in an academic setting. Results generalize primarily to early-adopter, academically affiliated populations.
arXiv Detail & Related papers (2026-01-21T15:49:04Z) - Large Language Models for Unit Test Generation: Achievements, Challenges, and the Road Ahead [15.43943391801509]
Unit testing is an essential yet laborious technique for verifying software. Large Language Models (LLMs) address this limitation by leveraging their data-driven knowledge of code semantics and programming patterns. This framework analyzes the literature regarding core generative strategies and a set of enhancement techniques.
arXiv Detail & Related papers (2025-11-26T13:30:11Z) - On the Influence of Artificial Intelligence on Human Problem-Solving: Empirical Insights for the Third Wave in a Multinational Longitudinal Pilot Study [0.0]
This article investigates the evolving paradigm of human-AI collaboration in problem-solving contexts. Building upon previous waves, our findings reveal the consolidation of a hybrid problem-solving culture. The study concludes that educational and technological interventions must prioritize verification scaffolds.
arXiv Detail & Related papers (2025-11-13T10:20:07Z) - LiveOIBench: Can Large Language Models Outperform Human Contestants in Informatics Olympiads? [5.835205320809048]
LiveOIBench is a benchmark featuring 403 Olympiad-level competitive programming problems with an average of 60 expert-designed test cases. The problems are sourced directly from 72 official Informatics Olympiads in different regions conducted between 2023 and 2025. LiveOIBench distinguishes itself through four key features: meticulously curated high-quality tasks with detailed subtasks and extensive private test cases.
arXiv Detail & Related papers (2025-10-10T17:54:24Z) - OmniEAR: Benchmarking Agent Reasoning in Embodied Tasks [52.87238755666243]
We present OmniEAR, a framework for evaluating how language models reason about physical interactions, tool usage, and multi-agent coordination in embodied tasks. We model continuous physical properties and complex spatial relationships across 1,500 scenarios spanning household and industrial domains. Our systematic evaluation reveals severe performance degradation when models must reason from constraints.
arXiv Detail & Related papers (2025-08-07T17:54:15Z) - HySemRAG: A Hybrid Semantic Retrieval-Augmented Generation Framework for Automated Literature Synthesis and Methodological Gap Analysis [55.2480439325792]
HySemRAG is a framework that combines Extract, Transform, Load (ETL) pipelines with Retrieval-Augmented Generation (RAG). The system addresses limitations in existing RAG architectures through a multi-layered approach.
arXiv Detail & Related papers (2025-08-01T20:30:42Z) - AI4Research: A Survey of Artificial Intelligence for Scientific Research [55.5452803680643]
We present a comprehensive survey on AI for Research (AI4Research). We first introduce a systematic taxonomy to classify five mainstream tasks in AI4Research. We identify key research gaps and highlight promising future directions.
arXiv Detail & Related papers (2025-07-02T17:19:20Z) - ESGenius: Benchmarking LLMs on Environmental, Social, and Governance (ESG) and Sustainability Knowledge [53.18163869901266]
ESGenius is a benchmark for evaluating and enhancing the proficiency of Large Language Models (LLMs) in Environmental, Social and Governance (ESG). ESGenius comprises two key components: ESGenius-QA and ESGenius-Corpus.
arXiv Detail & Related papers (2025-06-02T13:19:09Z) - Nemotron-CrossThink: Scaling Self-Learning beyond Math Reasoning [66.43194385702297]
Large Language Models (LLMs) have shown strong reasoning capabilities, particularly when enhanced through Reinforcement Learning (RL).
We propose NEMOTRON-CROSSTHINK, a framework that systematically incorporates multi-domain corpora, including both synthetic and real-world question-answer pairs, into RL training to improve generalization across diverse reasoning tasks.
arXiv Detail & Related papers (2025-04-15T21:37:13Z) - Identifying Trustworthiness Challenges in Deep Learning Models for Continental-Scale Water Quality Prediction [69.38041171537573]
Water quality is foundational to environmental sustainability, ecosystem resilience, and public health. Deep learning offers transformative potential for large-scale water quality prediction and scientific insights generation. Their widespread adoption in high-stakes operational decision-making, such as pollution mitigation and equitable resource allocation, is prevented by unresolved trustworthiness challenges.
arXiv Detail & Related papers (2025-03-13T01:50:50Z) - How Metacognitive Architectures Remember Their Own Thoughts: A Systematic Review [16.35521789216079]
Metacognition has gained significant attention for its potential to enhance autonomy and adaptability of artificial agents. Existing overviews remain at a conceptual level that is undiscerning to the underlying algorithms, representations, and their respective success.
arXiv Detail & Related papers (2025-02-28T08:48:41Z) - Neuro-Symbolic AI in 2024: A Systematic Review [0.29260385019352086]
The review followed the PRISMA methodology, utilizing databases such as IEEE Xplore, Google Scholar, arXiv, ACM, and SpringerLink.
From an initial pool of 1,428 papers, 167 met the inclusion criteria and were analyzed in detail.
The majority of research efforts are concentrated in the areas of learning and inference, logic and reasoning, and knowledge representation.
arXiv Detail & Related papers (2025-01-09T18:48:35Z) - Generative Artificial Intelligence Meets Synthetic Aperture Radar: A Survey [49.29751866761522]
This paper aims to investigate the intersection of GenAI and SAR.
First, we illustrate the common data generation-based applications in the SAR field.
Then, the latest GenAI models are systematically reviewed.
Finally, the corresponding applications in the SAR domain are also included.
arXiv Detail & Related papers (2024-11-05T03:06:00Z) - AI for ERW Detection in Clearance Operations -- The State of Research [12.278116747610158]
This article provides a literature review of academic research on AI for ERW detection for clearance operations.
It finds that research can be grouped into two main streams, AI for ERW object detection and AI for ERW risk prediction.
We develop three opportunities for future research, including a call for renewed efforts in the use of AI for ERW risk prediction.
arXiv Detail & Related papers (2024-10-31T11:50:29Z) - Generative AI in Health Economics and Outcomes Research: A Taxonomy of Key Definitions and Emerging Applications, an ISPOR Working Group Report [12.204470166456561]
Generative AI shows significant potential in health economics and outcomes research (HEOR).
Generative AI shows significant potential in HEOR, enhancing efficiency, productivity, and offering novel solutions to complex challenges.
Foundation models are promising in automating complex tasks, though challenges remain in scientific reliability, bias, interpretability, and workflow integration.
arXiv Detail & Related papers (2024-10-26T15:42:50Z) - An Adaptive Framework for Generating Systematic Explanatory Answer in Online Q&A Platforms [62.878616839799776]
We propose SynthRAG, an innovative framework designed to enhance Question Answering (QA) performance.
SynthRAG improves on conventional models by employing adaptive outlines for dynamic content structuring.
An online deployment on the Zhihu platform revealed that SynthRAG's answers achieved notable user engagement.
arXiv Detail & Related papers (2024-10-23T09:14:57Z) - A Comprehensive Survey of Retrieval-Augmented Generation (RAG): Evolution, Current Landscape and Future Directions [0.0]
RAG combines retrieval mechanisms with generative language models to enhance the accuracy of outputs.
Recent research breakthroughs are discussed, highlighting novel methods for improving retrieval efficiency.
Future research directions are proposed, focusing on improving the robustness of RAG models.
arXiv Detail & Related papers (2024-10-03T22:29:47Z) - Heuristics and Biases in AI Decision-Making: Implications for Responsible AGI [0.0]
We investigate the presence of cognitive biases in three large language models (LLMs): GPT-4o, Gemma 2, and Llama 3.1.
The study uses 1,500 experiments across nine established cognitive biases to evaluate the models' responses and consistency.
arXiv Detail & Related papers (2024-09-26T05:34:00Z) - SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories [55.161075901665946]
SUPER aims to capture the realistic challenges faced by researchers working with Machine Learning (ML) and Natural Language Processing (NLP) research repositories.
Our benchmark comprises three distinct problem sets: 45 end-to-end problems with annotated expert solutions, 152 sub problems derived from the expert set that focus on specific challenges, and 602 automatically generated problems for larger-scale development.
We show that state-of-the-art approaches struggle to solve these problems, with the best model (GPT-4o) solving only 16.3% of the end-to-end set and 46.1% of the scenarios.
arXiv Detail & Related papers (2024-09-11T17:37:48Z) - How Mature is Requirements Engineering for AI-based Systems? A Systematic Mapping Study on Practices, Challenges, and Future Research Directions [5.6818729232602205]
It is unclear if existing RE methods are sufficient or if new ones are needed to address these challenges.
Existing RE4AI research focuses mainly on requirements analysis and elicitation, with most practices applied in these areas.
We identified requirements specification, explainability, and the gap between machine learning engineers and end-users as the most prevalent challenges.
arXiv Detail & Related papers (2024-09-11T11:28:16Z) - Generative AI Tools in Academic Research: Applications and Implications for Qualitative and Quantitative Research Methodologies [0.0]
This study examines the impact of Generative Artificial Intelligence (GenAI) on academic research, focusing on its application to qualitative and quantitative data analysis.
As GenAI tools evolve rapidly, they offer new possibilities for enhancing research productivity and democratising complex analytical processes.
Their integration into academic practice raises significant questions regarding research integrity and security, authorship, and the changing nature of scholarly work.
arXiv Detail & Related papers (2024-08-13T13:10:03Z) - OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI [73.75520820608232]
We introduce OlympicArena, which includes 11,163 bilingual problems across both text-only and interleaved text-image modalities.
These challenges encompass a wide range of disciplines spanning seven fields and 62 international Olympic competitions, rigorously examined for data leakage.
Our evaluations reveal that even advanced models like GPT-4o only achieve a 39.97% overall accuracy, illustrating current AI limitations in complex reasoning and multimodal integration.
arXiv Detail & Related papers (2024-06-18T16:20:53Z) - A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models [71.25225058845324]
Large Language Models (LLMs) have demonstrated revolutionary abilities in language understanding and generation.
Retrieval-Augmented Generation (RAG) can offer reliable and up-to-date external knowledge.
RA-LLMs have emerged to harness external and authoritative knowledge bases, rather than relying on the model's internal knowledge.
arXiv Detail & Related papers (2024-05-10T02:48:45Z) - A Survey on Retrieval-Augmented Text Generation for Large Language Models [1.4579344926652844]
Retrieval-Augmented Generation (RAG) merges retrieval methods with deep learning advancements.
This paper organizes the RAG paradigm into four categories: pre-retrieval, retrieval, post-retrieval, and generation.
It outlines RAG's evolution and discusses the field's progression through the analysis of significant studies.
arXiv Detail & Related papers (2024-04-17T01:27:42Z) - Generative AI Agent for Next-Generation MIMO Design: Fundamentals, Challenges, and Vision [76.4345564864002]
Next-generation multiple input multiple output (MIMO) is expected to be intelligent and scalable.
We propose the concept of the generative AI agent, which is capable of generating tailored and specialized contents.
We present two compelling case studies that demonstrate the effectiveness of leveraging the generative AI agent for performance analysis.
arXiv Detail & Related papers (2024-04-13T02:39:36Z) - Bridging Evolutionary Algorithms and Reinforcement Learning: A Comprehensive Survey on Hybrid Algorithms [50.91348344666895]
Evolutionary Reinforcement Learning (ERL) integrates Evolutionary Algorithms (EAs) and Reinforcement Learning (RL) for optimization.
This survey offers a comprehensive overview of the diverse research branches in ERL.
arXiv Detail & Related papers (2024-01-22T14:06:37Z) - AI in Supply Chain Risk Assessment: A Systematic Literature Review and Bibliometric Analysis [0.0]
This study examines 1,903 articles from Google Scholar and Web of Science, with 54 studies selected through PRISMA guidelines.
Our findings reveal that ML models, including Random Forest, XGBoost, and hybrid approaches, significantly enhance risk prediction accuracy and adaptability in post-pandemic contexts.
The study underscores the necessity of dynamic strategies, interdisciplinary collaboration, and continuous model evaluation to address challenges such as data quality and interpretability.
arXiv Detail & Related papers (2023-12-12T17:47:51Z) - Classification, Challenges, and Automated Approaches to Handle Non-Functional Requirements in ML-Enabled Systems: A Systematic Literature Review [10.09767622002672]
We propose a systematic literature review targeting two key aspects: the classification of the non-functional requirements investigated so far, and the challenges to be faced when developing models in ML-enabled systems.
We report that current research identified 30 different non-functional requirements, which can be grouped into six main classes.
We also compiled a catalog of more than 23 software engineering challenges, based on which further research should consider the nonfunctional requirements of machine learning-enabled systems.
arXiv Detail & Related papers (2023-11-29T09:45:41Z) - A Survey on Interpretable Cross-modal Reasoning [64.37362731950843]
Cross-modal reasoning (CMR) has emerged as a pivotal area with applications spanning from multimedia analysis to healthcare diagnostics.
This survey delves into the realm of interpretable cross-modal reasoning (I-CMR).
This survey presents a comprehensive overview of the typical methods with a three-level taxonomy for I-CMR.
arXiv Detail & Related papers (2023-09-05T05:06:48Z) - Reinforcement Learning-assisted Evolutionary Algorithm: A Survey and Research Opportunities [63.258517066104446]
Reinforcement learning integrated as a component in the evolutionary algorithm has demonstrated superior performance in recent years.
We discuss the RL-EA integration method, the RL-assisted strategy adopted by RL-EA, and its applications according to the existing literature.
In the applications of RL-EA section, we also demonstrate the excellent performance of RL-EA on several benchmarks and a range of public datasets.
arXiv Detail & Related papers (2023-08-25T15:06:05Z) - How Robust is GPT-3.5 to Predecessors? A Comprehensive Study on Language Understanding Tasks [65.7949334650854]
GPT-3.5 models have demonstrated impressive performance in various Natural Language Processing (NLP) tasks.
However, their robustness and abilities to handle various complexities of the open world have yet to be explored.
We show that GPT-3.5 faces some specific robustness challenges, including instability, prompt sensitivity, and number sensitivity.
arXiv Detail & Related papers (2023-03-01T07:39:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.