AI based Multiagent Approach for Requirements Elicitation and Analysis
- URL: http://arxiv.org/abs/2409.00038v1
- Date: Sun, 18 Aug 2024 07:23:12 GMT
- Title: AI based Multiagent Approach for Requirements Elicitation and Analysis
- Authors: Malik Abdul Sami, Muhammad Waseem, Zheying Zhang, Zeeshan Rasheed, Kari Systä, Pekka Abrahamsson,
- Abstract summary: This study empirically investigates the effectiveness of utilizing Large Language Models (LLMs) to automate requirements analysis tasks.
We deployed four models, namely GPT-3.5, GPT-4 Omni, LLaMA3-70, and Mixtral-8B, and conducted experiments to analyze requirements on four real-world projects.
Preliminary results indicate notable variations in task completion among the models.
- Score: 3.9422957660677476
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Requirements Engineering (RE) plays a pivotal role in software development, encompassing tasks such as requirements elicitation, analysis, specification, and change management. Despite its critical importance, RE faces challenges including communication complexities, early-stage uncertainties, and accurate resource estimation. This study empirically investigates the effectiveness of utilizing Large Language Models (LLMs) to automate requirements analysis tasks. We implemented a multi-agent system that deploys AI models as agents to generate user stories from initial requirements, assess and improve their quality, and prioritize them using a selected technique. In our implementation, we deployed four models, namely GPT-3.5, GPT-4 Omni, LLaMA3-70, and Mixtral-8B, and conducted experiments to analyze requirements on four real-world projects. We evaluated the results by analyzing the semantic similarity and API performance of different models, as well as their effectiveness and efficiency in requirements analysis, gathering users' feedback on their experiences. Preliminary results indicate notable variations in task completion among the models. Mixtral-8B provided the quickest responses, while GPT-3.5 performed exceptionally well when processing complex user stories with a higher similarity score, demonstrating its capability in deriving accurate user stories from project descriptions. Feedback and suggestions from the four project members further corroborate the effectiveness of LLMs in improving and streamlining RE phases.
Related papers
- Automatic Evaluation for Text-to-image Generation: Task-decomposed Framework, Distilled Training, and Meta-evaluation Benchmark [62.58869921806019]
We propose a task decomposition evaluation framework based on GPT-4o to automatically construct a new training dataset.
We design innovative training strategies to effectively distill GPT-4o's evaluation capabilities into a 7B open-source MLLM, MiniCPM-V-2.6.
Experimental results demonstrate that our distilled open-source MLLM significantly outperforms the current state-of-the-art GPT-4o-base baseline.
arXiv Detail & Related papers (2024-11-23T08:06:06Z) - MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs [97.94579295913606]
Multimodal Large Language Models (MLLMs) have garnered increased attention from both industry and academia.
In the development process, evaluation is critical since it provides intuitive feedback and guidance on improving models.
This work aims to offer researchers an easy grasp of how to effectively evaluate MLLMs according to different needs and to inspire better evaluation methods.
arXiv Detail & Related papers (2024-11-22T18:59:54Z) - Star-Agents: Automatic Data Optimization with LLM Agents for Instruction Tuning [71.2981957820888]
We propose a novel Star-Agents framework, which automates the enhancement of data quality across datasets.
The framework initially generates diverse instruction data with multiple LLM agents through a bespoke sampling method.
The generated data undergo a rigorous evaluation using a dual-model method that assesses both difficulty and quality.
arXiv Detail & Related papers (2024-11-21T02:30:53Z) - SWE-Search: Enhancing Software Agents with Monte Carlo Tree Search and Iterative Refinement [18.84439000902905]
SWE-Search is a multi-agent framework that integrates Monte Carlo Tree Search (MCTS) with a self-improvement mechanism to enhance software agents' performance.
This work highlights the potential of self-evaluation driven search techniques to enhance agent reasoning and planning in complex, dynamic software engineering environments.
arXiv Detail & Related papers (2024-10-26T22:45:56Z) - Large Language Model Evaluation Via Multi AI Agents: Preliminary results [3.8066447473175304]
We introduce a novel multi-agent AI model that aims to assess and compare the performance of various Large Language Models (LLMs)
Our model consists of eight distinct AI agents, each responsible for retrieving code based on a common description from different advanced language models.
We integrate the HumanEval benchmark into our verification agent to assess the generated code's performance, providing insights into their respective capabilities and efficiencies.
arXiv Detail & Related papers (2024-04-01T10:06:04Z) - Enhancing Robotic Manipulation with AI Feedback from Multimodal Large
Language Models [41.38520841504846]
Large language models (LLMs) can provide automated preference feedback solely from image inputs to guide decision-making.
In this study, we train a multimodal LLM, termed CriticGPT, capable of understanding trajectory videos in robot manipulation tasks.
Experimental evaluation of the algorithm's preference accuracy demonstrates its effective generalization ability to new tasks.
Performance on Meta-World tasks reveals that CriticGPT's reward model efficiently guides policy learning, surpassing rewards based on state-of-the-art pre-trained representation models.
arXiv Detail & Related papers (2024-02-22T03:14:03Z) - Large Language Models as Automated Aligners for benchmarking
Vision-Language Models [48.4367174400306]
Vision-Language Models (VLMs) have reached a new level of sophistication, showing notable competence in executing intricate cognition and reasoning tasks.
Existing evaluation benchmarks, primarily relying on rigid, hand-crafted datasets, face significant limitations in assessing the alignment of these increasingly anthropomorphic models with human intelligence.
In this work, we address the limitations via Auto-Bench, which delves into exploring LLMs as proficient curation, measuring the alignment betweenVLMs and human intelligence and value through automatic data curation and assessment.
arXiv Detail & Related papers (2023-11-24T16:12:05Z) - Generative Judge for Evaluating Alignment [84.09815387884753]
We propose a generative judge with 13B parameters, Auto-J, designed to address these challenges.
Our model is trained on user queries and LLM-generated responses under massive real-world scenarios.
Experimentally, Auto-J outperforms a series of strong competitors, including both open-source and closed-source models.
arXiv Detail & Related papers (2023-10-09T07:27:15Z) - GPT-Based Models Meet Simulation: How to Efficiently Use Large-Scale
Pre-Trained Language Models Across Simulation Tasks [0.0]
This paper is the first examination regarding the use of large-scale pre-trained language models for scientific simulations.
The first task is devoted to explaining the structure of a conceptual model to promote the engagement of participants.
The second task focuses on summarizing simulation outputs, so that model users can identify a preferred scenario.
The third task seeks to broaden accessibility to simulation platforms by conveying the insights of simulation visualizations via text.
arXiv Detail & Related papers (2023-06-21T15:42:36Z) - Zero-shot Item-based Recommendation via Multi-task Product Knowledge
Graph Pre-Training [106.85813323510783]
This paper presents a novel paradigm for the Zero-Shot Item-based Recommendation (ZSIR) task.
It pre-trains a model on product knowledge graph (PKG) to refine the item features from PLMs.
We identify three challenges for pre-training PKG, which are multi-type relations in PKG, semantic divergence between item generic information and relations and domain discrepancy from PKG to downstream ZSIR task.
arXiv Detail & Related papers (2023-05-12T17:38:24Z) - Empirical Evaluation of ChatGPT on Requirements Information Retrieval
Under Zero-Shot Setting [12.733403458944972]
We empirically evaluate ChatGPT's performance on requirements information retrieval tasks.
Under zero-shot setting, evaluation results reveal ChatGPT's promising ability to retrieve requirements relevant information.
arXiv Detail & Related papers (2023-04-25T04:09:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.