From Code Generation to Software Testing: AI Copilot with Context-Based RAG
- URL: http://arxiv.org/abs/2504.01866v2
- Date: Sat, 05 Apr 2025 09:03:33 GMT
- Title: From Code Generation to Software Testing: AI Copilot with Context-Based RAG
- Authors: Yuchen Wang, Shangxin Guo, Chee Wei Tan
- Abstract summary: We propose a novel perspective on software testing by positing bug detection and coding with fewer bugs as two interconnected problems. We introduce Copilot for Testing, an automated testing system that synchronizes bug detection with codebase updates. Our evaluation demonstrates a 31.2% improvement in bug detection accuracy, a 12.6% increase in critical test coverage, and a 10.5% higher user acceptance rate.
- Score: 8.28588489551341
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rapid pace of large-scale software development places increasing demands on traditional testing methodologies, often leading to bottlenecks in efficiency, accuracy, and coverage. We propose a novel perspective on software testing by positing bug detection and coding with fewer bugs as two interconnected problems that share a common goal, which is reducing bugs with limited resources. We extend our previous work on AI-assisted programming, which supports code auto-completion and chatbot-powered Q&A, to the realm of software testing. We introduce Copilot for Testing, an automated testing system that synchronizes bug detection with codebase updates, leveraging context-based Retrieval Augmented Generation (RAG) to enhance the capabilities of large language models (LLMs). Our evaluation demonstrates a 31.2% improvement in bug detection accuracy, a 12.6% increase in critical test coverage, and a 10.5% higher user acceptance rate, highlighting the transformative potential of AI-driven technologies in modern software development practices.
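The abstract does not include an implementation, but the retrieval step of context-based RAG for testing can be pictured with a minimal sketch: rank codebase files by similarity to the changed code, then assemble the top matches into an augmented prompt for test generation. The bag-of-words similarity and all function names below are illustrative assumptions, not the authors' system; the LLM call itself is left out.

```python
# Minimal sketch of context-based RAG for test generation (assumption:
# bag-of-words cosine similarity stands in for code-aware embeddings).
import math
from collections import Counter

def tokenize(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve_context(changed_code: str, codebase: dict[str, str], k: int = 3) -> list[str]:
    """Rank codebase files by similarity to the changed code; keep the top k."""
    query = tokenize(changed_code)
    ranked = sorted(codebase, key=lambda f: cosine(query, tokenize(codebase[f])), reverse=True)
    return ranked[:k]

def build_test_prompt(changed_code: str, codebase: dict[str, str]) -> str:
    """Assemble the augmented prompt a test-generating LLM would receive."""
    context = "\n\n".join(f"# {f}\n{codebase[f]}" for f in retrieve_context(changed_code, codebase))
    return f"Relevant context:\n{context}\n\nWrite unit tests for:\n{changed_code}"
```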
Related papers
- Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute [61.00662702026523]
We propose a unified Test-Time Compute (TTC) scaling framework that leverages increased inference-time computation instead of larger models. Our framework incorporates two complementary strategies: internal TTC and external TTC. We demonstrate that our 32B model achieves a 46% issue resolution rate, surpassing significantly larger models such as DeepSeek R1 671B and OpenAI o1.
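The summary does not spell out the two strategies; a common form of external test-time compute scaling is best-of-N sampling with a verifier, sketched here under that assumption. `sample_patch` and `score_patch` are hypothetical stand-ins for the model and the verifier (e.g., a test suite), not the paper's API.

```python
# Hedged sketch of external test-time compute scaling as best-of-N sampling:
# spend more inference-time compute (n samples from the same model) rather
# than using a larger model, and keep the candidate the verifier scores best.
from typing import Callable

def best_of_n(issue: str,
              sample_patch: Callable[[str], str],   # hypothetical model call
              score_patch: Callable[[str], float],  # hypothetical verifier score
              n: int = 8) -> str:
    candidates = [sample_patch(issue) for _ in range(n)]
    return max(candidates, key=score_patch)
```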
arXiv Detail & Related papers (2025-03-31T07:31:32Z)
- Learning to Solve and Verify: A Self-Play Framework for Code and Test Generation [69.62857948698436]
Recent advances in large language models (LLMs) have improved their performance on coding benchmarks.
However, improvement is plateauing due to the exhaustion of readily available high-quality data.
We propose Sol-Ver, a self-play solver-verifier framework that jointly improves a single model's code and test generation capacity.
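Sol-Ver's training recipe is not given in the summary; the sketch below shows one plausible reading of a solver-verifier self-play round, where the same model writes code and tests for a task and only mutually consistent pairs are kept for further fine-tuning. All callables are hypothetical stubs.

```python
# Illustrative self-play round: keep (task, code, tests) triples where the
# generated code passes the generated tests, then fine-tune on them.
from typing import Callable

def self_play_round(tasks: list[str],
                    gen_code: Callable[[str], str],     # solver (stub)
                    gen_tests: Callable[[str], str],    # verifier-as-generator (stub)
                    passes: Callable[[str, str], bool]  # sandboxed test runner (stub)
                    ) -> list[tuple[str, str, str]]:
    accepted = []
    for task in tasks:
        code, tests = gen_code(task), gen_tests(task)
        if passes(code, tests):  # solver and verifier agree
            accepted.append((task, code, tests))
    return accepted  # training data for the next fine-tuning step
```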
arXiv Detail & Related papers (2025-02-20T18:32:19Z)
- Comparative Analysis of Quantum and Classical Support Vector Classifiers for Software Bug Prediction: An Exploratory Study [8.214986715680737]
We investigate the practical application of Quantum Support Vector Classifiers (QSVC) for detecting buggy software commits. Our technique addresses large datasets in QSVC algorithms by dividing them into smaller subsets. We propose an aggregation method that combines the predictions from these subset models to classify the entire test dataset.
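The abstract names the divide-and-aggregate strategy without details; the sketch below illustrates it with a classical scikit-learn SVC standing in for the quantum classifier, and majority voting as an assumed aggregation rule.

```python
# Divide-and-aggregate sketch: one classifier per data subset, predictions
# combined by majority vote. sklearn's SVC is a classical stand-in for QSVC;
# each subset is assumed to contain both classes (buggy / clean commits).
import numpy as np
from sklearn.svm import SVC

def train_subset_models(X: np.ndarray, y: np.ndarray, n_subsets: int = 4) -> list[SVC]:
    return [SVC(kernel="rbf").fit(Xs, ys)
            for Xs, ys in zip(np.array_split(X, n_subsets), np.array_split(y, n_subsets))]

def aggregate_predict(models: list[SVC], X_test: np.ndarray) -> np.ndarray:
    votes = np.stack([m.predict(X_test) for m in models])  # shape: (n_models, n_samples)
    return (votes.mean(axis=0) >= 0.5).astype(int)         # majority vote on 0/1 labels
```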
arXiv Detail & Related papers (2025-01-08T18:53:50Z)
- What You See Is What You Get: Attention-based Self-guided Automatic Unit Test Generation [3.8244417073114003]
We propose the Attention-based Self-guided Automatic Unit Test GenERation (AUGER) approach. AUGER contains two stages: defect detection and error triggering. It improves F1-score and Precision in defect detection by 4.7% to 35.3%, and it can trigger 23 to 84 more errors than state-of-the-art (SOTA) approaches in unit test generation.
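Only the two-stage structure is described in the summary; a skeleton of that pipeline, with hypothetical placeholders for the attention-based scorer and the test generator, might look like:

```python
# Two-stage skeleton matching AUGER's description: flag likely-defective
# methods first, then generate error-triggering tests only for those.
# Both stage implementations are hypothetical stand-ins.
from typing import Callable

def two_stage_testgen(methods: dict[str, str],
                      defect_score: Callable[[str], float],  # stage 1: defect detection
                      gen_error_test: Callable[[str], str],  # stage 2: error triggering
                      threshold: float = 0.5) -> dict[str, str]:
    suspicious = {name: src for name, src in methods.items()
                  if defect_score(src) >= threshold}
    return {name: gen_error_test(src) for name, src in suspicious.items()}
```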
arXiv Detail & Related papers (2024-12-01T14:28:48Z)
- Lingma SWE-GPT: An Open Development-Process-Centric Language Model for Automated Software Improvement [62.94719119451089]
The Lingma SWE-GPT series learns from and simulates real-world code submission activities.
Lingma SWE-GPT 72B resolves 30.20% of GitHub issues, marking a significant improvement in automatic issue resolution.
arXiv Detail & Related papers (2024-11-01T14:27:16Z)
- The Future of Software Testing: AI-Powered Test Case Generation and Validation [0.0]
This paper explores the transformative potential of AI in improving test case generation and validation.
It focuses on its ability to enhance efficiency, accuracy, and scalability in testing processes.
It also addresses key challenges associated with adapting AI for testing, including the need for high-quality training data.
arXiv Detail & Related papers (2024-09-09T17:12:40Z)
- Agent-Driven Automatic Software Improvement [55.2480439325792]
This research proposal aims to explore innovative solutions by focusing on the deployment of agents powered by Large Language Models (LLMs).
The iterative nature of agents, which allows for continuous learning and adaptation, can help surpass common challenges in code generation.
We aim to use the iterative feedback in these systems to further fine-tune the LLMs underlying the agents, so that they become better aligned to the task of automated software improvement.
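The proposal's iterative feedback loop can be sketched generically: the agent proposes a patch, runs the tests, and feeds any failure log back into the next attempt. `propose_patch` and `run_tests` are assumed interfaces, not a published API.

```python
# Generic agent improvement loop: iterate until the tests pass or the
# attempt budget is spent; failure logs become context for the next try.
from typing import Callable, Optional, Tuple

def improvement_loop(task: str,
                     propose_patch: Callable[[str, str], str],      # LLM agent (stub)
                     run_tests: Callable[[str], Tuple[bool, str]],  # test harness (stub)
                     max_iters: int = 5) -> Optional[str]:
    feedback = ""
    for _ in range(max_iters):
        patch = propose_patch(task, feedback)
        ok, log = run_tests(patch)
        if ok:
            return patch  # success; the trajectory can also serve as a fine-tuning signal
        feedback = log    # iterative feedback drives the next proposal
    return None
```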
arXiv Detail & Related papers (2024-06-24T15:45:22Z)
- Leveraging Large Language Models for Efficient Failure Analysis in Game Development [47.618236610219554]
This paper proposes a new approach to automatically identify which change in the code caused a test to fail.
The method leverages Large Language Models (LLMs) to associate error messages with the corresponding code changes causing the failure.
Our approach reaches an accuracy of 71% in our newly created dataset, which comprises issues reported by developers at EA over a period of one year.
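The association step can be pictured as a ranking prompt: show the LLM the error message and the candidate changes, then ask which change is responsible. The prompt wording and the `ask_llm` client below are assumptions, not the paper's or EA's tooling.

```python
# Hedged sketch of LLM-based failure analysis: ask the model to pick the
# code change most likely to have caused a failing test's error message.
from typing import Callable

def blame_change(error_message: str,
                 changes: list[str],              # e.g., recent commit diffs
                 ask_llm: Callable[[str], str]) -> int:
    numbered = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(changes))
    prompt = (f"A test failed with this error:\n{error_message}\n\n"
              f"Candidate code changes:\n{numbered}\n\n"
              "Reply with only the index of the change most likely responsible.")
    return int(ask_llm(prompt).strip())  # assumes the model obeys the format
```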
arXiv Detail & Related papers (2024-06-11T09:21:50Z)
- Comparing Software Developers with ChatGPT: An Empirical Investigation [0.0]
This paper conducts an empirical investigation, contrasting the performance of software engineers and AI systems, like ChatGPT, across different evaluation metrics.
The paper posits that a comprehensive comparison of software engineers and AI-based solutions, considering various evaluation criteria, is pivotal in fostering human-machine collaboration.
arXiv Detail & Related papers (2023-05-19T17:25:54Z)
- SUPERNOVA: Automating Test Selection and Defect Prevention in AAA Video Games Using Risk Based Testing and Machine Learning [62.997667081978825]
Testing video games is an increasingly difficult task as traditional methods fail to scale with growing software systems.
We present SUPERNOVA, a system responsible for test selection and defect prevention while also functioning as an automation hub.
The direct impact has been a reduction of 55% or more in testing hours for an undisclosed sports game title.
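SUPERNOVA's risk model is not published, but risk-based test selection itself can be shown with a toy heuristic: score each test by the churn and historical failure rate of the code it covers, and run only the riskiest slice of the suite. The churn-times-failure-rate formula is a common heuristic, not SUPERNOVA's actual model.

```python
# Toy risk-based test selection under an assumed risk = churn * fail_rate score.
def select_tests(tests: dict[str, dict], budget: int) -> list[str]:
    """tests maps test name -> {'churn': int, 'fail_rate': float}."""
    risk = {name: m["churn"] * m["fail_rate"] for name, m in tests.items()}
    return sorted(risk, key=risk.get, reverse=True)[:budget]

suite = {"test_physics": {"churn": 12, "fail_rate": 0.30},
         "test_ui":      {"churn": 3,  "fail_rate": 0.05},
         "test_netcode": {"churn": 20, "fail_rate": 0.10}}
print(select_tests(suite, budget=2))  # -> ['test_physics', 'test_netcode']
```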
arXiv Detail & Related papers (2022-03-10T00:47:46Z)