Reinforcement Learning Integrated Agentic RAG for Software Test Cases Authoring
- URL: http://arxiv.org/abs/2512.06060v1
- Date: Fri, 05 Dec 2025 17:52:26 GMT
- Title: Reinforcement Learning Integrated Agentic RAG for Software Test Cases Authoring
- Authors: Mohanakrishnan Hariharan
- Abstract summary: This paper introduces a framework that integrates reinforcement learning (RL) with autonomous agents to enable continuous improvement in the automated authoring of software test cases from business requirement documents within Quality Engineering (QE) workflows. The proposed Reinforcement Infused Agentic RAG (Retrieve, Augment, Generate) framework overcomes the static-knowledge-base limitation of conventional LLM pipelines by employing AI agents that learn from QE feedback, assessments, and defect discovery outcomes to automatically improve their test case generation strategies.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: This paper introduces a framework that integrates reinforcement learning (RL) with autonomous agents to enable continuous improvement in the automated authoring of software test cases from business requirement documents within Quality Engineering (QE) workflows. Conventional systems employing Large Language Models (LLMs) generate test cases from static knowledge bases, which fundamentally limits their capacity to improve over time. Our proposed Reinforcement Infused Agentic RAG (Retrieve, Augment, Generate) framework overcomes this limitation by employing AI agents that learn from QE feedback, assessments, and defect discovery outcomes to automatically improve their test case generation strategies. The system combines specialized agents with a hybrid vector-graph knowledge base that stores and retrieves software testing knowledge. Through advanced RL algorithms, specifically Proximal Policy Optimization (PPO) and Deep Q-Networks (DQN), these agents optimize their behavior based on QE-reported test effectiveness, defect detection rates, and workflow metrics. As QEs execute AI-generated test cases and provide feedback, the system learns from this expert guidance to improve future iterations. Experimental validation on enterprise Apple projects yielded substantive improvements: a 2.4-percentage-point increase in test generation accuracy (from 94.8% to 97.2%) and a 10.8% improvement in defect detection rates. The framework establishes a continuous knowledge refinement loop driven by QE expertise, resulting in progressively superior test case quality that enhances, rather than replaces, human testing capabilities.
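The loop the abstract describes, where agents pick a generation strategy, QEs score the results, and the policy is updated, can be illustrated with a small sketch. The following is a minimal stand-in, not the paper's implementation: it substitutes a tabular epsilon-greedy Q-learning update for the PPO/DQN training the paper uses, and every name in it (QEFeedback, STRATEGIES, the reward weights) is a hypothetical assumption.

```python
# Minimal sketch of the QE feedback loop from the abstract, NOT the paper's
# actual implementation: tabular Q-learning stands in for PPO/DQN, and all
# names, weights, and strategies below are illustrative assumptions.
import random
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class QEFeedback:
    """Signals the abstract says the agents learn from (each scaled to 0..1)."""
    test_effectiveness: float     # QE assessment of the generated test cases
    defect_detection_rate: float  # share of known defects the tests caught
    workflow_score: float         # QE-reported workflow metric

def reward(fb: QEFeedback, w=(0.4, 0.4, 0.2)) -> float:
    # Scalar reward combining the three signals; the weights are made up.
    return (w[0] * fb.test_effectiveness
            + w[1] * fb.defect_detection_rate
            + w[2] * fb.workflow_score)

# Hypothetical test-generation strategies the agent chooses between.
STRATEGIES = ["boundary_value", "equivalence_class", "scenario_based"]
q_values = defaultdict(float)  # one Q-value per strategy (stateless simplification)
ALPHA, EPSILON = 0.1, 0.2      # learning rate and exploration rate

def pick_strategy() -> str:
    # Epsilon-greedy action selection, as in standard Q-learning.
    if random.random() < EPSILON:
        return random.choice(STRATEGIES)
    return max(STRATEGIES, key=lambda s: q_values[s])

def update(strategy: str, fb: QEFeedback) -> None:
    # One-step update toward the observed reward; a DQN would instead regress
    # a neural network on (state, action, reward) samples.
    r = reward(fb)
    q_values[strategy] += ALPHA * (r - q_values[strategy])

# One iteration: generate tests with the chosen strategy (omitted here),
# collect QE feedback on the executed suite, then reinforce the strategy.
chosen = pick_strategy()
update(chosen, QEFeedback(0.95, 0.7, 0.8))
```

The point of the sketch is the reward source: the QE-reported effectiveness and defect-detection signals named in the abstract are what drive the update, so each executed suite nudges the agent toward strategies that QEs found effective.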
Related papers
- Test-time Recursive Thinking: Self-Improvement without External Feedback
Test-time Recursive Thinking (TRT) is an iterative self-improvement framework. Open-source models reach 100% accuracy on AIME-25/24, and on LiveCodeBench's most difficult problems, closed-source models improve by 10.4-14.8 percentage points without external feedback.
arXiv Detail & Related papers (2026-02-03T04:37:37Z)
- EmboCoach-Bench: Benchmarking AI Agents on Developing Embodied Robots
Embodied AI is fueled by high-fidelity simulation and large-scale data collection. However, this scaling capability remains bottlenecked by a reliance on labor-intensive manual oversight. We introduce EmboCoach-Bench, a benchmark evaluating the capacity of LLM agents to autonomously engineer embodied policies.
arXiv Detail & Related papers (2026-01-29T11:33:49Z)
- Inference-Time Scaling of Verification: Self-Evolving Deep Research Agents via Test-Time Rubric-Guided Verification
Recent advances in Deep Research Agents (DRAs) are transforming automated knowledge discovery and problem-solving. We propose an alternative paradigm: self-evolving the agent's ability by iteratively verifying the policy model's outputs, guided by meticulously crafted rubrics. We present DeepVerifier, a rubric-based outcome reward verifier that leverages the asymmetry of verification.
arXiv Detail & Related papers (2026-01-22T09:47:31Z)
- The Rise of Agentic Testing: Multi-Agent Systems for Robust Software Quality Assurance
Current AI-based test generators produce invalid, redundant, or non-executable tests due to a lack of execution-aware feedback. This paper introduces a closed-loop, self-correcting system in which a Test Generation Agent, an Execution and Analysis Agent, and a Review and Optimization Agent collaboratively generate, execute, analyze, and refine tests.
arXiv Detail & Related papers (2026-01-05T18:20:14Z)
- SelfAI: Building a Self-Training AI System with LLM Agents
SelfAI is a general multi-agent platform that combines a User Agent, which translates high-level research objectives into standardized experimental configurations, with an Experiment Manager that orchestrates parallel, fault-tolerant training across heterogeneous hardware while maintaining a structured knowledge base for continuous feedback. Across regression, computer vision, scientific computing, medical imaging, and drug discovery benchmarks, SelfAI consistently achieves strong performance and reduces redundant trials.
arXiv Detail & Related papers (2025-11-29T09:18:39Z)
- Agentic RAG for Software Testing with Hybrid Vector-Graph and Multi-Agent Orchestration
We present an approach to software testing automation using Agentic Retrieval-Augmented Generation (RAG) systems for Quality Engineering (QE) artifact creation. We combine autonomous AI agents with hybrid vector-graph knowledge systems to automate test plan, test case, and QE metric generation.
arXiv Detail & Related papers (2025-10-12T22:25:15Z)
- Breaking Barriers in Software Testing: The Power of AI-Driven Automation
This paper presents an AI-driven framework that automates test case generation and validation using natural language processing (NLP), reinforcement learning (RL), and predictive models, embedded within a policy-driven trust and fairness model. Case studies demonstrate measurable gains in defect detection, reduced testing effort, and faster release cycles, showing that AI-enhanced testing improves both efficiency and reliability.
arXiv Detail & Related papers (2025-08-22T01:04:50Z)
- Rethinking Verification for LLM Code Generation: From Generation to Testing
Large language models (LLMs) have recently achieved notable success in code-generation benchmarks such as HumanEval and LiveCodeBench. We propose new multi-dimensional metrics designed to rigorously quantify test-suite quality. Experiments show that SAGA achieves a detection rate of 90.62% and a verifier accuracy of 32.58% on TCGBench.
arXiv Detail & Related papers (2025-07-09T14:58:47Z)
- AI-Driven Tools in Modern Software Quality Assurance: An Assessment of Benefits, Challenges, and Future Directions
The research aims to assess the benefits, challenges, and prospects of integrating modern AI-oriented tools into quality assurance processes. It demonstrates AI's transformative potential for QA but highlights the importance of a strategic approach to implementing these technologies.
arXiv Detail & Related papers (2025-06-19T20:22:47Z)
- Scoring Verifiers: Evaluating Synthetic Verification for Code and Reasoning
We propose an approach that transforms existing coding benchmarks into scoring and ranking datasets to evaluate the effectiveness of synthetic verifiers. We release four new benchmarks (HE-R, HE-R+, MBPP-R, and MBPP-R+) and analyze synthetic verification methods with standard, reasoning-based, and reward-based LLMs. Our experiments show that reasoning can significantly improve test case generation and that scaling the number of test cases enhances verification accuracy.
arXiv Detail & Related papers (2025-02-19T15:32:11Z)
- AutoPT: How Far Are We from the End2End Automated Web Penetration Testing?
We introduce AutoPT, an automated penetration testing agent based on the principle of PSM driven by LLMs.
Our results show that AutoPT outperforms the baseline framework ReAct on the GPT-4o mini model.
arXiv Detail & Related papers (2024-11-02T13:24:30Z)
- The Future of Software Testing: AI-Powered Test Case Generation and Validation
This paper explores the transformative potential of AI in improving test case generation and validation. It focuses on its ability to enhance efficiency, accuracy, and scalability in testing processes. It also addresses key challenges associated with adapting AI for testing, including the need for high-quality training data.
arXiv Detail & Related papers (2024-09-09T17:12:40Z)
- Position: AI Evaluation Should Learn from How We Test Humans
We argue that psychometrics, a theory originating in the 20th century for human assessment, could be a powerful solution to the challenges in today's AI evaluations.
arXiv Detail & Related papers (2023-06-18T09:54:33Z)
- SUPERNOVA: Automating Test Selection and Defect Prevention in AAA Video Games Using Risk Based Testing and Machine Learning
Testing video games is an increasingly difficult task as traditional methods fail to scale with growing software systems.
We present SUPERNOVA, a system responsible for test selection and defect prevention while also functioning as an automation hub.
Its direct impact has been a reduction of 55% or more in testing hours for an undisclosed sports game title.
arXiv Detail & Related papers (2022-03-10T00:47:46Z)