Related papers: Cast: Automated Resilience Testing for Production Cloud Service Systems

URL: http://arxiv.org/abs/2602.00972v1
Date: Sun, 01 Feb 2026 02:29:25 GMT
Title: Cast: Automated Resilience Testing for Production Cloud Service Systems
Authors: Zhuangbin Chen, Zhiling Deng, Kaiming Zhang, Yang Liu, Cheng Cui, Jinfeng Zhong, Zibin Zheng
Abstract summary: We present Cast, an automated, end-to-end framework for microservice resilience testing in production. It achieves high test fidelity by replaying production traffic against a comprehensive library of application-level faults. Cast has been adopted by many service teams to proactively address resilience vulnerabilities.
Score: 38.54479293660192
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: The distributed nature of microservice architecture introduces significant resilience challenges. Traditional testing methods, limited by extensive manual effort and oversimplified test environments, fail to capture production system complexity. To address these limitations, we present Cast, an automated, end-to-end framework for microservice resilience testing in production. It achieves high test fidelity by replaying production traffic against a comprehensive library of application-level faults to exercise internal error-handling logic. To manage the combinatorial test space, Cast employs a complexity-driven strategy to systematically prune redundant tests and prioritize high-value tests targeting the most critical service execution paths. Cast automates the testing lifecycle through a three-phase pipeline (i.e., startup, fault injection, and recovery) and uses a multi-faceted oracle to automatically verify system resilience against nuanced criteria. Deployed in Huawei Cloud for over eight months, Cast has been adopted by many service teams to proactively address resilience vulnerabilities. Our analysis on four large-scale applications with millions of traces reveals 137 potential vulnerabilities, with 89 confirmed by developers. To further quantify its performance, Cast is evaluated on a benchmark set of 48 reproduced bugs, achieving a high coverage of 90%. The results show that Cast is a practical and effective solution for systematically improving the reliability of industrial microservice systems.

Related papers

ProbeLLM: Automating Principled Diagnosis of LLM Failures [89.44131968886184]
We propose ProbeLLM, a benchmark-agnostic automated probing framework that elevates weakness discovery from individual failures to structured failure modes. By restricting probing to verifiable test cases and leveraging tool-augmented generation and verification, ProbeLLM grounds failure discovery in reliable evidence.
arXiv Detail & Related papers (2026-02-13T14:33:13Z)
Scaling Mobile Chaos Testing with AI-Driven Test Execution [2.7786234871633995]
Mobile applications in large-scale distributed systems are susceptible to backend service failures. Traditional chaos engineering approaches cannot scale mobile testing due to explosion of flows, locations, and failure scenarios. We present an automated mobile chaos testing system that integrates DragonCrawl, an LLM-based mobile testing platform, with uHavoc, a service-level fault injection system.
arXiv Detail & Related papers (2026-02-05T22:01:50Z)
SWE-Universe: Scale Real-World Verifiable Environments to Millions [84.63665266236963]
SWE-Universe is a framework for automatically constructing real-world software engineering (SWE) verifiable environments from GitHub pull requests (PRs). We propose a building agent powered by an efficient custom-trained model to overcome the prevalent challenges of automatic building. We demonstrate the profound value of our environments through large-scale agentic mid-training and reinforcement learning.
arXiv Detail & Related papers (2026-02-02T17:20:30Z)
The Rise of Agentic Testing: Multi-Agent Systems for Robust Software Quality Assurance [0.0]
Current AI-based test generators produce invalid, redundant, or non-executable tests due to a lack of execution-aware feedback. This paper introduces a closed-loop, self-correcting system in which a Test Generation Agent, an Execution and Analysis Agent, and a Review and Optimization Agent collaboratively generate, execute, analyze, and refine tests.
arXiv Detail & Related papers (2026-01-05T18:20:14Z)
Reliable LLM-Based Edge-Cloud-Expert Cascades for Telecom Knowledge Systems [54.916243942641444]
Large language models (LLMs) are emerging as key enablers of automation in domains such as telecommunications. We study an edge-cloud-expert cascaded LLM-based knowledge system that supports decision-making through a question-and-answer pipeline.
arXiv Detail & Related papers (2025-12-23T03:10:09Z)
PentestEval: Benchmarking LLM-based Penetration Testing with Modular and Stage-Level Design [30.68819474524929]
PentestEval is the first comprehensive benchmark for evaluating Large Language Models (LLMs) across six penetration testing stages. It integrates expert-annotated ground truth with a fully automated evaluation pipeline across 346 tasks covering all stages in 12 realistic vulnerable scenarios. Our stage-level evaluation of 9 widely used LLMs reveals generally weak performance and distinct limitations across the stages of the penetration-testing workflow.
arXiv Detail & Related papers (2025-12-16T09:37:21Z)
BOSQTGEN: Breaking the Sound Barrier in Test Generation [3.052470294814771]
We introduce BOSQTGEN, a novel black-box tool for API test generation. BOSQTGEN utilizes a novel approach for decomposing API specifications into primitives, using LLMs to suggest coherent interactions for them, and employing testing to efficiently sample over these values. The resulting BOSQTGEN system achieves an average of 82% of critical code coverage on benchmarks, often a 20% or more increase over prior state-of-the-art systems.
arXiv Detail & Related papers (2025-10-22T17:11:30Z)
Bridging Research and Practice in Simulation-based Testing of Industrial Robot Navigation Systems [9.268151135904063]
Surrealist is a simulation-based test generation framework originally for UAVs. Our method uses a search-based algorithm to automatically generate challenging obstacle avoidance scenarios.
arXiv Detail & Related papers (2025-10-10T13:50:32Z)
AutoPT: How Far Are We from the End2End Automated Web Penetration Testing? [54.65079443902714]
We introduce AutoPT, an automated penetration testing agent based on the principle of PSM driven by LLMs. Our results show that AutoPT outperforms the baseline framework ReAct on the GPT-4o mini model.
arXiv Detail & Related papers (2024-11-02T13:24:30Z)
Towards Automatic Generation of Amplified Regression Test Oracles [44.45138073080198]
We propose a test oracle derivation approach to amplify regression test oracles. The approach monitors the object state during test execution and compares it to the previous version to detect any changes in relation to the SUT's intended behaviour.
arXiv Detail & Related papers (2023-07-28T12:38:44Z)
SUPERNOVA: Automating Test Selection and Defect Prevention in AAA Video Games Using Risk Based Testing and Machine Learning [62.997667081978825]
Testing video games is an increasingly difficult task as traditional methods fail to scale with growing software systems. We present SUPERNOVA, a system responsible for test selection and defect prevention while also functioning as an automation hub. Its direct impact has been observed as a reduction of 55% or more in testing hours for an undisclosed sports game title.
arXiv Detail & Related papers (2022-03-10T00:47:46Z)

This list is automatically generated from the titles and abstracts of the papers on this site.