LLM-Powered Fully Automated Chaos Engineering: Towards Enabling Anyone to Build Resilient Software Systems at Low Cost
- URL: http://arxiv.org/abs/2511.07865v1
- Date: Wed, 12 Nov 2025 01:24:55 GMT
- Title: LLM-Powered Fully Automated Chaos Engineering: Towards Enabling Anyone to Build Resilient Software Systems at Low Cost
- Authors: Daisuke Kikuta, Hiroki Ikeuchi, Kengo Tajiri
- Abstract summary: Chaos Engineering (CE) is an engineering technique aimed at improving the resilience of distributed systems. This paper proposes ChaosEater, a system that automates the entire CE cycle with Large Language Models (LLMs). The results demonstrate that it consistently completes reasonable CE cycles with significantly low time and monetary costs.
- Score: 3.9571744700171756
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Chaos Engineering (CE) is an engineering technique aimed at improving the resilience of distributed systems. It involves intentionally injecting faults into a system to test its resilience, uncover weaknesses, and address them before they cause failures in production. Recent CE tools automate the execution of predefined CE experiments. However, planning such experiments and improving the system based on the experimental results still remain manual. These processes are labor-intensive and require multi-domain expertise. To address these challenges and enable anyone to build resilient systems at low cost, this paper proposes ChaosEater, a system that automates the entire CE cycle with Large Language Models (LLMs). It predefines an agentic workflow according to a systematic CE cycle and assigns subdivided processes within the workflow to LLMs. ChaosEater targets CE for software systems built on Kubernetes. Therefore, the LLMs in ChaosEater complete CE cycles through software engineering tasks, including requirement definition, code generation, testing, and debugging. We evaluate ChaosEater through case studies on small- and large-scale Kubernetes systems. The results demonstrate that it consistently completes reasonable CE cycles with significantly low time and monetary costs. Its cycles are also qualitatively validated by human engineers and LLMs.
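The abstract describes an agentic workflow that maps the CE cycle onto software engineering tasks: requirement definition, code generation, testing, and debugging. A minimal sketch of such a loop is below. All function names and the stubbed LLM call are hypothetical illustrations, not ChaosEater's actual API.

```python
# Hypothetical sketch of an LLM-driven Chaos Engineering cycle:
# define steady state -> plan fault injection -> run experiment ->
# improve the system on failure and retest. The LLM is stubbed out.

def llm(prompt: str) -> str:
    """Stand-in for a real LLM client call."""
    return f"[LLM output for: {prompt[:40]}...]"

def define_steady_state(system_desc: str) -> str:
    # Requirement definition: what "healthy" means (e.g., error rate < 1%).
    return llm(f"Define steady-state hypotheses for: {system_desc}")

def plan_experiment(hypothesis: str) -> str:
    # Code generation: produce a fault-injection spec (e.g., a pod-kill manifest).
    return llm(f"Generate a Kubernetes fault-injection experiment for: {hypothesis}")

def run_experiment(experiment: str) -> bool:
    # Testing: apply the fault and check whether the hypothesis still holds.
    # Stubbed as always-passing here.
    return True

def improve_system(experiment: str) -> str:
    # Debugging/improvement: patch manifests or code when the hypothesis fails.
    return llm(f"Propose a fix for the weakness found by: {experiment}")

def chaos_cycle(system_desc: str, max_retries: int = 3) -> bool:
    hypothesis = define_steady_state(system_desc)
    experiment = plan_experiment(hypothesis)
    for _ in range(max_retries):
        if run_experiment(experiment):
            return True              # system tolerated the injected fault
        improve_system(experiment)   # otherwise patch and retest
    return False

print(chaos_cycle("an nginx Deployment with 2 replicas on Kubernetes"))
```

In a real system, `run_experiment` would apply a fault-injection manifest to a Kubernetes cluster and compare observed metrics against the steady-state hypothesis; here it is a placeholder so the control flow is visible.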
Related papers
- EmboCoach-Bench: Benchmarking AI Agents on Developing Embodied Robots [68.29056647487519]
Embodied AI is fueled by high-fidelity simulation and large-scale data collection. However, this scaling capability remains bottlenecked by a reliance on labor-intensive manual oversight. We introduce EmboCoach-Bench, a benchmark evaluating the capacity of LLM agents to autonomously engineer embodied policies.
arXiv Detail & Related papers (2026-01-29T11:33:49Z)
- Failure Modes in LLM Systems: A System-Level Taxonomy for Reliable AI Applications [0.0]
Large language models (LLMs) are being rapidly integrated into decision-support tools, automation, and AI-enabled software systems. This paper presents a system-level taxonomy of fifteen hidden failure modes that arise in real-world LLM applications.
arXiv Detail & Related papers (2025-11-25T05:19:23Z)
- Rethinking Technology Stack Selection with AI Coding Proficiency [49.617080246389605]
Large language models (LLMs) are now an integral part of software development. We propose the concept of AI coding proficiency: the degree to which LLMs can utilize a given technology to generate high-quality code snippets. We conduct the first comprehensive empirical study examining AI proficiency across 170 third-party libraries and 61 task scenarios.
arXiv Detail & Related papers (2025-09-14T06:56:47Z)
- BitsAI-Fix: LLM-Driven Approach for Automated Lint Error Resolution in Practice [11.767390004985979]
BitsAI-Fix is an automated lint error remediation workflow based on Large Language Models (LLMs). In production deployment at ByteDance, our solution has supported over 5,000 engineers, resolved more than 12,000 static analysis issues, and achieved approximately 85% remediation accuracy, with around 1,000 weekly active adopters.
arXiv Detail & Related papers (2025-08-05T14:17:30Z)
- Automated Validation of LLM-based Evaluators for Software Engineering Artifacts [0.7548538278943616]
REFINE (Ranking Evaluators for FIne grained Nuanced Evaluation) is an automated framework for benchmarking large language models (LLMs). REFINE applies novel generation techniques to automatically synthesize artifacts with progressively reduced quality. It quantifies each candidate evaluator configuration by measuring how closely its rankings align with the expected ordering.
arXiv Detail & Related papers (2025-08-04T18:52:01Z)
- Training Language Models to Generate Quality Code with Program Analysis Feedback [66.0854002147103]
Code generation with large language models (LLMs) is increasingly adopted in production but fails to ensure code quality. We propose REAL, a reinforcement learning framework that incentivizes LLMs to generate production-quality code.
arXiv Detail & Related papers (2025-05-28T17:57:47Z)
- MAS-ZERO: Designing Multi-Agent Systems with Zero Supervision [76.42361936804313]
We introduce MAS-ZERO, the first self-evolved, inference-time framework for automatic MAS design. MAS-ZERO employs meta-level design to iteratively generate, evaluate, and refine MAS configurations tailored to each problem instance.
arXiv Detail & Related papers (2025-05-21T00:56:09Z)
- ChaosEater: Fully Automating Chaos Engineering with Large Language Models [1.7034420812099471]
Chaos Engineering (CE) is an engineering technique aimed at improving the resiliency of distributed systems. To reduce the costs of the manual operations, we propose ChaosEater, a system for automating all CE operations.
arXiv Detail & Related papers (2025-01-19T16:35:09Z)
- LABIIUM: AI-Enhanced Zero-configuration Measurement Automation System [0.0]
We present LABIIUM, an AI-enhanced measurement automation system designed to streamline experimental workflows and improve user productivity. Lab-Automation-Measurement Bridges (LAMBs) enable seamless instrument connectivity using standard tools such as VSCode and Python, eliminating setup overhead. The evaluation underscores LABIIUM's ability to enhance laboratory productivity and support digital transformation in research and industry.
arXiv Detail & Related papers (2024-12-07T00:15:24Z)
- TaskBench: Benchmarking Large Language Models for Task Automation [82.2932794189585]
We introduce TaskBench, a framework to evaluate the capability of large language models (LLMs) in task automation.
Specifically, task decomposition, tool selection, and parameter prediction are assessed.
Our approach combines automated construction with rigorous human verification, ensuring high consistency with human evaluation.
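The TaskBench entry above names three assessed capabilities: task decomposition, tool selection, and parameter prediction. A toy sketch of scoring these three dimensions against a gold reference is below; the scoring scheme and all names are illustrative assumptions, not TaskBench's actual metrics or API.

```python
# Illustrative scoring of the three dimensions mentioned above, each as a
# set-overlap F1 between a model's predicted plan and a gold-reference plan.

def f1(pred: set, gold: set) -> float:
    """F1 over two sets: harmonic mean of precision and recall."""
    if not pred or not gold:
        return 0.0
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)

def score_plan(pred: dict, gold: dict) -> dict:
    return {
        "decomposition": f1(set(pred["subtasks"]), set(gold["subtasks"])),
        "tool_selection": f1(set(pred["tools"]), set(gold["tools"])),
        "parameters": f1(set(pred["params"].items()), set(gold["params"].items())),
    }

gold = {"subtasks": {"transcribe", "summarize"},
        "tools": {"asr", "llm"},
        "params": {"lang": "en"}}
pred = {"subtasks": {"transcribe", "summarize"},
        "tools": {"asr"},            # missed one tool
        "params": {"lang": "en"}}

print(score_plan(pred, gold))
```

Here the missed tool lowers only the tool-selection score, showing why the three dimensions are assessed separately.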
arXiv Detail & Related papers (2023-11-30T18:02:44Z)
- Induction and Exploitation of Subgoal Automata for Reinforcement Learning [75.55324974788475]
We present ISA, an approach for learning and exploiting subgoals in episodic reinforcement learning (RL) tasks.
ISA interleaves reinforcement learning with the induction of a subgoal automaton, an automaton whose edges are labeled by the task's subgoals.
A subgoal automaton also consists of two special states: a state indicating the successful completion of the task, and a state indicating that the task has finished without succeeding.
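The structure described above (edges labeled by subgoals, plus an accepting state for task completion and a rejecting state for failure) can be sketched as a small transition table. This is a toy illustration of the data structure only; the automaton-induction step that ISA performs is omitted.

```python
# Toy subgoal automaton: edges labeled by subgoals, with one accepting state
# (task completed) and one rejecting state (task finished without success).

ACCEPT, REJECT = "accept", "reject"

class SubgoalAutomaton:
    def __init__(self, start: str):
        self.state = start
        self.edges = {}  # (state, subgoal) -> next state

    def add_edge(self, src: str, subgoal: str, dst: str):
        self.edges[(src, subgoal)] = dst

    def step(self, subgoal: str) -> str:
        # Subgoals with no outgoing edge leave the state unchanged (self-loop).
        self.state = self.edges.get((self.state, subgoal), self.state)
        return self.state

# Example task: fetch a key, then open a door; touching lava fails the task.
a = SubgoalAutomaton(start="u0")
a.add_edge("u0", "got_key", "u1")
a.add_edge("u1", "opened_door", ACCEPT)
for s in ("u0", "u1"):
    a.add_edge(s, "touched_lava", REJECT)

for event in ["got_key", "opened_door"]:
    a.step(event)
print(a.state)  # accept
```

An RL agent can use the current automaton state as extra observation, decomposing a sparse-reward task into a sequence of subgoal transitions.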
arXiv Detail & Related papers (2020-09-08T16:42:55Z)
- Technology Readiness Levels for AI & ML [79.22051549519989]
Development of machine learning systems can be executed easily with modern tools, but the process is typically rushed and treated as a means to an end.
Engineering systems follow well-defined processes and testing standards to streamline development for high-quality, reliable results.
We propose a proven systems engineering approach for machine learning development and deployment.
arXiv Detail & Related papers (2020-06-21T17:14:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.