"Let it be Chaos in the Plumbing!" Usage and Efficacy of Chaos Engineering in DevOps Pipelines
- URL: http://arxiv.org/abs/2509.14931v2
- Date: Mon, 22 Sep 2025 12:15:02 GMT
- Title: "Let it be Chaos in the Plumbing!" Usage and Efficacy of Chaos Engineering in DevOps Pipelines
- Authors: Stefano Fossati, Damian Andrew Tamburri, Massimiliano Di Penta, Marco Tonnarelli
- Abstract summary: Chaos Engineering (CE) has emerged as a proactive method to improve the resilience of modern distributed systems. We present a systematic gray literature review that investigates how industry practitioners have adopted and adapted CE principles over recent years. Our study reveals that while the core tenets of CE remain influential, practitioners increasingly emphasize controlled experimentation, automation, and risk mitigation strategies.
- Score: 6.312266245317322
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Chaos Engineering (CE) has emerged as a proactive method to improve the resilience of modern distributed systems, particularly within DevOps environments. Originally pioneered by Netflix, CE simulates real-world failures to expose weaknesses before they impact production. In this paper, we present a systematic gray literature review that investigates how industry practitioners have adopted and adapted CE principles over recent years. Analyzing 50 sources published between 2019 and early 2024, we developed a comprehensive classification framework that extends the foundational CE principles into ten distinct concepts. Our study reveals that while the core tenets of CE remain influential, practitioners increasingly emphasize controlled experimentation, automation, and risk mitigation strategies to align with the demands of agile and continuously evolving DevOps pipelines. Our results enhance the understanding of how CE is intended and implemented in practice, and offer guidance for future research and industrial applications aimed at improving system robustness in dynamic production environments.
Related papers
- Industrial Survey on Robustness Testing In Cyber Physical Systems [0.0]
This paper presents findings from an industrial survey conducted in Wallonia, covering a wide range of sectors. It investigates how robustness is understood and applied in relation to requirements engineering. It identifies key challenges and gaps between industry practices and state-of-the-art methodologies.
arXiv Detail & Related papers (2026-03-04T20:30:39Z)
- WarpRec: Unifying Academic Rigor and Industrial Scale for Responsible, Reproducible, and Efficient Recommendation [38.17743551493722]
We present WarpRec, a high-performance framework for Recommender Systems. It includes state-of-the-art algorithms, 40 metrics, and 19 filtering and splitting strategies. The framework enforces ecological responsibility by integrating CodeCarbon for real-time energy tracking.
arXiv Detail & Related papers (2026-02-19T15:09:04Z)
- LLM-Powered Fully Automated Chaos Engineering: Towards Enabling Anyone to Build Resilient Software Systems at Low Cost [3.9571744700171756]
Chaos Engineering (CE) is an engineering technique aimed at improving the resilience of distributed systems. This paper proposes ChaosEater, a system that automates the entire CE cycle with Large Language Models (LLMs). The results demonstrate that it consistently completes reasonable CE cycles with significantly low time and monetary costs.
arXiv Detail & Related papers (2025-11-11T06:03:24Z)
- Empowering Real-World: A Survey on the Technology, Practice, and Evaluation of LLM-driven Industry Agents [63.03252293761656]
This paper systematically reviews the technologies, applications, and evaluation methods of industry agents based on large language models (LLMs). We examine the three key technological pillars that support the advancement of agent capabilities: Memory, Planning, and Tool Use. We provide an overview of the application of industry agents in real-world domains such as digital engineering, scientific discovery, embodied intelligence, collaborative business execution, and complex system simulation.
arXiv Detail & Related papers (2025-10-20T12:46:55Z)
- Embodied Intelligence in Disassembly: Multimodal Perception Cross-validation and Continual Learning in Neuro-Symbolic TAMP [12.081833179751724]
This paper proposes a continual learning framework based on Neuro-Symbolic task and motion planning (TAMP) to enhance the adaptability of embodied intelligence systems in dynamic environments. Experimental results show that the proposed framework improves the task success rate in dynamic disassembly scenarios from 81.68% to 100%, while reducing the average number of perception misjudgments from 3.389 to 1.128.
arXiv Detail & Related papers (2025-09-14T13:47:07Z)
- Greening AI-enabled Systems with Software Engineering: A Research Agenda for Environmentally Sustainable AI Practices [70.24403396375277]
The "Greening AI with Software Engineering" CECAM-Lorentz workshop was held February 3-7, 2025 in Lausanne, Switzerland. This report presents a research agenda emerging from the workshop. It outlines open research directions and practical recommendations to guide the development of environmentally sustainable AI-enabled systems.
arXiv Detail & Related papers (2025-06-02T15:19:49Z)
- The Dual-use Dilemma in LLMs: Do Empowering Ethical Capacities Make a Degraded Utility? [54.18519360412294]
Large Language Models (LLMs) must balance between rejecting harmful requests for safety and accommodating legitimate ones for utility. This paper presents a Direct Preference Optimization (DPO) based alignment framework that achieves better overall performance. We analyze experimental results obtained from testing DeepSeek-R1 on our benchmark and reveal the critical ethical concerns raised by this highly acclaimed model.
arXiv Detail & Related papers (2025-01-20T06:35:01Z)
- ChaosEater: Fully Automating Chaos Engineering with Large Language Models [1.7034420812099471]
Chaos Engineering (CE) is an engineering technique aimed at improving the resiliency of distributed systems. To reduce the costs of the manual operations, we propose ChaosEater, a system for automating the entire CE operations.
arXiv Detail & Related papers (2025-01-19T16:35:09Z)
- Imitate, Explore, and Self-Improve: A Reproduction Report on Slow-thinking Reasoning Systems [92.89673285398521]
o1-like reasoning systems have demonstrated remarkable capabilities in solving complex reasoning tasks. We introduce an "imitate, explore, and self-improve" framework to train the reasoning model. Our approach achieves competitive performance compared to industry-level reasoning systems.
arXiv Detail & Related papers (2024-12-12T16:20:36Z)
- Framework for continuous transition to Agile Systems Engineering in the Automotive Industry [0.0]
We propose an agile Systems Engineering (SE) Framework for the automotive industry to meet the new agility demand.
In addition to the methodological background, we present results of a pilot project in the chassis development department of a German automotive manufacturer.
arXiv Detail & Related papers (2023-11-21T10:21:47Z)
- Towards a General Framework for Continual Learning with Pre-training [55.88910947643436]
We present a general framework for continual learning of sequentially arrived tasks with the use of pre-training.
We decompose its objective into three hierarchical components, including within-task prediction, task-identity inference, and task-adaptive prediction.
We propose an innovative approach to explicitly optimize these components with parameter-efficient fine-tuning (PEFT) techniques and representation statistics.
arXiv Detail & Related papers (2023-10-21T02:03:38Z)
- Towards Autonomous Supply Chains: Definition, Characteristics, Conceptual Framework, and Autonomy Levels [47.009401895405006]
Recent global disruptions, such as the pandemic and geopolitical conflicts, have profoundly exposed vulnerabilities in traditional supply chains.
Autonomous supply chains (ASCs) have emerged as a potential solution, offering increased visibility, flexibility, and resilience in turbulent trade environments.
arXiv Detail & Related papers (2023-10-13T22:09:52Z)
- CHESS: A Framework for Evaluation of Self-adaptive Systems based on Chaos Engineering [0.6875312133832078]
There is an increasing need to assess the correct behavior of self-adaptive and self-healing systems.
There is a lack of systematic evaluation methods for self-adaptive and self-healing systems.
We propose CHESS to address this gap by evaluating self-adaptive and self-healing systems through fault injection based on chaos engineering.
arXiv Detail & Related papers (2023-03-13T17:00:55Z)
- Distributed Adaptive Control: An ideal Cognitive Architecture candidate for managing a robotic recycling plant [0.0]
This paper supports the Distributed Adaptive Control (DAC) theory as a suitable Cognitive Architecture for managing a recycling plant.
Specifically, a DAC between both single-agent and large-scale levels is proposed to meet the expected demands of the European Project HR-Recycler.
With the aim of having a realistic benchmark for future implementations of the DAC, a micro-recycling plant prototype is presented.
arXiv Detail & Related papers (2020-12-23T10:33:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.