Related papers: ChaosEater: Fully Automating Chaos Engineering with Large Language Models

EmboCoach-Bench: Benchmarking AI Agents on Developing Embodied Robots [68.29056647487519]
Embodied AI is fueled by high-fidelity simulation and large-scale data collection. However, this scaling capability remains bottlenecked by a reliance on labor-intensive manual oversight. We introduce EmboCoach-Bench, a benchmark evaluating the capacity of LLM agents to autonomously engineer embodied policies.
arXiv Detail & Related papers (2026-01-29T11:33:49Z)
LLM-Powered Fully Automated Chaos Engineering: Towards Enabling Anyone to Build Resilient Software Systems at Low Cost [3.9571744700171756]
Chaos Engineering (CE) is an engineering technique aimed at improving the resilience of distributed systems. This paper proposes ChaosEater, a system that automates the entire CE cycle with Large Language Models (LLMs). The results demonstrate that it consistently completes reasonable CE cycles with significantly low time and monetary costs.
arXiv Detail & Related papers (2025-11-11T06:03:24Z)
"Let it be Chaos in the Plumbing!" Usage and Efficacy of Chaos Engineering in DevOps Pipelines [6.312266245317322]
Chaos Engineering (CE) has emerged as a proactive method to improve the resilience of modern distributed systems. We present a systematic gray literature review that investigates how industry practitioners have adopted and adapted CE principles over recent years. Our study reveals that while the core tenets of CE remain influential, practitioners increasingly emphasize controlled experimentation, automation, and risk mitigation strategies.
arXiv Detail & Related papers (2025-09-18T13:10:32Z)
Reinforcement Learning for Machine Learning Engineering Agents [52.03168614623642]
We show that agents backed by weaker models that improve via reinforcement learning can outperform agents backed by much larger, but static models. We propose duration-aware gradient updates in a distributed asynchronous RL framework to amplify high-cost but high-reward actions. We also propose environment instrumentation to offer partial credit, distinguishing almost-correct programs from those that fail early.
arXiv Detail & Related papers (2025-09-01T18:04:10Z)
BitsAI-Fix: LLM-Driven Approach for Automated Lint Error Resolution in Practice [11.767390004985979]
BitsAI-Fix is an automated lint error remediation workflow based on Large Language Models (LLMs). In production deployment at ByteDance, our solution has supported over 5,000 engineers, resolved more than 12,000 static analysis issues, and achieved approximately 85% remediation accuracy, with around 1,000 weekly active adopters.
arXiv Detail & Related papers (2025-08-05T14:17:30Z)
Design Automation in Quantum Error Correction [2.089191490381739]
Quantum error correction (QEC) underpins practical fault-tolerant quantum computing (FTQC). QEC protocols are imperative to suppress logical error rates below threshold and ensure reliable operation. Design automation in the QEC flow is thus critical, enabling automated synthesis, transpilation, layout, and verification of error-corrected circuits.
arXiv Detail & Related papers (2025-07-16T13:59:38Z)
Autonomous Control Leveraging LLMs: An Agentic Framework for Next-Generation Industrial Automation [0.0]
We introduce a unified agentic framework that leverages large language models (LLMs) for both discrete fault-recovery planning and continuous process control. Our results demonstrate that, with structured feedback and modular agents, LLMs can unify high-level symbolic planning and low-level continuous control.
arXiv Detail & Related papers (2025-07-03T11:20:22Z)
Training Language Models to Generate Quality Code with Program Analysis Feedback [66.0854002147103]
Code generation with large language models (LLMs) is increasingly adopted in production but fails to ensure code quality. We propose REAL, a reinforcement learning framework that incentivizes LLMs to generate production-quality code.
arXiv Detail & Related papers (2025-05-28T17:57:47Z)
MAS-ZERO: Designing Multi-Agent Systems with Zero Supervision [76.42361936804313]
We introduce MAS-ZERO, the first self-evolved, inference-time framework for automatic MAS design. MAS-ZERO employs meta-level design to iteratively generate, evaluate, and refine MAS configurations tailored to each problem instance.
arXiv Detail & Related papers (2025-05-21T00:56:09Z)
AutoLoop: Fast Visual SLAM Fine-tuning through Agentic Curriculum Learning [1.282543877006303]
We present AutoLoop, a novel approach that combines automated curriculum learning with efficient fine-tuning for visual SLAM systems. Our method employs a DDPG (Deep Deterministic Policy Gradient) agent to dynamically adjust loop closure weights during training. Experiments conducted on TartanAir for training and validated across multiple benchmarks including KITTI, EuRoC, ICL-NUIM and TUM RGB-D demonstrate that AutoLoop achieves comparable or superior performance.
arXiv Detail & Related papers (2025-01-15T21:22:09Z)
LABIIUM: AI-Enhanced Zero-configuration Measurement Automation System [0.0]
We present LABIIUM, an AI-enhanced measurement automation system designed to streamline experimental workflows and improve user productivity. Lab-Automation-Measurement Bridges (LAMBs) enable seamless instrument connectivity using standard tools such as VSCode and Python, eliminating setup overhead. The evaluation underscores LABIIUM's ability to enhance laboratory productivity and support digital transformation in research and industry.
arXiv Detail & Related papers (2024-12-07T00:15:24Z)
Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection [56.66677293607114]
We propose Code-as-Monitor (CaM) for both open-set reactive and proactive failure detection. To enhance the accuracy and efficiency of monitoring, we introduce constraint elements that abstract constraint-related entities. Experiments show that CaM achieves a 28.7% higher success rate and reduces execution time by 31.8% under severe disturbances.
arXiv Detail & Related papers (2024-12-05T18:58:27Z)
zsLLMCode: An Effective Approach for Code Embedding via LLM with Zero-Shot Learning [6.976968804436321]
This paper proposes a novel zero-shot approach, zsLLMCode, to generate code embeddings by using large language models (LLMs) and sentence embedding models. The results have demonstrated the effectiveness and superiority of our method over state-of-the-art unsupervised approaches.
arXiv Detail & Related papers (2024-09-23T01:03:15Z)
A Roadmap Towards Automated and Regulated Robotic Systems [4.6015001632772545]
We argue that the unregulated generative processes of AI are suited to low-level end tasks. We propose a roadmap that can lead to fully automated and regulated robotic systems.
arXiv Detail & Related papers (2024-03-21T00:14:53Z)
Control and Automation for Industrial Production Storage Zone: Generation of Optimal Route Using Image Processing [49.1574468325115]
This article focuses on developing an industrial automation method for a zone of a production line model using digital image processing (DIP). The neo-cascade methodology employed allowed each stage to be defined adequately, ensuring the inclusion of the relevant methods for its development. The system was based on the OpenCV library, a tool focused on artificial vision, which was implemented on an object-oriented programming (OOP) platform based on the Java language.
arXiv Detail & Related papers (2024-03-15T06:50:19Z)
E2E-AT: A Unified Framework for Tackling Uncertainty in Task-aware End-to-end Learning [9.741277008050927]
We propose a unified framework that covers the uncertainties emerging in both the input feature space of the machine learning models and the constrained optimization models. We show that neglecting the uncertainty of COs during training causes a new trigger for generalization errors. The framework is described as a robust optimization problem and is practically solved via end-to-end adversarial training (E2E-AT).
arXiv Detail & Related papers (2023-12-17T02:23:25Z)
TaskBench: Benchmarking Large Language Models for Task Automation [82.2932794189585]
We introduce TaskBench, a framework to evaluate the capability of large language models (LLMs) in task automation. Specifically, task decomposition, tool selection, and parameter prediction are assessed. Our approach combines automated construction with rigorous human verification, ensuring high consistency with human evaluation.
arXiv Detail & Related papers (2023-11-30T18:02:44Z)
Exploring Continual Learning for Code Generation Models [80.78036093054855]
Continual Learning (CL) is an important aspect that remains underexplored in the code domain. We introduce a benchmark called CodeTask-CL that covers a wide range of tasks, including code generation, translation, summarization, and refinement. We find that effective methods like Prompt Pooling (PP) suffer from catastrophic forgetting due to the unstable training of the prompt selection mechanism.
arXiv Detail & Related papers (2023-07-05T16:58:39Z)
CHESS: A Framework for Evaluation of Self-adaptive Systems based on Chaos Engineering [0.6875312133832078]
There is an increasing need to assess the correct behavior of self-adaptive and self-healing systems. There is a lack of systematic evaluation methods for self-adaptive and self-healing systems. We propose CHESS to address this gap by evaluating self-adaptive and self-healing systems through fault injection based on chaos engineering.
arXiv Detail & Related papers (2023-03-13T17:00:55Z)
SIAD: Self-supervised Image Anomaly Detection System [18.410995759781006]
This paper outlines an automatic annotation system called SsaA, which works in a self-supervised learning manner. With user-friendly web-based interfaces, SsaA makes it convenient to integrate and deploy both unsupervised and supervised algorithms.
arXiv Detail & Related papers (2022-08-08T14:26:35Z)
CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning [92.36705236706678]
"CodeRL" is a new framework for program synthesis tasks through pretrained LMs and deep reinforcement learning. During inference, we introduce a new generation procedure with a critical sampling strategy. For the model backbones, we extended the encoder-decoder architecture of CodeT5 with enhanced learning objectives.
arXiv Detail & Related papers (2022-07-05T02:42:15Z)
OSCAR: Data-Driven Operational Space Control for Adaptive and Robust Robot Manipulation [50.59541802645156]
Operational Space Control (OSC) has been used as an effective task-space controller for manipulation. We propose OSC for Adaptation and Robustness (OSCAR), a data-driven variant of OSC that compensates for modeling errors. We evaluate our method on a variety of simulated manipulation problems, and find substantial improvements over an array of controller baselines.
arXiv Detail & Related papers (2021-10-02T01:21:38Z)
Deep Neural Network Approach to Estimate Early Worst-Case Execution Time [10.272133976201763]
Worst-Case Execution Time (WCET) is of utmost importance for developing Cyber-Physical and Safety-Critical Systems. This paper estimates early WCET using Deep Neural Networks as an approximate predictor model for hardware architecture and compiler.
arXiv Detail & Related papers (2021-07-28T06:32:02Z)
Online Learning of Competitive Equilibria in Exchange Economies [94.24357018178867]
In economics, the sharing of scarce resources among multiple rational agents is a classical problem. We propose an online learning mechanism to learn agent preferences. We demonstrate the effectiveness of this mechanism through numerical simulations.
arXiv Detail & Related papers (2021-06-11T21:32:17Z)
Technology Readiness Levels for Machine Learning Systems [107.56979560568232]
Development and deployment of machine learning systems can be executed easily with modern tools, but the process is typically rushed and means-to-an-end. We have developed a proven systems engineering approach for machine learning development and deployment. Our "Machine Learning Technology Readiness Levels" framework defines a principled process to ensure robust, reliable, and responsible systems.
arXiv Detail & Related papers (2021-01-11T15:54:48Z)
Induction and Exploitation of Subgoal Automata for Reinforcement Learning [75.55324974788475]
We present ISA, an approach for learning and exploiting subgoals in episodic reinforcement learning (RL) tasks. ISA interleaves reinforcement learning with the induction of a subgoal automaton, an automaton whose edges are labeled by the task's subgoals. A subgoal automaton also consists of two special states: a state indicating the successful completion of the task, and a state indicating that the task has finished without succeeding.
arXiv Detail & Related papers (2020-09-08T16:42:55Z)
Technology Readiness Levels for AI & ML [79.22051549519989]
Development of machine learning systems can be executed easily with modern tools, but the process is typically rushed and means-to-an-end. Engineering systems follow well-defined processes and testing standards to streamline development for high-quality, reliable results. We propose a proven systems engineering approach for machine learning development and deployment.
arXiv Detail & Related papers (2020-06-21T17:14:34Z)

This list is automatically generated from the titles and abstracts of the papers in this site.