Related papers: Incidents During Microservice Decomposition: A Case Study

URL: http://arxiv.org/abs/2505.09813v1
Date: Wed, 14 May 2025 21:27:29 GMT
Title: Incidents During Microservice Decomposition: A Case Study
Authors: Doğaç Eldenk, H. Alperen Çetin
Abstract summary: In this study, we introduce Carbon Health's software stack, share our journey, and analyze 107 incidents. We suggest that starting with monolithic modularization as an initial step toward microservice decomposition may help reduce incidents.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Software errors and incidents are inevitable in web-based applications. Scalability challenges, increasing demand, and ongoing code changes can contribute to such failures. As software architectures evolve rapidly, understanding how and why incidents occur is crucial for enhancing system reliability. In this study, we introduce Carbon Health's software stack, share our microservices journey, and analyze 107 incidents. Based on these incidents, we share insights and lessons learned on microservice decomposition. Finally, we suggest that starting with monolithic modularization as an initial step toward microservice decomposition may help reduce incidents and contribute to building more resilient software.

Related papers

Centrality Change Proneness: an Early Indicator of Microservice Architectural Degradation [48.55946052680251]
The study of temporal networks has emerged as a way to describe and analyze evolving networks. Previous research has explored how software metrics such as size, complexity, and quality are related to microservice centrality. This study investigates whether temporal centrality metrics can provide insight into the early detection of architectural degradation.
arXiv Detail & Related papers (2025-06-09T12:22:12Z)
SoK: Microservice Architectures from a Dependability Perspective [0.8287206589886882]
Microservice architecture splits monolithic applications into smaller services that interact using lightweight communication schemes. We explore the known faults and vulnerabilities that microservice architecture might suffer from, and the recent scientific literature that addresses them.
arXiv Detail & Related papers (2025-03-05T11:12:58Z)
LLMs as Continuous Learners: Improving the Reproduction of Defective Code in Software Issues [62.12404317786005]
EvoCoder is a continuous learning framework for issue code reproduction. Our results show a 20% improvement in issue reproduction rates over existing SOTA methods.
arXiv Detail & Related papers (2024-11-21T08:49:23Z)
An Empirical Study on Challenges of Event Management in Microservice Architectures [3.0184596495288263]
This paper provides the first comprehensive characterization of event management practices and challenges. We find that developers encounter many problems, including large event payloads, auditing event flows, and ordering constraints when processing events. This suggests that developers are not sufficiently served by state-of-the-practice technologies.
arXiv Detail & Related papers (2024-08-01T10:19:37Z)
Microservice Vulnerability Analysis: A Literature Review with Empirical Insights [2.883578416080909]
We identify, analyze, and report 126 security vulnerabilities inherent in microservice architectures. This comprehensive analysis enables us to (i) propose a taxonomy that categorizes microservice vulnerabilities based on the distinctive features of microservice architectures. We also conduct an empirical analysis by performing vulnerability scans on four diverse microservice benchmark applications.
arXiv Detail & Related papers (2024-07-31T08:13:42Z)
Microservices-based Software Systems Reengineering: State-of-the-Art and Future Directions [17.094721366340735]
Designing software compatible with cloud-based Microservice Architectures (MSAs) is vital due to the performance, scalability, and availability limitations. We provide a comprehensive survey of current research into ways of identifying services in systems that can be redeployed as MSAs. Static, dynamic, and hybrid approaches have been explored.
arXiv Detail & Related papers (2024-07-18T21:59:05Z)
Agent-Driven Automatic Software Improvement [55.2480439325792]
This research proposal aims to explore innovative solutions by focusing on the deployment of agents powered by Large Language Models (LLMs). The iterative nature of agents, which allows for continuous learning and adaptation, can help surpass common challenges in code generation. We aim to use the iterative feedback in these systems to further fine-tune the LLMs underlying the agents, becoming better aligned to the task of automated software improvement.
arXiv Detail & Related papers (2024-06-24T15:45:22Z)
Understanding the Issues, Their Causes and Solutions in Microservices Systems: An Empirical Study [11.536360998310576]
Technical Debt, Continuous Integration, Exception Handling, Service Execution and Communication are the most dominant issues in systems. We found 177 types of solutions that can be applied to fix the identified issues.
arXiv Detail & Related papers (2023-02-03T18:08:03Z)
FIRE: A Failure-Adaptive Reinforcement Learning Framework for Edge Computing Migrations [52.85536740465277]
FIRE is a framework that adapts to rare events by training an RL policy in an edge computing digital twin environment. We propose ImRE, an importance sampling-based Q-learning algorithm, which samples rare events proportionally to their impact on the value function. We show that FIRE reduces costs compared to vanilla RL and the greedy baseline in the event of failures.
arXiv Detail & Related papers (2022-09-28T19:49:39Z)
Reducing Catastrophic Forgetting in Self Organizing Maps with Internally-Induced Generative Replay [67.50637511633212]
A lifelong learning agent is able to continually learn from potentially infinite streams of pattern sensory data. One major historic difficulty in building agents that adapt is that neural systems struggle to retain previously-acquired knowledge when learning from new samples. This problem is known as catastrophic forgetting (interference) and remains an unsolved problem in the domain of machine learning to this day.
arXiv Detail & Related papers (2021-12-09T07:11:14Z)
Variable-Shot Adaptation for Online Meta-Learning [123.47725004094472]
We study the problem of learning new tasks from a small, fixed number of examples, by meta-learning across static data from a set of previous tasks. We find that meta-learning solves the full task set with fewer overall labels and greater cumulative performance, compared to standard supervised methods. These results suggest that meta-learning is an important ingredient for building learning systems that continuously learn and improve over a sequence of problems.
arXiv Detail & Related papers (2020-12-14T18:05:24Z)
Dos and Don'ts of Machine Learning in Computer Security [74.1816306998445]
Despite great potential, machine learning in security is prone to subtle pitfalls that undermine its performance. We identify common pitfalls in the design, implementation, and evaluation of learning-based security systems. We propose actionable recommendations to support researchers in avoiding or mitigating the pitfalls where possible.
arXiv Detail & Related papers (2020-10-19T13:09:31Z)

This list is automatically generated from the titles and abstracts of the papers on this site.