Continuous Observability Assurance in Cloud-Native Applications
- URL: http://arxiv.org/abs/2503.08552v1
- Date: Tue, 11 Mar 2025 15:43:26 GMT
- Title: Continuous Observability Assurance in Cloud-Native Applications
- Authors: Maria C. Borges, Sebastian Werner,
- Abstract summary: We build on previous work and integrate our observability experiment tool OXN into a novel method for continuous observability assurance. We demonstrate its use and discuss future directions.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: When faults occur in microservice applications -- as they inevitably do -- developers depend on observability data to quickly identify and diagnose the issue. To collect such data, microservices need to be instrumented and the respective infrastructure configured. This task is often underestimated and error-prone, typically relying on many ad-hoc decisions. However, some of these decisions can significantly affect how quickly faults are detected and also impact the cost and performance of the application. Given its importance, we emphasize the need for a method to guide the observability design process. In this paper, we build on previous work and integrate our observability experiment tool OXN into a novel method for continuous observability assurance. We demonstrate its use and discuss future directions.
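The abstract argues that observability decisions, such as how often metrics are collected and how alert rules are configured, affect both how quickly a fault is detected and what data collection costs. The sketch below makes that trade-off concrete as an automated check over candidate configurations. It is a minimal illustration, not OXN or the authors' method. The names `ObservabilityConfig`, `AssuranceResult`, `detection_delay_s` and `assure` are hypothetical, and so is the model behind them: fixed-interval scraping, a fault that is visible in every scrape at or after its onset, and an alert that fires after a given number of consecutive anomalous scrapes.

```python
"""Toy observability-assurance check (hypothetical; not the OXN API).

Assumptions: metrics are scraped every `scrape_interval_s` seconds starting
at t = 0, an injected fault is visible in every scrape taken at or after its
onset, and an alert fires after `alert_for_samples` consecutive anomalous
scrapes.
"""
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ObservabilityConfig:
    name: str
    scrape_interval_s: float
    alert_for_samples: int


@dataclass(frozen=True)
class AssuranceResult:
    name: str
    detection_delay_s: float
    samples_per_hour: float  # crude proxy for collection cost
    passed: bool


def detection_delay_s(cfg: ObservabilityConfig, fault_onset_s: float) -> float:
    """Seconds from fault onset until the alert fires under this config."""
    k = cfg.scrape_interval_s
    # First scrape at or after the fault onset observes the fault.
    first_anomalous_scrape = math.ceil(fault_onset_s / k) * k
    # The alert needs (alert_for_samples - 1) further anomalous scrapes.
    alert_time = first_anomalous_scrape + (cfg.alert_for_samples - 1) * k
    return alert_time - fault_onset_s


def assure(configs, fault_onset_s: float, budget_s: float) -> list[AssuranceResult]:
    """Evaluate each candidate config against a detection-time budget."""
    results = []
    for cfg in configs:
        delay = detection_delay_s(cfg, fault_onset_s)
        results.append(
            AssuranceResult(
                name=cfg.name,
                detection_delay_s=delay,
                samples_per_hour=3600 / cfg.scrape_interval_s,
                passed=delay <= budget_s,
            )
        )
    return results


if __name__ == "__main__":
    candidates = [
        ObservabilityConfig("fine", scrape_interval_s=5, alert_for_samples=2),
        ObservabilityConfig("coarse", scrape_interval_s=60, alert_for_samples=3),
    ]
    for r in assure(candidates, fault_onset_s=12, budget_s=60):
        print(r)
```

With a fault injected at t = 12 s and a 60 s detection budget, the "fine" configuration alerts after 8 s but collects 12 times as many samples as the "coarse" one, which takes 168 s and fails the budget. Re-running such a check whenever the application or its observability configuration changes is one way to read "continuous" assurance; a real experiment would inject actual faults and measure alerts rather than compute them analytically.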
Related papers
- GAL-MAD: Towards Explainable Anomaly Detection in Microservice Applications Using Graph Attention Networks
Anomalies stemming from network and performance issues must be swiftly identified and addressed.
Existing anomaly detection techniques often rely on statistical models or machine learning methods.
We propose a novel anomaly detection model called Graph Attention and LSTM-based Microservice Anomaly Detection (GAL-MAD).
Date: 2025-03-31
- Interactive Agents to Overcome Ambiguity in Software Engineering
AI agents are increasingly being deployed to automate tasks, often based on ambiguous and underspecified user instructions. Making unwarranted assumptions and failing to ask clarifying questions can lead to suboptimal outcomes. We study the ability of LLM agents to handle ambiguous instructions in interactive code generation settings by evaluating proprietary and open-weight models on their performance.
Date: 2025-02-18
- Adaptive Tool Use in Large Language Models with Meta-Cognition Trigger
We propose MeCo, an adaptive decision-making strategy for external tool use. MeCo captures high-level cognitive signals in the representation space, guiding when to invoke tools. Our experiments show that MeCo accurately detects LLMs' internal cognitive signals and significantly improves tool-use decision-making.
Date: 2025-02-18
- Fast and Efficient What-If Analyses of Invocation Overhead and Transactional Boundaries to Support the Migration to Microservices
Microservice architecture improves agility and maintainability of software systems.
Decomposing existing software into out-of-process components can have a severe impact on non-functional properties.
What-if analyses make it possible to explore different scenarios and to develop the service boundaries iteratively and incrementally.
Date: 2025-01-30
- OXN -- Automated Observability Assessments for Cloud-Native Applications
We present a proof-of-concept implementation of an experiment tool, the Observability eXperiment eNgine (OXN).
OXN is able to inject arbitrary faults into an application, similar to Chaos Engineering, but also possesses the unique capability to modify the observability configuration.
Date: 2024-07-12
- On the Identification of Temporally Causal Representation with Instantaneous Dependence
Temporally causal representation learning aims to identify the latent causal process from time series observations.
Most methods require the assumption that the latent causal processes do not have instantaneous relations.
We propose an IDentification framework for instantaneOus Latent dynamics (IDOL).
Date: 2024-05-24
- Informed and Assessable Observability Design Decisions in Cloud-native Microservice Applications
Observability is important to ensure the reliability of microservice applications.
Architects need to understand observability-related trade-offs in order to weigh between different observability design alternatives.
We argue for a systematic method to arrive at informed and continuously assessable observability design decisions.
Date: 2024-03-01
- The PetShop Dataset -- Finding Causes of Performance Issues across Microservices
This paper introduces a dataset specifically designed for evaluating root cause analyses in microservice-based applications.
The dataset encompasses latency, requests, and availability metrics emitted in 5-minute intervals from a distributed application.
In addition to normal operation metrics, the dataset includes 68 injected performance issues, which increase latency and reduce availability throughout the system.
Date: 2023-11-08
- Clairvoyance: A Pipeline Toolkit for Medical Time Series
Time-series learning is the bread and butter of data-driven clinical decision support.
Clairvoyance proposes a unified, end-to-end, autoML-friendly pipeline that serves as a software toolkit.
Clairvoyance is the first to demonstrate viability of a comprehensive and automatable pipeline for clinical time-series ML.
Date: 2023-10-28
- A2Log: Attentive Augmented Log Anomaly Detection
Anomaly detection is becoming increasingly important for the dependability and serviceability of IT services.
Existing unsupervised methods need anomaly examples to obtain a suitable decision boundary.
We develop A2Log, an unsupervised anomaly detection method consisting of two steps: anomaly scoring and anomaly decision.
Date: 2021-09-20
- Accurate and Robust Feature Importance Estimation under Distribution Shifts
PRoFILE is a novel feature importance estimation method.
We show significant improvements over state-of-the-art approaches, both in terms of fidelity and robustness.
Date: 2020-09-30