OXN -- Automated Observability Assessments for Cloud-Native Applications
- URL: http://arxiv.org/abs/2407.09644v1
- Date: Fri, 12 Jul 2024 19:04:13 GMT
- Title: OXN -- Automated Observability Assessments for Cloud-Native Applications
- Authors: Maria C. Borges, Joshua Bauer, Sebastian Werner
- Abstract summary: We present a proof-of-concept implementation of an experiment tool, the Observability eXperiment eNgine (OXN).
OXN is able to inject arbitrary faults into an application, similar to Chaos Engineering, but also possesses the unique capability to modify the observability configuration.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Observability is important to ensure the reliability of microservice applications. These applications are often prone to failures, since they have many independent services deployed on heterogeneous environments. When employed "correctly", observability can help developers identify and troubleshoot faults quickly. However, instrumenting and configuring the observability of a microservice application is not trivial but tool-dependent and tied to costs. Practitioners need to understand observability-related trade-offs in order to weigh between different observability design alternatives. Still, these architectural design decisions are not supported by systematic methods and typically just rely on "professional intuition". To assess observability design trade-offs with concrete evidence, we advocate for conducting experiments that compare various design alternatives. Achieving a systematic and repeatable experiment process necessitates automation. We present a proof-of-concept implementation of an experiment tool - Observability eXperiment eNgine (OXN). OXN is able to inject arbitrary faults into an application, similar to Chaos Engineering, but also possesses the unique capability to modify the observability configuration, allowing for the straightforward assessment of design decisions that were previously left unexplored.
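The abstract does not fix a concrete interface for OXN, so the following Python sketch is only a hypothetical illustration of the kind of experiment it automates: cross a fault treatment with several observability configurations and compare how quickly the fault becomes visible. All names (`FaultTreatment`, `ObservabilityTreatment`, `run_variant`) and the toy latency model are assumptions, not OXN's actual API.

```python
# Hypothetical sketch of an OXN-style observability experiment.
# None of these names come from OXN itself; they only illustrate the idea.
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class FaultTreatment:
    """A fault to inject, Chaos-Engineering style."""
    name: str          # e.g. "packet_loss"
    target: str        # service to disturb

@dataclass(frozen=True)
class ObservabilityTreatment:
    """A change to the observability configuration under test."""
    metric_interval_s: int      # scrape/export interval
    tracing_sample_rate: float

def run_variant(fault: FaultTreatment, obs: ObservabilityTreatment) -> dict:
    """Stand-in for deploying the app, applying both treatments, and
    measuring whether the fault shows up in the collected telemetry."""
    # Toy model: coarser metrics and lower sampling delay detection.
    detection_latency_s = obs.metric_interval_s / max(obs.tracing_sample_rate, 0.01)
    return {"fault": fault.name, "interval_s": obs.metric_interval_s,
            "sample_rate": obs.tracing_sample_rate,
            "detection_latency_s": detection_latency_s}

faults = [FaultTreatment("packet_loss", "recommendation-service")]
configs = [ObservabilityTreatment(i, r)
           for i, r in product([5, 30], [0.1, 1.0])]

for f, o in product(faults, configs):
    print(run_variant(f, o))
```

The value of automating this loop is repeatability: every observability design alternative is exercised against the same fault under the same conditions, so the trade-offs can be compared with concrete evidence.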
Related papers
- Continuous Observability Assurance in Cloud-Native Applications [0.0]
We build on previous work and integrate our observability experiment tool OXN into a novel method for continuous observability assurance.
We demonstrate its use and discuss future directions.
arXiv Detail & Related papers (2025-03-11T15:43:26Z)
- Adaptive Tool Use in Large Language Models with Meta-Cognition Trigger [49.81945268343162]
We propose MeCo, an adaptive decision-making strategy for external tool use.
MeCo captures high-level cognitive signals in the representation space, guiding when to invoke tools.
Our experiments show that MeCo accurately detects LLMs' internal cognitive signals and significantly improves tool-use decision-making.
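The summary only says that MeCo reads cognitive signals off the representation space; a minimal sketch of that general idea, a linear probe on a hidden state that gates tool calls, could look as follows. The probe, threshold, and synthetic hidden state are all assumptions rather than the paper's method.

```python
# Minimal sketch of representation-space gating for tool use, in the
# spirit of MeCo. The linear probe and threshold are assumptions.
import numpy as np

rng = np.random.default_rng(0)
HIDDEN_DIM = 16

# Pretend this probe was trained to detect "the model needs external help".
probe_w = rng.normal(size=HIDDEN_DIM)
probe_b = 0.0
THRESHOLD = 0.5

def needs_tool(hidden_state: np.ndarray) -> bool:
    """Score the hidden state; invoke a tool only when the
    meta-cognitive signal exceeds the calibrated threshold."""
    logit = hidden_state @ probe_w + probe_b
    p = 1.0 / (1.0 + np.exp(-logit))
    return p > THRESHOLD

hidden = rng.normal(size=HIDDEN_DIM)  # stand-in for an LLM hidden state
print("call_tool" if needs_tool(hidden) else "answer_directly")
```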
arXiv Detail & Related papers (2025-02-18T15:45:01Z)
- AExGym: Benchmarks and Environments for Adaptive Experimentation [7.948144726705323]
We present a benchmark for adaptive experimentation based on real-world datasets.
We highlight prominent practical challenges to operationalizing adaptivity: non-stationarity, batched/delayed feedback, multiple outcomes and objectives, and external validity.
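AExGym's API is not described here, so the sketch below instead shows the generic setting it benchmarks: an adaptive (Thompson-sampling) experiment that only receives feedback in delayed batches. The arm means, batch size, and prior are illustrative, not taken from the benchmark.

```python
# Generic adaptive experiment with batched feedback: Thompson sampling
# over Bernoulli arms, updated once per batch rather than per sample.
# This illustrates the setting AExGym benchmarks; it is not AExGym code.
import numpy as np

rng = np.random.default_rng(1)
true_rates = np.array([0.05, 0.07, 0.04])   # unknown to the experimenter
alpha = np.ones(3)                           # Beta posterior: successes + 1
beta = np.ones(3)                            # Beta posterior: failures + 1

for batch in range(20):
    # Draw one posterior sample per arm, assign the whole batch greedily.
    samples = rng.beta(alpha, beta)
    arm = int(np.argmax(samples))
    outcomes = rng.random(100) < true_rates[arm]  # delayed, batched feedback
    alpha[arm] += outcomes.sum()
    beta[arm] += (~outcomes).sum()

print("posterior means:", alpha / (alpha + beta))
```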
arXiv Detail & Related papers (2024-08-08T15:32:12Z)
- Informed and Assessable Observability Design Decisions in Cloud-native Microservice Applications [0.0]
Observability is important to ensure the reliability of microservice applications.
Architects need to understand observability-related trade-offs in order to weigh between different observability design alternatives.
We argue for a systematic method to arrive at informed and continuously assessable observability design decisions.
arXiv Detail & Related papers (2024-03-01T16:12:20Z)
- Discovering Decision Manifolds to Assure Trusted Autonomous Systems [0.0]
We propose an optimization-based search technique for capturing the range of correct and incorrect responses a system could exhibit.
This manifold provides a more detailed understanding of system reliability than traditional testing or Monte Carlo simulations.
In this proof-of-concept, we apply our method to a software-in-the-loop evaluation of an autonomous vehicle.
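As a toy analogue of this search, the sketch below bisects along one input axis of a stand-in system to find where correct behavior turns incorrect; the paper's optimization-based technique over a real software-in-the-loop simulator is far richer. The braking model and pass criterion are invented for illustration.

```python
# Toy illustration of searching for the boundary between correct and
# incorrect behavior, the kind of "decision manifold" the paper maps.
# The system under test and its pass criterion are stand-ins.

def system_passes(speed: float, distance: float) -> bool:
    """Stand-in for a simulation run: does an emergency stop from
    `speed` m/s succeed within `distance` m (braking at 6 m/s^2)?"""
    return speed * speed / (2 * 6.0) <= distance

def boundary_distance(speed: float, lo=0.0, hi=200.0, iters=40) -> float:
    """Bisect for the smallest distance at which the system still passes."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if system_passes(speed, mid):
            hi = mid
        else:
            lo = mid
    return hi

for speed in (10.0, 20.0, 30.0):
    print(f"speed {speed:5.1f} m/s -> boundary ~ {boundary_distance(speed):6.2f} m")
```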
arXiv Detail & Related papers (2024-02-12T16:55:58Z)
- Flexible and Robust Counterfactual Explanations with Minimal Satisfiable Perturbations [56.941276017696076]
We propose a conceptually simple yet effective solution named Counterfactual Explanations with Minimal Satisfiable Perturbations (CEMSP).
CEMSP constrains changing values of abnormal features with the help of their semantically meaningful normal ranges.
In comprehensive experiments on both synthetic and real-world datasets, we demonstrate that our method provides more robust explanations than existing methods while preserving flexibility.
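A heavily simplified reading of that constraint, hypothetical and not the paper's algorithm, is sketched below: only abnormal features are perturbed, and only to the nearest edge of their semantically meaningful normal range.

```python
# Simplified sketch of the CEMSP idea: perturb only abnormal features,
# and only to the nearest bound of their normal range. The classifier
# and the ranges are toy assumptions.
import numpy as np

normal_ranges = {0: (36.0, 37.5),   # e.g. body temperature (deg C)
                 1: (60.0, 100.0)}  # e.g. resting heart rate (bpm)

def classify(x: np.ndarray) -> int:
    """Toy classifier: abnormal (1) if any feature leaves its range."""
    return int(any(not (lo <= x[i] <= hi)
                   for i, (lo, hi) in normal_ranges.items()))

def counterfactual(x: np.ndarray) -> np.ndarray:
    """Clip each abnormal feature to the nearest edge of its normal
    range: the minimal satisfiable perturbation in this toy setting."""
    cf = x.copy()
    for i, (lo, hi) in normal_ranges.items():
        cf[i] = np.clip(cf[i], lo, hi)
    return cf

x = np.array([39.2, 110.0])
cf = counterfactual(x)
print(x, "->", cf, "| label:", classify(x), "->", classify(cf))
```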
arXiv Detail & Related papers (2023-09-09T04:05:56Z)
- Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners [85.03486419424647]
KnowNo is a framework for measuring and aligning the uncertainty of large language models.
KnowNo builds on the theory of conformal prediction to provide statistical guarantees on task completion.
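Conformal prediction itself is standard, so its calibration step can be sketched concretely; the scores below are synthetic stand-ins for an LLM's likelihoods over candidate actions, and the "ask for help when the prediction set is not a singleton" rule is a simplified reading of the framework.

```python
# Sketch of the conformal-prediction core behind a KnowNo-style planner:
# calibrate a score threshold on held-out examples, then keep every
# candidate action above it and ask for help if more than one survives.
import numpy as np

rng = np.random.default_rng(2)
EPSILON = 0.1  # allowed miscoverage: sets contain the right action w.p. >= 0.9

# Calibration: scores of the *correct* option on n labeled scenarios
# (synthetic here; in practice these come from the LLM).
cal_scores = rng.beta(5, 2, size=500)
n = len(cal_scores)
# Threshold = the floor(eps*(n+1))-th smallest calibration score, which
# gives the finite-sample coverage guarantee of split conformal prediction.
k = int(np.floor(EPSILON * (n + 1)))
q = np.sort(cal_scores)[k - 1]

def prediction_set(option_scores: np.ndarray) -> np.ndarray:
    """Keep every candidate whose score clears the conformal threshold."""
    return np.flatnonzero(option_scores >= q)

options = np.array([0.70, 0.65, 0.05])  # scores for 3 candidate actions
s = prediction_set(options)
print("set:", s.tolist(), "->", "act" if len(s) == 1 else "ask for help")
```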
arXiv Detail & Related papers (2023-07-04T21:25:12Z)
- R-U-SURE? Uncertainty-Aware Code Suggestions By Maximizing Utility Across Random User Intents [14.455036827804541]
Large language models show impressive results at predicting structured text such as code, but also commonly introduce errors and hallucinations in their output.
We propose Randomized Utility-driven Synthesis of Uncertain REgions (R-U-SURE).
R-U-SURE is an approach for building uncertainty-aware suggestions based on a decision-theoretic model of goal-conditioned utility.
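As a rough, hypothetical illustration of the decision-theoretic idea: sample several plausible user intents, then keep a suggested token only when its expected utility beats replacing it with an uncertain region. The utilities and intent samples below are toy assumptions, and the per-token independence is a simplification of the paper's approach.

```python
# Toy sketch of utility-driven uncertain regions in the spirit of
# R-U-SURE. Intent samples stand in for draws from a code model.
suggestion = ["total", "=", "price", "*", "qty"]
intent_samples = [
    ["total", "=", "price", "*", "qty"],
    ["total", "=", "price", "*", "count"],
    ["total", "=", "cost",  "*", "count"],
]
GAIN, COST = 1.0, 2.0   # utility of a correct token / cost of a wrong one

annotated = []
for i, tok in enumerate(suggestion):
    agree = sum(s[i] == tok for s in intent_samples) / len(intent_samples)
    # Expected utility of keeping the token, vs. 0 for marking it uncertain.
    keep = agree * GAIN - (1 - agree) * COST > 0.0
    annotated.append(tok if keep else f"<{tok}?>")

print(" ".join(annotated))  # e.g. "total = <price?> * <qty?>"
```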
arXiv Detail & Related papers (2023-03-01T18:46:40Z)
- Generalization of Neural Combinatorial Solvers Through the Lens of Adversarial Robustness [68.97830259849086]
Most datasets only capture a simpler subproblem and likely suffer from spurious features.
We study adversarial robustness - a local generalization property - to reveal hard, model-specific instances and spurious features.
Unlike in other applications, where perturbation models are designed around subjective notions of imperceptibility, our perturbation models are efficient and sound.
Surprisingly, with such perturbations, a sufficiently expressive neural solver does not suffer from the limitations of the accuracy-robustness trade-off common in supervised learning.
arXiv Detail & Related papers (2021-10-21T07:28:11Z)
- CoreDiag: Eliminating Redundancy in Constraint Sets [68.8204255655161]
We present a new algorithm that can be used to determine minimal cores (minimal non-redundant constraint sets).
The algorithm is especially useful for distributed knowledge engineering scenarios where the degree of redundancy can become high.
In order to show the applicability of our approach, we present an empirical study conducted with commercial configuration knowledge bases.
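The notion of redundancy involved can be shown with a naive sequential reduction: a constraint is dropped when the remaining ones already entail it. This brute-force sketch over a tiny finite domain only illustrates what a minimal core is; it does not reproduce CoreDiag's own, more efficient algorithm.

```python
# A constraint is redundant if the remaining constraints entail it.
# Naive sequential reduction over a tiny finite domain, for illustration.
from itertools import product

DOMAIN = range(4)  # x, y each in {0, 1, 2, 3}

constraints = {
    "c1: x < y":     lambda x, y: x < y,
    "c2: x <= y":    lambda x, y: x <= y,      # entailed by c1
    "c3: y >= 1":    lambda x, y: y >= 1,      # entailed by c1 over this domain
    "c4: x + y < 6": lambda x, y: x + y < 6,   # entailed by c1 over this domain
}

def entails(active, candidate) -> bool:
    """True if every assignment satisfying `active` also satisfies `candidate`."""
    return all(candidate(x, y)
               for x, y in product(DOMAIN, DOMAIN)
               if all(c(x, y) for c in active))

core = dict(constraints)
for name in list(core):
    rest = [c for n, c in core.items() if n != name]
    if entails(rest, core[name]):
        del core[name]  # redundant: drop it and continue with the rest

print("minimal core:", list(core))  # -> ['c1: x < y']
```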
arXiv Detail & Related papers (2021-02-24T09:16:10Z)
- Leveraging Expert Consistency to Improve Algorithmic Decision Support [62.61153549123407]
We explore the use of historical expert decisions as a rich source of information that can be combined with observed outcomes to narrow the construct gap.
We propose an influence function-based methodology to estimate expert consistency indirectly when each case in the data is assessed by a single expert.
Our empirical evaluation, using simulations in a clinical setting and real-world data from the child welfare domain, indicates that the proposed approach successfully narrows the construct gap.
arXiv Detail & Related papers (2021-01-24T05:40:29Z)
- NADS: Neural Architecture Distribution Search for Uncertainty Awareness [79.18710225716791]
Machine learning (ML) systems often encounter Out-of-Distribution (OoD) errors when dealing with testing data coming from a distribution different from training data.
Existing OoD detection approaches are prone to errors and even sometimes assign higher likelihoods to OoD samples.
We propose Neural Architecture Distribution Search (NADS) to identify common building blocks among all uncertainty-aware architectures.
arXiv Detail & Related papers (2020-06-11T17:39:07Z)
- Collaborative Inference for Efficient Remote Monitoring [34.27630312942825]
A naive way to make remote monitoring efficient at the model level is to use simpler architectures.
We propose an alternative solution that decomposes the predictive model into the sum of a simple function, which serves as a local monitoring tool, and a complex correction term.
A sign requirement is imposed on the latter to ensure that the local monitoring function is safe.
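A toy rendering of that decomposition, with all models and thresholds invented for illustration: f(x) = g(x) + h(x), where the correction h is kept non-negative (the sign requirement), so the cheap local part g is a safe lower bound and can raise alerts on its own.

```python
# Sketch of the decomposition idea: f(x) = g(x) + h(x) with a simple
# local part g and a sign-constrained correction h >= 0, so that g
# never over-reports and local alerts are safe. All models here are toys.
import numpy as np

rng = np.random.default_rng(3)
w_simple = np.array([0.8, 0.1])             # cheap local linear scorer

def g(x: np.ndarray) -> float:              # runs on the device
    return float(x @ w_simple)

def h(x: np.ndarray) -> float:              # runs remotely
    raw = np.sin(x).sum()                   # stand-in for a complex model
    return float(np.log1p(np.exp(raw)))     # softplus >= 0: the sign requirement

THRESHOLD = 1.5

def monitor(x: np.ndarray) -> str:
    if g(x) >= THRESHOLD:                   # f(x) >= g(x), so this is safe
        return "alert (decided locally, no remote call)"
    full = g(x) + h(x)                      # otherwise consult the correction
    return "alert (after remote check)" if full >= THRESHOLD else "ok"

for _ in range(3):
    print(monitor(rng.normal(size=2)))
```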
arXiv Detail & Related papers (2020-02-12T01:57:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.