Related papers: Monitoring and Observability of Machine Learning Systems: Current Practices and Gaps

Monitoring and Observability of Machine Learning Systems: Current Practices and Gaps

URL: http://arxiv.org/abs/2510.24142v1
Date: Tue, 28 Oct 2025 07:31:08 GMT
Title: Monitoring and Observability of Machine Learning Systems: Current Practices and Gaps
Authors: Joran Leest, Ilias Gerostathopoulos, Patricia Lago, Claudia Raibulet,
Abstract summary: Production machine learning (ML) systems fail silently -- not with crashes, but through wrong decisions.<n>Observability is recognized as critical for ML operations, but there is a lack empirical evidence of what practitioners actually capture.<n>This study presents empirical results on ML observability in practice through seven focus group sessions in several domains.
Score: 7.21017530180659
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Production machine learning (ML) systems fail silently -- not with crashes, but through wrong decisions. While observability is recognized as critical for ML operations, there is a lack empirical evidence of what practitioners actually capture. This study presents empirical results on ML observability in practice through seven focus group sessions in several domains. We catalog the information practitioners systematically capture across ML systems and their environment and map how they use it to validate models, detect and diagnose faults, and explain observed degradations. Finally, we identify gaps in current practice and outline implications for tooling design and research to establish ML observability practices.

Related papers

From Tea Leaves to System Maps: Context-awareness in Monitoring Operational Machine Learning Models [10.17792666432021]
This paper presents a systematic review to characterize and structure the various types of contextual information in this domain.<n>We introduce the Contextual System--Aspect--Representation (C-SAR) framework, a conceptual model that synthesizes our findings.<n>We also identify 20 recurring and potentially reusable patterns of specific system, aspect, and representation triplets, and map them to the monitoring activities they support.
arXiv Detail & Related papers (2025-06-12T14:49:42Z)
MLLMs are Deeply Affected by Modality Bias [158.64371871084478]
Recent advances in Multimodal Large Language Models (MLLMs) have shown promising results in integrating diverse modalities such as texts and images.<n>MLLMs are heavily influenced by modality bias, often relying on language while under-utilizing other modalities like visual inputs.<n>This paper argues that MLLMs are deeply affected by modality bias, highlighting its manifestations across various tasks.
arXiv Detail & Related papers (2025-05-24T11:49:31Z)
Instance-Level Data-Use Auditing of Visual ML Models [49.862257986549885]
Growing trend of legal disputes over the unauthorized use of data in machine learning (ML) systems highlights the need for reliable data-use auditing mechanisms.<n>We present the first proactive, instance-level, data-use auditing method designed to enable data owners to audit the use of their individual data instances in ML models.
arXiv Detail & Related papers (2025-03-28T13:28:57Z)
LLaVA-RadZ: Can Multimodal Large Language Models Effectively Tackle Zero-shot Radiology Recognition? [59.81732629438753]
We propose LLaVA-RadZ, a simple yet effective framework for zero-shot medical disease recognition via utilizing the existing MLLM features.<n>Specifically, we design an end-to-end training strategy, termed Decoding-Side Feature Alignment Training (DFAT) to take advantage of the characteristics of the MLLM decoder architecture.<n>We also introduce a Domain Knowledge Anchoring Module (DKAM) to exploit the intrinsic medical knowledge of large models.
arXiv Detail & Related papers (2025-03-10T16:05:40Z)
Naming the Pain in Machine Learning-Enabled Systems Engineering [8.092979562919878]
Machine learning (ML)-enabled systems are being increasingly adopted by companies. This paper aims to deliver a comprehensive overview of the current status quo of engineering ML-enabled systems.
arXiv Detail & Related papers (2024-05-20T06:59:20Z)
ML-Enabled Systems Model Deployment and Monitoring: Status Quo and Problems [7.280443300122617]
We conducted an international survey to gather practitioner insights on how ML-enabled systems are engineered. We analyzed the status quo and problems reported for the model deployment and monitoring phases. Our results help provide a better understanding of the adopted practices and problems in practice.
arXiv Detail & Related papers (2024-02-08T00:25:30Z)
Machine Vision Therapy: Multimodal Large Language Models Can Enhance Visual Robustness via Denoising In-Context Learning [67.0609518552321]
We propose to conduct Machine Vision Therapy which aims to rectify the noisy predictions from vision models. By fine-tuning with the denoised labels, the learning model performance can be boosted in an unsupervised manner.
arXiv Detail & Related papers (2023-12-05T07:29:14Z)
Comparing Differentiable Logics for Learning Systems: A Research Preview [0.0]
Research on formal verification of machine learning (ML) systems indicates that learning from data alone often fails to capture underlying background knowledge. A promising approach for creating ML models that inherently satisfy constraints is to encode background knowledge as logical constraints that guide the learning process via so-called differentiable logics. In this research preview, we compare and evaluate various logics from the literature in weakly-supervised contexts, presenting our findings and highlighting open problems for future work.
arXiv Detail & Related papers (2023-11-16T11:33:08Z)
Detecting Shortcut Learning for Fair Medical AI using Shortcut Testing [62.9062883851246]
Machine learning holds great promise for improving healthcare, but it is critical to ensure that its use will not propagate or amplify health disparities. One potential driver of algorithmic unfairness, shortcut learning, arises when ML models base predictions on improper correlations in the training data. Using multi-task learning, we propose the first method to assess and mitigate shortcut learning as a part of the fairness assessment of clinical ML systems.
arXiv Detail & Related papers (2022-07-21T09:35:38Z)
Learning Physical Concepts in Cyber-Physical Systems: A Case Study [72.74318982275052]
We provide an overview of the current state of research regarding methods for learning physical concepts in time series data. We also analyze the most important methods from the current state of the art using the example of a three-tank system.
arXiv Detail & Related papers (2021-11-28T14:24:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.