ML-Enabled Systems Model Deployment and Monitoring: Status Quo and
Problems
- URL: http://arxiv.org/abs/2402.05333v1
- Date: Thu, 8 Feb 2024 00:25:30 GMT
- Title: ML-Enabled Systems Model Deployment and Monitoring: Status Quo and
Problems
- Authors: Eduardo Zimelewicz, Marcos Kalinowski, Daniel Mendez, Görkem Giray,
Antonio Pedro Santos Alves, Niklas Lavesson, Kelly Azevedo, Hugo Villamizar,
Tatiana Escovedo, Helio Lopes, Stefan Biffl, Juergen Musil, Michael Felderer,
Stefan Wagner, Teresa Baldassarre, Tony Gorschek
- Abstract summary: We conducted an international survey to gather practitioner insights on how ML-enabled systems are engineered.
We analyzed the status quo and problems reported for the model deployment and monitoring phases.
Our results provide a better understanding of the practices adopted in industry and the problems practitioners face.
- Score: 7.280443300122617
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: [Context] Systems incorporating Machine Learning (ML) models, often called
ML-enabled systems, have become commonplace. However, empirical evidence on how
ML-enabled systems are engineered in practice is still limited, especially for
activities surrounding ML model dissemination. [Goal] We investigate
contemporary industrial practices and problems related to ML model
dissemination, focusing on the model deployment and the monitoring of ML life
cycle phases. [Method] We conducted an international survey to gather
practitioner insights on how ML-enabled systems are engineered. We gathered a
total of 188 complete responses from 25 countries. We analyzed the status quo
and problems reported for the model deployment and monitoring phases. We
analyzed contemporary practices using bootstrapping with confidence intervals
and conducted qualitative analyses of the reported problems, applying open and
axial coding procedures. [Results] Practitioners perceive the model deployment
and monitoring phases as relevant and difficult. With respect to model
deployment, models are typically deployed as separate services, with limited
adoption of MLOps principles. Reported problems include difficulties in
designing the architecture of the infrastructure for production deployment and
legacy application integration. Concerning model monitoring, many models in
production are not monitored. The main monitored aspects are inputs, outputs,
and decisions. Reported problems involve the absence of monitoring practices,
the need to create custom monitoring tools, and the selection of suitable
metrics. [Conclusion] Our results provide a better understanding of the
practices adopted in industry and the problems practitioners face, and can
help guide ML deployment and monitoring research in a problem-driven manner.
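The survey's findings lend themselves to a small illustration. Models "deployed as separate services", with inputs and outputs as the main monitored aspects, typically means a model wrapped behind an HTTP endpoint that logs every request and response. The sketch below is only a minimal example of that pattern, assuming a Flask app, a pickled model file, and a JSON log sink, none of which come from the paper.

```python
# Minimal sketch of the "model deployed as a separate service" pattern the
# survey reports, with input/output logging as a monitoring hook. The model
# path, endpoint, and log format are illustrative assumptions, not taken
# from the paper.
import json
import logging
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)
logging.basicConfig(filename="predictions.log", level=logging.INFO)

with open("model.pkl", "rb") as f:  # hypothetical serialized model
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]
    prediction = model.predict([features])[0]
    # Log inputs and outputs -- the two aspects most commonly monitored
    # according to the survey respondents.
    logging.info(json.dumps({"input": features, "output": float(prediction)}))
    return jsonify({"prediction": float(prediction)})

if __name__ == "__main__":
    app.run(port=8080)
```

In a real deployment the log stream would feed a monitoring pipeline rather than a local file.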
Related papers
- Towards Trustworthy Machine Learning in Production: An Overview of the Robustness in MLOps Approach [0.0]
In recent years, AI researchers and practitioners have introduced principles and guidelines to build systems that make reliable and trustworthy decisions.
In practice, a fundamental challenge arises when the system needs to be operationalized and deployed so that it can continuously evolve and operate in real-life environments.
To address this challenge, Machine Learning Operations (MLOps) has emerged as a potential recipe for standardizing ML solutions in deployment.
arXiv Detail & Related papers (2024-10-28T09:34:08Z)
- MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs [55.20845457594977]
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making.
We present MR-Ben, a process-based benchmark that demands meta-reasoning skill.
Our meta-reasoning paradigm is especially suited for system-2 slow thinking.
arXiv Detail & Related papers (2024-06-20T03:50:23Z)
- Using Quality Attribute Scenarios for ML Model Test Case Generation [3.9111051646728527]
Current practice for machine learning (ML) model testing prioritizes testing for model performance.
This paper presents an approach based on quality attribute (QA) scenarios to elicit and define system- and model-relevant test cases.
The QA-based approach has been integrated into MLTE, a process and tool to support ML model test and evaluation.
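As a loose illustration of what quality-attribute scenario tests might look like (the paper's MLTE tooling is not shown here; the predict() stub, latency budget, and perturbation size are hypothetical):

```python
# Hypothetical sketch of quality-attribute (QA) scenario tests in pytest.
# Beyond raw accuracy, scenarios bound attributes such as inference latency
# and robustness. The predict() stub and thresholds are illustrative only.
import time

def predict(features):
    return sum(features) > 1.0  # stand-in for a real model

def test_latency_scenario():
    # Scenario: under a single-request stimulus, the model responds within
    # a 50 ms latency budget.
    start = time.perf_counter()
    predict([0.2, 0.5, 0.9])
    elapsed_ms = (time.perf_counter() - start) * 1000
    assert elapsed_ms < 50, f"latency budget exceeded: {elapsed_ms:.1f} ms"

def test_robustness_scenario():
    # Scenario: a small perturbation of the input must not flip the output.
    base = [0.2, 0.5, 0.9]
    perturbed = [x + 0.001 for x in base]
    assert predict(base) == predict(perturbed)
```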
arXiv Detail & Related papers (2024-06-12T18:26:42Z)
- Naming the Pain in Machine Learning-Enabled Systems Engineering [8.092979562919878]
Machine learning (ML)-enabled systems are being increasingly adopted by companies.
This paper aims to deliver a comprehensive overview of the current status quo of engineering ML-enabled systems.
arXiv Detail & Related papers (2024-05-20T06:59:20Z)
- Characterization of Large Language Model Development in the Datacenter [55.9909258342639]
Large Language Models (LLMs) have demonstrated impressive performance across several transformative tasks.
However, it is non-trivial to efficiently utilize large-scale cluster resources to develop LLMs.
We present an in-depth characterization study of a six-month LLM development workload trace collected from our GPU datacenter Acme.
arXiv Detail & Related papers (2024-03-12T13:31:14Z)
- A Review of Physics-Informed Machine Learning Methods with Applications to Condition Monitoring and Anomaly Detection [1.124958340749622]
PIML is the incorporation of known physical laws and constraints into machine learning algorithms.
This study presents a comprehensive overview of PIML techniques in the context of condition monitoring.
arXiv Detail & Related papers (2024-01-22T11:29:44Z)
- Status Quo and Problems of Requirements Engineering for Machine Learning: Results from an International Survey [7.164324501049983]
Requirements Engineering (RE) can help address many problems when engineering Machine Learning-enabled systems.
We conducted a survey to gather practitioner insights into the status quo and problems of RE in ML-enabled systems.
We found significant differences in RE practices within ML projects.
arXiv Detail & Related papers (2023-10-10T15:53:50Z)
- Simultaneous Machine Translation with Large Language Models [51.470478122113356]
We investigate the possibility of applying Large Language Models to SimulMT tasks.
We conducted experiments using the Llama2-7b-chat model on nine different languages from the MuST-C dataset.
The results show that the LLM outperforms dedicated MT models in terms of BLEU and LAAL metrics.
arXiv Detail & Related papers (2023-09-13T04:06:47Z)
- Panoramic Learning with A Standardized Machine Learning Formalism [116.34627789412102]
This paper presents a standardized equation of the learning objective that offers a unifying understanding of diverse ML algorithms.
It also provides guidance for the mechanical design of new ML solutions and serves as a promising vehicle towards panoramic learning with all experiences.
arXiv Detail & Related papers (2021-08-17T17:44:38Z)
- Technology Readiness Levels for Machine Learning Systems [107.56979560568232]
Development and deployment of machine learning systems can be executed easily with modern tools, but the process is typically rushed and treated as a means to an end.
We have developed a proven systems engineering approach for machine learning development and deployment.
Our "Machine Learning Technology Readiness Levels" framework defines a principled process to ensure robust, reliable, and responsible systems.
arXiv Detail & Related papers (2021-01-11T15:54:48Z)
- Monitoring and explainability of models in production [58.720142291102135]
Monitoring deployed models is crucial for the continued provision of high-quality machine-learning-enabled services.
We discuss the challenges to successfully implementing monitoring and explainability solutions, with recent examples of production-ready solutions built on open-source tools.
arXiv Detail & Related papers (2020-07-13T10:37:05Z)
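As a concrete companion to the monitoring-focused papers above, here is a minimal sketch of one common monitoring check: comparing the distribution of a production input feature against its training-time reference with a two-sample Kolmogorov-Smirnov test. The synthetic data and the 0.05 significance threshold are illustrative assumptions, not taken from any of the listed papers.

```python
# Minimal sketch of input-drift monitoring: compare the distribution of a
# production feature against its training-time reference with a two-sample
# Kolmogorov-Smirnov test. The synthetic data and 0.05 threshold are
# illustrative assumptions only.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
reference = rng.normal(loc=0.0, scale=1.0, size=5_000)   # training-time data
production = rng.normal(loc=0.4, scale=1.0, size=1_000)  # shifted live inputs

statistic, p_value = ks_2samp(reference, production)
if p_value < 0.05:
    print(f"Drift suspected (KS={statistic:.3f}, p={p_value:.4f})")
else:
    print(f"No drift detected (KS={statistic:.3f}, p={p_value:.4f})")
```

Real deployments would typically wrap such a check in a scheduled job and alert on sustained drift rather than a single test.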