Amazon SageMaker Model Monitor: A System for Real-Time Insights into
Deployed Machine Learning Models
- URL: http://arxiv.org/abs/2111.13657v1
- Date: Fri, 26 Nov 2021 18:35:38 GMT
- Authors: David Nigenda, Zohar Karnin, Muhammad Bilal Zafar, Raghu Ramesha, Alan
Tan, Michele Donini, Krishnaram Kenthapadi
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the increasing adoption of machine learning (ML) models and systems in
high-stakes settings across different industries, guaranteeing a model's
performance after deployment has become crucial. Monitoring models in
production is a critical aspect of ensuring their continued performance and
reliability. We present Amazon SageMaker Model Monitor, a fully managed service
that continuously monitors the quality of machine learning models hosted on
Amazon SageMaker. Our system automatically detects data, concept, bias, and
feature attribution drift in models in real-time and provides alerts so that
model owners can take corrective actions and thereby maintain high-quality
models. We describe the key requirements obtained from customers, system design
and architecture, and methodology for detecting different types of drift.
Further, we provide quantitative evaluations followed by use cases, insights,
and lessons learned from more than 1.5 years of production deployment.
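The abstract names the drift types the service detects but does not spell out the detection methodology here. As an illustrative sketch only (not SageMaker Model Monitor's actual implementation; the function names and the z-score threshold are assumptions for the example), a minimal data-drift check can compare live-traffic feature statistics against a baseline computed from training data:

```python
import statistics

def baseline_stats(rows):
    """Compute per-feature (mean, population stdev) from baseline rows,
    where each row is a list of numeric feature values."""
    cols = list(zip(*rows))
    return [(statistics.mean(c), statistics.pstdev(c)) for c in cols]

def detect_drift(baseline, window, z_threshold=3.0):
    """Return indices of features whose live-window mean deviates from
    the baseline mean by more than z_threshold standard errors."""
    cols = list(zip(*window))
    alerts = []
    for i, ((mu, sigma), col) in enumerate(zip(baseline, cols)):
        if sigma == 0:
            continue  # constant feature in baseline: skip
        n = len(col)
        z = abs(statistics.mean(col) - mu) / (sigma / n ** 0.5)
        if z > z_threshold:
            alerts.append(i)
    return alerts

# Usage: baseline from training data, window from captured inference traffic.
base = baseline_stats([[0.0], [1.0], [2.0], [1.0], [0.0]])
print(detect_drift(base, [[5.0], [5.0], [5.0], [5.0]]))  # drifted: [0]
print(detect_drift(base, [[1.0], [1.0], [0.0], [1.0]]))  # stable: []
```

A production system would use distributional tests and per-feature constraints rather than a single mean-shift statistic, but the baseline-versus-window comparison above captures the basic shape of data-drift monitoring.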
Related papers
- Complementary Learning for Real-World Model Failure Detection [15.779651238128562]
We introduce complementary learning, where we use learned characteristics from different training paradigms to detect model errors.
We demonstrate our approach by learning semantic and predictive motion labels in point clouds in a supervised and self-supervised manner.
We perform a large-scale qualitative analysis and present LidarCODA, the first dataset with labeled anomalies in lidar point clouds.
arXiv Detail & Related papers (2024-07-19T13:36:35Z)
- AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving [68.73885845181242]
We propose an Automatic Data Engine (AIDE) that automatically identifies issues, efficiently curates data, improves the model through auto-labeling, and verifies the model through generation of diverse scenarios.
We further establish a benchmark for open-world detection on AV datasets to comprehensively evaluate various learning paradigms, demonstrating our method's superior performance at a reduced cost.
arXiv Detail & Related papers (2024-03-26T04:27:56Z)
- Beimingwu: A Learnware Dock System [42.54363998206648]
This paper describes Beimingwu, the first open-source learnware dock system providing foundational support for future research of learnware paradigm.
The system significantly streamlines the model development for new user tasks, thanks to its integrated architecture and engine design.
Notably, this is possible even for users with limited data and minimal expertise in machine learning, without compromising the raw data's security.
arXiv Detail & Related papers (2024-01-24T09:27:51Z)
- QualEval: Qualitative Evaluation for Model Improvement [82.73561470966658]
We propose QualEval, which augments quantitative scalar metrics with automated qualitative evaluation as a vehicle for model improvement.
QualEval uses a powerful LLM reasoner and our novel flexible linear programming solver to generate human-readable insights.
We demonstrate that leveraging its insights, for example, improves the absolute performance of the Llama 2 model by up to 15 percentage points.
arXiv Detail & Related papers (2023-11-06T00:21:44Z)
- On-device Training: A First Overview on Existing Systems [6.551096686706628]
Efforts have been made to deploy some models on resource-constrained devices as well.
This work aims to summarize and analyze state-of-the-art systems research that enables such on-device model training capabilities.
arXiv Detail & Related papers (2022-12-01T19:22:29Z)
- Synthetic Model Combination: An Instance-wise Approach to Unsupervised Ensemble Learning [92.89846887298852]
Consider making a prediction over new test data without any opportunity to learn from a training set of labelled data.
We are given access to a set of expert models and their predictions, alongside some limited information about the dataset used to train them.
arXiv Detail & Related papers (2022-10-11T10:20:31Z)
- Domain-aware Control-oriented Neural Models for Autonomous Underwater Vehicles [2.4779082385578337]
We present control-oriented parametric models with varying levels of domain-awareness.
We employ universal differential equations to construct data-driven blackbox and graybox representations of the AUV dynamics.
arXiv Detail & Related papers (2022-08-15T17:01:14Z)
- Model-Based Visual Planning with Self-Supervised Functional Distances [104.83979811803466]
We present a self-supervised method for model-based visual goal reaching.
Our approach learns entirely using offline, unlabeled data.
We find that this approach substantially outperforms both model-free and model-based prior methods.
arXiv Detail & Related papers (2020-12-30T23:59:09Z)
- Goal-Aware Prediction: Learning to Model What Matters [105.43098326577434]
One of the fundamental challenges in using a learned forward dynamics model is the mismatch between the objective of the learned model and that of the downstream planner or policy.
We propose to direct prediction towards task relevant information, enabling the model to be aware of the current task and encouraging it to only model relevant quantities of the state space.
We find that our method more effectively models the relevant parts of the scene conditioned on the goal, and as a result outperforms standard task-agnostic dynamics models and model-free reinforcement learning.
arXiv Detail & Related papers (2020-07-14T16:42:59Z)
- Towards CRISP-ML(Q): A Machine Learning Process Model with Quality Assurance Methodology [53.063411515511056]
We propose a process model for the development of machine learning applications.
The first phase combines business and data understanding as data availability oftentimes affects the feasibility of the project.
The sixth phase covers state-of-the-art approaches for monitoring and maintenance of machine learning applications.
arXiv Detail & Related papers (2020-03-11T08:25:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.