AlerTiger: Deep Learning for AI Model Health Monitoring at LinkedIn
- URL: http://arxiv.org/abs/2306.01977v1
- Date: Sat, 3 Jun 2023 01:21:58 GMT
- Title: AlerTiger: Deep Learning for AI Model Health Monitoring at LinkedIn
- Authors: Zhentao Xu, Ruoying Wang, Girish Balaji, Manas Bundele, Xiaofei Liu,
Leo Liu, Tie Wang
- Abstract summary: AlerTiger helps AI teams across the company monitor their AI models' health.
The system consists of four major steps: model statistics generation, deep-learning-based anomaly detection, anomaly post-processing, and user alerting.
- Score: 4.020770981811131
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Data-driven companies use AI models extensively to develop products and
intelligent business solutions, making the health of these models crucial for
business success. Model monitoring and alerting in industries pose unique
challenges, including a lack of clear model health metrics definition, label
sparsity, and fast model iterations that result in short-lived models and
features. As a product, there are also requirements for scalability,
generalizability, and explainability. To tackle these challenges, we propose
AlerTiger, a deep-learning-based MLOps model monitoring system that helps AI
teams across the company monitor their AI models' health by detecting anomalies
in models' input features and output score over time. The system consists of
four major steps: model statistics generation, deep-learning-based anomaly
detection, anomaly post-processing, and user alerting. Our solution generates
three categories of statistics to indicate AI model health, offers a two-stage
deep anomaly detection solution to address label sparsity and attain the
generalizability of monitoring new models, and provides holistic reports for
actionable alerts. This approach has been deployed to most of LinkedIn's
production AI models for over a year and has identified several model issues
that later led to significant business metric gains after fixing.
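The four steps named in the abstract can be pictured with a minimal sketch. This is not the AlerTiger implementation: the abstract describes a two-stage deep anomaly detector, whereas the sketch below swaps in a plain trailing-window z-score test so that it stays self-contained. All function names, the window size, the threshold, and the sample data are hypothetical and chosen only for illustration.

```python
from statistics import mean, stdev


def generate_statistics(daily_scores):
    """Step 1: reduce each day's raw model scores to one summary statistic (the mean)."""
    return [mean(day) for day in daily_scores]


def detect_anomalies(series, window=5, threshold=3.0):
    """Step 2 (stand-in, NOT the paper's deep model): flag a day when its
    z-score against the trailing window exceeds the threshold."""
    flagged = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            sigma = 1e-9  # avoid division by zero on a perfectly flat history
        if abs(series[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


def post_process(flagged, min_gap=2):
    """Step 3: drop anomalies that fall within min_gap days of one already kept."""
    kept = []
    for idx in flagged:
        if not kept or idx - kept[-1] >= min_gap:
            kept.append(idx)
    return kept


def build_alerts(model_name, series, anomalies):
    """Step 4: turn the surviving anomalies into human-readable alert messages."""
    return [
        f"{model_name}: day {i} mean score {series[i]:.3f} deviates from recent history"
        for i in anomalies
    ]


def run_pipeline(model_name, daily_scores):
    """Chain the four steps: statistics, detection, post-processing, alerting."""
    series = generate_statistics(daily_scores)
    anomalies = post_process(detect_anomalies(series))
    return build_alerts(model_name, series, anomalies)


if __name__ == "__main__":
    # Hypothetical output scores for seven days; the sixth day shifts sharply upward.
    scores = [
        [0.50, 0.52], [0.51, 0.49], [0.50, 0.53], [0.49, 0.51],
        [0.52, 0.50], [0.90, 0.92], [0.51, 0.50],
    ]
    for alert in run_pipeline("demo_model", scores):
        print(alert)
```

Keeping each step as a separate function mirrors the staged design the abstract describes, so the stand-in detector could in principle be replaced by a learned model without touching the other steps.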
Related papers
- Unsupervised Model Diagnosis [49.36194740479798]
This paper proposes Unsupervised Model Diagnosis (UMO) to produce semantic counterfactual explanations without any user guidance.
Our approach identifies and visualizes changes in semantics, and then matches these changes to attributes from wide-ranging text sources.
arXiv Detail & Related papers (2024-10-08T17:59:03Z)
- Learning-based Models for Vulnerability Detection: An Extensive Study [3.1317409221921144]
We extensively and comprehensively investigate two types of state-of-the-art learning-based approaches.
We experimentally demonstrate the priority of sequence-based models and the limited abilities of graph-based models.
arXiv Detail & Related papers (2024-08-14T13:01:30Z)
- A Reliable Framework for Human-in-the-Loop Anomaly Detection in Time Series [17.08674819906415]
We introduce HILAD, a novel framework designed to foster a dynamic and bidirectional collaboration between humans and AI.
Through our visual interface, HILAD empowers domain experts to detect, interpret, and correct unexpected model behaviors at scale.
arXiv Detail & Related papers (2024-05-06T07:44:07Z)
- AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving [68.73885845181242]
We propose an Automatic Data Engine (AIDE) that automatically identifies issues, efficiently curates data, improves the model through auto-labeling, and verifies the model through generation of diverse scenarios.
We further establish a benchmark for open-world detection on AV datasets to comprehensively evaluate various learning paradigms, demonstrating our method's superior performance at a reduced cost.
arXiv Detail & Related papers (2024-03-26T04:27:56Z)
- Data-efficient Large Vision Models through Sequential Autoregression [58.26179273091461]
We develop an efficient, autoregression-based vision model on a limited dataset.
We demonstrate how this model achieves proficiency in a spectrum of visual tasks spanning both high-level and low-level semantic understanding.
Our empirical evaluations underscore the model's agility in adapting to various tasks, heralding a significant reduction in the parameter footprint.
arXiv Detail & Related papers (2024-02-07T13:41:53Z)
- A Hybrid Approach for Smart Alert Generation [28.38472792385083]
Anomaly detection is an important task in network management.
Deploying intelligent alert systems in real-world large-scale networking systems is challenging.
We propose a hybrid model for an alert system that combines statistical models with a whitelist mechanism.
arXiv Detail & Related papers (2023-06-02T14:52:32Z)
- Safe AI for health and beyond -- Monitoring to transform a health service [51.8524501805308]
We will assess the infrastructure required to monitor the outputs of a machine learning algorithm.
We will present two scenarios with examples of monitoring and updates of models.
arXiv Detail & Related papers (2023-03-02T17:27:45Z)
- Data-Driven and SE-assisted AI Model Signal-Awareness Enhancement and Introspection [61.571331422347875]
We propose a data-driven approach to enhance models' signal-awareness.
We combine the SE concept of code complexity with the AI technique of curriculum learning.
We achieve up to 4.8x improvement in model signal awareness.
arXiv Detail & Related papers (2021-11-10T17:58:18Z)
- A Simple and Interpretable Predictive Model for Healthcare [0.0]
Deep learning models are currently dominating most state-of-the-art solutions for disease prediction.
These deep learning models, with trainable parameters running into millions, require huge amounts of compute and data to train and deploy.
We develop a simpler yet interpretable non-deep learning based model for application to EHR data.
arXiv Detail & Related papers (2020-07-27T08:13:37Z)
- Goal-Aware Prediction: Learning to Model What Matters [105.43098326577434]
One of the fundamental challenges in using a learned forward dynamics model is the mismatch between the objective of the learned model and that of the downstream planner or policy.
We propose to direct prediction towards task relevant information, enabling the model to be aware of the current task and encouraging it to only model relevant quantities of the state space.
We find that our method more effectively models the relevant parts of the scene conditioned on the goal, and as a result outperforms standard task-agnostic dynamics models and model-free reinforcement learning.
arXiv Detail & Related papers (2020-07-14T16:42:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.