Designing monitoring strategies for deployed machine learning
algorithms: navigating performativity through a causal lens
- URL: http://arxiv.org/abs/2311.11463v2
- Date: Mon, 26 Feb 2024 07:51:44 GMT
- Title: Designing monitoring strategies for deployed machine learning
algorithms: navigating performativity through a causal lens
- Authors: Jean Feng, Adarsh Subbaswamy, Alexej Gossmann, Harvineet Singh,
Berkman Sahiner, Mi-Ok Kim, Gene Pennello, Nicholas Petrick, Romain
Pirracchio, Fan Xia
- Abstract summary: The aim of this work is to highlight the relatively under-appreciated complexity of designing a monitoring strategy.
We consider an ML-based risk prediction algorithm for predicting unplanned readmissions.
Results from this case study emphasize the seemingly simple (and obvious) fact that not all monitoring systems are created equal.
- Score: 6.329470650220206
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: After a machine learning (ML)-based system is deployed, monitoring its
performance is important to ensure the safety and effectiveness of the
algorithm over time. When an ML algorithm interacts with its environment, the
algorithm can affect the data-generating mechanism and be a major source of
bias when evaluating its standalone performance, an issue known as
performativity. Although prior work has shown how to validate models in the
presence of performativity using causal inference techniques, there has been
little work on how to monitor models in the presence of performativity. Unlike
the setting of model validation, there is much less agreement on which
performance metrics to monitor. Different monitoring criteria impact how
interpretable the resulting test statistic is, what assumptions are needed for
identifiability, and the speed of detection. When this choice is further
coupled with the decision to use observational versus interventional data, ML
deployment teams are faced with a multitude of monitoring options. The aim of
this work is to highlight the relatively under-appreciated complexity of
designing a monitoring strategy and how causal reasoning can provide a
systematic framework for choosing between these options. As a motivating
example, we consider an ML-based risk prediction algorithm for predicting
unplanned readmissions. Bringing together tools from causal inference and
statistical process control, we consider six monitoring procedures (three
candidate monitoring criteria and two data sources) and investigate their
operating characteristics in simulation studies. Results from this case study
emphasize the seemingly simple (and obvious) fact that not all monitoring
systems are created equal, which has real-world impacts on the design and
documentation of ML monitoring systems.
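As a hedged illustration of the statistical-process-control side of this design space, the sketch below applies a one-sided CUSUM chart to a stream of per-period performance estimates. In the paper's setting each estimate would itself need to be identified causally (or computed from interventional data) so that performativity does not bias the monitored metric; the metric values, target, and thresholds here are hypothetical, not the paper's implementation.
```python
# Minimal sketch (not the paper's implementation): a one-sided CUSUM chart
# from statistical process control applied to per-period performance
# estimates. In the performative setting, each estimate in `metric_stream`
# would itself come from a causal/interventional estimator.
import numpy as np

def cusum_monitor(metric_stream, target, slack=0.01, threshold=0.1):
    """Raise an alarm once accumulated evidence that the metric sits
    below `target` (beyond a `slack` allowance) exceeds `threshold`."""
    s = 0.0
    for t, m in enumerate(metric_stream):
        s = max(0.0, s + (target - m) - slack)  # accumulate degradation
        if s > threshold:
            return t  # alarm time
    return None  # no alarm raised

rng = np.random.default_rng(0)
# Hypothetical weekly AUC estimates: stable at 0.80, degrading to 0.75
# after week 30 (e.g., the model's own alerts changed care patterns).
auc = np.concatenate([rng.normal(0.80, 0.01, 30), rng.normal(0.75, 0.01, 30)])
print("alarm at week:", cusum_monitor(auc, target=0.80))  # shortly after 30
```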
Related papers
- Explanatory Model Monitoring to Understand the Effects of Feature Shifts on Performance [61.06245197347139]
We propose a novel approach to explain the behavior of a black-box model under feature shifts.
We refer to our method that combines concepts from Optimal Transport and Shapley Values as Explanatory Performance Estimation.
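The summary names Optimal Transport and Shapley Values; the sketch below shows only the Shapley half of the idea, attributing a performance change to individual shifted features by naively swapping columns between paired reference and shifted samples (the actual method constructs the counterfactual inputs with Optimal Transport). The model, data, and accuracy metric are hypothetical stand-ins.
```python
# Rough sketch of Shapley-style attribution of a performance change under
# feature shift. Assumes paired rows in X_ref/X_shift; the real method uses
# Optimal Transport couplings rather than naive column swapping.
from itertools import combinations
from math import comb
import numpy as np

def accuracy(model, X, y):
    return (model.predict(X) == y).mean()

def shift_attribution(model, X_ref, X_shift, y):
    """Shapley value of each feature for the performance change caused by
    replacing its column with the shifted version (exhaustive; small d only)."""
    d = X_ref.shape[1]
    def value(S):  # performance with only the features in S shifted
        X = X_ref.copy()
        X[:, list(S)] = X_shift[:, list(S)]
        return accuracy(model, X, y)
    phi = np.zeros(d)
    for j in range(d):
        others = [k for k in range(d) if k != j]
        for r in range(d):
            for S in combinations(others, r):
                weight = 1.0 / (d * comb(d - 1, r))
                phi[j] += weight * (value(S + (j,)) - value(S))
    return phi  # sums to accuracy(all shifted) - accuracy(none shifted)

# Usage with a hypothetical fitted model exposing .predict(X):
#   phi = shift_attribution(model, X_ref, X_shift, y_true)
```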
arXiv Detail & Related papers (2024-08-24T18:28:19Z)
- Monitoring Algorithmic Fairness under Partial Observations [3.790015813774933]
Runtime verification techniques have been introduced to monitor the algorithmic fairness of deployed systems.
Previous monitoring techniques assume full observability of the states of the monitored system.
We extend fairness monitoring to systems modeled as partially observed Markov chains.
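As a much-simplified illustration (the paper's contribution is handling partially observed Markov chains, which this sketch does not attempt), a runtime fairness monitor can track a demographic-parity gap with Hoeffding-style confidence widths and return a three-valued verdict. All names and thresholds are hypothetical.
```python
# Simplified runtime fairness monitor: tracks the demographic-parity gap
# between two groups and declares it fair/unfair only once the confidence
# interval clears the tolerance. Assumes full observability, unlike the paper.
import math

class ParityMonitor:
    def __init__(self, tolerance=0.1, delta=0.05):
        self.n = {0: 0, 1: 0}       # observations per group
        self.pos = {0: 0, 1: 0}     # positive decisions per group
        self.tolerance, self.delta = tolerance, delta

    def observe(self, group, decision):
        self.n[group] += 1
        self.pos[group] += int(decision)

    def verdict(self):
        if min(self.n.values()) == 0:
            return "undecided"
        rate = {g: self.pos[g] / self.n[g] for g in (0, 1)}
        gap = abs(rate[0] - rate[1])
        # Hoeffding widths for both group rates, union-bounded over groups.
        width = sum(math.sqrt(math.log(4 / self.delta) / (2 * self.n[g]))
                    for g in (0, 1))
        if gap - width > self.tolerance:
            return "unfair"      # gap provably above tolerance
        if gap + width < self.tolerance:
            return "fair"        # gap provably below tolerance
        return "undecided"

m = ParityMonitor()
for g, d in [(0, 1), (1, 0), (0, 1), (1, 1)]:
    m.observe(g, d)
print(m.verdict())  # "undecided" until enough evidence accumulates
```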
arXiv Detail & Related papers (2023-08-01T07:35:54Z)
- Alignment-based conformance checking over probabilistic events [4.060731229044571]
We introduce a weighted trace model, a weighted alignment cost function, and a custom threshold parameter that controls the level of confidence in the event data.
The resulting algorithm considers activities of lower but sufficiently high probability that better align with the process model.
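A rough sketch of the flavor of this idea, under heavy assumptions: align a probabilistic trace against a single model trace with an edit-distance DP, where dropping an observed event costs its probability and a threshold tau discards low-confidence events up front. Real alignment-based conformance checking aligns against a process model, and the cost function below is a guess at the spirit of the method, not its definition.
```python
# Simplified weighted alignment between an observed probabilistic trace and
# one model trace. Low-probability observed events are cheap to drop, so the
# alignment favors activities of sufficiently high probability.

def weighted_alignment_cost(observed, model_trace, tau=0.3):
    """observed: list of (activity, probability); model_trace: list of
    activities. Classic edit-distance DP with probability-weighted costs."""
    obs = [(a, p) for a, p in observed if p >= tau]  # drop unlikely events
    n, m = len(obs), len(model_trace)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + obs[i - 1][1]        # drop observed event
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + 1.0                  # skip a model move
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a, p = obs[i - 1]
            match = D[i - 1][j - 1] + (0.0 if a == model_trace[j - 1] else p + 1.0)
            D[i][j] = min(match, D[i - 1][j] + p, D[i][j - 1] + 1.0)
    return D[n][m]

print(weighted_alignment_cost([("a", 0.9), ("x", 0.2), ("b", 0.8)], ["a", "b"]))
# -> 0.0: the low-confidence "x" is filtered out and the rest aligns exactly
```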
arXiv Detail & Related papers (2022-09-09T14:07:37Z)
- Lightweight Automated Feature Monitoring for Data Streams [1.4658400971135652]
We propose a flexible system, Feature Monitoring (FM), that detects data drifts in streaming data sets.
It monitors all features used by the system, and provides an interpretable feature ranking whenever an alarm occurs.
This illustrates that FM eliminates the need to add custom signals for detecting specific types of problems, and that monitoring the available feature space is often enough.
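A minimal stand-in for this kind of per-feature monitoring (FM itself uses a lightweight streaming statistic, whereas the batch Kolmogorov-Smirnov test below is our substitution): test each feature for drift against a reference window and return a ranked list, so an alarm arrives with an answer to "which feature moved".
```python
# Per-feature drift detection with an interpretable ranking. The KS test is
# a stand-in for FM's lightweight streaming statistic; data are synthetic.
import numpy as np
from scipy.stats import ks_2samp

def rank_drifting_features(reference, live, feature_names, alpha=0.01):
    """Two-sample KS test per feature; return drifting features ranked by
    test statistic so the alarm names the most-shifted feature first."""
    results = []
    for j, name in enumerate(feature_names):
        stat, p = ks_2samp(reference[:, j], live[:, j])
        if p < alpha:
            results.append((name, stat, p))
    return sorted(results, key=lambda r: -r[1])

rng = np.random.default_rng(1)
ref = rng.normal(size=(2000, 3))
live = rng.normal(size=(500, 3))
live[:, 2] += 0.5  # shift only feature f2
print(rank_drifting_features(ref, live, ["f0", "f1", "f2"]))  # expect f2 on top
```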
arXiv Detail & Related papers (2022-07-18T14:38:11Z)
- Learning Robust Output Control Barrier Functions from Safe Expert Demonstrations [50.37808220291108]
This paper addresses learning safe output feedback control laws from partial observations of expert demonstrations.
We first propose robust output control barrier functions (ROCBFs) as a means to guarantee safety.
We then formulate an optimization problem to learn ROCBFs from expert demonstrations that exhibit safe system behavior.
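A toy sketch of the learning step, omitting everything the paper adds for robustness and output feedback: fit h(x) = w.phi(x) by penalizing violations of h >= margin on demonstrated safe states, h <= -margin on sampled unsafe states, and a finite-difference barrier condition h_dot + alpha*h >= 0 along the demonstration. The quadratic features, penalty scheme, and hyperparameters are all hypothetical.
```python
# Toy penalty-method fit of a barrier-type function from safe demonstrations.
# Not the paper's ROCBF formulation; just the rough shape of the problem.
import numpy as np

def fit_barrier(safe_traj, unsafe_pts, dt, alpha=1.0, margin=0.1,
                lr=0.01, steps=2000, seed=0):
    phi = lambda X: np.hstack([X, X**2, np.ones((len(X), 1))])  # quadratic h
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.1, size=phi(safe_traj).shape[1])
    for _ in range(steps):
        h = lambda X: phi(X) @ w
        grad = np.zeros_like(w)
        # (1) h >= margin on demonstrated safe states
        v = h(safe_traj) < margin
        grad -= phi(safe_traj)[v].sum(axis=0)
        # (2) h <= -margin on sampled unsafe states
        v = h(unsafe_pts) > -margin
        grad += phi(unsafe_pts)[v].sum(axis=0)
        # (3) finite-difference barrier condition: h_dot + alpha*h >= 0
        hd = (h(safe_traj[1:]) - h(safe_traj[:-1])) / dt + alpha * h(safe_traj[:-1])
        dphi = (phi(safe_traj[1:]) - phi(safe_traj[:-1])) / dt \
               + alpha * phi(safe_traj[:-1])
        grad -= dphi[hd < 0].sum(axis=0)
        w -= lr * grad / len(safe_traj)
    return lambda X: phi(np.atleast_2d(X)) @ w
```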
arXiv Detail & Related papers (2021-11-18T23:21:00Z)
- Benchmarking Safety Monitors for Image Classifiers with Machine Learning [0.0]
Highly accurate machine learning (ML) image classifiers cannot guarantee that they will not fail at operation.
The use of fault tolerance mechanisms such as safety monitors is a promising direction to keep the system in a safe state.
This paper aims at establishing a baseline framework for benchmarking monitors for ML image classifiers.
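In the spirit of such a baseline framework, a monitor can be scored by how often it flags the classifier's actual mistakes versus how often it raises false alarms. The sketch below does this for a simple max-softmax rejection monitor, with placeholder arrays standing in for a real model's outputs.
```python
# Minimal benchmarking harness for a safety monitor: detection rate on the
# classifier's misclassifications vs. false-alarm rate on correct outputs.
import numpy as np

def benchmark_monitor(probs, labels, threshold=0.7):
    """probs: (n, k) softmax outputs; labels: (n,) ground truth."""
    pred = probs.argmax(axis=1)
    flagged = probs.max(axis=1) < threshold   # monitor raises an alarm
    wrong = pred != labels
    tpr = flagged[wrong].mean() if wrong.any() else float("nan")
    far = flagged[~wrong].mean() if (~wrong).any() else float("nan")
    return {"mistakes caught": tpr, "false alarm rate": far}

# Placeholder data; with a real model's outputs the trade-off is meaningful.
rng = np.random.default_rng(2)
probs = rng.dirichlet(np.ones(10), size=1000)   # fake softmax outputs
labels = rng.integers(0, 10, size=1000)
print(benchmark_monitor(probs, labels))
```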
arXiv Detail & Related papers (2021-10-04T07:52:23Z)
- DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
- Models, Pixels, and Rewards: Evaluating Design Trade-offs in Visual Model-Based Reinforcement Learning [109.74041512359476]
We study a number of design decisions for the predictive model in visual MBRL algorithms.
We find that a range of design decisions that are often considered crucial, such as the use of latent spaces, have little effect on task performance.
We show how this phenomenon is related to exploration and how some of the lower-scoring models on standard benchmarks will perform the same as the best-performing models when trained on the same training data.
arXiv Detail & Related papers (2020-12-08T18:03:21Z)
- Anticipating the Long-Term Effect of Online Learning in Control [75.6527644813815]
AntLer is a design algorithm for learning-based control laws that anticipates learning.
We show that AntLer approximates an optimal solution arbitrarily accurately with probability one.
arXiv Detail & Related papers (2020-07-24T07:00:14Z)
- How Training Data Impacts Performance in Learning-based Control [67.7875109298865]
This paper derives an analytical relationship between the density of the training data and the control performance.
We formulate a quality measure for the data set, which we refer to as the $\rho$-gap.
We show how the $\rho$-gap can be applied to a feedback linearizing control law.
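The $\rho$-gap is a formal quantity defined in the paper; as a crude stand-in for the underlying intuition, one can measure how far the states visited at test time lie from their nearest training samples, since sparse coverage is where a learned control law should be least trusted. This proxy is our assumption, not the paper's definition.
```python
# Crude coverage proxy (our assumption, not the paper's rho-gap): distance
# from each visited state to its nearest training point.
import numpy as np
from scipy.spatial import cKDTree

def coverage_gap(train_states, visited_states):
    """Max and mean nearest-neighbor distance from visited states to the
    training set: a simple proxy for how well the data covers operation."""
    tree = cKDTree(train_states)
    dists, _ = tree.query(visited_states)
    return dists.max(), dists.mean()

rng = np.random.default_rng(3)
train = rng.uniform(-1, 1, size=(1000, 2))
visited = rng.uniform(-1.5, 1.5, size=(200, 2))  # partly outside coverage
print(coverage_gap(train, visited))
```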
arXiv Detail & Related papers (2020-05-25T12:13:49Z)
- Collaborative Inference for Efficient Remote Monitoring [34.27630312942825]
A naive approach to making remote monitoring efficient at the model level is to use simpler architectures.
We propose an alternative solution that decomposes the predictive model into the sum of a simple function, which serves as a local monitoring tool, and a complex correction term.
A sign requirement is imposed on the latter to ensure that the local monitoring function is safe.
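A sketch of the additive-decomposition idea: the prediction is split as g(x) + h(x), with a simple g cheap enough to run on the local device and a complex correction h evaluated remotely. Below, g is lifted so it never understates the training risk, making the correction nonpositive there; whether the paper's sign constraint points this way is our assumption.
```python
# Sketch (assumptions flagged in the lead-in): fit a simple local score g
# that upper-bounds the observed risk on training data, so the remote
# correction h(x) = f(x) - g(x) is nonpositive there and g errs on the
# conservative side as a local monitor.
import numpy as np

def fit_local_monitor(X, risk):
    A = np.hstack([X, np.ones((len(X), 1))])       # linear model + intercept
    w, *_ = np.linalg.lstsq(A, risk, rcond=None)   # least-squares fit
    w[-1] += (risk - A @ w).max()                  # lift so g >= risk on data
    return lambda X_: np.hstack([X_, np.ones((len(X_), 1))]) @ w
```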
arXiv Detail & Related papers (2020-02-12T01:57:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.