Designing monitoring strategies for deployed machine learning
algorithms: navigating performativity through a causal lens
- URL: http://arxiv.org/abs/2311.11463v2
- Date: Mon, 26 Feb 2024 07:51:44 GMT
- Title: Designing monitoring strategies for deployed machine learning
algorithms: navigating performativity through a causal lens
- Authors: Jean Feng, Adarsh Subbaswamy, Alexej Gossmann, Harvineet Singh,
Berkman Sahiner, Mi-Ok Kim, Gene Pennello, Nicholas Petrick, Romain
Pirracchio, Fan Xia
- Abstract summary: The aim of this work is to highlight the relatively under-appreciated complexity of designing a monitoring strategy.
We consider an ML-based risk prediction algorithm for predicting unplanned readmissions.
Results from this case study emphasize the seemingly simple (and obvious) fact that not all monitoring systems are created equal.
- Score: 6.329470650220206
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: After a machine learning (ML)-based system is deployed, monitoring its
performance is important to ensure the safety and effectiveness of the
algorithm over time. When an ML algorithm interacts with its environment, the
algorithm can affect the data-generating mechanism and be a major source of
bias when evaluating its standalone performance, an issue known as
performativity. Although prior work has shown how to validate models in the
presence of performativity using causal inference techniques, there has been
little work on how to monitor models in the presence of performativity. Unlike
the setting of model validation, there is much less agreement on which
performance metrics to monitor. Different monitoring criteria impact how
interpretable the resulting test statistic is, what assumptions are needed for
identifiability, and the speed of detection. When this choice is further
coupled with the decision to use observational versus interventional data, ML
deployment teams are faced with a multitude of monitoring options. The aim of
this work is to highlight the relatively under-appreciated complexity of
designing a monitoring strategy and how causal reasoning can provide a
systematic framework for choosing between these options. As a motivating
example, we consider an ML-based risk prediction algorithm for predicting
unplanned readmissions. Bringing together tools from causal inference and
statistical process control, we consider six monitoring procedures (three
candidate monitoring criteria and two data sources) and investigate their
operating characteristics in simulation studies. Results from this case study
emphasize the seemingly simple (and obvious) fact that not all monitoring
systems are created equal, which has real-world impacts on the design and
documentation of ML monitoring systems.
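As a hedged illustration of the statistical-process-control side of this design space, the sketch below applies a one-sided CUSUM chart to a stream of per-period performance estimates. In the paper's setting each estimate would itself need to be identified causally (or computed from interventional data) so that performativity does not bias the monitored metric; the metric values, target, and thresholds here are hypothetical, not the paper's implementation.
```python
# Minimal sketch (not the paper's implementation): a one-sided CUSUM chart
# from statistical process control applied to per-period performance
# estimates. In the performative setting, each estimate in `metric_stream`
# would itself come from a causal/interventional estimator.
import numpy as np

def cusum_monitor(metric_stream, target, slack=0.01, threshold=0.1):
    """Raise an alarm once accumulated evidence that the metric sits
    below `target` (beyond a `slack` allowance) exceeds `threshold`."""
    s = 0.0
    for t, m in enumerate(metric_stream):
        s = max(0.0, s + (target - m) - slack)  # accumulate degradation
        if s > threshold:
            return t  # alarm time
    return None  # no alarm raised

rng = np.random.default_rng(0)
# Hypothetical weekly AUC estimates: stable at 0.80, degrading to 0.75
# after week 30 (e.g., the model's own alerts changed care patterns).
auc = np.concatenate([rng.normal(0.80, 0.01, 30), rng.normal(0.75, 0.01, 30)])
print("alarm at week:", cusum_monitor(auc, target=0.80))  # shortly after 30
```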
Related papers
- Explanatory Model Monitoring to Understand the Effects of Feature Shifts on Performance [61.06245197347139]
We propose a novel approach to explain the behavior of a black-box model under feature shifts.
We refer to our method that combines concepts from Optimal Transport and Shapley Values as Explanatory Performance Estimation.
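The summary names Optimal Transport and Shapley Values; the sketch below shows only the Shapley half of the idea, attributing a performance change to individual shifted features by naively swapping columns between paired reference and shifted samples (the actual method constructs the counterfactual inputs with Optimal Transport). The model, data, and accuracy metric are hypothetical stand-ins.
```python
# Rough sketch of Shapley-style attribution of a performance change under
# feature shift. Assumes paired rows in X_ref/X_shift; the real method uses
# Optimal Transport couplings rather than naive column swapping.
from itertools import combinations
from math import comb
import numpy as np

def accuracy(model, X, y):
    return (model.predict(X) == y).mean()

def shift_attribution(model, X_ref, X_shift, y):
    """Shapley value of each feature for the performance change caused by
    replacing its column with the shifted version (exhaustive; small d only)."""
    d = X_ref.shape[1]
    def value(S):  # performance with only the features in S shifted
        X = X_ref.copy()
        X[:, list(S)] = X_shift[:, list(S)]
        return accuracy(model, X, y)
    phi = np.zeros(d)
    for j in range(d):
        others = [k for k in range(d) if k != j]
        for r in range(d):
            for S in combinations(others, r):
                weight = 1.0 / (d * comb(d - 1, r))
                phi[j] += weight * (value(S + (j,)) - value(S))
    return phi  # sums to accuracy(all shifted) - accuracy(none shifted)

# Usage with a hypothetical fitted model exposing .predict(X):
#   phi = shift_attribution(model, X_ref, X_shift, y_true)
```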
arXiv Detail & Related papers (2024-08-24T18:28:19Z)
- Monitoring Algorithmic Fairness under Partial Observations [3.790015813774933]
Runtime verification techniques have been introduced to monitor the algorithmic fairness of deployed systems.
Previous monitoring techniques assume full observability of the states of the monitored system.
We extend fairness monitoring to systems modeled as partially observed Markov chains.
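As a much-simplified illustration (the paper's contribution is handling partially observed Markov chains, which this sketch does not attempt), a runtime fairness monitor can track a demographic-parity gap with Hoeffding-style confidence widths and return a three-valued verdict. All names and thresholds are hypothetical.
```python
# Simplified runtime fairness monitor: tracks the demographic-parity gap
# between two groups and declares it fair/unfair only once the confidence
# interval clears the tolerance. Assumes full observability, unlike the paper.
import math

class ParityMonitor:
    def __init__(self, tolerance=0.1, delta=0.05):
        self.n = {0: 0, 1: 0}       # observations per group
        self.pos = {0: 0, 1: 0}     # positive decisions per group
        self.tolerance, self.delta = tolerance, delta

    def observe(self, group, decision):
        self.n[group] += 1
        self.pos[group] += int(decision)

    def verdict(self):
        if min(self.n.values()) == 0:
            return "undecided"
        rate = {g: self.pos[g] / self.n[g] for g in (0, 1)}
        gap = abs(rate[0] - rate[1])
        # Hoeffding widths for both group rates, union-bounded over groups.
        width = sum(math.sqrt(math.log(4 / self.delta) / (2 * self.n[g]))
                    for g in (0, 1))
        if gap - width > self.tolerance:
            return "unfair"      # gap provably above tolerance
        if gap + width < self.tolerance:
            return "fair"        # gap provably below tolerance
        return "undecided"

m = ParityMonitor()
for g, d in [(0, 1), (1, 0), (0, 1), (1, 1)]:
    m.observe(g, d)
print(m.verdict())  # "undecided" until enough evidence accumulates
```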
arXiv Detail & Related papers (2023-08-01T07:35:54Z)
- Alignment-based conformance checking over probabilistic events [4.060731229044571]
We introduce a weighted trace model, a weighted alignment cost function, and a custom threshold parameter that controls the level of confidence in the event data.
The resulting algorithm considers activities of lower but sufficiently high probability that better align with the process model.
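A rough sketch of the flavor of this idea, under heavy assumptions: align a probabilistic trace against a single model trace with an edit-distance DP, where dropping an observed event costs its probability and a threshold tau discards low-confidence events up front. Real alignment-based conformance checking aligns against a process model, and the cost function below is a guess at the spirit of the method, not its definition.
```python
# Simplified weighted alignment between an observed probabilistic trace and
# one model trace. Low-probability observed events are cheap to drop, so the
# alignment favors activities of sufficiently high probability.

def weighted_alignment_cost(observed, model_trace, tau=0.3):
    """observed: list of (activity, probability); model_trace: list of
    activities. Classic edit-distance DP with probability-weighted costs."""
    obs = [(a, p) for a, p in observed if p >= tau]  # drop unlikely events
    n, m = len(obs), len(model_trace)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + obs[i - 1][1]        # drop observed event
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + 1.0                  # skip a model move
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a, p = obs[i - 1]
            match = D[i - 1][j - 1] + (0.0 if a == model_trace[j - 1] else p + 1.0)
            D[i][j] = min(match, D[i - 1][j] + p, D[i][j - 1] + 1.0)
    return D[n][m]

print(weighted_alignment_cost([("a", 0.9), ("x", 0.2), ("b", 0.8)], ["a", "b"]))
# -> 0.0: the low-confidence "x" is filtered out and the rest aligns exactly
```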
arXiv Detail & Related papers (2022-09-09T14:07:37Z)
- Lightweight Automated Feature Monitoring for Data Streams [1.4658400971135652]
We propose a flexible system, Feature Monitoring (FM), that detects data drifts in streaming data sets.
It monitors all features used by the system, and provides an interpretable feature ranking whenever an alarm occurs.
This illustrates that FM eliminates the need to add custom signals for detecting specific types of problems, and that monitoring the available feature space is often enough.
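A minimal stand-in for this kind of per-feature monitoring (FM itself uses a lightweight streaming statistic, whereas the batch Kolmogorov-Smirnov test below is our substitution): test each feature for drift against a reference window and return a ranked list, so an alarm arrives with an answer to "which feature moved".
```python
# Per-feature drift detection with an interpretable ranking. The KS test is
# a stand-in for FM's lightweight streaming statistic; data are synthetic.
import numpy as np
from scipy.stats import ks_2samp

def rank_drifting_features(reference, live, feature_names, alpha=0.01):
    """Two-sample KS test per feature; return drifting features ranked by
    test statistic so the alarm names the most-shifted feature first."""
    results = []
    for j, name in enumerate(feature_names):
        stat, p = ks_2samp(reference[:, j], live[:, j])
        if p < alpha:
            results.append((name, stat, p))
    return sorted(results, key=lambda r: -r[1])

rng = np.random.default_rng(1)
ref = rng.normal(size=(2000, 3))
live = rng.normal(size=(500, 3))
live[:, 2] += 0.5  # shift only feature f2
print(rank_drifting_features(ref, live, ["f0", "f1", "f2"]))  # expect f2 on top
```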
arXiv Detail & Related papers (2022-07-18T14:38:11Z)
- Learning Robust Output Control Barrier Functions from Safe Expert Demonstrations [50.37808220291108]
This paper addresses learning safe output feedback control laws from partial observations of expert demonstrations.
We first propose robust output control barrier functions (ROCBFs) as a means to guarantee safety.
We then formulate an optimization problem to learn ROCBFs from expert demonstrations that exhibit safe system behavior.
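A toy sketch of the learning step, omitting everything the paper adds for robustness and output feedback: fit h(x) = w.phi(x) by penalizing violations of h >= margin on demonstrated safe states, h <= -margin on sampled unsafe states, and a finite-difference barrier condition h_dot + alpha*h >= 0 along the demonstration. The quadratic features, penalty scheme, and hyperparameters are all hypothetical.
```python
# Toy penalty-method fit of a barrier-type function from safe demonstrations.
# Not the paper's ROCBF formulation; just the rough shape of the problem.
import numpy as np

def fit_barrier(safe_traj, unsafe_pts, dt, alpha=1.0, margin=0.1,
                lr=0.01, steps=2000, seed=0):
    phi = lambda X: np.hstack([X, X**2, np.ones((len(X), 1))])  # quadratic h
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.1, size=phi(safe_traj).shape[1])
    for _ in range(steps):
        h = lambda X: phi(X) @ w
        grad = np.zeros_like(w)
        # (1) h >= margin on demonstrated safe states
        v = h(safe_traj) < margin
        grad -= phi(safe_traj)[v].sum(axis=0)
        # (2) h <= -margin on sampled unsafe states
        v = h(unsafe_pts) > -margin
        grad += phi(unsafe_pts)[v].sum(axis=0)
        # (3) finite-difference barrier condition: h_dot + alpha*h >= 0
        hd = (h(safe_traj[1:]) - h(safe_traj[:-1])) / dt + alpha * h(safe_traj[:-1])
        dphi = (phi(safe_traj[1:]) - phi(safe_traj[:-1])) / dt \
               + alpha * phi(safe_traj[:-1])
        grad -= dphi[hd < 0].sum(axis=0)
        w -= lr * grad / len(safe_traj)
    return lambda X: phi(np.atleast_2d(X)) @ w
```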
arXiv Detail & Related papers (2021-11-18T23:21:00Z)
- Benchmarking Safety Monitors for Image Classifiers with Machine Learning [0.0]
Highly accurate machine learning (ML) image classifiers cannot guarantee that they will not fail at operation.
The use of fault tolerance mechanisms such as safety monitors is a promising direction to keep the system in a safe state.
This paper aims at establishing a baseline framework for benchmarking monitors for ML image classifiers.
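In the spirit of such a baseline framework, a monitor can be scored by how often it flags the classifier's actual mistakes versus how often it raises false alarms. The sketch below does this for a simple max-softmax rejection monitor, with placeholder arrays standing in for a real model's outputs.
```python
# Minimal benchmarking harness for a safety monitor: detection rate on the
# classifier's misclassifications vs. false-alarm rate on correct outputs.
import numpy as np

def benchmark_monitor(probs, labels, threshold=0.7):
    """probs: (n, k) softmax outputs; labels: (n,) ground truth."""
    pred = probs.argmax(axis=1)
    flagged = probs.max(axis=1) < threshold   # monitor raises an alarm
    wrong = pred != labels
    tpr = flagged[wrong].mean() if wrong.any() else float("nan")
    far = flagged[~wrong].mean() if (~wrong).any() else float("nan")
    return {"mistakes caught": tpr, "false alarm rate": far}

# Placeholder data; with a real model's outputs the trade-off is meaningful.
rng = np.random.default_rng(2)
probs = rng.dirichlet(np.ones(10), size=1000)   # fake softmax outputs
labels = rng.integers(0, 10, size=1000)
print(benchmark_monitor(probs, labels))
```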
arXiv Detail & Related papers (2021-10-04T07:52:23Z)
- DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
- Models, Pixels, and Rewards: Evaluating Design Trade-offs in Visual Model-Based Reinforcement Learning [109.74041512359476]
We study a number of design decisions for the predictive model in visual MBRL algorithms.
We find that a range of design decisions that are often considered crucial, such as the use of latent spaces, have little effect on task performance.
We show how this phenomenon is related to exploration and how some of the lower-scoring models on standard benchmarks will perform the same as the best-performing models when trained on the same training data.
arXiv Detail & Related papers (2020-12-08T18:03:21Z)
- Anticipating the Long-Term Effect of Online Learning in Control [75.6527644813815]
AntLer is a design algorithm for learning-based control laws that anticipates learning.
We show that AntLer approximates an optimal solution arbitrarily accurately with probability one.
arXiv Detail & Related papers (2020-07-24T07:00:14Z)
- How Training Data Impacts Performance in Learning-based Control [67.7875109298865]
This paper derives an analytical relationship between the density of the training data and the control performance.
We formulate a quality measure for the data set, which we refer to as the $\rho$-gap.
We show how the $\rho$-gap can be applied to a feedback linearizing control law.
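The $\rho$-gap is a formal quantity defined in the paper; as a crude stand-in for the underlying intuition, one can measure how far the states visited at test time lie from their nearest training samples, since sparse coverage is where a learned control law should be least trusted. This proxy is our assumption, not the paper's definition.
```python
# Crude coverage proxy (our assumption, not the paper's rho-gap): distance
# from each visited state to its nearest training point.
import numpy as np
from scipy.spatial import cKDTree

def coverage_gap(train_states, visited_states):
    """Max and mean nearest-neighbor distance from visited states to the
    training set: a simple proxy for how well the data covers operation."""
    tree = cKDTree(train_states)
    dists, _ = tree.query(visited_states)
    return dists.max(), dists.mean()

rng = np.random.default_rng(3)
train = rng.uniform(-1, 1, size=(1000, 2))
visited = rng.uniform(-1.5, 1.5, size=(200, 2))  # partly outside coverage
print(coverage_gap(train, visited))
```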
arXiv Detail & Related papers (2020-05-25T12:13:49Z)
- Collaborative Inference for Efficient Remote Monitoring [34.27630312942825]
A naive approach to making remote monitoring efficient at the model level is to use simpler architectures.
We propose an alternative solution that decomposes the predictive model into the sum of a simple function, which serves as a local monitoring tool, and a complex correction term.
A sign requirement is imposed on the latter to ensure that the local monitoring function is safe.
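A sketch of the additive-decomposition idea: the prediction is split as g(x) + h(x), with a simple g cheap enough to run on the local device and a complex correction h evaluated remotely. Below, g is lifted so it never understates the training risk, making the correction nonpositive there; whether the paper's sign constraint points this way is our assumption.
```python
# Sketch (assumptions flagged in the lead-in): fit a simple local score g
# that upper-bounds the observed risk on training data, so the remote
# correction h(x) = f(x) - g(x) is nonpositive there and g errs on the
# conservative side as a local monitor.
import numpy as np

def fit_local_monitor(X, risk):
    A = np.hstack([X, np.ones((len(X), 1))])       # linear model + intercept
    w, *_ = np.linalg.lstsq(A, risk, rcond=None)   # least-squares fit
    w[-1] += (risk - A @ w).max()                  # lift so g >= risk on data
    return lambda X_: np.hstack([X_, np.ones((len(X_), 1))]) @ w
```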
arXiv Detail & Related papers (2020-02-12T01:57:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.