Prediction-Powered Risk Monitoring of Deployed Models for Detecting Harmful Distribution Shifts
- URL: http://arxiv.org/abs/2602.02229v1
- Date: Mon, 02 Feb 2026 15:32:14 GMT
- Title: Prediction-Powered Risk Monitoring of Deployed Models for Detecting Harmful Distribution Shifts
- Authors: Guangyi Zhang, Yunlong Cai, Guanding Yu, Osvaldo Simeone
- Score: 51.37000123503367
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of monitoring model performance in dynamic environments where labeled data are limited. To this end, we propose prediction-powered risk monitoring (PPRM), a semi-supervised risk-monitoring approach based on prediction-powered inference (PPI). PPRM constructs anytime-valid lower bounds on the running risk by combining synthetic labels with a small set of true labels. Harmful shifts are detected via a threshold-based comparison with an upper bound on the nominal risk, satisfying assumption-free finite-sample guarantees on the probability of false alarm. We demonstrate the effectiveness of PPRM through extensive experiments on image classification, large language model (LLM), and telecommunications monitoring tasks.
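As a rough illustration of the PPI idea behind PPRM, the sketch below computes a prediction-powered point estimate of the risk and a simple threshold alarm. All function names are hypothetical, and this is only the point-estimate core: the paper's anytime-valid confidence-sequence construction for the running risk is not reproduced here.

```python
# Hedged sketch: PPI-style risk estimation (not the paper's exact method).
def ppi_risk_estimate(loss_synth_unlab, loss_synth_lab, loss_true_lab):
    """Prediction-powered point estimate of the mean risk.

    Combines the synthetic-label losses on the large unlabeled stream with
    a 'rectifier' term that corrects their bias, measured on the small set
    of points for which true labels are available."""
    mean = lambda xs: sum(xs) / len(xs)
    rectifier = mean(loss_true_lab) - mean(loss_synth_lab)
    return mean(loss_synth_unlab) + rectifier

def alarm(risk_lower_bound, nominal_risk_upper_bound):
    # Flag a harmful shift only when even a lower bound on the running
    # risk exceeds an upper bound on the nominal risk.
    return risk_lower_bound > nominal_risk_upper_bound
```

In the full method the point estimate would be replaced by an anytime-valid lower confidence bound, so that the comparison controls the probability of false alarm over the whole monitoring horizon.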
Related papers
- CVPL: A Geometric Framework for Post-Hoc Linkage Risk Assessment in Protected Tabular Data [0.0]
Formal privacy metrics provide compliance-oriented guarantees but often fail to quantify actual linkability in released datasets. CVPL represents linkage analysis as an operator pipeline comprising blocking, vectorization, latent projection, and similarity evaluation. Empirical validation on 10,000 records across 19 configurations demonstrates that formal k-anonymity compliance may coexist with substantial empirical linkability.
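The operator pipeline described above can be sketched generically as blocking followed by vectorization and similarity evaluation. Everything here (names, the cosine similarity choice, the omission of the latent-projection stage) is illustrative, not CVPL's actual implementation.

```python
# Generic record-linkage sketch: blocking + vectorization + similarity.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def link_candidates(released, auxiliary, block_key, vectorize, threshold=0.9):
    """Pair records that share a blocking key, then flag pairs whose
    vector representations exceed a similarity threshold."""
    links = []
    for r in released:
        for a in auxiliary:
            if r[block_key] != a[block_key]:
                continue  # blocking: only compare within the same block
            if cosine(vectorize(r), vectorize(a)) >= threshold:
                links.append((r, a))
    return links
```

A high fraction of flagged pairs would indicate empirical linkability even when a formal metric such as k-anonymity is satisfied.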
arXiv Detail & Related papers (2026-02-11T16:39:07Z)
- On Continuous Monitoring of Risk Violations under Unknown Shift [46.65571623109494]
We propose a general framework for the real-time monitoring of risk violations in evolving data streams. Our method operates under minimal assumptions on the nature of encountered shifts. We illustrate the effectiveness of our approach by monitoring risks in outlier detection and set prediction under a variety of shifts.
arXiv Detail & Related papers (2025-06-19T15:52:24Z)
- Risk-Averse Certification of Bayesian Neural Networks [70.44969603471903]
We propose a Risk-Averse Certification framework for Bayesian neural networks called RAC-BNN. Our method leverages sampling and optimisation to compute a sound approximation of the output set of a BNN. We validate RAC-BNN on a range of regression and classification benchmarks and compare its performance with a state-of-the-art method.
arXiv Detail & Related papers (2024-11-29T14:22:51Z)
- Data-Adaptive Tradeoffs among Multiple Risks in Distribution-Free Prediction [55.77015419028725]
We develop methods that permit valid control of risk when threshold and tradeoff parameters are chosen adaptively.
Our methodology supports monotone and nearly-monotone risks, but otherwise makes no distributional assumptions.
arXiv Detail & Related papers (2024-03-28T17:28:06Z)
- Forking Uncertainties: Reliable Prediction and Model Predictive Control with Sequence Models via Conformal Risk Control [40.918012779935246]
We introduce PTS-CRC, a novel post-hoc calibration procedure that operates on the predictions produced by any pre-designed probabilistic forecaster to yield reliable error bars.
Unlike the state of the art, PTS-CRC can satisfy reliability definitions beyond coverage.
We experimentally validate the performance of PTS-CRC prediction and control by studying a number of use cases in the context of wireless networking.
arXiv Detail & Related papers (2023-10-16T11:35:41Z)
- How to Trust Your Diffusion Model: A Convex Optimization Approach to Conformal Risk Control [9.811982443156063]
We focus on image-to-image regression tasks and present a generalization of the Risk-Controlling Prediction Sets (RCPS) procedure.
Our method relies on a novel convex optimization approach that allows for multidimensional risk control while provably minimizing the mean interval length.
We illustrate our approach on two real-world image denoising problems: on natural images of faces as well as on computed tomography (CT) scans of the abdomen.
arXiv Detail & Related papers (2023-02-07T23:01:16Z)
- Robust Control for Dynamical Systems With Non-Gaussian Noise via Formal Abstractions [59.605246463200736]
We present a novel controller synthesis method that does not rely on any explicit representation of the noise distributions.
First, we abstract the continuous control system into a finite-state model that captures noise by probabilistic transitions between discrete states.
We use state-of-the-art verification techniques to provide guarantees on the interval Markov decision process and compute a controller for which these guarantees carry over to the original control system.
arXiv Detail & Related papers (2023-01-04T10:40:30Z)
- Monitoring machine learning (ML)-based risk prediction algorithms in the presence of confounding medical interventions [4.893345190925178]
Performance monitoring of machine learning (ML)-based risk prediction models in healthcare is complicated by the issue of confounding medical interventions (CMI).
A simple approach is to ignore CMI and monitor only the untreated patients, whose outcomes remain unaltered.
We show that valid inference is still possible if one monitors conditional performance and if either conditional exchangeability or time-constant selection bias holds.
arXiv Detail & Related papers (2022-11-17T18:54:34Z)
- Risk-Sensitive Sequential Action Control with Multi-Modal Human Trajectory Forecasting for Safe Crowd-Robot Interaction [55.569050872780224]
We present an online framework for safe crowd-robot interaction based on risk-sensitive optimal control, wherein the risk is modeled by the entropic risk measure.
Our modular approach decouples the crowd-robot interaction into learning-based prediction and model-based control.
A simulation study and a real-world experiment show that the proposed framework can accomplish safe and efficient navigation while avoiding collisions with more than 50 humans in the scene.
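The entropic risk measure mentioned above has a standard closed form, rho_theta(X) = (1/theta) * log E[exp(theta * X)] for theta > 0; larger theta weights high-cost outcomes more heavily. A minimal sketch over an empirical cost sample (the function name is illustrative):

```python
# Entropic risk measure over an empirical sample of costs.
import math

def entropic_risk(costs, theta):
    """(1/theta) * log of the empirical mean of exp(theta * cost).

    For theta > 0 this upper-bounds the plain mean (by Jensen's
    inequality) and tends to it as theta -> 0."""
    n = len(costs)
    return math.log(sum(math.exp(theta * c) for c in costs) / n) / theta
```

For a degenerate sample where every cost equals c, the measure returns c for any theta, while spread in the costs pushes it above the mean.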
arXiv Detail & Related papers (2020-09-12T02:02:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.