Incremental Uncertainty-aware Performance Monitoring with Active Labeling Intervention
- URL: http://arxiv.org/abs/2505.07023v1
- Date: Sun, 11 May 2025 15:35:55 GMT
- Title: Incremental Uncertainty-aware Performance Monitoring with Active Labeling Intervention
- Authors: Alexander Koebler, Thomas Decker, Ingo Thon, Volker Tresp, Florian Buettner
- Abstract summary: We propose Incremental Uncertainty-aware Performance Monitoring (IUPM), a label-free method that estimates performance changes by modeling gradual shifts using optimal transport. IUPM quantifies the uncertainty in the performance prediction and introduces an active labeling procedure to restore a reliable estimate under a limited labeling budget. Our experiments show that IUPM outperforms existing performance estimation baselines in various gradual shift scenarios.
- Score: 64.12447263206381
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the problem of monitoring machine learning models under gradual distribution shifts, where circumstances change slowly over time, often leading to unnoticed yet significant declines in accuracy. To address this, we propose Incremental Uncertainty-aware Performance Monitoring (IUPM), a novel label-free method that estimates performance changes by modeling gradual shifts using optimal transport. In addition, IUPM quantifies the uncertainty in the performance prediction and introduces an active labeling procedure to restore a reliable estimate under a limited labeling budget. Our experiments show that IUPM outperforms existing performance estimation baselines in various gradual shift scenarios and that its uncertainty awareness guides label acquisition more effectively compared to other strategies.
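The core idea of label-free performance estimation via optimal transport can be pictured with a minimal, hypothetical sketch (not the authors' IUPM implementation): match unlabeled shifted data to labeled source data with a one-to-one optimal assignment, transport the source labels along the matching, and score the model against these proxy labels. The data, the stand-in classifier, and the matching setup below are all invented for illustration.

```python
# Illustrative sketch only: estimate a model's accuracy on unlabeled shifted
# data by transporting labels from the source set via an optimal 1-1 matching.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)

# Labeled source data: two 2-D Gaussian classes.
n = 100
X_src = np.vstack([rng.normal(-2.0, 1.0, (n, 2)), rng.normal(2.0, 1.0, (n, 2))])
y_src = np.array([0] * n + [1] * n)

# Unlabeled target data: the same points under a small gradual shift.
X_tgt = X_src + 0.5 + rng.normal(0.0, 0.1, X_src.shape)

def model_predict(X):
    # Stand-in classifier "trained" on the source: sign of the feature mean.
    return (X.mean(axis=1) > 0).astype(int)

# Optimal transport as a 1-1 assignment under a squared-Euclidean cost.
cost = ((X_src[:, None, :] - X_tgt[None, :, :]) ** 2).sum(-1)
row, col = linear_sum_assignment(cost)

# Transport source labels to their matched target points and score the model
# there: a proxy for target accuracy that needs no target labels.
y_tgt_proxy = np.empty(len(X_tgt), dtype=int)
y_tgt_proxy[col] = y_src[row]
est_acc = float((model_predict(X_tgt) == y_tgt_proxy).mean())
print(f"estimated target accuracy under the shift: {est_acc:.2f}")
```

Under a mild shift the transported labels track the true ones, so the proxy accuracy approximates the unobservable target accuracy; IUPM additionally models the shift incrementally and quantifies the estimate's uncertainty, both of which this sketch omits.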
Related papers
- Leveraging Data Augmentation and Siamese Learning for Predictive Process Monitoring [45.023332704223755]
SiamSA-PPM is a self-supervised learning framework that combines Siamese learning with Statistical Augmentation for Predictive Process Monitoring. We show that SiamSA-PPM achieves competitive or superior performance compared to the state of the art in both next-activity and final-outcome prediction tasks.
arXiv Detail & Related papers (2025-07-24T10:57:20Z)
- Active Test-time Vision-Language Navigation [60.69722522420299]
ATENA is a test-time active learning framework that enables practical human-robot interaction via episodic feedback on uncertain navigation outcomes. In particular, ATENA learns to increase certainty in successful episodes and decrease it in failed ones, improving uncertainty calibration. In addition, we propose a self-active learning strategy that enables an agent to evaluate its navigation outcomes based on confident predictions.
arXiv Detail & Related papers (2025-06-07T02:24:44Z)
- Reliably detecting model failures in deployment without labels [10.006585036887929]
This paper formalizes and addresses the problem of post-deployment deterioration (PDD) monitoring. We propose D3M, a practical and efficient monitoring algorithm based on the disagreement of predictive models. Empirical results on both standard benchmarks and a real-world large-scale internal medicine dataset demonstrate the effectiveness of the framework.
arXiv Detail & Related papers (2025-06-05T13:56:18Z)
- SAUP: Situation Awareness Uncertainty Propagation on LLM Agent [52.444674213316574]
Large language models (LLMs) integrated into multistep agent systems enable complex decision-making processes across various applications. Existing uncertainty estimation methods primarily focus on final-step outputs, which fail to account for cumulative uncertainty over the multistep decision-making process and the dynamic interactions between agents and their environments. We propose SAUP, a novel framework that propagates uncertainty through each step of an LLM-based agent's reasoning process.
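As a rough illustration of propagating uncertainty across steps rather than reading only the final step, here is a hypothetical aggregation sketch. The situation-weighted average below is an assumption invented for this example, not SAUP's actual situational-awareness estimator.

```python
# Hypothetical sketch (not the SAUP implementation): combine per-step
# uncertainties of a multistep agent into one trajectory-level score.
def propagate_uncertainty(step_uncertainties, situation_weights):
    """Situation-weighted average of per-step uncertainties.

    step_uncertainties: uncertainty of the agent's output at each step.
    situation_weights: assumed per-step situational relevance in [0, 1];
    this weighting scheme is illustrative, not the paper's.
    """
    weighted = [u * w for u, w in zip(step_uncertainties, situation_weights)]
    total_w = sum(situation_weights)
    return sum(weighted) / total_w if total_w else 0.0

# A final-step-only estimate would report 0.1 here, hiding the highly
# uncertain middle step; the propagated score reflects the whole trajectory.
traj = propagate_uncertainty([0.2, 0.9, 0.1], [1.0, 1.0, 1.0])
```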
arXiv Detail & Related papers (2024-12-02T01:31:13Z)
- Explanatory Model Monitoring to Understand the Effects of Feature Shifts on Performance [61.06245197347139]
We propose a novel approach to explain the behavior of a black-box model under feature shifts.
We refer to our method that combines concepts from Optimal Transport and Shapley Values as Explanatory Performance Estimation.
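The attribution half of such a method can be pictured with a small hypothetical sketch: exact Shapley values over a toy set of feature shifts, where the value function reports the performance drop caused by shifting a subset of features. The feature names and drop numbers are invented; this is not the paper's Explanatory Performance Estimation procedure.

```python
# Illustrative sketch only: attribute a performance drop to individual
# feature shifts with exact Shapley values over a tiny feature set.
from itertools import combinations
from math import factorial

def shapley(features, value):
    """Exact Shapley values of `value` over coalitions of `features`."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        contrib = 0.0
        for r in range(len(others) + 1):
            for S in combinations(others, r):
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                contrib += w * (value(set(S) | {f}) - value(set(S)))
        phi[f] = contrib
    return phi

def perf_drop(shifted):
    # Toy "value": accuracy drop caused by shifting this subset of features.
    # The numbers are invented: shifting "a" costs 0.1, shifting "b" costs 0.3.
    return 0.1 * ("a" in shifted) + 0.3 * ("b" in shifted)

# phi attributes (up to floating point) 0.1 to "a" and 0.3 to "b".
phi = shapley(["a", "b"], perf_drop)
print(phi)
```

Because the toy drops are additive, the attributions recover them exactly; with interacting feature shifts the Shapley weighting distributes the interaction terms fairly across features.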
arXiv Detail & Related papers (2024-08-24T18:28:19Z)
- Weak Supervision Performance Evaluation via Partial Identification [46.73061437177238]
Programmatic Weak Supervision (PWS) enables supervised model training without direct access to ground truth labels.
We present a novel method to address this challenge by framing model evaluation as a partial identification problem.
Our approach derives reliable bounds on key metrics without requiring labeled data, overcoming core limitations in current weak supervision evaluation techniques.
arXiv Detail & Related papers (2023-12-07T07:15:11Z)
- Augmenting Unsupervised Reinforcement Learning with Self-Reference [63.68018737038331]
Humans possess the ability to draw on past experiences explicitly when learning new tasks.
We propose the Self-Reference (SR) approach, an add-on module explicitly designed to leverage historical information.
Our approach achieves state-of-the-art results in terms of Interquartile Mean (IQM) performance and Optimality Gap reduction on the Unsupervised Reinforcement Learning Benchmark.
arXiv Detail & Related papers (2023-11-16T09:07:34Z)
- Characterizing Out-of-Distribution Error via Optimal Transport [15.284665509194134]
Methods for predicting a model's performance on OOD data without labels are important for machine learning safety.
We introduce a novel method for estimating model performance by leveraging optimal transport theory.
We show that our approaches significantly outperform existing state-of-the-art methods, with up to 3x lower prediction error.
arXiv Detail & Related papers (2023-05-25T01:37:13Z)
- Architectural patterns for handling runtime uncertainty of data-driven models in safety-critical perception [1.7616042687330642]
We present additional architectural patterns for handling uncertainty estimation.
We evaluate the four patterns qualitatively and quantitatively with respect to safety and performance gains.
We conclude that considering context information about the driving situation makes it possible to accept more or less uncertainty depending on the situation's inherent risk.
arXiv Detail & Related papers (2022-06-14T13:31:36Z)
- Improving Regression Uncertainty Estimation Under Statistical Change [7.734726150561088]
We propose and implement a loss function for regression uncertainty estimation based on the Bayesian Validation Metric framework.
A series of experiments on in-distribution data show that the proposed method is competitive with existing state-of-the-art methods.
arXiv Detail & Related papers (2021-09-16T20:32:58Z)
- On the Practicality of Deterministic Epistemic Uncertainty [106.06571981780591]
Deterministic uncertainty methods (DUMs) achieve strong performance on detecting out-of-distribution data.
It remains unclear whether DUMs are well calibrated and can seamlessly scale to real-world applications.
arXiv Detail & Related papers (2021-07-01T17:59:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.