Related papers: Detecting Performance Degradation under Data Shift in Pathology Vision-Language Model

URL: http://arxiv.org/abs/2601.00716v1
Date: Fri, 02 Jan 2026 15:12:06 GMT
Title: Detecting Performance Degradation under Data Shift in Pathology Vision-Language Model
Authors: Hao Guan, Li Zhou
Abstract summary: Vision-Language Models have demonstrated strong potential in medical image analysis and disease diagnosis. Their performance may deteriorate when the input data distribution shifts from that observed during development. In this study, we investigate performance degradation detection under data shift in a state-of-the-art pathology VLM.
Score: 3.7387218556204154
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Vision-Language Models have demonstrated strong potential in medical image analysis and disease diagnosis. However, after deployment, their performance may deteriorate when the input data distribution shifts from that observed during development. Detecting such performance degradation is essential for clinical reliability, yet remains challenging for large pre-trained VLMs operating without labeled data. In this study, we investigate performance degradation detection under data shift in a state-of-the-art pathology VLM. We examine both input-level data shift and output-level prediction behavior to understand their respective roles in monitoring model reliability. To facilitate systematic analysis of input data shift, we develop DomainSAT, a lightweight toolbox with a graphical interface that integrates representative shift detection algorithms and enables intuitive exploration of data shift. Our analysis shows that while input data shift detection is effective at identifying distributional changes and providing early diagnostic signals, it does not always correspond to actual performance degradation. Motivated by this observation, we further study output-based monitoring and introduce a label-free, confidence-based degradation indicator that directly captures changes in model prediction confidence. We find that this indicator exhibits a close relationship with performance degradation and serves as an effective complement to input shift detection. Experiments on a large-scale pathology dataset for tumor classification demonstrate that combining input data shift detection and output confidence-based indicators enables more reliable detection and interpretation of performance degradation in VLMs under data shift. These findings provide a practical and complementary framework for monitoring the reliability of foundation models in digital pathology.
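The abstract contrasts an input-side signal (data shift detection) with an output-side, label-free confidence signal. It does not say which shift detectors DomainSAT integrates or exactly how the confidence indicator is defined, so the sketch below assumes two common stand-ins: an RBF-kernel maximum mean discrepancy (MMD) permutation test on image embeddings for input shift, and the drop in mean maximum softmax probability for the output side. All function names and default parameters (`gamma`, `n_perm`, `seed`) are hypothetical; this is not the authors' implementation.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def rbf_mmd2(xs: List[Vector], ys: List[Vector], gamma: float = 1.0) -> float:
    """Biased estimate of squared MMD between two samples, using an RBF kernel (assumed choice)."""
    def k(a: Vector, b: Vector) -> float:
        return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

    def mean_k(p: List[Vector], q: List[Vector]) -> float:
        return sum(k(a, b) for a in p for b in q) / (len(p) * len(q))

    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)


def mmd_permutation_pvalue(xs: List[Vector], ys: List[Vector], gamma: float = 1.0,
                           n_perm: int = 100, seed: int = 0) -> float:
    """Input-shift test: fraction of random relabelings whose MMD is at least the observed one."""
    rng = random.Random(seed)
    observed = rbf_mmd2(xs, ys, gamma)
    pooled = list(xs) + list(ys)
    n = len(xs)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if rbf_mmd2(pooled[:n], pooled[n:], gamma) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def mean_max_confidence(prob_rows: List[Sequence[float]]) -> float:
    """Average top-class probability over a batch of softmax outputs; needs no labels."""
    return sum(max(row) for row in prob_rows) / len(prob_rows)


def confidence_drop(ref_probs: List[Sequence[float]], new_probs: List[Sequence[float]]) -> float:
    """Output-side indicator: fall in mean confidence from the reference batch to the new batch."""
    return mean_max_confidence(ref_probs) - mean_max_confidence(new_probs)


if __name__ == "__main__":
    rng = random.Random(1)
    ref = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(20)]
    shifted = [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(20)]
    print("shift p-value:", mmd_permutation_pvalue(ref, shifted))
    print("confidence drop:", confidence_drop([[0.9, 0.1], [0.8, 0.2]],
                                              [[0.6, 0.4], [0.5, 0.5]]))
```

In this sketch the two signals are computed independently, which mirrors the abstract's point: a small MMD p-value flags that the inputs moved, while only a positive `confidence_drop` suggests the model's predictions are actually affected.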

Related papers

Correcting False Alarms from Unseen: Adapting Graph Anomaly Detectors at Test Time [60.341117019125214]
We propose a lightweight, plug-and-play Test-time adaptation framework for correcting Unseen Normal pattErns (TUNE) in graph anomaly detection (GAD). To address semantic confusion, a graph aligner is employed to align the shifted data to the original one at the graph attribute level. Extensive experiments on 10 real-world datasets demonstrate that TUNE significantly enhances the generalizability of pre-trained GAD models to both synthetic and real unseen normal patterns.
arXiv Detail & Related papers (2025-11-10T12:10:05Z)
Revisiting Multivariate Time Series Forecasting with Missing Values [65.30332997607141]
Missing values are common in real-world time series. Current approaches have developed an imputation-then-prediction framework that uses imputation modules to fill in missing values, followed by forecasting on the imputed data. This framework overlooks a critical issue: there is no ground truth for the missing values, making the imputation process susceptible to errors that can degrade prediction accuracy. We introduce Consistency-Regularized Information Bottleneck (CRIB), a novel framework built on the Information Bottleneck principle.
arXiv Detail & Related papers (2025-09-27T20:57:48Z)
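The imputation-then-prediction baseline that this abstract critiques can be sketched in two stages. The choice of linear interpolation as the imputation module and a moving-average forecaster (`linear_impute`, `moving_average_forecast`, `window`) is a hypothetical stand-in, and this is not CRIB itself; it only shows why errors in the filled values pass straight into the forecast.

```python
from typing import List, Optional


def linear_impute(series: List[Optional[float]]) -> List[float]:
    """Stage 1: fill None gaps by linear interpolation; edge gaps copy the nearest observed value."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    out: List[float] = []
    for i, v in enumerate(series):
        if v is not None:
            out.append(v)
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out.append(series[right])
        elif right is None:
            out.append(series[left])
        else:
            w = (i - left) / (right - left)
            out.append(series[left] + w * (series[right] - series[left]))
    return out


def moving_average_forecast(series: List[float], window: int = 3) -> float:
    """Stage 2: one-step forecast as the mean of the last `window` (possibly imputed) values."""
    tail = series[-window:]
    return sum(tail) / len(tail)


if __name__ == "__main__":
    filled = linear_impute([1.0, None, 3.0, None, 5.0])
    print(filled, moving_average_forecast(filled))
```

Because the forecaster never sees which values were observed and which were guessed, any imputation error is treated as ground truth downstream, which is the weakness the abstract points out.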
Reliably Detecting Model Failures in Deployment Without Labels [14.069153343960734]
This paper formalizes and addresses the problem of post-deployment deterioration (PDD) monitoring. We propose D3M, a practical and efficient monitoring algorithm based on the disagreement of predictive models. Empirical results on both a standard benchmark and a real-world large-scale internal medicine dataset demonstrate the effectiveness of the framework.
arXiv Detail & Related papers (2025-06-05T13:56:18Z)
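The core idea named in the D3M abstract, monitoring through disagreement between predictive models on unlabeled data, can be illustrated with a plain disagreement rate. This is a generic sketch, not D3M's algorithm: the two-model setup, the helper names, and the fixed `tolerance` alarm rule are all hypothetical.

```python
from typing import Sequence


def disagreement_rate(preds_a: Sequence[int], preds_b: Sequence[int]) -> float:
    """Fraction of unlabeled samples on which two models predict different labels."""
    if len(preds_a) != len(preds_b) or not preds_a:
        raise ValueError("prediction lists must be non-empty and of equal length")
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)


def flag_deterioration(ref_rate: float, new_rate: float, tolerance: float = 0.05) -> bool:
    """Hypothetical alarm: disagreement rose more than `tolerance` above the reference level."""
    return new_rate - ref_rate > tolerance


if __name__ == "__main__":
    ref = disagreement_rate([0, 1, 1, 0], [0, 1, 1, 0])  # models agree on reference data
    new = disagreement_rate([0, 1, 1, 0], [1, 1, 0, 0])  # models diverge after deployment
    print(ref, new, flag_deterioration(ref, new))
```

The intuition is that models which agree on in-distribution data tend to diverge where the data has shifted, so rising disagreement can act as a label-free warning; in practice the reference rate would be measured on held-out development data.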
Representation Learning for Wearable-Based Applications in the Case of Missing Data [20.37256375888501]
multimodal sensor data in real-world environments is still challenging due to low data quality and limited data annotations. We investigate representation learning for imputing missing wearable data and compare it with state-of-the-art statistical approaches. Our study provides insights for the design and development of masking-based self-supervised learning tasks.
arXiv Detail & Related papers (2024-01-08T08:21:37Z)
Causal Disentanglement Hidden Markov Model for Fault Diagnosis [55.90917958154425]
We propose a Causal Disentanglement Hidden Markov model (CDHM) to learn the causality in the bearing fault mechanism. Specifically, we make full use of the time-series data and progressively disentangle the vibration signal into fault-relevant and fault-irrelevant factors. To expand the scope of the application, we adopt unsupervised domain adaptation to transfer the learned disentangled representations to other working environments.
arXiv Detail & Related papers (2023-08-06T05:58:45Z)
Self-Supervised Graph Transformer for Deepfake Detection [1.8133635752982105]
Deepfake detection methods have shown promising results in recognizing forgeries within a given dataset. A deepfake detection system must remain impartial to forgery types, appearance, and quality to guarantee generalizable detection performance. This study introduces a deepfake detection framework, leveraging a self-supervised pre-training model that delivers exceptional generalization ability.
arXiv Detail & Related papers (2023-07-27T17:22:41Z)
DSV: An Alignment Validation Loss for Self-supervised Outlier Model Selection [23.253175824487652]
Self-supervised learning (SSL) has proven effective in solving various problems by generating internal supervisory signals. Unsupervised anomaly detection, which faces the high cost of obtaining true labels, is an area that can greatly benefit from SSL. We propose DSV (Discordance and Separability Validation), an unsupervised validation loss to select high-performing detection models with effective augmentation hyperparameters (HPs).
arXiv Detail & Related papers (2023-07-13T02:45:29Z)
Personalized Anomaly Detection in PPG Data using Representation Learning and Biometric Identification [3.8036939971290007]
Photoplethysmography (PPG) signals, typically acquired from wearable devices, hold significant potential for continuous fitness-health monitoring. This paper introduces a two-stage framework leveraging representation learning and personalization to improve anomaly detection performance in PPG data.
arXiv Detail & Related papers (2023-07-12T18:05:05Z)
Energy-based Out-of-Distribution Detection for Graph Neural Networks [76.0242218180483]
We propose a simple, powerful and efficient OOD detection model for GNN-based learning on graphs, which we call GNNSafe. GNNSafe achieves up to 17.0% AUROC improvement over state-of-the-art methods and could serve as a simple yet strong baseline in this under-developed area.
arXiv Detail & Related papers (2023-02-06T16:38:43Z)
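Energy-based OOD detection, the idea in the GNNSafe title, scores each input by the free energy of its logits. Assuming the standard form E(x) = -T * logsumexp(f(x) / T), lower energy suggests an in-distribution input. GNNSafe also exploits graph structure; the neighbor-averaging `propagate` step below (with `alpha` and `steps`) is a simplified, hypothetical stand-in for that, not the paper's exact update.

```python
import math
from typing import Dict, List, Sequence


def energy(logits: Sequence[float], temperature: float = 1.0) -> float:
    """Free energy -T * logsumexp(logits / T), computed stably; lower means more in-distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    return -temperature * (m + math.log(sum(math.exp(s - m) for s in scaled)))


def propagate(scores: List[float], neighbors: Dict[int, List[int]],
              alpha: float = 0.5, steps: int = 2) -> List[float]:
    """Hypothetical smoothing: mix each node's score with the mean score of its neighbors."""
    current = list(scores)
    for _ in range(steps):
        updated = []
        for i, s in enumerate(current):
            nbrs = neighbors.get(i, [])
            if nbrs:
                avg = sum(current[j] for j in nbrs) / len(nbrs)
                updated.append(alpha * s + (1 - alpha) * avg)
            else:
                updated.append(s)  # isolated nodes keep their own score
        current = updated
    return current


if __name__ == "__main__":
    node_logits = [[10.0, 0.0], [0.0, 0.0], [0.1, 0.2]]
    scores = [energy(z) for z in node_logits]
    print(propagate(scores, {0: [1], 1: [0, 2], 2: [1]}))
```

Smoothing over edges lets a node surrounded by high-energy (OOD-looking) neighbors inherit some of that signal, on the assumption that connected nodes tend to share their in- or out-of-distribution status.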
PULL: Reactive Log Anomaly Detection Based On Iterative PU Learning [58.85063149619348]
We propose PULL, an iterative log analysis method for reactive anomaly detection based on estimated failure time windows. Our evaluation shows that PULL consistently outperforms ten benchmark baselines across three different datasets.
arXiv Detail & Related papers (2023-01-25T16:34:43Z)
Learning Informative Health Indicators Through Unsupervised Contrastive Learning [5.193936395510582]
This study proposes a novel, versatile and unsupervised approach to learn health indicators. The approach is evaluated on two tasks and case studies with different characteristics. Our results show that the proposed methodology effectively learns a health indicator that follows the wear of milling machines.
arXiv Detail & Related papers (2022-08-28T21:04:42Z)
Negative Data Augmentation [127.28042046152954]
We show that negative data augmentation samples provide information on the support of the data distribution. We introduce a new GAN training objective where we use NDA as an additional source of synthetic data for the discriminator. Empirically, models trained with our method achieve improved conditional/unconditional image generation along with improved anomaly detection capabilities.
arXiv Detail & Related papers (2021-02-09T20:28:35Z)

This list is automatically generated from the titles and abstracts of the papers on this site.