A multimodal slice discovery framework for systematic failure detection and explanation in medical image classification
- URL: http://arxiv.org/abs/2602.24183v1
- Date: Fri, 27 Feb 2026 17:06:37 GMT
- Title: A multimodal slice discovery framework for systematic failure detection and explanation in medical image classification
- Authors: Yixuan Liu, Kanwal K. Bhatia, Ahmed E. Fetit
- Abstract summary: Existing auditing approaches rely on unimodal features or metadata-based subgroup analyses. We introduce the first automated auditing framework that extends slice discovery methods to multimodal representations. Comprehensive experiments were conducted under common failure scenarios using the MIMIC-CXR-JPG dataset.
- Score: 2.173091573209431
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite advances in machine learning-based medical image classifiers, the safety and reliability of these systems remain major concerns in practical settings. Existing auditing approaches mainly rely on unimodal features or metadata-based subgroup analyses, which are limited in interpretability and often fail to capture hidden systematic failures. To address these limitations, we introduce the first automated auditing framework that extends slice discovery methods to multimodal representations specifically for medical applications. Comprehensive experiments were conducted under common failure scenarios using the MIMIC-CXR-JPG dataset, demonstrating the framework's strong capability in both failure discovery and explanation generation. Our results also show that multimodal information generally allows more comprehensive and effective auditing of classifiers, while unimodal variants beyond image-only inputs exhibit strong potential in scenarios where resources are constrained.
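The abstract does not say which slice discovery algorithm the framework uses, so the following is only a minimal sketch of the general idea, under stated assumptions: per-sample image and text embeddings are fused by simple concatenation, grouped with plain k-means (swapped in here as a stand-in, not the authors' method), and the resulting clusters are ranked by classifier error rate so that high-error "slices" surface first. All function names and the toy data are hypothetical.

```python
import math

def fuse(image_emb, text_emb):
    """Concatenate per-sample image and text embeddings (assumed fusion scheme)."""
    return [list(i) + list(t) for i, t in zip(image_emb, text_emb)]

def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next center: the point farthest from its nearest existing center.
        centers.append(list(max(
            points, key=lambda p: min(math.dist(p, c) for c in centers))))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center for every point.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        # Update step: move each non-empty cluster's center to its mean.
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def rank_slices(labels, correct):
    """Return (cluster_id, error_rate, size) tuples, highest error rate first."""
    stats = {}
    for l, ok in zip(labels, correct):
        n, err = stats.get(l, (0, 0))
        stats[l] = (n + 1, err + (0 if ok else 1))
    return sorted(((l, err / n, n) for l, (n, err) in stats.items()),
                  key=lambda s: s[1], reverse=True)

# Toy data (made up): two well-separated groups; the classifier is assumed
# to fail mostly on the second group.
image_emb = [[0.0, 0.1], [0.1, 0.0], [0.0, 0.0], [5.0, 5.1], [5.1, 5.0], [5.0, 5.0]]
text_emb = [[1.0], [1.0], [1.1], [9.0], [9.1], [9.0]]
correct = [True, True, True, False, False, True]

labels = kmeans(fuse(image_emb, text_emb), k=2)
ranked = rank_slices(labels, correct)
print(ranked[0])  # the candidate slice with the highest error rate
```

In an actual audit the embeddings would come from pretrained image and text encoders, and the top-ranked slices would then be passed to an explanation step; neither is modelled here.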
Related papers
- ProbeLLM: Automating Principled Diagnosis of LLM Failures [89.44131968886184]
We propose ProbeLLM, a benchmark-agnostic automated probing framework that elevates weakness discovery from individual failures to structured failure modes. By restricting probing to verifiable test cases and leveraging tool-augmented generation and verification, ProbeLLM grounds failure discovery in reliable evidence.
arXiv Detail & Related papers (2026-02-13T14:33:13Z)
- Explainable and Fine-Grained Safeguarding of LLM Multi-Agent Systems via Bi-Level Graph Anomaly Detection [76.91230292971115]
Large language model (LLM)-based multi-agent systems (MAS) have shown strong capabilities in solving complex tasks. XG-Guard is an explainable and fine-grained safeguarding framework for detecting malicious agents in MAS.
arXiv Detail & Related papers (2025-12-21T13:46:36Z)
- When One Modality Sabotages the Others: A Diagnostic Lens on Multimodal Reasoning [22.39245479538899]
We introduce modality sabotage, a diagnostic failure mode in which a high-confidence unimodal error overrides other evidence and misleads the fused result. A model-agnostic evaluation layer treats each modality as an agent, producing candidate labels and a brief self-assessment used for auditing. A simple fusion mechanism aggregates these outputs, exposing contributors (modalities supporting correct outcomes) and saboteurs (modalities that mislead).
arXiv Detail & Related papers (2025-11-04T18:20:13Z)
- Graph Neural Network-Based Semi-Supervised Open-Set Fault Diagnosis for Marine Machinery Systems [0.42970700836450487]
This paper proposes a semi-supervised open-set fault diagnosis (SOFD) framework that enhances and extends the applicability of deep learning models in open-set fault diagnosis scenarios. The framework includes a reliability subset construction process, which uses a multi-layer fusion feature representation extracted by a supervised feature learning model to select an unlabeled test subset. The labeled training set and pseudo-labeled test subset are then fed into a semi-supervised diagnosis model to learn discriminative features for each class, enabling accurate classification of known faults and effective detection of unknown samples.
arXiv Detail & Related papers (2025-11-03T06:06:25Z)
- Towards Unified Multimodal Misinformation Detection in Social Media: A Benchmark Dataset and Baseline [56.790045049514326]
Two major forms of deception dominate: human-crafted misinformation and AI-generated content. We propose Unified Multimodal Fake Content Detection (UMFDet), a framework designed to handle both forms of deception. UMFDet achieves robust and consistent performance across both misinformation types, outperforming specialized baselines.
arXiv Detail & Related papers (2025-09-30T09:26:32Z)
- BenchReAD: A systematic benchmark for retinal anomaly detection [23.15668882564837]
We introduce a benchmark for retinal anomaly detection, which is comprehensive and systematic in terms of data and algorithm. We find that a fully supervised approach leveraging disentangled representations of abnormalities (DRA) achieves the best performance but suffers from significant drops in performance when encountering certain unseen anomalies. Inspired by the memory bank mechanisms in one-class supervised learning, we propose NFM-DRA, which integrates DRA with a Normal Feature Memory to mitigate the performance degradation.
arXiv Detail & Related papers (2025-07-14T17:13:08Z)
- Robust Multimodal Learning for Ophthalmic Disease Grading via Disentangled Representation [30.697291934309206]
Multimodal data is rare in real-world applications due to a lack of medical equipment and concerns about data privacy. Traditional deep learning methods typically address these issues by learning representations in latent space. The authors propose the Essence-Point and Disentangle Representation Learning (EDRL) strategy, which integrates a self-distillation mechanism into an end-to-end framework.
arXiv Detail & Related papers (2025-03-07T10:58:38Z)
- Regularized Contrastive Partial Multi-view Outlier Detection [76.77036536484114]
We propose a novel method named Regularized Contrastive Partial Multi-view Outlier Detection (RCPMOD).
In this framework, we utilize contrastive learning to learn view-consistent information and distinguish outliers by the degree of consistency.
Experimental results on four benchmark datasets demonstrate that our proposed approach could outperform state-of-the-art competitors.
arXiv Detail & Related papers (2024-08-02T14:34:27Z)
- TVDiag: A Task-oriented and View-invariant Failure Diagnosis Framework with Multimodal Data [11.373761837547852]
Microservice-based systems often suffer from reliability issues due to their intricate interactions and expanding scale. Traditional failure diagnosis methods that use single-modal data can hardly cover all failure scenarios due to the restricted information. We propose TVDiag, a multimodal failure diagnosis framework for locating culprit microservice instances and identifying their failure types.
arXiv Detail & Related papers (2024-07-29T05:26:57Z)
- A Comprehensive Library for Benchmarking Multi-class Visual Anomaly Detection [89.92916473403108]
This paper proposes a comprehensive visual anomaly detection benchmark, ADer, which is a modular framework for new methods. The benchmark includes multiple datasets from industrial and medical domains, implementing fifteen state-of-the-art methods and nine comprehensive metrics. We objectively reveal the strengths and weaknesses of different methods and provide insights into the challenges and future directions of multi-class visual anomaly detection.
arXiv Detail & Related papers (2024-06-05T13:40:07Z)
- Understanding metric-related pitfalls in image analysis validation [59.15220116166561]
This work provides the first comprehensive common point of access to information on pitfalls related to validation metrics in image analysis.
Focusing on biomedical image analysis but with the potential of transfer to other fields, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy.
arXiv Detail & Related papers (2023-02-03T14:57:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.