Boundary-Aware Adversarial Filtering for Reliable Diagnosis under Extreme Class Imbalance
- URL: http://arxiv.org/abs/2511.17629v1
- Date: Wed, 19 Nov 2025 02:15:58 GMT
- Title: Boundary-Aware Adversarial Filtering for Reliable Diagnosis under Extreme Class Imbalance
- Authors: Yanxuan Yu, Michael S. Hughes, Julien Lee, Jiacheng Zhou, Andrew F. Laine,
- Abstract summary: We propose AF-SMOTE, a mathematically motivated augmentation framework that first synthesizes minority points and then filters them by an adversarial discriminator and a boundary utility model.<n>We prove that, under mild assumptions on the decision boundary smoothness and class-conditional densities, our filtering step monotonically improves a surrogate of F_beta.<n>On MIMIC-IV proxy label prediction and canonical fraud detection benchmarks, AF-SMOTE attains higher recall and average precision than strong oversampling baselines.
- Score: 1.2948544197525087
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study classification under extreme class imbalance where recall and calibration are both critical, for example in medical diagnosis scenarios. We propose AF-SMOTE, a mathematically motivated augmentation framework that first synthesizes minority points and then filters them by an adversarial discriminator and a boundary utility model. We prove that, under mild assumptions on the decision boundary smoothness and class-conditional densities, our filtering step monotonically improves a surrogate of F_beta (for beta >= 1) while not inflating Brier score. On MIMIC-IV proxy label prediction and canonical fraud detection benchmarks, AF-SMOTE attains higher recall and average precision than strong oversampling baselines (SMOTE, ADASYN, Borderline-SMOTE, SVM-SMOTE), and yields the best calibration. We further validate these gains across multiple additional datasets beyond MIMIC-IV. Our successful application of AF-SMOTE to a healthcare dataset using a proxy label demonstrates in a disease-agnostic way its practical value in clinical situations, where missing true positive cases in rare diseases can have severe consequences.
Related papers
- Guideline-Grounded Evidence Accumulation for High-Stakes Agent Verification [60.18369393468405]
Existing verifiers usually underperform owing to a lack of domain knowledge and limited calibration.<n>GLEAN compiles expert-curated protocols into trajectory-informed, well-calibrated correctness signals.<n>We empirically validate GLEAN with agentic clinical diagnosis across three diseases from the MIMIC-IV dataset.
arXiv Detail & Related papers (2026-03-03T09:36:43Z) - Suppressing Prior-Comparison Hallucinations in Radiology Report Generation via Semantically Decoupled Latent Steering [94.37535002230504]
We develop a training-free, inference-time control framework termed Semantically Decoupled Latent Steering.<n>Our approach constructs a semantic-free intervention vector via large language model (LLM)-driven semantic decomposition.<n>We show that our approach significantly reduces the probability of historical hallucinations.
arXiv Detail & Related papers (2026-02-27T04:49:01Z) - Similarity-as-Evidence: Calibrating Overconfident VLMs for Interpretable and Label-Efficient Medical Active Learning [10.264467364282865]
Similarity-as-Evidence (SaE) calibrates text-image similarities by introducing a Similarity Evidence Head (SEH)<n>SaE attains state-of-the-art macro-averaged accuracy of 82.57% on medical imaging datasets with a 20% label budget.
arXiv Detail & Related papers (2026-02-21T15:21:54Z) - X-Mark: Saliency-Guided Robust Dataset Ownership Verification for Medical Imaging [67.85884025186755]
High-quality medical imaging datasets are essential for training deep learning models, but their unauthorized use raises serious copyright and ethical concerns.<n>Medical imaging presents a unique challenge for existing dataset ownership verification methods designed for natural images.<n>We propose X-Mark, a sample-specific clean-label watermarking method for chest x-ray copyright protection.
arXiv Detail & Related papers (2026-02-10T00:03:43Z) - A Quad-Step Approach to Uncertainty-Aware Deep Learning for Skin Cancer Classification [13.993637404760355]
Deep learning models have shown promise in automating skin cancer classification.<n>However, challenges remain due to data scarcity and limited uncertainty awareness.<n>This study presents a comprehensive evaluation of DL-based skin lesion classification on the HAM10000 dataset.
arXiv Detail & Related papers (2025-06-12T02:29:16Z) - AI Alignment in Medical Imaging: Unveiling Hidden Biases Through Counterfactual Analysis [16.21270312974956]
We introduce a novel statistical framework to evaluate the dependency of medical imaging ML models on sensitive attributes, such as demographics.<n>We present a practical algorithm that combines conditional latent diffusion models with statistical hypothesis testing to identify and quantify such biases.
arXiv Detail & Related papers (2025-04-28T09:28:25Z) - A Novel Double Pruning method for Imbalanced Data using Information Entropy and Roulette Wheel Selection for Breast Cancer Diagnosis [2.8661021832561757]
The SMOTEBoost method generates synthetic data to balance the dataset, but it may overlook crucial overlapping regions near the decision boundary.<n>This paper proposes RE-SMOTEBoost, an enhanced version of SMOTEBoost, designed to overcome these limitations.<n>It incorporates a filtering mechanism based on information entropy to reduce noise, and borderline cases and improve the quality of generated data.
arXiv Detail & Related papers (2025-03-15T19:34:15Z) - EVolutionary Independent DEtermiNistiC Explanation [5.127310126394387]
This paper introduces the Evolutionary Independent Deterministic Explanation (EVIDENCE) theory.<n>EVIDENCE offers a deterministic, model-independent method for extracting significant signals from black-box models.<n> Practical applications of EVIDENCE include improving diagnostic accuracy in healthcare and enhancing audio signal analysis.
arXiv Detail & Related papers (2025-01-20T12:05:14Z) - Improving Multiple Sclerosis Lesion Segmentation Across Clinical Sites:
A Federated Learning Approach with Noise-Resilient Training [75.40980802817349]
Deep learning models have shown promise for automatically segmenting MS lesions, but the scarcity of accurately annotated data hinders progress in this area.
We introduce a Decoupled Hard Label Correction (DHLC) strategy that considers the imbalanced distribution and fuzzy boundaries of MS lesions.
We also introduce a Centrally Enhanced Label Correction (CELC) strategy, which leverages the aggregated central model as a correction teacher for all sites.
arXiv Detail & Related papers (2023-08-31T00:36:10Z) - Towards Reliable Medical Image Segmentation by Modeling Evidential Calibrated Uncertainty [57.023423137202485]
Concerns regarding the reliability of medical image segmentation persist among clinicians.<n>We introduce DEviS, an easily implementable foundational model that seamlessly integrates into various medical image segmentation networks.<n>By leveraging subjective logic theory, we explicitly model probability and uncertainty for medical image segmentation.
arXiv Detail & Related papers (2023-01-01T05:02:46Z) - BSM loss: A superior way in modeling aleatory uncertainty of
fine_grained classification [0.0]
We propose a modified Bootstrapping loss(BS loss) function with Mixup data augmentation strategy.
Our experiments indicated that BS loss with Mixup(BSM) model can halve the Expected Error(ECE) compared to standard data augmentation.
BSM model is able to perceive the semantic distance of out-of-domain data, demonstrating high potential in real-world clinical practice.
arXiv Detail & Related papers (2022-06-09T13:06:51Z) - Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for
Thoracic Disease Identification [83.6017225363714]
deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance.
For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming.
In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
arXiv Detail & Related papers (2021-02-26T02:29:30Z) - Federated Deep AUC Maximization for Heterogeneous Data with a Constant
Communication Complexity [77.78624443410216]
We propose improved FDAM algorithms for detecting heterogeneous chest data.
A result of this paper is that the communication of the proposed algorithm is strongly independent of the number of machines and also independent of the accuracy level.
Experiments have demonstrated the effectiveness of our FDAM algorithm on benchmark datasets and on medical chest Xray images from different organizations.
arXiv Detail & Related papers (2021-02-09T04:05:19Z) - Semi-supervised Medical Image Classification with Relation-driven
Self-ensembling Model [71.80319052891817]
We present a relation-driven semi-supervised framework for medical image classification.
It exploits the unlabeled data by encouraging the prediction consistency of given input under perturbations.
Our method outperforms many state-of-the-art semi-supervised learning methods on both single-label and multi-label image classification scenarios.
arXiv Detail & Related papers (2020-05-15T06:57:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.