Curating Naturally Adversarial Datasets for Learning-Enabled Medical
Cyber-Physical Systems
- URL: http://arxiv.org/abs/2309.00543v2
- Date: Tue, 7 Nov 2023 14:18:34 GMT
- Title: Curating Naturally Adversarial Datasets for Learning-Enabled Medical
Cyber-Physical Systems
- Authors: Sydney Pugh, Ivan Ruchkin, Insup Lee, James Weimer
- Abstract summary: Existing research focuses on robustness to synthetic adversarial examples, crafted by adding imperceptible perturbations to clean input data.
We propose a method to curate datasets comprised of natural adversarial examples to evaluate model robustness.
- Score: 5.349773727704873
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning models have shown promising predictive accuracy for time-series
healthcare applications. However, ensuring the robustness of these models is
vital for building trustworthy AI systems. Existing research predominantly
focuses on robustness to synthetic adversarial examples, crafted by adding
imperceptible perturbations to clean input data. However, these synthetic
adversarial examples do not accurately reflect the most challenging real-world
scenarios, especially in the context of healthcare data. Consequently,
robustness to synthetic adversarial examples may not necessarily translate to
robustness against naturally occurring adversarial examples, which is highly
desirable for trustworthy AI. We propose a method to curate datasets comprised
of natural adversarial examples to evaluate model robustness. The method relies
on probabilistic labels obtained from automated weakly-supervised labeling that
combines noisy and cheap-to-obtain labeling heuristics. Based on these labels,
our method adversarially orders the input data and uses this ordering to
construct a sequence of increasingly adversarial datasets. Our evaluation on
six medical case studies and three non-medical case studies demonstrates the
efficacy and statistical validity of our approach to generating naturally
adversarial datasets.
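
The pipeline outlined in the abstract (probabilistic labels from noisy heuristics, an adversarial ordering, then a sequence of increasingly adversarial datasets) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes binary labels, uses a plain vote average in place of a learned weak-supervision label model, and treats low label confidence as "more adversarial". The function names and the ABSTAIN convention are hypothetical.

```python
from typing import List, Sequence

ABSTAIN = -1  # hypothetical convention: a heuristic may decline to vote


def probabilistic_label(votes: Sequence[int]) -> float:
    """Estimate P(y = 1) as the fraction of non-abstaining votes equal to 1.

    Assumption: a simple vote average stands in for a learned label model.
    """
    cast = [v for v in votes if v != ABSTAIN]
    if not cast:
        return 0.5  # no heuristic voted, so there is no evidence either way
    return sum(cast) / len(cast)


def adversarial_order(probs: Sequence[float]) -> List[int]:
    """Return sample indices sorted from most to least confident label.

    Assumption: confidence is max(p, 1 - p); ties keep their input order.
    """
    confidence = [max(p, 1.0 - p) for p in probs]
    return sorted(range(len(probs)), key=lambda i: -confidence[i])


def increasingly_adversarial_datasets(
    order: Sequence[int], k: int
) -> List[List[int]]:
    """Build k nested index sets; each one drops more of the confident samples."""
    n = len(order)
    return [list(order[(j * n) // k:]) for j in range(k)]


# Toy example: four samples, three labeling heuristics each.
votes = [
    [1, 1, 1],
    [1, 0, ABSTAIN],
    [0, 0, 0],
    [1, 0, 1],
]
probs = [probabilistic_label(v) for v in votes]
order = adversarial_order(probs)
datasets = increasingly_adversarial_datasets(order, k=2)
print(order)     # sample indices, most confident first
print(datasets)  # later entries keep only the less confident samples
```

In this sketch, a model would be evaluated on each index set in turn; a robust model should degrade gracefully as the confident samples are removed.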
Related papers
- Synthetic Simplicity: Unveiling Bias in Medical Data Augmentation [0.7499722271664144]
Synthetic data is becoming increasingly integral in data-scarce fields such as medical imaging.
Downstream neural networks often exploit spurious distinctions between real and synthetic data when there is a strong correlation between the data source and the task label.
This exploitation manifests as simplicity bias, where models overly rely on superficial features rather than genuine task-related complexities.
arXiv Detail & Related papers (2024-07-31T15:14:17Z)
- Reliability in Semantic Segmentation: Can We Use Synthetic Data? [69.28268603137546]
We show for the first time how synthetic data can be specifically generated to assess comprehensively the real-world reliability of semantic segmentation models.
This synthetic data is employed to evaluate the robustness of pretrained segmenters.
We demonstrate how our approach can be utilized to enhance the calibration and OOD detection capabilities of segmenters.
arXiv Detail & Related papers (2023-12-14T18:56:07Z)
- The Real Deal Behind the Artificial Appeal: Inferential Utility of Tabular Synthetic Data [40.165159490379146]
We show that the rate of false-positive findings (type 1 error) will be unacceptably high, even when the estimates are unbiased.
Despite the use of a previously proposed correction factor, this problem persists for deep generative models.
arXiv Detail & Related papers (2023-12-13T02:04:41Z)
- A Discrepancy Aware Framework for Robust Anomaly Detection [51.710249807397695]
We present a Discrepancy Aware Framework (DAF), which demonstrates robust performance consistently with simple and cheap strategies.
Our method leverages an appearance-agnostic cue to guide the decoder in identifying defects, thereby alleviating its reliance on synthetic appearance.
Under simple synthesis strategies, it outperforms existing methods by a large margin. It also achieves state-of-the-art localization performance.
arXiv Detail & Related papers (2023-10-11T15:21:40Z)
- On the Robustness of Aspect-based Sentiment Analysis: Rethinking Model, Data, and Training [109.9218185711916]
Aspect-based sentiment analysis (ABSA) aims at automatically inferring the specific sentiment polarities toward certain aspects of products or services behind social media texts or reviews.
We propose to enhance the ABSA robustness by systematically rethinking the bottlenecks from all possible angles, including model, data, and training.
arXiv Detail & Related papers (2023-04-19T11:07:43Z)
- Synthetic Data in Healthcare [10.555189948915492]
We present the cases for physical and statistical simulations for creating data and the proposed applications in healthcare and medicine.
We discuss that while synthetics can promote privacy, equity, safety and continual and causal learning, they also run the risk of introducing flaws, blind spots and propagating or exaggerating biases.
arXiv Detail & Related papers (2023-04-06T17:23:39Z)
- Data AUDIT: Identifying Attribute Utility- and Detectability-Induced Bias in Task Models [8.420252576694583]
We present a first technique for the rigorous, quantitative screening of medical image datasets.
Our method decomposes the risks associated with dataset attributes in terms of their detectability and utility.
We show that our screening method reliably identifies nearly imperceptible bias-inducing artifacts.
arXiv Detail & Related papers (2023-04-06T16:50:15Z)
- Investigating Bias with a Synthetic Data Generator: Empirical Evidence and Philosophical Interpretation [66.64736150040093]
Machine learning applications are becoming increasingly pervasive in our society.
The risk is that they will systematically spread the bias embedded in data.
We propose to analyze biases by introducing a framework for generating synthetic data with specific types of bias and their combinations.
arXiv Detail & Related papers (2022-09-13T11:18:50Z)
- BeCAPTCHA-Type: Biometric Keystroke Data Generation for Improved Bot Detection [63.447493500066045]
This work proposes a data driven learning model for the synthesis of keystroke biometric data.
The proposed method is compared with two statistical approaches based on Universal and User-dependent models.
Our experimental framework considers a dataset with 136 million keystroke events from 168 thousand subjects.
arXiv Detail & Related papers (2022-07-27T09:26:15Z)
- Adversarial Self-Supervised Contrastive Learning [62.17538130778111]
Existing adversarial learning approaches mostly use class labels to generate adversarial samples that lead to incorrect predictions.
We propose a novel adversarial attack for unlabeled data, which makes the model confuse the instance-level identities of the perturbed data samples.
We present a self-supervised contrastive learning framework to adversarially train a robust neural network without labeled data.
arXiv Detail & Related papers (2020-06-13T08:24:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.