Measuring Adversarial Datasets
- URL: http://arxiv.org/abs/2311.03566v1
- Date: Mon, 6 Nov 2023 22:08:16 GMT
- Title: Measuring Adversarial Datasets
- Authors: Yuanchen Bai, Raoyi Huang, Vijay Viswanathan, Tzu-Sheng Kuo,
Tongshuang Wu
- Abstract summary: Researchers have curated various adversarial datasets for capturing model deficiencies that cannot be revealed in standard benchmark datasets.
There is still no methodology to measure the intended and unintended consequences of those adversarial transformations.
We conducted a systematic survey of existing quantifiable metrics that describe text instances in NLP tasks.
- Score: 28.221635644616523
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the era of widespread public use of AI systems across various domains,
ensuring adversarial robustness has become increasingly vital to maintain
safety and prevent undesirable errors. Researchers have curated various
adversarial datasets (through perturbations) for capturing model deficiencies
that cannot be revealed in standard benchmark datasets. However, little is
known about how these adversarial examples differ from the original data
points, and there is still no methodology to measure the intended and
unintended consequences of those adversarial transformations. In this research,
we conducted a systematic survey of existing quantifiable metrics that describe
text instances in NLP tasks, among dimensions of difficulty, diversity, and
disagreement. We then selected several current adversarial datasets and
compared the distributions between the original and their adversarial
counterparts. The results provide valuable insights into what makes these
datasets more challenging from a metrics perspective and whether they align
with underlying assumptions.
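The comparison described above (computing a quantifiable text metric over original and adversarial instances, then comparing the resulting distributions) can be illustrated with a minimal sketch. The metric here, type-token ratio as a crude diversity proxy, and the toy data are assumptions for demonstration only, not the paper's actual metric suite:

```python
# Illustrative sketch: compare a simple text metric between original
# and adversarially perturbed instances. Type-token ratio is used as a
# crude diversity proxy; the example sentences are invented.
from statistics import mean


def type_token_ratio(text: str) -> float:
    """Unique tokens divided by total tokens (whitespace tokenization)."""
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def compare_distributions(originals, adversarials):
    """Return the mean metric for each set and the shift between them."""
    orig_mean = mean(type_token_ratio(t) for t in originals)
    adv_mean = mean(type_token_ratio(t) for t in adversarials)
    return orig_mean, adv_mean, adv_mean - orig_mean


originals = ["the cat sat on the mat", "a quick brown fox"]
adversarials = ["the feline sat upon the mat", "a swift brown fox"]

o, a, delta = compare_distributions(originals, adversarials)
print(f"original mean TTR={o:.3f}, adversarial mean TTR={a:.3f}, shift={delta:+.3f}")
```

In practice one would replace the single proxy with a battery of difficulty, diversity, and disagreement metrics and compare full distributions (not just means) between the paired datasets.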
Related papers
- Evidential Deep Partial Multi-View Classification With Discount Fusion [24.139495744683128]
We propose a novel framework called Evidential Deep Partial Multi-View Classification (EDP-MVC)
We use K-means imputation to address missing views, creating a complete set of multi-view data.
The potential conflicts and uncertainties within this imputed data can affect the reliability of downstream inferences.
arXiv Detail & Related papers (2024-08-23T14:50:49Z)
- Regularized Contrastive Partial Multi-view Outlier Detection [76.77036536484114]
We propose a novel method named Regularized Contrastive Partial Multi-view Outlier Detection (RCPMOD)
In this framework, we utilize contrastive learning to learn view-consistent information and distinguish outliers by the degree of consistency.
Experimental results on four benchmark datasets demonstrate that our proposed approach could outperform state-of-the-art competitors.
arXiv Detail & Related papers (2024-08-02T14:34:27Z)
- Fin-Fed-OD: Federated Outlier Detection on Financial Tabular Data [11.027356898413139]
Anomaly detection in real-world scenarios poses challenges due to dynamic and often unknown anomaly distributions.
This paper addresses the question of enhancing outlier detection within individual organizations without compromising data confidentiality.
We propose a novel method leveraging representation learning and federated learning techniques to improve the detection of unknown anomalies.
arXiv Detail & Related papers (2024-04-23T11:22:04Z)
- Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
arXiv Detail & Related papers (2023-08-28T18:48:34Z)
- A Systematic Study on Quantifying Bias in GAN-Augmented Data [0.0]
Generative adversarial networks (GANs) have recently become a popular data augmentation technique used by machine learning practitioners.
They have been shown to suffer from the so-called mode collapse failure mode, which makes them vulnerable to exacerbating biases on already skewed datasets.
This study is a systematic effort focused on the evaluation of state-of-the-art metrics that can potentially quantify biases in GAN-augmented data.
arXiv Detail & Related papers (2023-08-23T22:19:48Z)
- Conditional Feature Importance for Mixed Data [1.6114012813668934]
We develop a conditional predictive impact (CPI) framework with knockoff sampling.
We show that our proposed workflow controls type I error, achieves high power and is in line with results given by other conditional FI measures.
Our findings highlight the necessity of developing statistically adequate, specialized methods for mixed data.
arXiv Detail & Related papers (2022-10-06T16:52:38Z)
- Assaying Out-Of-Distribution Generalization in Transfer Learning [103.57862972967273]
We take a unified view of previous work, highlighting message discrepancies that we address empirically.
We fine-tune over 31k networks, from nine different architectures in the many- and few-shot setting.
arXiv Detail & Related papers (2022-07-19T12:52:33Z)
- Towards Unbiased Visual Emotion Recognition via Causal Intervention [63.74095927462]
We propose a novel Interventional Emotion Recognition Network (IERN) to alleviate the negative effects brought by dataset bias.
A series of designed tests validate the effectiveness of IERN, and experiments on three emotion benchmarks demonstrate that IERN outperforms other state-of-the-art approaches.
arXiv Detail & Related papers (2021-07-26T10:40:59Z)
- Not All Datasets Are Born Equal: On Heterogeneous Data and Adversarial Examples [46.625818815798254]
We argue that machine learning models trained on heterogeneous data are as susceptible to adversarial manipulations as those trained on homogeneous data.
We introduce a generic optimization framework for identifying adversarial perturbations in heterogeneous input spaces.
Our results demonstrate that despite the constraints imposed on input validity in heterogeneous datasets, machine learning models trained using such data are still equally susceptible to adversarial examples.
arXiv Detail & Related papers (2020-10-07T05:24:23Z)
- Accounting for Unobserved Confounding in Domain Generalization [107.0464488046289]
This paper investigates the problem of learning robust, generalizable prediction models from a combination of datasets.
Part of the challenge of learning robust models lies in the influence of unobserved confounders.
We demonstrate the empirical performance of our approach on healthcare data from different modalities.
arXiv Detail & Related papers (2020-07-21T08:18:06Z)
- Learning Overlapping Representations for the Estimation of Individualized Treatment Effects [97.42686600929211]
Estimating the likely outcome of alternatives from observational data is a challenging problem.
We show that algorithms that learn domain-invariant representations of inputs are often inappropriate.
We develop a deep kernel regression algorithm and posterior regularization framework that substantially outperforms the state-of-the-art on a variety of benchmarks data sets.
arXiv Detail & Related papers (2020-01-14T12:56:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.