Not All Datasets Are Born Equal: On Heterogeneous Data and Adversarial
Examples
- URL: http://arxiv.org/abs/2010.03180v2
- Date: Thu, 2 Sep 2021 08:02:52 GMT
- Title: Not All Datasets Are Born Equal: On Heterogeneous Data and Adversarial
Examples
- Authors: Yael Mathov, Eden Levy, Ziv Katzir, Asaf Shabtai, Yuval Elovici
- Abstract summary: We argue that machine learning models trained on heterogeneous data are as susceptible to adversarial manipulations as those trained on homogeneous data.
We introduce a generic optimization framework for identifying adversarial perturbations in heterogeneous input spaces.
Our results demonstrate that despite the constraints imposed on input validity in heterogeneous datasets, machine learning models trained using such data are still equally susceptible to adversarial examples.
- Score: 46.625818815798254
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work on adversarial learning has focused mainly on neural networks and
domains where those networks excel, such as computer vision, or audio
processing. The data in these domains is typically homogeneous, whereas
domains with heterogeneous tabular data remain underexplored despite their
prevalence. When searching for adversarial patterns within heterogeneous input
spaces, an attacker must simultaneously preserve the complex domain-specific
validity rules of the data, as well as the adversarial nature of the identified
samples. As such, applying adversarial manipulations to heterogeneous datasets
has proved to be a challenging task, and no generic attack method has been
suggested thus far. We, however, argue that machine learning models trained on
heterogeneous tabular data are as susceptible to adversarial manipulations as
those trained on continuous or homogeneous data such as images. To support our
claim, we introduce a generic optimization framework for identifying
adversarial perturbations in heterogeneous input spaces. We define
distribution-aware constraints for preserving the consistency of the
adversarial examples and incorporate them by embedding the heterogeneous input
into a continuous latent space. Due to the nature of the underlying datasets,
we focus on $\ell_0$ perturbations and demonstrate their applicability in real
life. We evaluate the effectiveness of our approach using three datasets
from different content domains. Our results show that despite the
constraints imposed on input validity in heterogeneous datasets, machine
learning models trained using such data are still equally susceptible to
adversarial examples.
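
The abstract only outlines the framework, so the following is a minimal, hypothetical sketch of the idea: embed a heterogeneous record into a continuous latent space, run a gradient-based search for an adversarial point in that space, decode it back, and keep only the k largest feature changes as a crude $\ell_0$ budget. The module names, dimensions, and hyperparameters (e.g., `Embedder`, `k_features`) are assumptions for illustration, not the authors' implementation, and the distribution-aware validity constraints described in the paper are omitted here.

```python
# Hypothetical sketch of the attack idea described above; names and
# architectures are illustrative, not the authors' implementation.
import torch
import torch.nn as nn

D_IN, D_LATENT, N_CLASSES = 20, 8, 2  # assumed dimensions


class Embedder(nn.Module):
    """Stand-in for a learned embedding of heterogeneous tabular data."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(D_IN, D_LATENT), nn.Tanh())
        self.dec = nn.Linear(D_LATENT, D_IN)

    def encode(self, x):
        return self.enc(x)

    def decode(self, z):
        return self.dec(z)


def l0_adversarial_example(x, y, embedder, classifier,
                           k_features=3, steps=100, lr=0.05):
    """Search for a perturbation in latent space, decode it, and keep only
    the k feature changes with the largest magnitude (a crude l0 projection)."""
    z = embedder.encode(x).detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        x_adv = embedder.decode(z)
        # maximize the classifier's loss on the true label
        loss = -nn.functional.cross_entropy(classifier(x_adv), y)
        loss.backward()
        opt.step()
    x_adv = embedder.decode(z).detach()
    delta = x_adv - x
    # l0 budget: zero out all but the k largest-magnitude feature changes
    keep = delta.abs().topk(k_features, dim=-1).indices
    mask = torch.zeros_like(delta).scatter_(-1, keep, 1.0)
    return x + delta * mask


if __name__ == "__main__":
    torch.manual_seed(0)
    embedder = Embedder()
    classifier = nn.Linear(D_IN, N_CLASSES)
    x, y = torch.randn(1, D_IN), torch.tensor([0])
    x_adv = l0_adversarial_example(x, y, embedder, classifier)
    print("changed features:", int((x_adv != x).sum()))
```

In the actual framework, the decoded record would additionally have to satisfy the domain's validity rules (e.g., categorical features mapped back to legal values); the top-k masking above enforces only sparsity.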
Related papers
- Approaching Metaheuristic Deep Learning Combos for Automated Data Mining [0.5419570023862531]
This work proposes a means of combining meta-heuristic methods with conventional classifiers and neural networks in order to perform automated data mining.
Experiments on the MNIST dataset for handwritten digit recognition were performed.
It was empirically observed that using a ground truth labeled dataset's validation accuracy is inadequate for correcting labels of other previously unseen data instances.
arXiv Detail & Related papers (2024-10-16T10:28:22Z) - Enhancing Anomaly Detection via Generating Diversified and Hard-to-distinguish Synthetic Anomalies [7.021105583098609]
Recent approaches have focused on leveraging domain-specific transformations or perturbations to generate synthetic anomalies from normal samples.
We introduce a novel domain-agnostic method that employs a set of conditional perturbators and a discriminator.
We demonstrate the superiority of our method over state-of-the-art benchmarks.
arXiv Detail & Related papers (2024-09-16T08:15:23Z) - Investigating Imperceptibility of Adversarial Attacks on Tabular Data: An Empirical Analysis [1.6693963355435217]
Adversarial attacks are a potential threat to machine learning models.
These attacks cause incorrect predictions through imperceptible perturbations to the input data.
This study proposes a set of key properties and corresponding metrics to assess the imperceptibility of adversarial attacks on tabular data (a rough illustration of such metrics appears after this list).
arXiv Detail & Related papers (2024-07-16T07:55:25Z) - Combining propensity score methods with variational autoencoders for
generating synthetic data in presence of latent sub-groups [0.0]
Heterogeneity might be known, e.g., as indicated by sub-group labels, or might be unknown and reflected only in properties of distributions, such as bimodality or skewness.
We investigate how such heterogeneity can be preserved and controlled when obtaining synthetic data from variational autoencoders (VAEs), i.e., a generative deep learning technique.
arXiv Detail & Related papers (2023-12-12T22:49:24Z) - Adversarial Examples Might be Avoidable: The Role of Data Concentration in Adversarial Robustness [39.883465335244594]
We show that concentration on small-volume subsets of the input space determines whether a robust classifier exists.
We further demonstrate that, for a data distribution concentrated on a union of low-dimensional linear subspaces, utilizing structure in data naturally leads to classifiers that enjoy data-dependent polyhedral guarantees.
arXiv Detail & Related papers (2023-09-28T01:39:47Z) - Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
arXiv Detail & Related papers (2023-08-28T18:48:34Z) - Learning Linear Causal Representations from Interventions under General
Nonlinear Mixing [52.66151568785088]
We prove strong identifiability results given unknown single-node interventions without access to the intervention targets.
This is the first instance of causal identifiability from non-paired interventions for deep neural network embeddings.
arXiv Detail & Related papers (2023-06-04T02:32:12Z) - Equivariance Allows Handling Multiple Nuisance Variables When Analyzing
Pooled Neuroimaging Datasets [53.34152466646884]
In this paper, we show how combining recent results on equivariant representation learning on structured spaces with classical results on causal inference provides an effective practical solution.
We demonstrate how our model allows dealing with more than one nuisance variable under some assumptions and can enable analysis of pooled scientific datasets in scenarios that would otherwise entail removing a large portion of the samples.
arXiv Detail & Related papers (2022-03-29T04:54:06Z) - Learning while Respecting Privacy and Robustness to Distributional
Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
Proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z) - Heteroskedastic and Imbalanced Deep Learning with Adaptive
Regularization [55.278153228758434]
Real-world datasets are heteroskedastic and imbalanced.
Addressing heteroskedasticity and imbalance simultaneously is under-explored.
We propose a data-dependent regularization technique for heteroskedastic datasets.
arXiv Detail & Related papers (2020-06-29T01:09:50Z)
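
As a rough, assumed illustration of what imperceptibility metrics for tabular perturbations can look like (the imperceptibility paper listed above defines its own properties and metrics, which are not reproduced here), the helper below reports the number of changed features and the per-feature deviation scaled by each feature's standard deviation; the names and metric choices are hypothetical.

```python
# Hypothetical illustration of imperceptibility-style metrics for a tabular
# perturbation; the actual metrics proposed in the paper above may differ.
import numpy as np

def perturbation_metrics(x, x_adv, feature_std):
    """Return simple quantities describing how noticeable a perturbation is."""
    delta = x_adv - x
    changed = delta != 0
    return {
        "sparsity": int(changed.sum()),  # l0: number of features changed
        "max_scaled_deviation": float(np.max(np.abs(delta) / feature_std)),
        "mean_scaled_deviation": (
            float(np.mean(np.abs(delta[changed]) / feature_std[changed]))
            if changed.any() else 0.0
        ),
    }

x = np.array([3.0, 1.0, 0.0, 25.0])
x_adv = np.array([3.0, 1.4, 0.0, 25.0])
std = np.array([1.0, 0.5, 1.0, 10.0])
print(perturbation_metrics(x, x_adv, std))
```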