Dirty and Clean-Label attack detection using GAN discriminators
- URL: http://arxiv.org/abs/2506.01224v2
- Date: Tue, 03 Jun 2025 23:21:56 GMT
- Title: Dirty and Clean-Label attack detection using GAN discriminators
- Authors: John W. Smutny
- Abstract summary: This research uses GAN discriminators to protect a single class against mislabeled and different levels of modified images. The results suggest that after training on a single class, a GAN discriminator's confidence scores can provide a threshold to identify mislabeled images.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Gathering enough images to train a deep computer vision model is a constant challenge. Unfortunately, collecting images from unknown sources can leave your model's behavior at risk of being manipulated by a dirty-label or clean-label attack unless the images are properly inspected. Manually inspecting each image-label pair is impractical, and common poison-detection methods that involve re-training your model can be time-consuming. This research uses GAN discriminators to protect a single class against mislabeled and different levels of modified images. The effect of said perturbation on a basic convolutional neural network classifier is also included for reference. The results suggest that after training on a single class, a GAN discriminator's confidence scores can provide a threshold to identify mislabeled images and identify 100% of the tested poison starting at a perturbation epsilon magnitude of 0.20, after decision threshold calibration using in-class samples. Developers can use this report as a basis to train their own discriminators to protect high-valued classes in their CV models.
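The detection workflow described in the abstract can be sketched as follows. This is a minimal PyTorch illustration, not the author's code: the discriminator architecture, the 0.05 calibration quantile, and the helper names `calibrate_threshold` and `flag_suspects` are assumptions. It presumes the discriminator has already been trained, as part of a GAN, on trusted images of the protected class; it then calibrates a decision threshold on held-out in-class samples and flags low-confidence images as suspected dirty- or clean-label poison.

```python
# Illustrative sketch (assumed names, not the paper's API): score unvetted images
# with a GAN discriminator trained only on trusted samples of the protected class,
# calibrate a decision threshold on held-out in-class images, and flag
# low-confidence images as suspected poison.

import torch
import torch.nn as nn


class SmallDiscriminator(nn.Module):
    """DCGAN-style discriminator producing a real/fake confidence in [0, 1]."""

    def __init__(self, channels: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 64, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(128, 1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(1)


@torch.no_grad()
def calibrate_threshold(disc: nn.Module, clean_images: torch.Tensor,
                        quantile: float = 0.05) -> float:
    """Pick a threshold so roughly 95% of known-clean in-class images pass it."""
    scores = disc(clean_images)
    return torch.quantile(scores, quantile).item()


@torch.no_grad()
def flag_suspects(disc: nn.Module, images: torch.Tensor,
                  threshold: float) -> torch.Tensor:
    """Return a boolean mask of images whose confidence falls below the threshold."""
    return disc(images) < threshold


if __name__ == "__main__":
    disc = SmallDiscriminator()           # in practice: trained as part of a GAN on the protected class
    clean = torch.rand(64, 3, 64, 64)     # held-out trusted samples of that class
    incoming = torch.rand(32, 3, 64, 64)  # unvetted images labeled as that class
    tau = calibrate_threshold(disc, clean)
    suspects = flag_suspects(disc, incoming, tau)
    print(f"threshold={tau:.3f}, flagged {int(suspects.sum())}/{len(incoming)} images")
```

In practice the calibration quantile (and therefore the tolerated false-positive rate on clean in-class images) would be tuned per protected class, mirroring the decision-threshold calibration on in-class samples described in the abstract.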
Related papers
- GRASP-PsONet: Gradient-based Removal of Spurious Patterns for PsOriasis Severity Classification [0.0]
We propose a framework to automatically flag problematic training images that introduce spurious correlations. Removing 8.2% of flagged images improves model AUC-ROC by 5% (85% to 90%) on a held-out test set. When applied to a subset of training data rated by two dermatologists, the method identifies over 90% of cases with inter-rater disagreement.
arXiv Detail & Related papers (2025-06-27T03:42:09Z) - An analysis of data variation and bias in image-based dermatological datasets for machine learning classification [2.039829968340841]
In clinical dermatology, classification models can detect malignant lesions on patients' skin using only RGB images as input. Most learning-based methods employ data acquired from dermoscopic datasets for training, which are large and validated by a gold standard. This work aims to evaluate the gap between dermoscopic and clinical samples and understand how the dataset variations impact training.
arXiv Detail & Related papers (2025-01-15T17:18:46Z) - Neural Fingerprints for Adversarial Attack Detection [2.7309692684728613]
A well-known vulnerability of deep learning models is their susceptibility to adversarial examples.
Many algorithms have been proposed to address this problem, falling generally into one of two categories.
We argue that in a white-box setting, where the attacker knows the configuration and weights of the network and the detector, they can overcome the detector.
This problem is common in security applications where even a very good model is not sufficient to ensure safety.
arXiv Detail & Related papers (2024-11-07T08:43:42Z) - Unlearnable Examples Detection via Iterative Filtering [84.59070204221366]
Deep neural networks are proven to be vulnerable to data poisoning attacks.
Detecting poisoned samples in a mixed dataset is both beneficial and challenging.
We propose an Iterative Filtering approach for identifying unlearnable examples (UEs).
arXiv Detail & Related papers (2024-08-15T13:26:13Z) - Unsupervised Contrastive Analysis for Salient Pattern Detection using Conditional Diffusion Models [13.970483987621135]
Contrastive Analysis (CA) aims to identify patterns in images that allow distinguishing between a background (BG) dataset and a target (TG) dataset (i.e., unhealthy subjects).
Recent works on this topic rely on variational autoencoders (VAE) or contrastive learning strategies to learn the patterns that separate TG samples from BG samples in a supervised manner.
We employ a self-supervised contrastive encoder to learn a latent representation encoding only common patterns from input images, using samples exclusively from the BG dataset during training, and approximating the distribution of the target patterns by leveraging data augmentation techniques.
arXiv Detail & Related papers (2024-06-02T15:19:07Z) - Activation Gradient based Poisoned Sample Detection Against Backdoor Attacks [35.42528584450334]
We develop an innovative poisoned sample detection approach, called Activation Gradient based Poisoned sample Detection (AGPD).
First, we calculate GCDs of all classes from the model trained on the untrustworthy dataset.
Then, we identify the target class(es) based on the difference in GCD dispersion between target and clean classes.
Last, we filter out poisoned samples within the identified target class(es) based on the clear separation between poisoned and clean samples.
arXiv Detail & Related papers (2023-12-11T09:17:33Z) - DIAGNOSIS: Detecting Unauthorized Data Usages in Text-to-image Diffusion Models [79.71665540122498]
We propose a method for detecting unauthorized data usage by planting the injected content into the protected dataset.
Specifically, we modify the protected images by adding unique contents on these images using stealthy image warping functions.
By analyzing whether the model has memorized the injected content, we can detect models that had illegally utilized the unauthorized data.
arXiv Detail & Related papers (2023-07-06T16:27:39Z) - Mole Recruitment: Poisoning of Image Classifiers via Selective Batch Sampling [41.29604559362772]
We present a data poisoning attack that confounds machine learning models without any manipulation of the image or label.
This is achieved by simply leveraging the most confounding natural samples found within the training data itself.
We define moles as the training samples of a class that appear most similar to samples of another class.
arXiv Detail & Related papers (2023-03-30T00:59:37Z) - Training set cleansing of backdoor poisoning by self-supervised representation learning [0.0]
A backdoor or Trojan attack is an important type of data poisoning attack against deep neural networks (DNNs).
We show that supervised training may build stronger association between the backdoor pattern and the associated target class than that between normal features and the true class of origin.
We propose to use unsupervised representation learning to avoid emphasising backdoor-poisoned training samples and learn a similar feature embedding for samples of the same class.
arXiv Detail & Related papers (2022-10-19T03:29:58Z) - Seamless Iterative Semi-Supervised Correction of Imperfect Labels in Microscopy Images [57.42492501915773]
In-vitro tests are an alternative to animal testing for assessing the toxicity of medical devices.
Human fatigue plays a role in error making, which makes the use of deep learning appealing.
We propose Seamless Iterative Semi-Supervised correction of Imperfect labels (SISSI).
Our method successfully provides an adaptive early learning correction technique for object detection.
arXiv Detail & Related papers (2022-08-05T18:52:20Z) - Incorporating Semi-Supervised and Positive-Unlabeled Learning for Boosting Full Reference Image Quality Assessment [73.61888777504377]
Full-reference (FR) image quality assessment (IQA) evaluates the visual quality of a distorted image by measuring its perceptual difference from a pristine-quality reference.
Unlabeled data can be easily collected from an image degradation or restoration process, making it encouraging to exploit unlabeled training data to boost FR-IQA performance.
In this paper, we suggest to incorporate semi-supervised and positive-unlabeled (PU) learning for exploiting unlabeled data while mitigating the adverse effect of outliers.
arXiv Detail & Related papers (2022-04-19T09:10:06Z) - Understanding Classifier Mistakes with Generative Models [88.20470690631372]
Deep neural networks are effective on supervised learning tasks, but have been shown to be brittle.
In this paper, we leverage generative models to identify and characterize instances where classifiers fail to generalize.
Our approach is agnostic to class labels from the training set which makes it applicable to models trained in a semi-supervised way.
arXiv Detail & Related papers (2020-10-05T22:13:21Z) - Salvage Reusable Samples from Noisy Data for Robust Learning [70.48919625304]
We propose a reusable sample selection and correction approach, termed CRSSC, for coping with label noise when training deep FG models with web images.
Our key idea is to additionally identify and correct reusable samples, and then leverage them together with clean examples to update the networks.
arXiv Detail & Related papers (2020-08-06T02:07:21Z)