Related papers: Solving Trojan Detection Competitions with Linear Weight Classification

Solving Trojan Detection Competitions with Linear Weight Classification

URL: http://arxiv.org/abs/2411.03445v1
Date: Tue, 05 Nov 2024 19:00:34 GMT
Title: Solving Trojan Detection Competitions with Linear Weight Classification
Authors: Todd Huster, Peter Lin, Razvan Stefanescu, Emmanuel Ekwedike, Ritu Chadha,
Abstract summary: We introduce a detector that works remarkably well across many of the existing datasets and domains. We evaluate this algorithm on a diverse set of Trojan detection benchmarks and domains.
Score: 1.24275433420322
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Neural networks can conceal malicious Trojan backdoors that allow a trigger to covertly change the model behavior. Detecting signs of these backdoors, particularly without access to any triggered data, is the subject of ongoing research and open challenges. In one common formulation of the problem, we are given a set of clean and poisoned models and need to predict whether a given test model is clean or poisoned. In this paper, we introduce a detector that works remarkably well across many of the existing datasets and domains. It is obtained by training a binary classifier on a large number of models' weights after performing a few different pre-processing steps including feature selection and standardization, reference model weights subtraction, and model alignment prior to detection. We evaluate this algorithm on a diverse set of Trojan detection benchmarks and domains and examine the cases where the approach is most and least effective.

Related papers

Towards Robust Object Detection: Identifying and Removing Backdoors via Module Inconsistency Analysis [5.8634235309501435]
We propose a backdoor defense framework tailored to object detection models. By quantifying and analyzing inconsistencies, we develop an algorithm to detect backdoors. Experiments with state-of-the-art two-stage object detectors show our method achieves a 90% improvement in backdoor removal rate.
arXiv Detail & Related papers (2024-09-24T12:58:35Z)
Open-Set Deepfake Detection: A Parameter-Efficient Adaptation Method with Forgery Style Mixture [58.60915132222421]
We introduce an approach that is both general and parameter-efficient for face forgery detection. We design a forgery-style mixture formulation that augments the diversity of forgery source domains. We show that the designed model achieves state-of-the-art generalizability with significantly reduced trainable parameters.
arXiv Detail & Related papers (2024-08-23T01:53:36Z)
Model Pairing Using Embedding Translation for Backdoor Attack Detection on Open-Set Classification Tasks [63.269788236474234]
We propose to use model pairs on open-set classification tasks for detecting backdoors. We show that this score, can be an indicator for the presence of a backdoor despite models being of different architectures. This technique allows for the detection of backdoors on models designed for open-set classification tasks, which is little studied in the literature.
arXiv Detail & Related papers (2024-02-28T21:29:16Z)
TEN-GUARD: Tensor Decomposition for Backdoor Attack Detection in Deep Neural Networks [3.489779105594534]
We introduce a novel approach to backdoor detection using two tensor decomposition methods applied to network activations. This has a number of advantages relative to existing detection methods, including the ability to analyze multiple models at the same time. Results show that our method detects backdoored networks more accurately and efficiently than current state-of-the-art methods.
arXiv Detail & Related papers (2024-01-06T03:08:28Z)
Self-Supervised Predictive Convolutional Attentive Block for Anomaly Detection [97.93062818228015]
We propose to integrate the reconstruction-based functionality into a novel self-supervised predictive architectural building block. Our block is equipped with a loss that minimizes the reconstruction error with respect to the masked area in the receptive field. We demonstrate the generality of our block by integrating it into several state-of-the-art frameworks for anomaly detection on image and video.
arXiv Detail & Related papers (2021-11-17T13:30:31Z)
Trigger Hunting with a Topological Prior for Trojan Detection [16.376009231934884]
This paper tackles the problem of Trojan detection, namely, identifying Trojaned models. One popular approach is reverse engineering, recovering the triggers on a clean image by manipulating the model's prediction. One major challenge of reverse engineering approach is the enormous search space of triggers. We propose innovative priors such as diversity and topological simplicity to not only increase the chances of finding the appropriate triggers but also improve the quality of the found triggers.
arXiv Detail & Related papers (2021-10-15T19:47:00Z)
Online Defense of Trojaned Models using Misattributions [18.16378666013071]
This paper proposes a new approach to detecting neural Trojans on Deep Neural Networks during inference. We evaluate our approach on several benchmarks, including models trained on MNIST, Fashion MNIST, and German Traffic Sign Recognition Benchmark.
arXiv Detail & Related papers (2021-03-29T19:53:44Z)
Cassandra: Detecting Trojaned Networks from Adversarial Perturbations [92.43879594465422]
In many cases, pre-trained models are sourced from vendors who may have disrupted the training pipeline to insert Trojan behaviors into the models. We propose a method to verify if a pre-trained model is Trojaned or benign. Our method captures fingerprints of neural networks in the form of adversarial perturbations learned from the network gradients.
arXiv Detail & Related papers (2020-07-28T19:00:40Z)
Scalable Backdoor Detection in Neural Networks [61.39635364047679]
Deep learning models are vulnerable to Trojan attacks, where an attacker can install a backdoor during training time to make the resultant model misidentify samples contaminated with a small trigger patch. We propose a novel trigger reverse-engineering based approach whose computational complexity does not scale with the number of labels, and is based on a measure that is both interpretable and universal across different network and patch types. In experiments, we observe that our method achieves a perfect score in separating Trojaned models from pure models, which is an improvement over the current state-of-the art method.
arXiv Detail & Related papers (2020-06-10T04:12:53Z)
Unsupervised Anomaly Detection with Adversarial Mirrored AutoEncoders [51.691585766702744]
We propose a variant of Adversarial Autoencoder which uses a mirrored Wasserstein loss in the discriminator to enforce better semantic-level reconstruction. We put forward an alternative measure of anomaly score to replace the reconstruction-based metric. Our method outperforms the current state-of-the-art methods for anomaly detection on several OOD detection benchmarks.
arXiv Detail & Related papers (2020-03-24T08:26:58Z)

This list is automatically generated from the titles and abstracts of the papers in this site.