Detecting Backdoors in Pre-trained Encoders
- URL: http://arxiv.org/abs/2303.15180v1
- Date: Thu, 23 Mar 2023 19:04:40 GMT
- Title: Detecting Backdoors in Pre-trained Encoders
- Authors: Shiwei Feng, Guanhong Tao, Siyuan Cheng, Guangyu Shen, Xiangzhe Xu,
Yingqi Liu, Kaiyuan Zhang, Shiqing Ma, Xiangyu Zhang
- Abstract summary: We propose DECREE, the first backdoor detection approach for pre-trained encoders.
We show the effectiveness of our method on image encoders pre-trained on ImageNet and OpenAI's CLIP 400 million image-text pairs.
- Score: 25.105186092387633
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised learning in computer vision trains on unlabeled data, such as
images or (image, text) pairs, to obtain an image encoder that learns
high-quality embeddings for input data. Emerging backdoor attacks towards
encoders expose crucial vulnerabilities of self-supervised learning, since
downstream classifiers (even further trained on clean data) may inherit
backdoor behaviors from encoders. Existing backdoor detection methods mainly
focus on supervised learning settings and cannot handle pre-trained encoders
especially when input labels are not available. In this paper, we propose
DECREE, the first backdoor detection approach for pre-trained encoders,
requiring neither classifier headers nor input labels. We evaluate DECREE on
over 400 encoders trojaned under 3 paradigms. We show the effectiveness of our
method on image encoders pre-trained on ImageNet and OpenAI's CLIP 400 million
image-text pairs. Our method consistently has a high detection accuracy even if
we have only limited or no access to the pre-training dataset.
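The abstract does not spell out the detection mechanism, but a common strategy for label-free backdoor scanning of an encoder is to invert a candidate trigger directly in embedding space: if an unusually small patch can pull the embeddings of arbitrary stamped images toward a single point, the encoder is suspicious. The sketch below is a minimal, hypothetical illustration of that idea, not DECREE's exact algorithm; the function name, optimizer settings, and mask penalty are assumptions for illustration.

```python
# Hypothetical sketch: scanning a frozen pre-trained image encoder for a
# backdoor using only unlabeled images (no classifier head, no labels).
import torch
import torch.nn.functional as F

def invert_trigger(encoder, images, steps=500, lr=0.1, mask_weight=0.01):
    """images: (N, 3, H, W) clean unlabeled images; encoder returns (N, D) embeddings."""
    encoder.eval()
    _, _, H, W = images.shape
    # Learnable trigger pattern and a soft mask selecting where it is stamped.
    pattern = torch.rand(1, 3, H, W, requires_grad=True)
    mask_logit = torch.zeros(1, 1, H, W, requires_grad=True)
    opt = torch.optim.Adam([pattern, mask_logit], lr=lr)

    for _ in range(steps):
        mask = torch.sigmoid(mask_logit)
        stamped = (1 - mask) * images + mask * pattern.clamp(0, 1)
        emb = F.normalize(encoder(stamped), dim=-1)
        # Pull all stamped embeddings toward their mean (high mutual similarity).
        center = F.normalize(emb.mean(dim=0, keepdim=True), dim=-1)
        sim_loss = -(emb @ center.t()).mean()
        # Penalize large triggers so only a genuinely small trigger scores well.
        loss = sim_loss + mask_weight * mask.mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

    return torch.sigmoid(mask_logit).detach(), pattern.detach()

# A small inverted mask combined with near-identical embeddings would flag the
# encoder as backdoored; the decision threshold would need calibration.
```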
Related papers
- DeDe: Detecting Backdoor Samples for SSL Encoders via Decoders [6.698677477097004]
Self-supervised learning (SSL) is pervasively exploited in training high-quality upstream encoders with a large amount of unlabeled data.
However, such encoders are vulnerable to backdoor attacks mounted merely by polluting a small portion of the training data.
We propose a novel detection mechanism, DeDe, which detects the activation of the backdoor mapping from the co-occurrence of the victim encoder and trigger inputs.
arXiv Detail & Related papers (2024-11-25T07:26:22Z) - Inside the Black Box: Detecting Data Leakage in Pre-trained Language Encoders [68.00224057755773]
We focus on the membership leakage of pre-training data exposed through downstream models adapted from pre-trained language encoders.
Our evaluations reveal, for the first time, the existence of membership leakage even when only the black-box output of the downstream model is exposed.
arXiv Detail & Related papers (2024-08-20T17:55:15Z) - Downstream-agnostic Adversarial Examples [66.8606539786026]
AdvEncoder is the first framework for generating downstream-agnostic universal adversarial examples based on a pre-trained encoder.
Unlike traditional adversarial example works, the pre-trained encoder only outputs feature vectors rather than classification labels.
Our results show that an attacker can successfully attack downstream tasks without knowing either the pre-training dataset or the downstream dataset.
arXiv Detail & Related papers (2023-07-23T10:16:47Z) - Pre-trained Encoders in Self-Supervised Learning Improve Secure and
Privacy-preserving Supervised Learning [63.45532264721498]
Self-supervised learning is an emerging technique to pre-train encoders using unlabeled data.
We perform the first systematic, principled measurement study to understand whether and when a pre-trained encoder can address the limitations of secure or privacy-preserving supervised learning algorithms.
arXiv Detail & Related papers (2022-12-06T21:35:35Z) - PoisonedEncoder: Poisoning the Unlabeled Pre-training Data in
Contrastive Learning [69.70602220716718]
We propose PoisonedEncoder, a data poisoning attack to contrastive learning.
In particular, an attacker injects carefully crafted poisoning inputs into the unlabeled pre-training data.
We evaluate five defenses against PoisonedEncoder, including one pre-processing, three in-processing, and one post-processing defense.
arXiv Detail & Related papers (2022-05-13T00:15:44Z) - Watermarking Pre-trained Encoders in Contrastive Learning [9.23485246108653]
The pre-trained encoders are an important intellectual property that needs to be carefully protected.
It is challenging to migrate existing watermarking techniques from the classification tasks to the contrastive learning scenario.
We introduce a task-agnostic loss function that effectively embeds a backdoor into the encoder as the watermark.
arXiv Detail & Related papers (2022-01-20T15:14:31Z) - StolenEncoder: Stealing Pre-trained Encoders [62.02156378126672]
We propose the first attack called StolenEncoder to steal pre-trained image encoders.
Our results show that the encoders stolen by StolenEncoder have similar functionality with the target encoders.
arXiv Detail & Related papers (2022-01-15T17:04:38Z) - Masked Autoencoders Are Scalable Vision Learners [60.97703494764904]
Masked autoencoders (MAE) are scalable self-supervised learners for computer vision.
Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels (see the masking sketch after this list).
Coupling these two designs enables us to train large models efficiently and effectively.
arXiv Detail & Related papers (2021-11-11T18:46:40Z) - EncoderMI: Membership Inference against Pre-trained Encoders in
Contrastive Learning [27.54202989524394]
We propose EncoderMI, the first membership inference method against image encoders pre-trained by contrastive learning.
We evaluate EncoderMI on image encoders pre-trained on multiple datasets by ourselves as well as the Contrastive Language-Image Pre-training (CLIP) image encoder, which is pre-trained on 400 million (image, text) pairs collected from the Internet and released by OpenAI.
arXiv Detail & Related papers (2021-08-25T03:00:45Z) - BadEncoder: Backdoor Attacks to Pre-trained Encoders in Self-Supervised
Learning [29.113263683850015]
Self-supervised learning in computer vision aims to pre-train an image encoder using a large amount of unlabeled images or (image, text) pairs.
We propose BadEncoder, the first backdoor attack to self-supervised learning.
arXiv Detail & Related papers (2021-08-01T02:22:31Z)
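The Masked Autoencoders entry above describes the core recipe of masking random patches of the input image and reconstructing the missing pixels. Below is a minimal sketch of that masking step, assuming a ViT-style grid of non-overlapping patches; the function name and the 0.75 mask ratio are illustrative choices, not the reference MAE code.

```python
# Minimal sketch of MAE-style random patch masking (illustrative, not the
# official implementation). The encoder would see only the kept patches.
import torch

def random_mask_patches(images, patch_size=16, mask_ratio=0.75):
    """Split images into non-overlapping patches and drop a random subset.

    images: (N, C, H, W) with H and W divisible by patch_size.
    Returns the kept patches and their indices (needed to restore order).
    """
    N, C, H, W = images.shape
    # Flatten into (N, num_patches, patch_dim).
    patches = (
        images.unfold(2, patch_size, patch_size)
              .unfold(3, patch_size, patch_size)
              .reshape(N, C, -1, patch_size, patch_size)
              .permute(0, 2, 1, 3, 4)
              .reshape(N, -1, C * patch_size * patch_size)
    )
    num_patches = patches.shape[1]
    num_keep = int(num_patches * (1 - mask_ratio))
    # Independent random permutation per image; keep the first num_keep patches.
    noise = torch.rand(N, num_patches)
    keep_idx = noise.argsort(dim=1)[:, :num_keep]
    kept = torch.gather(
        patches, 1, keep_idx.unsqueeze(-1).expand(-1, -1, patches.shape[-1])
    )
    return kept, keep_idx  # decoder later reconstructs the masked patches
```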