Probe-Me-Not: Protecting Pre-trained Encoders from Malicious Probing
- URL: http://arxiv.org/abs/2411.12508v1
- Date: Tue, 19 Nov 2024 13:50:08 GMT
- Title: Probe-Me-Not: Protecting Pre-trained Encoders from Malicious Probing
- Authors: Ruyi Ding, Tong Zhou, Lili Su, Aidong Adam Ding, Xiaolin Xu, Yunsi Fei
- Abstract summary: Adapting pre-trained deep learning models to customized tasks has become a popular choice for developers.
Probing--training a downstream head on a pre-trained encoder--has been widely adopted in transfer learning.
Such generalizability of pre-trained encoders raises concerns about the potential misuse of probing for harmful intentions.
We introduce EncoderLock, a novel applicability authorization method designed to protect pre-trained encoders from malicious probing.
- Score: 14.290156958543845
- License:
- Abstract: Adapting pre-trained deep learning models to customized tasks has become a popular choice for developers to cope with limited computational resources and data volume. More specifically, probing--training a downstream head on a pre-trained encoder--has been widely adopted in transfer learning, which helps to prevent overfitting and catastrophic forgetting. However, such generalizability of pre-trained encoders raises concerns about the potential misuse of probing for harmful intentions, such as discriminatory speculation and warfare applications. In this work, we introduce EncoderLock, a novel applicability authorization method designed to protect pre-trained encoders from malicious probing, i.e., yielding poor performance on specified prohibited domains while maintaining their utility in authorized ones. Achieving this balance is challenging because of the opposite optimization objectives and the variety of downstream heads that adversaries can utilize adaptively. To address these challenges, EncoderLock employs two techniques: domain-aware weight selection and updating to restrict applications on prohibited domains/tasks, and self-challenging training scheme that iteratively strengthens resistance against any potential downstream classifiers that adversaries may apply. Moreover, recognizing the potential lack of data from prohibited domains in practical scenarios, we introduce three EncoderLock variants with different levels of data accessibility: supervised (prohibited domain data with labels), unsupervised (prohibited domain data without labels), and zero-shot (no data or labels available). We verify EncoderLock's effectiveness and practicality with a real-world pre-trained Vision Transformer (ViT) encoder from Facebook. These results underscore the valuable contributions EncoderLock brings to the development of responsible AI.
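The abstract describes the two mechanisms only at a high level. As a rough, hedged sketch of the self-challenging idea (not the paper's implementation, which additionally restricts updates via domain-aware weight selection), the PyTorch snippet below alternates between fitting a fresh downstream probe on prohibited-domain features and updating the encoder to degrade that probe while an authorized-domain head stays accurate. The names `encoder`, `auth_loader`, `prohib_loader`, the linear heads, and all hyperparameters are illustrative assumptions.

```python
# Hedged sketch of a self-challenging applicability-authorization loop.
# Assumptions: `encoder` maps images to `feat_dim`-dim features; `auth_loader`
# and `prohib_loader` yield (x, y) batches from the authorized / prohibited domains.
import torch
import torch.nn as nn
import torch.nn.functional as F

def self_challenging_lock(encoder, auth_loader, prohib_loader,
                          feat_dim, n_auth_cls, n_prohib_cls,
                          rounds=10, probe_steps=50, lock_steps=50,
                          lam=1.0, device="cpu"):
    encoder.to(device)
    auth_head = nn.Linear(feat_dim, n_auth_cls).to(device)  # keeps authorized-domain utility
    enc_opt = torch.optim.Adam(
        list(encoder.parameters()) + list(auth_head.parameters()), lr=1e-4)

    for _ in range(rounds):
        # (1) Self-challenge: fit a fresh downstream probe on the frozen
        # encoder, imitating the strongest head an adversary could train.
        probe = nn.Linear(feat_dim, n_prohib_cls).to(device)
        probe_opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
        for _, (x, y) in zip(range(probe_steps), prohib_loader):
            x, y = x.to(device), y.to(device)
            with torch.no_grad():
                z = encoder(x)
            loss = F.cross_entropy(probe(z), y)
            probe_opt.zero_grad()
            loss.backward()
            probe_opt.step()

        # (2) Lock: update the encoder to degrade that probe on the prohibited
        # domain while keeping the authorized-domain head accurate.
        for p in probe.parameters():
            p.requires_grad_(False)
        for _, ((xa, ya), (xp, yp)) in zip(range(lock_steps),
                                           zip(auth_loader, prohib_loader)):
            xa, ya = xa.to(device), ya.to(device)
            xp, yp = xp.to(device), yp.to(device)
            utility = F.cross_entropy(auth_head(encoder(xa)), ya)
            # Maximize the challenger's loss on prohibited data.
            restriction = -F.cross_entropy(probe(encoder(xp)), yp)
            loss = utility + lam * restriction
            enc_opt.zero_grad()
            loss.backward()
            enc_opt.step()
    return encoder
```

The negated cross-entropy term stands in for the "opposite optimization objectives" mentioned in the abstract; in practice a bounded surrogate (for example, pushing prohibited-domain predictions toward a uniform distribution) is typically more stable than maximizing cross-entropy directly.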
Related papers
- Ungeneralizable Examples [70.76487163068109]
Current approaches to creating unlearnable data involve incorporating small, specially designed noise.
We extend the concept of unlearnable data to conditional data learnability and introduce UnGeneralizable Examples (UGEs).
UGEs exhibit learnability for authorized users while maintaining unlearnability for potential hackers.
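For context, the snippet below sketches the common error-minimizing-noise recipe behind unlearnable data: per-sample noise is optimized so the model's loss is already near zero, leaving nothing useful to learn. It is a generic illustration under assumed names and budgets (`model`, `eps`, `alpha`), not UGE's conditional authorized-versus-hacker construction.

```python
# Hedged sketch of error-minimizing noise for unlearnable data (generic recipe,
# not UGE's conditional variant). In the full recipe, this refinement step
# alternates with a few steps of ordinary model training.
import torch
import torch.nn.functional as F

def refine_unlearnable_noise(model, x, y, delta, eps=8 / 255,
                             alpha=2 / 255, steps=10):
    """x: (B, 3, H, W) images in [0, 1]; delta: detached per-sample noise."""
    delta = delta.detach().clone().requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model((x + delta).clamp(0, 1)), y)
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()  # descend: *minimize* the loss
            delta.clamp_(-eps, eps)             # keep the noise small
        delta.grad.zero_()
    return delta.detach()
```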
arXiv Detail & Related papers (2024-04-22T09:29:14Z)
- Securely Fine-tuning Pre-trained Encoders Against Adversarial Examples [28.947545367473086]
We propose a two-stage adversarial fine-tuning approach aimed at enhancing the robustness of downstream models.
Our experiments, conducted across ten self-supervised training methods and six datasets, demonstrate that Gen-AF attains high testing accuracy and robust testing accuracy against state-of-the-art DAEs.
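The two stages of Gen-AF are not spelled out here; as a generic illustration of adversarial fine-tuning for a downstream model built on a pre-trained encoder, the sketch below crafts PGD examples against the current encoder-plus-head model and trains on them. This is not Gen-AF's actual procedure, and the `optimizer` is assumed to cover both encoder and head parameters.

```python
# Generic adversarial fine-tuning (illustrative only; not Gen-AF's two-stage method).
import torch
import torch.nn as nn
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Craft an L-infinity PGD example against the current model."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model((x + delta).clamp(0, 1)), y)
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()
            delta.clamp_(-eps, eps)
        delta.grad.zero_()
    return (x + delta).clamp(0, 1).detach()

def adversarial_finetune_epoch(encoder, head, loader, optimizer, device="cpu"):
    model = nn.Sequential(encoder, head).to(device).train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = pgd_attack(model, x, y)          # worst-case inputs for this batch
        loss = F.cross_entropy(model(x_adv), y)  # then fit the model on them
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return model
```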
arXiv Detail & Related papers (2024-03-16T04:23:46Z)
- Downstream-agnostic Adversarial Examples [66.8606539786026]
AdvEncoder is the first framework for generating downstream-agnostic universal adversarial examples based on a pre-trained encoder.
Unlike traditional adversarial example works, the pre-trained encoder only outputs feature vectors rather than classification labels.
Our results show that an attacker can successfully attack downstream tasks without knowing either the pre-training dataset or the downstream dataset.
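A minimal sketch of the underlying idea, one universal perturbation optimized purely against the encoder's feature space with no labels or downstream heads, is shown below. It is not AdvEncoder's actual loss or generator design; the 224x224 RGB input shape, the budget `eps`, and the cosine objective are assumptions.

```python
# Hedged sketch: learn a single universal perturbation against an encoder's
# feature space, without any downstream labels or classification heads.
import torch
import torch.nn.functional as F

def universal_feature_perturbation(encoder, loader, eps=10 / 255,
                                   lr=1e-2, epochs=5, device="cpu"):
    encoder.to(device).eval()
    # Assumes 224x224 RGB inputs; one perturbation shared across all images.
    delta = torch.zeros(1, 3, 224, 224, device=device, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(epochs):
        for x, _ in loader:
            x = x.to(device)
            with torch.no_grad():
                z_clean = encoder(x)
            z_adv = encoder((x + delta).clamp(0, 1))
            # Push perturbed features away from clean ones (minimize cosine
            # similarity), so any downstream head built on them misfires.
            loss = F.cosine_similarity(z_adv, z_clean, dim=-1).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
            with torch.no_grad():
                delta.clamp_(-eps, eps)  # keep the perturbation imperceptible
    return delta.detach()
```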
arXiv Detail & Related papers (2023-07-23T10:16:47Z)
- Enhancing Multiple Reliability Measures via Nuisance-extended Information Bottleneck [77.37409441129995]
In practical scenarios where training data is limited, many predictive signals in the data can come from biases in data acquisition rather than from the task itself.
We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training.
We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
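As a hedged illustration of autoencoder-based hybrid discriminative-generative training, without the paper's mutual-information and nuisance machinery, a single training step might combine a classification loss on the encoder's features with a reconstruction loss from a decoder. All module names and the weight `beta` are assumptions.

```python
# Hedged sketch of one hybrid discriminative-generative training step.
import torch
import torch.nn.functional as F

def hybrid_step(encoder, decoder, classifier, x, y, optimizer, beta=1.0):
    z = encoder(x)
    cls_loss = F.cross_entropy(classifier(z), y)  # discriminative objective
    rec_loss = F.mse_loss(decoder(z), x)          # generative objective
    loss = cls_loss + beta * rec_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```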
arXiv Detail & Related papers (2023-03-24T16:03:21Z)
- Pre-trained Encoders in Self-Supervised Learning Improve Secure and Privacy-preserving Supervised Learning [63.45532264721498]
Self-supervised learning is an emerging technique to pre-train encoders using unlabeled data.
We perform the first systematic, principled measurement study to understand whether and when a pre-trained encoder can address the limitations of secure or privacy-preserving supervised learning algorithms.
arXiv Detail & Related papers (2022-12-06T21:35:35Z)
- PoisonedEncoder: Poisoning the Unlabeled Pre-training Data in Contrastive Learning [69.70602220716718]
We propose PoisonedEncoder, a data poisoning attack to contrastive learning.
In particular, an attacker injects carefully crafted poisoning inputs into the unlabeled pre-training data.
We evaluate five defenses against PoisonedEncoder: one pre-processing defense, three in-processing defenses, and one post-processing defense.
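The injection described above can be sketched as follows. The crafting step shown here (splicing a target image beside a reference image so random-crop augmentations tie their representations together) is a simplified assumption for illustration, not necessarily PoisonedEncoder's exact construction; all function names are hypothetical.

```python
# Hedged sketch of injecting crafted images into an unlabeled pre-training set.
import torch
import torch.nn.functional as F

def make_poison(target_img, reference_img):
    """(3, H, W) tensors in [0, 1] -> side-by-side composite resized back to (3, H, W)."""
    composite = torch.cat([target_img, reference_img], dim=-1)
    return F.interpolate(composite.unsqueeze(0), size=target_img.shape[-2:],
                         mode="bilinear", align_corners=False).squeeze(0)

def poison_pretraining_set(unlabeled_imgs, target_imgs, reference_imgs):
    poisons = [make_poison(t, r) for t, r in zip(target_imgs, reference_imgs)]
    return unlabeled_imgs + poisons  # the attacker only needs injection access
```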
arXiv Detail & Related papers (2022-05-13T00:15:44Z)
- Watermarking Pre-trained Encoders in Contrastive Learning [9.23485246108653]
Pre-trained encoders are important intellectual property that needs to be carefully protected.
It is challenging to migrate existing watermarking techniques from the classification tasks to the contrastive learning scenario.
We introduce a task-agnostic loss function that effectively embeds a backdoor into the encoder as the watermark.
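As a generic, hedged sketch of a task-agnostic backdoor watermark (not the paper's exact loss), the snippet below fine-tunes the encoder so trigger-stamped inputs collapse toward a secret feature vector while clean features stay close to a frozen copy. `trigger`, `secret_vec`, and the loss weights are assumptions.

```python
# Hedged sketch of embedding a backdoor watermark with a label-free loss.
import copy
import torch
import torch.nn.functional as F

def watermark_encoder(encoder, loader, trigger, secret_vec,
                      epochs=3, lr=1e-4, lam=1.0, device="cpu"):
    encoder.to(device).train()
    frozen = copy.deepcopy(encoder).eval()  # frozen reference preserves utility
    for p in frozen.parameters():
        p.requires_grad_(False)
    opt = torch.optim.Adam(encoder.parameters(), lr=lr)
    trigger, secret_vec = trigger.to(device), secret_vec.to(device)
    for _ in range(epochs):
        for x, _ in loader:
            x = x.to(device)
            x_trig = (x + trigger).clamp(0, 1)  # stamp the trigger pattern
            z_clean, z_trig = encoder(x), encoder(x_trig)
            with torch.no_grad():
                z_ref = frozen(x)
            utility = 1 - F.cosine_similarity(z_clean, z_ref, dim=-1).mean()
            backdoor = 1 - F.cosine_similarity(
                z_trig, secret_vec.expand_as(z_trig), dim=-1).mean()
            loss = utility + lam * backdoor  # both terms are label-free
            opt.zero_grad()
            loss.backward()
            opt.step()
    return encoder
```

Verification would then query a suspect encoder with triggered inputs and test whether their features align with `secret_vec` far above chance.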
arXiv Detail & Related papers (2022-01-20T15:14:31Z)
- StolenEncoder: Stealing Pre-trained Encoders [62.02156378126672]
We propose StolenEncoder, the first attack to steal pre-trained image encoders.
Our results show that the encoders stolen by StolenEncoder have functionality similar to that of the target encoders.
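A hedged sketch of the core idea, feature imitation through black-box queries, is shown below; the paper's query-budget and augmentation strategies are omitted, and `surrogate_loader` (the attacker's own unlabeled data) is an assumption.

```python
# Hedged sketch of encoder stealing by matching the victim's feature outputs.
import torch
import torch.nn.functional as F

def steal_encoder(target_encoder, surrogate_encoder, surrogate_loader,
                  epochs=10, lr=1e-3, device="cpu"):
    target_encoder.to(device).eval()
    surrogate_encoder.to(device).train()
    opt = torch.optim.Adam(surrogate_encoder.parameters(), lr=lr)
    for _ in range(epochs):
        for x, _ in surrogate_loader:
            x = x.to(device)
            with torch.no_grad():          # black-box query to the victim API
                z_target = target_encoder(x)
            z_stolen = surrogate_encoder(x)
            # Imitate the victim's feature vectors (cosine distance as loss).
            loss = 1 - F.cosine_similarity(z_stolen, z_target, dim=-1).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
    return surrogate_encoder
```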
arXiv Detail & Related papers (2022-01-15T17:04:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.