Better Safe than Sorry: Pre-training CLIP against Targeted Data Poisoning and Backdoor Attacks
- URL: http://arxiv.org/abs/2310.05862v2
- Date: Mon, 10 Jun 2024 21:01:11 GMT
- Title: Better Safe than Sorry: Pre-training CLIP against Targeted Data Poisoning and Backdoor Attacks
- Authors: Wenhan Yang, Jingdong Gao, Baharan Mirzasoleiman
- Abstract summary: Contrastive Language-Image Pre-training (CLIP) on large image-caption datasets has achieved remarkable success in zero-shot classification.
CLIP is more vulnerable to targeted data poisoning and backdoor attacks, compared to supervised learning.
We propose a strong defense, SAFECLIP, to safely pre-train CLIP against targeted data poisoning and backdoor attacks.
- Score: 46.504428925984406
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Contrastive Language-Image Pre-training (CLIP) on large image-caption datasets has achieved remarkable success in zero-shot classification and enabled transferability to new domains. However, CLIP is far more vulnerable to targeted data poisoning and backdoor attacks than supervised learning. Perhaps surprisingly, poisoning 0.0001% of CLIP pre-training data is enough to make targeted data poisoning attacks successful; this is four orders of magnitude smaller than what is required to poison supervised models. Despite this vulnerability, existing methods are very limited in defending CLIP models during pre-training. In this work, we propose a strong defense, SAFECLIP, to safely pre-train CLIP against targeted data poisoning and backdoor attacks. SAFECLIP warms up the model by applying unimodal contrastive learning (CL) to the image and text modalities separately. Then, it divides the data into safe and risky sets by applying a Gaussian Mixture Model to the cosine similarities of image-caption pair representations. SAFECLIP pre-trains the model by applying the CLIP loss to the safe set and applying unimodal CL to the image and text modalities of the risky set separately. By gradually increasing the size of the safe set during pre-training, SAFECLIP effectively breaks targeted data poisoning and backdoor attacks without harming CLIP's performance. Our extensive experiments on CC3M, Visual Genome, and MSCOCO demonstrate that SAFECLIP significantly reduces the success rate of targeted data poisoning attacks from 93.75% to 0% and that of various backdoor attacks from up to 100% to 0%, without harming CLIP's performance.
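To make the procedure described in the abstract concrete, the following is a minimal sketch of the GMM-based safe/risky split and the mixed training step. It is a rough illustration, not the authors' implementation: the encoder interfaces, the `clip_loss` and `unimodal_cl_loss` helpers, and the `safe_fraction` schedule are assumptions introduced for the example.

```python
# Minimal sketch of SAFECLIP's safe/risky split (illustrative; not the authors' code).
# Assumptions: the encoders return L2-normalized embeddings; `clip_loss` and
# `unimodal_cl_loss` stand in for the CLIP contrastive loss and an in-modality
# contrastive loss; `safe_fraction` is gradually increased across epochs.
import torch
from sklearn.mixture import GaussianMixture


def split_safe_risky(image_emb, text_emb, safe_fraction):
    # Cosine similarity of each image-caption pair (embeddings assumed normalized).
    sims = (image_emb * text_emb).sum(dim=1).cpu().numpy().reshape(-1, 1)

    # Fit a 2-component Gaussian Mixture Model to the pairwise similarities.
    gmm = GaussianMixture(n_components=2, random_state=0).fit(sims)
    clean_comp = int(gmm.means_.argmax())          # component with higher mean similarity
    p_clean = gmm.predict_proba(sims)[:, clean_comp]

    # Keep only the top `safe_fraction` of pairs (by clean probability) as the safe set.
    n_safe = int(safe_fraction * len(sims))
    order = p_clean.argsort()[::-1]
    safe_idx = torch.as_tensor(order[:n_safe].copy())
    risky_idx = torch.as_tensor(order[n_safe:].copy())
    return safe_idx, risky_idx


def training_step(images, captions, image_encoder, text_encoder,
                  clip_loss, unimodal_cl_loss, safe_fraction):
    image_emb = image_encoder(images)
    text_emb = text_encoder(captions)
    safe_idx, risky_idx = split_safe_risky(image_emb.detach(), text_emb.detach(),
                                           safe_fraction)

    # CLIP loss only on the safe set; unimodal CL on each modality of the risky set,
    # so suspicious pairs never strengthen a cross-modal association.
    loss = clip_loss(image_emb[safe_idx], text_emb[safe_idx])
    loss = loss + unimodal_cl_loss(image_emb[risky_idx]) \
                + unimodal_cl_loss(text_emb[risky_idx])
    return loss
```

Gradually raising `safe_fraction` over the course of pre-training approaches standard CLIP training on data judged clean, while suspicious pairs only ever contribute through the in-modality losses and therefore cannot strengthen a poisoned image-caption association.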
Related papers
- Persistent Pre-Training Poisoning of LLMs [71.53046642099142]
Our work evaluates for the first time whether language models can also be compromised during pre-training.
We pre-train a series of LLMs from scratch to measure the impact of a potential poisoning adversary.
Our main result is that poisoning only 0.1% of a model's pre-training dataset is sufficient for three out of four attacks to persist through post-training.
arXiv Detail & Related papers (2024-10-17T16:27:13Z)
- BadCLIP: Trigger-Aware Prompt Learning for Backdoor Attacks on CLIP [55.33331463515103]
BadCLIP is built on a novel and effective mechanism in backdoor attacks on CLIP.
It consists of a learnable trigger applied to images and a trigger-aware context generator, such that the trigger can change text features via trigger-aware prompts.
arXiv Detail & Related papers (2023-11-26T14:24:13Z)
- A Comprehensive Study of Privacy Risks in Curriculum Learning [25.57099711643689]
Training a machine learning model on data presented in a meaningful order has proven effective in accelerating the training process.
The key enabling technique is curriculum learning (CL), which has seen great success and has been deployed in areas like image and text classification.
Yet, how CL affects the privacy of machine learning is unclear.
arXiv Detail & Related papers (2023-10-16T07:06:38Z)
- Demystifying CLIP Data [86.34045746910114]
Contrastive Language-Image Pre-training (CLIP) has advanced research and applications in computer vision.
We introduce Metadata-Curated Language-Image Pre-training (MetaCLIP)
MetaCLIP takes a raw data pool and metadata (derived from CLIP's concepts) and yields a balanced subset over the metadata distribution.
arXiv Detail & Related papers (2023-09-28T17:59:56Z)
- Robust Contrastive Language-Image Pre-training against Data Poisoning and Backdoor Attacks [52.26631767748843]
We propose ROCLIP, the first effective method for robustly pre-training multimodal vision-language models against targeted data poisoning and backdoor attacks.
ROCLIP effectively breaks the association between poisoned image-caption pairs by considering a relatively large and varying pool of random captions (a brief sketch of this idea appears after the list).
Our experiments show that ROCLIP renders state-of-the-art targeted data poisoning and backdoor attacks ineffective during pre-training CLIP models.
arXiv Detail & Related papers (2023-03-13T04:49:46Z)
- CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning [63.72975421109622]
CleanCLIP is a finetuning framework that weakens the learned spurious associations introduced by backdoor attacks.
CleanCLIP maintains model performance on benign examples while erasing a range of backdoor attacks on multimodal contrastive learning.
arXiv Detail & Related papers (2023-03-06T17:48:32Z)
- BEAS: Blockchain Enabled Asynchronous & Secure Federated Machine Learning [0.0]
We present BEAS, the first blockchain-based framework for N-party Federated Learning.
It provides strict privacy guarantees for training data using gradient pruning.
Anomaly detection protocols are used to minimize the risk of data-poisoning attacks.
We also define a novel protocol to prevent premature convergence in heterogeneous learning environments.
arXiv Detail & Related papers (2022-02-06T17:11:14Z)
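As a companion to the SAFECLIP sketch above, the following is a minimal, hedged illustration of the caption-pool matching summarized in the ROCLIP entry. The pool construction and the use of the matched captions in the CLIP loss are assumptions introduced for the example, not the authors' code.

```python
# Illustrative sketch of ROCLIP-style caption matching (assumed mechanism, not the
# authors' implementation). `caption_pool_emb` is a large, periodically refreshed
# pool of random caption embeddings, L2-normalized like `image_emb`.
import torch


def roclip_caption_targets(image_emb: torch.Tensor,
                           caption_pool_emb: torch.Tensor) -> torch.Tensor:
    # Match each image to its most similar caption in the pool instead of its own
    # caption, breaking the association a poisoned image-caption pair tries to teach.
    sims = image_emb @ caption_pool_emb.T      # (batch, pool_size) cosine similarities
    nn_idx = sims.argmax(dim=1)
    return caption_pool_emb[nn_idx]            # caption targets used in the CLIP loss
```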