Related papers: Effective Backdoor Mitigation in Vision-Language Models Depends on the Pre-training Objective

Effective Backdoor Mitigation in Vision-Language Models Depends on the Pre-training Objective

URL: http://arxiv.org/abs/2311.14948v4
Date: Sat, 11 Jan 2025 00:17:44 GMT
Title: Effective Backdoor Mitigation in Vision-Language Models Depends on the Pre-training Objective
Authors: Sahil Verma, Gantavya Bhatt, Avi Schwarzschild, Soumye Singhal, Arnav Mohanty Das, Chirag Shah, John P Dickerson, Pin-Yu Chen, Jeff Bilmes,
Abstract summary: Modern machine learning models are vulnerable to adversarial and backdoor attacks.<n>Such risks are heightened by the prevalent practice of collecting massive, internet-sourced datasets for training multimodal models.<n>CleanCLIP is the current state-of-the-art approach to mitigate the effects of backdooring in multimodal models.
Score: 71.39995120597999
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Despite the advanced capabilities of contemporary machine learning (ML) models, they remain vulnerable to adversarial and backdoor attacks. This vulnerability is particularly concerning in real-world deployments, where compromised models may exhibit unpredictable behavior in critical scenarios. Such risks are heightened by the prevalent practice of collecting massive, internet-sourced datasets for training multimodal models, as these datasets may harbor backdoors. Various techniques have been proposed to mitigate the effects of backdooring in multimodal models, such as CleanCLIP, which is the current state-of-the-art approach. In this work, we demonstrate that the efficacy of CleanCLIP in mitigating backdoors is highly dependent on the particular objective used during model pre-training. We observe that stronger pre-training objectives that lead to higher zero-shot classification performance correlate with harder to remove backdoors behaviors. We show this by training multimodal models on two large datasets consisting of 3 million (CC3M) and 6 million (CC6M) datapoints, under various pre-training objectives, followed by poison removal using CleanCLIP. We find that CleanCLIP, even with extensive hyperparameter tuning, is ineffective in poison removal when stronger pre-training objectives are used. Our findings underscore critical considerations for ML practitioners who train models using large-scale web-curated data and are concerned about potential backdoor threats.

Related papers

Backdoor Cleaning without External Guidance in MLLM Fine-tuning [76.82121084745785]
Believe Your Eyes (BYE) is a data filtering framework that leverages attention entropy patterns as self-supervised signals to identify and filter backdoor samples.<n>It achieves near-zero attack success rates while maintaining clean-task performance.
arXiv Detail & Related papers (2025-05-22T17:11:58Z)
Towards Adversarially Robust Deep Metric Learning [0.8702432681310401]
Deep neural networks are prone to adversarial attacks and could be easily fooled by adversarial examples. Existing works fail to thoroughly inspect the robustness of DML models. We propose a new defense, the Ensemble Adversarial Training (EAT), which exploits ensemble learning and adversarial training.
arXiv Detail & Related papers (2025-01-02T03:15:25Z)
Behavior Backdoor for Deep Learning Models [95.50787731231063]
We take the first step towards behavioral backdoor'' attack, which is defined as a behavior-triggered backdoor model training procedure. We propose the first pipeline of implementing behavior backdoor, i.e., the Quantification Backdoor (QB) attack. Experiments have been conducted on different models, datasets, and tasks, demonstrating the effectiveness of this novel backdoor attack.
arXiv Detail & Related papers (2024-12-02T10:54:02Z)
Efficient Backdoor Defense in Multimodal Contrastive Learning: A Token-Level Unlearning Method for Mitigating Threats [52.94388672185062]
We propose an efficient defense mechanism against backdoor threats using a concept known as machine unlearning. This entails strategically creating a small set of poisoned samples to aid the model's rapid unlearning of backdoor vulnerabilities. In the backdoor unlearning process, we present a novel token-based portion unlearning training regime.
arXiv Detail & Related papers (2024-09-29T02:55:38Z)
DeCE: Deceptive Cross-Entropy Loss Designed for Defending Backdoor Attacks [26.24490960002264]
We propose a general and effective loss function DeCE (Deceptive Cross-Entropy) to enhance the security of Code Language Models. Our experiments across various code synthesis datasets, models, and poisoning ratios demonstrate the applicability and effectiveness of DeCE.
arXiv Detail & Related papers (2024-07-12T03:18:38Z)
Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models [112.48136829374741]
In this paper, we unveil a new vulnerability: the privacy backdoor attack. When a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model. Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.
arXiv Detail & Related papers (2024-04-01T16:50:54Z)
End-to-End Anti-Backdoor Learning on Images and Time Series [34.02071390659078]
Backdoor attacks present a substantial security concern for deep learning models. This paper builds upon Anti-Backdoor Learning (ABL) and introduces an innovative method, End-to-End Anti-Backdoor Learning (E2ABL) E2ABL accomplishes end-to-end training through an additional classification head linked to a Deep Neural Network (DNN)
arXiv Detail & Related papers (2024-01-06T13:34:07Z)
BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning [85.2564206440109]
This paper reveals the threats in this practical scenario that backdoor attacks can remain effective even after defenses. We introduce the emphtoolns attack, which is resistant to backdoor detection and model fine-tuning defenses.
arXiv Detail & Related papers (2023-11-20T02:21:49Z)
Avoid Adversarial Adaption in Federated Learning by Multi-Metric Investigations [55.2480439325792]
Federated Learning (FL) facilitates decentralized machine learning model training, preserving data privacy, lowering communication costs, and boosting model performance through diversified data sources. FL faces vulnerabilities such as poisoning attacks, undermining model integrity with both untargeted performance degradation and targeted backdoor attacks. We define a new notion of strong adaptive adversaries, capable of adapting to multiple objectives simultaneously. MESAS is the first defense robust against strong adaptive adversaries, effective in real-world data scenarios, with an average overhead of just 24.37 seconds.
arXiv Detail & Related papers (2023-06-06T11:44:42Z)
CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning [63.72975421109622]
CleanCLIP is a finetuning framework that weakens the learned spurious associations introduced by backdoor attacks. CleanCLIP maintains model performance on benign examples while erasing a range of backdoor attacks on multimodal contrastive learning.
arXiv Detail & Related papers (2023-03-06T17:48:32Z)
On the Evaluation of User Privacy in Deep Neural Networks using Timing Side Channel [14.350301915592027]
We identify and report a novel data-dependent timing side-channel leakage (termed Class Leakage) in Deep Learning (DL) implementations. We demonstrate a practical inference-time attack where an adversary with user privilege and hard-label blackbox access to an ML can exploit Class Leakage. We develop an easy-to-implement countermeasure by making a constant-time branching operation that alleviates the Class Leakage.
arXiv Detail & Related papers (2022-08-01T19:38:16Z)
RelaxLoss: Defending Membership Inference Attacks without Losing Utility [68.48117818874155]
We propose a novel training framework based on a relaxed loss with a more achievable learning target. RelaxLoss is applicable to any classification model with added benefits of easy implementation and negligible overhead. Our approach consistently outperforms state-of-the-art defense mechanisms in terms of resilience against MIAs.
arXiv Detail & Related papers (2022-07-12T19:34:47Z)
On the Effectiveness of Adversarial Training against Backdoor Attacks [111.8963365326168]
A backdoored model always predicts a target class in the presence of a predefined trigger pattern. In general, adversarial training is believed to defend against backdoor attacks. We propose a hybrid strategy which provides satisfactory robustness across different backdoor attacks.
arXiv Detail & Related papers (2022-02-22T02:24:46Z)

This list is automatically generated from the titles and abstracts of the papers in this site.