FCert: Certifiably Robust Few-Shot Classification in the Era of Foundation Models
- URL: http://arxiv.org/abs/2404.08631v1
- Date: Fri, 12 Apr 2024 17:50:40 GMT
- Title: FCert: Certifiably Robust Few-Shot Classification in the Era of Foundation Models
- Authors: Yanting Wang, Wei Zou, Jinyuan Jia
- Abstract summary: We propose FCert, the first certified defense against data poisoning attacks to few-shot classification.
Our experimental results show our FCert: 1) maintains classification accuracy without attacks, 2) outperforms existing certified defenses for data poisoning attacks, and 3) is efficient and general.
- Score: 38.019489232264796
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Few-shot classification with foundation models (e.g., CLIP, DINOv2, PaLM-2) enables users to build an accurate classifier with a few labeled training samples (called support samples) for a classification task. However, an attacker could perform data poisoning attacks by manipulating some support samples such that the classifier makes the attacker-desired, arbitrary prediction for a testing input. Empirical defenses cannot provide formal robustness guarantees, leading to a cat-and-mouse game between the attacker and defender. Existing certified defenses are designed for traditional supervised learning, resulting in sub-optimal performance when extended to few-shot classification. In our work, we propose FCert, the first certified defense against data poisoning attacks to few-shot classification. We show our FCert provably predicts the same label for a testing input under arbitrary data poisoning attacks when the total number of poisoned support samples is bounded. We perform extensive experiments on benchmark few-shot classification datasets with foundation models released by OpenAI, Meta, and Google in both vision and text domains. Our experimental results show our FCert: 1) maintains classification accuracy without attacks, 2) outperforms existing state-of-the-art certified defenses for data poisoning attacks, and 3) is efficient and general.
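The abstract states the guarantee but not the algorithm, so the following is only a minimal sketch of the general idea: score each class by a trimmed per-class distance so that a bounded number of poisoned support samples has only bounded influence on each class score. The frozen foundation-model encoder, the Euclidean distance, and the trimming parameter `k_trim` are illustrative assumptions, not the paper's exact construction.

```python
# Minimal sketch (not the authors' released code) of robust few-shot
# classification via a trimmed mean of per-class feature distances.
import numpy as np

def robust_few_shot_predict(test_feat, support_feats, support_labels, k_trim=1):
    """test_feat: (d,) feature vector of the test input.
    support_feats: (n, d) feature vectors of the support samples.
    support_labels: (n,) integer class labels of the support samples.
    k_trim: number of smallest and largest per-class distances to discard."""
    scores = {}
    for c in np.unique(support_labels):
        # Distances from the test input to this class's support samples.
        dists = np.linalg.norm(support_feats[support_labels == c] - test_feat, axis=1)
        dists = np.sort(dists)
        # Drop the k_trim smallest and k_trim largest distances; each poisoned
        # support sample can then shift the retained set by at most one value.
        if 2 * k_trim < dists.size:
            dists = dists[k_trim : dists.size - k_trim]
        scores[c] = dists.mean()
    # Predict the class whose trimmed mean distance to the test input is smallest.
    return min(scores, key=scores.get)
```

The certification step described in the paper is not reproduced here; it would bound how far an attacker controlling at most a given number of support samples can move each class score and report the largest poisoning size for which the predicted label provably cannot change.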
Related papers
- Towards Fair Classification against Poisoning Attacks [52.57443558122475]
We study the poisoning scenario where the attacker can insert a small fraction of samples into training data.
We propose a general and theoretically guaranteed framework which accommodates traditional defense methods to fair classification against poisoning attacks.
arXiv Detail & Related papers (2022-10-18T00:49:58Z)
- WeDef: Weakly Supervised Backdoor Defense for Text Classification [48.19967241668793]
Existing backdoor defense methods are only effective for limited trigger types.
We propose a novel weakly supervised backdoor defense framework WeDef.
We show that WeDef is effective against popular trigger-based attacks.
arXiv Detail & Related papers (2022-05-24T05:53:11Z)
- Post-Training Detection of Backdoor Attacks for Two-Class and Multi-Attack Scenarios [22.22337220509128]
Backdoor attacks (BAs) are an emerging threat to deep neural network classifiers.
We propose a detection framework based on BP reverse-engineering and a novel expected transferability (ET) statistic.
arXiv Detail & Related papers (2022-01-20T22:21:38Z)
- Certified Robustness of Nearest Neighbors against Data Poisoning Attacks [31.85264586217373]
We show that the intrinsic majority vote mechanisms in kNN and rNN already provide certified robustness guarantees against general data poisoning attacks (a simplified version of this counting argument is sketched after this list).
Our results serve as standard baselines for future certified defenses against data poisoning attacks.
arXiv Detail & Related papers (2020-12-07T15:04:48Z)
- How Robust are Randomized Smoothing based Defenses to Data Poisoning? [66.80663779176979]
We present a previously unrecognized threat to robust machine learning models that highlights the importance of training-data quality.
We propose a novel bilevel optimization-based data poisoning attack that degrades the robustness guarantees of certifiably robust classifiers.
Our attack is effective even when the victim trains the models from scratch using state-of-the-art robust training methods.
arXiv Detail & Related papers (2020-12-02T15:30:21Z)
- A Framework of Randomized Selection Based Certified Defenses Against Data Poisoning Attacks [28.593598534525267]
This paper proposes a framework of random selection based certified defenses against data poisoning attacks.
We prove that the random selection schemes that satisfy certain conditions are robust against data poisoning attacks.
Our framework allows users to improve robustness by leveraging prior knowledge about the training set and the poisoning model.
arXiv Detail & Related papers (2020-09-18T10:38:12Z)
- Deep Partition Aggregation: Provable Defense against General Poisoning Attacks [136.79415677706612]
Adversarial poisoning attacks distort training data in order to corrupt the test-time behavior of a classifier.
We propose two novel provable defenses against poisoning attacks.
DPA is a certified defense against a general poisoning threat model.
SS-DPA is a certified defense against label-flipping attacks.
arXiv Detail & Related papers (2020-06-26T03:16:31Z)
- Certified Robustness to Label-Flipping Attacks via Randomized Smoothing [105.91827623768724]
Machine learning algorithms are susceptible to data poisoning attacks.
We present a unifying view of randomized smoothing over arbitrary functions.
We propose a new strategy for building classifiers that are pointwise-certifiably robust to general data poisoning attacks.
arXiv Detail & Related papers (2020-02-07T21:28:30Z)
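The nearest-neighbor entry above (and, similarly, Deep Partition Aggregation) certifies robustness through a vote-counting argument. As a heavily simplified sketch, not the exact bound from either paper: let $n_a$ and $n_b$ be the vote counts of the top and runner-up labels among a test input's $k$ nearest neighbors. If modifying one support sample can change at most one of those $k$ votes, then $t$ poisoned samples can shrink the gap by at most $2t$, so the prediction is guaranteed unchanged whenever

$$t \le \left\lfloor \frac{n_a - n_b - 1}{2} \right\rfloor.$$

The actual certified sizes in those papers also account for insertions, deletions, and tie-breaking.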