TCAB: A Large-Scale Text Classification Attack Benchmark
- URL: http://arxiv.org/abs/2210.12233v1
- Date: Fri, 21 Oct 2022 20:22:45 GMT
- Title: TCAB: A Large-Scale Text Classification Attack Benchmark
- Authors: Kalyani Asthana, Zhouhang Xie, Wencong You, Adam Noack, Jonathan
Brophy, Sameer Singh, Daniel Lowd
- Abstract summary: The Text Classification Attack Benchmark (TCAB) is a dataset for analyzing, understanding, detecting, and labeling adversarial attacks against text classifiers.
TCAB includes 1.5 million attack instances, generated by twelve adversarial attacks targeting three classifiers trained on six source datasets for sentiment analysis and abuse detection in English.
In addition to the primary tasks of detecting and labeling attacks, TCAB can also be used for attack localization, attack target labeling, and attack characterization.
- Score: 36.102015445585785
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We introduce the Text Classification Attack Benchmark (TCAB), a dataset for
analyzing, understanding, detecting, and labeling adversarial attacks against
text classifiers. TCAB includes 1.5 million attack instances, generated by
twelve adversarial attacks targeting three classifiers trained on six source
datasets for sentiment analysis and abuse detection in English. Unlike standard
text classification, text attacks must be understood in the context of the
target classifier that is being attacked, and thus features of the target
classifier are important as well. TCAB includes all attack instances that are
successful in flipping the predicted label; a subset of the attacks are also
labeled by human annotators to determine how frequently the primary semantics
are preserved. The process of generating attacks is automated, so that TCAB can
easily be extended to incorporate new text attacks and better classifiers as
they are developed. In addition to the primary tasks of detecting and labeling
attacks, TCAB can also be used for attack localization, attack target labeling,
and attack characterization. TCAB code and dataset are available at
https://react-nlp.github.io/tcab/.
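As a rough illustration of the primary task described above, detecting whether an input has been adversarially perturbed, the sketch below frames attack detection as plain binary text classification with a character n-gram baseline. The file name and column names (tcab_subset.csv, text, is_attacked) are hypothetical placeholders rather than TCAB's released schema; see https://react-nlp.github.io/tcab/ for the actual data format.

```python
# Minimal attack-detection sketch: clean vs. adversarial text as binary
# classification. The CSV name and columns below are hypothetical
# placeholders, not TCAB's actual release format.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

df = pd.read_csv("tcab_subset.csv")  # hypothetical export with text + label
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["is_attacked"], test_size=0.2, random_state=0
)

# Character n-grams are a cheap signal for spotting perturbed tokens.
vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
clf = LogisticRegression(max_iter=1000)
clf.fit(vectorizer.fit_transform(X_train), y_train)

print(classification_report(y_test, clf.predict(vectorizer.transform(X_test))))
```

The same framing extends to the attack-labeling task by replacing the binary label with the name of the attack that produced each instance.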
Related papers
- FCert: Certifiably Robust Few-Shot Classification in the Era of Foundation Models [38.019489232264796]
We propose FCert, the first certified defense against data poisoning attacks to few-shot classification.
Our experimental results show that FCert: 1) maintains classification accuracy when there is no attack, 2) outperforms existing certified defenses against data poisoning attacks, and 3) is efficient and general.
arXiv Detail & Related papers (2024-04-12T17:50:40Z) - OrderBkd: Textual backdoor attack through repositioning [0.0]
Third-party datasets and pre-trained machine learning models pose a threat to NLP systems.
Existing backdoor attacks involve poisoning data samples, for example by inserting tokens or paraphrasing sentences.
The main difference from previous work is that the attack uses the repositioning of two words in a sentence as the trigger.
arXiv Detail & Related papers (2024-02-12T14:53:37Z) - Adversarial Clean Label Backdoor Attacks and Defenses on Text Classification Systems [23.201773332458693]
Clean-label (CL) attacks are relatively unexplored in NLP.
CL attacks are more resilient to data sanitization and manual relabeling methods than label flipping (LF) attacks.
We show that an adversary can reduce the data required for a CL attack to as low as 20% of what would otherwise be needed.
arXiv Detail & Related papers (2023-05-31T07:23:46Z) - Attacking Important Pixels for Anchor-free Detectors [47.524554948433995]
Existing adversarial attacks on object detection focus on attacking anchor-based detectors.
We propose the first adversarial attack dedicated to anchor-free detectors.
Our proposed methods achieve state-of-the-art attack performance and transferability on both object detection and human pose estimation tasks.
arXiv Detail & Related papers (2023-01-26T23:03:03Z) - Object-fabrication Targeted Attack for Object Detection [54.10697546734503]
Adversarial attacks on object detection include targeted and untargeted attacks.
A new object-fabrication targeted attack mode can mislead detectors into fabricating extra false objects with specific target labels.
arXiv Detail & Related papers (2022-12-13T08:42:39Z) - Zero-Query Transfer Attacks on Context-Aware Object Detectors [95.18656036716972]
Adversarial attacks perturb images such that a deep neural network produces incorrect classification results.
A promising approach to defend against adversarial attacks on natural multi-object scenes is to impose a context-consistency check.
We present the first approach for generating context-consistent adversarial attacks that can evade the context-consistency check.
arXiv Detail & Related papers (2022-03-29T04:33:06Z) - Identifying Adversarial Attacks on Text Classifiers [32.958568467774704]
In this paper, we analyze adversarial text to determine which methods were used to create it.
Our first contribution is an extensive dataset for attack detection and labeling.
As our second contribution, we use this dataset to develop and benchmark a number of classifiers for attack identification.
arXiv Detail & Related papers (2022-01-21T06:16:04Z) - Learning-based Hybrid Local Search for the Hard-label Textual Attack [53.92227690452377]
We consider a rarely investigated but more rigorous setting, namely the hard-label attack, in which the attacker can only access the predicted label.
We propose a novel hard-label attack, the Learning-based Hybrid Local Search (LHLS) algorithm.
Our LHLS significantly outperforms existing hard-label attacks regarding the attack performance as well as adversary quality.
arXiv Detail & Related papers (2022-01-20T14:16:07Z) - Hidden Backdoor Attack against Semantic Segmentation Models [60.0327238844584]
The backdoor attack intends to embed hidden backdoors in deep neural networks (DNNs) by poisoning training data.
We propose a novel attack paradigm, the fine-grained attack, where the target label is treated at the object level instead of the image level.
Experiments show that the proposed methods can successfully attack semantic segmentation models by poisoning only a small proportion of training data.
arXiv Detail & Related papers (2021-03-06T05:50:29Z) - Universal Adversarial Attacks with Natural Triggers for Text Classification [30.74579821832117]
We develop adversarial attacks that appear closer to natural English phrases and yet confuse classification systems.
Our attacks effectively reduce model accuracy on classification tasks while being less identifiable than prior models.
arXiv Detail & Related papers (2020-05-01T01:58:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.