TCAB: A Large-Scale Text Classification Attack Benchmark
- URL: http://arxiv.org/abs/2210.12233v1
- Date: Fri, 21 Oct 2022 20:22:45 GMT
- Title: TCAB: A Large-Scale Text Classification Attack Benchmark
- Authors: Kalyani Asthana, Zhouhang Xie, Wencong You, Adam Noack, Jonathan
Brophy, Sameer Singh, Daniel Lowd
- Abstract summary: The Text Classification Attack Benchmark (TCAB) is a dataset for analyzing, understanding, detecting, and labeling adversarial attacks against text classifiers.
TCAB includes 1.5 million attack instances, generated by twelve adversarial attacks targeting three classifiers trained on six source datasets for sentiment analysis and abuse detection in English.
In addition to the primary tasks of detecting and labeling attacks, TCAB can also be used for attack localization, attack target labeling, and attack characterization.
- Score: 36.102015445585785
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We introduce the Text Classification Attack Benchmark (TCAB), a dataset for
analyzing, understanding, detecting, and labeling adversarial attacks against
text classifiers. TCAB includes 1.5 million attack instances, generated by
twelve adversarial attacks targeting three classifiers trained on six source
datasets for sentiment analysis and abuse detection in English. Unlike standard
text classification, text attacks must be understood in the context of the
target classifier that is being attacked, and thus features of the target
classifier are important as well. TCAB includes all attack instances that are
successful in flipping the predicted label; a subset of the attacks are also
labeled by human annotators to determine how frequently the primary semantics
are preserved. The process of generating attacks is automated, so that TCAB can
easily be extended to incorporate new text attacks and better classifiers as
they are developed. In addition to the primary tasks of detecting and labeling
attacks, TCAB can also be used for attack localization, attack target labeling,
and attack characterization. TCAB code and dataset are available at
https://react-nlp.github.io/tcab/.
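As a rough illustration of the primary task described above, detecting whether an input has been adversarially perturbed, the sketch below frames attack detection as plain binary text classification with a character n-gram baseline. The file name and column names (tcab_subset.csv, text, is_attacked) are hypothetical placeholders rather than TCAB's released schema; see https://react-nlp.github.io/tcab/ for the actual data format.

```python
# Minimal attack-detection sketch: clean vs. adversarial text as binary
# classification. The CSV name and columns below are hypothetical
# placeholders, not TCAB's actual release format.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

df = pd.read_csv("tcab_subset.csv")  # hypothetical export with text + label
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["is_attacked"], test_size=0.2, random_state=0
)

# Character n-grams are a cheap signal for spotting perturbed tokens.
vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
clf = LogisticRegression(max_iter=1000)
clf.fit(vectorizer.fit_transform(X_train), y_train)

print(classification_report(y_test, clf.predict(vectorizer.transform(X_test))))
```

The same framing extends to the attack-labeling task by replacing the binary label with the name of the attack that produced each instance.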
Related papers
- FCert: Certifiably Robust Few-Shot Classification in the Era of Foundation Models [38.019489232264796]
We propose FCert, the first certified defense against data poisoning attacks to few-shot classification.
Our experimental results show that FCert: 1) maintains classification accuracy when there is no attack, 2) outperforms existing certified defenses against data poisoning attacks, and 3) is efficient and general.
arXiv Detail & Related papers (2024-04-12T17:50:40Z) - OrderBkd: Textual backdoor attack through repositioning [0.0]
Third-party datasets and pre-trained machine learning models pose a threat to NLP systems.
Existing backdoor attacks involve poisoning data samples, for example by inserting tokens or paraphrasing sentences.
The main difference from previous work is that the attack uses the repositioning of two words in a sentence as the trigger.
arXiv Detail & Related papers (2024-02-12T14:53:37Z) - Adversarial Clean Label Backdoor Attacks and Defenses on Text Classification Systems [23.201773332458693]
Clean-label (CL) attacks are relatively unexplored in NLP.
CL attacks are more resilient to data sanitization and manual relabeling methods than label flipping (LF) attacks.
We show that an adversary can reduce the data required for a CL attack to as low as 20% of what would otherwise be needed.
arXiv Detail & Related papers (2023-05-31T07:23:46Z) - Attacking Important Pixels for Anchor-free Detectors [47.524554948433995]
Existing adversarial attacks on object detection focus on attacking anchor-based detectors.
We propose the first adversarial attack dedicated to anchor-free detectors.
Our proposed methods achieve state-of-the-art attack performance and transferability on both object detection and human pose estimation tasks.
arXiv Detail & Related papers (2023-01-26T23:03:03Z) - Object-fabrication Targeted Attack for Object Detection [54.10697546734503]
Adversarial attacks on object detection include targeted and untargeted attacks.
A new object-fabrication targeted attack mode can mislead detectors into fabricating extra false objects with specific target labels.
arXiv Detail & Related papers (2022-12-13T08:42:39Z) - Zero-Query Transfer Attacks on Context-Aware Object Detectors [95.18656036716972]
Adversarial attacks perturb images such that a deep neural network produces incorrect classification results.
A promising approach to defend against adversarial attacks on natural multi-object scenes is to impose a context-consistency check.
We present the first approach for generating context-consistent adversarial attacks that can evade the context-consistency check.
arXiv Detail & Related papers (2022-03-29T04:33:06Z) - Identifying Adversarial Attacks on Text Classifiers [32.958568467774704]
In this paper, we analyze adversarial text to determine which methods were used to create it.
Our first contribution is an extensive dataset for attack detection and labeling.
As our second contribution, we use this dataset to develop and benchmark a number of classifiers for attack identification.
arXiv Detail & Related papers (2022-01-21T06:16:04Z) - Learning-based Hybrid Local Search for the Hard-label Textual Attack [53.92227690452377]
We consider a rarely investigated but more rigorous setting, namely the hard-label attack, in which the attacker can only access the predicted label.
We propose a novel hard-label attack, the Learning-based Hybrid Local Search (LHLS) algorithm.
Our LHLS significantly outperforms existing hard-label attacks regarding the attack performance as well as adversary quality.
arXiv Detail & Related papers (2022-01-20T14:16:07Z) - Hidden Backdoor Attack against Semantic Segmentation Models [60.0327238844584]
The backdoor attack intends to embed hidden backdoors in deep neural networks (DNNs) by poisoning training data.
We propose a novel attack paradigm, the fine-grained attack, where the target label is treated at the object level instead of the image level.
Experiments show that the proposed methods can successfully attack semantic segmentation models by poisoning only a small proportion of training data.
arXiv Detail & Related papers (2021-03-06T05:50:29Z) - Universal Adversarial Attacks with Natural Triggers for Text Classification [30.74579821832117]
We develop adversarial attacks that appear closer to natural English phrases and yet confuse classification systems.
Our attacks effectively reduce model accuracy on classification tasks while being less identifiable than prior models.
arXiv Detail & Related papers (2020-05-01T01:58:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.