Adversarial Examples for Extreme Multilabel Text Classification
- URL: http://arxiv.org/abs/2112.07512v1
- Date: Tue, 14 Dec 2021 16:20:37 GMT
- Title: Adversarial Examples for Extreme Multilabel Text Classification
- Authors: Mohammadreza Qaraei and Rohit Babbar
- Abstract summary: Extreme Multilabel Text Classification (XMTC) is a text classification problem in which the output space is extremely large.
The robustness of deep learning based XMTC models against adversarial examples has been largely underexplored.
We show that XMTC models are highly vulnerable to positive-targeted attacks but more robust to negative-targeted ones.
- Score: 2.1549398927094874
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Extreme Multilabel Text Classification (XMTC) is a text classification
problem in which (i) the output space is extremely large, (ii) each data point
may have multiple positive labels, and (iii) the data follows a strongly
imbalanced distribution. With applications in recommendation systems and
automatic tagging of web-scale documents, the research on XMTC has been focused
on improving prediction accuracy and dealing with imbalanced data. However, the
robustness of deep learning based XMTC models against adversarial examples has
been largely underexplored.
In this paper, we investigate the behaviour of XMTC models under adversarial
attacks. To this end, first, we define adversarial attacks in multilabel text
classification problems. We categorize attacking multilabel text classifiers as
(a) positive-targeted, where the target positive label should fall out of the top-k
predicted labels, and (b) negative-targeted, where the target negative label
should be among the top-k predicted labels. Then, by experiments on APLC-XLNet
and AttentionXML, we show that XMTC models are highly vulnerable to
positive-targeted attacks but more robust to negative-targeted ones.
Furthermore, our experiments show that the success rate of positive-targeted
adversarial attacks has an imbalanced distribution. More precisely, tail
classes are highly vulnerable to adversarial attacks, for which an attacker can
generate adversarial samples with high similarity to the actual data points. To
overcome this problem, we explore the effect of rebalanced loss functions in
XMTC where not only do they increase accuracy on tail classes, but they also
improve the robustness of these classes against adversarial attacks. The code
for our experiments is available at https://github.com/xmc-aalto/adv-xmtc
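As a rough illustration of the two attack definitions above and of the rebalanced-loss idea, the sketch below checks attack success via top-k membership and computes a frequency-weighted binary cross-entropy. It is a minimal sketch, not the authors' implementation (see the linked repository for that): the helper names, the random scores standing in for model outputs, and the frequency-based class weights are all hypothetical.

```python
# Minimal sketch of the attack-success criteria and a class-weighted loss.
# All names and data here are illustrative, not taken from the adv-xmtc repository.
import numpy as np


def topk_labels(scores: np.ndarray, k: int) -> set:
    """Return the indices of the k highest-scoring labels."""
    return set(np.argpartition(scores, -k)[-k:])


def positive_targeted_success(scores_adv: np.ndarray, target_pos: int, k: int) -> bool:
    """Positive-targeted attack: succeeds if the target positive label
    is pushed out of the top-k predictions on the adversarial input."""
    return target_pos not in topk_labels(scores_adv, k)


def negative_targeted_success(scores_adv: np.ndarray, target_neg: int, k: int) -> bool:
    """Negative-targeted attack: succeeds if the target negative label
    enters the top-k predictions on the adversarial input."""
    return target_neg in topk_labels(scores_adv, k)


def rebalanced_bce(scores: np.ndarray, labels: np.ndarray, class_weights: np.ndarray) -> float:
    """Class-weighted binary cross-entropy: positive terms of rare (tail) labels are
    up-weighted, a simple stand-in for the rebalanced losses discussed in the paper."""
    probs = 1.0 / (1.0 + np.exp(-scores))
    eps = 1e-12
    pos_term = class_weights * labels * np.log(probs + eps)
    neg_term = (1.0 - labels) * np.log(1.0 - probs + eps)
    return float(-(pos_term + neg_term).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num_labels, k = 1000, 5
    clean_scores = rng.normal(size=num_labels)
    adv_scores = clean_scores + rng.normal(scale=0.5, size=num_labels)  # stand-in for perturbed-input scores

    target_pos = int(np.argmax(clean_scores))  # a label currently inside the top-k
    target_neg = int(np.argmin(clean_scores))  # a label currently far from the top-k
    print("positive-targeted success:", positive_targeted_success(adv_scores, target_pos, k))
    print("negative-targeted success:", negative_targeted_success(adv_scores, target_neg, k))

    labels = (rng.random(num_labels) < 0.01).astype(float)       # sparse positive labels
    freq = rng.integers(1, 1000, size=num_labels).astype(float)  # hypothetical label frequencies
    weights = freq.max() / freq                                   # rarer (tail) labels get larger weights
    print("rebalanced BCE:", rebalanced_bce(clean_scores, labels, weights))
```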
Related papers
- Federated Learning Under Attack: Exposing Vulnerabilities through Data Poisoning Attacks in Computer Networks [17.857547954232754]
Federated Learning (FL) is a machine learning approach that enables multiple decentralized devices or edge servers to collaboratively train a shared model without exchanging raw data.
During the training and sharing of model updates between clients and servers, data and models are susceptible to different data-poisoning attacks.
We considered two types of data-poisoning attacks, label flipping (LF) and feature poisoning (FP), and applied them with a novel approach.
arXiv Detail & Related papers (2024-03-05T14:03:15Z)
- DALA: A Distribution-Aware LoRA-Based Adversarial Attack against Language Models [64.79319733514266]
Adversarial attacks can introduce subtle perturbations to input data.
Recent attack methods can achieve a relatively high attack success rate (ASR).
We propose a Distribution-Aware LoRA-based Adversarial Attack (DALA) method.
arXiv Detail & Related papers (2023-11-14T23:43:47Z)
- JointMatch: A Unified Approach for Diverse and Collaborative Pseudo-Labeling to Semi-Supervised Text Classification [65.268245109828]
Semi-supervised text classification (SSTC) has gained increasing attention due to its ability to leverage unlabeled data.
Existing approaches based on pseudo-labeling suffer from the issues of pseudo-label bias and error accumulation.
We propose JointMatch, a holistic approach for SSTC that addresses these challenges by unifying ideas from recent semi-supervised learning.
arXiv Detail & Related papers (2023-10-23T05:43:35Z)
- Adversarial Attacks Neutralization via Data Set Randomization [3.655021726150369]
Adversarial attacks on deep learning models pose a serious threat to their reliability and security.
We propose a new defense mechanism that is rooted in hyperspace projection.
We show that our solution increases the robustness of deep learning models against adversarial attacks.
arXiv Detail & Related papers (2023-06-21T10:17:55Z)
- Adversarial Attacks are a Surprisingly Strong Baseline for Poisoning Few-Shot Meta-Learners [28.468089304148453]
We attack amortized meta-learners, which allows us to craft colluding sets of inputs that fool the system's learning algorithm.
We show that in a white box setting, these attacks are very successful and can cause the target model's predictions to become worse than chance.
We explore two hypotheses to explain this: 'overfitting' by the attack, and mismatch between the model on which the attack is generated and that to which the attack is transferred.
arXiv Detail & Related papers (2022-11-23T14:55:44Z)
- Detection and Mitigation of Byzantine Attacks in Distributed Training [24.951227624475443]
Abnormal Byzantine behavior of the worker nodes can derail training and compromise the quality of inference.
Recent work considers a wide range of attack models and has explored robust aggregation and/or computational redundancy to correct the distorted gradients.
In this work, we consider attack models ranging from strong ones ($q$ omniscient adversaries that have full knowledge of the defense protocol and can change from iteration to iteration) to weak ones ($q$ randomly chosen adversaries with limited collusion abilities).
arXiv Detail & Related papers (2022-08-17T05:49:52Z)
- Learning-based Hybrid Local Search for the Hard-label Textual Attack [53.92227690452377]
We consider a rarely investigated but more rigorous setting, namely hard-label attack, in which the attacker could only access the prediction label.
Based on this observation, we propose a novel hard-label attack, the Learning-based Hybrid Local Search (LHLS) algorithm.
Our LHLS significantly outperforms existing hard-label attacks in terms of both attack performance and adversary quality.
arXiv Detail & Related papers (2022-01-20T14:16:07Z)
- How Robust are Randomized Smoothing based Defenses to Data Poisoning? [66.80663779176979]
We present a previously unrecognized threat to robust machine learning models that highlights the importance of training-data quality.
We propose a novel bilevel optimization-based data poisoning attack that degrades the robustness guarantees of certifiably robust classifiers.
Our attack is effective even when the victim trains the models from scratch using state-of-the-art robust training methods.
arXiv Detail & Related papers (2020-12-02T15:30:21Z)
- Robustness May Be at Odds with Fairness: An Empirical Study on Class-wise Accuracy [85.20742045853738]
CNNs are widely known to be vulnerable to adversarial attacks.
We propose an empirical study on the class-wise accuracy and robustness of adversarially trained models.
We find that inter-class discrepancies in accuracy and robustness exist even when the training dataset has an equal number of samples for each class.
arXiv Detail & Related papers (2020-10-26T06:32:32Z)
- Adversarial Self-Supervised Contrastive Learning [62.17538130778111]
Existing adversarial learning approaches mostly use class labels to generate adversarial samples that lead to incorrect predictions.
We propose a novel adversarial attack for unlabeled data, which makes the model confuse the instance-level identities of the perturbed data samples.
We present a self-supervised contrastive learning framework to adversarially train a robust neural network without labeled data.
arXiv Detail & Related papers (2020-06-13T08:24:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.