Enhancing Model Robustness By Incorporating Adversarial Knowledge Into
Semantic Representation
- URL: http://arxiv.org/abs/2102.11584v1
- Date: Tue, 23 Feb 2021 09:47:45 GMT
- Title: Enhancing Model Robustness By Incorporating Adversarial Knowledge Into
Semantic Representation
- Authors: Jinfeng Li, Tianyu Du, Xiangyu Liu, Rong Zhang, Hui Xue, Shouling Ji
- Abstract summary: AdvGraph is a novel defense which enhances the robustness of Chinese-based NLP models.
It incorporates adversarial knowledge into the semantic representation of the input.
- Score: 42.23608639683468
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although deep neural networks (DNNs) have achieved enormous success in
many domains such as natural language processing (NLP), they have also been shown
to be vulnerable to maliciously generated adversarial examples. This inherent
vulnerability threatens a wide range of real-world DNN-based applications. To
strengthen model robustness, several countermeasures have been proposed in the
English NLP domain and have achieved satisfactory performance. However, due to the
unique language properties of Chinese, it is not trivial to extend existing
defenses to the Chinese domain. Therefore, we propose AdvGraph, a novel defense
which enhances the robustness of Chinese-based NLP models by incorporating
adversarial knowledge into the semantic representation of the input. Extensive
experiments on two real-world tasks show that AdvGraph exhibits better performance
compared with previous work: (i) effective - it significantly strengthens the
model robustness even under the adaptive attack setting without negative impact on
model performance over legitimate input; (ii) generic - its key component, i.e.,
the representation of connotative adversarial knowledge, is task-agnostic and can
be reused in any Chinese-based NLP model without retraining; and (iii) efficient -
it is a lightweight defense with sub-linear computational complexity, which
guarantees the efficiency required in practical scenarios.
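The abstract does not describe the fusion mechanism in detail, but the core idea of folding graph-derived adversarial knowledge into the input representation can be sketched roughly as follows. Everything below (module names, dimensions, the gating scheme) is an illustrative assumption rather than the authors' implementation; the graph embeddings are assumed to be pretrained offline on a graph of adversarial character variants and kept frozen, which is what would make the component reusable across tasks without retraining.

```python
# Minimal sketch: fusing graph-derived "adversarial knowledge" embeddings with
# ordinary semantic embeddings before a downstream encoder/classifier. Names,
# sizes, and the gating scheme are assumptions, not the paper's implementation.
import torch
import torch.nn as nn

class AdvKnowledgeFusion(nn.Module):
    def __init__(self, vocab_size: int, sem_dim: int = 128, graph_dim: int = 64):
        super().__init__()
        # Ordinary, task-trained semantic embeddings.
        self.sem_emb = nn.Embedding(vocab_size, sem_dim)
        # Graph embeddings assumed pretrained offline on a character-variant
        # graph (e.g., visually/phonetically similar Chinese characters) and
        # kept frozen so they can be reused across tasks without retraining.
        self.graph_emb = nn.Embedding(vocab_size, graph_dim)
        self.graph_emb.weight.requires_grad = False
        self.proj = nn.Linear(graph_dim, sem_dim)
        self.gate = nn.Linear(2 * sem_dim, sem_dim)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        s = self.sem_emb(token_ids)               # (batch, seq, sem_dim)
        g = self.proj(self.graph_emb(token_ids))  # (batch, seq, sem_dim)
        gate = torch.sigmoid(self.gate(torch.cat([s, g], dim=-1)))
        return gate * s + (1.0 - gate) * g        # fused input representation
```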
Related papers
- Enhancing adversarial robustness in Natural Language Inference using explanations [41.46494686136601]
We cast the spotlight on the underexplored task of Natural Language Inference (NLI)
We validate the usage of natural language explanation as a model-agnostic defence strategy through extensive experimentation.
We research the correlation of widely used language generation metrics with human perception, in order for them to serve as a proxy towards robust NLI models.
arXiv Detail & Related papers (2024-09-11T17:09:49Z)
- Adversarial Attacks and Defense for Conversation Entailment Task [0.49157446832511503]
Large language models are vulnerable to low-cost adversarial attacks.
We fine-tune a transformer model to accurately discern the truthfulness of hypotheses.
We introduce an embedding perturbation loss method to bolster the model's robustness.
arXiv Detail & Related papers (2024-05-01T02:49:18Z)
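The summary above only names an embedding perturbation loss. One common way to realize such an idea is to perturb the input embeddings slightly and penalize the divergence between clean and perturbed predictions; the sketch below uses random Gaussian noise and a KL consistency term, which is an assumed formulation rather than the paper's exact method.

```python
# Sketch of an embedding-perturbation consistency loss (assumed formulation):
# add small Gaussian noise to the input embeddings and penalize the KL
# divergence between predictions on clean and perturbed embeddings.
import torch
import torch.nn.functional as F

def embedding_perturbation_loss(model, embeddings, attention_mask, epsilon=1e-2):
    """`model` is assumed to map (inputs_embeds, attention_mask) -> logits."""
    clean_logits = model(inputs_embeds=embeddings, attention_mask=attention_mask)
    noise = epsilon * torch.randn_like(embeddings)
    noisy_logits = model(inputs_embeds=embeddings + noise,
                         attention_mask=attention_mask)
    return F.kl_div(F.log_softmax(noisy_logits, dim=-1),
                    F.softmax(clean_logits, dim=-1),
                    reduction="batchmean")
```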
- RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content [62.685566387625975]
Current mitigation strategies, while effective, are not resilient under adversarial attacks.
This paper introduces Resilient Guardrails for Large Language Models (RigorLLM), a novel framework designed to efficiently moderate harmful and unsafe inputs.
arXiv Detail & Related papers (2024-03-19T07:25:02Z)
- Doubly Robust Instance-Reweighted Adversarial Training [107.40683655362285]
We propose a novel doubly robust, instance-reweighted adversarial training framework.
Our importance weights are obtained by optimizing the KL-divergence regularized loss function.
Our proposed approach outperforms related state-of-the-art baseline methods in terms of average robust performance.
arXiv Detail & Related papers (2023-08-01T06:16:18Z)
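The summary above states that the importance weights are obtained by optimizing a KL-divergence regularized loss. A generic stand-in for that kind of reweighting is sketched below: a KL-regularized objective over instance weights has the closed-form solution in which each weight is proportional to exp(loss / tau), so harder examples receive larger weights. This is a stand-in, not necessarily the paper's doubly robust estimator.

```python
# Sketch of instance-reweighted adversarial training. The weights below solve a
# KL-regularized reweighting objective in closed form (w_i ∝ exp(loss_i / tau));
# this is a generic stand-in, not the paper's exact doubly robust scheme.
import torch
import torch.nn.functional as F

def reweighted_adversarial_loss(model, x_adv, y, tau: float = 1.0):
    logits = model(x_adv)                                   # adversarial inputs
    per_example = F.cross_entropy(logits, y, reduction="none")
    with torch.no_grad():
        weights = torch.softmax(per_example / tau, dim=0)   # harder examples weigh more
    return (weights * per_example).sum()
```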
- Dynamic Transformers Provide a False Sense of Efficiency [75.39702559746533]
Multi-exit models trade off efficiency against accuracy, with the computational saving coming from exiting early.
We propose a simple yet effective attacking framework, SAME, which is specially tailored to reduce the efficiency of the multi-exit models.
Experiments on the GLUE benchmark show that SAME can effectively diminish the efficiency gain of various multi-exit models by 80% on average.
arXiv Detail & Related papers (2023-05-20T16:41:48Z)
- Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization [94.4409074435894]
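For context on where the efficiency saving in multi-exit models comes from (and hence what SAME attacks), here is a minimal early-exit inference sketch; `blocks`, `exit_heads`, and the confidence threshold are illustrative assumptions.

```python
# Sketch of early-exit inference: intermediate classifier heads let inference
# stop as soon as a prediction is confident enough. Assumes at least one exit
# and, for simplicity, a batch size of 1.
import torch

def early_exit_predict(blocks, exit_heads, x, threshold: float = 0.9):
    h = x
    for block, head in zip(blocks, exit_heads):
        h = block(h)
        probs = torch.softmax(head(h), dim=-1)
        if probs.max().item() >= threshold:   # confident enough: exit early
            return probs.argmax(dim=-1)
    return probs.argmax(dim=-1)               # fall back to the final exit
```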
We propose a novel and effective fine-tuning framework named Layerwise Noise Stability Regularization (LNSR).
Specifically, we propose to inject standard Gaussian noise and regularize the hidden representations of the fine-tuned model.
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
arXiv Detail & Related papers (2022-06-12T04:42:49Z)
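A rough sketch of the noise-stability idea described above: inject Gaussian noise into an intermediate hidden state and penalize how much the output representation moves. Splitting the model into `lower`/`upper` halves and using an L2 penalty are assumptions, not the exact LNSR formulation.

```python
# Sketch of a layerwise noise-stability penalty: perturb an intermediate hidden
# state with Gaussian noise and penalize the change in the final representation.
import torch

def noise_stability_penalty(lower, upper, x, sigma: float = 0.01):
    """`lower` and `upper` are assumed to be the two halves of an encoder."""
    hidden = lower(x)                                    # intermediate hidden states
    out_clean = upper(hidden)
    out_noisy = upper(hidden + sigma * torch.randn_like(hidden))
    return ((out_clean - out_noisy) ** 2).mean()         # L2 stability penalty
```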
- AED: An black-box NLP classifier model attacker [8.15167980163668]
Deep Neural Networks (DNNs) have been successful in solving real-world tasks in domains such as connected and automated vehicles, disease, and job hiring.
There is a growing concern regarding the potential bias and robustness of these DNN models.
We propose a word-level NLP classifier attack model called "AED," which stands for Attention mechanism enabled post-model Explanation.
arXiv Detail & Related papers (2021-12-22T04:25:23Z)
- Evaluating the Robustness of Neural Language Models to Input Perturbations [7.064032374579076]
In this study, we design and implement various types of character-level and word-level perturbation methods to simulate noisy input texts.
We investigate the ability of high-performance language models such as BERT, XLNet, RoBERTa, and ELMo in handling different types of input perturbations.
The results suggest that language models are sensitive to input perturbations and their performance can decrease even when small changes are introduced.
arXiv Detail & Related papers (2021-08-27T12:31:17Z)
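The kinds of character-level and word-level perturbations described above can be sketched with simple string operations; the exact operations and perturbation rates used in the paper may differ.

```python
# Sketch of character- and word-level input perturbations used to simulate
# noisy text: swap two adjacent characters, or drop a random word.
import random

def char_swap(text: str) -> str:
    if len(text) < 2:
        return text
    i = random.randrange(len(text) - 1)
    chars = list(text)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def word_drop(text: str) -> str:
    words = text.split()
    if len(words) < 2:
        return text
    del words[random.randrange(len(words))]
    return " ".join(words)
```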
- Knowledge Enhanced Machine Learning Pipeline against Diverse Adversarial Attacks [10.913817907524454]
We propose a Knowledge Enhanced Machine Learning Pipeline (KEMLP) to integrate domain knowledge into a graphical model.
In particular, we develop KEMLP by integrating a diverse set of weak auxiliary models based on their logical relationships to the main DNN model.
We show that compared with adversarial training and other baselines, KEMLP achieves higher robustness against physical attacks, $\mathcal{L}_p$-bounded attacks, unforeseen attacks, and natural corruptions.
arXiv Detail & Related papers (2021-06-11T08:37:53Z)
- Defense against Adversarial Attacks in NLP via Dirichlet Neighborhood Ensemble [163.3333439344695]
Dirichlet Neighborhood Ensemble (DNE) is a randomized smoothing method for training a robust model to defend against substitution-based attacks.
DNE forms virtual sentences by sampling an embedding vector for each word in an input sentence from the convex hull spanned by the word and its synonyms, and augments the training data with these virtual sentences.
We demonstrate through extensive experimentation that our method consistently outperforms recently proposed defense methods by a significant margin across different network architectures and multiple data sets.
arXiv Detail & Related papers (2020-06-20T18:01:16Z)
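The convex-hull sampling step described above can be sketched as follows: for each word, draw mixture weights over the word and its synonyms from a Dirichlet distribution and take the corresponding convex combination of their embeddings. The synonym set, the concentration parameter, and the helper name are illustrative assumptions.

```python
# Sketch of DNE-style sampling: mix a word's embedding with its synonyms'
# embeddings using Dirichlet-distributed convex weights.
import torch

def dirichlet_mix(word_vec: torch.Tensor, synonym_vecs: torch.Tensor, alpha: float = 1.0):
    """word_vec: (dim,); synonym_vecs: (k, dim). Returns a point in their convex hull."""
    vecs = torch.cat([word_vec.unsqueeze(0), synonym_vecs], dim=0)      # (k+1, dim)
    weights = torch.distributions.Dirichlet(
        torch.full((vecs.size(0),), alpha)).sample()                    # (k+1,), sums to 1
    return (weights.unsqueeze(1) * vecs).sum(dim=0)
```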