NatLogAttack: A Framework for Attacking Natural Language Inference Models with Natural Logic
- URL: http://arxiv.org/abs/2307.02849v1
- Date: Thu, 6 Jul 2023 08:32:14 GMT
- Title: NatLogAttack: A Framework for Attacking Natural Language Inference Models with Natural Logic
- Authors: Zi'ou Zheng and Xiaodan Zhu
- Abstract summary: Adversarial attacks have proven to be an important tool to help evaluate the Achilles' heel of the victim models.
We propose NatLogAttack to perform systematic attacks centring around natural logic.
We show that compared to the existing attack models, NatLogAttack generates better adversarial examples with fewer visits to the victim models.
- Score: 20.75385153947842
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reasoning has been a central topic in artificial intelligence from the
beginning. The recent progress made on distributed representation and neural
networks continues to improve the state-of-the-art performance of natural
language inference. However, it remains an open question whether the models
perform real reasoning to reach their conclusions or rely on spurious
correlations. Adversarial attacks have proven to be an important tool to help
evaluate the Achilles' heel of the victim models. In this study, we explore the
fundamental problem of developing attack models based on logic formalism. We
propose NatLogAttack to perform systematic attacks centring around natural
logic, a classical logic formalism that is traceable back to Aristotle's
syllogism and has been closely developed for natural language inference. The
proposed framework renders both label-preserving and label-flipping attacks. We
show that compared to the existing attack models, NatLogAttack generates better
adversarial examples with fewer visits to the victim models. The victim models
are found to be more vulnerable under the label-flipping setting. NatLogAttack
provides a tool to probe the existing and future NLI models' capacity from a
key viewpoint and we hope more logic-based attacks will be further explored for
understanding the desired property of reasoning.
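The abstract does not spell out how natural logic decides whether an edit preserves or flips an NLI label. As a rough illustration only, and not the NatLogAttack implementation, the Python sketch below uses the seven basic relations from MacCartney and Manning's formulation of natural logic. It projects a word substitution through an upward- or downward-monotone context, joins the result with the premise-hypothesis relation, and reads off a label only when that label is guaranteed. The names `Rel`, `project`, `join`, `nli_label` and `label_after_edit` are hypothetical. Two simplifications are assumed: projection uses monotonicity alone, so only ≡, ⊏ and ⊐ are handled, and `join` covers just a few determinate cells of the full join table.

```python
from enum import Enum
from typing import Optional


class Rel(Enum):
    """Basic natural logic relations, following MacCartney and Manning."""
    EQ = "≡"   # equivalence
    FWD = "⊏"  # forward entailment, e.g. guitar ⊏ instrument
    REV = "⊐"  # reverse entailment, e.g. instrument ⊐ guitar
    NEG = "^"  # negation: exclusive and exhaustive
    ALT = "|"  # alternation: exclusive, not exhaustive
    COV = "‿"  # cover: exhaustive, not exclusive
    IND = "#"  # independence


def project(lexical: Rel, upward: bool) -> Rel:
    """Lift a word-level ≡/⊏/⊐ relation to the sentence through a monotone context.

    Monotonicity alone only determines ≡, ⊏ and ⊐; the other relations need each
    operator's full projectivity signature, which this sketch leaves out.
    """
    if lexical not in (Rel.EQ, Rel.FWD, Rel.REV):
        raise ValueError("only ≡, ⊏ and ⊐ can be projected by monotonicity alone")
    if upward or lexical is Rel.EQ:
        return lexical
    # A downward-monotone context (e.g. under "not") swaps ⊏ and ⊐.
    return Rel.REV if lexical is Rel.FWD else Rel.FWD


def join(r1: Rel, r2: Rel) -> Optional[Rel]:
    """Compose P->H (r1) with H->H' (r2), covering only determinate cases."""
    if r1 is Rel.EQ:
        return r2
    if r2 is Rel.EQ:
        return r1
    if r1 is r2 and r1 in (Rel.FWD, Rel.REV):
        return r1  # ⊏ and ⊐ are each transitive
    if r1 is Rel.FWD and r2 in (Rel.NEG, Rel.ALT):
        return Rel.ALT
    return None  # the full join table yields a union of relations here


def nli_label(rel: Optional[Rel]) -> Optional[str]:
    """Map a determinate P->H' relation to a label the edit is guaranteed to have."""
    if rel in (Rel.EQ, Rel.FWD):
        return "entailment"
    if rel in (Rel.NEG, Rel.ALT):
        return "contradiction"
    return None  # no guaranteed label; an attack would discard this candidate


def label_after_edit(p_to_h: Rel, h_to_h2: Rel) -> Optional[str]:
    """Guaranteed label of (P, H') given P->H and the sentence-level edit H->H'."""
    return nli_label(join(p_to_h, h_to_h2))


if __name__ == "__main__":
    # P = H = "The man is not playing an instrument"; edit instrument -> guitar.
    edit = project(Rel.REV, upward=False)  # ⊐ under "not" becomes ⊏
    print(label_after_edit(Rel.EQ, edit))  # entailment (label-preserving)

    # P = "A man is playing a guitar", H = "A man is playing an instrument" (P ⊏ H);
    # H' = "No man is playing an instrument" is the negation of H (H ^ H').
    print(label_after_edit(Rel.FWD, Rel.NEG))  # contradiction (label-flipping)

    # P ⊏ H followed by an edit that makes H' more specific (H ⊐ H'): undetermined.
    print(label_after_edit(Rel.FWD, Rel.REV))  # None
```

Returning `None` instead of guessing is a design choice in this sketch: a generated hypothesis is only useful as a label-preserving or label-flipping example if its gold label is known by construction, so undetermined edits are simply discarded.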
Related papers
- A Closer Look at the Self-Verification Abilities of Large Language Models in Logical Reasoning [73.77088902676306]
We take a closer look at the self-verification abilities of large language models (LLMs) in the context of logical reasoning.
Our main findings suggest that existing LLMs could struggle to identify fallacious reasoning steps accurately and may fall short of guaranteeing the validity of self-verification methods.
(2023-11-14T07:13:10Z)
- Case-Based Reasoning with Language Models for Classification of Logical Fallacies [3.511369967593153]
We propose a Case-Based Reasoning method that classifies new cases of logical fallacy.
Our experiments indicate that Case-Based Reasoning improves the accuracy and generalizability of language models.
(2023-01-27T17:49:16Z)
- NaturalAdversaries: Can Naturalistic Adversaries Be as Effective as Artificial Adversaries? [61.58261351116679]
We introduce a two-stage adversarial example generation framework (NaturalAdversaries) for natural language understanding tasks.
It is adaptable to both black-box and white-box adversarial attacks based on the level of access to the model parameters.
Our results indicate these adversaries generalize across domains, and offer insights for future research on improving robustness of neural text classification models.
(2022-11-08T16:37:34Z)
- A Unified Evaluation of Textual Backdoor Learning: Frameworks and Benchmarks [72.7373468905418]
We develop an open-source toolkit OpenBackdoor to foster the implementations and evaluations of textual backdoor learning.
We also propose CUBE, a simple yet strong clustering-based defense baseline.
(2022-06-17T02:29:23Z)
- Logically Consistent Adversarial Attacks for Soft Theorem Provers [110.17147570572939]
We propose a generative adversarial framework for probing and improving language models' reasoning capabilities.
Our framework successfully generates adversarial attacks and identifies global weaknesses.
In addition to effective probing, we show that training on the generated samples improves the target model's performance.
(2022-04-29T19:10:12Z)
- Can Rationalization Improve Robustness? [39.741059642044874]
We investigate whether neural NLP models can provide robustness to adversarial attacks in addition to interpretability.
We generate various types of 'AddText' attacks for both token and sentence-level rationalization tasks.
Our experiments reveal that the rationale models show promise for improving robustness, although they struggle in certain scenarios.
(2022-04-25T17:02:42Z)
- SparseFed: Mitigating Model Poisoning Attacks in Federated Learning with Sparsification [24.053704318868043]
In model poisoning attacks, the attacker reduces the model's performance on targeted sub-tasks by uploading "poisoned" updates.
We introduce SparseFed, a novel defense that uses global top-k update sparsification and device-level gradient clipping to mitigate model poisoning attacks.
(2021-12-12T16:34:52Z)
- Learning to Rationalize for Nonmonotonic Reasoning with Distant Supervision [44.32874972577682]
We investigate the extent to which neural models can reason about natural language rationales that explain model predictions.
We use pre-trained language models, neural knowledge models, and distant supervision from related tasks.
Our model shows promise in generating post-hoc rationales explaining why an inference is more or less likely given the additional information.
(2020-12-14T23:50:20Z)
- Adversarial Attack and Defense of Structured Prediction Models [58.49290114755019]
In this paper, we investigate attacks and defenses for structured prediction tasks in NLP.
The structured output of structured prediction models is sensitive to small perturbations in the input.
We propose a novel and unified framework that learns to attack a structured prediction model using a sequence-to-sequence model.
(2020-10-04T15:54:03Z)
- Learning to Attack: Towards Textual Adversarial Attacking in Real-world Situations [81.82518920087175]
Adversarial attacks aim to fool deep neural networks with adversarial examples.
We propose a reinforcement learning based attack model, which can learn from attack history and launch attacks more efficiently.
(2020-09-19T09:12:24Z)