NatLogAttack: A Framework for Attacking Natural Language Inference Models with Natural Logic
- URL: http://arxiv.org/abs/2307.02849v2
- Date: Fri, 11 Oct 2024 00:45:25 GMT
- Title: NatLogAttack: A Framework for Attacking Natural Language Inference Models with Natural Logic
- Authors: Zi'ou Zheng, Xiaodan Zhu
- Abstract summary: Adversarial attacks have proven to be an important tool to help evaluate the Achilles' heel of the victim models.
We propose NatLogAttack to perform systematic attacks centring around natural logic.
We show that compared to the existing attack models, NatLogAttack generates better adversarial examples with fewer visits to the victim models.
- Score: 25.09127185703912
- Abstract: Reasoning has been a central topic in artificial intelligence from the beginning. The recent progress made on distributed representation and neural networks continues to improve the state-of-the-art performance of natural language inference. However, it remains an open question whether the models perform real reasoning to reach their conclusions or rely on spurious correlations. Adversarial attacks have proven to be an important tool to help evaluate the Achilles' heel of the victim models. In this study, we explore the fundamental problem of developing attack models based on logic formalism. We propose NatLogAttack to perform systematic attacks centring around natural logic, a classical logic formalism that is traceable back to Aristotle's syllogism and has been closely developed for natural language inference. The proposed framework renders both label-preserving and label-flipping attacks. We show that compared to the existing attack models, NatLogAttack generates better adversarial examples with fewer visits to the victim models. The victim models are found to be more vulnerable under the label-flipping setting. NatLogAttack provides a tool to probe the existing and future NLI models' capacity from a key viewpoint and we hope more logic-based attacks will be further explored for understanding the desired property of reasoning.
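To make the idea of a natural-logic-based perturbation concrete, the sketch below illustrates one ingredient such an attack can rely on: monotonicity-aware lexical substitution. In an upward-monotone context, replacing a noun with a WordNet hypernym yields a label-preserving candidate (entailment is kept), while replacing it with a hyponym yields a label-flipping candidate (entailment may no longer hold). This is a minimal illustration under those assumptions, not the authors' NatLogAttack implementation; the helper names (lexical_substitutes, generate_candidates) and the use of NLTK's WordNet interface are choices made only for this example.

```python
# Minimal sketch of monotonicity-based substitution for NLI attacks.
# Assumes an upward-monotone position for the target word; this is NOT the
# NatLogAttack system, just an illustrative toy. Requires `pip install nltk`
# and `nltk.download('wordnet')`.
from nltk.corpus import wordnet as wn


def lexical_substitutes(word: str):
    """Return (hypernym, hyponym) lemma names for the first noun sense of `word`."""
    synsets = wn.synsets(word, pos=wn.NOUN)
    if not synsets:
        return [], []
    sense = synsets[0]
    hypers = [l.name().replace("_", " ") for s in sense.hypernyms() for l in s.lemmas()]
    hypos = [l.name().replace("_", " ") for s in sense.hyponyms() for l in s.lemmas()]
    return hypers, hypos


def generate_candidates(hypothesis: str, target: str):
    """Perturb `hypothesis` by swapping `target`, assuming an upward-monotone context:
    hypernyms keep entailment (label-preserving), hyponyms may break it (label-flipping)."""
    hypers, hypos = lexical_substitutes(target)
    label_preserving = [hypothesis.replace(target, h) for h in hypers]
    label_flipping = [hypothesis.replace(target, h) for h in hypos]
    return label_preserving, label_flipping


if __name__ == "__main__":
    # Premise: "A man is walking his dog."  Hypothesis: "A man is walking his dog."
    keep, flip = generate_candidates("A man is walking his dog.", "dog")
    print("Label-preserving candidates:", keep[:3])   # e.g. "... his canine."
    print("Label-flipping candidates:", flip[:3])     # e.g. "... his corgi."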
Related papers
- Adversarial Attack for Explanation Robustness of Rationalization Models [17.839644167949906]
Rationalization models select a subset of the input text as a rationale, which is crucial for humans to understand and trust predictions.
This paper aims to undermine the explainability of rationalization models without altering their predictions, thereby eliciting distrust in these models from human users.
arXiv Detail & Related papers (2024-08-20T12:43:58Z)
- A Closer Look at the Self-Verification Abilities of Large Language Models in Logical Reasoning [73.77088902676306]
We take a closer look at the self-verification abilities of large language models (LLMs) in the context of logical reasoning.
Our main findings suggest that existing LLMs could struggle to identify fallacious reasoning steps accurately and may fall short of guaranteeing the validity of self-verification methods.
arXiv Detail & Related papers (2023-11-14T07:13:10Z)
- Case-Based Reasoning with Language Models for Classification of Logical Fallacies [3.511369967593153]
We propose a Case-Based Reasoning method that classifies new cases of logical fallacy.
Our experiments indicate that Case-Based Reasoning improves the accuracy and generalizability of language models.
arXiv Detail & Related papers (2023-01-27T17:49:16Z)
- NaturalAdversaries: Can Naturalistic Adversaries Be as Effective as Artificial Adversaries? [61.58261351116679]
We introduce a two-stage adversarial example generation framework (NaturalAdversaries) for natural language understanding tasks.
It is adaptable to both black-box and white-box adversarial attacks based on the level of access to the model parameters.
Our results indicate these adversaries generalize across domains, and offer insights for future research on improving robustness of neural text classification models.
arXiv Detail & Related papers (2022-11-08T16:37:34Z)
- A Unified Evaluation of Textual Backdoor Learning: Frameworks and Benchmarks [72.7373468905418]
We develop OpenBackdoor, an open-source toolkit for implementing and evaluating textual backdoor learning.
We also propose CUBE, a simple yet strong clustering-based defense baseline.
arXiv Detail & Related papers (2022-06-17T02:29:23Z)
- Logically Consistent Adversarial Attacks for Soft Theorem Provers [110.17147570572939]
We propose a generative adversarial framework for probing and improving language models' reasoning capabilities.
Our framework successfully generates adversarial attacks and identifies global weaknesses.
In addition to effective probing, we show that training on the generated samples improves the target model's performance.
arXiv Detail & Related papers (2022-04-29T19:10:12Z)
- Can Rationalization Improve Robustness? [39.741059642044874]
We investigate whether neural NLP models can provide robustness to adversarial attacks in addition to their interpretable nature.
We generate various types of 'AddText' attacks for both token and sentence-level rationalization tasks.
Our experiments reveal that rationale models show promise in improving robustness, though they struggle in certain scenarios.
arXiv Detail & Related papers (2022-04-25T17:02:42Z)
- Learning to Rationalize for Nonmonotonic Reasoning with Distant Supervision [44.32874972577682]
We investigate the extent to which neural models can reason about natural language rationales that explain model predictions.
We use pre-trained language models, neural knowledge models, and distant supervision from related tasks.
Our model shows promise in generating post-hoc rationales that explain why an inference is more or less likely given the additional information.
arXiv Detail & Related papers (2020-12-14T23:50:20Z)
- Adversarial Attack and Defense of Structured Prediction Models [58.49290114755019]
In this paper, we investigate attacks and defenses for structured prediction tasks in NLP.
The structured outputs of these models are sensitive to small perturbations in the input.
We propose a novel and unified framework that learns to attack a structured prediction model using a sequence-to-sequence model.
arXiv Detail & Related papers (2020-10-04T15:54:03Z)
- Learning to Attack: Towards Textual Adversarial Attacking in Real-world Situations [81.82518920087175]
Adversarial attacks aim to fool deep neural networks with adversarial examples.
We propose a reinforcement learning based attack model, which can learn from attack history and launch attacks more efficiently.
arXiv Detail & Related papers (2020-09-19T09:12:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.