Logically Consistent Adversarial Attacks for Soft Theorem Provers
- URL: http://arxiv.org/abs/2205.00047v1
- Date: Fri, 29 Apr 2022 19:10:12 GMT
- Title: Logically Consistent Adversarial Attacks for Soft Theorem Provers
- Authors: Alexander Gaskell, Yishu Miao, Lucia Specia, Francesca Toni
- Abstract summary: We propose a generative adversarial framework for probing and improving language models' reasoning capabilities.
Our framework successfully generates adversarial attacks and identifies global weaknesses.
In addition to effective probing, we show that training on the generated samples improves the target model's performance.
- Score: 110.17147570572939
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent efforts within the AI community have yielded impressive results
towards "soft theorem proving" over natural language sentences using language
models. We propose a novel, generative adversarial framework for probing and
improving these models' reasoning capabilities. Adversarial attacks in this
domain suffer from the logical inconsistency problem, whereby perturbations to
the input may alter the label. Our Logically consistent AdVersarial Attacker,
LAVA, addresses this by combining a structured generative process with a
symbolic solver, guaranteeing logical consistency. Our framework successfully
generates adversarial attacks and identifies global weaknesses common across
multiple target models. Our analyses reveal naive heuristics and
vulnerabilities in these models' reasoning capabilities, exposing an incomplete
grasp of logical deduction under logic programs. Finally, in addition to
effective probing of these models, we show that training on the generated
samples improves the target model's performance.
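The abstract gives no implementation details, but the consistency requirement it describes (a perturbation must not silently change the entailment label) can be illustrated with a tiny forward-chaining solver over a propositional logic program. The encoding below, with atoms as strings and rules as body/head pairs, and all function names are assumptions made for this sketch rather than LAVA's actual interface.

```python
# Toy illustration of the logical-consistency requirement for adversarial
# perturbations over a logic program. This is NOT the LAVA implementation;
# the encoding (atoms as strings, rules as (body, head) pairs) is assumed
# for the sake of the sketch.

def forward_chain(facts, rules):
    """Compute the deductive closure of a set of facts under definite rules."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in derived and all(atom in derived for atom in body):
                derived.add(head)
                changed = True
    return derived

def entails(facts, rules, query):
    return query in forward_chain(facts, rules)

# Original theory: the query is entailed (label = True).
facts = {"bird(tweety)"}
rules = [({"bird(tweety)"}, "can_fly(tweety)")]
query = "can_fly(tweety)"
label = entails(facts, rules, query)  # True

# Candidate perturbation: add a distractor fact. A logically consistent
# attack must leave the label unchanged, which the symbolic check verifies.
perturbed_facts = facts | {"green(tweety)"}
assert entails(perturbed_facts, rules, query) == label  # consistency holds

# A candidate perturbation that silently flips the label would be rejected:
bad_facts = facts - {"bird(tweety)"}
print(entails(bad_facts, rules, query) == label)  # False -> reject this attack
```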
Related papers
- MirrorCheck: Efficient Adversarial Defense for Vision-Language Models [55.73581212134293]
We propose a novel, yet elegantly simple approach for detecting adversarial samples in Vision-Language Models.
Our method leverages Text-to-Image (T2I) models to generate images based on captions produced by target VLMs.
Empirical evaluations conducted on different datasets validate the efficacy of our approach.
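The abstract states only that a T2I model regenerates an image from the caption produced by the target VLM; the comparison step sketched below, scoring embedding similarity between the input and its regeneration, is an assumption about how such a detection could be completed. Every function here is a hypothetical stand-in, not MirrorCheck's API.

```python
# Hedged sketch of a MirrorCheck-style detection loop. `caption_with_vlm`,
# `generate_with_t2i` and `embed_image` are placeholder stubs standing in for
# a real VLM, T2I model and image encoder.
import numpy as np

def caption_with_vlm(image: np.ndarray) -> str:
    return "a photo of a dog on a beach"  # placeholder caption

def generate_with_t2i(caption: str) -> np.ndarray:
    return np.random.rand(512)            # placeholder regenerated-image features

def embed_image(features: np.ndarray) -> np.ndarray:
    return features / (np.linalg.norm(features) + 1e-8)

def is_adversarial(image: np.ndarray, threshold: float = 0.5) -> bool:
    caption = caption_with_vlm(image)
    regenerated = generate_with_t2i(caption)
    sim = float(embed_image(image) @ embed_image(regenerated))
    # Intuition: a clean image should be semantically close to its regeneration;
    # an adversarial image that hijacks the caption should not be.
    return sim < threshold

# With random stubs this almost always prints True; real embeddings are the point.
print(is_adversarial(np.random.rand(512)))
```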
arXiv Detail & Related papers (2024-06-13T15:55:04Z)
- A Closer Look at the Self-Verification Abilities of Large Language Models in Logical Reasoning [73.77088902676306]
We take a closer look at the self-verification abilities of large language models (LLMs) in the context of logical reasoning.
Our main findings suggest that existing LLMs could struggle to identify fallacious reasoning steps accurately and may fall short of guaranteeing the validity of self-verification methods.
arXiv Detail & Related papers (2023-11-14T07:13:10Z)
- Are LLMs Rigorous Logical Reasoner? Empowering Natural Language Proof Generation with Contrastive Stepwise Decoding [11.385103498440932]
We introduce contrastive decoding to stepwise proof generation, making use of negative reasoning paths to strengthen the model's capacity for logical deduction.
Experiments on EntailmentBank underscore the success of our method in augmenting the proof planning abilities of language models.
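As a rough illustration of the idea, candidate proof steps can be ranked by their log-probability under the positive context minus a weighted log-probability under a negative reasoning path. The weighting, candidate set and numbers below are placeholders, not the paper's exact formulation.

```python
# Minimal sketch of contrastive scoring for stepwise proof generation:
# prefer steps that are likely under the positive context but unlikely
# under a context built from a negative reasoning path.

def contrastive_score(logp_pos: float, logp_neg: float, alpha: float = 0.5) -> float:
    return logp_pos - alpha * logp_neg

# Candidate proof steps with hypothetical log-probabilities under the
# positive context and under the negative (contrastive) context.
candidates = {
    "sent1 & sent3 -> int1": (-2.1, -6.0),  # plausible under pos, unlikely under neg
    "sent2 -> int1":         (-2.3, -2.5),  # plausible under both, so penalised
}

best = max(candidates, key=lambda s: contrastive_score(*candidates[s]))
print(best)  # "sent1 & sent3 -> int1"
```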
arXiv Detail & Related papers (2023-11-12T05:12:49Z)
- NatLogAttack: A Framework for Attacking Natural Language Inference Models with Natural Logic [25.09127185703912]
Adversarial attacks have proven to be an important tool for exposing the Achilles' heel of victim models.
We propose NatLogAttack to perform systematic attacks centring around natural logic.
We show that compared to the existing attack models, NatLogAttack generates better adversarial examples with fewer visits to the victim models.
arXiv Detail & Related papers (2023-07-06T08:32:14Z)
- In and Out-of-Domain Text Adversarial Robustness via Label Smoothing [64.66809713499576]
We study the adversarial robustness provided by various label smoothing strategies in foundational models for diverse NLP tasks.
Our experiments show that label smoothing significantly improves adversarial robustness in pre-trained models like BERT, against various popular attacks.
We also analyze the relationship between prediction confidence and robustness, showing that label smoothing reduces over-confident errors on adversarial examples.
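Label smoothing itself is standard; a minimal NumPy version of the smoothed cross-entropy loss studied here is sketched below. In PyTorch the same behaviour is available via nn.CrossEntropyLoss(label_smoothing=eps).

```python
# Minimal sketch of label-smoothed cross-entropy: the target distribution puts
# 1 - eps on the gold class plus eps spread uniformly over all classes.
import numpy as np

def smoothed_cross_entropy(logits: np.ndarray, target: int, eps: float = 0.1) -> float:
    num_classes = logits.shape[-1]
    shifted = logits - np.max(logits)                       # numerical stability
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))   # log-softmax
    smooth_target = np.full(num_classes, eps / num_classes)
    smooth_target[target] += 1.0 - eps
    return float(-np.sum(smooth_target * log_probs))

logits = np.array([2.0, 0.5, -1.0])
print(smoothed_cross_entropy(logits, target=0, eps=0.1))
```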
arXiv Detail & Related papers (2022-12-20T14:06:50Z)
- Logical Satisfiability of Counterfactuals for Faithful Explanations in NLI [60.142926537264714]
We introduce the methodology of Faithfulness-through-Counterfactuals.
It generates a counterfactual hypothesis based on the logical predicates expressed in the explanation.
It then evaluates if the model's prediction on the counterfactual is consistent with that expressed logic.
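A hedged sketch of that check, assuming a toy counterfactual construction and a stub NLI model in place of the paper's actual pipeline:

```python
# Hedged sketch of faithfulness-through-counterfactuals: build a counterfactual
# hypothesis from a predicate the explanation relies on, derive the label the
# explanation's logic implies, and test whether the model agrees. The stub
# model and string-level counterfactual are assumptions for illustration only.

def nli_model(premise: str, hypothesis: str) -> str:
    # Placeholder: a real check would query the trained NLI model under test.
    return "contradiction" if "not" in hypothesis else "entailment"

premise = "A man is sleeping on the couch."
hypothesis = "A man is resting."
# Verbalised explanation logic: "sleeping implies resting".
# Counterfactual negates the explanation's consequent predicate; if the
# explanation is faithful, the model should no longer predict entailment.
counterfactual = "A man is not resting."
expected = "contradiction"

consistent = nli_model(premise, counterfactual) == expected
print("explanation faithful on this example:", consistent)
```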
arXiv Detail & Related papers (2022-05-25T03:40:59Z)
- Adversarial Attack and Defense of Structured Prediction Models [58.49290114755019]
In this paper, we investigate attacks and defenses for structured prediction tasks in NLP.
The structured outputs of these models are sensitive to small perturbations in the input.
We propose a novel and unified framework that learns to attack a structured prediction model using a sequence-to-sequence model.
arXiv Detail & Related papers (2020-10-04T15:54:03Z)
- Certifying Decision Trees Against Evasion Attacks by Program Analysis [9.290879387995401]
We propose a novel technique to verify the security of machine learning models against evasion attacks.
Our approach exploits the interpretability property of decision trees to transform them into imperative programs.
Our experiments show that our technique is both precise and efficient, yielding only a minimal number of false positives.
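A minimal sketch of the underlying idea, assuming a toy hand-built tree: view the tree as an imperative program of nested if/else tests and analyse it with per-feature intervals to decide whether any point in an L-infinity ball around an input can reach a leaf with a different label. The paper targets real tree models via program analysis; this is only the single-tree intuition.

```python
# Tree nodes: ("leaf", label) or ("split", feature_index, threshold, left, right)
TREE = ("split", 0, 0.5,
        ("leaf", "neg"),
        ("split", 1, 0.3, ("leaf", "neg"), ("leaf", "pos")))

def predict(tree, x):
    """The tree viewed as an imperative program: a chain of if/else tests."""
    if tree[0] == "leaf":
        return tree[1]
    _, feat, thr, left, right = tree
    return predict(left, x) if x[feat] <= thr else predict(right, x)

def reachable_labels(tree, lo, hi):
    """Interval analysis: labels of all leaves reachable when each feature i
    ranges over [lo[i], hi[i]] (the L-infinity ball viewed as a box)."""
    if tree[0] == "leaf":
        return {tree[1]}
    _, feat, thr, left, right = tree
    labels = set()
    if lo[feat] <= thr:            # some point in the box goes left
        labels |= reachable_labels(left, lo, hi)
    if hi[feat] > thr:             # some point in the box goes right
        labels |= reachable_labels(right, lo, hi)
    return labels

x, eps = [0.7, 0.6], 0.1
lo = [v - eps for v in x]
hi = [v + eps for v in x]
robust = reachable_labels(TREE, lo, hi) == {predict(TREE, x)}
print("verified robust at x within eps:", robust)  # True for this example
```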
arXiv Detail & Related papers (2020-07-06T14:18:10Z)
- Extending Adversarial Attacks to Produce Adversarial Class Probability Distributions [1.439518478021091]
Our results demonstrate that we can closely approximate any probability distribution for the classes while maintaining a high fooling rate.
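One way to read this is as a targeted attack whose target is a whole distribution rather than a single class: minimise the KL divergence between a desired class distribution and the model's softmax output over a bounded perturbation. The sketch below does this with signed gradient steps on a tiny random PyTorch model; the model, budget and step schedule are illustrative, not the paper's setup.

```python
# Steer a classifier's output towards an arbitrary target class distribution
# by optimising a perturbation delta within an L-infinity ball.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
model = torch.nn.Sequential(torch.nn.Linear(20, 64), torch.nn.ReLU(), torch.nn.Linear(64, 5))
x = torch.randn(1, 20)
target = torch.tensor([[0.1, 0.1, 0.6, 0.1, 0.1]])  # desired class distribution
eps, step, iters = 0.3, 0.02, 200

delta = torch.zeros_like(x, requires_grad=True)
for _ in range(iters):
    log_probs = F.log_softmax(model(x + delta), dim=-1)
    loss = F.kl_div(log_probs, target, reduction="batchmean")  # KL(target || pred)
    loss.backward()
    with torch.no_grad():
        delta -= step * delta.grad.sign()  # signed gradient step
        delta.clamp_(-eps, eps)            # stay inside the L-infinity ball
    delta.grad.zero_()

print(F.softmax(model(x + delta), dim=-1))  # should now be close to `target`
```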
arXiv Detail & Related papers (2020-04-14T09:39:02Z)