Logically Consistent Adversarial Attacks for Soft Theorem Provers
- URL: http://arxiv.org/abs/2205.00047v1
- Date: Fri, 29 Apr 2022 19:10:12 GMT
- Title: Logically Consistent Adversarial Attacks for Soft Theorem Provers
- Authors: Alexander Gaskell, Yishu Miao, Lucia Specia, Francesca Toni
- Abstract summary: We propose a generative adversarial framework for probing and improving language models' reasoning capabilities.
Our framework successfully generates adversarial attacks and identifies global weaknesses.
In addition to effective probing, we show that training on the generated samples improves the target model's performance.
- Score: 110.17147570572939
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent efforts within the AI community have yielded impressive results
towards "soft theorem proving" over natural language sentences using language
models. We propose a novel, generative adversarial framework for probing and
improving these models' reasoning capabilities. Adversarial attacks in this
domain suffer from the logical inconsistency problem, whereby perturbations to
the input may alter the label. Our Logically consistent AdVersarial Attacker,
LAVA, addresses this by combining a structured generative process with a
symbolic solver, guaranteeing logical consistency. Our framework successfully
generates adversarial attacks and identifies global weaknesses common across
multiple target models. Our analyses reveal naive heuristics and
vulnerabilities in these models' reasoning capabilities, exposing an incomplete
grasp of logical deduction under logic programs. Finally, in addition to
effective probing of these models, we show that training on the generated
samples improves the target model's performance.
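The abstract gives no implementation details, but the consistency requirement it describes (a perturbation must not silently change the entailment label) can be illustrated with a tiny forward-chaining solver over a propositional logic program. The encoding below, with atoms as strings and rules as body/head pairs, and all function names are assumptions made for this sketch rather than LAVA's actual interface.

```python
# Toy illustration of the logical-consistency requirement for adversarial
# perturbations over a logic program. This is NOT the LAVA implementation;
# the encoding (atoms as strings, rules as (body, head) pairs) is assumed
# for the sake of the sketch.

def forward_chain(facts, rules):
    """Compute the deductive closure of a set of facts under definite rules."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in derived and all(atom in derived for atom in body):
                derived.add(head)
                changed = True
    return derived

def entails(facts, rules, query):
    return query in forward_chain(facts, rules)

# Original theory: the query is entailed (label = True).
facts = {"bird(tweety)"}
rules = [({"bird(tweety)"}, "can_fly(tweety)")]
query = "can_fly(tweety)"
label = entails(facts, rules, query)  # True

# Candidate perturbation: add a distractor fact. A logically consistent
# attack must leave the label unchanged, which the symbolic check verifies.
perturbed_facts = facts | {"green(tweety)"}
assert entails(perturbed_facts, rules, query) == label  # consistency holds

# A candidate perturbation that silently flips the label would be rejected:
bad_facts = facts - {"bird(tweety)"}
print(entails(bad_facts, rules, query) == label)  # False -> reject this attack
```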
Related papers
- MirrorCheck: Efficient Adversarial Defense for Vision-Language Models [55.73581212134293]
We propose a novel, yet elegantly simple approach for detecting adversarial samples in Vision-Language Models.
Our method leverages Text-to-Image (T2I) models to generate images based on captions produced by target VLMs.
Empirical evaluations conducted on different datasets validate the efficacy of our approach.
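The abstract states only that a T2I model regenerates an image from the caption produced by the target VLM; the comparison step sketched below, scoring embedding similarity between the input and its regeneration, is an assumption about how such a detection could be completed. Every function here is a hypothetical stand-in, not MirrorCheck's API.

```python
# Hedged sketch of a MirrorCheck-style detection loop. `caption_with_vlm`,
# `generate_with_t2i` and `embed_image` are placeholder stubs standing in for
# a real VLM, T2I model and image encoder.
import numpy as np

def caption_with_vlm(image: np.ndarray) -> str:
    return "a photo of a dog on a beach"  # placeholder caption

def generate_with_t2i(caption: str) -> np.ndarray:
    return np.random.rand(512)            # placeholder regenerated-image features

def embed_image(features: np.ndarray) -> np.ndarray:
    return features / (np.linalg.norm(features) + 1e-8)

def is_adversarial(image: np.ndarray, threshold: float = 0.5) -> bool:
    caption = caption_with_vlm(image)
    regenerated = generate_with_t2i(caption)
    sim = float(embed_image(image) @ embed_image(regenerated))
    # Intuition: a clean image should be semantically close to its regeneration;
    # an adversarial image that hijacks the caption should not be.
    return sim < threshold

# With random stubs this almost always prints True; real embeddings are the point.
print(is_adversarial(np.random.rand(512)))
```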
arXiv Detail & Related papers (2024-06-13T15:55:04Z)
- A Closer Look at the Self-Verification Abilities of Large Language Models in Logical Reasoning [73.77088902676306]
We take a closer look at the self-verification abilities of large language models (LLMs) in the context of logical reasoning.
Our main findings suggest that existing LLMs could struggle to identify fallacious reasoning steps accurately and may fall short of guaranteeing the validity of self-verification methods.
arXiv Detail & Related papers (2023-11-14T07:13:10Z)
- Are LLMs Rigorous Logical Reasoner? Empowering Natural Language Proof Generation with Contrastive Stepwise Decoding [11.385103498440932]
We introduce contrastive decoding to stepwise proof generation, making use of negative reasoning paths to strengthen the model's capacity for logical deduction.
Experiments on EntailmentBank underscore the success of our method in augmenting the proof planning abilities of language models.
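As a rough illustration of the idea, candidate proof steps can be ranked by their log-probability under the positive context minus a weighted log-probability under a negative reasoning path. The weighting, candidate set and numbers below are placeholders, not the paper's exact formulation.

```python
# Minimal sketch of contrastive scoring for stepwise proof generation:
# prefer steps that are likely under the positive context but unlikely
# under a context built from a negative reasoning path.

def contrastive_score(logp_pos: float, logp_neg: float, alpha: float = 0.5) -> float:
    return logp_pos - alpha * logp_neg

# Candidate proof steps with hypothetical log-probabilities under the
# positive context and under the negative (contrastive) context.
candidates = {
    "sent1 & sent3 -> int1": (-2.1, -6.0),  # plausible under pos, unlikely under neg
    "sent2 -> int1":         (-2.3, -2.5),  # plausible under both, so penalised
}

best = max(candidates, key=lambda s: contrastive_score(*candidates[s]))
print(best)  # "sent1 & sent3 -> int1"
```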
arXiv Detail & Related papers (2023-11-12T05:12:49Z)
- NatLogAttack: A Framework for Attacking Natural Language Inference Models with Natural Logic [25.09127185703912]
Adversarial attacks have proven to be an important tool for exposing the Achilles' heel of victim models.
We propose NatLogAttack to perform systematic attacks centring around natural logic.
We show that compared to the existing attack models, NatLogAttack generates better adversarial examples with fewer visits to the victim models.
arXiv Detail & Related papers (2023-07-06T08:32:14Z)
- In and Out-of-Domain Text Adversarial Robustness via Label Smoothing [64.66809713499576]
We study the adversarial robustness provided by various label smoothing strategies in foundational models for diverse NLP tasks.
Our experiments show that label smoothing significantly improves adversarial robustness in pre-trained models like BERT, against various popular attacks.
We also analyze the relationship between prediction confidence and robustness, showing that label smoothing reduces over-confident errors on adversarial examples.
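Label smoothing itself is standard; a minimal NumPy version of the smoothed cross-entropy loss studied here is sketched below. In PyTorch the same behaviour is available via nn.CrossEntropyLoss(label_smoothing=eps).

```python
# Minimal sketch of label-smoothed cross-entropy: the target distribution puts
# 1 - eps on the gold class plus eps spread uniformly over all classes.
import numpy as np

def smoothed_cross_entropy(logits: np.ndarray, target: int, eps: float = 0.1) -> float:
    num_classes = logits.shape[-1]
    shifted = logits - np.max(logits)                       # numerical stability
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))   # log-softmax
    smooth_target = np.full(num_classes, eps / num_classes)
    smooth_target[target] += 1.0 - eps
    return float(-np.sum(smooth_target * log_probs))

logits = np.array([2.0, 0.5, -1.0])
print(smoothed_cross_entropy(logits, target=0, eps=0.1))
```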
arXiv Detail & Related papers (2022-12-20T14:06:50Z)
- Logical Satisfiability of Counterfactuals for Faithful Explanations in NLI [60.142926537264714]
We introduce the methodology of Faithfulness-through-Counterfactuals.
It generates a counterfactual hypothesis based on the logical predicates expressed in the explanation.
It then evaluates if the model's prediction on the counterfactual is consistent with that expressed logic.
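A hedged sketch of that check, assuming a toy counterfactual construction and a stub NLI model in place of the paper's actual pipeline:

```python
# Hedged sketch of faithfulness-through-counterfactuals: build a counterfactual
# hypothesis from a predicate the explanation relies on, derive the label the
# explanation's logic implies, and test whether the model agrees. The stub
# model and string-level counterfactual are assumptions for illustration only.

def nli_model(premise: str, hypothesis: str) -> str:
    # Placeholder: a real check would query the trained NLI model under test.
    return "contradiction" if "not" in hypothesis else "entailment"

premise = "A man is sleeping on the couch."
hypothesis = "A man is resting."
# Verbalised explanation logic: "sleeping implies resting".
# Counterfactual negates the explanation's consequent predicate; if the
# explanation is faithful, the model should no longer predict entailment.
counterfactual = "A man is not resting."
expected = "contradiction"

consistent = nli_model(premise, counterfactual) == expected
print("explanation faithful on this example:", consistent)
```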
arXiv Detail & Related papers (2022-05-25T03:40:59Z)
- Adversarial Attack and Defense of Structured Prediction Models [58.49290114755019]
In this paper, we investigate attacks and defenses for structured prediction tasks in NLP.
The structured outputs of these models are sensitive to small perturbations in the input.
We propose a novel and unified framework that learns to attack a structured prediction model using a sequence-to-sequence model.
arXiv Detail & Related papers (2020-10-04T15:54:03Z)
- Certifying Decision Trees Against Evasion Attacks by Program Analysis [9.290879387995401]
We propose a novel technique to verify the security of machine learning models against evasion attacks.
Our approach exploits the interpretability property of decision trees to transform them into imperative programs.
Our experiments show that our technique is both precise and efficient, yielding only a minimal number of false positives.
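A minimal sketch of the underlying idea, assuming a toy hand-built tree: view the tree as an imperative program of nested if/else tests and analyse it with per-feature intervals to decide whether any point in an L-infinity ball around an input can reach a leaf with a different label. The paper targets real tree models via program analysis; this is only the single-tree intuition.

```python
# Tree nodes: ("leaf", label) or ("split", feature_index, threshold, left, right)
TREE = ("split", 0, 0.5,
        ("leaf", "neg"),
        ("split", 1, 0.3, ("leaf", "neg"), ("leaf", "pos")))

def predict(tree, x):
    """The tree viewed as an imperative program: a chain of if/else tests."""
    if tree[0] == "leaf":
        return tree[1]
    _, feat, thr, left, right = tree
    return predict(left, x) if x[feat] <= thr else predict(right, x)

def reachable_labels(tree, lo, hi):
    """Interval analysis: labels of all leaves reachable when each feature i
    ranges over [lo[i], hi[i]] (the L-infinity ball viewed as a box)."""
    if tree[0] == "leaf":
        return {tree[1]}
    _, feat, thr, left, right = tree
    labels = set()
    if lo[feat] <= thr:            # some point in the box goes left
        labels |= reachable_labels(left, lo, hi)
    if hi[feat] > thr:             # some point in the box goes right
        labels |= reachable_labels(right, lo, hi)
    return labels

x, eps = [0.7, 0.6], 0.1
lo = [v - eps for v in x]
hi = [v + eps for v in x]
robust = reachable_labels(TREE, lo, hi) == {predict(TREE, x)}
print("verified robust at x within eps:", robust)  # True for this example
```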
arXiv Detail & Related papers (2020-07-06T14:18:10Z)
- Extending Adversarial Attacks to Produce Adversarial Class Probability Distributions [1.439518478021091]
Our results demonstrate that we can closely approximate any probability distribution for the classes while maintaining a high fooling rate.
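One way to read this is as a targeted attack whose target is a whole distribution rather than a single class: minimise the KL divergence between a desired class distribution and the model's softmax output over a bounded perturbation. The sketch below does this with signed gradient steps on a tiny random PyTorch model; the model, budget and step schedule are illustrative, not the paper's setup.

```python
# Steer a classifier's output towards an arbitrary target class distribution
# by optimising a perturbation delta within an L-infinity ball.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
model = torch.nn.Sequential(torch.nn.Linear(20, 64), torch.nn.ReLU(), torch.nn.Linear(64, 5))
x = torch.randn(1, 20)
target = torch.tensor([[0.1, 0.1, 0.6, 0.1, 0.1]])  # desired class distribution
eps, step, iters = 0.3, 0.02, 200

delta = torch.zeros_like(x, requires_grad=True)
for _ in range(iters):
    log_probs = F.log_softmax(model(x + delta), dim=-1)
    loss = F.kl_div(log_probs, target, reduction="batchmean")  # KL(target || pred)
    loss.backward()
    with torch.no_grad():
        delta -= step * delta.grad.sign()  # signed gradient step
        delta.clamp_(-eps, eps)            # stay inside the L-infinity ball
    delta.grad.zero_()

print(F.softmax(model(x + delta), dim=-1))  # should now be close to `target`
```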
arXiv Detail & Related papers (2020-04-14T09:39:02Z)