Fact-Saboteurs: A Taxonomy of Evidence Manipulation Attacks against
Fact-Verification Systems
- URL: http://arxiv.org/abs/2209.03755v4
- Date: Fri, 16 Jun 2023 11:33:34 GMT
- Title: Fact-Saboteurs: A Taxonomy of Evidence Manipulation Attacks against
Fact-Verification Systems
- Authors: Sahar Abdelnabi and Mario Fritz
- Abstract summary: We show that it is possible to subtly modify claim-salient snippets in the evidence and generate diverse and claim-aligned evidence.
The attacks are also robust against post-hoc modifications of the claim.
These attacks can have harmful implications for inspectable and human-in-the-loop usage scenarios.
- Score: 80.3811072650087
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mis- and disinformation are a substantial global threat to our security and
safety. To cope with the scale of online misinformation, researchers have been
working on automating fact-checking by retrieving and verifying against
relevant evidence. However, despite many advances, a comprehensive evaluation
of the possible attack vectors against such systems is still lacking.
Particularly, the automated fact-verification process might be vulnerable to
the exact disinformation campaigns it is trying to combat. In this work, we
assume an adversary that automatically tampers with the online evidence in
order to disrupt the fact-checking model by camouflaging the relevant evidence
or planting misleading evidence. We first propose an exploratory taxonomy that
spans these two targets and the different threat model dimensions. Guided by
this, we design and propose several potential attack methods. We show that it
is possible to subtly modify claim-salient snippets in the evidence and
generate diverse and claim-aligned evidence. As a result, we severely degrade
fact-checking performance under many different permutations of the taxonomy's
dimensions. The attacks are also robust against post-hoc modifications of the
claim. Our analysis further hints at potential limitations in models' inference
when faced with contradicting evidence. We emphasize that these attacks can
have harmful implications for the inspectable and human-in-the-loop usage
scenarios of such models, and we conclude by discussing challenges and
directions for future defenses.
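To make the attack surface concrete, here is a minimal, self-contained Python sketch of the two targets named in the abstract, camouflaging relevant evidence and planting misleading evidence, run against a deliberately simplified stand-in verifier. The token-overlap verdict rule, the 0.8 relevance threshold, the negation check, and the example sentences are illustrative assumptions, not the models or attack implementations evaluated in the paper.

```python
# Toy fact-verification loop and the two evidence-manipulation targets from the
# taxonomy. Everything here is an illustrative stand-in: a real system would use
# retrieval plus an NLI model rather than token overlap and a negation check.

def verify(claim: str, evidence: list[str]) -> str:
    """Toy verdict: SUPPORTED / REFUTED / NOT ENOUGH INFO from evidence sentences."""
    claim_toks = set(claim.lower().split())
    verdict = "NOT ENOUGH INFO"
    for sent in evidence:
        toks = set(sent.lower().split())
        if len(claim_toks & toks) / len(claim_toks) < 0.8:
            continue                      # sentence not relevant enough to the claim
        if {"not", "never", "no"} & toks:
            return "REFUTED"              # toy rule: contradicting evidence dominates
        verdict = "SUPPORTED"
    return verdict

def camouflage(evidence: list[str], edits: dict[str, str]) -> list[str]:
    """Attack target 1: subtly rewrite claim-salient snippets in the evidence."""
    return [" ".join(edits.get(tok, tok) for tok in sent.split()) for sent in evidence]

def plant(evidence: list[str], fabricated: str) -> list[str]:
    """Attack target 2: add a fluent, claim-aligned but misleading sentence."""
    return evidence + [fabricated]

if __name__ == "__main__":
    claim = "The Eiffel Tower is located in Paris"
    evidence = ["The Eiffel Tower is a landmark located in Paris France"]
    print(verify(claim, evidence))                          # SUPPORTED
    print(verify(claim, camouflage(evidence, {"located": "situated",
                                              "Paris": "the French capital"})))
    # -> NOT ENOUGH INFO: the relevant evidence has been camouflaged
    print(verify(claim, plant(evidence, "The Eiffel Tower is not located in Paris")))
    # -> REFUTED: the planted, claim-aligned sentence overrides the genuine evidence
```

Even in this toy setting the two targets behave differently: the camouflage edit removes the support signal without asserting anything false, while the planted sentence flips the verdict outright, mirroring the two ends of the taxonomy.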
Related papers
- Exploring Model Dynamics for Accumulative Poisoning Discovery [62.08553134316483]
We propose a novel information measure, namely, Memorization Discrepancy, to explore the defense via the model-level information.
By implicitly transferring the changes in the data manipulation to that in the model outputs, Memorization Discrepancy can discover the imperceptible poison samples.
We thoroughly explore its properties and propose Discrepancy-aware Sample Correction (DSC) to defend against accumulative poisoning attacks.
arXiv Detail & Related papers (2023-06-06T14:45:24Z) - Avoid Adversarial Adaption in Federated Learning by Multi-Metric
Investigations [55.2480439325792]
Federated Learning (FL) facilitates decentralized machine learning model training, preserving data privacy, lowering communication costs, and boosting model performance through diversified data sources.
However, FL is vulnerable to poisoning attacks that undermine model integrity through both untargeted performance degradation and targeted backdoor attacks.
We define a new notion of strong adaptive adversaries, capable of adapting to multiple objectives simultaneously.
The proposed defense, MESAS, is the first that is robust against strong adaptive adversaries and effective in real-world data scenarios, with an average overhead of just 24.37 seconds.
arXiv Detail & Related papers (2023-06-06T11:44:42Z) - Untargeted Backdoor Attack against Object Detection [69.63097724439886]
We design a poison-only backdoor attack in an untargeted manner, based on task characteristics.
We show that, once the backdoor is embedded into the target model by our attack, it can trick the model into failing to detect any object stamped with our trigger patterns.
arXiv Detail & Related papers (2022-11-02T17:05:45Z) - Illusory Attacks: Information-Theoretic Detectability Matters in Adversarial Attacks [76.35478518372692]
We introduce epsilon-illusory, a novel form of adversarial attack on sequential decision-makers.
Compared to existing attacks, we empirically find epsilon-illusory to be significantly harder to detect with automated methods.
Our findings suggest the need for better anomaly detectors, as well as effective hardware- and system-level defenses.
arXiv Detail & Related papers (2022-07-20T19:49:09Z) - Btech thesis report on adversarial attack detection and purification of
adverserially attacked images [0.0]
This thesis report is on the detection and purification of adversarially attacked images.
A deep learning model is trained on certain training examples for various tasks such as classification and regression.
arXiv Detail & Related papers (2022-05-09T09:24:11Z) - Synthetic Disinformation Attacks on Automated Fact Verification Systems [53.011635547834025]
We explore the sensitivity of automated fact-checkers to synthetic adversarial evidence in two simulated settings.
We show that these systems suffer significant performance drops against these attacks; a toy harness illustrating this style of robustness measurement is sketched after this list.
We discuss the growing threat of modern NLG systems as generators of disinformation.
arXiv Detail & Related papers (2022-02-18T19:01:01Z) - Adversarial Attacks for Tabular Data: Application to Fraud Detection and
Imbalanced Data [3.2458203725405976]
Adversarial attacks aim at producing adversarial examples, in other words, slightly modified inputs that induce the AI system to return incorrect outputs.
In this paper we illustrate a novel approach to modify and adapt state-of-the-art algorithms to imbalanced data, in the context of fraud detection.
Experimental results show that the proposed modifications lead to a perfect attack success rate.
When applied to a real-world production system, the proposed techniques show that they can pose a serious threat to the robustness of advanced AI-based fraud detection procedures.
arXiv Detail & Related papers (2021-01-20T08:58:29Z) - An Empirical Review of Adversarial Defenses [0.913755431537592]
Deep neural networks, which form the basis of such systems, are highly susceptible to a specific type of attack, called adversarial attacks.
Even with minimal computation, an attacker can generate adversarial examples (images or data points that belong to another class but consistently fool the model into misclassifying them as genuine) and undermine such algorithms.
We present two effective techniques, Dropout and Denoising Autoencoders, and demonstrate their success in preventing such attacks from fooling the model.
arXiv Detail & Related papers (2020-12-10T09:34:41Z) - DeSePtion: Dual Sequence Prediction and Adversarial Examples for
Improved Fact-Checking [46.13738685855884]
We show that current fact-checking systems are vulnerable to three categories of realistic challenges.
We present a system designed to be resilient to these "attacks" using multiple pointer networks for document selection.
We find that in handling these attacks we obtain state-of-the-art results on FEVER, largely due to improved evidence retrieval.
arXiv Detail & Related papers (2020-04-27T15:18:49Z)
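Several of the related papers above, notably Synthetic Disinformation Attacks and DeSePtion, quantify robustness as the performance drop a verifier suffers once its evidence is attacked. The self-contained toy harness below illustrates that style of measurement via a verdict flip rate; the keyword-overlap verdict rule, the synonym edits, and the example cases are illustrative assumptions, not any of these papers' evaluation protocols.

```python
# Toy robustness harness: how often does a verifier's verdict change once the
# evidence is tampered with? The verdict rule mirrors the sketch earlier on this
# page; cases, edits, and the flip-rate metric are illustrative assumptions.
from collections import Counter

def verdict(claim: str, evidence: list[str]) -> str:
    """Overlap-based relevance plus a negation check (same toy rule as above)."""
    claim_toks = set(claim.lower().split())
    label = "NOT ENOUGH INFO"
    for sent in evidence:
        toks = set(sent.lower().split())
        if len(claim_toks & toks) / len(claim_toks) < 0.8:
            continue
        if {"not", "never", "no"} & toks:
            return "REFUTED"
        label = "SUPPORTED"
    return label

def flip_rate(cases, attack):
    """Fraction of (claim, evidence) pairs whose verdict changes under attack."""
    flips, moves = 0, Counter()
    for claim, evidence in cases:
        before, after = verdict(claim, evidence), verdict(claim, attack(evidence))
        moves[(before, after)] += 1
        flips += int(before != after)
    return flips / len(cases), moves

if __name__ == "__main__":
    cases = [
        ("The Eiffel Tower is located in Paris",
         ["The Eiffel Tower is a landmark located in Paris France"]),
        ("Water boils at 100 degrees Celsius at sea level",
         ["At sea level water boils at 100 degrees Celsius"]),
    ]
    synonyms = {"located": "situated", "Paris": "the French capital",
                "boils": "reaches boiling point", "Celsius": "centigrade"}
    swap = lambda ev: [" ".join(synonyms.get(t, t) for t in s.split()) for s in ev]
    rate, moves = flip_rate(cases, swap)
    print(f"flip rate: {rate:.2f}")   # 1.00: both verdicts drop to NOT ENOUGH INFO
    print(dict(moves))                # {('SUPPORTED', 'NOT ENOUGH INFO'): 2}
```

A real evaluation would replace the toy verdict with a trained retrieval-and-verification pipeline and report accuracy (or FEVER score) before and after the attack; the flip rate here is only a compact proxy for that comparison.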