Defending Against Disinformation Attacks in Open-Domain Question
Answering
- URL: http://arxiv.org/abs/2212.10002v3
- Date: Mon, 26 Feb 2024 20:52:59 GMT
- Title: Defending Against Disinformation Attacks in Open-Domain Question
Answering
- Authors: Orion Weller, Aleem Khan, Nathaniel Weir, Dawn Lawrie, Benjamin Van
Durme
- Abstract summary: Adversarial poisoning of the search collection can cause large drops in accuracy for production systems.
We introduce a method that uses query augmentation to search for a diverse set of passages that could answer the original question but are less likely to have been poisoned.
- Score: 39.22018783998232
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent work in open-domain question answering (ODQA) has shown that
adversarial poisoning of the search collection can cause large drops in
accuracy for production systems. However, little to no work has proposed
methods to defend against these attacks. To do so, we rely on the intuition
that redundant information often exists in large corpora. To find it, we
introduce a method that uses query augmentation to search for a diverse set of
passages that could answer the original question but are less likely to have
been poisoned. We integrate these new passages into the model through the
design of a novel confidence method, comparing the predicted answer to its
appearance in the retrieved contexts (what we call Confidence from Answer
Redundancy, i.e. CAR). Together these methods allow for a simple but effective
way to defend against poisoning attacks that provides gains of nearly 20% exact
match across varying levels of data poisoning/knowledge conflicts.
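The defense is easy to state concretely. Below is a minimal sketch in Python, assuming hypothetical `augment`, `retrieve`, and `read` callables for query augmentation, passage retrieval, and answer extraction; the paper's exact scoring rule may differ from this verbatim-match count.

```python
from typing import Callable, List

def car_answer(
    question: str,
    augment: Callable[[str], List[str]],   # hypothetical: paraphrases of the question
    retrieve: Callable[[str], List[str]],  # hypothetical: passages for one query
    read: Callable[[str, str], str],       # hypothetical: answer span from one passage
) -> str:
    """Pick the answer best supported across independently retrieved contexts."""
    queries = [question] + augment(question)
    contexts = [p for q in queries for p in retrieve(q)]
    # One candidate answer per retrieved context.
    candidates = [read(question, c) for c in contexts]
    # Confidence from Answer Redundancy: a candidate's score is the number of
    # retrieved contexts that contain it. A poisoned passage would have to
    # dominate many diverse retrievals to overturn the redundant evidence.
    def redundancy(answer: str) -> int:
        return sum(answer.lower() in c.lower() for c in contexts)
    return max(set(candidates), key=redundancy)
```

The key design choice is that confidence comes from agreement across retrievals rather than from the reader's own probabilities, which a single poisoned passage can dominate.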
Related papers
- Poisoning Retrieval Corpora by Injecting Adversarial Passages [79.14287273842878]
We propose a novel attack for dense retrieval systems in which a malicious user generates a small number of adversarial passages.
When these adversarial passages are inserted into a large retrieval corpus, we show that this attack is highly effective in fooling these systems.
We also benchmark and compare a range of state-of-the-art dense retrievers, both unsupervised and supervised.
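A toy embedding-space illustration of why a single passage can fool a DPR-style retriever: if the attacker places a passage embedding near the centroid of a query cluster, it outranks the entire corpus. The synthetic vectors and the centroid shortcut below are illustrative assumptions; the paper's attack optimizes discrete passage tokens rather than planting an embedding directly.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 128

# Toy dense retriever: rank passages by cosine similarity to the query.
# Queries about one topic cluster tightly in embedding space (assumed here).
topic = rng.normal(size=dim)
queries = topic + 0.1 * rng.normal(size=(100, dim))
corpus = rng.normal(size=(10_000, dim))

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

queries, corpus = normalize(queries), normalize(corpus)

# The attacker inserts ONE passage whose embedding sits at the centroid of
# the query cluster -- the cosine-maximizing direction for the whole cluster.
adversarial = normalize(queries.mean(axis=0))
poisoned = np.vstack([corpus, adversarial])

scores = queries @ poisoned.T            # (100 queries) x (10,001 passages)
hit_rate = (scores.argmax(axis=1) == len(corpus)).mean()
print(f"the single injected passage is the top hit for {hit_rate:.0%} of queries")
```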
arXiv Detail & Related papers (2023-10-29T21:13:31Z)
- Mitigating Backdoor Poisoning Attacks through the Lens of Spurious Correlation [43.75579468533781]
Backdoors can be implanted by crafting training instances with a specific trigger and a target label.
This paper posits that backdoor poisoning attacks exhibit a spurious correlation between simple text features and classification labels.
Our empirical study reveals that the malicious triggers are highly correlated with their target labels.
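That observation suggests a simple lexical screen, sketched below: rank tokens by how close their empirical label distribution is to a point mass on one label. The function name and thresholds are illustrative, not the paper's.

```python
from collections import Counter, defaultdict

def flag_trigger_candidates(texts, labels, min_count=20, purity=0.99):
    """Flag tokens whose presence almost perfectly predicts one label.

    A minimal stand-in for a spurious-correlation analysis: genuine lexical
    features correlate only loosely with labels, while an implanted backdoor
    trigger co-occurs with its target label nearly 100% of the time.
    """
    token_total = Counter()
    token_label = defaultdict(Counter)
    for text, label in zip(texts, labels):
        for tok in set(text.lower().split()):
            token_total[tok] += 1
            token_label[tok][label] += 1
    suspects = []
    for tok, n in token_total.items():
        if n < min_count:  # skip rare tokens, where purity is uninformative
            continue
        label, top = token_label[tok].most_common(1)[0]
        if top / n >= purity:
            suspects.append((tok, label, top / n))
    return sorted(suspects, key=lambda s: -s[2])
```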
arXiv Detail & Related papers (2023-05-19T11:18:20Z)
- TASA: Deceiving Question Answering Models by Twin Answer Sentences Attack [93.50174324435321]
We present Twin Answer Sentences Attack (TASA), an adversarial attack method for question answering (QA) models.
TASA produces fluent and grammatical adversarial contexts while maintaining gold answers.
arXiv Detail & Related papers (2022-10-27T07:16:30Z)
- Towards A Conceptually Simple Defensive Approach for Few-shot classifiers Against Adversarial Support Samples [107.38834819682315]
We study a conceptually simple approach to defend few-shot classifiers against adversarial attacks.
We propose a simple attack-agnostic detection method, using the concept of self-similarity and filtering.
Our evaluation on the miniImageNet (MI) and CUB datasets exhibits good attack detection performance.
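One plausible reading of "self-similarity and filtering", sketched with a hypothetical `embed` encoder and a list of filtering functions: benign inputs keep similar embeddings under mild filtering, while adversarial perturbations tend to be fragile to it. This is an illustration of the intuition, not the paper's exact detector.

```python
import numpy as np

def self_similarity(embed, sample, filters):
    """Mean cosine similarity between a sample's embedding and its filtered copies.

    `embed` (encoder) and `filters` (e.g., blur, resize) are hypothetical
    stand-ins; adversarial support samples are expected to score lower.
    """
    z = embed(sample)
    sims = []
    for f in filters:
        z_f = embed(f(sample))
        sims.append(float(z @ z_f / (np.linalg.norm(z) * np.linalg.norm(z_f))))
    return float(np.mean(sims))

def flag_support_sample(embed, sample, filters, threshold):
    """Reject a support sample whose self-similarity falls below a
    threshold calibrated on clean data."""
    return self_similarity(embed, sample, filters) < threshold
```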
arXiv Detail & Related papers (2021-10-24T05:46:03Z)
- ADC: Adversarial attacks against object Detection that evade Context consistency checks [55.8459119462263]
We show that even context consistency checks can be brittle to properly crafted adversarial examples.
We propose an adaptive framework to generate examples that subvert such defenses.
Our results suggest that how to robustly model context and check its consistency is still an open problem.
arXiv Detail & Related papers (2021-10-24T00:25:09Z)
- TREATED: Towards Universal Defense against Textual Adversarial Attacks [28.454310179377302]
We propose TREATED, a universal adversarial detection method that can defend against attacks of various perturbation levels without making any assumptions.
Extensive experiments on three competitive neural networks and two widely used datasets show that our method achieves better detection performance than baselines.
arXiv Detail & Related papers (2021-09-13T03:31:20Z)
- BAARD: Blocking Adversarial Examples by Testing for Applicability, Reliability and Decidability [12.079529913120593]
Adversarial defenses protect machine learning models from adversarial attacks, but are often tailored to one type of model or attack.
We take inspiration from the concept of Applicability Domain in cheminformatics.
We propose a simple yet robust triple-stage data-driven framework that checks the input globally and locally.
arXiv Detail & Related papers (2021-05-02T15:24:33Z)
- Defensive Few-shot Learning [77.82113573388133]
This paper investigates a new challenging problem called defensive few-shot learning.
It aims to learn a robust few-shot model against adversarial attacks.
The proposed framework can effectively make the existing few-shot models robust against adversarial attacks.
arXiv Detail & Related papers (2019-11-16T05:57:16Z)