Verifying the Robustness of Automatic Credibility Assessment
- URL: http://arxiv.org/abs/2303.08032v2
- Date: Fri, 11 Aug 2023 09:59:07 GMT
- Title: Verifying the Robustness of Automatic Credibility Assessment
- Authors: Piotr Przybyła, Alexander Shvets, Horacio Saggion
- Abstract summary: Text classification methods have been widely investigated as a way to detect content of low credibility.
In some cases insignificant changes in input text can mislead the models.
We introduce BODEGA: a benchmark for testing both victim models and attack methods on misinformation detection tasks.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text classification methods have been widely investigated as a way to detect
content of low credibility: fake news, social media bots, propaganda, etc.
Quite accurate models (likely based on deep neural networks) help in moderating
public electronic platforms and often cause content creators to face rejection
of their submissions or removal of already published texts. Having the
incentive to evade further detection, content creators try to come up with a
slightly modified version of the text (known as an attack with an adversarial
example) that exploits the weaknesses of classifiers and results in a different
output. Here we systematically test the robustness of popular text classifiers
against available attacking techniques and discover that, indeed, in some cases
insignificant changes in input text can mislead the models. We also introduce
BODEGA: a benchmark for testing both victim models and attack methods on four
misinformation detection tasks in an evaluation framework designed to simulate
real use-cases of content moderation. Finally, we manually analyse a subset of
adversarial examples and check what kinds of modifications are used in
successful attacks. The BODEGA code and data are openly shared in the hope of
enhancing the comparability and replicability of further research in this area.
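For readers unfamiliar with the attack setting, the sketch below illustrates the kind of evaluation loop the abstract describes: a victim classifier is queried with slightly modified versions of a text until its prediction flips, under a query budget that mimics probing a real content-moderation system. It is only a toy illustration, not the BODEGA code; the keyword-based victim, the synonym table, and the sample text are made-up assumptions.

```python
# Minimal sketch of an adversarial-example robustness check for a text classifier.
# NOTE: the keyword-based victim model, the synonym table, and the sample text
# below are illustrative assumptions, not the actual BODEGA victims or data.

# Hypothetical "victim": a toy credibility classifier
# (1 = low credibility, 0 = credible).
SUSPICIOUS = {"shocking", "miracle", "exposed", "hoax"}

def victim_predict(text: str) -> int:
    tokens = (t.strip(".,!?") for t in text.lower().split())
    return int(any(t in SUSPICIOUS for t in tokens))

# Hypothetical word substitutions an attacker might try.
SYNONYMS = {
    "shocking": ["startling", "surprising"],
    "miracle": ["remarkable", "wonder"],
    "exposed": ["revealed", "uncovered"],
}

def word_substitution_attack(text: str, query_budget: int = 20):
    """Try single-word swaps until the victim's label flips or the query
    budget is exhausted (simulating the limited queries available when
    probing a deployed content-moderation model)."""
    original_label = victim_predict(text)
    queries = 1  # the initial prediction also counts as a query
    words = text.split()
    for i, word in enumerate(words):
        for candidate in SYNONYMS.get(word.lower().strip(".,!?"), []):
            if queries >= query_budget:
                return None, queries
            perturbed = " ".join(words[:i] + [candidate] + words[i + 1:])
            queries += 1
            if victim_predict(perturbed) != original_label:
                return perturbed, queries  # successful adversarial example
    return None, queries

if __name__ == "__main__":
    sample = "Miracle cure revealed by anonymous insider"
    adversarial, used = word_substitution_attack(sample)
    print("original label:", victim_predict(sample))
    if adversarial is not None:
        print(f"label flipped after {used} queries: {adversarial!r}")
    else:
        print(f"no flip found within the query budget ({used} queries used)")
```

In the paper itself the victims are trained neural text classifiers and the attacks are existing adversarial attack methods; the sketch only mirrors the query-and-flip structure and success criterion of such an evaluation.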
Related papers
- Attacking Misinformation Detection Using Adversarial Examples Generated by Language Models
We investigate the challenge of generating adversarial examples to test the robustness of text classification algorithms.
We focus on simulating content moderation by setting realistic limits on the number of queries an attacker is allowed to attempt.
arXiv Detail & Related papers (2024-10-28T11:46:30Z)
- On Adversarial Examples for Text Classification by Perturbing Latent Representations
We show that deep learning is vulnerable to adversarial examples in text classification.
This weakness indicates that deep learning is not very robust.
We create a framework that measures the robustness of a text classifier by using the gradients of the classifier.
arXiv Detail & Related papers (2024-05-06T18:45:18Z)
- Forging the Forger: An Attempt to Improve Authorship Verification via Data Augmentation
Authorship Verification (AV) is a text classification task concerned with inferring whether a candidate text has been written by one specific author or by someone else.
It has been shown that many AV systems are vulnerable to adversarial attacks, where a malicious author actively tries to fool the classifier by either concealing their writing style or by imitating the style of another author.
arXiv Detail & Related papers (2024-03-17T16:36:26Z)
- Improving the Robustness of Summarization Systems with Dual Augmentation
A robust summarization system should be able to capture the gist of the document, regardless of the specific word choices or noise in the input.
We first explore the summarization models' robustness against perturbations including word-level synonym substitution and noise.
We propose SummAttacker, an efficient approach to generating adversarial samples based on language models.
arXiv Detail & Related papers (2023-06-01T19:04:17Z)
- A Unified Evaluation of Textual Backdoor Learning: Frameworks and Benchmarks
We develop OpenBackdoor, an open-source toolkit to foster the implementation and evaluation of textual backdoor learning.
We also propose CUBE, a simple yet strong clustering-based defense baseline.
arXiv Detail & Related papers (2022-06-17T02:29:23Z)
- Detection of Word Adversarial Examples in Text Classification: Benchmark and Baseline via Robust Density Estimation
We release a dataset for four popular attack methods on four datasets and four models.
We propose a competitive baseline based on density estimation that has the highest AUC on 29 out of 30 dataset-attack-model combinations.
arXiv Detail & Related papers (2022-03-03T12:32:59Z)
- AES Systems Are Both Overstable And Oversensitive: Explaining Why And Proposing Defenses
We investigate the reason behind the surprising adversarial brittleness of scoring models.
Our results indicate that autoscoring models, despite being trained as "end-to-end" models, behave like bag-of-words models.
We propose detection-based protection models that can detect oversensitivity- and overstability-causing samples with high accuracy.
arXiv Detail & Related papers (2021-09-24T03:49:38Z)
- Detecting Cross-Modal Inconsistency to Defend Against Neural Fake News
We introduce the more realistic and challenging task of defending against machine-generated news that also includes images and captions.
To identify the possible weaknesses that adversaries can exploit, we create a NeuralNews dataset composed of 4 different types of generated articles.
In addition to the valuable insights gleaned from our user study experiments, we provide a relatively effective approach based on detecting visual-semantic inconsistencies.
arXiv Detail & Related papers (2020-09-16T14:13:15Z)
- TextDecepter: Hard Label Black Box Attack on Text Classifiers
We present a novel approach for hard-label black-box attacks against Natural Language Processing (NLP) classifiers.
Such an attack scenario applies to real-world black-box models being used for security-sensitive applications such as sentiment analysis and toxic content detection.
arXiv Detail & Related papers (2020-08-16T08:57:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.