Adversarial Style Augmentation via Large Language Model for Robust Fake News Detection
- URL: http://arxiv.org/abs/2406.11260v2
- Date: Mon, 22 Jul 2024 11:56:44 GMT
- Title: Adversarial Style Augmentation via Large Language Model for Robust Fake News Detection
- Authors: Sungwon Park, Sungwon Han, Meeyoung Cha
- Abstract summary: This study proposes adversarial style augmentation, AdStyle, to train a fake news detector.
Our model's key mechanism is the careful use of LLMs to automatically generate a diverse yet coherent range of style-conversion attack prompts.
Experiments show that our augmentation strategy improves robustness and detection performance when tested on fake news benchmark datasets.
- Score: 18.998947450697337
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The spread of fake news negatively impacts individuals and is regarded as a significant social challenge that needs to be addressed. A number of algorithmic and insightful features have been identified for detecting fake news. However, with the recent LLMs and their advanced generation capabilities, many of the detectable features (e.g., style-conversion attacks) can be altered, making it more challenging to distinguish from real news. This study proposes adversarial style augmentation, AdStyle, to train a fake news detector that remains robust against various style-conversion attacks. Our model's key mechanism is the careful use of LLMs to automatically generate a diverse yet coherent range of style-conversion attack prompts. This improves the generation of prompts that are particularly difficult for the detector to handle. Experiments show that our augmentation strategy improves robustness and detection performance when tested on fake news benchmark datasets.
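The abstract describes the core loop: use an LLM to generate candidate style-conversion attack prompts, keep the prompts the detector handles worst, and retrain on the resulting rewrites. A minimal sketch of that selection loop follows; `llm_style_rewrite` and `detector_confidence` are hypothetical stubs standing in for a real LLM call and a real trained classifier, and the selection criterion (lowest average detector confidence) is an assumption about the paper's "particularly difficult for the detector" objective, not a confirmed implementation detail.

```python
# Hypothetical stand-in for an LLM call that rewrites `text` in the style
# named by `prompt`. A real system would query an actual LLM here.
def llm_style_rewrite(text: str, prompt: str) -> str:
    return f"[{prompt}] {text}"

# Hypothetical detector confidence: probability assigned to the correct
# label. A real system would run the trained fake news classifier.
def detector_confidence(text: str) -> float:
    return (sum(map(ord, text)) % 100) / 100.0

def adversarial_augment(articles, style_prompts, k=2):
    """Pick the k style-conversion prompts the detector handles worst,
    then return rewritten copies of each article for retraining."""
    scored = []
    for prompt in style_prompts:
        rewrites = [llm_style_rewrite(a, prompt) for a in articles]
        # Lower average confidence means a harder, more useful attack prompt.
        avg_conf = sum(detector_confidence(r) for r in rewrites) / len(rewrites)
        scored.append((avg_conf, prompt))
    hardest = [p for _, p in sorted(scored)[:k]]
    # Augment the training set with rewrites under the hardest prompts.
    return [llm_style_rewrite(a, p) for p in hardest for a in articles]

articles = ["Stocks fell sharply on Monday.", "A new vaccine was approved."]
prompts = ["tabloid style", "formal newswire style", "social-media style"]
# 2 articles x 2 hardest prompts -> 4 augmented training examples.
augmented = adversarial_augment(articles, prompts, k=2)
```

In a full pipeline this loop would alternate with detector training, so the prompt pool keeps adapting to whatever the current detector finds easy.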
Related papers
- Exploring the Deceptive Power of LLM-Generated Fake News: A Study of Real-World Detection Challenges [21.425647152424585]
We propose a strong fake news attack method called conditional Variational-autoencoder-Like Prompt (VLPrompt).
Unlike current methods, VLPrompt eliminates the need for additional data collection while maintaining contextual coherence.
Our experiments, including various detection methods and novel human study metrics, were conducted to assess their performance on our dataset.
arXiv Detail & Related papers (2024-03-27T04:39:18Z) - Token-Level Adversarial Prompt Detection Based on Perplexity Measures and Contextual Information [67.78183175605761]
Large Language Models are susceptible to adversarial prompt attacks.
This vulnerability underscores a significant concern regarding the robustness and reliability of LLMs.
We introduce a novel approach to detecting adversarial prompts at a token level.
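The idea of token-level detection via perplexity can be sketched with a toy unigram language model: adversarial prompt suffixes tend to contain tokens that are very unlikely under a language model, so per-token surprisal spikes on them. This is an illustrative assumption-laden stand-in; the paper's actual method uses per-token log-probabilities from a real LLM plus contextual information, and the corpus, threshold, and `flag_adversarial_tokens` helper below are invented for the example.

```python
import math
from collections import Counter

# Toy corpus standing in for a language model's training distribution.
CORPUS = ("the cat sat on the mat the dog sat on the rug "
          "the cat and the dog ran on the mat").split()
COUNTS = Counter(CORPUS)
TOTAL = len(CORPUS)

def token_surprisal(token: str) -> float:
    """Negative log2-probability under a unigram model with add-one smoothing."""
    p = (COUNTS[token] + 1) / (TOTAL + len(COUNTS) + 1)
    return -math.log2(p)

def flag_adversarial_tokens(tokens, threshold=4.5):
    """Flag tokens whose surprisal exceeds the threshold; adversarial
    suffixes are often built from tokens the LM finds very unlikely."""
    return [t for t in tokens if token_surprisal(t) > threshold]

benign = "the cat sat on the mat".split()
attacked = "the cat sat on the mat xq7 zzkw".split()
# Only the out-of-distribution suffix tokens exceed the threshold.
flagged = flag_adversarial_tokens(attacked)
```

Working at the token level, rather than scoring the whole prompt, localizes which part of the input looks adversarial.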
arXiv Detail & Related papers (2023-11-20T03:17:21Z) - Adapting Fake News Detection to the Era of Large Language Models [48.5847914481222]
We study the interplay between machine-(paraphrased) real news, machine-generated fake news, human-written fake news, and human-written real news.
Our experiments reveal an interesting pattern: detectors trained exclusively on human-written articles can perform well at detecting machine-generated fake news, but not vice versa.
arXiv Detail & Related papers (2023-11-02T08:39:45Z) - FakeGPT: Fake News Generation, Explanation and Detection of Large Language Models [18.543917359268345]
ChatGPT has gained significant attention due to its exceptional natural language processing capabilities.
We employ four prompt methods to generate fake news samples and prove the high quality of these samples through both self-assessment and human evaluation.
We examine ChatGPT's capacity to identify fake news and propose a reason-aware prompt method to improve its performance.
arXiv Detail & Related papers (2023-10-08T07:01:07Z) - MGTBench: Benchmarking Machine-Generated Text Detection [54.81446366272403]
This paper proposes the first benchmark framework for MGT detection against powerful large language models (LLMs).
We show that, in general, a larger number of words leads to better performance, and that most detection methods can achieve similar performance with far fewer training samples.
Our findings indicate that the model-based detection methods still perform well in the text attribution task.
arXiv Detail & Related papers (2023-03-26T21:12:36Z) - Verifying the Robustness of Automatic Credibility Assessment [79.08422736721764]
Text classification methods have been widely investigated as a way to detect content of low credibility.
In some cases, insignificant changes in the input text can mislead the models.
We introduce BODEGA: a benchmark for testing both victim models and attack methods on misinformation detection tasks.
arXiv Detail & Related papers (2023-03-14T16:11:47Z) - Multiverse: Multilingual Evidence for Fake News Detection [71.51905606492376]
Multiverse is a new feature based on multilingual evidence that can be used for fake news detection.
The hypothesis that cross-lingual evidence can serve as a feature for fake news detection is confirmed.
arXiv Detail & Related papers (2022-11-25T18:24:17Z) - A Multi-Policy Framework for Deep Learning-Based Fake News Detection [0.31498833540989407]
This work introduces Multi-Policy Statement Checker (MPSC), a framework that automates fake news detection.
MPSC uses deep learning techniques to analyze a statement itself and its related news articles, predicting whether it is seemingly credible or suspicious.
arXiv Detail & Related papers (2022-06-01T21:25:21Z) - "That Is a Suspicious Reaction!": Interpreting Logits Variation to Detect NLP Adversarial Attacks [0.2999888908665659]
Adversarial attacks are a major challenge faced by current machine learning research.
Our work presents a model-agnostic detector of adversarial text examples.
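A model-agnostic detector of this kind can be illustrated by measuring how much a classifier's output swings under small input perturbations: adversarial texts often hinge on a few fragile tokens, so deleting single words moves the score far more than it does for natural text. This is a simplified sketch of that general idea, not the paper's exact procedure; `victim_score` is a toy stand-in for any black-box classifier, and the word-deletion perturbation and threshold are assumptions made for the example.

```python
# Hypothetical victim classifier: returns a score in [0, 1] for the
# "fake" class. A real detector would wrap any black-box model.
def victim_score(text: str) -> float:
    # Toy rule: the score rises with the count of sensational words.
    sensational = {"shocking", "miracle", "exposed"}
    hits = sum(w in sensational for w in text.lower().split())
    return min(1.0, 0.2 + 0.3 * hits)

def logit_variation(text: str) -> float:
    """Largest score change when any single word is deleted. Adversarial
    inputs that rely on a few fragile tokens swing more under such
    perturbations than natural text does."""
    base = victim_score(text)
    words = text.split()
    deltas = []
    for i in range(len(words)):
        perturbed = " ".join(words[:i] + words[i + 1:])
        deltas.append(abs(victim_score(perturbed) - base))
    return max(deltas)

# A plain sentence barely moves; one hinging on a single trigger word swings.
plain = "the council approved the annual budget on tuesday"
fragile = "shocking report reveals budget details"
```

Because only the model's outputs are queried, the same probe applies unchanged to any classifier, which is what makes the approach model-agnostic.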
arXiv Detail & Related papers (2022-04-10T09:24:41Z) - Faking Fake News for Real Fake News Detection: Propaganda-loaded Training Data Generation [105.20743048379387]
We propose a novel framework for generating training examples informed by the known styles and strategies of human-authored propaganda.
Specifically, we perform self-critical sequence training guided by natural language inference to ensure the validity of the generated articles.
Our experimental results show that fake news detectors trained on PropaNews are better at detecting human-written disinformation by 3.62% to 7.69% in F1 score on two public datasets.
arXiv Detail & Related papers (2022-03-10T14:24:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.