Robust Fake News Detection using Large Language Models under Adversarial Sentiment Attacks
- URL: http://arxiv.org/abs/2601.15277v1
- Date: Wed, 21 Jan 2026 18:56:49 GMT
- Title: Robust Fake News Detection using Large Language Models under Adversarial Sentiment Attacks
- Authors: Sahar Tahmasebi, Eric Müller-Budack, Ralph Ewerth
- Abstract summary: AdSent is a sentiment-robust detection framework designed to ensure consistent predictions across both original and sentiment-altered news articles. We show that changing the sentiment heavily impacts the performance of fake news detection models. We introduce a novel sentiment-agnostic training strategy that enhances robustness against such perturbations.
- Score: 7.075749925221166
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Misinformation and fake news have become a pressing societal challenge, driving the need for reliable automated detection methods. Prior research has highlighted sentiment as an important signal in fake news detection, either by analyzing which sentiments are associated with fake news or by using sentiment and emotion features for classification. However, this poses a vulnerability: adversaries can manipulate sentiment to evade detectors, especially with the advent of large language models (LLMs). A few studies have explored adversarial samples generated by LLMs, but they mainly focus on stylistic features such as the writing style of news publishers; the crucial vulnerability of sentiment manipulation thus remains largely unexplored. In this paper, we investigate the robustness of state-of-the-art fake news detectors under sentiment manipulation. We introduce AdSent, a sentiment-robust detection framework designed to ensure consistent veracity predictions across both original and sentiment-altered news articles. Specifically, we (1) propose controlled sentiment-based adversarial attacks using LLMs, and (2) analyze the impact of sentiment shifts on detection performance. We show that changing the sentiment heavily impacts the performance of fake news detection models, revealing a bias toward classifying neutral articles as real and non-neutral articles as fake. (3) We introduce a novel sentiment-agnostic training strategy that enhances robustness against such perturbations. Extensive experiments on three benchmark datasets demonstrate that AdSent significantly outperforms competitive baselines in both accuracy and robustness, while also generalizing effectively to unseen datasets and adversarial scenarios.
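The abstract does not disclose AdSent's actual training objective, but one plausible realization of "consistent veracity predictions across both original and sentiment-altered news articles" is a supervised loss on both views plus a consistency penalty on prediction drift. The sketch below is a hypothetical PyTorch illustration under that assumption; the function name, loss form, and `consistency_weight` are inventions for illustration, not the paper's published method.

```python
# Hypothetical sketch (NOT the authors' code): a sentiment-agnostic
# training loss that penalizes prediction drift between an article and
# its sentiment-altered counterpart.
import torch
import torch.nn.functional as F


def sentiment_agnostic_loss(
    logits_orig: torch.Tensor,   # detector logits for original articles, (B, 2)
    logits_pert: torch.Tensor,   # logits for sentiment-altered versions, (B, 2)
    labels: torch.Tensor,        # veracity labels: 0 = real, 1 = fake, (B,)
    consistency_weight: float = 1.0,  # assumed hyperparameter
) -> torch.Tensor:
    # Supervise both views with the same veracity label.
    ce = F.cross_entropy(logits_orig, labels) + F.cross_entropy(logits_pert, labels)
    # Symmetric KL between the two predictive distributions, so neither
    # view acts as a fixed teacher for the other.
    log_p = F.log_softmax(logits_orig, dim=-1)
    log_q = F.log_softmax(logits_pert, dim=-1)
    consistency = 0.5 * (
        F.kl_div(log_p, log_q, reduction="batchmean", log_target=True)
        + F.kl_div(log_q, log_p, reduction="batchmean", log_target=True)
    )
    return ce + consistency_weight * consistency


# Toy usage with random logits standing in for a fine-tuned detector:
if __name__ == "__main__":
    logits_orig, logits_pert = torch.randn(4, 2), torch.randn(4, 2)
    labels = torch.randint(0, 2, (4,))
    print(sentiment_agnostic_loss(logits_orig, logits_pert, labels))
```

The symmetric form mirrors the framing above: the detector should assign the same veracity distribution regardless of an article's sentiment, rather than anchoring on either view.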
Related papers
- FactGuard: Event-Centric and Commonsense-Guided Fake News Detection [9.397476786006111]
Large language models (LLMs) are an untapped goldmine for fake news detection.<n>We propose a novel fake news detection framework, dubbed FactGuard, that leverages LLMs to extract event-centric content.<n>Our approach consistently outperforms existing methods in both robustness and accuracy.
arXiv Detail & Related papers (2025-11-13T13:11:42Z)
- Illusions of Relevance: Using Content Injection Attacks to Deceive Retrievers, Rerankers, and LLM Judges [52.96987928118327]
We find that embedding models for retrieval, rerankers, and large language model (LLM) relevance judges are vulnerable to content injection attacks. We identify two primary threats: (1) inserting unrelated or harmful content within passages that still appear deceptively "relevant", and (2) inserting entire queries or key query terms into passages to boost their perceived relevance. Our study systematically examines the factors that influence an attack's success, such as the placement of injected content and the balance between relevant and non-relevant material. (A toy sketch of these injections appears after this list.)
arXiv Detail & Related papers (2025-01-30T18:02:15Z)
- Fake News Detection and Manipulation Reasoning via Large Vision-Language Models [38.457805116130004]
This paper introduces a benchmark for fake news detection and manipulation reasoning, referred to as Human-centric and Fact-related Fake News (HFFN).
The benchmark highlights the centrality of human subjects and high factual relevance, with detailed manual annotations.
A Multi-modal news Detection and Reasoning langUage Model (M-DRUM) is presented not only to judge the authenticity of multi-modal news but also to provide analytical reasoning about potential manipulations.
arXiv Detail & Related papers (2024-07-02T08:16:43Z)
- Adversarial Style Augmentation via Large Language Model for Robust Fake News Detection [48.545082903061136]
This study proposes adversarial style augmentation, AdStyle, designed to train a fake news detector. The primary mechanism involves the strategic use of LLMs to automatically generate a diverse and coherent array of style-conversion attack prompts. Experiments indicate that our augmentation strategy significantly improves robustness and detection performance when evaluated on fake news benchmark datasets. (A hedged sketch of such style-conversion prompting appears after this list.)
arXiv Detail & Related papers (2024-06-17T07:00:41Z)
- Adapting Fake News Detection to the Era of Large Language Models [48.5847914481222]
We study the interplay between machine-(paraphrased) real news, machine-generated fake news, human-written fake news, and human-written real news.
Our experiments reveal an interesting pattern: detectors trained exclusively on human-written articles can indeed perform well at detecting machine-generated fake news, but not vice versa.
arXiv Detail & Related papers (2023-11-02T08:39:45Z)
- Prompt-and-Align: Prompt-Based Social Alignment for Few-Shot Fake News Detection [50.07850264495737]
"Prompt-and-Align" (P&A) is a novel prompt-based paradigm for few-shot fake news detection.
We show that P&A sets a new state of the art for few-shot fake news detection performance by significant margins.
arXiv Detail & Related papers (2023-09-28T13:19:43Z)
- Verifying the Robustness of Automatic Credibility Assessment [50.55687778699995]
We show that meaning-preserving changes in input text can mislead the models.
We also introduce BODEGA: a benchmark for testing both victim models and attack methods on misinformation detection tasks.
Our experimental results show that modern large language models are often more vulnerable to attacks than previous, smaller solutions.
arXiv Detail & Related papers (2023-03-14T16:11:47Z)
- An Adversarial Benchmark for Fake News Detection Models [0.065268245109828]
We formulate adversarial attacks that target three aspects of "understanding".
We test our benchmark using BERT classifiers fine-tuned on the LIAR (arXiv:1705.00648) and Kaggle Fake-News datasets.
arXiv Detail & Related papers (2022-01-03T23:51:55Z)
- Detecting Cross-Modal Inconsistency to Defend Against Neural Fake News [57.9843300852526]
We introduce the more realistic and challenging task of defending against machine-generated news that also includes images and captions.
To identify the possible weaknesses that adversaries can exploit, we create the NeuralNews dataset, composed of four different types of generated articles.
In addition to the valuable insights gleaned from our user study experiments, we provide a relatively effective approach based on detecting visual-semantic inconsistencies.
arXiv Detail & Related papers (2020-09-16T14:13:15Z)
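As a toy illustration of the two threats described in the "Illusions of Relevance" entry above (not the paper's attack code), the snippet below constructs injected passages. Scoring them with a retriever, reranker, or LLM judge is omitted, since the summary gives no evaluation details; all helper names and example strings are invented for this sketch.

```python
# Toy illustration of the two content injection threats named above;
# helper names and example strings are invented for this sketch.

def inject_query_terms(passage: str, query: str, position: str = "start") -> str:
    """Threat 2: insert the full query into a passage to inflate relevance."""
    return f"{query}. {passage}" if position == "start" else f"{passage} {query}."


def inject_unrelated_content(passage: str, payload: str) -> str:
    """Threat 1: embed unrelated or harmful content mid-passage while the
    passage still appears deceptively 'relevant'."""
    cut = passage.rfind(" ", 0, len(passage) // 2)  # split at a word boundary
    return f"{passage[:cut]} {payload}{passage[cut:]}"


query = "symptoms of vitamin D deficiency"
passage = "Fatigue and bone pain are commonly reported in clinical surveys."
print(inject_query_terms(passage, query))
print(inject_unrelated_content(passage, "Buy discount supplements at totally-legit.example!"))
```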
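For the AdStyle entry above, here is a hedged sketch of what LLM-driven style-conversion attack prompting could look like. The prompt wording and style list are guesses, not AdStyle's published prompts, and `llm` stands for any text-in/text-out completion callable.

```python
# Hedged sketch of style-conversion attack prompting (illustrative, not
# AdStyle's actual prompts). `llm` is any text-in/text-out callable,
# e.g. a thin wrapper around a chat-completions API.
from typing import Callable

STYLES = ["tabloid", "press release", "neutral wire service", "opinion column"]


def style_conversion_attack(article: str, style: str, llm: Callable[[str], str]) -> str:
    prompt = (
        f"Rewrite the following news article in a {style} writing style. "
        "Preserve every factual claim; change only tone, diction, and structure.\n\n"
        f"Article: {article}"
    )
    return llm(prompt)


# Offline stand-in LLM so the sketch runs without network access: it just
# echoes the article portion of the prompt back unchanged.
def echo_llm(prompt: str) -> str:
    return prompt.split("Article: ", 1)[1]


for style in STYLES:
    print(style, "->", style_conversion_attack(
        "Officials confirmed the bridge reopened today.", style, echo_llm))
```

Augmenting training data with such rewrites, each keeping the original veracity label, is the general idea behind adversarial style augmentation: the detector sees the same label across many surface styles.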