Real-time Factuality Assessment from Adversarial Feedback
- URL: http://arxiv.org/abs/2410.14651v3
- Date: Sun, 27 Jul 2025 14:23:31 GMT
- Title: Real-time Factuality Assessment from Adversarial Feedback
- Authors: Sanxing Chen, Yukun Huang, Bhuwan Dhingra
- Abstract summary: We show that evaluations assessing the factuality of news from conventional sources yield high accuracies over time for LLM-based detectors. We argue that a proper factuality evaluation dataset should instead test a model's ability to reason about current events by retrieving and reading related evidence.
- Score: 11.742257531343814
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We show that existing evaluations for assessing the factuality of news from conventional sources, such as claims on fact-checking websites, result in high accuracies over time for LLM-based detectors, even after their knowledge cutoffs. This suggests that recent popular false information from such sources can be easily identified due to its likely presence in pre-training/retrieval corpora or the emergence of salient, yet shallow, patterns in these datasets. Instead, we argue that a proper factuality evaluation dataset should test a model's ability to reason about current events by retrieving and reading related evidence. To this end, we develop a novel pipeline that leverages natural language feedback from a RAG-based detector to iteratively modify real-time news into deceptive variants that challenge LLMs. Our iterative rewrite decreases the binary classification ROC-AUC by an absolute 17.5 percent for a strong RAG-based GPT-4o detector. Our experiments reveal the important role of RAG in both evaluating and generating challenging news examples: retrieval-free LLM detectors are vulnerable to unseen events and adversarial attacks, while feedback from RAG-based evaluation helps discover more deceitful patterns.
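The pipeline described in the abstract is an adversarial loop: a RAG-based detector reads an article, returns a verdict plus natural-language feedback, and a generator rewrites the article to address that feedback until the detector is fooled. Below is a minimal reconstruction of that loop from the abstract alone; `rag_detect` and `rewrite` are hypothetical stand-ins for the RAG-based GPT-4o detector and the rewriting LLM, not the authors' code.

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    p_fake: float   # detector's estimated probability that the article is fake
    feedback: str   # natural-language rationale behind the verdict

def rag_detect(article: str) -> Verdict:
    """Hypothetical RAG-based detector: retrieve evidence related to the
    article, then ask an LLM (e.g., GPT-4o) to judge it and explain why."""
    return Verdict(p_fake=0.9, feedback="Claim contradicts retrieved reports.")

def rewrite(article: str, feedback: str) -> str:
    """Hypothetical generator: revise the article so that the detector's
    stated objections no longer apply."""
    return article  # placeholder

def adversarial_rewrite(article: str, max_iters: int = 5,
                        threshold: float = 0.5) -> str:
    """Iteratively turn a real-time news article into a deceptive variant,
    using the detector's feedback as the edit signal."""
    for _ in range(max_iters):
        verdict = rag_detect(article)
        if verdict.p_fake < threshold:  # detector no longer flags it; stop
            break
        article = rewrite(article, verdict.feedback)
    return article
```

Running such a loop over a stream of current news would produce the deceptive variants whose increased difficulty the abstract quantifies as a 17.5-point drop in detector ROC-AUC.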
Related papers
- RAG-Zeval: Towards Robust and Interpretable Evaluation on RAG Responses through End-to-End Rule-Guided Reasoning [64.46921169261852]
RAG-Zeval is a novel end-to-end framework that formulates faithfulness and correctness evaluation as a rule-guided reasoning task. Our approach trains evaluators with reinforcement learning, enabling compact models to generate comprehensive and sound assessments. Experiments demonstrate RAG-Zeval's superior performance, achieving the strongest correlation with human judgments.
arXiv Detail & Related papers (2025-05-28T14:55:33Z)
- Faithfulness-Aware Uncertainty Quantification for Fact-Checking the Output of Retrieval Augmented Generation [108.13261761812517]
We introduce FRANQ (Faithfulness-based Retrieval Augmented UNcertainty Quantification), a novel method for hallucination detection in RAG outputs. We present a new long-form Question Answering (QA) dataset annotated for both factuality and faithfulness.
arXiv Detail & Related papers (2025-05-27T11:56:59Z)
- Worse than Zero-shot? A Fact-Checking Dataset for Evaluating the Robustness of RAG Against Misleading Retrievals [3.9139847342664864]
We introduce RAGuard, a fact-checking dataset designed to evaluate the robustness of RAG systems against misleading retrievals. RAGuard categorizes retrieved evidence into three types: supporting, misleading, and irrelevant. Our benchmark experiments reveal that, when exposed to misleading retrievals, all tested LLM-powered RAG systems perform worse than their zero-shot baselines.
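To make the setup concrete, here is one way a typed-evidence benchmark like this could be scored. This is our illustrative sketch, not the RAGuard release; the `answer_*` functions are hypothetical placeholders for an LLM pipeline.

```python
from dataclasses import dataclass
from typing import Literal

EvidenceType = Literal["supporting", "misleading", "irrelevant"]

@dataclass
class Example:
    claim: str
    evidence: str
    evidence_type: EvidenceType   # RAGuard's three evidence categories
    label: bool                   # True if the claim is factual

def answer_zero_shot(claim: str) -> bool:
    return True  # placeholder: ask the LLM with no retrieved context

def answer_with_rag(claim: str, evidence: str) -> bool:
    return True  # placeholder: ask the LLM with the retrieved evidence

def accuracy_by_evidence_type(dataset: list[Example]) -> dict[str, dict[str, float]]:
    """Compare RAG vs. zero-shot accuracy per evidence category, exposing
    the 'worse than zero-shot' effect under misleading retrievals."""
    buckets: dict[str, dict[str, float]] = {}
    for etype in ("supporting", "misleading", "irrelevant"):
        subset = [ex for ex in dataset if ex.evidence_type == etype]
        if not subset:
            continue
        rag = sum(answer_with_rag(ex.claim, ex.evidence) == ex.label for ex in subset)
        zs = sum(answer_zero_shot(ex.claim) == ex.label for ex in subset)
        buckets[etype] = {"rag": rag / len(subset), "zero_shot": zs / len(subset)}
    return buckets
```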
arXiv Detail & Related papers (2025-02-22T05:50:15Z)
- Fake News Detection After LLM Laundering: Measurement and Explanation [0.7661534297488013]
Large Language Models (LLMs) can generate highly convincing and contextually relevant fake news.
This research measures the efficacy of detectors in identifying LLM-paraphrased fake news.
arXiv Detail & Related papers (2025-01-29T17:58:07Z)
- Time-Reversal Provides Unsupervised Feedback to LLMs [31.575024356581846]
Time Reversed Language Models (TRLMs) can score and generate queries when conditioned on responses. We show that TRLM scoring outperforms conventional forward scoring of a response given a query.
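In other words, instead of ranking candidate responses by log P(response | query) under a forward model, a TRLM ranks them by how well each response predicts the query. A minimal sketch of the two scoring directions follows (our reading of the summary; `logprob` is a hypothetical log-probability function for whatever LM backend is used).

```python
def logprob(model: str, context: str, continuation: str) -> float:
    """Hypothetical: sum of token log-probs of `continuation` given `context`."""
    return 0.0

def forward_score(query: str, response: str) -> float:
    # Conventional direction: how likely is the response given the query?
    return logprob("forward-lm", context=query, continuation=response)

def reversed_score(query: str, response: str) -> float:
    # TRLM direction: condition on the response and score the query.
    return logprob("time-reversed-lm", context=response, continuation=query)

def rerank(query: str, candidates: list[str]) -> list[str]:
    """Rank candidate responses by how well each one 'explains' the query."""
    return sorted(candidates, key=lambda r: reversed_score(query, r), reverse=True)
```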
arXiv Detail & Related papers (2024-12-03T17:54:12Z)
- RevPRAG: Revealing Poisoning Attacks in Retrieval-Augmented Generation through LLM Activation Analysis [3.706288937295861]
RevPRAG is a flexible and automated detection pipeline that leverages the activations of LLMs for poisoned-response detection. Our results on multiple benchmark datasets and RAG architectures show that our approach can achieve a 98% true positive rate while maintaining false positive rates close to 1%.
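A simple version of activation-based detection can be pictured as a linear probe over hidden states. The sketch below is our illustration in that spirit, not the RevPRAG pipeline; `get_activations` is a hypothetical hook into the serving LLM.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def get_activations(response: str) -> np.ndarray:
    """Hypothetical: mean-pooled hidden states from a chosen LLM layer,
    captured while the model generated `response`."""
    return np.zeros(4096)  # placeholder feature vector

def train_probe(responses: list[str], poisoned: list[int]) -> LogisticRegression:
    """Fit a linear probe that separates clean from poisoned responses."""
    X = np.stack([get_activations(r) for r in responses])
    return LogisticRegression(max_iter=1000).fit(X, poisoned)

def is_poisoned(probe: LogisticRegression, response: str) -> bool:
    """Flag a new response by running its activations through the probe."""
    return bool(probe.predict(get_activations(response)[None, :])[0])
```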
arXiv Detail & Related papers (2024-11-28T06:29:46Z)
- Toward Robust RALMs: Revealing the Impact of Imperfect Retrieval on Retrieval-Augmented Language Models [5.10832476049103]
We identify three common scenarios (unanswerable, adversarial, and conflicting) in which retrieved document sets can confuse a RALM, and illustrate each with plausible real-world examples.
We propose a new adversarial attack method, the Generative model-based ADVersarial attack (GenADV), and a novel metric, Robustness under Additional Document (RAD).
Our findings reveal that RALMs often fail to identify the unanswerability or contradiction of a document set, which frequently leads to hallucinations.
arXiv Detail & Related papers (2024-10-19T13:40:33Z)
- Controlling Risk of Retrieval-augmented Generation: A Counterfactual Prompting Framework [77.45983464131977]
We focus on how likely it is that a RAG model's prediction is incorrect, resulting in uncontrollable risks in real-world applications. Our research identifies two critical latent factors affecting RAG's confidence in its predictions. We develop a counterfactual prompting framework that induces the models to alter these factors and analyzes the effect on their answers.
arXiv Detail & Related papers (2024-09-24T14:52:14Z)
- RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework [66.93260816493553]
This paper introduces RAGEval, a framework designed to assess RAG systems across diverse scenarios. With a focus on factual accuracy, we propose three novel metrics: Completeness, Hallucination, and Irrelevance. Experimental results show that RAGEval outperforms zero-shot and one-shot methods in terms of the clarity, safety, conformity, and richness of generated samples.
arXiv Detail & Related papers (2024-08-02T13:35:11Z)
- Detect, Investigate, Judge and Determine: A Knowledge-guided Framework for Few-shot Fake News Detection [50.079690200471454]
Few-Shot Fake News Detection (FS-FND) aims to distinguish inaccurate news from real news in extremely low-resource scenarios.
This task has garnered increased attention due to the widespread dissemination and harmful impact of fake news on social media.
We propose a Dual-perspective Knowledge-guided Fake News Detection (DKFND) model, designed to enhance LLMs from both inside and outside perspectives.
arXiv Detail & Related papers (2024-07-12T03:15:01Z)
- Exploring the Deceptive Power of LLM-Generated Fake News: A Study of Real-World Detection Challenges [21.425647152424585]
We propose a strong fake news attack method called conditional Variational-autoencoder-Like Prompt (VLPrompt).
Unlike current methods, VLPrompt eliminates the need for additional data collection while maintaining contextual coherence.
We conduct experiments with various detection methods and novel human-study metrics to assess their performance on our dataset.
arXiv Detail & Related papers (2024-03-27T04:39:18Z)
- Adapting Fake News Detection to the Era of Large Language Models [48.5847914481222]
We study the interplay between machine-(paraphrased) real news, machine-generated fake news, human-written fake news, and human-written real news.
Our experiments reveal an interesting pattern: detectors trained exclusively on human-written articles can indeed perform well at detecting machine-generated fake news, but not vice versa.
arXiv Detail & Related papers (2023-11-02T08:39:45Z)
- ReEval: Automatic Hallucination Evaluation for Retrieval-Augmented Large Language Models via Transferable Adversarial Attacks [91.55895047448249]
This paper presents ReEval, an LLM-based framework using prompt chaining to perturb the original evidence for generating new test cases.
We implement ReEval using ChatGPT and evaluate the resulting variants of two popular open-domain QA datasets.
Our generated data is human-readable and useful for triggering hallucinations in large language models.
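Prompt chaining here means decomposing the evidence perturbation into sequential LLM calls. A minimal sketch of one such chain follows (our reconstruction from the summary; `chat` is a hypothetical wrapper around a chat model such as ChatGPT).

```python
def chat(prompt: str) -> str:
    return ""  # placeholder: a single LLM call

def perturb_evidence(question: str, evidence: str, answer: str) -> dict[str, str]:
    """Chain two prompts: first pick a fact in the evidence that the answer
    depends on, then rewrite the passage so it supports a different answer."""
    target = chat(
        f"Question: {question}\nEvidence: {evidence}\n"
        "Name one fact in the evidence that the answer depends on."
    )
    new_evidence = chat(
        f"Rewrite the evidence so that this fact changes: {target}\n"
        f"Evidence: {evidence}\nKeep the passage fluent and plausible."
    )
    return {"question": question, "evidence": new_evidence,
            "original_answer": answer}
```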
arXiv Detail & Related papers (2023-10-19T06:37:32Z)
- Prompt-and-Align: Prompt-Based Social Alignment for Few-Shot Fake News Detection [50.07850264495737]
"Prompt-and-Align" (P&A) is a novel prompt-based paradigm for few-shot fake news detection.
We show that P&A sets a new state of the art for few-shot fake news detection, outperforming prior approaches by significant margins.
arXiv Detail & Related papers (2023-09-28T13:19:43Z)
- Fake News Detectors are Biased against Texts Generated by Large Language Models [39.36284616311687]
The spread of fake news has emerged as a critical challenge, undermining trust and posing threats to society.
We present a novel paradigm to evaluate fake news detectors in scenarios involving both human-written and LLM-generated misinformation.
arXiv Detail & Related papers (2023-09-15T18:04:40Z)
- Benchmarking Large Language Models in Retrieval-Augmented Generation [53.504471079548]
We systematically investigate the impact of Retrieval-Augmented Generation on large language models.
We analyze the performance of different large language models in 4 fundamental abilities required for RAG.
We establish Retrieval-Augmented Generation Benchmark (RGB), a new corpus for RAG evaluation in both English and Chinese.
arXiv Detail & Related papers (2023-09-04T08:28:44Z)
- Multiverse: Multilingual Evidence for Fake News Detection [71.51905606492376]
Multiverse is a new feature based on multilingual evidence that can be used for fake news detection.
Our results confirm the hypothesis that cross-lingual evidence can serve as a useful feature for fake news detection.
arXiv Detail & Related papers (2022-11-25T18:24:17Z)
- A Multi-Policy Framework for Deep Learning-Based Fake News Detection [0.31498833540989407]
This work introduces Multi-Policy Statement Checker (MPSC), a framework that automates fake news detection.
MPSC uses deep learning techniques to analyze a statement and its related news articles, predicting whether the statement appears credible or suspicious.
arXiv Detail & Related papers (2022-06-01T21:25:21Z)
- Faking Fake News for Real Fake News Detection: Propaganda-loaded Training Data Generation [105.20743048379387]
We propose a novel framework for generating training examples informed by the known styles and strategies of human-authored propaganda.
Specifically, we perform self-critical sequence training guided by natural language inference to ensure the validity of the generated articles.
Our experimental results show that fake news detectors trained on PropaNews are better at detecting human-written disinformation, improving F1 score by 3.62% to 7.69% on two public datasets.
arXiv Detail & Related papers (2022-03-10T14:24:19Z)
- User Preference-aware Fake News Detection [61.86175081368782]
Existing fake news detection algorithms focus on mining news content for deceptive signals.
We propose a new framework, UPFD, which simultaneously captures various signals from user preferences by joint content and graph modeling.
arXiv Detail & Related papers (2021-04-25T21:19:24Z)
- Connecting the Dots Between Fact Verification and Fake News Detection [21.564628184287173]
We propose a simple yet effective approach to connect the dots between fact verification and fake news detection.
Our approach makes use of the recent success of fact verification models and enables zero-shot fake news detection.
arXiv Detail & Related papers (2020-10-11T09:28:52Z)
- Leveraging Multi-Source Weak Social Supervision for Early Detection of Fake News [67.53424807783414]
Social media has greatly enabled people to participate in online activities at an unprecedented rate.
However, this unrestricted access also exacerbates the spread of misinformation and fake news online, which can cause confusion and chaos unless it is detected early and mitigated.
We jointly leverage the limited amount of clean data along with weak signals from social engagements to train deep neural networks in a meta-learning framework to estimate the quality of different weak instances.
Experiments on real-world datasets demonstrate that the proposed framework outperforms state-of-the-art baselines for early detection of fake news without using any user engagements at prediction time.
arXiv Detail & Related papers (2020-04-03T18:26:33Z)
- Weak Supervision for Fake News Detection via Reinforcement Learning [34.448503443582396]
We propose a weakly supervised fake news detection framework, WeFEND.
The proposed framework consists of three main components: the annotator, the reinforced selector and the fake news detector.
We tested the proposed framework on a large collection of news articles published via WeChat official accounts and associated user reports.
arXiv Detail & Related papers (2019-12-28T21:20:25Z)