Fighting Fire with Fire: Adversarial Prompting to Generate a
Misinformation Detection Dataset
- URL: http://arxiv.org/abs/2401.04481v1
- Date: Tue, 9 Jan 2024 10:38:13 GMT
- Title: Fighting Fire with Fire: Adversarial Prompting to Generate a
Misinformation Detection Dataset
- Authors: Shrey Satapara, Parth Mehta, Debasis Ganguly, Sandip Modha
- Abstract summary: We propose an LLM-based approach to creating silver-standard ground-truth datasets for identifying misinformation.
Specifically, given a trusted news article, our approach prompts LLMs to automatically generate a summarised version of the original article.
To investigate the usefulness of this dataset, we conduct a set of experiments where we train a range of supervised models for the task of misinformation detection.
- Score: 10.860133543817659
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The recent success in the language generation capabilities of
large language models (LLMs), such as GPT, Bard, and Llama, raises concerns
about their possible misuse for inducing mass agitation and communal hatred
by generating fake news and spreading misinformation. Traditional means of
developing a misinformation ground-truth dataset do not scale well because of
the extensive manual effort required to annotate the data. In this paper, we
propose an LLM-based approach to creating silver-standard ground-truth
datasets for identifying misinformation. Specifically, given a trusted news
article, our approach prompts LLMs to automatically generate a summarised
version of the original article. The prompts act as a controlling mechanism
that induces specific types of factual incorrectness in the generated
summaries, e.g., incorrect quantities or false attributions. To investigate
the usefulness of this dataset, we conduct a set of experiments in which we
train a range of supervised models for the task of misinformation detection.
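As a rough illustration of the approach described in the abstract, the sketch below prompts an LLM to summarise a trusted article while deliberately injecting one controlled type of factual error (an incorrect quantity or a false attribution). It assumes an OpenAI-compatible chat API; the model name, prompt wording, and error taxonomy are illustrative placeholders rather than the authors' actual prompts.

```python
# Minimal sketch (not the authors' exact prompts): generate a silver-standard
# misinformation example by asking an LLM to summarise a trusted article while
# deliberately introducing one controlled type of factual error.
from openai import OpenAI  # assumes an OpenAI-compatible chat API

client = OpenAI()  # reads the API key from the environment

# Hypothetical error taxonomy mirroring the examples in the abstract.
ERROR_TYPES = {
    "incorrect_quantity": "change one numeric quantity (count, date, or amount) to a wrong value",
    "false_attribution": "attribute one quote or claim to the wrong person or organisation",
}

def generate_misinformation_summary(article_text: str, error_type: str,
                                     model: str = "gpt-4o-mini") -> str:
    """Prompt the LLM to summarise the article with one injected factual error."""
    instruction = (
        "Summarise the following news article in 3-4 sentences, but "
        f"{ERROR_TYPES[error_type]}. Keep everything else faithful to the article.\n\n"
        f"Article:\n{article_text}"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": instruction}],
    )
    return response.choices[0].message.content

# Pairing each trusted article with a faithful summary (label 0) and a
# perturbed summary (label 1) yields the training data for a supervised
# misinformation detector.
```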
Related papers
- Extracting Unlearned Information from LLMs with Activation Steering [46.16882599881247]
Unlearning has emerged as a solution to remove sensitive knowledge from models after training.
We propose activation steering as a method for exact information retrieval from unlearned models.
Our results demonstrate that exact information retrieval from unlearned models is possible, highlighting a severe vulnerability of current unlearning techniques.
arXiv Detail & Related papers (2024-11-04T21:42:56Z)
- Catching Chameleons: Detecting Evolving Disinformation Generated using Large Language Models [16.408611714514976]
We propose DELD (Detecting Evolving LLM-generated Disinformation), a parameter-efficient approach that jointly leverages the general fact-checking capabilities of pre-trained language models.
Our experiments show that DELD significantly outperforms state-of-the-art methods.
arXiv Detail & Related papers (2024-06-26T00:21:39Z)
- Are you still on track!? Catching LLM Task Drift with Activations [55.75645403965326]
Task drift allows attackers to exfiltrate data or influence the LLM's output for other users.
We show that a simple linear classifier can detect drift with near-perfect ROC AUC on an out-of-distribution test set.
We observe that this approach generalizes surprisingly well to unseen task domains, such as prompt injections, jailbreaks, and malicious instructions.
arXiv Detail & Related papers (2024-06-02T16:53:21Z)
- SPOT: Text Source Prediction from Originality Score Thresholding [6.790905400046194]
Countermeasures aimed at detecting misinformation usually involve domain-specific models trained to recognize the relevance of any information.
Instead of evaluating the validity of the information, we propose to investigate LLM-generated text from the perspective of trust.
arXiv Detail & Related papers (2024-05-30T21:51:01Z)
- ExaRanker-Open: Synthetic Explanation for IR using Open-Source LLMs [60.81649785463651]
We introduce ExaRanker-Open, where we adapt and explore the use of open-source language models to generate explanations.
Our findings reveal that incorporating explanations consistently enhances neural rankers, with benefits escalating as the LLM size increases.
arXiv Detail & Related papers (2024-02-09T11:23:14Z)
- Generative Context-aware Fine-tuning of Self-supervised Speech Models [54.389711404209415]
We study the use of context information generated by generative large language models (LLMs).
We propose an approach to distill the generated information during fine-tuning of self-supervised speech models.
We evaluate the proposed approach using the SLUE and Libri-light benchmarks for several downstream tasks: automatic speech recognition, named entity recognition, and sentiment analysis.
arXiv Detail & Related papers (2023-12-15T15:46:02Z)
- A Glitch in the Matrix? Locating and Detecting Language Model Grounding with Fakepedia [57.31074448586854]
Large language models (LLMs) have an impressive ability to draw on novel information supplied in their context.
Yet the mechanisms underlying this contextual grounding remain unknown.
We present a novel method to study grounding abilities using Fakepedia.
arXiv Detail & Related papers (2023-12-04T17:35:42Z)
- Disinformation Capabilities of Large Language Models [0.564232659769944]
This paper presents a study of the disinformation capabilities of the current generation of large language models (LLMs).
We evaluated the capabilities of 10 LLMs using 20 disinformation narratives.
We conclude that LLMs are able to generate convincing news articles that agree with dangerous disinformation narratives.
arXiv Detail & Related papers (2023-11-15T10:25:30Z)
- AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators [98.11286353828525]
GPT-3.5 series models have demonstrated remarkable few-shot and zero-shot ability across various NLP tasks.
We propose AnnoLLM, which adopts a two-step approach, explain-then-annotate.
We build the first conversation-based information retrieval dataset employing AnnoLLM.
arXiv Detail & Related papers (2023-03-29T17:03:21Z)
- Mixture of Soft Prompts for Controllable Data Generation [21.84489422361048]
Mixture of Soft Prompts (MSP) is proposed as a tool for data augmentation rather than direct prediction.
Our method achieves state-of-the-art results on three benchmarks when compared against strong baselines.
arXiv Detail & Related papers (2023-03-02T21:13:56Z)
- Towards Fine-Grained Information: Identifying the Type and Location of Translation Errors [80.22825549235556]
Existing approaches cannot consider error position and type simultaneously.
We build an FG-TED model to predict addition and omission errors.
Experiments show that our model can identify both error type and position concurrently and achieves state-of-the-art results.
arXiv Detail & Related papers (2023-02-17T16:20:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.