Fighting Fire with Fire: Adversarial Prompting to Generate a
Misinformation Detection Dataset
- URL: http://arxiv.org/abs/2401.04481v1
- Date: Tue, 9 Jan 2024 10:38:13 GMT
- Authors: Shrey Satapara, Parth Mehta, Debasis Ganguly, Sandip Modha
- Abstract summary: We propose an LLM-based approach of creating silver-standard ground-truth datasets for identifying misinformation.
Specifically speaking, given a trusted news article, our proposed approach involves prompting LLMs to automatically generate a summarised version of the original article.
To investigate the usefulness of this dataset, we conduct a set of experiments where we train a range of supervised models for the task of misinformation detection.
- Score: 10.860133543817659
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The recent success in language generation capabilities of large language
models (LLMs), such as GPT, Bard, Llama etc., can potentially lead to concerns
about their possible misuse in inducing mass agitation and communal hatred via
generating fake news and spreading misinformation. Traditional means of
developing a misinformation ground-truth dataset do not scale well because of
the extensive manual effort required to annotate the data. In this paper, we
propose an LLM-based approach of creating silver-standard ground-truth datasets
for identifying misinformation. Specifically speaking, given a trusted news
article, our proposed approach involves prompting LLMs to automatically
generate a summarised version of the original article. The prompts in our
proposed approach act as a controlling mechanism to generate specific types of
factual incorrectness in the generated summaries, e.g., incorrect quantities,
false attributions etc. To investigate the usefulness of this dataset, we
conduct a set of experiments where we train a range of supervised models for
the task of misinformation detection.
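The generation step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the prompt wording, the `ERROR_TYPE_INSTRUCTIONS` table, the `SilverExample` record, and the `generate` callable (a stand-in for whatever LLM client is used) are all assumptions. Pairing each article with one faithful summary as the negative class is also an assumption; the abstract does not say how negatives are obtained.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical instructions, one per error type named in the abstract.
# The paper's actual prompts are not reproduced here.
ERROR_TYPE_INSTRUCTIONS: Dict[str, str] = {
    "incorrect_quantity": "change at least one number or quantity so that it is wrong",
    "false_attribution": "attribute at least one statement to the wrong person or organisation",
}


@dataclass
class SilverExample:
    article: str
    summary: str
    label: int        # 0 = faithful summary, 1 = summary with an injected error
    error_type: str   # "none" for faithful summaries


def build_prompt(article: str, error_type: Optional[str] = None) -> str:
    """Build a summarisation prompt, optionally asking for one kind of error."""
    prompt = "Summarise the following news article in a few sentences."
    if error_type is not None:
        prompt += " In the summary, " + ERROR_TYPE_INSTRUCTIONS[error_type] + "."
    return prompt + "\n\nArticle:\n" + article


def make_silver_dataset(
    articles: List[str], generate: Callable[[str], str]
) -> List[SilverExample]:
    """Create one faithful and one erroneous summary per error type, per article.

    `generate` is a placeholder for an LLM call: it takes a prompt string
    and returns the generated text.
    """
    examples: List[SilverExample] = []
    for article in articles:
        examples.append(
            SilverExample(article, generate(build_prompt(article)), 0, "none")
        )
        for error_type in ERROR_TYPE_INSTRUCTIONS:
            summary = generate(build_prompt(article, error_type))
            examples.append(SilverExample(article, summary, 1, error_type))
    return examples
```

The resulting labelled records could then be used to train any supervised classifier for misinformation detection, which is the second step the abstract mentions.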
Related papers
- Catching Chameleons: Detecting Evolving Disinformation Generated using Large Language Models [16.408611714514976]
We propose DELD (Detecting Evolving LLM-generated Disinformation), a parameter-efficient approach that jointly leverages the general fact-checking capabilities of pre-trained language models.
Our experiments show that DELD significantly outperforms state-of-the-art methods.
arXiv Detail & Related papers (2024-06-26T00:21:39Z) - SPOT: Text Source Prediction from Originality Score Thresholding [6.790905400046194]
Countermeasures aimed at detecting misinformation usually involve domain-specific models trained to recognize the relevance of any information.
Instead of evaluating the validity of the information, we propose to investigate LLM generated text from the perspective of trust.
arXiv Detail & Related papers (2024-05-30T21:51:01Z) - ExaRanker-Open: Synthetic Explanation for IR using Open-Source LLMs [60.81649785463651]
We introduce ExaRanker-Open, where we adapt and explore the use of open-source language models to generate explanations.
Our findings reveal that incorporating explanations consistently enhances neural rankers, with benefits escalating as the LLM size increases.
arXiv Detail & Related papers (2024-02-09T11:23:14Z) - Generative Context-aware Fine-tuning of Self-supervised Speech Models [54.389711404209415]
We study the use of context information generated by generative large language models (LLMs).
We propose an approach to distill the generated information during fine-tuning of self-supervised speech models.
We evaluate the proposed approach using the SLUE and Libri-light benchmarks for several downstream tasks: automatic speech recognition, named entity recognition, and sentiment analysis.
arXiv Detail & Related papers (2023-12-15T15:46:02Z) - A Glitch in the Matrix? Locating and Detecting Language Model Grounding with Fakepedia [57.31074448586854]
Large language models (LLMs) have an impressive ability to draw on novel information supplied in their context.
Yet the mechanisms underlying this contextual grounding remain unknown.
We present a novel method to study grounding abilities using Fakepedia.
arXiv Detail & Related papers (2023-12-04T17:35:42Z) - Disinformation Capabilities of Large Language Models [0.564232659769944]
This paper presents a study of the disinformation capabilities of the current generation of large language models (LLMs).
We evaluated the capabilities of 10 LLMs using 20 disinformation narratives.
We conclude that LLMs are able to generate convincing news articles that agree with dangerous disinformation narratives.
arXiv Detail & Related papers (2023-11-15T10:25:30Z) - Enhancing LLM with Evolutionary Fine Tuning for News Summary Generation [2.1828601975620257]
We propose a new paradigm for news summary generation using LLM with powerful natural language understanding and generative capabilities.
We use LLM to extract multiple structured event patterns from the events contained in news paragraphs, evolve the event pattern population with genetic algorithm, and select the most adaptive event pattern to input into the LLM to generate news summaries.
A News Summary Generator (NSG) is designed to select and evolve the event pattern populations and generate news summaries.
arXiv Detail & Related papers (2023-07-06T08:13:53Z) - On the Risk of Misinformation Pollution with Large Language Models [127.1107824751703]
We investigate the potential misuse of modern Large Language Models (LLMs) for generating credible-sounding misinformation.
Our study reveals that LLMs can act as effective misinformation generators, leading to a significant degradation in the performance of Open-Domain Question Answering (ODQA) systems.
arXiv Detail & Related papers (2023-05-23T04:10:26Z) - AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators [98.11286353828525]
GPT-3.5 series models have demonstrated remarkable few-shot and zero-shot ability across various NLP tasks.
We propose AnnoLLM, which adopts a two-step approach, explain-then-annotate.
We build the first conversation-based information retrieval dataset employing AnnoLLM.
arXiv Detail & Related papers (2023-03-29T17:03:21Z) - Mixture of Soft Prompts for Controllable Data Generation [21.84489422361048]
Mixture of Soft Prompts (MSP) is proposed as a tool for data augmentation rather than direct prediction.
Our method achieves state-of-the-art results on three benchmarks when compared against strong baselines.
arXiv Detail & Related papers (2023-03-02T21:13:56Z) - Towards Fine-Grained Information: Identifying the Type and Location of
Translation Errors [80.22825549235556]
Existing approaches cannot synchronously consider error position and type.
We build an FG-TED model to predict the addition and omission errors.
Experiments show that our model can identify both error type and position concurrently, and gives state-of-the-art results.
arXiv Detail & Related papers (2023-02-17T16:20:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.