Assessing the Effectiveness of GPT-3 in Detecting False Political
Statements: A Case Study on the LIAR Dataset
- URL: http://arxiv.org/abs/2306.08190v1
- Date: Wed, 14 Jun 2023 01:16:49 GMT
- Title: Assessing the Effectiveness of GPT-3 in Detecting False Political
Statements: A Case Study on the LIAR Dataset
- Authors: Mars Gokturk Buchholz
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The detection of political fake statements is crucial for maintaining
information integrity and preventing the spread of misinformation in society.
Historically, state-of-the-art machine learning models employed various methods
for detecting deceptive statements. These methods include the use of metadata
(W. Wang et al., 2018), n-grams analysis (Singh et al., 2021), and linguistic
(Wu et al., 2022) and stylometric (Islam et al., 2020) features. Recent
advancements in large language models, such as GPT-3 (Brown et al., 2020), have
achieved state-of-the-art performance on a wide range of tasks. In this study,
we conducted experiments with GPT-3 on the LIAR dataset (W. Wang et al., 2018)
and achieved higher accuracy than state-of-the-art models without using any
additional meta or linguistic features. Additionally, we experimented with
zero-shot learning using a carefully designed prompt and achieved near
state-of-the-art performance. An advantage of this approach is that the model
provided evidence for its decision, which adds transparency to the model's
decision-making and offers a chance for users to verify the validity of the
evidence provided.
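The zero-shot setup described in the abstract can be sketched as follows. The prompt wording and the `parse_label` helper are illustrative assumptions, not the authors' exact prompt; the six labels are the actual LIAR classes.

```python
# Sketch of zero-shot veracity classification in the style the abstract
# describes. Prompt text and parsing are assumptions; the six labels are
# the real LIAR classes (W. Wang et al., 2018).

LIAR_LABELS = [
    "pants-fire", "false", "barely-true",
    "half-true", "mostly-true", "true",
]

def build_prompt(statement: str) -> str:
    """Compose a zero-shot prompt asking for a label plus supporting evidence."""
    return (
        "Classify the following political statement into exactly one of these "
        f"labels: {', '.join(LIAR_LABELS)}.\n"
        "Then briefly explain the evidence for your decision.\n\n"
        f'Statement: "{statement}"\n'
        "Label:"
    )

def parse_label(model_reply: str) -> str:
    """Map a free-form model reply to the first LIAR label it mentions."""
    reply = model_reply.lower()
    # Check longer labels first so "mostly-true" is not matched as "true".
    for label in sorted(LIAR_LABELS, key=len, reverse=True):
        if label in reply or label.replace("-", " ") in reply:
            return label
    return "half-true"  # fall back to the middle of the scale

prompt = build_prompt("The unemployment rate has doubled in the last year.")
print(parse_label("Label: mostly-true. Evidence: official statistics show ..."))
# prints "mostly-true"
```

Asking for the evidence in the same prompt is what gives the transparency benefit the abstract mentions: the reply carries both a parseable label and a rationale the user can check.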
Related papers
- Instance-Level Data-Use Auditing of Visual ML Models [49.862257986549885]
The growing trend of legal disputes over the unauthorized use of data in machine learning (ML) systems highlights the need for reliable data-use auditing mechanisms. We present the first proactive, instance-level, data-use auditing method designed to enable data owners to audit the use of their individual data instances in ML models.
arXiv Detail & Related papers (2025-03-28T13:28:57Z)
- Navigating Nuance: In Quest for Political Truth [1.4127714091330967]
We evaluate the performance of the Llama-3 (70B) language model on the Media Bias Identification Benchmark (MBIB).
Our findings underscore the challenges of detecting political bias and highlight the potential of transfer learning methods to enhance future models.
arXiv Detail & Related papers (2025-01-01T09:24:47Z)
- Domain Adaptation of Llama3-70B-Instruct through Continual Pre-Training and Model Merging: A Comprehensive Evaluation [31.61985215677114]
We conducted extensive experiments on domain adaptation of the Meta-Llama-3-70B-Instruct model on SEC data.
Our focus included continual pre-training (CPT) and model merging, aiming to enhance the model's domain-specific capabilities.
This is a preprint technical report with thorough evaluations to understand the entire process.
arXiv Detail & Related papers (2024-06-21T08:29:31Z)
- Investigating Persuasion Techniques in Arabic: An Empirical Study Leveraging Large Language Models [0.13980986259786224]
This paper presents a comprehensive empirical study focused on identifying persuasive techniques in Arabic social media content.
We utilize Pre-trained Language Models (PLMs) and leverage the ArAlEval dataset.
Our study explores three different learning approaches by harnessing the power of PLMs.
arXiv Detail & Related papers (2024-05-21T15:55:09Z)
- CLIPping the Deception: Adapting Vision-Language Models for Universal Deepfake Detection [3.849401956130233]
We explore the effectiveness of pre-trained vision-language models (VLMs) when paired with recent adaptation methods for universal deepfake detection.
We employ only a single dataset (ProGAN) in order to adapt CLIP for deepfake detection.
The simple and lightweight Prompt Tuning based adaptation strategy outperforms the previous SOTA approach by 5.01% mAP and 6.61% accuracy.
arXiv Detail & Related papers (2024-02-20T11:26:42Z)
- Black-Box Analysis: GPTs Across Time in Legal Textual Entailment Task [17.25356594832692]
We present an analysis of GPT-3.5 (ChatGPT) and GPT-4 performances on COLIEE Task 4 dataset.
Our preliminary experimental results unveil intriguing insights into the models' strengths and weaknesses in handling legal textual entailment tasks.
arXiv Detail & Related papers (2023-09-11T14:43:54Z)
- Preserving Knowledge Invariance: Rethinking Robustness Evaluation of Open Information Extraction [50.62245481416744]
We present the first benchmark that simulates the evaluation of open information extraction models in the real world.
We design and annotate a large-scale testbed in which each example is a knowledge-invariant clique.
By further elaborating the robustness metric, a model is judged to be robust if its performance is consistently accurate on the overall cliques.
arXiv Detail & Related papers (2023-05-23T12:05:09Z)
- Selective In-Context Data Augmentation for Intent Detection using Pointwise V-Information [100.03188187735624]
We introduce a novel approach based on PLMs and pointwise V-information (PVI), a metric that can measure the usefulness of a datapoint for training a model.
Our method first fine-tunes a PLM on a small seed of training data and then synthesizes new datapoints - utterances that correspond to given intents.
Our method is thus able to leverage the expressive power of large language models to produce diverse training data.
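The usefulness score this summary refers to can be sketched numerically. The formula follows the standard pointwise V-information definition; the probabilities, variable names, and toy numbers below are illustrative assumptions, not values from the paper.

```python
import math

def pvi(p_with_input: float, p_null_input: float) -> float:
    """Pointwise V-information of one datapoint, in bits:
    PVI(x -> y) = -log2 g'(y | null) + log2 g(y | x),
    where the two probabilities come from models fine-tuned with and
    without access to the input. High PVI means the input makes the
    label much easier to predict, marking the point as useful."""
    return -math.log2(p_null_input) + math.log2(p_with_input)

# Toy numbers (assumed, not from the paper): label probability 0.9 when
# the model sees the utterance vs. 0.25 from the label prior alone.
print(round(pvi(0.9, 0.25), 3))  # prints 1.848
```

A synthesized utterance with PVI near zero adds little beyond the label prior, so filtering on this score keeps only the informative augmentations.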
arXiv Detail & Related papers (2023-02-10T07:37:49Z)
- Metadata Might Make Language Models Better [1.7100280218774935]
Using 19th-century newspapers as a case study, we compare different strategies for inserting temporal, political and geographical information into a Masked Language Model.
We find that showing relevant metadata to a language model has a beneficial impact and may even produce more robust and fairer models.
arXiv Detail & Related papers (2022-11-18T08:29:00Z)
- An Empirical Investigation of Commonsense Self-Supervision with Knowledge Graphs [67.23285413610243]
Self-supervision based on the information extracted from large knowledge graphs has been shown to improve the generalization of language models.
We study the effect of knowledge sampling strategies and sizes that can be used to generate synthetic data for adapting language models.
arXiv Detail & Related papers (2022-05-21T19:49:04Z)
- ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models [102.63817106363597]
We build ELEVATER, the first benchmark to compare and evaluate pre-trained language-augmented visual models.
It consists of 20 image classification datasets and 35 object detection datasets, each of which is augmented with external knowledge.
We will release our toolkit and evaluation platforms for the research community.
arXiv Detail & Related papers (2022-04-19T10:23:42Z)
- AES Systems Are Both Overstable And Oversensitive: Explaining Why And Proposing Defenses [66.49753193098356]
We investigate the reason behind the surprising adversarial brittleness of scoring models.
Our results indicate that autoscoring models, despite getting trained as "end-to-end" models, behave like bag-of-words models.
We propose detection-based protection models that can detect oversensitivity and overstability causing samples with high accuracies.
arXiv Detail & Related papers (2021-09-24T03:49:38Z)
- Artificial Text Detection via Examining the Topology of Attention Maps [58.46367297712477]
We propose three novel types of interpretable topological features for this task based on Topological Data Analysis (TDA).
We empirically show that the features derived from the BERT model outperform count- and neural-based baselines up to 10% on three common datasets.
The probing analysis of the features reveals their sensitivity to the surface and syntactic properties.
arXiv Detail & Related papers (2021-09-10T12:13:45Z)
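One simple TDA-style feature over an attention map can be sketched as the number of connected components (the 0th Betti number) of the graph whose edges are attention weights above a threshold. The matrix, threshold, and function name below are toy assumptions; the paper's actual feature set is richer.

```python
# Toy sketch: count connected components of a thresholded attention graph.
# Matrix and threshold are assumed values, not from the paper.

def connected_components(attn, threshold):
    """Count connected components of the thresholded attention graph
    (edges are symmetrized: i~j if attn[i][j] or attn[j][i] > threshold)."""
    n = len(attn)
    parent = list(range(n))

    def find(i):
        # Union-find with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if attn[i][j] > threshold or attn[j][i] > threshold:
                parent[find(i)] = find(j)

    return len({find(i) for i in range(n)})

# Toy 4-token attention map: tokens 0-1 attend to each other strongly,
# tokens 2-3 likewise, with only weak attention between the pairs.
attn = [
    [0.0, 0.9, 0.1, 0.0],
    [0.8, 0.0, 0.0, 0.1],
    [0.1, 0.0, 0.0, 0.7],
    [0.0, 0.1, 0.6, 0.0],
]
print(connected_components(attn, threshold=0.5))  # two clusters -> prints 2
```

Sweeping the threshold and recording how the component count changes is the persistence idea such features build on: machine-generated and human text tend to produce different profiles.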
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.