Are Large Language Models Good at Detecting Propaganda?
- URL: http://arxiv.org/abs/2505.13706v1
- Date: Mon, 19 May 2025 20:11:13 GMT
- Title: Are Large Language Models Good at Detecting Propaganda?
- Authors: Julia Jose, Rachel Greenstadt
- Abstract summary: Propagandists use rhetorical devices that rely on logical fallacies and emotional appeals to advance their agendas. Recent advances in Natural Language Processing have enabled the development of systems capable of detecting manipulative content.
- Score: 2.927159756213616
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Propagandists use rhetorical devices that rely on logical fallacies and emotional appeals to advance their agendas. Recognizing these techniques is key to making informed decisions. Recent advances in Natural Language Processing (NLP) have enabled the development of systems capable of detecting manipulative content. In this study, we evaluate several Large Language Models (LLMs) on detecting propaganda techniques in news articles and compare their performance with transformer-based models. We find that, while GPT-4 demonstrates a superior F1 score (F1=0.16) compared to GPT-3.5 and Claude 3 Opus, it does not outperform a RoBERTa-CRF baseline (F1=0.67). Additionally, we find that all three LLMs outperform a Multi-Granularity Network (MGN) baseline in detecting instances of one out of six propaganda techniques (name-calling), with GPT-3.5 and GPT-4 also outperforming the MGN baseline in detecting instances of appeal to fear and flag-waving.
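For intuition about this kind of evaluation, the sketch below shows one way to prompt an LLM for propaganda-technique labels and score it with per-technique F1. It is a minimal, sentence-level approximation under stated assumptions: the `call_llm` wrapper, the prompt wording, and the technique list (only name-calling, appeal to fear, and flag-waving are named in the abstract; the rest are placeholders) are illustrative and not the authors' exact pipeline, which scores span-level predictions against a RoBERTa-CRF baseline.

```python
# Minimal sketch: label sentences with propaganda techniques via an LLM
# and compute per-technique F1 against gold annotations.
# The call_llm helper, prompt wording, and technique list are illustrative
# assumptions, not the paper's exact configuration.
from sklearn.metrics import f1_score

TECHNIQUES = [
    "name-calling",      # named in the abstract
    "appeal to fear",    # named in the abstract
    "flag-waving",       # named in the abstract
    "loaded language",   # placeholder
    "doubt",             # placeholder
    "exaggeration",      # placeholder
]

def call_llm(prompt: str) -> str:
    """Hypothetical wrapper around whichever LLM API is being evaluated."""
    raise NotImplementedError("plug in a GPT-4 / Claude / local-model client here")

def label_sentence(sentence: str) -> list[str]:
    """Ask the model which of the listed techniques (if any) appear in one sentence."""
    prompt = (
        "Which of the following propaganda techniques appear in this sentence? "
        f"Techniques: {', '.join(TECHNIQUES)}. "
        "Answer with a comma-separated list, or 'none'.\n\n" + sentence
    )
    answer = call_llm(prompt).lower()
    return [t for t in TECHNIQUES if t in answer]

def per_technique_f1(gold: list[list[str]], pred: list[list[str]]) -> dict[str, float]:
    """Binary F1 per technique over a corpus of sentences."""
    scores = {}
    for t in TECHNIQUES:
        y_true = [int(t in g) for g in gold]
        y_pred = [int(t in p) for p in pred]
        scores[t] = f1_score(y_true, y_pred, zero_division=0)
    return scores
```

Reporting scores per technique, rather than a single aggregate F1, is what lets the abstract make claims such as the LLMs beating the MGN baseline on name-calling while still trailing RoBERTa-CRF overall.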
Related papers
- GIVE: Structured Reasoning of Large Language Models with Knowledge Graph Inspired Veracity Extrapolation [108.2008975785364]
Graph Inspired Veracity Extrapolation (GIVE) is a novel reasoning method that merges parametric and non-parametric memories to improve accurate reasoning with minimal external input. GIVE guides the LLM agent to select the most pertinent expert data (observe), engage in query-specific divergent thinking (reflect), and then synthesize this information to produce the final output (speak).
arXiv Detail & Related papers (2024-10-11T03:05:06Z) - Large Language Model for Vulnerability Detection: Emerging Results and Future Directions [15.981132063061661]
Previous learning-based vulnerability detection methods relied on either medium-sized pre-trained models or smaller neural networks from scratch.
Recent advancements in Large Pre-Trained Language Models (LLMs) have showcased remarkable few-shot learning capabilities in various tasks.
arXiv Detail & Related papers (2024-01-27T17:39:36Z) - Comparing GPT-4 and Open-Source Language Models in Misinformation Mitigation [6.929834518749884]
GPT-4 is known to be strong in this domain, but it is closed source, potentially expensive, and can show instability between different versions.
We show that Zephyr-7b presents a consistently viable alternative, overcoming key limitations of commonly used approaches.
We then highlight how GPT-3.5 exhibits unstable performance, such that this very widely used model could provide misleading results in misinformation detection.
arXiv Detail & Related papers (2024-01-12T22:27:25Z) - Fighting Fire with Fire: The Dual Role of LLMs in Crafting and Detecting Elusive Disinformation [7.782551258221384]
Recent ubiquity and disruptive impacts of large language models (LLMs) have raised concerns about their potential to be misused.
We propose a novel "Fighting Fire with Fire" (F3) strategy that harnesses modern LLMs' generative and emergent reasoning capabilities.
In our experiments, GPT-3.5-turbo consistently achieved 68-72% accuracy, unlike the decline observed in previous customized and fine-tuned disinformation detectors.
arXiv Detail & Related papers (2023-10-24T04:50:29Z) - Large Language Models for Propaganda Detection [2.587450057509126]
This study investigates the effectiveness of Large Language Models (LLMs) for propaganda detection.
Five variations of GPT-3 and GPT-4 are employed, incorporating various prompt engineering and fine-tuning strategies.
Our findings demonstrate that GPT-4 achieves comparable results to the current state-of-the-art.
arXiv Detail & Related papers (2023-10-10T08:46:10Z) - RankVicuna: Zero-Shot Listwise Document Reranking with Open-Source Large
Language Models [56.51705482912727]
We present RankVicuna, the first fully open-source LLM capable of performing high-quality listwise reranking in a zero-shot setting.
Experimental results on the TREC 2019 and 2020 Deep Learning Tracks show that we can achieve effectiveness comparable to zero-shot reranking with GPT-3.5 with a much smaller 7B parameter model, although our effectiveness remains slightly behind reranking with GPT-4.
arXiv Detail & Related papers (2023-09-26T17:31:57Z) - Instructed to Bias: Instruction-Tuned Language Models Exhibit Emergent Cognitive Bias [57.42417061979399]
Recent studies show that instruction tuning (IT) and reinforcement learning from human feedback (RLHF) improve the abilities of large language models (LMs) dramatically.
In this work, we investigate the effect of IT and RLHF on decision making and reasoning in LMs.
Our findings highlight the presence of these biases in various models from the GPT-3, Mistral, and T5 families.
arXiv Detail & Related papers (2023-08-01T01:39:25Z) - Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents [53.78782375511531]
Large Language Models (LLMs) have demonstrated remarkable zero-shot generalization across various language-related tasks. This paper investigates generative LLMs for relevance ranking in Information Retrieval (IR). To address concerns about data contamination of LLMs, we collect a new test set called NovelEval. To improve efficiency in real-world applications, we delve into the potential for distilling the ranking capabilities of ChatGPT into small specialized models.
arXiv Detail & Related papers (2023-04-19T10:16:03Z) - Document-Level Machine Translation with Large Language Models [91.03359121149595]
Large language models (LLMs) can produce coherent, cohesive, relevant, and fluent answers for various natural language processing (NLP) tasks.
This paper provides an in-depth evaluation of LLMs' ability on discourse modeling.
arXiv Detail & Related papers (2023-04-05T03:49:06Z) - Prompting GPT-3 To Be Reliable [117.23966502293796]
This work decomposes reliability into four facets: generalizability, fairness, calibration, and factuality.
We find that GPT-3 outperforms smaller-scale supervised models by large margins on all these facets.
arXiv Detail & Related papers (2022-10-17T14:52:39Z)