Can GPT-4 Identify Propaganda? Annotation and Detection of Propaganda
Spans in News Articles
- URL: http://arxiv.org/abs/2402.17478v1
- Date: Tue, 27 Feb 2024 13:02:19 GMT
- Title: Can GPT-4 Identify Propaganda? Annotation and Detection of Propaganda
Spans in News Articles
- Authors: Maram Hasanain, Fatema Ahmed, Firoj Alam
- Abstract summary: We develop the largest propaganda dataset to date, comprised of 8K paragraphs from newspaper articles, labeled at the text span level following a taxonomy of 23 propagandistic techniques.
Our work offers the first attempt to understand the performance of large language models (LLMs), using GPT-4, for fine-grained propaganda detection from text.
Results showed that GPT-4's performance degrades as the task moves from simply classifying a paragraph as propagandistic or not, to the fine-grained task of detecting propaganda techniques and their manifestation in text.
- Score: 11.64165958410489
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The use of propaganda has spiked on mainstream and social media, aiming to
manipulate or mislead users. While efforts to automatically detect propaganda
techniques in textual, visual, or multimodal content have increased, most of
them primarily focus on English content. The majority of the recent initiatives
targeting medium to low-resource languages produced relatively small annotated
datasets, with a skewed distribution, posing challenges for the development of
sophisticated propaganda detection models. To address this challenge, we
carefully develop the largest propaganda dataset to date, ArPro, comprised of
8K paragraphs from newspaper articles, labeled at the text span level following
a taxonomy of 23 propagandistic techniques. Furthermore, our work offers the
first attempt to understand the performance of large language models (LLMs),
using GPT-4, for fine-grained propaganda detection from text. Results showed
that GPT-4's performance degrades as the task moves from simply classifying a
paragraph as propagandistic or not, to the fine-grained task of detecting
propaganda techniques and their manifestation in text. Compared to models
fine-tuned on the dataset for propaganda detection at different classification
granularities, GPT-4 is still far behind. Finally, we evaluate GPT-4 on a
dataset consisting of six other languages for span detection, and results
suggest that the model struggles with the task across languages. Our dataset
and resources will be released to the community.
Related papers
- PropaInsight: Toward Deeper Understanding of Propaganda in Terms of Techniques, Appeals, and Intent [71.20471076045916]
Propaganda plays a critical role in shaping public opinion and fueling disinformation.
Propainsight systematically dissects propaganda into techniques, arousal appeals, and underlying intent.
Propagaze combines human-annotated data with high-quality synthetic data.
arXiv Detail & Related papers (2024-09-19T06:28:18Z) - Large Language Models for Propaganda Span Annotation [10.358271919023903]
This study investigates whether Large Language Models, such as GPT-4, can effectively extract propagandistic spans.
The experiments are performed over a large-scale in-house manually annotated dataset.
arXiv Detail & Related papers (2023-11-16T11:37:54Z) - Large Language Models for Propaganda Detection [2.587450057509126]
This study investigates the effectiveness of Large Language Models (LLMs) for propaganda detection.
Five variations of GPT-3 and GPT-4 are employed, incorporating various prompt engineering and fine-tuning strategies.
Our findings demonstrate that GPT-4 achieves comparable results to the current state-of-the-art.
arXiv Detail & Related papers (2023-10-10T08:46:10Z) - Large Language Models for Multi-label Propaganda Detection [0.0]
We describe our approach for the WANLP 2022 shared task which handles the task of propaganda detection in a multi-label setting.
The task demands the model to label the given text as having one or more types of propaganda techniques.
We show that an ensemble of five models performs the best on the task, scoring a micro-F1 score of 59.73%.
arXiv Detail & Related papers (2022-10-15T06:47:31Z) - Faking Fake News for Real Fake News Detection: Propaganda-loaded
Training Data Generation [105.20743048379387]
We propose a novel framework for generating training examples informed by the known styles and strategies of human-authored propaganda.
Specifically, we perform self-critical sequence training guided by natural language inference to ensure the validity of the generated articles.
Our experimental results show that fake news detectors trained on PropaNews are better at detecting human-written disinformation by 3.62 - 7.69% F1 score on two public datasets.
arXiv Detail & Related papers (2022-03-10T14:24:19Z) - Revisiting Self-Training for Few-Shot Learning of Language Model [61.173976954360334]
Unlabeled data carry rich task-relevant information, they are proven useful for few-shot learning of language model.
In this work, we revisit the self-training technique for language model fine-tuning and present a state-of-the-art prompt-based few-shot learner, SFLM.
arXiv Detail & Related papers (2021-10-04T08:51:36Z) - Dataset of Propaganda Techniques of the State-Sponsored Information
Operation of the People's Republic of China [0.0]
This research aims to bridge the information gap by providing a multi-labeled propaganda techniques dataset in Mandarin based on a state-backed information operation dataset provided by Twitter.
In addition to presenting the dataset, we apply a multi-label text classification using fine-tuned BERT.
arXiv Detail & Related papers (2021-06-14T16:11:13Z) - Cross-Domain Learning for Classifying Propaganda in Online Contents [67.10699378370752]
We present an approach to leverage cross-domain learning, based on labeled documents and sentences from news and tweets, as well as political speeches with a clear difference in their degrees of being propagandistic.
Our experiments demonstrate the usefulness of this approach, and identify difficulties and limitations in various configurations of sources and targets for the transfer step.
arXiv Detail & Related papers (2020-11-13T10:19:13Z) - LTIatCMU at SemEval-2020 Task 11: Incorporating Multi-Level Features for
Multi-Granular Propaganda Span Identification [70.1903083747775]
This paper describes our submission for the task of Propaganda Span Identification in news articles.
We introduce a BERT-BiLSTM based span-level propaganda classification model that identifies which token spans within the sentence are indicative of propaganda.
arXiv Detail & Related papers (2020-08-11T16:14:47Z) - Leveraging Declarative Knowledge in Text and First-Order Logic for
Fine-Grained Propaganda Detection [139.3415751957195]
We study the detection of propagandistic text fragments in news articles.
We introduce an approach to inject declarative knowledge of fine-grained propaganda techniques.
arXiv Detail & Related papers (2020-04-29T13:46:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.