Can GPT-4 Identify Propaganda? Annotation and Detection of Propaganda
Spans in News Articles
- URL: http://arxiv.org/abs/2402.17478v1
- Date: Tue, 27 Feb 2024 13:02:19 GMT
- Title: Can GPT-4 Identify Propaganda? Annotation and Detection of Propaganda
Spans in News Articles
- Authors: Maram Hasanain, Fatema Ahmed, Firoj Alam
- Abstract summary: We develop the largest propaganda dataset to date, comprising 8K paragraphs from newspaper articles, labeled at the text-span level following a taxonomy of 23 propagandistic techniques.
Our work offers the first attempt to understand the performance of large language models (LLMs), using GPT-4, for fine-grained propaganda detection from text.
Results showed that GPT-4's performance degrades as the task moves from simply classifying a paragraph as propagandistic or not, to the fine-grained task of detecting propaganda techniques and their manifestation in text.
- Score: 11.64165958410489
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The use of propaganda has spiked on mainstream and social media, aiming to
manipulate or mislead users. While efforts to automatically detect propaganda
techniques in textual, visual, or multimodal content have increased, most of
them primarily focus on English content. The majority of the recent initiatives
targeting medium to low-resource languages produced relatively small annotated
datasets, with a skewed distribution, posing challenges for the development of
sophisticated propaganda detection models. To address this challenge, we
carefully develop the largest propaganda dataset to date, ArPro, comprising
8K paragraphs from newspaper articles, labeled at the text span level following
a taxonomy of 23 propagandistic techniques. Furthermore, our work offers the
first attempt to understand the performance of large language models (LLMs),
using GPT-4, for fine-grained propaganda detection from text. Results showed
that GPT-4's performance degrades as the task moves from simply classifying a
paragraph as propagandistic or not, to the fine-grained task of detecting
propaganda techniques and their manifestation in text. Compared to models
fine-tuned on the dataset for propaganda detection at different classification
granularities, GPT-4 is still far behind. Finally, we evaluate GPT-4 on a
dataset consisting of six other languages for span detection, and results
suggest that the model struggles with the task across languages. Our dataset
and resources will be released to the community.
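The abstract describes span-level labels and notes that GPT-4's performance degrades on fine-grained span detection. A common way to score such predictions is character-overlap F1, which gives partial credit when a predicted span only partially covers a gold span. The sketch below is illustrative only: the span boundaries, technique names, and scoring scheme are assumptions, not details taken from the ArPro paper.

```python
# Hypothetical sketch: representing span-level propaganda annotations as
# (start, end, technique) triples and scoring predictions against gold
# spans with character-overlap F1. All values here are illustrative.

def char_set(span):
    """Expand a (start, end, technique) span into (char_index, technique) pairs."""
    start, end, technique = span
    return {(i, technique) for i in range(start, end)}

def span_f1(gold_spans, pred_spans):
    """Character-level F1 over (position, technique) pairs, granting
    partial credit for overlapping spans of the same technique."""
    gold = set().union(*(char_set(s) for s in gold_spans)) if gold_spans else set()
    pred = set().union(*(char_set(s) for s in pred_spans)) if pred_spans else set()
    if not gold and not pred:
        return 1.0  # both empty: trivially perfect
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

gold = [(0, 10, "Loaded_Language")]
pred = [(0, 5, "Loaded_Language")]   # only half the gold span recovered
print(round(span_f1(gold, pred), 3))  # → 0.667
```

Under this metric a model that finds the right technique but imprecise boundaries still scores partial credit, which is one reason span detection is reported separately from paragraph-level classification.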
Related papers
- Large Language Models for Propaganda Span Annotation [11.64165958410489]
We investigate whether large language models (LLMs), such as GPT-4, can effectively perform the task.
We use a large-scale in-house dataset consisting of annotations from human annotators with varying expertise levels.
We plan to make the collected span-level labels from multiple annotators, including GPT-4, available for the community.
arXiv Detail & Related papers (2023-11-16T11:37:54Z)
- Large Language Models for Propaganda Detection [2.587450057509126]
This study investigates the effectiveness of Large Language Models (LLMs) for propaganda detection.
Five variations of GPT-3 and GPT-4 are employed, incorporating various prompt engineering and fine-tuning strategies.
Our findings demonstrate that GPT-4 achieves comparable results to the current state-of-the-art.
arXiv Detail & Related papers (2023-10-10T08:46:10Z)
- Large Language Models for Multi-label Propaganda Detection [0.0]
We describe our approach for the WANLP 2022 shared task which handles the task of propaganda detection in a multi-label setting.
The task requires the model to label the given text with one or more types of propaganda techniques.
We show that an ensemble of five models performs the best on the task, scoring a micro-F1 score of 59.73%.
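The micro-F1 score reported above pools counts over all examples before averaging, so frequent techniques dominate the score. A minimal sketch of the metric for multi-label predictions (the label sets below are made up for illustration):

```python
# Minimal sketch of micro-averaged F1 for multi-label classification:
# pool true positives, false positives, and false negatives across all
# examples, then compute a single precision/recall pair.

def micro_f1(gold, pred):
    """gold and pred are parallel lists of label sets, one set per example."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # labels correctly predicted
        fp += len(p - g)   # labels predicted but not gold
        fn += len(g - p)   # gold labels missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

gold = [{"Doubt"}, {"Loaded_Language", "Exaggeration"}]
pred = [{"Doubt"}, {"Loaded_Language"}]
print(round(micro_f1(gold, pred), 4))  # → 0.8
```

This contrasts with macro-F1, which averages per-label scores and weights rare techniques equally; shared tasks with skewed label distributions often report micro-F1 for exactly this reason.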
arXiv Detail & Related papers (2022-10-15T06:47:31Z)
- Faking Fake News for Real Fake News Detection: Propaganda-loaded Training Data Generation [105.20743048379387]
We propose a novel framework for generating training examples informed by the known styles and strategies of human-authored propaganda.
Specifically, we perform self-critical sequence training guided by natural language inference to ensure the validity of the generated articles.
Our experimental results show that fake news detectors trained on PropaNews are better at detecting human-written disinformation by 3.62 - 7.69% F1 score on two public datasets.
arXiv Detail & Related papers (2022-03-10T14:24:19Z)
- Revisiting Self-Training for Few-Shot Learning of Language Model [61.173976954360334]
Unlabeled data carry rich task-relevant information and have proven useful for few-shot learning of language models.
In this work, we revisit the self-training technique for language model fine-tuning and present a state-of-the-art prompt-based few-shot learner, SFLM.
arXiv Detail & Related papers (2021-10-04T08:51:36Z)
- Dataset of Propaganda Techniques of the State-Sponsored Information Operation of the People's Republic of China [0.0]
This research aims to bridge the information gap by providing a multi-labeled propaganda techniques dataset in Mandarin based on a state-backed information operation dataset provided by Twitter.
In addition to presenting the dataset, we apply multi-label text classification using a fine-tuned BERT model.
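Multi-label classification with a fine-tuned encoder such as BERT typically scores each technique with an independent logit, passed through a sigmoid and thresholded, rather than a softmax over mutually exclusive classes. The decision step can be sketched as follows; the logits, threshold, and label names are illustrative assumptions, not values from the paper:

```python
# Hedged sketch of the multi-label decision step: one independent logit
# per label, sigmoid-activated and thresholded, so several labels may
# fire at once. Label names and logits are made up for illustration.
import math

LABELS = ["Loaded_Language", "Doubt", "Flag_Waving"]  # illustrative subset

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict_labels(logits, threshold=0.5):
    """Return every label whose sigmoid probability clears the threshold."""
    return {label for label, z in zip(LABELS, logits) if sigmoid(z) >= threshold}

print(sorted(predict_labels([2.1, -0.3, 0.7])))  # → ['Flag_Waving', 'Loaded_Language']
```

The threshold is a tunable hyperparameter; lowering it trades precision for recall, which matters when the label distribution is skewed.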
arXiv Detail & Related papers (2021-06-14T16:11:13Z)
- Cross-Domain Learning for Classifying Propaganda in Online Contents [67.10699378370752]
We present an approach to leverage cross-domain learning, based on labeled documents and sentences from news and tweets, as well as political speeches with a clear difference in their degrees of being propagandistic.
Our experiments demonstrate the usefulness of this approach, and identify difficulties and limitations in various configurations of sources and targets for the transfer step.
arXiv Detail & Related papers (2020-11-13T10:19:13Z)
- LTIatCMU at SemEval-2020 Task 11: Incorporating Multi-Level Features for Multi-Granular Propaganda Span Identification [70.1903083747775]
This paper describes our submission for the task of Propaganda Span Identification in news articles.
We introduce a BERT-BiLSTM based span-level propaganda classification model that identifies which token spans within the sentence are indicative of propaganda.
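A span-level model like the one described above must turn per-token decisions into contiguous spans. One standard way is a BIO tagging scheme decoded back into (start, end, label) spans; the scheme and example below are assumptions for illustration, not details from the paper:

```python
# Illustrative sketch: decoding a BIO tag sequence (B- begins a span,
# I- continues it, O is outside) into (start, end, label) token spans,
# with end exclusive. The tag set here is made up.

def bio_to_spans(tags):
    """Convert BIO tags into (start, end, label) spans, end exclusive."""
    spans, start, label = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last span
        if tag.startswith("B-") or tag == "O":
            if start is not None:
                spans.append((start, i, label))
                start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        elif tag.startswith("I-") and start is None:
            start, label = i, tag[2:]  # tolerate an I- tag with no preceding B-
    return spans

tags = ["O", "B-PROP", "I-PROP", "O", "B-PROP"]
print(bio_to_spans(tags))  # → [(1, 3, 'PROP'), (4, 5, 'PROP')]
```

Token spans would then be mapped back to character offsets via the tokenizer's alignment before span-level scoring.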
arXiv Detail & Related papers (2020-08-11T16:14:47Z)
- Investigating Pretrained Language Models for Graph-to-Text Generation [55.55151069694146]
Graph-to-text generation aims to generate fluent texts from graph-based data.
We present a study across three graph domains: meaning representations, Wikipedia knowledge graphs (KGs) and scientific KGs.
We show that the PLMs BART and T5 achieve new state-of-the-art results and that task-adaptive pretraining strategies improve their performance even further.
arXiv Detail & Related papers (2020-07-16T16:05:34Z)
- BPGC at SemEval-2020 Task 11: Propaganda Detection in News Articles with Multi-Granularity Knowledge Sharing and Linguistic Features based Ensemble Learning [2.8913142991383114]
SemEval 2020 Task-11 aims to design automated systems for news propaganda detection.
Task-11 consists of two sub-tasks, namely, Span Identification and Technique Classification.
arXiv Detail & Related papers (2020-05-31T19:35:53Z)
- Leveraging Declarative Knowledge in Text and First-Order Logic for Fine-Grained Propaganda Detection [139.3415751957195]
We study the detection of propagandistic text fragments in news articles.
We introduce an approach to inject declarative knowledge of fine-grained propaganda techniques.
arXiv Detail & Related papers (2020-04-29T13:46:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.