Detecting Propaganda Techniques in Code-Switched Social Media Text
- URL: http://arxiv.org/abs/2305.14534v2
- Date: Fri, 15 Mar 2024 18:26:48 GMT
- Title: Detecting Propaganda Techniques in Code-Switched Social Media Text
- Authors: Muhammad Umar Salman, Asif Hanif, Shady Shehata, Preslav Nakov
- Abstract summary: We propose a novel task of detecting propaganda techniques in code-switched text.
We create a corpus of 1,030 texts code-switching between English and Roman Urdu, annotated with 20 propaganda techniques.
The code and the dataset are publicly available at https://github.com/mbzuai-nlp/propaganda-codeswitched-text.
- Score: 30.844812761448107
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Propaganda is a form of communication intended to influence the opinions and the mindset of the public to promote a particular agenda. With the rise of social media, propaganda has spread rapidly, leading to the need for automatic propaganda detection systems. Most work on propaganda detection has focused on high-resource languages, such as English, and little effort has been made to detect propaganda for low-resource languages. Yet, it is common to find a mix of multiple languages in social media communication, a phenomenon known as code-switching. Code-switching combines different languages within the same text, which poses a challenge for automatic systems. With this in mind, here we propose the novel task of detecting propaganda techniques in code-switched text. To support this task, we create a corpus of 1,030 texts code-switching between English and Roman Urdu, annotated with 20 propaganda techniques, which we make publicly available. We perform a number of experiments contrasting different experimental setups, and we find that it is important to model the multilinguality directly (rather than using translation) as well as to use the right fine-tuning strategy. The code and the dataset are publicly available at https://github.com/mbzuai-nlp/propaganda-codeswitched-text
Related papers
- MENTOR: Multilingual tExt detectioN TOward leaRning by analogy [59.37382045577384]
We propose a framework to detect and identify both seen and unseen language regions inside scene images.
"MENTOR" is the first work to realize a learning strategy between zero-shot learning and few-shot learning for multilingual scene text detection.
arXiv Detail & Related papers (2024-03-12T03:35:17Z)
- Can GPT-4 Identify Propaganda? Annotation and Detection of Propaganda Spans in News Articles [11.64165958410489]
We develop the largest propaganda dataset to date, comprised of 8K paragraphs from newspaper articles, labeled at the text span level following a taxonomy of 23 propagandistic techniques.
Our work offers the first attempt to understand the performance of large language models (LLMs), using GPT-4, for fine-grained propaganda detection from text.
Results showed that GPT-4's performance degrades as the task moves from simply classifying a paragraph as propagandistic or not, to the fine-grained task of detecting propaganda techniques and their manifestation in text.
arXiv Detail & Related papers (2024-02-27T13:02:19Z)
- Faking Fake News for Real Fake News Detection: Propaganda-loaded Training Data Generation [105.20743048379387]
We propose a novel framework for generating training examples informed by the known styles and strategies of human-authored propaganda.
Specifically, we perform self-critical sequence training guided by natural language inference to ensure the validity of the generated articles.
Our experimental results show that fake news detectors trained on PropaNews are better at detecting human-written disinformation by 3.62 - 7.69% F1 score on two public datasets.
arXiv Detail & Related papers (2022-03-10T14:24:19Z)
- Contextual Hate Speech Detection in Code Mixed Text using Transformer Based Approaches [0.0]
We propose automated techniques for hate speech detection in code mixed text from Twitter.
While regular approaches analyze the text independently, we also make use of content text in the form of parent tweets.
We show that the dual-encoder approach using independent representations yields better performance.
arXiv Detail & Related papers (2021-10-18T14:05:36Z)
- Detecting Propaganda Techniques in Memes [32.209606526323945]
We propose a new multi-label multimodal task: detecting the type of propaganda techniques used in memes.
We create and release a new corpus of 950 memes, carefully annotated with 22 propaganda techniques, which can appear in the text, in the image, or in both.
Our analysis of the corpus shows that understanding both modalities together is essential for detecting these techniques.
arXiv Detail & Related papers (2021-08-07T11:56:52Z)
- Dataset of Propaganda Techniques of the State-Sponsored Information Operation of the People's Republic of China [0.0]
This research aims to bridge the information gap by providing a multi-labeled propaganda techniques dataset in Mandarin based on a state-backed information operation dataset provided by Twitter.
In addition to presenting the dataset, we apply a multi-label text classification using fine-tuned BERT.
arXiv Detail & Related papers (2021-06-14T16:11:13Z)
- Role of Artificial Intelligence in Detection of Hateful Speech for Hinglish Data on Social Media [1.8899300124593648]
Hindi-English code-mixed data (Hinglish) is increasingly prevalent among most of the urban population all over the world.
Hate speech detection algorithms deployed by most social networking platforms are unable to filter out offensive and abusive content posted in these code-mixed languages.
We propose a methodology for efficient detection of hate speech in unstructured code-mixed Hinglish text.
arXiv Detail & Related papers (2021-05-11T10:02:28Z)
- Cross-Domain Learning for Classifying Propaganda in Online Contents [67.10699378370752]
We present an approach to leverage cross-domain learning, based on labeled documents and sentences from news and tweets, as well as political speeches with a clear difference in their degrees of being propagandistic.
Our experiments demonstrate the usefulness of this approach, and identify difficulties and limitations in various configurations of sources and targets for the transfer step.
arXiv Detail & Related papers (2020-11-13T10:19:13Z)
- LTIatCMU at SemEval-2020 Task 11: Incorporating Multi-Level Features for Multi-Granular Propaganda Span Identification [70.1903083747775]
This paper describes our submission for the task of Propaganda Span Identification in news articles.
We introduce a BERT-BiLSTM based span-level propaganda classification model that identifies which token spans within the sentence are indicative of propaganda.
arXiv Detail & Related papers (2020-08-11T16:14:47Z)
- Leveraging Declarative Knowledge in Text and First-Order Logic for Fine-Grained Propaganda Detection [139.3415751957195]
We study the detection of propagandistic text fragments in news articles.
We introduce an approach to inject declarative knowledge of fine-grained propaganda techniques.
arXiv Detail & Related papers (2020-04-29T13:46:15Z)