Large Language Models for Propaganda Span Annotation
- URL: http://arxiv.org/abs/2311.09812v2
- Date: Sun, 14 Jan 2024 06:32:09 GMT
- Title: Large Language Models for Propaganda Span Annotation
- Authors: Maram Hasanain, Fatema Ahmed, Firoj Alam
- Abstract summary: We investigate whether large language models (LLMs), such as GPT-4, can effectively perform the task.
We use a large-scale in-house dataset consisting of annotations from human annotators with varying expertise levels.
We plan to make the collected span-level labels from multiple annotators, including GPT-4, available for the community.
- Score: 11.64165958410489
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The use of propagandistic techniques in online content has increased in
recent years, aiming to manipulate online audiences. Efforts to automatically
detect and debunk such content have addressed various modeling scenarios,
including determining whether the content (text, image, or multimodal) (i) is
propagandistic, (ii) employs one or more propagandistic techniques, and
(iii) includes techniques with identifiable spans. Far more research effort has
been devoted to the first two scenarios than to the third. Therefore, in this
study, we focus on the task of detecting propagandistic textual spans.
Specifically, we investigate whether large language models (LLMs), such as
GPT-4, can effectively perform the task. Moreover, we study the potential of
employing the model to collect more cost-effective annotations. Our experiments
use a large-scale in-house dataset consisting of annotations from human
annotators with varying expertise levels. The results suggest that providing
more information to the model in the prompt improves its performance compared
to human annotations. Moreover, our work is the first to show the potential of
utilizing LLMs to develop annotated datasets for this specific task, by
prompting the model with annotations from human annotators with limited
expertise. We plan to make the collected span-level labels from multiple
annotators, including GPT-4, available to the community.
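The span-annotation setup described in the abstract can be approximated with a prompted LLM. The sketch below is a minimal illustration under stated assumptions, not the authors' released prompts or code: it assumes the OpenAI Python SDK, a "gpt-4" chat model, a small illustrative subset of technique labels (the paper uses a 23-technique taxonomy), and an optional field for noisy human annotations passed as extra prompt context.

```python
import json
from openai import OpenAI  # assumes the OpenAI Python SDK (v1+) is installed

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# Illustrative subset of technique labels; the paper's taxonomy has 23 techniques.
TECHNIQUES = ["Loaded_Language", "Name_Calling", "Exaggeration", "Doubt"]


def annotate_spans(paragraph, hint_spans=None):
    """Ask the model for span-level propaganda annotations of one paragraph.

    hint_spans optionally carries annotations from less-experienced human
    annotators, mirroring the idea of giving the model more prompt context.
    """
    prompt = (
        "Annotate propaganda techniques in the paragraph below.\n"
        f"Allowed techniques: {', '.join(TECHNIQUES)}.\n"
        "Return only a JSON list of objects with keys 'technique', 'start', "
        "'end', and 'text', where 'start' and 'end' are character offsets "
        "into the paragraph.\n\n"
        f"Paragraph:\n{paragraph}\n"
    )
    if hint_spans:
        prompt += "\nPreliminary (possibly noisy) human annotations:\n" + json.dumps(hint_spans)

    response = client.chat.completions.create(
        model="gpt-4",  # model choice is an assumption for this sketch
        temperature=0,
        messages=[{"role": "user", "content": prompt}],
    )
    # The model may occasionally return non-JSON text; a real pipeline should validate this.
    return json.loads(response.choices[0].message.content)


if __name__ == "__main__":
    para = "The so-called experts are lying to you, as they always have."
    print(annotate_spans(para))
```

A fuller reproduction would use the paper's complete technique taxonomy, its actual prompt variants, and validation of the returned character offsets against the source paragraph before comparing the spans with expert annotations.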
Related papers
- GPT Assisted Annotation of Rhetorical and Linguistic Features for Interpretable Propaganda Technique Detection in News Text [1.2699007098398802]
This study codifies 22 rhetorical and linguistic features identified in the literature on the language of persuasion.
RhetAnn, a web application, was specifically designed to minimize the otherwise considerable mental effort of annotation.
A small set of annotated data was used to fine-tune GPT-3.5, a generative large language model (LLM), to annotate the remaining data.
arXiv Detail & Related papers (2024-07-16T15:15:39Z)
- Can GPT-4 Identify Propaganda? Annotation and Detection of Propaganda Spans in News Articles [11.64165958410489]
We develop the largest propaganda dataset to date, comprised of 8K paragraphs from newspaper articles, labeled at the text span level following a taxonomy of 23 propagandistic techniques.
Our work offers the first attempt to understand the performance of large language models (LLMs), using GPT-4, for fine-grained propaganda detection from text.
Results showed that GPT-4's performance degrades as the task moves from simply classifying a paragraph as propagandistic or not, to the fine-grained task of detecting propaganda techniques and their manifestation in text.
arXiv Detail & Related papers (2024-02-27T13:02:19Z)
- Less is More: A Closer Look at Semantic-based Few-Shot Learning [11.724194320966959]
Few-shot Learning aims to learn and distinguish new categories with a very limited number of available images.
We propose a simple but effective framework for few-shot learning tasks, specifically designed to exploit textual information and a language model.
Our experiments conducted across four widely used few-shot datasets demonstrate that our simple framework achieves impressive results.
arXiv Detail & Related papers (2024-01-10T08:56:02Z)
- GPT Struct Me: Probing GPT Models on Narrative Entity Extraction [2.049592435988883]
We evaluate the capabilities of two state-of-the-art language models -- GPT-3 and GPT-3.5 -- in the extraction of narrative entities.
This study is conducted on the Text2Story Lusa dataset, a collection of 119 Portuguese news articles.
arXiv Detail & Related papers (2023-11-24T16:19:04Z)
- StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data [129.92449761766025]
We propose a novel data collection methodology that synchronously synthesizes images and dialogues for visual instruction tuning.
This approach harnesses the power of generative models, marrying the abilities of ChatGPT and text-to-image generative models.
Our research includes comprehensive experiments conducted on various datasets.
arXiv Detail & Related papers (2023-08-20T12:43:52Z)
- Exploring Large Language Model for Graph Data Understanding in Online Job Recommendations [63.19448893196642]
We present a novel framework that harnesses the rich contextual information and semantic representations provided by large language models to analyze behavior graphs.
By leveraging this capability, our framework enables personalized and accurate job recommendations for individual users.
arXiv Detail & Related papers (2023-07-10T11:29:41Z)
- Harnessing Explanations: LLM-to-LM Interpreter for Enhanced Text-Attributed Graph Representation Learning [51.90524745663737]
A key innovation is our use of explanations as features, which can be used to boost GNN performance on downstream tasks.
Our method achieves state-of-the-art results on well-established TAG datasets.
Our method significantly speeds up training, achieving a 2.88 times improvement over the closest baseline on ogbn-arxiv.
arXiv Detail & Related papers (2023-05-31T03:18:03Z)
- AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators [98.11286353828525]
GPT-3.5 series models have demonstrated remarkable few-shot and zero-shot ability across various NLP tasks.
We propose AnnoLLM, which adopts a two-step approach, explain-then-annotate.
We build the first conversation-based information retrieval dataset employing AnnoLLM.
arXiv Detail & Related papers (2023-03-29T17:03:21Z)
- Self-Supervised Learning for Videos: A Survey [70.37277191524755]
Self-supervised learning has shown promise in both image and video domains.
In this survey, we provide a review of existing approaches on self-supervised learning focusing on the video domain.
arXiv Detail & Related papers (2022-06-18T00:26:52Z)
- Revisiting Self-Training for Few-Shot Learning of Language Model [61.173976954360334]
Unlabeled data carry rich task-relevant information and have proven useful for few-shot learning of language models.
In this work, we revisit the self-training technique for language model fine-tuning and present a state-of-the-art prompt-based few-shot learner, SFLM.
arXiv Detail & Related papers (2021-10-04T08:51:36Z)