Related papers: GPT Assisted Annotation of Rhetorical and Linguistic Features for Interpretable Propaganda Technique Detection in News Text

URL: http://arxiv.org/abs/2407.11827v1
Date: Tue, 16 Jul 2024 15:15:39 GMT
Title: GPT Assisted Annotation of Rhetorical and Linguistic Features for Interpretable Propaganda Technique Detection in News Text
Authors: Kyle Hamilton, Luca Longo, Bojan Bozic
Abstract summary: This study codifies 22 rhetorical and linguistic features identified in literature related to the language of persuasion. RhetAnn, a web application, was specifically designed to minimize an otherwise considerable mental effort. A small set of annotated data was used to fine-tune GPT-3.5, a generative large language model (LLM), to annotate the remaining data.
Score: 1.2699007098398802
License: http://creativecommons.org/licenses/by/4.0/
Abstract: While the use of machine learning for the detection of propaganda techniques in text has garnered considerable attention, most approaches focus on "black-box" solutions with opaque inner workings. Interpretable approaches provide a solution; however, they depend on careful feature engineering and costly expert-annotated data. Additionally, language features specific to propagandistic text are generally the focus of rhetoricians or linguists, and there is no data set labeled with such features suitable for machine learning. This study codifies 22 rhetorical and linguistic features identified in literature related to the language of persuasion for the purpose of annotating an existing data set labeled with propaganda techniques. To help human experts annotate natural language sentences with these features, RhetAnn, a web application, was specifically designed to minimize an otherwise considerable mental effort. Finally, a small set of annotated data was used to fine-tune GPT-3.5, a generative large language model (LLM), to annotate the remaining data while optimizing for financial cost and classification accuracy. This study demonstrates how combining a small number of human-annotated examples with GPT can be an effective strategy for scaling the annotation process at a fraction of the cost of traditional annotation relying solely on human experts. The results are on par with the best performing model at the time of writing, namely GPT-4, at one-tenth of the cost. Our contribution is a set of features, their properties, definitions, and examples in a machine-readable format, along with the code for RhetAnn and the GPT prompts and fine-tuning procedures for advancing state-of-the-art interpretable propaganda technique detection.

Related papers

Large Language Models as Span Annotators [5.488183187190419]
Span annotation can guide improvements and provide insights. Until recently, span annotation was limited to human annotators or fine-tuned encoder models. We show that large language models (LLMs) are straightforward to implement and notably more cost-efficient than human annotators.
arXiv Detail & Related papers (2025-04-11T17:04:51Z)
Enhancing Plagiarism Detection in Marathi with a Weighted Ensemble of TF-IDF and BERT Embeddings for Low-Resource Language Processing [0.0]
It is crucial to design robust plagiarism detection systems tailored for low-resource languages. This paper presents a method to enhance the accuracy of plagiarism detection for Marathi texts.
arXiv Detail & Related papers (2025-01-09T14:14:18Z)
TextSleuth: Towards Explainable Tampered Text Detection [49.88698441048043]
We propose to explain the basis of tampered text detection with natural language via large multimodal models. To fill the data gap for this task, we propose a large-scale, comprehensive dataset, ETTD. Elaborate queries are introduced to generate high-quality anomaly descriptions with GPT4o. To automatically filter out low-quality annotations, we also propose to prompt GPT4o to recognize tampered texts.
arXiv Detail & Related papers (2024-12-19T13:10:03Z)
Detecting Document-level Paraphrased Machine Generated Content: Mimicking Human Writing Style and Involving Discourse Features [57.34477506004105]
Machine-generated content poses challenges such as academic plagiarism and the spread of misinformation. We introduce novel methodologies and datasets to overcome these challenges. We propose MhBART, an encoder-decoder model designed to emulate human writing style. We also propose DTransformer, a model that integrates discourse analysis through PDTB preprocessing to encode structural features.
arXiv Detail & Related papers (2024-12-17T08:47:41Z)
GUS-Net: Social Bias Classification in Text with Generalizations, Unfairness, and Stereotypes [2.2162879952427343]
This paper introduces GUS-Net, an innovative approach to bias detection. GUS-Net focuses on three key types of biases: (G)eneralizations, (U)nfairness, and (S)tereotypes. Our methodology enhances traditional bias detection methods by incorporating the contextual encodings of pre-trained models.
arXiv Detail & Related papers (2024-10-10T21:51:22Z)
Can GPT-4 learn to analyse moves in research article abstracts? [0.9999629695552195]
We employ the affordances of GPT-4 to automate the annotation process by using natural language prompts. An 8-shot prompt was more effective than one using two, confirming that the inclusion of examples illustrating areas of variability can enhance GPT-4's ability to recognize multiple moves in a single sentence.
arXiv Detail & Related papers (2024-07-22T13:14:27Z)
Large Language Models for Propaganda Span Annotation [10.358271919023903]
This study investigates whether Large Language Models, such as GPT-4, can effectively extract propagandistic spans. The experiments are performed over a large-scale in-house manually annotated dataset.
arXiv Detail & Related papers (2023-11-16T11:37:54Z)
Exploring Large Language Model for Graph Data Understanding in Online Job Recommendations [63.19448893196642]
We present a novel framework that harnesses the rich contextual information and semantic representations provided by large language models to analyze behavior graphs. By leveraging this capability, our framework enables personalized and accurate job recommendations for individual users.
arXiv Detail & Related papers (2023-07-10T11:29:41Z)
Take the Hint: Improving Arabic Diacritization with Partially-Diacritized Text [4.863310073296471]
We propose 2SDiac, a multi-source model that can effectively support optional diacritics in input to inform all predictions. We also introduce Guided Learning, a training scheme to leverage given diacritics in input with different levels of random masking.
arXiv Detail & Related papers (2023-06-06T10:18:17Z)
COFFEE: Counterfactual Fairness for Personalized Text Generation in Explainable Recommendation [56.520470678876656]
Bias inherent in user written text can associate different levels of linguistic quality with users' protected attributes. We introduce a general framework to achieve measure-specific counterfactual fairness in explanation generation.
arXiv Detail & Related papers (2022-10-14T02:29:10Z)
Annotation Error Detection: Analyzing the Past and Present for a More Coherent Future [63.99570204416711]
We reimplement 18 methods for detecting potential annotation errors and evaluate them on 9 English datasets. We define a uniform evaluation setup including a new formalization of the annotation error detection task. We release our datasets and implementations in an easy-to-use and open source software package.
arXiv Detail & Related papers (2022-06-05T22:31:45Z)
Assisted Text Annotation Using Active Learning to Achieve High Quality with Little Effort [9.379650501033465]
We propose a tool that enables researchers to create large, high-quality, annotated datasets with only a few manual annotations. We combine an active learning (AL) approach with a pre-trained language model to semi-automatically identify annotation categories. Our preliminary results show that employing AL strongly reduces the number of annotations for correct classification of even complex and subtle frames.
arXiv Detail & Related papers (2021-12-15T13:14:58Z)
Leveraging Pre-trained Language Model for Speech Sentiment Analysis [58.78839114092951]
We explore the use of pre-trained language models to learn sentiment information of written texts for speech sentiment analysis. We propose a pseudo label-based semi-supervised training strategy using a language model on an end-to-end speech sentiment approach.
arXiv Detail & Related papers (2021-06-11T20:15:21Z)
Annotation Curricula to Implicitly Train Non-Expert Annotators [56.67768938052715]
Voluntary studies often require annotators to familiarize themselves with the task, its annotation scheme, and the data domain. This can be overwhelming in the beginning, mentally taxing, and induce errors into the resulting annotations. We propose annotation curricula, a novel approach to implicitly train annotators.
arXiv Detail & Related papers (2021-06-04T09:48:28Z)
Leveraging Declarative Knowledge in Text and First-Order Logic for Fine-Grained Propaganda Detection [139.3415751957195]
We study the detection of propagandistic text fragments in news articles. We introduce an approach to inject declarative knowledge of fine-grained propaganda techniques.
arXiv Detail & Related papers (2020-04-29T13:46:15Z)

This list is automatically generated from the titles and abstracts of the papers on this site.