Panning for gold: Lessons learned from the platform-agnostic automated
detection of political content in textual data
- URL: http://arxiv.org/abs/2207.00489v1
- Date: Fri, 1 Jul 2022 15:23:23 GMT
- Authors: Mykola Makhortykh, Ernesto de León, Aleksandra Urman, Clara
Christner, Maryna Sydorova, Silke Adam, Michaela Maier, and Teresa Gil-Lopez
- Abstract summary: We discuss how these techniques can be used to detect political content across different platforms.
We compare the performance of three groups of detection techniques relying on dictionaries, supervised machine learning, or neural networks.
Our results show the limited impact of preprocessing on model performance, with the best results for less noisy data being achieved by neural network- and machine-learning-based models.
- Score: 48.7576911714538
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The growing availability of data about online information behaviour enables
new possibilities for political communication research. However, the volume and
variety of these data make them difficult to analyse and prompt the need for
developing automated content approaches relying on a broad range of natural
language processing techniques (e.g. machine learning- or neural network-based
ones). In this paper, we discuss how these techniques can be used to detect
political content across different platforms. Using three validation datasets,
which include a variety of political and non-political textual documents from
online platforms, we systematically compare the performance of three groups of
detection techniques relying on dictionaries, supervised machine learning, or
neural networks. We also examine the impact of different modes of data
preprocessing (e.g. stemming and stopword removal) on the low-cost
implementations of these techniques using a large set (n = 66) of detection
models. Our results show the limited impact of preprocessing on model
performance, with the best results for less noisy data being achieved by neural
network- and machine-learning-based models, in contrast to the more robust
performance of dictionary-based models on noisy data.
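The dictionary-based technique and the preprocessing modes the abstract compares can be sketched in a few lines of Python. The keyword list, stopword list, and suffix-stripping rule below are illustrative placeholders, not the actual dictionaries or stemmers evaluated in the paper:

```python
# Minimal sketch of dictionary-based political-content detection with
# optional preprocessing (lowercasing, stopword removal, crude stemming).
# Keyword and stopword sets are toy examples, not the paper's resources.

POLITICAL_TERMS = {"election", "parliament", "senat", "policy", "vote", "minister"}
STOPWORDS = {"the", "a", "an", "of", "on", "in", "and", "to"}

def preprocess(text, remove_stopwords=True, stem=True):
    """Tokenize and optionally remove stopwords / strip plural suffixes."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    if stem:
        # crude suffix stripping as a stand-in for a real stemmer
        tokens = [t[:-1] if t.endswith("s") else t for t in tokens]
    return tokens

def is_political(text, **preprocessing_modes):
    """Flag a document if any token matches a dictionary term."""
    tokens = preprocess(text, **preprocessing_modes)
    return any(tok.startswith(term) for tok in tokens for term in POLITICAL_TERMS)
```

Toggling the `remove_stopwords` and `stem` keyword arguments mimics the preprocessing modes whose impact the paper evaluates across its 66 detection models.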
Related papers
- Unsupervised Data Validation Methods for Efficient Model Training [0.0]
State-of-the-art models in natural language processing (NLP), text-to-speech (TTS), speech-to-text (STT) and vision-language models (VLM) rely heavily on large datasets.
This research explores key areas such as defining "quality data," developing methods for generating appropriate data and enhancing accessibility to model training.
arXiv Detail & Related papers (2024-10-10T13:00:53Z) - Text2Data: Low-Resource Data Generation with Textual Control [104.38011760992637]
Natural language serves as a common and straightforward control signal for humans to interact seamlessly with machines.
We propose Text2Data, a novel approach that utilizes unlabeled data to understand the underlying data distribution through an unsupervised diffusion model.
It undergoes controllable finetuning via a novel constraint optimization-based learning objective that ensures controllability and effectively counteracts catastrophic forgetting.
arXiv Detail & Related papers (2024-02-08T03:41:39Z) - Few-shot learning for automated content analysis: Efficient coding of
arguments and claims in the debate on arms deliveries to Ukraine [0.9576975587953563]
Pre-trained language models (PLM) based on transformer neural networks offer great opportunities to improve automatic content analysis in communication science.
Three characteristics have so far impeded the widespread adoption of these methods in the applying disciplines: the dominance of English-language models in NLP research, the necessary computing resources, and the effort required to produce training data for fine-tuning PLMs.
We test our approach on a realistic use case from communication science to automatically detect claims and arguments together with their stance in the German news debate on arms deliveries to Ukraine.
arXiv Detail & Related papers (2023-12-28T11:39:08Z) - Anticipated Network Surveillance -- An extrapolated study to predict
cyber-attacks using Machine Learning and Data Analytics [0.0]
This paper discusses a novel technique to predict an upcoming attack in a network based on several data parameters.
The proposed model comprises dataset pre-processing and training, followed by a testing phase.
Based on the results of the testing phase, the best model is selected and used to extract the event class that may lead to an attack.
arXiv Detail & Related papers (2023-12-27T01:09:11Z) - Harnessing the Power of Text-image Contrastive Models for Automatic
Detection of Online Misinformation [50.46219766161111]
We develop a self-learning model to explore contrastive learning in the domain of misinformation identification.
Our model shows superior performance in detecting non-matched image-text pairs when training data is insufficient.
arXiv Detail & Related papers (2023-04-19T02:53:59Z) - An Empirical Investigation of Commonsense Self-Supervision with
Knowledge Graphs [67.23285413610243]
Self-supervision based on the information extracted from large knowledge graphs has been shown to improve the generalization of language models.
We study the effect of knowledge sampling strategies and sizes that can be used to generate synthetic data for adapting language models.
arXiv Detail & Related papers (2022-05-21T19:49:04Z) - Evaluating BERT-based Pre-training Language Models for Detecting
Misinformation [2.1915057426589746]
It is challenging to control the quality of online information due to the lack of supervision over all the information posted online.
There is a need for automated rumour detection techniques to limit the adverse effects of spreading misinformation.
This study proposes using BERT-based pre-trained language models to encode text data into vectors and neural network models to classify those vectors to detect misinformation.
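The encode-then-classify pipeline that study describes can be sketched with scikit-learn. Here a TF-IDF vectorizer is deliberately swapped in for the BERT encoder and a linear classifier for the neural classification head, and the texts and labels are invented toy data, not the study's dataset:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy corpus: 1 = misinformation-like, 0 = benign (invented examples).
texts = [
    "miracle cure doctors hate revealed secret",
    "shocking truth they do not want you to know",
    "city council approves new budget for schools",
    "local library extends weekend opening hours",
]
labels = [1, 1, 0, 0]

# TF-IDF stands in for the BERT encoder; logistic regression stands in
# for the neural network classifier applied to the encoded vectors.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

prediction = clf.predict(["shocking secret cure revealed"])[0]
```

Substituting the vectorizer with real BERT embeddings would reproduce the study's pipeline shape: encode each document into a fixed-length vector, then classify.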
arXiv Detail & Related papers (2022-03-15T08:54:36Z) - Human-in-the-Loop Disinformation Detection: Stance, Sentiment, or
Something Else? [93.91375268580806]
Both politics and pandemics have recently provided ample motivation for the development of machine learning-enabled disinformation (a.k.a. fake news) detection algorithms.
Existing literature has focused primarily on the fully-automated case, but the resulting techniques cannot reliably detect disinformation on the varied topics, sources, and time scales required for military applications.
By leveraging an already-available analyst as a human-in-the-loop, canonical machine learning techniques of sentiment analysis, aspect-based sentiment analysis, and stance detection become plausible methods to use for a partially-automated disinformation detection system.
arXiv Detail & Related papers (2021-11-09T13:30:34Z) - What Matters in Learning from Offline Human Demonstrations for Robot
Manipulation [64.43440450794495]
We conduct an extensive study of six offline learning algorithms for robot manipulation.
Our study analyzes the most critical challenges when learning from offline human data.
We highlight opportunities for learning from human datasets.
arXiv Detail & Related papers (2021-08-06T20:48:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.