A Context-Sensitive Word Embedding Approach for The Detection of Troll Tweets
- URL: http://arxiv.org/abs/2207.08230v4
- Date: Wed, 7 Jun 2023 12:25:00 GMT
- Title: A Context-Sensitive Word Embedding Approach for The Detection of Troll Tweets
- Authors: Seyhmus Yilmaz and Sultan Zavrak
- Abstract summary: We develop and evaluate a set of model architectures for the automatic detection of troll tweets.
BERT and ELMo embedding methods performed better than the GloVe method.
CNN and GRU encoders performed similarly in terms of F1 score and AUC.
The best-performing method was found to be an ELMo-based architecture that employed a GRU classifier, with an AUC score of 0.929.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this study, we aimed to address the growing concern of trolling behavior
on social media by developing and evaluating a set of model architectures for
the automatic detection of troll tweets. Utilizing deep learning techniques and
pre-trained word embedding methods such as BERT, ELMo, and GloVe, we evaluated
the performance of each architecture using metrics such as classification
accuracy, F1 score, AUC, and precision. Our results indicate that BERT and ELMo
embedding methods performed better than the GloVe method, likely due to their
ability to provide contextualized word embeddings that better capture the
nuances and subtleties of language use in online social media. Additionally, we
found that CNN and GRU encoders performed similarly in terms of F1 score and
AUC, suggesting their effectiveness in extracting relevant information from
input text. The best-performing method was found to be an ELMo-based
architecture that employed a GRU classifier, with an AUC score of 0.929. This
research highlights the importance of utilizing contextualized word embeddings
and appropriate encoder methods in the task of troll tweet detection, which can
assist social media platforms in improving their ability to identify and
address trolling behavior.
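The abstract does not include an implementation, so the following is only a minimal, hypothetical sketch of the kind of architecture it describes: a bidirectional GRU classifier over pre-computed contextual token embeddings, evaluated with AUC. The embedding dimension of 1024 matches ELMo's default output; the model name, hyperparameters, and toy data are illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch (not the authors' code): a GRU classifier over
# pre-computed contextual embeddings, e.g. ELMo vectors of dimension 1024.
import torch
import torch.nn as nn
from sklearn.metrics import roc_auc_score

class GRUTrollClassifier(nn.Module):
    def __init__(self, embed_dim=1024, hidden_dim=128):
        super().__init__()
        # Bidirectional GRU encodes the sequence of contextual embeddings.
        self.gru = nn.GRU(embed_dim, hidden_dim,
                          batch_first=True, bidirectional=True)
        # Linear head maps the final states to a single troll/non-troll logit.
        self.head = nn.Linear(2 * hidden_dim, 1)

    def forward(self, x):                     # x: (batch, seq_len, embed_dim)
        _, h = self.gru(x)                    # h: (2, batch, hidden_dim)
        h = torch.cat([h[0], h[1]], dim=-1)   # concatenate both directions
        return self.head(h).squeeze(-1)       # (batch,) logits

# Toy usage with random tensors standing in for ELMo tweet embeddings.
model = GRUTrollClassifier()
tweets = torch.randn(8, 30, 1024)             # 8 tweets, 30 tokens each
labels = torch.tensor([0, 1, 0, 1, 0, 1, 0, 1]).float()
probs = torch.sigmoid(model(tweets))          # troll probability per tweet
print("AUC:", roc_auc_score(labels.numpy(), probs.detach().numpy()))
```

Swapping the GRU for one-dimensional convolutions over the same embedding sequence would give the CNN-encoder variant the abstract compares against.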
Related papers
- Pronunciation Assessment with Multi-modal Large Language Models [10.35401596425946]
We propose a scoring system based on large language models (LLMs).
The speech encoder first maps the learner's speech into contextual features.
The adapter layer then transforms these features to align with the text embedding in latent space.
arXiv Detail & Related papers (2024-07-12T12:16:14Z) - Unifying Structure and Language Semantic for Efficient Contrastive
Knowledge Graph Completion with Structured Entity Anchors [0.3913403111891026]
The goal of knowledge graph completion (KGC) is to predict missing links in a KG using trained facts that are already known.
We propose a novel method to effectively unify structure information and language semantics without losing the power of inductive reasoning.
arXiv Detail & Related papers (2023-11-07T11:17:55Z) - Fuzzy Fingerprinting Transformer Language-Models for Emotion Recognition
in Conversations [0.7874708385247353]
We propose to combine the two approaches to perform Emotion Recognition in Conversations (ERC).
We feed utterances and their previous conversational turns to a pre-trained RoBERTa, obtaining contextualized utterance representations (a minimal sketch of this encoding step appears after this list).
We validate our approach on the widely used DailyDialog ERC benchmark dataset.
arXiv Detail & Related papers (2023-09-08T12:26:01Z) - Unsupervised Sentiment Analysis of Plastic Surgery Social Media Posts [91.3755431537592]
The massive collection of user posts across social media platforms is primarily untapped for artificial intelligence (AI) use cases.
Natural language processing (NLP) is a subfield of AI that leverages bodies of documents, known as corpora, to train computers in human-like language understanding.
This study demonstrates that the applied results of unsupervised analysis allow a computer to predict either negative, positive, or neutral user sentiment towards plastic surgery.
arXiv Detail & Related papers (2023-07-05T20:16:20Z) - Prefer to Classify: Improving Text Classifiers via Auxiliary Preference
Learning [76.43827771613127]
In this paper, we investigate task-specific preferences between pairs of input texts as a new alternative way for such auxiliary data annotation.
We propose a novel multi-task learning framework, called prefer-to-classify (P2C), which can enjoy the cooperative effect of learning both the given classification task and the auxiliary preferences.
arXiv Detail & Related papers (2023-06-08T04:04:47Z) - CSSL-MHTR: Continual Self-Supervised Learning for Scalable Multi-script Handwritten Text Recognition [16.987008461171065]
We explore the potential of continual self-supervised learning to alleviate the catastrophic forgetting problem in handwritten text recognition.
Our method consists of adding intermediate layers, called adapters, for each task, and efficiently distilling knowledge from the previous model while learning the current task.
We attain state-of-the-art performance on English, Italian and Russian scripts, whilst adding only a few parameters per task.
arXiv Detail & Related papers (2023-03-16T14:27:45Z) - Lexically Aware Semi-Supervised Learning for OCR Post-Correction [90.54336622024299]
Much of the existing linguistic data in many languages of the world is locked away in non-digitized books and documents.
Previous work has demonstrated the utility of neural post-correction methods on recognition of less-well-resourced languages.
We present a semi-supervised learning method that makes it possible to utilize raw images to improve performance.
arXiv Detail & Related papers (2021-11-04T04:39:02Z) - AES Systems Are Both Overstable And Oversensitive: Explaining Why And
Proposing Defenses [66.49753193098356]
We investigate the reason behind the surprising adversarial brittleness of scoring models.
Our results indicate that autoscoring models, despite getting trained as "end-to-end" models, behave like bag-of-words models.
We propose detection-based protection models that can detect oversensitivity- and overstability-causing samples with high accuracy.
arXiv Detail & Related papers (2021-09-24T03:49:38Z) - Sentiment analysis in tweets: an assessment study from classical to
modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information.
Their inherent characteristics, such as their informal and noisy linguistic style, remain challenging for many natural language processing (NLP) tasks.
This study assesses how well existing language models distinguish the sentiment expressed in tweets, using a rich collection of 22 datasets.
arXiv Detail & Related papers (2021-05-29T21:05:28Z) - Comparative Analysis of Machine Learning and Deep Learning Algorithms
for Detection of Online Hate Speech [5.543220407902113]
Several attempts have been made to classify hate speech using machine learning, but the state-of-the-art models are not robust enough for practical applications.
In this paper, we explored various feature engineering techniques ranging from different embeddings to conventional NLP algorithms.
We conclude that BERT-based embeddings give the most useful features for this problem and can be developed into a practical, robust model.
arXiv Detail & Related papers (2021-04-23T04:19:15Z) - Explicit Alignment Objectives for Multilingual Bidirectional Encoders [111.65322283420805]
We present a new method for learning multilingual encoders, AMBER (Aligned Multilingual Bi-directional EncodeR).
AMBER is trained on additional parallel data using two explicit alignment objectives that align the multilingual representations at different granularities.
Experimental results show that AMBER obtains gains of up to 1.1 average F1 score on sequence tagging and up to 27.3 average accuracy on retrieval over the XLMR-large model.
arXiv Detail & Related papers (2020-10-15T18:34:13Z)