Analysing Cyberbullying using Natural Language Processing by
Understanding Jargon in Social Media
- URL: http://arxiv.org/abs/2107.08902v1
- Date: Fri, 23 Apr 2021 04:20:19 GMT
- Title: Analysing Cyberbullying using Natural Language Processing by
Understanding Jargon in Social Media
- Authors: Bhumika Bhatia, Anuj Verma, Anjum, Rahul Katarya
- Abstract summary: In our work, we explore binary classification by using a combination of datasets from various social media platforms.
We experiment with multiple models such as Bi-LSTM, GloVe, and state-of-the-art models like BERT, and apply a unique preprocessing technique by introducing a slang-abusive corpus.
- Score: 4.932130498861987
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cyberbullying is extremely prevalent today. Online hate comments,
toxicity, and cyberbullying among children and other vulnerable groups have
only grown with the shift to online classes and increased access to social
platforms, especially post COVID-19. It is paramount to ensure minors' safety
across social platforms so that any violence or hate crime is automatically
detected and strict action is taken against it. In our work, we explore binary
classification using a combination of datasets from various social media
platforms that cover a wide range of cyberbullying, such as sexism, racism,
abusive language, and hate speech. We experiment with multiple models, such as
Bi-LSTM, GloVe, and state-of-the-art models like BERT, and apply a unique
preprocessing technique by introducing a slang-abusive corpus, achieving
higher precision than models without slang preprocessing.
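The slang-aware preprocessing the abstract describes can be sketched roughly as follows. This is a minimal illustration only: the lexicon entries and the function name are hypothetical assumptions, not the paper's actual slang-abusive corpus.

```python
import re

# Hypothetical slang-abusive lexicon mapping slang tokens to canonical forms.
# The paper's real corpus is not reproduced here.
SLANG_LEXICON = {
    "u": "you",
    "stfu": "shut up",
    "gtfo": "get out",
}

def normalize_slang(text, lexicon=SLANG_LEXICON):
    """Lowercase the text, split into word/punctuation tokens,
    and expand any token found in the slang lexicon."""
    tokens = re.findall(r"\w+|\S", text.lower())
    return " ".join(lexicon.get(tok, tok) for tok in tokens)
```

Normalized text like this would then be fed to the embedding layer (GloVe) or tokenizer (BERT), so that slang variants collapse onto vocabulary the pretrained models already cover.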
Related papers
- The Use of a Large Language Model for Cyberbullying Detection [0.0]
Cyberbullying (CB) is among the most prevalent phenomena in today's cyber world.
It is a severe threat to the mental and physical health of citizens.
This opens the need to develop a robust system to prevent bullying content from online forums, blogs, and social media platforms.
arXiv Detail & Related papers (2024-02-06T15:46:31Z)
- Explain Thyself Bully: Sentiment Aided Cyberbullying Detection with Explanation [52.3781496277104]
Cyberbullying has become a big issue with the popularity of different social media networks and online communication apps.
Recent laws like "right to explanations" of General Data Protection Regulation have spurred research in developing interpretable models.
We develop the first interpretable multi-task model, mExCB, for automatic cyberbullying detection in code-mixed languages.
arXiv Detail & Related papers (2024-01-17T07:36:22Z) - Deep Learning Based Cyberbullying Detection in Bangla Language [0.0]
This study demonstrates a deep learning strategy for identifying cyberbullying in Bengali.
A two-layer bidirectional long short-term memory (Bi-LSTM) model has been built to identify cyberbullying.
arXiv Detail & Related papers (2024-01-07T04:58:59Z) - Understanding writing style in social media with a supervised
contrastively pre-trained transformer [57.48690310135374]
Online Social Networks serve as fertile ground for harmful behavior, ranging from hate speech to the dissemination of disinformation.
We introduce the Style Transformer for Authorship Representations (STAR), trained on a large corpus derived from public sources of 4.5 × 10^6 authored texts.
Using a support base of 8 documents of 512 tokens, we can discern authors from sets of up to 1616 authors with at least 80% accuracy.
arXiv Detail & Related papers (2023-10-17T09:01:17Z) - Countering Malicious Content Moderation Evasion in Online Social
Networks: Simulation and Detection of Word Camouflage [64.78260098263489]
Twisting and camouflaging keywords are among the most used techniques to evade platform content moderation systems.
This article contributes to countering malicious information by developing multilingual tools to simulate and detect new methods of content-moderation evasion.
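As an illustration of the kind of evasion such tools target (not the article's own method), a minimal de-camouflage normalizer might undo common character substitutions before matching tokens against a blocklist:

```python
# Map common "leetspeak" substitutions back to letters so that
# camouflaged tokens (e.g. "h4t3") can be matched by keyword filters.
# The substitution table is illustrative, not exhaustive.
LEET_MAP = str.maketrans({
    "4": "a", "3": "e", "1": "i", "0": "o",
    "5": "s", "@": "a", "$": "s",
})

def decamouflage(token):
    """Lowercase a token and reverse common character substitutions."""
    return token.lower().translate(LEET_MAP)
```

Real evasion techniques are far more varied (homoglyphs, spacing, word splitting), which is why the article builds dedicated simulation and detection tools rather than relying on fixed tables like this one.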
arXiv Detail & Related papers (2022-12-27T16:08:49Z)
- Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages.
We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply them to the target language.
We investigate the issue of label imbalance of hate speech datasets, since the high ratio of non-hate examples compared to hate examples often leads to low model performance.
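One standard remedy for such label imbalance (a common technique, not necessarily the one the paper adopts) is to weight each class inversely to its frequency when computing the training loss:

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Return per-class weights n / (k * count), where n is the number
    of examples and k the number of classes, so rare classes (e.g. hate
    examples) contribute more to the loss than frequent ones."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}
```

With a 9:1 ratio of non-hate to hate examples, the minority class receives a weight of 5.0 versus about 0.56 for the majority class, counteracting the model's bias toward predicting non-hate.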
arXiv Detail & Related papers (2022-01-15T20:48:14Z)
- Identification of Twitter Bots based on an Explainable ML Framework: the US 2020 Elections Case Study [72.61531092316092]
This paper focuses on the design of a novel system for identifying Twitter bots based on labeled Twitter data.
A supervised machine learning (ML) framework is adopted, using an Extreme Gradient Boosting (XGBoost) algorithm.
Our study also deploys Shapley Additive Explanations (SHAP) for explaining the ML model predictions.
arXiv Detail & Related papers (2021-12-08T14:12:24Z)
- Automated Detection of Cyberbullying Against Women and Immigrants and Cross-domain Adaptability [2.294014185517203]
This paper focuses on advancing the technology using state-of-the-art NLP techniques.
We use a Twitter dataset from SemEval 2019 - Task 5 (HatEval) on hate speech against women and immigrants.
Our best performing ensemble model, based on DistilBERT, achieved F1 scores of 0.73 and 0.74 in the task of classifying hate speech.
arXiv Detail & Related papers (2020-12-04T13:12:31Z)
- Sampling Attacks: Amplification of Membership Inference Attacks by Repeated Queries [74.59376038272661]
We introduce the sampling attack, a novel membership inference technique that, unlike other standard membership adversaries, works under the severe restriction of having no access to the victim model's scores.
We show that a victim model that only publishes the labels is still susceptible to sampling attacks and the adversary can recover up to 100% of its performance.
For defense, we choose differential privacy in the form of gradient perturbation during the training of the victim model as well as output perturbation at prediction time.
arXiv Detail & Related papers (2020-09-01T12:54:54Z)
- Aggressive, Repetitive, Intentional, Visible, and Imbalanced: Refining Representations for Cyberbullying Classification [4.945634077636197]
We study the nuanced problem of cyberbullying using five explicit factors to represent its social and linguistic aspects.
These results demonstrate the importance of representing and modeling cyberbullying as a social phenomenon.
arXiv Detail & Related papers (2020-04-04T00:35:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.