Comparative Performance of Machine Learning Algorithms in Cyberbullying
Detection: Using Turkish Language Preprocessing Techniques
- URL: http://arxiv.org/abs/2101.12718v1
- Date: Fri, 29 Jan 2021 18:28:44 GMT
- Authors: Emre Cihan Ates, Erkan Bostanci, Mehmet Serdar Guzel
- Abstract summary: The aim of this study is to compare the performance of different machine learning algorithms in detecting Turkish messages containing cyberbullying.
It was determined that the Light Gradient Boosting Model (LGBM) algorithm showed the best performance with 90.788% accuracy and 90.949% F1 Score value.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the increasing use of the internet and social media,
cyberbullying has become a major problem. The most basic protection against
its dangerous consequences is to actively detect and control content that
contains cyberbullying. Given the volume shown in today's internet and social
media statistics, it is impossible to detect such content by human effort
alone. Effective cyberbullying detection methods are necessary to make social
media a safe communication space. Current research efforts focus on using
machine learning to detect and eliminate cyberbullying. Although most studies
on cyberbullying detection have been conducted on English texts, few address
Turkish, and those that do have used a limited range of methods and
algorithms. In addition, the algorithms used to classify texts containing
cyberbullying differ in scope and performance, which makes choosing an
appropriate algorithm important. The aim of this study is to compare the
performance of different machine learning algorithms in detecting Turkish
messages containing cyberbullying. Nineteen different classification
algorithms were used to identify texts containing cyberbullying using Turkish
natural language processing techniques. Precision, recall, accuracy and F1
score were used to evaluate the classifiers. The Light Gradient Boosting Model
(LGBM) algorithm showed the best performance, with 90.788% accuracy and a
90.949% F1 score.
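The four evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes binary labels where 1 marks a message containing cyberbullying and 0 marks a clean message; the helper names (`ConfusionCounts`, `count_outcomes`, `evaluate`) and the toy labels are hypothetical and are not taken from the paper, whose exact evaluation code is not given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # cyberbullying messages correctly flagged
    fp: int  # clean messages wrongly flagged
    fn: int  # cyberbullying messages missed
    tn: int  # clean messages correctly passed


def count_outcomes(y_true, y_pred, positive=1):
    """Tally the four confusion-matrix cells for a binary task."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return ConfusionCounts(tp, fp, fn, tn)


def evaluate(counts):
    """Return precision, recall, accuracy and F1; 0.0 when a ratio is undefined."""
    total = counts.tp + counts.fp + counts.fn + counts.tn
    flagged = counts.tp + counts.fp
    actual = counts.tp + counts.fn
    precision = counts.tp / flagged if flagged else 0.0
    recall = counts.tp / actual if actual else 0.0
    accuracy = (counts.tp + counts.tn) / total if total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}


if __name__ == "__main__":
    # Toy labels for illustration only (not data from the study).
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
    for name, value in evaluate(count_outcomes(y_true, y_pred)).items():
        print(f"{name}: {value:.3f}")
```

Running the same `evaluate` function over the predictions of each candidate classifier on a shared test set would give one row of scores per algorithm, which is the kind of side-by-side comparison the abstract describes.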
Related papers
- Deep Learning Approaches for Detecting Adversarial Cyberbullying and Hate Speech in Social Networks [0.0]
This paper focuses on detecting cyberbullying in adversarial attack content within social networking site text data, specifically emphasizing hate speech.
An LSTM model trained for a fixed 100 epochs achieved accuracy, precision, recall, F1-score, and AUC-ROC scores of 87.57%, 88.73%, 87.57%, 88.15%, and 91%, respectively.
arXiv Detail & Related papers (2024-05-30T21:44:15Z)
- Securing Social Spaces: Harnessing Deep Learning to Eradicate Cyberbullying [1.8749305679160366]
Cyberbullying is a serious problem that can harm the mental and physical health of people who use social media.
This paper explains how serious cyberbullying is and how it affects individuals exposed to it.
It stresses how important it is to find better ways to detect cyberbullying so that online spaces can be safer.
arXiv Detail & Related papers (2024-04-01T20:41:28Z)
- Explain Thyself Bully: Sentiment Aided Cyberbullying Detection with Explanation [52.3781496277104]
Cyberbullying has become a big issue with the popularity of different social media networks and online communication apps.
Recent legal provisions such as the "right to explanations" in the General Data Protection Regulation have spurred research into interpretable models.
We develop the first interpretable multi-task model, called mExCB, for automatic cyberbullying detection from code-mixed languages.
arXiv Detail & Related papers (2024-01-17T07:36:22Z)
- DEMASQ: Unmasking the ChatGPT Wordsmith [63.8746084667206]
We propose an effective ChatGPT detector named DEMASQ, which accurately identifies ChatGPT-generated content.
Our method addresses two critical factors: (i) the distinct biases in text composition observed in human- and machine-generated content and (ii) the alterations made by humans to evade previous detection methods.
arXiv Detail & Related papers (2023-11-08T21:13:05Z)
- Cyberbullying Detection for Low-resource Languages and Dialects: Review of the State of the Art [0.9831489366502298]
There are 23 low-resource languages and dialects covered by this paper, including Bangla, Hindi, Dravidian languages and others.
In the survey, we identify some of the research gaps of previous studies, which include the lack of reliable definitions of cyberbullying.
Based on those proposed suggestions, we collect and release a cyberbullying dataset in the Chittagonian dialect of Bangla.
arXiv Detail & Related papers (2023-08-30T03:52:28Z)
- Graph Mining for Cybersecurity: A Survey [61.505995908021525]
The explosive growth of cyber attacks such as malware, spam, and intrusions has had severe consequences for society.
Traditional Machine Learning (ML) based methods are extensively used in detecting cyber threats, but they hardly model the correlations between real-world cyber entities.
With the proliferation of graph mining techniques, many researchers have investigated these techniques for capturing correlations between cyber entities and achieving high performance.
arXiv Detail & Related papers (2023-04-02T08:43:03Z)
- Countering Malicious Content Moderation Evasion in Online Social Networks: Simulation and Detection of Word Camouflage [64.78260098263489]
Twisting and camouflaging keywords are among the most used techniques to evade platform content moderation systems.
This article contributes significantly to countering malicious information by developing multilingual tools to simulate and detect new methods of evasion of content.
arXiv Detail & Related papers (2022-12-27T16:08:49Z)
- In the Service of Online Order: Tackling Cyber-Bullying with Machine Learning and Affect Analysis [13.092135222168324]
PTA (Parent-Teacher Association) members have started Online Patrol to spot malicious content within Web forums and blogs.
In practice, Online Patrol requires reading through entire Web contents, a task that is difficult to perform manually.
We aim to develop a set of tools that can automatically detect malicious entries and report them to PTA members.
arXiv Detail & Related papers (2022-03-04T03:13:45Z)
- Deep Learning for Hate Speech Detection: A Comparative Study [54.42226495344908]
We present here a large-scale empirical comparison of deep and shallow hate-speech detection methods.
Our goal is to illuminate progress in the area, and identify strengths and weaknesses in the current state-of-the-art.
In doing so we aim to provide guidance as to the use of hate-speech detection in practice, quantify the state-of-the-art, and identify future research directions.
arXiv Detail & Related papers (2022-02-19T03:48:20Z)
- Enhancing the Identification of Cyberbullying through Participant Roles [1.399948157377307]
This paper proposes a novel approach to enhancing cyberbullying detection through role modeling.
We utilise a dataset from ASKfm to perform multi-class classification to detect participant roles.
arXiv Detail & Related papers (2020-10-13T19:13:07Z)
- TextHide: Tackling Data Privacy in Language Understanding Tasks [54.11691303032022]
TextHide mitigates privacy risks without slowing down training or reducing accuracy.
It requires all participants to add a simple encryption step to prevent an eavesdropping attacker from recovering private text data.
We evaluate TextHide on the GLUE benchmark, and our experiments show that TextHide can effectively defend against attacks on shared gradients or representations.
arXiv Detail & Related papers (2020-10-12T22:22:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.