Initial Study into Application of Feature Density and
Linguistically-backed Embedding to Improve Machine Learning-based
Cyberbullying Detection
- URL: http://arxiv.org/abs/2206.01889v1
- Date: Sat, 4 Jun 2022 03:17:15 GMT
- Title: Initial Study into Application of Feature Density and
Linguistically-backed Embedding to Improve Machine Learning-based
Cyberbullying Detection
- Authors: Juuso Eronen, Michal Ptaszynski, Fumito Masui, Gniewosz Leliwa, Michal
Wroczynski, Mateusz Piech and Aleksander Smywinski-Pohl
- Abstract summary: The research was conducted on a Formspring dataset provided in a Kaggle competition on automatic cyberbullying detection.
The study confirmed the effectiveness of Neural Networks in cyberbullying detection and the correlation between classifier performance and Feature Density.
- Score: 54.83707803301847
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this research, we study the change in the performance of machine learning
(ML) classifiers when various linguistic preprocessing methods of a dataset
were used, with the specific focus on linguistically-backed embeddings in
Convolutional Neural Networks (CNN). Moreover, we study the concept of Feature
Density and confirm its potential to comparatively predict the performance of
ML classifiers, including CNN. The research was conducted on a Formspring
dataset provided in a Kaggle competition on automatic cyberbullying detection.
The dataset was re-annotated by objective experts (psychologists), as the
importance of professional annotation in cyberbullying research has been
indicated multiple times. The study confirmed the effectiveness of Neural
Networks in cyberbullying detection and the correlation between classifier
performance and Feature Density, while also proposing a new approach to training
various linguistically-backed embeddings for Convolutional Neural Networks.
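Since Feature Density is the paper's central quantity, a minimal sketch of the idea may help. It assumes FD is the ratio of unique features (here, word n-grams) to all feature occurrences in a corpus; the exact formulation, tokenizer, and feature set used by the authors may differ.

```python
from collections import Counter

def ngrams(tokens, n):
    """Yield word n-grams from a token list."""
    return zip(*(tokens[i:] for i in range(n)))

def feature_density(corpus, n=1):
    """FD as unique features / total feature occurrences (illustrative)."""
    counts = Counter()
    for doc in corpus:
        tokens = doc.lower().split()  # naive whitespace tokenizer
        counts.update(ngrams(tokens, n))
    total = sum(counts.values())
    return len(counts) / total if total else 0.0

# Compare FD across feature sets of the same tiny corpus.
corpus = ["you are so dumb", "have a nice day", "you are so kind"]
print(feature_density(corpus, n=1), feature_density(corpus, n=2))
```

The premise is that FD, computed cheaply before any training, correlates with eventual classifier performance, so dataset variants can be compared without training a model on each.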
Related papers
- Unveiling the Power of Sparse Neural Networks for Feature Selection [60.50319755984697]
Sparse Neural Networks (SNNs) have emerged as powerful tools for efficient feature selection.
We show that feature selection with SNNs trained with dynamic sparse training (DST) algorithms can achieve, on average, more than 50% memory and 55% FLOPs reduction.
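A toy sketch of how feature importance might be read off a sparse network: plain magnitude pruning of a single linear layer stands in for a real dynamic sparse training schedule, and all data and names below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 100 samples, 20 features, only the first three informative.
X = rng.normal(size=(100, 20))
y = (X[:, 0] + X[:, 1] - X[:, 2] > 0).astype(float)

# A single linear layer fit by least squares stands in for a trained
# network; a DST algorithm would instead grow and prune connections
# during training while keeping the layer sparse throughout.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# Magnitude pruning: keep the k connections with the largest |w| and
# select the input features that retain any connection.
k = 3
selected = sorted(np.argsort(np.abs(w))[-k:].tolist())
print("selected features:", selected)  # likely [0, 1, 2]
```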
arXiv Detail & Related papers (2024-08-08T16:48:33Z)
- Exploring the Potential of Feature Density in Estimating Machine Learning Classifier Performance with Application to Cyberbullying Detection [2.4674086273775035]
We analyze the potential of Feature Density (FD) as a way to comparatively estimate machine learning (ML) classifier performance prior to training.
Our approach is to optimize the resource-intensive training of ML models for Natural Language Processing to reduce the number of required experiments.
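One way to read "reducing the number of required experiments": rank candidate preprocessing variants by FD and spend the training budget only on the most promising ones. A hypothetical sketch with a toy corpus; the variant names are ours.

```python
from collections import Counter

def fd(docs):
    # Unique-token / total-token ratio, as in the earlier FD sketch.
    counts = Counter(tok for d in docs for tok in d.split())
    return len(counts) / sum(counts.values())

# Assumed preprocessing variants of the same tiny corpus (illustrative).
variants = {
    "tokens": ["you are so dumb", "you are so kind"],
    "lemmas": ["you be so dumb", "you be so kind"],
    "tokens+POS": ["you_PRON are_VERB so_ADV dumb_ADJ",
                   "you_PRON are_VERB so_ADV kind_ADJ"],
}
scores = {name: fd(docs) for name, docs in variants.items()}

# Spend the training budget only on the top-ranked variants.
budget = 2
for name in sorted(scores, key=scores.get, reverse=True)[:budget]:
    print(f"would train a classifier on {name!r} (FD={scores[name]:.3f})")
```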
arXiv Detail & Related papers (2022-06-04T09:11:13Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- How Facial Features Convey Attention in Stationary Environments [0.0]
This paper aims to extend previous research on distraction detection by analyzing which visual features contribute most to predicting awareness and fatigue.
We utilized the open-source facial analysis toolkit OpenFace to analyze visual data of subjects at varying levels of attentiveness.
arXiv Detail & Related papers (2021-11-29T20:11:57Z)
- Improving Classifier Training Efficiency for Automatic Cyberbullying Detection with Feature Density [58.64907136562178]
We study the effectiveness of Feature Density (FD) using different linguistically-backed feature preprocessing methods.
We hypothesise that estimating dataset complexity allows for the reduction of the number of required experiments.
The difference in linguistic complexity of datasets allows us to additionally discuss the efficacy of linguistically-backed word preprocessing.
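A sketch of what linguistically-backed preprocessing variants might look like in practice, using spaCy as a stand-in for whatever tools the authors used; the variant names and model choice are ours.

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes this model is installed

def preprocess(text, mode="tokens"):
    """Produce one linguistically-backed variant of a text.

    Illustrative variants only; the paper's preprocessing set may differ.
    """
    doc = nlp(text)
    if mode == "tokens":
        return [t.lower_ for t in doc]
    if mode == "lemmas":
        return [t.lemma_.lower() for t in doc]
    if mode == "tokens+pos":
        return [f"{t.lower_}_{t.pos_}" for t in doc]
    raise ValueError(mode)

for mode in ("tokens", "lemmas", "tokens+pos"):
    print(mode, preprocess("The cats were chasing mice", mode))
```

Each variant yields a different feature inventory, and therefore a different FD, which is what makes the cross-variant comparison meaningful.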
arXiv Detail & Related papers (2021-11-02T15:48:28Z)
- Intraclass clustering: an implicit learning ability that regularizes DNNs [22.732204569029648]
We show that deep neural networks are regularized through their ability to extract meaningful clusters among the samples of a class.
Measures of intraclass clustering are designed based on the neuron- and layer-level representations of the training data.
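The paper defines its measures at the neuron and layer level; as a loose proxy only, one can ask how cleanly a layer's representations of a single class split into clusters, e.g. via a silhouette score over k-means assignments. Everything below is an illustrative stand-in, not the paper's metric.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Stand-in for one class's hidden-layer representations: two latent
# subgroups (think two breeds inside the class "dog").
reps = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(50, 16)),
    rng.normal(loc=3.0, scale=0.5, size=(50, 16)),
])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(reps)
print("intraclass clustering proxy:", silhouette_score(reps, labels))
```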
arXiv Detail & Related papers (2021-03-11T15:26:27Z)
- Context Decoupling Augmentation for Weakly Supervised Semantic Segmentation [53.49821324597837]
Weakly supervised semantic segmentation (WSSS) is a challenging problem that has been studied intensively in recent years.
We present a Context Decoupling Augmentation (CDA) method to change the inherent context in which the objects appear.
Extensive experiments on the PASCAL VOC 2012 dataset with several alternative network architectures demonstrate that CDA boosts various popular WSSS methods to a new state of the art by a large margin.
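The core idea of decoupling an object from its usual context can be sketched as a mask-based copy-paste onto an unrelated scene; this shows only the intuition, not the paper's CDA pipeline.

```python
import numpy as np

def decouple_context(obj_img, obj_mask, new_bg, y, x):
    """Paste a masked object onto an unrelated background.

    A toy version of context decoupling: the object's pixels are
    kept, everything else comes from the new scene.
    """
    out = new_bg.copy()
    h, w = obj_mask.shape
    region = out[y:y + h, x:x + w]   # view into the output image
    region[obj_mask] = obj_img[obj_mask]
    return out

# Toy 8x8 grayscale example with a 3x3 "object".
obj = np.full((3, 3), 255, dtype=np.uint8)
mask = np.ones((3, 3), dtype=bool)
bg = np.zeros((8, 8), dtype=np.uint8)
print(decouple_context(obj, mask, bg, y=2, x=2))
```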
arXiv Detail & Related papers (2021-03-02T15:05:09Z)
- Anomaly Detection on Attributed Networks via Contrastive Self-Supervised Learning [50.24174211654775]
We present a novel contrastive self-supervised learning framework for anomaly detection on attributed networks.
Our framework fully exploits the local information from network data by sampling a novel type of contrastive instance pair.
A graph neural network-based contrastive learning model is proposed to learn informative embeddings from high-dimensional attributes and local structure.
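A heavily simplified, non-learned sketch of the node-versus-local-neighborhood contrast: score each node by how poorly its attributes agree with its neighborhood average, so disagreement suggests anomaly. The actual framework learns this discrimination with a graph neural network; everything below is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ring graph with attributed nodes; node 3 gets anomalous attributes.
n = 6
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1
X = rng.normal(size=(n, 4))
X[3] += 10.0  # inject an attribute anomaly

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

# Contrast each node against its neighborhood mean; a learned GNN
# discriminator plays this role in the actual framework.
neigh_mean = A @ X / A.sum(axis=1, keepdims=True)
scores = np.array([1 - cosine(X[i], neigh_mean[i]) for i in range(n)])
print("anomaly scores:", scores.round(2))  # nodes around index 3 stand out
```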
arXiv Detail & Related papers (2021-02-27T03:17:20Z)
- An Evaluation of Recent Neural Sequence Tagging Models in Turkish Named Entity Recognition [5.161531917413708]
We propose a transformer-based network with a conditional random field layer that leads to the state-of-the-art result.
Our study contributes to the literature that quantifies the impact of transfer learning on processing morphologically rich languages.
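A compact sketch of the architecture the entry names: encoder emissions feeding a conditional random field layer for sequence tagging. It uses the third-party pytorch-crf package (assumed installed) and random tensors in place of real Turkish NER data.

```python
import torch
import torch.nn as nn
from torchcrf import CRF  # pip install pytorch-crf (assumed)

num_tags, d_model, seq_len, batch = 5, 64, 12, 2

# Stand-in encoder; the paper uses a pretrained transformer instead.
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)
to_emissions = nn.Linear(d_model, num_tags)
crf = CRF(num_tags, batch_first=True)

x = torch.randn(batch, seq_len, d_model)          # embedded tokens
tags = torch.randint(num_tags, (batch, seq_len))  # gold tag ids

emissions = to_emissions(encoder(x))
loss = -crf(emissions, tags)        # negative log-likelihood
best_paths = crf.decode(emissions)  # Viterbi decoding
print(loss.item(), best_paths[0])
```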
arXiv Detail & Related papers (2020-05-14T06:54:07Z)