A Trustable LSTM-Autoencoder Network for Cyberbullying Detection on
Social Media Using Synthetic Data
- URL: http://arxiv.org/abs/2308.09722v1
- Date: Tue, 15 Aug 2023 17:20:05 GMT
- Authors: Mst Shapna Akter, Hossain Shahriar, Alfredo Cuzzocrea
- Abstract summary: This paper proposes a trustable LSTM-Autoencoder Network for cyberbullying detection on social media.
We have demonstrated a cutting-edge method to address data availability difficulties by producing machine-translated data.
We carried out experimental identification of aggressive comments on Hindi, Bangla, and English datasets.
- Score: 2.378735224874938
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Social media cyberbullying has a detrimental effect on human life. As online
social networking grows daily, the amount of hate speech also increases. Such
terrible content can cause depression and actions related to suicide. This
paper proposes a trustable LSTM-Autoencoder Network for cyberbullying detection
on social media using synthetic data. We have demonstrated a cutting-edge
method to address data availability difficulties by producing
machine-translated data. However, several languages such as Hindi and Bangla
still lack adequate investigations due to a lack of datasets. We carried out
experimental identification of aggressive comments on Hindi, Bangla, and
English datasets using the proposed model and traditional models, including
Long Short-Term Memory (LSTM), Bidirectional Long Short-Term Memory (BiLSTM),
LSTM-Autoencoder, Word2vec, Bidirectional Encoder Representations from
Transformers (BERT), and Generative Pre-trained Transformer 2 (GPT-2) models.
We employed evaluation metrics such as f1-score, accuracy, precision, and
recall to assess the models' performance. Our proposed model outperformed all
the other models on all datasets, achieving the highest accuracy of 95%. Our model
achieves state-of-the-art results among all previous work on the dataset
we used in this paper.
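The metrics listed above can be sketched in plain Python. This is a minimal illustration of how accuracy, precision, recall, and F1-score are computed for binary labels; it is not the paper's code, and the toy labels are invented for the example.

```python
# Sketch of the evaluation metrics named in the abstract (accuracy,
# precision, recall, F1) for binary classification. The labels below
# are invented for illustration, not taken from the paper's datasets.

def binary_metrics(y_true, y_pred):
    # Count the four confusion-matrix cells.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# 1 = aggressive/bullying comment, 0 = benign (toy labels)
m = binary_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 0, 1])
```

In practice these values would typically come from a library such as scikit-learn, but the hand computation makes the definitions explicit.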
Related papers
- Semantic and Contextual Modeling for Malicious Comment Detection with BERT-BiLSTM [1.5845117761091052]
We propose a deep learning model that combines BERT and BiLSTM.
The BERT model, through pre-training, captures deep semantic features of text, while the BiLSTM network excels at processing sequential data.
Experimental results on the Jigsaw Unintended Bias in Toxicity Classification dataset demonstrate that the BERT+BiLSTM model achieves superior performance in malicious comment detection tasks.
arXiv Detail & Related papers (2025-03-14T04:51:36Z) - Towards Characterizing Cyber Networks with Large Language Models [0.0]
We employ latent features of cyber data to find anomalies via a prototype tool called Cyber Log Embeddings Model (CLEM)
CLEM was trained on Zeek network traffic logs from both a real-world production network and an Internet of Things (IoT) cybersecurity testbed.
arXiv Detail & Related papers (2024-11-11T16:09:13Z) - SimpleSpeech 2: Towards Simple and Efficient Text-to-Speech with Flow-based Scalar Latent Transformer Diffusion Models [64.40250409933752]
We build upon our previous publication by implementing a simple and efficient non-autoregressive (NAR) TTS framework, termed SimpleSpeech 2.
SimpleSpeech 2 effectively combines the strengths of both autoregressive (AR) and non-autoregressive (NAR) methods.
We show a significant improvement in generation performance and generation speed compared to our previous work and other state-of-the-art (SOTA) large-scale TTS models.
arXiv Detail & Related papers (2024-08-25T17:07:39Z) - Beyond Model Collapse: Scaling Up with Synthesized Data Requires Verification [11.6055501181235]
We investigate the use of verification on synthesized data to prevent model collapse.
We show that verifiers, even imperfect ones, can indeed be harnessed to prevent model collapse.
arXiv Detail & Related papers (2024-06-11T17:46:16Z) - Deep Learning-Based Cyber-Attack Detection Model for Smart Grids [6.642400003243118]
A novel artificial intelligence-based cyber-attack detection model is developed to stop data integrity cyber-attacks (DIAs) on load data received by supervisory control and data acquisition (SCADA) systems.
In the proposed model, the load data is first forecasted using a regression model; after a processing stage, the processed data is clustered using an unsupervised learning method.
The proposed EE-BiLSTM method performs more robustly and accurately than the other two methods.
arXiv Detail & Related papers (2023-12-14T10:54:04Z) - Simultaneous Machine Translation with Large Language Models [51.470478122113356]
We investigate the possibility of applying Large Language Models to SimulMT tasks.
We conducted experiments using the Llama2-7b-chat model on nine different languages from the MuST-C dataset.
The results show that the LLM outperforms dedicated MT models in terms of BLEU and LAAL metrics.
arXiv Detail & Related papers (2023-09-13T04:06:47Z) - The Curse of Recursion: Training on Generated Data Makes Models Forget [70.02793975243212]
Large language models (LLMs) are here to stay, and will bring about drastic change in the whole ecosystem of online text and images.
We find that use of model-generated content in training causes irreversible defects in the resulting models, where tails of the original content distribution disappear.
arXiv Detail & Related papers (2023-05-27T15:10:41Z) - Deep Learning Approach for Classifying the Aggressive Comments on Social
Media: Machine Translated Data Vs Real Life Data [15.813222387547357]
This paper particularly worked on the Hindi, Bangla, and English datasets to detect aggressive comments.
A fully machine-translated English dataset has been analyzed with models such as Long Short-Term Memory (LSTM), Bidirectional Long Short-Term Memory (BiLSTM), word2vec, Bidirectional Encoder Representations from Transformers (BERT), and Generative Pre-trained Transformer 2 (GPT-2).
We have compared the performance on this noisy data with two further datasets: raw data, which contains no noise, and semi-noisy data, which contains a certain amount of noise.
arXiv Detail & Related papers (2023-03-13T21:43:08Z) - Improving Classifier Training Efficiency for Automatic Cyberbullying
Detection with Feature Density [58.64907136562178]
We study the effectiveness of Feature Density (FD) using different linguistically-backed feature preprocessing methods.
We hypothesise that estimating dataset complexity allows for the reduction of the number of required experiments.
The difference in linguistic complexity of datasets allows us to additionally discuss the efficacy of linguistically-backed word preprocessing.
arXiv Detail & Related papers (2021-11-02T15:48:28Z) - Offensive Language and Hate Speech Detection with Deep Learning and
Transfer Learning [1.77356577919977]
We propose an approach to automatically classify tweets into three classes: Hate, Offensive, and Neither.
We create a class module which contains main functionality including text classification, sentiment checking and text data augmentation.
arXiv Detail & Related papers (2021-08-06T20:59:47Z) - Decoupling Pronunciation and Language for End-to-end Code-switching
Automatic Speech Recognition [66.47000813920617]
We propose a decoupled transformer model to use monolingual paired data and unpaired text data.
The model is decoupled into two parts: audio-to-phoneme (A2P) network and phoneme-to-text (P2T) network.
By using monolingual data and unpaired text data, the decoupled transformer model reduces the high dependency on code-switching paired training data of E2E model.
arXiv Detail & Related papers (2020-10-28T07:46:15Z) - Depth-Adaptive Graph Recurrent Network for Text Classification [71.20237659479703]
Sentence-State LSTM (S-LSTM) is a powerful and highly efficient graph recurrent network.
We propose a depth-adaptive mechanism for the S-LSTM, which allows the model to learn how many computational steps to conduct for different words as required.
arXiv Detail & Related papers (2020-02-29T03:09:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.