A Trustable LSTM-Autoencoder Network for Cyberbullying Detection on
Social Media Using Synthetic Data
- URL: http://arxiv.org/abs/2308.09722v1
- Date: Tue, 15 Aug 2023 17:20:05 GMT
- Title: A Trustable LSTM-Autoencoder Network for Cyberbullying Detection on
Social Media Using Synthetic Data
- Authors: Mst Shapna Akter, Hossain Shahriar, Alfredo Cuzzocrea
- Abstract summary: This paper proposes a trustable LSTM-Autoencoder Network for cyberbullying detection on social media.
We have demonstrated a cutting-edge method to address data availability difficulties by producing machine-translated data.
We carried out experimental identification of aggressive comments on Hindi, Bangla, and English datasets.
- Score: 2.378735224874938
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Social media cyberbullying has a detrimental effect on human life. As online
social networking grows daily, the amount of hate speech also increases. Such
harmful content can cause depression and suicide-related behavior. This
paper proposes a trustable LSTM-Autoencoder Network for cyberbullying detection
on social media using synthetic data. Several languages, such as Hindi and
Bangla, still lack adequate investigation due to a shortage of datasets; we
demonstrate a cutting-edge method that addresses this data-availability
difficulty by producing machine-translated data. We carried out
experimental identification of aggressive comments on Hindi, Bangla, and
English datasets using the proposed model and traditional models, including
Long Short-Term Memory (LSTM), Bidirectional Long Short-Term Memory (BiLSTM),
LSTM-Autoencoder, Word2vec, Bidirectional Encoder Representations from
Transformers (BERT), and Generative Pre-trained Transformer 2 (GPT-2) models.
We employed evaluation metrics such as f1-score, accuracy, precision, and
recall to assess the models' performance. Our proposed model outperformed all
the models on all datasets, achieving the highest accuracy of 95%. Our model
achieves state-of-the-art results compared with all previous work on the
dataset used in this paper.
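The evaluation metrics named in the abstract (accuracy, precision, recall, and f1-score) all follow from a binary confusion matrix. A minimal sketch, not taken from the paper, with illustrative labels for an aggressive/benign comment classifier:

```python
# Sketch: computing accuracy, precision, recall, and F1 for a binary
# classifier (1 = aggressive/bullying, 0 = benign). Illustrative only.
def classification_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall, f1) for binary label lists."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

# Example: 8 comments, gold labels vs. model predictions (made-up data).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
acc, prec, rec, f1 = classification_metrics(y_true, y_pred)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} f1={f1:.3f}")
# → accuracy=0.750 precision=0.750 recall=0.750 f1=0.750
```

In practice a library implementation (e.g. scikit-learn's metrics module) would be used; the sketch just makes the definitions behind the paper's reported 95% accuracy explicit.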
Related papers
- Camouflage is all you need: Evaluating and Enhancing Language Model
Robustness Against Camouflage Adversarial Attacks [53.87300498478744]
Adversarial attacks represent a substantial challenge in Natural Language Processing (NLP).
This study undertakes a systematic exploration of this challenge in two distinct phases: vulnerability evaluation and resilience enhancement.
Results suggest a trade-off between performance and robustness, with some models maintaining similar performance while gaining robustness.
arXiv Detail & Related papers (2024-02-15T10:58:22Z)
- Deep Learning-Based Cyber-Attack Detection Model for Smart Grids [6.642400003243118]
A novel artificial intelligence-based cyber-attack detection model is developed to stop data integrity attacks (DIAs) on load data received by supervisory control and data acquisition (SCADA) systems.
In the proposed model, the load data is first forecasted using a regression model; after a processing stage, the processed data is clustered using an unsupervised learning method.
The proposed EE-BiLSTM method performs more robustly and accurately than the other two methods.
arXiv Detail & Related papers (2023-12-14T10:54:04Z)
- A Text-to-Text Model for Multilingual Offensive Language Identification [19.23565690468299]
This study presents the first pre-trained model with an encoder-decoder architecture for offensive language identification using text-to-text transformers (T5).
Our pre-trained T5 model outperforms other transformer-based models fine-tuned for offensive language detection, such as fBERT and HateBERT, in multiple English benchmarks.
Following a similar approach, we also train the first multilingual pre-trained model for offensive language identification using mT5.
arXiv Detail & Related papers (2023-12-06T09:37:27Z)
- Simultaneous Machine Translation with Large Language Models [51.470478122113356]
We investigate the possibility of applying Large Language Models (LLMs) to SimulMT tasks.
We conducted experiments using the Llama2-7b-chat model on nine different languages from the MuST-C dataset.
The results show that the LLM outperforms dedicated MT models in terms of the BLEU and LAAL metrics.
arXiv Detail & Related papers (2023-09-13T04:06:47Z)
- The Curse of Recursion: Training on Generated Data Makes Models Forget [70.02793975243212]
Large language models (LLMs) are here to stay, and will bring about drastic change in the whole ecosystem of online text and images.
We find that use of model-generated content in training causes irreversible defects in the resulting models, where tails of the original content distribution disappear.
arXiv Detail & Related papers (2023-05-27T15:10:41Z)
- Deep Learning Approach for Classifying the Aggressive Comments on Social Media: Machine Translated Data Vs Real Life Data [15.813222387547357]
This paper works specifically on the Hindi, Bangla, and English datasets to detect aggressive comments.
A fully machine-translated English dataset has been analyzed with models such as Long Short-Term Memory (LSTM), Bidirectional Long Short-Term Memory (BiLSTM), word2vec, Bidirectional Encoder Representations from Transformers (BERT), and Generative Pre-trained Transformer 2 (GPT-2).
We compared the performance on the noisy data with two more datasets: raw data, which contains no noise, and semi-noisy data, which contains a certain amount of noise.
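Machine-translated "noisy" data of this kind is commonly produced by round-trip translation through a pivot language. A minimal sketch of that idea, where `translate()` is a hypothetical stand-in (a real pipeline would call an actual machine-translation system):

```python
# Sketch only: round-trip machine translation to generate noisy synthetic
# training data. translate() is a hypothetical placeholder, NOT a real MT API;
# here it just tags the text so the round trip is visible.
def translate(text, src, dst):
    # A real implementation would call a machine-translation service here.
    return f"[{src}->{dst}] {text}"

def round_trip(text, pivot="hi", source="en"):
    """Translate source -> pivot -> source to create a noisy paraphrase."""
    intermediate = translate(text, source, pivot)
    return translate(intermediate, pivot, source)

print(round_trip("this is an example comment"))
# → [hi->en] [en->hi] this is an example comment
```

With a real MT backend, the round trip introduces translation noise while preserving the label, which is what makes the noisy-vs-raw-vs-semi-noisy comparison possible.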
arXiv Detail & Related papers (2023-03-13T21:43:08Z)
- Improving Classifier Training Efficiency for Automatic Cyberbullying Detection with Feature Density [58.64907136562178]
We study the effectiveness of Feature Density (FD) using different linguistically-backed feature preprocessing methods.
We hypothesise that estimating dataset complexity allows for the reduction of the number of required experiments.
The difference in linguistic complexity of datasets allows us to additionally discuss the efficacy of linguistically-backed word preprocessing.
arXiv Detail & Related papers (2021-11-02T15:48:28Z)
- Offensive Language and Hate Speech Detection with Deep Learning and Transfer Learning [1.77356577919977]
We propose an approach to automatically classify tweets into three classes: Hate, Offensive, and Neither.
We create a class module that contains the main functionality, including text classification, sentiment checking, and text data augmentation.
arXiv Detail & Related papers (2021-08-06T20:59:47Z)
- TextGNN: Improving Text Encoder via Graph Neural Network in Sponsored Search [11.203006652211075]
We propose a TextGNN model that naturally extends the strong twin-tower structured encoders with complementary graph information from user historical behaviors.
In offline experiments, the model achieves a 0.14% overall increase in ROC-AUC with a 1% increased accuracy for long-tail low-frequency Ads.
In online A/B testing, the model shows a 2.03% increase in Revenue Per Mille with a 2.32% decrease in Ad defect rate.
arXiv Detail & Related papers (2021-01-15T23:12:47Z)
- Decoupling Pronunciation and Language for End-to-end Code-switching Automatic Speech Recognition [66.47000813920617]
We propose a decoupled transformer model to use monolingual paired data and unpaired text data.
The model is decoupled into two parts: audio-to-phoneme (A2P) network and phoneme-to-text (P2T) network.
By using monolingual data and unpaired text data, the decoupled transformer model reduces the E2E model's high dependency on code-switching paired training data.
arXiv Detail & Related papers (2020-10-28T07:46:15Z)
- Depth-Adaptive Graph Recurrent Network for Text Classification [71.20237659479703]
Sentence-State LSTM (S-LSTM) is a powerful and highly efficient graph recurrent network.
We propose a depth-adaptive mechanism for the S-LSTM, which allows the model to learn how many computational steps to conduct for different words as required.
arXiv Detail & Related papers (2020-02-29T03:09:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.