Comparative Analysis of Machine Learning and Deep Learning Algorithms for Detection of Online Hate Speech
- URL: http://arxiv.org/abs/2108.01063v1
- Date: Fri, 23 Apr 2021 04:19:15 GMT
- Title: Comparative Analysis of Machine Learning and Deep Learning Algorithms for Detection of Online Hate Speech
- Authors: Tashvik Dhamija, Anjum, Rahul Katarya
- Abstract summary: Several attempts have been made to classify hate speech using machine learning, but the state-of-the-art models are not robust enough for practical applications.
In this paper, we explored various feature engineering techniques ranging from different embeddings to conventional NLP algorithms.
We conclude that BERT-based embeddings give the most useful features for this problem and have the potential to be developed into a practical, robust model.
- Score: 5.543220407902113
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the age of social media, users are increasingly exposed to online hate speech. Several attempts have been made to classify hate speech using machine learning, but the state-of-the-art models are not robust enough for practical applications, which is attributed to the use of primitive NLP feature engineering techniques. In this paper, we explored various feature engineering techniques, ranging from different embeddings to conventional NLP algorithms, and also experimented with combinations of different features. From our experimentation, we found that RoBERTa (Robustly Optimized BERT Approach) sentence embeddings classified using decision trees give the best result, an F1 score of 0.9998. We conclude that BERT-based embeddings provide the most useful features for this problem and have the potential to be developed into a practical, robust model.
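The paper does not ship code, but the winning pipeline, RoBERTa sentence embeddings classified by a decision tree, can be sketched briefly. In the sketch below, the checkpoint name ("roberta-base"), the mean-pooling step, and the toy texts and labels are illustrative assumptions rather than details reported in the paper:

```python
# Minimal sketch of the reported pipeline: RoBERTa sentence embeddings
# classified with a decision tree. Checkpoint, pooling, and data are
# placeholder assumptions, not the paper's exact setup.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import f1_score

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
encoder = AutoModel.from_pretrained("roberta-base")

def embed(texts):
    """Mean-pool the final hidden states into one vector per sentence."""
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**enc).last_hidden_state       # (B, T, H)
    mask = enc["attention_mask"].unsqueeze(-1).float()  # (B, T, 1)
    return ((hidden * mask).sum(1) / mask.sum(1)).numpy()

texts = ["I hate you", "have a lovely day",
         "you people disgust me", "what a kind gesture"]  # toy data
labels = [1, 0, 1, 0]                                     # 1 = hate speech

X_train, X_test, y_train, y_test = train_test_split(
    embed(texts), labels, test_size=0.5, random_state=0)
clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("F1:", f1_score(y_test, clf.predict(X_test)))
```

A real reproduction would fit the tree on a labelled hate-speech corpus and tune the tree hyperparameters; the 0.9998 F1 is the paper's reported figure, not something this toy sketch would achieve.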
Related papers
- Optimized Feature Generation for Tabular Data via LLMs with Decision Tree Reasoning [53.241569810013836]
We propose a new framework based on large language models (LLMs) and decision tree reasoning (OCTree).
Our key idea is to leverage LLMs' reasoning capabilities to find good feature generation rules without manually specifying the search space.
Our empirical results demonstrate that this simple framework consistently enhances the performance of various prediction models.
arXiv Detail & Related papers (2024-06-12T08:31:34Z)
- Comparative Analysis of Libraries for the Sentimental Analysis [0.0]
The main goal of this study is to provide a comparative analysis of libraries using machine learning methods.
Five Python and R libraries, NLTK, TextBlob, VADER, Transformers (pretrained GPT and BERT), and Tidytext, will be used in the study to apply sentiment analysis techniques.
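As a rough illustration of what such a library comparison involves, the snippet below scores one invented sentence with two of the listed Python libraries, VADER (via NLTK) and TextBlob:

```python
# Sketch: comparing sentiment scores from two of the libraries named
# above. Requires `pip install nltk textblob`; the sentence is invented.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer
from textblob import TextBlob

nltk.download("vader_lexicon", quiet=True)  # VADER's rule lexicon

text = "The service was slow, but the food was absolutely wonderful."

vader_scores = SentimentIntensityAnalyzer().polarity_scores(text)
blob_sentiment = TextBlob(text).sentiment

print("VADER compound:", vader_scores["compound"])    # lexicon/rule-based
print("TextBlob polarity:", blob_sentiment.polarity)  # pattern-based
```

A full comparison, as the study describes, would run each library over a labelled corpus and compare accuracy rather than inspecting single sentences.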
arXiv Detail & Related papers (2023-07-26T17:21:53Z)
- Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural network (NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics, based on minimizing the population loss, that are more suitable for active learning than the one used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z)
- A Context-Sensitive Word Embedding Approach for The Detection of Troll Tweets [0.0]
We develop and evaluate a set of model architectures for the automatic detection of troll tweets.
The BERT and ELMo embedding methods performed better than the GloVe method.
CNN and GRU encoders performed similarly in terms of F1 score and AUC.
The best-performing method was found to be an ELMo-based architecture that employed a GRU classifier, with an AUC score of 0.929.
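As a rough sketch of the GRU-classifier side of that architecture, the minimal Keras model below stands in for the paper's setup; a trainable embedding layer replaces the ELMo embeddings, and the vocabulary, sequence length, and layer sizes are arbitrary assumptions:

```python
# Sketch: GRU-based binary classifier of the kind described above.
# A trainable Embedding layer stands in for ELMo; sizes are arbitrary.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(50,), dtype="int32"),       # padded token ids
    tf.keras.layers.Embedding(input_dim=20000, output_dim=128),
    tf.keras.layers.GRU(64),                          # sequence encoder
    tf.keras.layers.Dense(1, activation="sigmoid"),   # troll / not troll
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC(name="auc")])
model.summary()
```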
arXiv Detail & Related papers (2022-07-17T17:12:16Z)
- MACE: An Efficient Model-Agnostic Framework for Counterfactual Explanation [132.77005365032468]
We propose a novel framework of Model-Agnostic Counterfactual Explanation (MACE).
In our MACE approach, we propose a novel RL-based method for finding good counterfactual examples and a gradient-less descent method for improving proximity.
Experiments on public datasets validate its effectiveness, showing better validity, sparsity, and proximity.
arXiv Detail & Related papers (2022-05-31T04:57:06Z)
- Comparison Analysis of Traditional Machine Learning and Deep Learning Techniques for Data and Image Classification [62.997667081978825]
The purpose of the study is to analyse and compare the most common machine learning and deep learning techniques used for computer vision 2D object classification tasks.
Firstly, we will present the theoretical background of the Bag of Visual Words model and Deep Convolutional Neural Networks (DCNN).
Secondly, we will implement a Bag of Visual Words model and the VGG16 CNN architecture.
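As a hedged sketch of the deep-learning side of that comparison, the snippet below loads a pretrained VGG16 as a fixed feature extractor; the random input image is a placeholder:

```python
# Sketch: pretrained VGG16 as a feature extractor, the DCNN side of the
# Bag-of-Visual-Words vs. deep-learning comparison described above.
import numpy as np
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input

# include_top=False drops the classifier head, leaving conv features;
# pooling="avg" collapses them into one 512-dim vector per image.
backbone = VGG16(weights="imagenet", include_top=False, pooling="avg")

image = np.random.rand(1, 224, 224, 3) * 255.0  # placeholder image
features = backbone.predict(preprocess_input(image))
print(features.shape)  # (1, 512)
```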
arXiv Detail & Related papers (2022-04-11T11:34:43Z)
- Nearest neighbour approaches for Emotion Detection in Tweets [1.7581155313656314]
We propose an approach using weighted $k$-Nearest Neighbours (kNN), a simple, easy-to-implement, and explainable machine learning model.
In particular, we apply the weighted kNN model to the shared emotion detection task in tweets from SemEval-2018.
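A weighted kNN text classifier of this kind is a few lines in scikit-learn; the sketch below runs one over TF-IDF features on invented tweets (the SemEval-2018 data itself is not bundled here):

```python
# Sketch: weighted k-nearest-neighbours emotion classifier as described
# above. Tweets and labels are invented placeholders, not SemEval data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

tweets = ["so happy today!", "this is infuriating",
          "what a lovely surprise", "I am furious right now"]
labels = ["joy", "anger", "joy", "anger"]

# weights="distance" makes nearer neighbours count more: weighted kNN.
model = make_pipeline(
    TfidfVectorizer(),
    KNeighborsClassifier(n_neighbors=3, weights="distance"))
model.fit(tweets, labels)
print(model.predict(["feeling really happy"]))
```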
arXiv Detail & Related papers (2021-07-08T13:00:06Z)
- Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z)
- ALT-MAS: A Data-Efficient Framework for Active Testing of Machine Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model under test using a Bayesian neural network (BNN).
arXiv Detail & Related papers (2021-04-11T12:14:04Z)
- Comparing BERT against traditional machine learning text classification [0.0]
The BERT model has emerged as a popular state-of-the-art machine learning model in recent years.
The purpose of this work is to add empirical evidence supporting or refuting the use of BERT as a default on NLP tasks.
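The traditional baseline in such a comparison is typically a bag-of-words linear model; below is a minimal sketch with toy data and arbitrary hyperparameters (a fine-tuned BERT would then be evaluated on the same split):

```python
# Sketch: the "traditional machine learning" side of the comparison
# above, TF-IDF features with logistic regression. Data is a toy
# placeholder.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["great movie", "terrible plot", "loved it", "waste of time"]
labels = [1, 0, 1, 0]

baseline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression())
baseline.fit(texts, labels)
print(baseline.predict(["really great plot"]))
```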
arXiv Detail & Related papers (2020-05-26T20:14:39Z)
- Leveraging End-to-End Speech Recognition with Neural Architecture Search [0.0]
We show that a large improvement in the accuracy of deep speech models can be achieved with effective Neural Architecture Optimization.
Our method achieves a test error of 7% Word Error Rate (WER) on the LibriSpeech corpus and 13% Phone Error Rate (PER) on the TIMIT corpus, on par with state-of-the-art results.
arXiv Detail & Related papers (2019-12-11T08:15:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.