HateGPT: Unleashing GPT-3.5 Turbo to Combat Hate Speech on X
- URL: http://arxiv.org/abs/2411.09214v1
- Date: Thu, 14 Nov 2024 06:20:21 GMT
- Title: HateGPT: Unleashing GPT-3.5 Turbo to Combat Hate Speech on X
- Authors: Aniket Deroy, Subhankar Maity
- Abstract summary: We evaluate the performance of a classification model using Macro-F1 scores across three distinct runs.
The results suggest that the model consistently performs well in terms of precision and recall, with run 1 showing the highest performance.
- Score: 0.0
- Abstract: The widespread use of social media platforms like Twitter and Facebook has enabled people of all ages to share their thoughts and experiences, leading to an immense accumulation of user-generated content. However, alongside the benefits, these platforms also face the challenge of managing hate speech and offensive content, which can undermine rational discourse and threaten democratic values. As a result, there is a growing need for automated methods to detect and mitigate such content, especially given the complexity of conversations that may require contextual analysis across multiple languages, including code-mixed languages like Hinglish, German-English, and Bangla. We participated in the English task, which requires classifying English tweets into two categories: Hate and Offensive, and Non Hate-Offensive. In this work, we experiment with state-of-the-art large language models like GPT-3.5 Turbo via prompting to classify tweets into these two categories. We evaluate the performance of the classification model using Macro-F1 scores across three distinct runs. The Macro-F1 score, which averages the per-class F1 score (the harmonic mean of precision and recall) over all classes, is used as the primary metric for model evaluation. The scores obtained are 0.756 for run 1, 0.751 for run 2, and 0.754 for run 3, indicating a high level of performance with minimal variance among the runs. The results suggest that the model consistently performs well in terms of precision and recall, with run 1 showing the highest performance. These findings highlight the robustness and reliability of the model across different runs.
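The Macro-F1 metric described in the abstract can be illustrated with a minimal sketch. The label names `HOF` and `NOT` and the toy predictions below are assumptions made for illustration only; they are not data from the paper. The three run scores are the ones reported in the abstract, and the sketch only summarizes them (mean and range), it does not reproduce them.

```python
from statistics import mean


def macro_f1(y_true, y_pred, labels):
    """Return the unweighted mean of the per-class F1 scores."""
    per_class = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp + fp + fn
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Hypothetical short labels: HOF = Hate and Offensive, NOT = Non Hate-Offensive.
LABELS = ["HOF", "NOT"]

# Toy gold labels and predictions, invented for this example.
y_true = ["HOF", "HOF", "NOT", "NOT", "NOT", "HOF"]
y_pred = ["HOF", "NOT", "NOT", "NOT", "HOF", "HOF"]
toy_score = macro_f1(y_true, y_pred, LABELS)

# Macro-F1 values reported in the abstract for runs 1, 2, and 3.
run_scores = [0.756, 0.751, 0.754]
run_mean = mean(run_scores)
run_spread = max(run_scores) - min(run_scores)

print(f"toy Macro-F1: {toy_score:.3f}")
print(f"reported runs: mean={run_mean:.4f}, range={run_spread:.3f}")
```

Because Macro-F1 weights both classes equally, a model that ignores the minority class is penalized even when its overall accuracy is high, which is why the metric suits imbalanced hate speech data.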
Related papers
- Hate Speech and Offensive Content Detection in Indo-Aryan Languages: A Battle of LSTM and Transformers [0.0]
We conduct a comparative analysis of hate speech classification across five distinct languages: Bengali, Assamese, Bodo, Sinhala, and Gujarati.
Bert Base Multilingual Cased emerges as a strong performer across languages, achieving an F1 score of 0.67027 for Bengali and 0.70525 for Assamese.
In Sinhala, XLM-R stands out with an F1 score of 0.83493, whereas for Gujarati, a custom LSTM-based model outshined with an F1 score of 0.76601.
arXiv Detail & Related papers (2023-12-09T20:24:00Z)
- Making Large Language Models Better Reasoners with Step-Aware Verifier [49.16750018427259]
DIVERSE (Diverse Verifier on Reasoning Step) is a novel approach that further enhances the reasoning capability of language models.
We evaluate DIVERSE on the latest language model code-davinci and show that it achieves new state-of-the-art results on six of eight reasoning benchmarks.
arXiv Detail & Related papers (2022-06-06T03:38:36Z)
- Multilingual Hate Speech and Offensive Content Detection using Modified Cross-entropy Loss [0.0]
Large language models are trained on a lot of data and they also make use of contextual embeddings.
The data is also quite unbalanced; so we used a modified cross-entropy loss to tackle the issue.
Our team (HNLP) achieved the macro F1-scores of 0.808, 0.639 in English Subtask A and English Subtask B respectively.
arXiv Detail & Related papers (2022-02-05T20:31:40Z)
- Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages.
We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply it to the target language.
We investigate the issue of label imbalance of hate speech datasets, since the high ratio of non-hate examples compared to hate examples often leads to low model performance.
arXiv Detail & Related papers (2022-01-15T20:48:14Z)
- Few-shot Learning with Multilingual Language Models [66.49496434282564]
We train multilingual autoregressive language models on a balanced corpus covering a diverse set of languages.
Our largest model sets new state of the art in few-shot learning in more than 20 representative languages.
We present a detailed analysis of where the model succeeds and fails, showing in particular that it enables cross-lingual in-context learning.
arXiv Detail & Related papers (2021-12-20T16:52:35Z)
- Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models [86.02610674750345]
Adversarial GLUE (AdvGLUE) is a new multi-task benchmark to explore and evaluate the vulnerabilities of modern large-scale language models under various types of adversarial attacks.
We apply 14 adversarial attack methods to GLUE tasks to construct AdvGLUE, which is further validated by humans for reliable annotations.
All the language models and robust training methods we tested perform poorly on AdvGLUE, with scores lagging far behind the benign accuracy.
arXiv Detail & Related papers (2021-11-04T12:59:55Z)
- One to rule them all: Towards Joint Indic Language Hate Speech Detection [7.296361860015606]
We present a multilingual architecture using state-of-the-art transformer language models to jointly learn hate and offensive speech detection.
On the provided testing corpora, we achieve Macro F1 scores of 0.7996, 0.7748, 0.8651 for sub-task 1A and 0.6268, 0.5603 during the fine-grained classification of sub-task 1B.
arXiv Detail & Related papers (2021-09-28T13:30:00Z)
- Detecting Abusive Albanian [5.092028049119383]
Shaj is an annotated dataset for hate speech and offensive speech constructed from user-text content on various social media platforms.
The dataset is tested using three different classification models, the best of which achieves an F1 score of 0.77 for the identification of offensive language.
arXiv Detail & Related papers (2021-07-28T18:47:32Z)
- An Online Multilingual Hate speech Recognition System [13.87667165678441]
We analyse six datasets by combining them into a single homogeneous dataset and classify the texts into three classes: abusive, hateful, or neither.
We create a tool which identifies and scores a page with an effective metric in near-real time and uses the result as feedback to re-train our model.
We prove the competitive performance of our multilingual model on two languages, English and Hindi, leading to comparable or superior performance to most monolingual models.
arXiv Detail & Related papers (2020-11-23T16:33:48Z)
- Language Models are Few-Shot Learners [61.36677350504291]
We show that scaling up language models greatly improves task-agnostic, few-shot performance.
We train GPT-3, an autoregressive language model with 175 billion parameters, and test its performance in the few-shot setting.
GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks.
arXiv Detail & Related papers (2020-05-28T17:29:03Z)
- TuringAdvice: A Generative and Dynamic Evaluation of Language Use [90.3029315711237]
We propose TuringAdvice, a new challenge task and dataset for language understanding models.
Given a written situation that a real person is currently facing, a model must generate helpful advice in natural language.
Empirical results show that today's models struggle at TuringAdvice.
arXiv Detail & Related papers (2020-04-07T18:00:03Z)