Related papers: Hate speech detection in algerian dialect using deep learning

Hate speech detection in algerian dialect using deep learning

URL: http://arxiv.org/abs/2309.11611v2
Date: Fri, 25 Oct 2024 17:32:08 GMT
Title: Hate speech detection in algerian dialect using deep learning
Authors: Dihia Lanasri, Juan Olano, Sifal Klioui, Sin Liang Lee, Lamia Sekkai,
Abstract summary: We propose a complete approach for detecting hate speech on online Algerian messages. This corpus contains more than 13.5K documents in Algerian dialect written in Arabic, labeled as hateful or non-hateful.
Score: 0.0
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: With the proliferation of hate speech on social networks under different formats, such as abusive language, cyberbullying, and violence, etc., people have experienced a significant increase in violence, putting them in uncomfortable situations and threats. Plenty of efforts have been dedicated in the last few years to overcome this phenomenon to detect hate speech in different structured languages like English, French, Arabic, and others. However, a reduced number of works deal with Arabic dialects like Tunisian, Egyptian, and Gulf, mainly the Algerian ones. To fill in the gap, we propose in this work a complete approach for detecting hate speech on online Algerian messages. Many deep learning architectures have been evaluated on the corpus we created from some Algerian social networks (Facebook, YouTube, and Twitter). This corpus contains more than 13.5K documents in Algerian dialect written in Arabic, labeled as hateful or non-hateful. Promising results are obtained, which show the efficiency of our approach.

Related papers

Second Language (Arabic) Acquisition of LLMs via Progressive Vocabulary Expansion [55.27025066199226]
This paper addresses the need for democratizing large language models (LLM) in the Arab world. One practical objective for an Arabic LLM is to utilize an Arabic-specific vocabulary for the tokenizer that could speed up decoding. Inspired by the vocabulary learning during Second Language (Arabic) Acquisition for humans, the released AraLLaMA employs progressive vocabulary expansion.
arXiv Detail & Related papers (2024-12-16T19:29:06Z)
Hate Speech Detection and Classification in Amharic Text with Deep Learning [4.834669033093363]
We develop Amharic hate speech data and SBi-LSTM deep learning model that can detect and classify text into four categories of hate speech. We have annotated 5k Amharic social media post and comment data into four categories. The model achieves a 94.8 F1-score performance.
arXiv Detail & Related papers (2024-08-07T15:46:45Z)
An Investigation of Large Language Models for Real-World Hate Speech Detection [46.15140831710683]
A major limitation of existing methods is that hate speech detection is a highly contextual problem. Recently, large language models (LLMs) have demonstrated state-of-the-art performance in several natural language tasks. Our study reveals that a meticulously crafted reasoning prompt can effectively capture the context of hate speech.
arXiv Detail & Related papers (2024-01-07T00:39:33Z)
Task-Agnostic Low-Rank Adapters for Unseen English Dialects [52.88554155235167]
Large Language Models (LLMs) are trained on corpora disproportionally weighted in favor of Standard American English. By disentangling dialect-specific and cross-dialectal information, HyperLoRA improves generalization to unseen dialects in a task-agnostic fashion.
arXiv Detail & Related papers (2023-11-02T01:17:29Z)
AfriSenti: A Twitter Sentiment Analysis Benchmark for African Languages [45.88640066767242]
Africa is home to over 2,000 languages from more than six language families and has the highest linguistic diversity among all continents. Yet, there is little NLP research conducted on African languages. Crucial to enabling such research is the availability of high-quality annotated datasets. In this paper, we introduce AfriSenti, a sentiment analysis benchmark that contains a total of >110,000 tweets in 14 African languages.
arXiv Detail & Related papers (2023-02-17T15:40:12Z)
Offensive Language Detection in Under-resourced Algerian Dialectal Arabic Language [0.0]
We focus on the Algerian dialectal Arabic which is one of under-resourced languages. Due to the scarcity of works on the same language, we have built a new corpus regrouping more than 8.7k texts manually annotated as normal, abusive and offensive.
arXiv Detail & Related papers (2022-03-18T15:42:21Z)
Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages. We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply it to the target language. We investigate the issue of label imbalance of hate speech datasets, since the high ratio of non-hate examples compared to hate examples often leads to low model performance.
arXiv Detail & Related papers (2022-01-15T20:48:14Z)
Ceasing hate withMoH: Hate Speech Detection in Hindi-English Code-Switched Language [2.9926023796813728]
This work focuses on analyzing hate speech in Hindi-English code-switched language. To contain the structure of data, we developed MoH or Map Only Hindi, which means "Love" in Hindi. MoH pipeline consists of language identification, Roman to Devanagari Hindi transliteration using a knowledge base of Roman Hindi words.
arXiv Detail & Related papers (2021-10-18T15:24:32Z)
Hate Speech Detection in Roman Urdu [1.6436293069942314]
This study is the first to conduct a study for hate speech detection in Roman Urdu text. We have scrapped more than 90,000 tweets and manually parsed them to identify 5,000 Roman Urdu tweets. We have employed an iterative approach to develop guidelines and used them for generating the Hate Speech Roman Urdu 2020 corpus.
arXiv Detail & Related papers (2021-08-05T19:49:46Z)
Hate speech detection using static BERT embeddings [0.9176056742068814]
Hate speech is emerging as a major concern, where it expresses abusive speech that targets specific group characteristics. In this paper, we analyze the performance of hate speech detection by replacing or integrating the word embeddings. In comparison to fine-tuned BERT, one metric that significantly improved is specificity.
arXiv Detail & Related papers (2021-06-29T16:17:10Z)
Racism is a Virus: Anti-Asian Hate and Counterspeech in Social Media during the COVID-19 Crisis [51.39895377836919]
COVID-19 has sparked racism and hate on social media targeted towards Asian communities. We study the evolution and spread of anti-Asian hate speech through the lens of Twitter. We create COVID-HATE, the largest dataset of anti-Asian hate and counterspeech spanning 14 months.
arXiv Detail & Related papers (2020-05-25T21:58:09Z)
Demoting Racial Bias in Hate Speech Detection [39.376886409461775]
In current hate speech datasets, there exists a correlation between annotators' perceptions of toxicity and signals of African American English (AAE) In this paper, we use adversarial training to mitigate this bias, introducing a hate speech classifier that learns to detect toxic sentences while demoting confounds corresponding to AAE texts. Experimental results on a hate speech dataset and an AAE dataset suggest that our method is able to substantially reduce the false positive rate for AAE text while only minimally affecting the performance of hate speech classification.
arXiv Detail & Related papers (2020-05-25T17:43:22Z)

This list is automatically generated from the titles and abstracts of the papers in this site.