Navigating Dialectal Bias and Ethical Complexities in Levantine Arabic Hate Speech Detection
- URL: http://arxiv.org/abs/2412.10991v1
- Date: Sat, 14 Dec 2024 23:02:46 GMT
- Title: Navigating Dialectal Bias and Ethical Complexities in Levantine Arabic Hate Speech Detection
- Authors: Ahmed Haj Ahmed, Rui-Jie Yew, Xerxes Minocher, Suresh Venkatasubramanian
- Abstract summary: Social media platforms have become central to global communication, yet they also facilitate the spread of hate speech.
For underrepresented dialects like Levantine Arabic, detecting hate speech presents unique cultural, ethical, and linguistic challenges.
This paper explores the complex sociopolitical and linguistic landscape of Levantine Arabic and critically examines the limitations of current datasets used in hate speech detection.
- Score: 4.207344194403586
- Abstract: Social media platforms have become central to global communication, yet they also facilitate the spread of hate speech. For underrepresented dialects like Levantine Arabic, detecting hate speech presents unique cultural, ethical, and linguistic challenges. This paper explores the complex sociopolitical and linguistic landscape of Levantine Arabic and critically examines the limitations of current datasets used in hate speech detection. We highlight the scarcity of publicly available, diverse datasets and analyze the consequences of dialectal bias within existing resources. By emphasizing the need for culturally and contextually informed natural language processing (NLP) tools, we advocate for a more nuanced and inclusive approach to hate speech detection in the Arab world.
Related papers
- A Survey on Automatic Online Hate Speech Detection in Low-Resource Languages [0.5825410941577593]
Social media and the easy accessibility of the internet have facilitated the spread of hate speech.
This article provides a detailed survey of hate speech detection in low-resource languages around the world.
arXiv Detail & Related papers (2024-11-28T09:42:53Z)
- Hate Speech Detection and Classification in Amharic Text with Deep Learning [4.834669033093363]
We develop an Amharic hate speech dataset and an SBi-LSTM deep learning model that can detect and classify text into four categories of hate speech.
We have annotated 5k Amharic social media posts and comments into four categories.
The model achieves an F1-score of 94.8.
arXiv Detail & Related papers (2024-08-07T15:46:45Z)
- Voices Unheard: NLP Resources and Models for Yorùbá Regional Dialects [72.18753241750964]
Yorùbá is an African language with roughly 47 million speakers.
Recent efforts to develop NLP technologies for African languages have focused on their standard dialects.
We take steps towards bridging this gap by introducing a new high-quality parallel text and speech corpus.
arXiv Detail & Related papers (2024-06-27T22:38:04Z)
- What Do Dialect Speakers Want? A Survey of Attitudes Towards Language Technology for German Dialects [60.8361859783634]
We survey speakers of dialects and regional languages related to German.
We find that respondents are especially in favour of potential NLP tools that work with dialectal input.
arXiv Detail & Related papers (2024-02-19T09:15:28Z)
- TuPy-E: detecting hate speech in Brazilian Portuguese social media with a novel dataset and comprehensive analysis of models [0.0]
TuPy-E is the largest annotated Portuguese corpus for hate speech detection.
We conduct a detailed analysis using advanced techniques like BERT models.
arXiv Detail & Related papers (2023-12-29T17:47:00Z)
- Content-Localization based System for Analyzing Sentiment and Hate Behaviors in Low-Resource Dialectal Arabic: English to Levantine and Gulf [5.2957928879391]
This paper proposes to localize content of resources in high-resourced languages into under-resourced Arabic dialects.
We utilize content-localization based neural machine translation to develop sentiment and hate classifiers for two low-resourced Arabic dialects: Levantine and Gulf.
Our findings shed light on the importance of considering the unique nature of dialects within the same language; ignoring the dialectal aspect would lead to misleading analysis.
arXiv Detail & Related papers (2023-11-27T15:37:33Z)
- Task-Agnostic Low-Rank Adapters for Unseen English Dialects [52.88554155235167]
Large Language Models (LLMs) are trained on corpora disproportionally weighted in favor of Standard American English.
By disentangling dialect-specific and cross-dialectal information, HyperLoRA improves generalization to unseen dialects in a task-agnostic fashion.
arXiv Detail & Related papers (2023-11-02T01:17:29Z)
- Quantifying the Dialect Gap and its Correlates Across Languages [69.18461982439031]
This work will lay the foundation for furthering the field of dialectal NLP by laying out evident disparities and identifying possible pathways for addressing them through mindful data collection.
arXiv Detail & Related papers (2023-10-23T17:42:01Z)
- CoSyn: Detecting Implicit Hate Speech in Online Conversations Using a Context Synergized Hyperbolic Network [52.85130555886915]
CoSyn is a context-synergized neural network that explicitly incorporates user- and conversational context for detecting implicit hate speech in online conversations.
We show that CoSyn outperforms all our baselines in detecting implicit hate speech with absolute improvements in the range of 1.24% - 57.8%.
arXiv Detail & Related papers (2023-03-02T17:30:43Z)
- Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages.
We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply them to the target language.
We investigate the issue of label imbalance of hate speech datasets, since the high ratio of non-hate examples compared to hate examples often leads to low model performance.
arXiv Detail & Related papers (2022-01-15T20:48:14Z)
- Cross-lingual Capsule Network for Hate Speech Detection in Social Media [6.531659195805749]
We investigate the cross-lingual hate speech detection task, tackling the problem by adapting the hate speech resources from one language to another.
We propose a cross-lingual capsule network learning model coupled with extra domain-specific lexical semantics for hate speech.
Our model achieves state-of-the-art performance on benchmark datasets from AMI@Evalita 2018 and AMI@Ibereval 2018.
arXiv Detail & Related papers (2021-08-06T12:53:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.