Transfer Learning Approach for Arabic Offensive Language Detection
System -- BERT-Based Model
- URL: http://arxiv.org/abs/2102.05708v1
- Date: Tue, 9 Feb 2021 04:58:18 GMT
- Title: Transfer Learning Approach for Arabic Offensive Language Detection
System -- BERT-Based Model
- Authors: Fatemah Husain and Ozlem Uzuner
- Abstract summary: Cyberhate, online harassment and other misuses of technology are on the rise.
Applying advanced techniques from the Natural Language Processing (NLP) field to support the development of an online hate-free community is a critical task for social justice.
This study aims at investigating the effects of fine-tuning and training a Bidirectional Encoder Representations from Transformers (BERT) model on multiple Arabic offensive language datasets individually.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Developing a system to detect online offensive language is very important to
the health and the security of online users. Studies have shown that cyberhate,
online harassment and other misuses of technology are on the rise, particularly
during the global Coronavirus pandemic in 2020. According to the latest report
by the Anti-Defamation League (ADL), 35% of online users reported online
harassment related to their identity-based characteristics, which is a 3%
increase over 2019. Applying advanced techniques from the Natural Language
Processing (NLP) field to support the development of an online hate-free
community is a critical task for social justice. Transfer learning enhances
classifier performance by allowing knowledge learned from one domain or dataset
to be transferred to others that have not been seen before, thus helping the
classifier generalize. In our study, we apply the principles of transfer
learning across multiple Arabic offensive language datasets to compare the
effects on system performance. This study aims at investigating the effects of
fine-tuning and training a Bidirectional Encoder Representations from
Transformers (BERT) model on multiple Arabic offensive language datasets
individually and testing it on other datasets individually.
Our experiment starts with a comparison among multiple BERT
models to guide the selection of the main model used in our study. The study
also investigates the effects of concatenating all datasets for fine-tuning and
training a BERT model. Our results demonstrate the limited effects of transfer
learning on classifier performance, particularly for highly dialectal comments.
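The cross-dataset protocol the abstract describes (fine-tune on each dataset individually, then test on every other dataset) can be sketched as follows. This is a minimal illustrative stand-in: the dataset contents, the toy token-matching "model", and all function names below are hypothetical and replace the paper's actual BERT fine-tuning, which is not reproduced here.

```python
from itertools import product

# Placeholder (text, label) pairs standing in for two Arabic offensive
# language datasets; label 1 = offensive, 0 = not offensive.
datasets = {
    "A": [("ghabi", 1), ("shukran", 0)],
    "B": [("ahmaq", 1), ("marhaban", 0)],
}

def fine_tune(train_pairs):
    """Toy stand-in for BERT fine-tuning: memorize offensive tokens."""
    return {w for text, label in train_pairs if label == 1 for w in text.split()}

def evaluate(model, test_pairs):
    """Accuracy of the toy model on a (possibly unseen) dataset."""
    correct = sum(
        any(w in model for w in text.split()) == bool(label)
        for text, label in test_pairs
    )
    return correct / len(test_pairs)

# Cross-dataset transfer matrix: train on src, evaluate on tgt.
results = {
    (src, tgt): evaluate(fine_tune(datasets[src]), datasets[tgt])
    for src, tgt in product(datasets, repeat=2)
}
for (src, tgt), acc in sorted(results.items()):
    print(f"train={src} test={tgt} accuracy={acc:.2f}")
```

With these toy datasets, in-domain accuracy is perfect while cross-dataset accuracy drops to chance, mirroring the limited transfer effect the study reports.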
Related papers
- Learn while Unlearn: An Iterative Unlearning Framework for Generative Language Models [49.043599241803825]
The Iterative Contrastive Unlearning (ICU) framework consists of three core components.
A Knowledge Unlearning Induction module removes specific knowledge through an unlearning loss.
A Contrastive Learning Enhancement module preserves the model's expressive capabilities against the pure unlearning goal.
An Iterative Unlearning Refinement module dynamically assesses the unlearning extent on specific data pieces and makes iterative updates.
arXiv Detail & Related papers (2024-07-25T07:09:35Z) - Unlearn What You Want to Forget: Efficient Unlearning for LLMs [92.51670143929056]
Large language models (LLMs) have achieved significant progress from pre-training on and memorizing a wide range of textual data.
This process might suffer from privacy issues and violations of data protection regulations.
We propose an unlearning framework that can efficiently update LLMs without having to retrain the whole model after data removals.
arXiv Detail & Related papers (2023-10-31T03:35:59Z) - Hate Speech and Offensive Language Detection using an Emotion-aware
Shared Encoder [1.8734449181723825]
Existing works on hate speech and offensive language detection produce promising results based on pre-trained transformer models.
This paper addresses a multi-task joint learning approach which combines external emotional features extracted from other corpora.
Our findings demonstrate that emotional knowledge helps to more reliably identify hate speech and offensive language across datasets.
arXiv Detail & Related papers (2023-02-17T09:31:06Z) - Responsible Active Learning via Human-in-the-loop Peer Study [88.01358655203441]
We propose a responsible active learning method, namely Peer Study Learning (PSL), to simultaneously preserve data privacy and improve model stability.
We first introduce a human-in-the-loop teacher-student architecture to isolate unlabelled data from the task learner (teacher) on the cloud-side.
During training, the task learner instructs the light-weight active learner which then provides feedback on the active sampling criterion.
arXiv Detail & Related papers (2022-11-24T13:18:27Z) - A New Generation of Perspective API: Efficient Multilingual
Character-level Transformers [66.9176610388952]
We present the fundamentals behind the next version of the Perspective API from Google Jigsaw.
At the heart of the approach is a single multilingual token-free Charformer model.
We demonstrate that by forgoing static vocabularies, we gain flexibility across a variety of settings.
arXiv Detail & Related papers (2022-02-22T20:55:31Z) - Fine-Tuning Approach for Arabic Offensive Language Detection System:
BERT-Based Model [0.0]
This study investigates the effects of fine-tuning across several Arabic offensive language datasets.
We develop multiple classifiers that use four datasets individually and in combination to gain knowledge about online Arabic offensive content.
arXiv Detail & Related papers (2022-02-07T17:26:35Z) - A Feature Extraction based Model for Hate Speech Identification [2.9005223064604078]
This paper presents the TU Berlin team's experiments and results on tasks 1A and 1B of the shared task on hate speech and offensive content identification in Indo-European languages 2021.
The success of different Natural Language Processing models is evaluated for the respective subtasks throughout the competition.
Among the tested models that have been used for the experiments, the transfer learning-based models achieved the best results in both subtasks.
arXiv Detail & Related papers (2022-01-11T22:53:28Z) - Improving Classifier Training Efficiency for Automatic Cyberbullying
Detection with Feature Density [58.64907136562178]
We study the effectiveness of Feature Density (FD) using different linguistically-backed feature preprocessing methods.
We hypothesise that estimating dataset complexity allows for the reduction of the number of required experiments.
The difference in linguistic complexity of datasets allows us to additionally discuss the efficacy of linguistically-backed word preprocessing.
arXiv Detail & Related papers (2021-11-02T15:48:28Z) - Offensive Language and Hate Speech Detection with Deep Learning and
Transfer Learning [1.77356577919977]
We propose an approach to automatically classify tweets into three classes: Hate, Offensive, and Neither.
We create a class module which contains main functionality including text classification, sentiment checking and text data augmentation.
arXiv Detail & Related papers (2021-08-06T20:59:47Z) - An Online Multilingual Hate speech Recognition System [13.87667165678441]
We analyse six datasets by combining them into a single homogeneous dataset and classify them into three classes, abusive, hateful or neither.
We create a tool which identifies and scores a page with effective metric in near-real time and uses the same as feedback to re-train our model.
We prove the competitive performance of our multilingual model on two languages, English and Hindi, leading to comparable or superior performance to most monolingual models.
arXiv Detail & Related papers (2020-11-23T16:33:48Z) - Exploring the Limits of Transfer Learning with a Unified Text-to-Text
Transformer [64.22926988297685]
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP).
In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format.
arXiv Detail & Related papers (2019-10-23T17:37:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.