Model Bias in NLP -- Application to Hate Speech Classification
- URL: http://arxiv.org/abs/2109.09725v2
- Date: Wed, 22 Sep 2021 21:02:29 GMT
- Title: Model Bias in NLP -- Application to Hate Speech Classification
- Authors: Jonas Bokstaller, Georgios Patoulidis and Aygul Zagidullina
- Abstract summary: This document sums up our results for the NLP lecture at ETH in the spring semester 2021.
In this work, a BERT-based neural network model is applied to the JIGSAW dataset.
We get precisions from 64% to around 90% while still achieving acceptable recall values of at least the low 60s (%).
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This document sums up our results for the NLP lecture at ETH in the spring
semester 2021. In this work, a BERT-based neural network model (Devlin et
al., 2018) is applied to the JIGSAW dataset (Jigsaw/Conversation AI, 2019) in
order to create a model identifying hateful and toxic comments (strictly
separated from offensive language) on online social platforms (English
language), in this case Twitter. Three other neural network architectures and a
GPT-2 (Radford et al., 2019) model are also applied to the provided data set in
order to compare these different models. The trained BERT model is then applied
to two different data sets to evaluate its generalisation power, namely
another Twitter data set (Tom Davidson, 2017) (Davidson et al., 2017) and the
data set HASOC 2019 (Thomas Mandl, 2019) (Mandl et al., 2019), which includes
Twitter and also Facebook comments; we focus on the English HASOC 2019 data. In
addition, it can be shown that by fine-tuning the trained BERT model on these
two datasets, applying different transfer learning scenarios via retraining
some or all layers, the predictive scores improve compared to simply applying
the model pre-trained on the JIGSAW data set. With our results, we get
precisions from 64% to around 90% while still achieving acceptable recall
values of at least the low 60s (%), proving that BERT is suitable for real use cases
in social platforms.
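The abstract reports its results as precision and recall for the hateful/toxic class. As a reminder of how these two scores are computed, here is a minimal sketch in plain Python. The function name `precision_recall` and the toy labels are assumptions made for illustration; they are not taken from the paper or its data.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    # True positives: predicted positive and actually positive.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: predicted positive but actually negative.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: actually positive but predicted negative.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy labels, made up for illustration: 1 = hateful/toxic, 0 = other.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision is the share of comments flagged as hateful that really are hateful, and recall is the share of hateful comments that get flagged. A moderation system typically has to trade one against the other.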
Related papers
- Context-Based Tweet Engagement Prediction [0.0]
This thesis investigates how well context alone may be used to predict tweet engagement likelihood.
We employed the Spark engine on TU Wien's Little Big Data Cluster to create scalable data preprocessing, feature engineering, feature selection, and machine learning pipelines.
We also found that factors such as the prediction algorithm, training dataset size, training dataset sampling method, and feature selection significantly affect the results.
arXiv Detail & Related papers (2023-09-28T08:36:57Z)
- The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute [66.84421705029624]
We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours.
We pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length.
This work also provides two baseline models: a feed-forward model derived from the GPT-2 architecture and a recurrent model in the form of a novel LSTM with ten-fold throughput.
arXiv Detail & Related papers (2023-09-20T10:31:17Z)
- Scaling Laws Do Not Scale [54.72120385955072]
Recent work has argued that as the size of a dataset increases, the performance of a model trained on that dataset will increase.
We argue that this scaling law relationship depends on metrics used to measure performance that may not correspond with how different groups of people perceive the quality of models' output.
Different communities may also have values in tension with each other, leading to difficult, potentially irreconcilable choices about metrics used for model evaluations.
arXiv Detail & Related papers (2023-07-05T15:32:21Z)
- "Medium" LMs of Code in the Era of LLMs: Lessons From StackOverflow [5.036273913335737]
We train two models: SOBertBase, with 109M parameters, and SOBertLarge with 762M parameters, at a budget of just $187 and $800 each.
Results demonstrate that pre-training both extensively and properly on in-domain data can yield a powerful and affordable alternative to leveraging closed-source general-purpose models.
arXiv Detail & Related papers (2023-06-05T21:38:30Z)
- Exploring Category Structure with Contextual Language Models and Lexical Semantic Networks [0.0]
We test a wider array of methods for probing CLMs for predicting typicality scores.
Our experiments, using BERT, show the importance of using the right type of CLM probes.
Results highlight the importance of polysemy in this task.
arXiv Detail & Related papers (2023-02-14T09:57:23Z)
- BERT-based Ensemble Approaches for Hate Speech Detection [1.8734449181723825]
This paper focuses on classifying hate speech in social media using multiple deep models.
We evaluated with several ensemble techniques, including soft voting, maximum value, hard voting and stacking.
Experiments have shown good results, especially for the ensemble models, where stacking gave an F1 score of 97% on the Davidson dataset and aggregating ensembles 77% on the DHO dataset.
arXiv Detail & Related papers (2022-09-14T09:08:24Z)
- HaT5: Hate Language Identification using Text-to-Text Transfer Transformer [1.2532400738980594]
We investigate the performance of a state-of-the-art (SoTA) architecture T5 across 5 different tasks from 2 relatively diverse datasets.
To improve performance, we augment the training data by using an autoregressive model.
It reveals the difficulties of poor data annotation by using a small set of examples.
arXiv Detail & Related papers (2022-02-11T15:21:27Z)
- The USYD-JD Speech Translation System for IWSLT 2021 [85.64797317290349]
This paper describes the University of Sydney & JD's joint submission to the IWSLT 2021 low resource speech translation task.
We trained our models with the officially provided ASR and MT datasets.
To achieve better translation performance, we explored the most recent effective strategies, including back translation, knowledge distillation, multi-feature reranking and transductive finetuning.
arXiv Detail & Related papers (2021-07-24T09:53:34Z)
- Comparing Test Sets with Item Response Theory [53.755064720563]
We evaluate 29 datasets using predictions from 18 pretrained Transformer models on individual test examples.
We find that Quoref, HellaSwag, and MC-TACO are best suited for distinguishing among state-of-the-art models.
We also observe that the span selection task format, which is used for QA datasets like QAMR or SQuAD2.0, is effective in differentiating between strong and weak models.
arXiv Detail & Related papers (2021-06-01T22:33:53Z)
- CorDEL: A Contrastive Deep Learning Approach for Entity Linkage [70.82533554253335]
Entity linkage (EL) is a critical problem in data cleaning and integration.
With the ever-increasing growth of new data, deep learning (DL) based approaches have been proposed to alleviate the high cost of EL associated with the traditional models.
We argue that the twin-network architecture is sub-optimal for EL, leading to inherent drawbacks of existing models.
arXiv Detail & Related papers (2020-09-15T16:33:05Z)
- DeBERTa: Decoding-enhanced BERT with Disentangled Attention [119.77305080520718]
We propose a new model architecture DeBERTa that improves the BERT and RoBERTa models using two novel techniques.
We show that these techniques significantly improve the efficiency of model pre-training and the performance of both natural language understanding (NLU) and natural language generation (NLG) downstream tasks.
arXiv Detail & Related papers (2020-06-05T19:54:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.