HateCheck: Functional Tests for Hate Speech Detection Models
- URL: http://arxiv.org/abs/2012.15606v1
- Date: Thu, 31 Dec 2020 13:44:56 GMT
- Title: HateCheck: Functional Tests for Hate Speech Detection Models
- Authors: Paul Röttger, Bertram Vidgen, Dong Nguyen, Zeerak Waseem, Helen Margetts, Janet Pierrehumbert
- Abstract summary: We introduce HateCheck, a first suite of functional tests for hate speech detection models.
We specify 29 model functionalities, the selection of which we motivate by reviewing previous research.
We test near-state-of-the-art transformer detection models as well as a popular commercial model, revealing critical model weaknesses.
- Score: 3.4938484663205776
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Detecting online hate is a difficult task that even state-of-the-art models
struggle with. In previous research, hate speech detection models are typically
evaluated by measuring their performance on held-out test data using metrics
such as accuracy and F1 score. However, this approach makes it difficult to
identify specific model weak points. It also risks overestimating generalisable
model quality due to increasingly well-evidenced systematic gaps and biases in
hate speech datasets. To enable more targeted diagnostic insights, we introduce
HateCheck, a first suite of functional tests for hate speech detection models.
We specify 29 model functionalities, the selection of which we motivate by
reviewing previous research and through a series of interviews with civil
society stakeholders. We craft test cases for each functionality and validate
data quality through a structured annotation process. To illustrate HateCheck's
utility, we test near-state-of-the-art transformer detection models as well as
a popular commercial model, revealing critical model weaknesses.
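To make the functional-testing setup concrete, the sketch below shows how per-functionality accuracy could be computed for a candidate classifier. It is a minimal illustration, not the authors' released evaluation code: the CSV filename, column names, model identifier, and label mapping are assumptions chosen for the example.
```python
# Minimal sketch of HateCheck-style functional testing (illustrative only).
# Assumptions: test cases live in a CSV with "functionality", "test_case" and
# "label_gold" columns, and the system under test is a Hugging Face
# text-classification pipeline whose labels map onto {"hateful", "non-hateful"}.
import csv
from collections import defaultdict

from transformers import pipeline

# Placeholder model identifier: substitute any binary hate speech classifier.
classifier = pipeline("text-classification", model="your-org/hate-speech-model")

# Assumed mapping from the model's label space to the gold labels.
LABEL_MAP = {"LABEL_0": "non-hateful", "LABEL_1": "hateful"}

correct = defaultdict(int)
total = defaultdict(int)

with open("hatecheck_test_cases.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        pred = classifier(row["test_case"])[0]["label"]
        total[row["functionality"]] += 1
        correct[row["functionality"]] += int(
            LABEL_MAP.get(pred, pred) == row["label_gold"]
        )

# Per-functionality accuracy: systematic weaknesses show up as low-scoring
# rows, which aggregate metrics such as overall accuracy or F1 would hide.
for functionality in sorted(total):
    acc = correct[functionality] / total[functionality]
    print(f"{functionality}: {acc:.1%} ({total[functionality]} cases)")
```
Reporting results per functionality rather than in aggregate is what surfaces the targeted, diagnostic weaknesses the abstract refers to.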
Related papers
- Context-Aware Testing: A New Paradigm for Model Testing with Large Language Models [49.06068319380296]
We introduce context-aware testing (CAT) which uses context as an inductive bias to guide the search for meaningful model failures.
We instantiate the first CAT system, SMART Testing, which employs large language models to hypothesize relevant and likely failures.
arXiv Detail & Related papers (2024-10-31T15:06:16Z)
- GPT-HateCheck: Can LLMs Write Better Functional Tests for Hate Speech Detection? [50.53312866647302]
HateCheck is a suite for testing fine-grained model functionalities on synthesized data.
We propose GPT-HateCheck, a framework to generate more diverse and realistic functional tests from scratch.
Crowd-sourced annotation demonstrates that the generated test cases are of high quality.
arXiv Detail & Related papers (2024-02-23T10:02:01Z)
- A Comprehensive Evaluation and Analysis Study for Chinese Spelling Check [53.152011258252315]
We show that using phonetic and graphic information reasonably is effective for Chinese Spelling Check.
Models are sensitive to the error distribution of the test set, which reflects the shortcomings of models.
The commonly used benchmark, SIGHAN, cannot reliably evaluate models' performance.
arXiv Detail & Related papers (2023-07-25T17:02:38Z)
- Evaluating ChatGPT's Performance for Multilingual and Emoji-based Hate Speech Detection [4.809236881780707]
Large language models like ChatGPT have recently shown great promise in performing several tasks, including hate speech detection.
This study aims to evaluate the strengths and weaknesses of the ChatGPT model in detecting hate speech at a granular level across 11 languages.
arXiv Detail & Related papers (2023-05-22T17:36:58Z)
- Discover, Explanation, Improvement: An Automatic Slice Detection Framework for Natural Language Processing [72.14557106085284]
Slice detection models (SDMs) automatically identify underperforming groups of datapoints.
This paper proposes a benchmark named "Discover, Explain, Improve (DEIM)" for classification NLP tasks.
Our evaluation shows that Edisa can accurately select error-prone datapoints with informative semantic features.
arXiv Detail & Related papers (2022-11-08T19:00:00Z)
- Multilingual HateCheck: Functional Tests for Multilingual Hate Speech Detection Models [14.128029444990895]
We introduce Multilingual HateCheck (MHC), a suite of functional tests for multilingual hate speech detection models.
MHC covers 34 functionalities across ten languages, which is more languages than any other hate speech dataset.
We train and test a high-performing multilingual hate speech detection model, and reveal critical model weaknesses for monolingual and cross-lingual applications.
arXiv Detail & Related papers (2022-06-20T17:54:39Z)
- HateCheckHIn: Evaluating Hindi Hate Speech Detection Models [6.52974752091861]
Multilingual hate is a major emerging challenge for automated detection.
We introduce a set of functionalities for the purpose of evaluation.
Considering Hindi as a base language, we craft test cases for each functionality.
arXiv Detail & Related papers (2022-04-30T19:09:09Z)
- Checking HateCheck: a cross-functional analysis of behaviour-aware learning for hate speech detection [4.0810783261728565]
We investigate fine-tuning schemes using HateCheck, a suite of functional tests for hate speech detection systems.
We train and evaluate models on different configurations of HateCheck by holding out categories of test cases.
The fine-tuning procedure led to improvements in the classification accuracy of held-out functionalities and identity groups.
However, performance on held-out functionality classes and i.i.d. hate speech detection data decreased, which indicates that generalisation occurs mostly across functionalities from the same class.
arXiv Detail & Related papers (2022-04-08T13:03:01Z)
- Deep Learning for Hate Speech Detection: A Comparative Study [54.42226495344908]
We present here a large-scale empirical comparison of deep and shallow hate-speech detection methods.
Our goal is to illuminate progress in the area, and identify strengths and weaknesses in the current state-of-the-art.
In doing so we aim to provide guidance as to the use of hate-speech detection in practice, quantify the state-of-the-art, and identify future research directions.
arXiv Detail & Related papers (2022-02-19T03:48:20Z) - AES Systems Are Both Overstable And Oversensitive: Explaining Why And
Proposing Defenses [66.49753193098356]
We investigate the reason behind the surprising adversarial brittleness of scoring models.
Our results indicate that autoscoring models, despite getting trained as "end-to-end" models, behave like bag-of-words models.
We propose detection-based protection models that can identify oversensitivity- and overstability-causing samples with high accuracy.
arXiv Detail & Related papers (2021-09-24T03:49:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.