Placing M-Phasis on the Plurality of Hate: A Feature-Based Corpus of
Hate Online
- URL: http://arxiv.org/abs/2204.13400v1
- Date: Thu, 28 Apr 2022 10:36:49 GMT
- Authors: Dana Ruiter, Liane Reiners, Ashwin Geet D'Sa, Thomas Kleinbauer,
Dominique Fohr, Irina Illina, Dietrich Klakow, Christian Schemer, Angeliki
Monnier
- Abstract summary: We present the M-Phasis corpus, a corpus of 9k German and French user comments collected from migration-related news articles.
It goes beyond the "hate"-"neutral" dichotomy and is instead annotated with 23 features, which in combination become descriptors of various types of speech.
- Score: 18.973398187389083
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Even though hate speech (HS) online has been an important object of research
in the last decade, most HS-related corpora over-simplify the phenomenon of
hate by attempting to label user comments as "hate" or "neutral". This ignores
the complex and subjective nature of HS, which limits the real-life
applicability of classifiers trained on these corpora. In this study, we
present the M-Phasis corpus, a corpus of ~9k German and French user comments
collected from migration-related news articles. It goes beyond the
"hate"-"neutral" dichotomy and is instead annotated with 23 features, which in
combination become descriptors of various types of speech, ranging from
critical comments to implicit and explicit expressions of hate. The annotations
are performed by 4 native speakers per language and achieve high (0.77 <= κ <=
1) inter-annotator agreements. Besides describing the corpus creation and
presenting insights from a content, error and domain analysis, we explore its
data characteristics by training several classification baselines.
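The agreement figures quoted in the abstract are kappa values, which correct raw agreement for the agreement expected by chance. The abstract does not say which kappa variant was used with the 4 annotators, so the following is only a minimal sketch of Cohen's kappa for a single pair of annotators on one binary feature. The function name `cohen_kappa` and the example labels are hypothetical and are not taken from the corpus.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)

# Hypothetical annotations of one binary feature on eight comments
# (illustrative values only, not data from the M-Phasis corpus).
annotator_1 = [1, 1, 0, 0, 1, 0, 0, 0]
annotator_2 = [1, 1, 0, 0, 1, 0, 1, 0]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

Here the annotators agree on 7 of 8 items (0.875) and chance agreement is 0.5, so kappa is 0.75. With more than two annotators, one would typically average pairwise values or use a multi-rater statistic instead.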
Related papers
- Towards Interpretable Hate Speech Detection using Large Language Model-extracted Rationales [15.458557611029518]
Social media platforms are a prominent arena for users to engage in interpersonal discussions and express opinions.
There arises a need to automatically identify and flag instances of hate speech.
We propose to use state-of-the-art Large Language Models (LLMs) to extract features in the form of rationales from the input text.
arXiv Detail & Related papers (2024-03-19T03:22:35Z)
- HCDIR: End-to-end Hate Context Detection, and Intensity Reduction model for online comments [2.162419921663162]
We propose a novel end-to-end model, HCDIR, for Hate Context Detection, and Hate Intensity Reduction in social media posts.
We fine-tuned several pre-trained language models to detect hateful comments to ascertain the best-performing hateful comments detection model.
arXiv Detail & Related papers (2023-12-20T17:05:46Z)
- Understanding writing style in social media with a supervised contrastively pre-trained transformer [57.48690310135374]
Online Social Networks serve as fertile ground for harmful behavior, ranging from hate speech to the dissemination of disinformation.
We introduce the Style Transformer for Authorship Representations (STAR), trained on a large corpus derived from public sources of 4.5 x 10^6 authored texts.
Using a support base of 8 documents of 512 tokens, we can discern authors from sets of up to 1616 authors with at least 80% accuracy.
arXiv Detail & Related papers (2023-10-17T09:01:17Z)
- Exploring Cross-Cultural Differences in English Hate Speech Annotations: From Dataset Construction to Analysis [44.17106903728264]
Most hate speech datasets neglect the cultural diversity within a single language.
To address this, we introduce CREHate, a CRoss-cultural English Hate speech dataset.
Only 56.2% of the posts in CREHate achieve consensus among all countries, with the highest pairwise label difference rate of 26%.
arXiv Detail & Related papers (2023-08-31T13:14:47Z)
- CoSyn: Detecting Implicit Hate Speech in Online Conversations Using a Context Synergized Hyperbolic Network [52.85130555886915]
CoSyn is a context-synergized neural network that explicitly incorporates user- and conversational context for detecting implicit hate speech in online conversations.
We show that CoSyn outperforms all our baselines in detecting implicit hate speech with absolute improvements in the range of 1.24% - 57.8%.
arXiv Detail & Related papers (2023-03-02T17:30:43Z)
- Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages.
We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply it to the target language.
We investigate the issue of label imbalance of hate speech datasets, since the high ratio of non-hate examples compared to hate examples often leads to low model performance.
arXiv Detail & Related papers (2022-01-15T20:48:14Z)
- Fine-Grained Opinion Summarization with Minimal Supervision [48.43506393052212]
FineSum aims to profile a target by extracting opinions from multiple documents.
FineSum automatically identifies opinion phrases from the raw corpus, classifies them into different aspects and sentiments, and constructs multiple fine-grained opinion clusters under each aspect/sentiment.
Both automatic evaluation on the benchmark and quantitative human evaluation validate the effectiveness of our approach.
arXiv Detail & Related papers (2021-10-17T15:16:34Z)
- Latent Hatred: A Benchmark for Understanding Implicit Hate Speech [22.420275418616242]
This work introduces a theoretically-justified taxonomy of implicit hate speech and a benchmark corpus with fine-grained labels for each message.
We present systematic analyses of our dataset using contemporary baselines to detect and explain implicit hate speech.
arXiv Detail & Related papers (2021-09-11T16:52:56Z)
- What's in the Box? An Analysis of Undesirable Content in the Common Crawl Corpus [77.34726150561087]
We analyze the Common Crawl, a colossal web corpus extensively used for training language models.
We find that it contains a significant amount of undesirable content, including hate speech and sexually explicit content, even after filtering procedures.
arXiv Detail & Related papers (2021-05-06T14:49:43Z)
- Leveraging Multilingual Transformers for Hate Speech Detection [11.306581296760864]
We leverage state of the art Transformer language models to identify hate speech in a multilingual setting.
With a pre-trained multilingual Transformer-based text encoder at the base, we are able to successfully identify and classify hate speech from multiple languages.
arXiv Detail & Related papers (2021-01-08T20:23:50Z)
- Annotating for Hate Speech: The MaNeCo Corpus and Some Input from Critical Discourse Analysis [3.3008315224941978]
This paper presents a novel scheme for the annotation of hate speech in corpora of Web 2.0 commentary.
It is motivated by the critical analysis of posts made in reaction to news reports on the Mediterranean migration crisis and LGBTIQ+ matters in Malta.
We suggest a multi-layer annotation scheme, which is pilot-tested against a binary +/- hate speech classification and appears to yield higher inter-annotator agreement.
arXiv Detail & Related papers (2020-08-14T07:39:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.