Annotating for Hate Speech: The MaNeCo Corpus and Some Input from
Critical Discourse Analysis
- URL: http://arxiv.org/abs/2008.06222v1
- Date: Fri, 14 Aug 2020 07:39:21 GMT
- Title: Annotating for Hate Speech: The MaNeCo Corpus and Some Input from
Critical Discourse Analysis
- Authors: Stavros Assimakopoulos, Rebecca Vella Muskat, Lonneke van der Plas,
Albert Gatt
- Abstract summary: This paper presents a novel scheme for the annotation of hate speech in corpora of Web 2.0 commentary.
It is motivated by the critical analysis of posts made in reaction to news reports on the Mediterranean migration crisis and LGBTIQ+ matters in Malta.
We suggest a multi-layer annotation scheme, which is pilot-tested against a binary +/- hate speech classification and appears to yield higher inter-annotator agreement.
- Score: 3.3008315224941978
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a novel scheme for the annotation of hate speech in
corpora of Web 2.0 commentary. The proposed scheme is motivated by the critical
analysis of posts made in reaction to news reports on the Mediterranean
migration crisis and LGBTIQ+ matters in Malta, which was conducted under the
auspices of the EU-funded C.O.N.T.A.C.T. project. Based on the realization that
hate speech is not a clear-cut category to begin with, appears to belong to a
continuum of discriminatory discourse and is often realized through the use of
indirect linguistic means, it is argued that annotation schemes for its
detection should refrain from directly including the label 'hate speech,' as
different annotators might have different thresholds as to what constitutes
hate speech and what not. In view of this, we suggest a multi-layer annotation
scheme, which is pilot-tested against a binary +/- hate speech classification
and appears to yield higher inter-annotator agreement. Motivating the
postulation of our scheme, we then present the MaNeCo corpus on which it will
eventually be used; a substantial corpus of on-line newspaper comments spanning
10 years.
Related papers
- Demarked: A Strategy for Enhanced Abusive Speech Moderation through Counterspeech, Detoxification, and Message Management [71.99446449877038]
We propose a more comprehensive approach called Demarcation scoring abusive speech based on four aspect -- (i) severity scale; (ii) presence of a target; (iii) context scale; (iv) legal scale.
Our work aims to inform future strategies for effectively addressing abusive speech online.
arXiv Detail & Related papers (2024-06-27T21:45:33Z) - Towards Interpretable Hate Speech Detection using Large Language Model-extracted Rationales [15.458557611029518]
Social media platforms are a prominent arena for users to engage in interpersonal discussions and express opinions.
There arises a need to automatically identify and flag instances of hate speech.
We propose to use state-of-the-art Large Language Models (LLMs) to extract features in the form of rationales from the input text.
arXiv Detail & Related papers (2024-03-19T03:22:35Z) - Hate Speech Detection via Dual Contrastive Learning [25.878271501274245]
We propose a novel dual contrastive learning framework for hate speech detection.
Our framework jointly optimize the self-supervised and the supervised contrastive learning loss for capturing span-level information.
We conduct experiments on two publicly available English datasets, and experimental results show that the proposed model outperforms the state-of-the-art models.
arXiv Detail & Related papers (2023-07-10T13:23:36Z) - DisfluencyFixer: A tool to enhance Language Learning through Speech To
Speech Disfluency Correction [50.51901599433536]
DisfluencyFixer is a tool that performs speech-to-speech disfluency correction in English and Hindi.
Our proposed system removes disfluencies from input speech and returns fluent speech as output.
arXiv Detail & Related papers (2023-05-26T14:13:38Z) - CoSyn: Detecting Implicit Hate Speech in Online Conversations Using a
Context Synergized Hyperbolic Network [52.85130555886915]
CoSyn is a context-synergized neural network that explicitly incorporates user- and conversational context for detecting implicit hate speech in online conversations.
We show that CoSyn outperforms all our baselines in detecting implicit hate speech with absolute improvements in the range of 1.24% - 57.8%.
arXiv Detail & Related papers (2023-03-02T17:30:43Z) - Improved two-stage hate speech classification for twitter based on Deep
Neural Networks [0.0]
Hate speech is a form of online harassment that involves the use of abusive language.
The model we propose in this work is an extension of an existing approach based on LSTM neural network architectures.
Our study includes a performance comparison of several proposed alternative methods for the second stage evaluated on a public corpus of 16k tweets.
arXiv Detail & Related papers (2022-06-08T20:57:41Z) - Placing M-Phasis on the Plurality of Hate: A Feature-Based Corpus of
Hate Online [18.973398187389083]
We present the M-Phasis corpus, a corpus of 9k German and French user comments collected from migration-related news articles.
It goes beyond the "hate"-"neutral" dichotomy and is instead annotated with 23 features, which in combination become descriptors of various types of speech.
arXiv Detail & Related papers (2022-04-28T10:36:49Z) - Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages.
We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply it to the target language.
We investigate the issue of label imbalance of hate speech datasets, since the high ratio of non-hate examples compared to hate examples often leads to low model performance.
arXiv Detail & Related papers (2022-01-15T20:48:14Z) - Textless Speech Emotion Conversion using Decomposed and Discrete
Representations [49.55101900501656]
We decompose speech into discrete and disentangled learned representations, consisting of content units, F0, speaker, and emotion.
First, we modify the speech content by translating the content units to a target emotion, and then predict the prosodic features based on these units.
Finally, the speech waveform is generated by feeding the predicted representations into a neural vocoder.
arXiv Detail & Related papers (2021-11-14T18:16:42Z) - Latent Hatred: A Benchmark for Understanding Implicit Hate Speech [22.420275418616242]
This work introduces a theoretically-justified taxonomy of implicit hate speech and a benchmark corpus with fine-grained labels for each message.
We present systematic analyses of our dataset using contemporary baselines to detect and explain implicit hate speech.
arXiv Detail & Related papers (2021-09-11T16:52:56Z) - VQMIVC: Vector Quantization and Mutual Information-Based Unsupervised
Speech Representation Disentanglement for One-shot Voice Conversion [54.29557210925752]
One-shot voice conversion can be effectively achieved by speech representation disentanglement.
We employ vector quantization (VQ) for content encoding and introduce mutual information (MI) as the correlation metric during training.
Experimental results reflect the superiority of the proposed method in learning effective disentangled speech representations.
arXiv Detail & Related papers (2021-06-18T13:50:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.