Contextual Lexicon-Based Approach for Hate Speech and Offensive Language
Detection
- URL: http://arxiv.org/abs/2104.12265v1
- Date: Sun, 25 Apr 2021 21:34:51 GMT
- Title: Contextual Lexicon-Based Approach for Hate Speech and Offensive Language
Detection
- Authors: Francielle Alves Vargas, Fabiana Rodrigues de Góes, Isabelle
Carvalho, Fabrício Benevenuto, Thiago Alexandre Salgueiro Pardo
- Abstract summary: This paper presents a new approach for offensive language and hate speech detection on social media.
Our approach incorporates an offensive lexicon composed of implicit and explicit offensive and swearing expressions annotated with binary classes.
Due to the severity of the hate speech and offensive comments in Brazil and the lack of research in Portuguese, Brazilian Portuguese is the language used to validate our method.
- Score: 1.1744028458220426
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a new approach for offensive language and hate speech
detection on social media. Our approach incorporates an offensive lexicon
composed of implicit and explicit offensive and swearing expressions annotated
with binary classes: context-dependent offensive and context-independent
offensive. Due to the severity of the hate speech and offensive comments in
Brazil and the lack of research in Portuguese, Brazilian Portuguese is the
language used to validate our method. However, the proposal may be applied to
any other language or domain. In the reported experiments, the proposed
approach achieved high performance, outperforming the current baselines for
European and Brazilian Portuguese.
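The abstract's distinction between context-independent and context-dependent lexicon entries can be illustrated with a small sketch. Everything below is an assumption made for illustration: the lexicon entries are English placeholders (the paper validates on Brazilian Portuguese), the cue words and the window size are hypothetical, and the hand-written rule is a toy stand-in rather than the classifier the authors evaluate.

```python
import re

CONTEXT_INDEPENDENT = "context-independent"
CONTEXT_DEPENDENT = "context-dependent"

# Hypothetical placeholder lexicon (not taken from the paper's resource).
LEXICON = {
    "idiot": CONTEXT_INDEPENDENT,  # treated as offensive wherever it occurs
    "trash": CONTEXT_DEPENDENT,    # offensive only in some contexts
    "pig": CONTEXT_DEPENDENT,
}

# Hypothetical context cues: words suggesting the term targets a person.
PERSON_CUES = {"you", "he", "she", "they", "politician"}


def tokenize(text: str) -> list:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def is_offensive(text: str, window: int = 3) -> bool:
    """Toy rule: flag any context-independent term, and flag a
    context-dependent term only when a person cue appears within
    `window` tokens on either side of it."""
    tokens = tokenize(text)
    for i, token in enumerate(tokens):
        label = LEXICON.get(token)
        if label == CONTEXT_INDEPENDENT:
            return True
        if label == CONTEXT_DEPENDENT:
            context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            if any(cue in PERSON_CUES for cue in context):
                return True
    return False


if __name__ == "__main__":
    for sample in ["You are trash", "Take out the trash tonight"]:
        print(sample, "->", is_offensive(sample))
```

The point of the sketch is only that the two annotation classes lead to different handling: one class fires on lexical match alone, the other needs supporting evidence from the surrounding words. A realistic system would more likely feed lexicon matches and their labels into a trained classifier as features.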
Related papers
- Towards Interpretable Hate Speech Detection using Large Language Model-extracted Rationales [15.458557611029518]
Social media platforms are a prominent arena for users to engage in interpersonal discussions and express opinions.
There arises a need to automatically identify and flag instances of hate speech.
We propose to use state-of-the-art Large Language Models (LLMs) to extract features in the form of rationales from the input text.
arXiv Detail & Related papers (2024-03-19T03:22:35Z)
- TuPy-E: detecting hate speech in Brazilian Portuguese social media with
a novel dataset and comprehensive analysis of models [0.0]
TuPy-E is the largest annotated Portuguese corpus for hate speech detection.
We conduct a detailed analysis using advanced techniques like BERT models.
arXiv Detail & Related papers (2023-12-29T17:47:00Z)
- A Corpus for Sentence-level Subjectivity Detection on English News Articles [49.49218203204942]
We use our guidelines to collect NewsSD-ENG, a corpus of 638 objective and 411 subjective sentences extracted from English news articles on controversial topics.
Our corpus paves the way for subjectivity detection in English without relying on language-specific tools, such as lexicons or machine translation.
arXiv Detail & Related papers (2023-05-29T11:54:50Z)
- A New Generation of Perspective API: Efficient Multilingual
Character-level Transformers [66.9176610388952]
We present the fundamentals behind the next version of the Perspective API from Google Jigsaw.
At the heart of the approach is a single multilingual token-free Charformer model.
We demonstrate that by forgoing static vocabularies, we gain flexibility across a variety of settings.
arXiv Detail & Related papers (2022-02-22T20:55:31Z)
- COLD: A Benchmark for Chinese Offensive Language Detection [54.60909500459201]
We use COLDataset, a Chinese offensive language dataset with 37k annotated sentences.
We also propose COLDetector to study output offensiveness of popular Chinese language models.
Our resources and analyses are intended to help detoxify the Chinese online communities and evaluate the safety performance of generative language models.
arXiv Detail & Related papers (2022-01-16T11:47:23Z)
- Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages.
We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply them to the target language.
We investigate the issue of label imbalance of hate speech datasets, since the high ratio of non-hate examples compared to hate examples often leads to low model performance.
arXiv Detail & Related papers (2022-01-15T20:48:14Z)
- Cross-lingual Capsule Network for Hate Speech Detection in Social Media [6.531659195805749]
We investigate the cross-lingual hate speech detection task, tackling the problem by adapting the hate speech resources from one language to another.
We propose a cross-lingual capsule network learning model coupled with extra domain-specific lexical semantics for hate speech.
Our model achieves state-of-the-art performance on benchmark datasets from AMI@Evalita 2018 and AMI@Ibereval 2018.
arXiv Detail & Related papers (2021-08-06T12:53:41Z)
- Identifying Offensive Expressions of Opinion in Context [0.0]
It is still a challenge for subjective information extraction systems to identify opinions and feelings in context.
In sentiment-based NLP tasks, there are few resources for information extraction, above all for offensive or hateful opinions in context.
This paper provides a new cross-lingual and contextual offensive lexicon, which consists of explicit and implicit offensive and swearing expressions of opinion.
arXiv Detail & Related papers (2021-04-25T18:35:39Z)
- AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages
with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z)
- Leveraging Multilingual Transformers for Hate Speech Detection [11.306581296760864]
We leverage state-of-the-art Transformer language models to identify hate speech in a multilingual setting.
With a pre-trained multilingual Transformer-based text encoder at the base, we are able to successfully identify and classify hate speech from multiple languages.
arXiv Detail & Related papers (2021-01-08T20:23:50Z)
- On Negative Interference in Multilingual Models: Findings and A
Meta-Learning Treatment [59.995385574274785]
We show that, contrary to previous belief, negative interference also impacts low-resource languages.
We present a meta-learning algorithm that obtains better cross-lingual transferability and alleviates negative interference.
arXiv Detail & Related papers (2020-10-06T20:48:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.