Introducing an Abusive Language Classification Framework for Telegram to
Investigate the German Hater Community
- URL: http://arxiv.org/abs/2109.07346v1
- Date: Wed, 15 Sep 2021 14:58:46 GMT
- Title: Introducing an Abusive Language Classification Framework for Telegram to
Investigate the German Hater Community
- Authors: Maximilian Wich, Adrian Gorniak, Tobias Eder, Daniel Bartmann, Burak
Enes \c{C}akici, Georg Groh
- Abstract summary: We develop a framework that consists of (i) an abusive language classification model for German Telegram messages and (ii) a classification model for the hatefulness of Telegram channels.
For the channel classification model, we develop a method that combines channel specific content information coming from a topic model with a social graph to predict the hatefulness of channels.
As an additional output of the study, we release an annotated abusive language dataset containing 1,149 annotated Telegram messages.
- Score: 0.6459215652021234
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Since traditional social media platforms ban more and more actors that
distribute hate speech or other forms of abusive language (deplatforming),
these actors migrate to alternative platforms that do not moderate the users'
content. One known platform that is relevant for the German hater community is
Telegram, for which there have only been made limited research efforts so far.
The goal of this study is to develop a broad framework that consists of (i)
an abusive language classification model for German Telegram messages and (ii)
a classification model for the hatefulness of Telegram channels. For the first
part, we employ existing abusive language datasets containing posts from other
platforms to build our classification models. For the channel classification
model, we develop a method that combines channel specific content information
coming from a topic model with a social graph to predict the hatefulness of
channels. Furthermore, we complement these two approaches for hate speech
detection with insightful results on the evolution of the hater community on
Telegram in Germany. Moreover, we propose methods to the hate speech research
community for scalable network analyses for social media platforms. As an
additional output of the study, we release an annotated abusive language
dataset containing 1,149 annotated Telegram messages.
Related papers
- Towards Interpretable Hate Speech Detection using Large Language Model-extracted Rationales [15.458557611029518]
Social media platforms are a prominent arena for users to engage in interpersonal discussions and express opinions.
There arises a need to automatically identify and flag instances of hate speech.
We propose to use state-of-the-art Large Language Models (LLMs) to extract features in the form of rationales from the input text.
arXiv Detail & Related papers (2024-03-19T03:22:35Z) - SADAS: A Dialogue Assistant System Towards Remediating Norm Violations
in Bilingual Socio-Cultural Conversations [56.31816995795216]
Socially-Aware Dialogue Assistant System (SADAS) is designed to ensure that conversations unfold with respect and understanding.
Our system's novel architecture includes: (1) identifying the categories of norms present in the dialogue, (2) detecting potential norm violations, (3) evaluating the severity of these violations, and (4) implementing targeted remedies to rectify the breaches.
arXiv Detail & Related papers (2024-01-29T08:54:21Z) - Analysis and Detection of Multilingual Hate Speech Using Transformer
Based Deep Learning [7.332311991395427]
As the prevalence of hate speech increases online, the demand for automated detection as an NLP task is increasing.
In this work, the proposed method is using transformer-based model to detect hate speech in social media, like twitter, Facebook, WhatsApp, Instagram, etc.
The Gold standard datasets were collected from renowned researcher Zeerak Talat, Sara Tonelli, Melanie Siegel, and Rezaul Karim.
The success rate of the proposed model for hate speech detection is higher than the existing baseline and state-of-the-art models with accuracy in Bengali dataset is 89%, in English: 91%, in German
arXiv Detail & Related papers (2024-01-19T20:40:23Z) - Seamless: Multilingual Expressive and Streaming Speech Translation [71.12826355107889]
We introduce a family of models that enable end-to-end expressive and multilingual translations in a streaming fashion.
First, we contribute an improved version of the massively multilingual and multimodal SeamlessM4T model- SeamlessM4T v2.
We bring major components from SeamlessExpressive and SeamlessStreaming together to form Seamless, the first publicly available system that unlocks expressive cross-lingual communication in real-time.
arXiv Detail & Related papers (2023-12-08T17:18:42Z) - TGDataset: a Collection of Over One Hundred Thousand Telegram Channels [69.22187804798162]
This paper presents the TGDataset, a new dataset that includes 120,979 Telegram channels and over 400 million messages.
We analyze the languages spoken within our dataset and the topic covered by English channels.
In addition to the raw dataset, we released the scripts we used to analyze the dataset and the list of channels belonging to the network of a new conspiracy theory called Sabmyk.
arXiv Detail & Related papers (2023-03-09T15:42:38Z) - Countering Malicious Content Moderation Evasion in Online Social
Networks: Simulation and Detection of Word Camouflage [64.78260098263489]
Twisting and camouflaging keywords are among the most used techniques to evade platform content moderation systems.
This article contributes significantly to countering malicious information by developing multilingual tools to simulate and detect new methods of evasion of content.
arXiv Detail & Related papers (2022-12-27T16:08:49Z) - A New Generation of Perspective API: Efficient Multilingual
Character-level Transformers [66.9176610388952]
We present the fundamentals behind the next version of the Perspective API from Google Jigsaw.
At the heart of the approach is a single multilingual token-free Charformer model.
We demonstrate that by forgoing static vocabularies, we gain flexibility across a variety of settings.
arXiv Detail & Related papers (2022-02-22T20:55:31Z) - Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages.
We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply it to the target language.
We investigate the issue of label imbalance of hate speech datasets, since the high ratio of non-hate examples compared to hate examples often leads to low model performance.
arXiv Detail & Related papers (2022-01-15T20:48:14Z) - Role of Artificial Intelligence in Detection of Hateful Speech for
Hinglish Data on Social Media [1.8899300124593648]
Prevalence of Hindi-English code-mixed data (Hinglish) is on the rise with most of the urban population all over the world.
Hate speech detection algorithms deployed by most social networking platforms are unable to filter out offensive and abusive content posted in these code-mixed languages.
We propose a methodology for efficient detection of unstructured code-mix Hinglish language.
arXiv Detail & Related papers (2021-05-11T10:02:28Z) - Leveraging cross-platform data to improve automated hate speech
detection [0.0]
Most existing approaches for hate speech detection focus on a single social media platform in isolation.
Here we propose a new cross-platform approach to detect hate speech which leverages multiple datasets and classification models from different platforms.
We demonstrate how this approach outperforms existing models, and achieves good performance when tested on messages from novel social media platforms.
arXiv Detail & Related papers (2021-02-09T15:52:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.