Understanding and Detecting Dangerous Speech in Social Media
- URL: http://arxiv.org/abs/2005.06608v1
- Date: Mon, 4 May 2020 09:42:09 GMT
- Title: Understanding and Detecting Dangerous Speech in Social Media
- Authors: Ali Alshehri, El Moatez Billah Nagoudi, Muhammad Abdul-Mageed
- Abstract summary: Dangerous language such as physical threats is somewhat rare in online environments, yet remains highly important.
We build a labeled dataset for dangerous speech and develop highly effective models to detect dangerous content.
Our best model performs at 59.60% macro F1, significantly outperforming a competitive baseline.
- Score: 9.904746542801837
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Social media communication has become a significant part of daily activity in
modern societies. For this reason, ensuring safety in social media platforms is
a necessity. The use of dangerous language, such as physical threats, in
online environments is somewhat rare, yet remains highly important. Although
several studies have addressed the related issue of detecting offensive and
hateful language, dangerous speech has not previously been treated in any
significant way. Motivated by these observations, we report our efforts to
build a labeled dataset for dangerous speech. We also exploit our dataset to
develop highly effective models to detect dangerous content. Our best model
performs at 59.60% macro F1, significantly outperforming a competitive
baseline.
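As a point of reference for the reported metric: macro F1 averages the per-class F1 scores with equal weight, so a rare class such as dangerous speech counts as much as the majority class. A minimal sketch, with illustrative labels that are not from the paper's dataset:

```python
# Macro F1 averages per-class F1 scores with equal weight, so the rare
# "dangerous" class matters as much as the majority class.
# Labels below are illustrative, not from the paper's dataset.
from sklearn.metrics import f1_score

y_true = [0, 0, 0, 0, 1, 1, 0, 1]  # 1 = dangerous, 0 = safe (hypothetical)
y_pred = [0, 0, 1, 0, 1, 0, 0, 1]

print(f1_score(y_true, y_pred, average="macro"))  # unweighted class average
print(f1_score(y_true, y_pred, average="micro"))  # dominated by frequent class
```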
Related papers
- ToxicTAGS: Decoding Toxic Memes with Rich Tag Annotations [3.708799808977489]
We introduce a first-of-its-kind dataset of 6,300 real-world meme-based posts annotated in two stages: (i) binary classification into toxic and normal, and (ii) fine-grained labelling of toxic memes as hateful, dangerous, or offensive. A key feature of this dataset is that it is enriched with auxiliary metadata of socially relevant tags, enhancing the context of each meme.
arXiv Detail & Related papers (2025-08-06T07:46:14Z) - SWE2: SubWord Enriched and Significant Word Emphasized Framework for Hate Speech Detection [3.0460060805145517]
We propose a novel hate speech detection framework called SWE2, which only relies on the content of messages and automatically identifies hate speech.
Experimental results show that our proposed model achieves 0.975 accuracy and 0.953 macro F1, outperforming 7 state-of-the-art baselines.
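As a generic illustration of why subword-level features help with obfuscated abusive language (a character n-gram baseline, not the SWE2 architecture itself), consider the following sketch; the texts, labels, and hyperparameters are hypothetical:

```python
# Generic illustration of subword-level features for text classification;
# a character n-gram baseline, NOT the SWE2 architecture from the paper.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["you are awful", "have a nice day"]  # toy examples
labels = [1, 0]                               # 1 = hateful (hypothetical)

# char_wb n-grams capture subword units, making the model more robust to
# obfuscations such as "awfu1" that break word-level features.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(),
)
model.fit(texts, labels)
print(model.predict(["you are awfu1"]))
```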
arXiv Detail & Related papers (2024-09-25T07:05:44Z) - Towards Probing Speech-Specific Risks in Large Multimodal Models: A Taxonomy, Benchmark, and Insights [50.89022445197919]
We propose a speech-specific risk taxonomy covering 8 risk categories under hostility (malicious sarcasm and threats), malicious imitation (age, gender, ethnicity), and stereotypical biases (age, gender, ethnicity).
Based on the taxonomy, we create a small-scale dataset for evaluating current LMMs' capability in detecting these categories of risk.
arXiv Detail & Related papers (2024-06-25T10:08:45Z) - MetaHate: A Dataset for Unifying Efforts on Hate Speech Detection [2.433983268807517]
Hate speech poses significant social, psychological, and occasionally physical threats to targeted individuals and communities.
Current computational linguistic approaches for tackling this phenomenon rely on labelled social media datasets for training.
We scrutinized over 60 datasets, selectively integrating the pertinent ones into MetaHate.
Our findings contribute to a deeper understanding of the existing datasets, paving the way for training more robust and adaptable models.
arXiv Detail & Related papers (2024-01-12T11:54:53Z) - Developing Linguistic Patterns to Mitigate Inherent Human Bias in Offensive Language Detection [1.6574413179773761]
We propose a linguistic data augmentation approach to reduce bias in labeling processes.
This approach has the potential to improve offensive language classification tasks across multiple languages.
arXiv Detail & Related papers (2023-12-04T10:20:36Z) - Countering Misinformation via Emotional Response Generation [15.383062216223971]
The proliferation of misinformation on social media platforms (SMPs) poses a significant danger to public health, social cohesion and democracy.
Previous research has shown how social correction can be an effective way to curb misinformation.
We present VerMouth, the first large-scale dataset comprising roughly 12 thousand claim-response pairs.
arXiv Detail & Related papers (2023-11-17T15:37:18Z) - Factuality Challenges in the Era of Large Language Models [113.3282633305118]
Large Language Models (LLMs) generate false, erroneous, or misleading content.
LLMs can be exploited for malicious applications.
This poses a significant challenge to society in terms of the potential deception of users.
arXiv Detail & Related papers (2023-10-08T14:55:02Z) - ManiTweet: A New Benchmark for Identifying Manipulation of News on Social Media [74.93847489218008]
We present a novel task, identifying manipulation of news on social media, which aims to detect manipulation in social media posts and identify manipulated or inserted information.
To study this task, we propose a data collection schema and curate a dataset called ManiTweet, consisting of 3.6K pairs of tweets and corresponding articles.
Our analysis demonstrates that this task is highly challenging, with large language models (LLMs) yielding unsatisfactory performance.
arXiv Detail & Related papers (2023-05-23T16:40:07Z) - Countering Malicious Content Moderation Evasion in Online Social Networks: Simulation and Detection of Word Camouflage [64.78260098263489]
Twisting and camouflaging keywords are among the most used techniques to evade platform content moderation systems.
This article contributes significantly to countering malicious information by developing multilingual tools to simulate and detect new methods of content moderation evasion.
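To make the evasion technique concrete, here is a toy sketch of leetspeak-style word camouflage and a naive normalization step a detector might apply; the substitution table and watchlist are hypothetical, not the paper's resources:

```python
# Toy sketch of leetspeak-style word camouflage and naive de-camouflaging.
# The substitution map and watchlist are hypothetical, not the paper's tools.
LEET = {"a": "4", "e": "3", "i": "1", "o": "0", "s": "5"}
UNLEET = {v: k for k, v in LEET.items()}

def camouflage(word: str) -> str:
    """Replace letters with visually similar digits to evade keyword filters."""
    return "".join(LEET.get(ch, ch) for ch in word.lower())

def normalize(text: str) -> str:
    """Map camouflaged characters back before matching against a watchlist."""
    return "".join(UNLEET.get(ch, ch) for ch in text.lower())

WATCHLIST = {"attack"}  # hypothetical moderated keyword

msg = camouflage("attack")          # -> "4tt4ck"
print(msg, msg in WATCHLIST)        # evades a literal match: False
print(normalize(msg) in WATCHLIST)  # detected after normalization: True
```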
arXiv Detail & Related papers (2022-12-27T16:08:49Z) - Certifiably Robust Policy Learning against Adversarial Communication in Multi-agent Systems [51.6210785955659]
Communication is important in many multi-agent reinforcement learning (MARL) problems for agents to share information and make good decisions.
However, when deploying trained communicative agents in a real-world application where noise and potential attackers exist, the safety of communication-based policies becomes a severe issue that is underexplored.
In this work, we consider an environment with $N$ agents, where the attacker may arbitrarily change the communication from any $C < \frac{N-1}{2}$ agents to a victim agent.
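The bound is intuitive: the victim receives $N-1$ messages, and if fewer than half of them are corrupted, a majority vote over message-conditioned decisions still recovers the benign outcome. A toy illustration of this voting intuition, with hypothetical values rather than the paper's certified defense:

```python
# Illustration of why C < (N-1)/2 matters: with N-1 incoming messages and
# fewer than half corrupted, a majority vote over per-message action choices
# still recovers the benign action. Hypothetical values; not the paper's method.
from collections import Counter

N = 7                      # agents; the victim receives N - 1 = 6 messages
C = 2                      # corrupted channels, C < (N - 1) / 2
actions = ["left"] * (N - 1 - C) + ["right"] * C  # per-message action votes

winner, votes = Counter(actions).most_common(1)[0]
assert votes > (N - 1) / 2  # benign majority is guaranteed when C < (N-1)/2
print(f"chosen action: {winner} ({votes}/{N - 1} votes)")
```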
arXiv Detail & Related papers (2022-06-21T07:32:18Z) - Detecting Cross-Modal Inconsistency to Defend Against Neural Fake News [57.9843300852526]
We introduce the more realistic and challenging task of defending against machine-generated news that also includes images and captions.
To identify the possible weaknesses that adversaries can exploit, we create a NeuralNews dataset composed of 4 different types of generated articles.
In addition to the valuable insights gleaned from our user study experiments, we provide a relatively effective approach based on detecting visual-semantic inconsistencies.
arXiv Detail & Related papers (2020-09-16T14:13:15Z) - ETHOS: an Online Hate Speech Detection Dataset [6.59720246184989]
We present 'ETHOS', a textual dataset with two variants: binary and multi-label, based on YouTube and Reddit comments validated using the Figure-Eight crowdsourcing platform.
Our key assumption is that, even though such a time-consuming process yields only a small amount of labelled data, it guarantees hate speech occurrences in the examined material.
arXiv Detail & Related papers (2020-06-11T08:59:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.