Detect All Abuse! Toward Universal Abusive Language Detection Models
- URL: http://arxiv.org/abs/2010.03776v2
- Date: Fri, 9 Oct 2020 10:29:36 GMT
- Title: Detect All Abuse! Toward Universal Abusive Language Detection Models
- Authors: Kunze Wang, Dong Lu, Soyeon Caren Han, Siqu Long, Josiah Poon
- Abstract summary: We introduce a new generic ALD framework, MACAS, which is capable of addressing several types of ALD tasks across different domains.
Our framework covers multi-aspect abusive language embeddings that represent the target and content aspects of abusive language.
Then, we propose and use the cross-attention gate flow mechanism to embrace multiple aspects of abusive language.
- Score: 5.840117063192334
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Online abusive language detection (ALD) has become a societal issue of
increasing importance in recent years. Several previous works in online ALD
focused on solving a single abusive language problem in a single domain, like
Twitter, and have not been successfully transferable to the general ALD task or
domain. In this paper, we introduce a new generic ALD framework, MACAS, which
is capable of addressing several types of ALD tasks across different domains.
Our generic framework covers multi-aspect abusive language embeddings that
represent the target and content aspects of abusive language and applies a
textual graph embedding that analyses the user's linguistic behaviour. Then, we
propose and use the cross-attention gate flow mechanism to embrace multiple
aspects of abusive language. Quantitative and qualitative evaluation results
show that our ALD algorithm rivals or exceeds the six state-of-the-art ALD
algorithms across seven ALD datasets covering multiple aspects of abusive
language and different online community domains.
Related papers
- LexGen: Domain-aware Multilingual Lexicon Generation [40.97738267067852]
We propose a new model to generate dictionary words for 6 Indian languages in the multi-domain setting.
Our model consists of domain-specific and domain-generic layers that encode information.
We release a new benchmark dataset across 6 Indian languages that span 8 diverse domains.
arXiv Detail & Related papers (2024-05-18T07:02:43Z)
- Vicinal Risk Minimization for Few-Shot Cross-lingual Transfer in Abusive Language Detection [19.399281609371258]
Cross-lingual transfer learning from high-resource to medium and low-resource languages has shown encouraging results.
We resort to data augmentation and continual pre-training for domain adaptation to improve cross-lingual abusive language detection.
arXiv Detail & Related papers (2023-11-03T16:51:07Z)
- How to Solve Few-Shot Abusive Content Detection Using the Data We Actually Have [58.23138483086277]
In this work we leverage datasets we already have, covering a wide range of tasks related to abusive language detection.
Our goal is to build models cheaply for a new target label set and/or language, using only a few training examples of the target domain.
Our experiments show that, using existing datasets and only a few shots of the target task, model performance improves both monolingually and across languages.
arXiv Detail & Related papers (2023-05-23T14:04:12Z)
- MAGE: Machine-generated Text Detection in the Wild [82.70561073277801]
Large language models (LLMs) have achieved human-level text generation, emphasizing the need for effective AI-generated text detection.
We build a comprehensive testbed by gathering texts from diverse human writings and texts generated by different LLMs.
Despite challenges, the top-performing detector can identify 86.54% of out-of-domain texts generated by a new LLM, indicating feasibility for application scenarios.
arXiv Detail & Related papers (2023-05-22T17:13:29Z)
- Countering Malicious Content Moderation Evasion in Online Social Networks: Simulation and Detection of Word Camouflage [64.78260098263489]
Twisting and camouflaging keywords are among the most used techniques to evade platform content moderation systems.
This article contributes significantly to countering malicious information by developing multilingual tools to simulate and detect new methods of evasion of content.
arXiv Detail & Related papers (2022-12-27T16:08:49Z)
- Data Bootstrapping Approaches to Improve Low Resource Abusive Language Detection for Indic Languages [5.51252705016179]
We demonstrate a large-scale analysis of multilingual abusive speech in Indic languages.
We examine different interlingual transfer mechanisms and observe the performance of various multilingual models for abusive speech detection.
arXiv Detail & Related papers (2022-04-26T18:56:01Z)
- A New Generation of Perspective API: Efficient Multilingual Character-level Transformers [66.9176610388952]
We present the fundamentals behind the next version of the Perspective API from Google Jigsaw.
At the heart of the approach is a single multilingual token-free Charformer model.
We demonstrate that by forgoing static vocabularies, we gain flexibility across a variety of settings.
arXiv Detail & Related papers (2022-02-22T20:55:31Z)
- MACRONYM: A Large-Scale Dataset for Multilingual and Multi-Domain Acronym Extraction [66.60031336330547]
Acronyms and their expanded forms are necessary for various NLP applications.
One limitation of existing acronym extraction (AE) research is that it is limited to the English language and certain domains.
The lack of annotated datasets in multiple languages and domains has been a major obstacle to research in this area.
arXiv Detail & Related papers (2022-02-19T23:08:38Z)
- MGD-GAN: Text-to-Pedestrian generation through Multi-Grained Discrimination [96.91091607251526]
We propose the Multi-Grained Discrimination enhanced Generative Adversarial Network, which capitalizes on a human-part-based Discriminator (HPD) and a self-cross-attended Discriminator.
A fine-grained word-level attention mechanism is employed in the HPD module to enforce diversified appearance and vivid details.
The substantial improvement over the various metrics demonstrates the efficacy of MGD-GAN on the text-to-pedestrian synthesis scenario.
arXiv Detail & Related papers (2020-10-02T12:24:48Z)
- Aggressive Language Detection with Joint Text Normalization via Adversarial Multi-task Learning [31.02484600391725]
Aggressive language detection (ALD) is one of the crucial applications in the NLP community.
In this work, we aim to improve ALD by jointly performing text normalization (TN) via an adversarial multi-task learning framework.
arXiv Detail & Related papers (2020-09-19T06:26:07Z)
- Joint Modelling of Emotion and Abusive Language Detection [26.18171134454037]
We present the first joint model of emotion and abusive language detection, experimenting in a multi-task learning framework.
Our results demonstrate that incorporating affective features leads to significant improvements in abuse detection performance across datasets.
arXiv Detail & Related papers (2020-05-28T14:08:40Z)