WLV-RIT at SemEval-2021 Task 5: A Neural Transformer Framework for
Detecting Toxic Spans
- URL: http://arxiv.org/abs/2104.04630v1
- Date: Fri, 9 Apr 2021 22:52:26 GMT
- Title: WLV-RIT at SemEval-2021 Task 5: A Neural Transformer Framework for
Detecting Toxic Spans
- Authors: Tharindu Ranasinghe, Diptanu Sarkar, Marcos Zampieri, Alex Ororbia
- Abstract summary: In recent years, the widespread use of social media has led to an increase in the generation of toxic and offensive content on online platforms.
Social media platforms have worked on developing automatic detection methods and employing human moderators to cope with this deluge of offensive content.
- Score: 2.4737119633827174
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, the widespread use of social media has led to an increase in
the generation of toxic and offensive content on online platforms. In response,
social media platforms have worked on developing automatic detection methods
and employing human moderators to cope with this deluge of offensive content.
While various state-of-the-art statistical models have been applied to detect
toxic posts, there are only a few studies that focus on detecting the words or
expressions that make a post offensive. This motivates the organization of the
SemEval-2021 Task 5: Toxic Spans Detection competition, which has provided
participants with a dataset containing toxic span annotations in English posts.
In this paper, we present the WLV-RIT entry for SemEval-2021 Task 5. Our
best-performing neural transformer model achieves an F1 score of $0.68$.
Furthermore, we develop MUDES, an open-source transformer-based framework for
multilingual detection of offensive spans in texts.
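
The approach underlying systems like this one frames toxic span detection as token-level classification. As a loose illustration of that technique (not MUDES's actual API), the sketch below tags each subword token as toxic or clean with a Hugging Face transformer and maps predictions back to character offsets; the checkpoint name and two-label scheme are placeholders, since a real system would load a model fine-tuned on the task's annotated spans.

```python
# Hypothetical sketch: toxic span detection as token classification.
# "bert-base-cased" gets a randomly initialised 2-label head here; a real
# system would load a checkpoint fine-tuned on the toxic-spans data.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

MODEL_NAME = "bert-base-cased"  # placeholder, not an actual MUDES checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForTokenClassification.from_pretrained(MODEL_NAME, num_labels=2)
model.eval()

def predict_toxic_offsets(text: str) -> list[int]:
    """Return character offsets whose tokens are tagged toxic (label 1)."""
    enc = tokenizer(text, return_tensors="pt", truncation=True,
                    return_offsets_mapping=True)
    offsets = enc.pop("offset_mapping")[0]   # (seq_len, 2) char spans per token
    with torch.no_grad():
        logits = model(**enc).logits[0]      # (seq_len, num_labels)
    labels = logits.argmax(dim=-1)           # per-token 0/1 predictions
    toxic = []
    for (start, end), label in zip(offsets.tolist(), labels.tolist()):
        if label == 1 and end > start:       # skip [CLS]/[SEP], which map to (0, 0)
            toxic.extend(range(start, end))  # expand token span to char offsets
    return toxic
```

The `return_offsets_mapping` alignment is the key step: it lets a subword-level tagger emit the character-level spans that the task is scored on.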
Related papers
- ToxiSpanSE: An Explainable Toxicity Detection in Code Review Comments [4.949881799107062]
ToxiSpanSE is the first tool to detect toxic spans in the Software Engineering (SE) domain.
Our best model achieved an $F1$ of 0.88, with 0.87 precision and 0.93 recall for toxic-class tokens.
arXiv Detail & Related papers (2023-07-07T04:55:11Z)
- Critical Perspectives: A Benchmark Revealing Pitfalls in PerspectiveAPI [0.0]
We focus on PERSPECTIVE from Jigsaw, a tool that promises to score the "toxicity" of text.
We propose a new benchmark, Selected Adversarial Semantics, or SASS.
We find that PERSPECTIVE exhibits troubling shortcomings across a number of our toxicity categories.
arXiv Detail & Related papers (2023-01-05T02:12:47Z)
- Countering Malicious Content Moderation Evasion in Online Social Networks: Simulation and Detection of Word Camouflage [64.78260098263489]
Twisting and camouflaging keywords are among the most used techniques to evade platform content moderation systems.
This article contributes to countering malicious information by developing multilingual tools to simulate and detect new methods of content moderation evasion.
arXiv Detail & Related papers (2022-12-27T16:08:49Z)
- Constructing Highly Inductive Contexts for Dialogue Safety through Controllable Reverse Generation [65.48908724440047]
We propose a method called reverse generation to construct adversarial contexts conditioned on a given response.
We test three popular pretrained dialogue models (Blender, DialoGPT, and Plato2) and find that BAD+, the dataset built with this method, can largely expose their safety problems.
arXiv Detail & Related papers (2022-12-04T12:23:41Z)
- ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection [33.715318646717385]
ToxiGen is a large-scale dataset of 274k toxic and benign statements about 13 minority groups.
Controlling machine generation in this way (via demonstration-based prompting) allows ToxiGen to cover implicitly toxic text at a larger scale.
We find that 94.5% of toxic examples are labeled as hate speech by human annotators.
arXiv Detail & Related papers (2022-03-17T17:57:56Z)
- A New Generation of Perspective API: Efficient Multilingual Character-level Transformers [66.9176610388952]
We present the fundamentals behind the next version of the Perspective API from Google Jigsaw.
At the heart of the approach is a single multilingual token-free Charformer model.
We demonstrate that by forgoing static vocabularies, we gain flexibility across a variety of settings.
arXiv Detail & Related papers (2022-02-22T20:55:31Z)
- COLD: A Benchmark for Chinese Offensive Language Detection [54.60909500459201]
We use COLDataset, a Chinese offensive language dataset with 37k annotated sentences.
We also propose COLDetector to study the output offensiveness of popular Chinese language models.
Our resources and analyses are intended to help detoxify the Chinese online communities and evaluate the safety performance of generative language models.
arXiv Detail & Related papers (2022-01-16T11:47:23Z)
- Cisco at SemEval-2021 Task 5: What's Toxic?: Leveraging Transformers for Multiple Toxic Span Extraction from Online Comments [1.332560004325655]
This paper describes the system proposed by team Cisco for SemEval-2021 Task 5: Toxic Spans Detection.
We approach this problem in two ways: a sequence tagging approach and a dependency parsing approach.
Our best-performing architecture, a sequence tagger, also proved to be our best overall, with an F1 score of 0.6922.
arXiv Detail & Related papers (2021-05-28T16:27:49Z)
- UPB at SemEval-2021 Task 5: Virtual Adversarial Training for Toxic Spans Detection [0.7197592390105455]
SemEval-2021 Task 5: Toxic Spans Detection is based on a novel annotation of a subset of the Jigsaw Unintended Bias dataset.
For this task, participants had to automatically detect the character spans in short comments that render a message toxic; systems are scored with a character-offset F1 (see the metric sketch after this list).
Our approach applies Virtual Adversarial Training in a semi-supervised setting during the fine-tuning of several Transformer-based models.
arXiv Detail & Related papers (2021-04-17T19:42:12Z)
- RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models [93.151822563361]
Pretrained neural language models (LMs) are prone to generating racist, sexist, or otherwise toxic language, which hinders their safe deployment.
We investigate the extent to which pretrained LMs can be prompted to generate toxic language, and the effectiveness of controllable text generation algorithms at preventing such toxic degeneration.
arXiv Detail & Related papers (2020-09-24T03:17:19Z)
- Kungfupanda at SemEval-2020 Task 12: BERT-Based Multi-Task Learning for Offensive Language Detection [55.445023584632175]
We build an offensive language detection system that combines multi-task learning with BERT-based models.
Our model achieves a 91.51% F1 score on English Sub-task A, comparable to the first-place system.
arXiv Detail & Related papers (2020-04-28T11:27:24Z)
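
Several entries above report F1 on the toxic spans task. As referenced in the UPB entry, the sketch below shows the character-offset scoring as we understand it from the task description: per-post F1 over sets of character indices, averaged across posts. The handling of posts with no gold toxic span is our assumption about the convention, not the official scorer.

```python
def span_f1(pred: set[int], gold: set[int]) -> float:
    """F1 between predicted and gold toxic character offsets for one post."""
    if not gold:                         # post with no gold toxic span:
        return 1.0 if not pred else 0.0  # assumed convention, not the official scorer
    if not pred:
        return 0.0
    tp = len(pred & gold)                # chars marked toxic in both sets
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

def system_score(preds: list[set[int]], golds: list[set[int]]) -> float:
    """Average the per-post F1 over the whole test set."""
    return sum(span_f1(p, g) for p, g in zip(preds, golds)) / len(golds)

# e.g. gold span covers chars 10..14, prediction covers only 10..12:
print(span_f1(set(range(10, 13)), set(range(10, 15))))  # 0.75
```

Because the score is computed over character offsets rather than tokens, partial overlap with a gold span earns partial credit, which is why the token-to-character alignment in the earlier sketch matters.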