Lone Pine at SemEval-2021 Task 5: Fine-Grained Detection of Hate Speech
Using BERToxic
- URL: http://arxiv.org/abs/2104.03506v1
- Date: Thu, 8 Apr 2021 04:46:14 GMT
- Title: Lone Pine at SemEval-2021 Task 5: Fine-Grained Detection of Hate Speech
Using BERToxic
- Authors: Yakoob Khan, Weicheng Ma, Soroush Vosoughi
- Abstract summary: This paper describes our approach to the Toxic Spans Detection problem.
We propose BERToxic, a system that fine-tunes a pre-trained BERT model to locate toxic text spans in a given text.
Our system significantly outperformed the provided baseline and achieved an F1-score of 0.683, placing Lone Pine in the 17th place out of 91 teams in the competition.
- Score: 2.4815579733050153
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper describes our approach to the Toxic Spans Detection problem
(SemEval-2021 Task 5). We propose BERToxic, a system that fine-tunes a
pre-trained BERT model to locate toxic text spans in a given text and utilizes
additional post-processing steps to refine the boundaries. The post-processing
steps involve (1) labeling character offsets between consecutive toxic tokens
as toxic and (2) assigning a toxic label to words that have at least one token
labeled as toxic. Through experiments, we show that these two post-processing
steps improve the performance of our model by 4.16% on the test set. We also
studied the effects of data augmentation and ensemble modeling strategies on
our system. Our system significantly outperformed the provided baseline and
achieved an F1-score of 0.683, placing Lone Pine in the 17th place out of 91
teams in the competition. Our code is made available at
https://github.com/Yakoob-Khan/Toxic-Spans-Detection
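The two post-processing heuristics described in the abstract can be expressed compactly. The sketch below is a minimal illustration written from the abstract, not the authors' released implementation (see the repository linked above); the token tuple, character-offset bookkeeping, and whitespace word segmentation are simplifying assumptions.
```python
# Minimal sketch of the two post-processing steps described above (not the
# authors' code). Assumes each model prediction is a (char_start, char_end,
# predicted_toxic) tuple aligned to the original text.
import re
from typing import List, Set, Tuple

Token = Tuple[int, int, bool]  # (char_start, char_end, predicted_toxic)

def toxic_char_offsets(text: str, tokens: List[Token]) -> Set[int]:
    # Character offsets covered by tokens the model labeled toxic.
    toxic = {i for s, e, t in tokens if t for i in range(s, e)}

    # Step 1: label character offsets between consecutive toxic tokens as
    # toxic, covering whitespace/punctuation the tokenizer split away.
    for (s1, e1, t1), (s2, e2, t2) in zip(tokens, tokens[1:]):
        if t1 and t2:
            toxic.update(range(e1, s2))

    # Step 2: assign a toxic label to every word that has at least one
    # toxic token, i.e. extend partial-word predictions to the whole word.
    for word in re.finditer(r"\S+", text):
        if any(i in toxic for i in range(word.start(), word.end())):
            toxic.update(range(word.start(), word.end()))

    return toxic
```
The task's evaluation compares predicted and gold sets of toxic character offsets, so sorting this set yields a submission-ready list of toxic spans.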
Related papers
- Toxic Subword Pruning for Dialogue Response Generation on Large Language Models [51.713448010799986]
We propose Toxic Subword Pruning (ToxPrune) to prune the subwords contained in toxic words from the BPE vocabulary of trained LLMs.
ToxPrune simultaneously yields a clear improvement for the toxic language model NSFW-3B on the task of dialogue response generation.
arXiv Detail & Related papers (2024-10-05T13:30:33Z) - Unveiling the Implicit Toxicity in Large Language Models [77.90933074675543]
The open-endedness of large language models (LLMs), combined with their impressive capabilities, may lead to new safety issues when they are exploited for malicious use.
We show that LLMs can generate diverse implicit toxic outputs that are exceptionally difficult to detect via simple zero-shot prompting.
We propose a reinforcement learning (RL) based attacking method to further induce the implicit toxicity in LLMs.
arXiv Detail & Related papers (2023-11-29T06:42:36Z) - Comprehensive Assessment of Toxicity in ChatGPT [49.71090497696024]
We evaluate the toxicity in ChatGPT by utilizing instruction-tuning datasets.
Prompts in creative writing tasks can be 2x more likely to elicit toxic responses.
Certain deliberately toxic prompts, designed in earlier studies, no longer yield harmful responses.
arXiv Detail & Related papers (2023-11-03T14:37:53Z) - ToxiSpanSE: An Explainable Toxicity Detection in Code Review Comments [4.949881799107062]
ToxiSpanSE is the first tool to detect toxic spans in the Software Engineering (SE) domain.
Our model achieved the best score with 0.88 F1, 0.87 precision, and 0.93 recall for toxic class tokens.
arXiv Detail & Related papers (2023-07-07T04:55:11Z) - Detoxifying Text with MaRCo: Controllable Revision with Experts and
Anti-Experts [57.38912708076231]
We introduce MaRCo, a detoxification algorithm that combines controllable generation and text rewriting methods.
MaRCo uses likelihoods under a non-toxic LM and a toxic LM to find candidate words to mask and potentially replace.
We evaluate our method on several subtle toxicity and microaggression datasets, and show that it not only outperforms baselines on automatic metrics, but that MaRCo's rewrites are also preferred 2.1 times more often in human evaluation (a minimal sketch of this likelihood contrast appears after this list).
arXiv Detail & Related papers (2022-12-20T18:50:00Z) - Cisco at SemEval-2021 Task 5: What's Toxic?: Leveraging Transformers for
Multiple Toxic Span Extraction from Online Comments [1.332560004325655]
This paper describes the system proposed by team Cisco for SemEval-2021 Task 5: Toxic Spans Detection.
We approach this problem primarily in two ways: a sequence tagging approach and a dependency parsing approach.
Our best-performing architecture overall achieved an F1 score of 0.6922.
arXiv Detail & Related papers (2021-05-28T16:27:49Z) - UoT-UWF-PartAI at SemEval-2021 Task 5: Self Attention Based Bi-GRU with
Multi-Embedding Representation for Toxicity Highlighter [3.0586855806896045]
We propose a self-attention-based gated recurrent unit with a multi-embedding representation of the tokens.
Experimental results show that our proposed approach is very effective in detecting span tokens.
arXiv Detail & Related papers (2021-04-27T13:18:28Z) - UIT-ISE-NLP at SemEval-2021 Task 5: Toxic Spans Detection with
BiLSTM-CRF and Toxic Bert Comment Classification [0.0]
This task aims to build a model for identifying toxic words in whole posts.
We use a BiLSTM-CRF model combined with Toxic BERT classification to train the detection model.
Our model achieved an F1-score of 62.23% on the Toxic Spans Detection task.
arXiv Detail & Related papers (2021-04-20T16:32:56Z) - UIT-E10dot3 at SemEval-2021 Task 5: Toxic Spans Detection with Named
Entity Recognition and Question-Answering Approaches [0.32228025627337864]
This task asks competitors to extract toxic spans from the given texts, and we carried out several analyses to understand the task's structure before running experiments.
We solve this task with two approaches: named entity recognition with the spaCy library, and question answering with RoBERTa combined with ToxicBERT; the former attains the higher F1-score of 66.99%.
arXiv Detail & Related papers (2021-04-15T11:07:56Z) - Challenges in Automated Debiasing for Toxic Language Detection [81.04406231100323]
Biased associations have been a challenge in the development of classifiers for detecting toxic language.
We investigate recently introduced debiasing methods for text classification datasets and models, as applied to toxic language detection.
Our focus is on lexical (e.g., swear words, slurs, identity mentions) and dialectal markers (specifically African American English).
arXiv Detail & Related papers (2021-01-29T22:03:17Z) - RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language
Models [93.151822563361]
Pretrained neural language models (LMs) are prone to generating racist, sexist, or otherwise toxic language which hinders their safe deployment.
We investigate the extent to which pretrained LMs can be prompted to generate toxic language, and the effectiveness of controllable text generation algorithms at preventing such toxic degeneration.
arXiv Detail & Related papers (2020-09-24T03:17:19Z)
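The likelihood-contrast masking step mentioned in the MaRCo entry above can be sketched as follows. This is an illustrative approximation, not the MaRCo authors' implementation: the GPT-2 checkpoints are placeholders standing in for language models adapted to non-toxic and toxic text, and the margin threshold is arbitrary.
```python
# Rough sketch of likelihood-contrast masking (illustration only, not MaRCo's
# released code). Two causal LMs stand in for the non-toxic and toxic experts.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm_nontoxic = AutoModelForCausalLM.from_pretrained("gpt2").eval()  # placeholder
lm_toxic = AutoModelForCausalLM.from_pretrained("gpt2").eval()     # placeholder

def token_logprobs(lm, ids):
    """Log-probability of each observed token (from position 1 on) under a causal LM."""
    with torch.no_grad():
        logits = lm(ids).logits                   # (1, seq, vocab)
    logp = F.log_softmax(logits[:, :-1], dim=-1)  # predictions for the next token
    return logp.gather(-1, ids[:, 1:, None]).squeeze(-1)

def mask_candidates(text: str, margin: float = 1.0):
    """Flag tokens the toxic LM likes much more than the non-toxic LM."""
    ids = tok(text, return_tensors="pt").input_ids
    diff = token_logprobs(lm_toxic, ids) - token_logprobs(lm_nontoxic, ids)
    positions = (diff[0] > margin).nonzero().flatten() + 1  # +1: next-token offset
    return [tok.decode(int(ids[0, i])) for i in positions]
```
In practice the flagged tokens would then be masked and re-generated under the non-toxic model to produce the detoxified rewrite.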